Hi everyone, email support suggested that I post here.
I’m building a hardware prototype using a Raspberry Pi Zero 2 W (headless, no GUI) integrated inside a vintage telephone handset. The device uses a GPIO hook switch to start and end calls.
My goal is to create a fully standalone AI-powered phone, where:
Lifting the handset starts a call
The user hears the AI voice through the speaker
(Later) the user speaks through a microphone
All audio is handled directly on the Raspberry Pi (no browser, no WebRTC client)
Right now I’m using the Python SDK with client.call.create_web_call(), but I understand that this is designed for browser-based WebRTC usage, and it does not expose raw audio streams.
What I need instead is:
Access to a real-time audio streaming API (e.g. WebSocket)
Ability to send and receive raw audio (PCM 16kHz) directly from my Python application
A way to connect to a call session without requiring a browser client
In short, I need to run Retell in a fully headless embedded environment.
Could you please clarify:
Do you provide a real-time / streaming audio API (WebSocket or similar) for this use case?
What is the correct flow to create and connect to a call using this API?
Retell AI does not provide a raw WebSocket audio streaming API for headless/embedded devices. The two supported approaches are:
Web Call — requires the browser-based JavaScript SDK (WebRTC), not suitable for your headless Pi.
Custom Telephony (SIP) — this is your best path. You can use the “Dial to SIP URI” method: call the Register Phone Call API to get a call_id, then connect via SIP to sip:{call_id}@sip.retellai.com. On your Raspberry Pi, you’d run a lightweight SIP client (e.g., PJSIP/Opal) that handles the RTP audio streams as raw PCM, which you can pipe to your speaker/mic via GPIO.
This lets you operate fully headless — no browser needed. Your Pi acts as a SIP endpoint.
I can confirm that the working approach for my headless Raspberry Pi phone is indeed:
use the Register Phone Call API to create a call
get the call_id
place a SIP call to sip:{call_id}@sip.retellai.com
handle audio locally on the Pi through a SIP client (pjsua / PJSIP)
So the final architecture is fully headless and does not require any browser/WebRTC client.
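For reference, the flow above can be sketched in Python roughly as follows. This is only a sketch under assumptions: `client` is assumed to be a `Retell` client from the Python SDK, `register_phone_call` is assumed to be the SDK method behind the Register Phone Call API (check it against your installed version), and the helper names `register_call`, `sip_uri`, `pjsua_command` and `start_call` are made up for illustration. pjsua is launched with the SIP URI as its only argument; any audio device options are left to your local configuration.

```python
import subprocess


def register_call(client, agent_id):
    """Register a phone call and return its call_id.

    Assumption: `client` is a retell-sdk `Retell` instance and exposes
    `client.call.register_phone_call(agent_id=...)`.
    """
    response = client.call.register_phone_call(agent_id=agent_id)
    return response.call_id


def sip_uri(call_id, host="sip.retellai.com"):
    """Build the SIP URI to dial for a registered call."""
    return f"sip:{call_id}@{host}"


def pjsua_command(uri):
    """Return the argv for a pjsua process that dials `uri` on startup."""
    return ["pjsua", uri]


def start_call(client, agent_id):
    """Register the call, then dial it with pjsua; returns the process handle."""
    uri = sip_uri(register_call(client, agent_id))
    # stdin is kept open as a pipe so pjsua's interactive console does not
    # read from the terminal when run headless.
    return subprocess.Popen(pjsua_command(uri), stdin=subprocess.PIPE)
```

Ending the call then amounts to terminating the process returned by `start_call` (for example with `proc.terminate()`).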
What was blocking me was not Retell itself, but the local audio setup (ALSA) on the Raspberry Pi:
RTP packets from Retell were actually arriving correctly
the SIP call was established correctly
the real issue was ALSA device mapping and default playback/capture selection on reboot
In particular, after enabling snd-aloop, card indexes changed across reboots, so using numeric ALSA devices like hw:0,0 became unreliable. The fix was to use stable card names in .asoundrc, for example:
playback → plughw:CARD=sndrpihifiberry,DEV=0
capture → plughw:CARD=Loopback,DEV=0
and make sure snd-aloop is loaded automatically at boot.
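A minimal sketch of what such a configuration could look like, written as a small Python helper that renders the file contents. It assumes an ALSA `asym` default device that splits playback and capture; this is one common way to do it and not necessarily the exact file from this setup. The function name `render_asoundrc` and the file name `/etc/modules-load.d/snd-aloop.conf` are illustrative choices.

```python
def render_asoundrc(playback_pcm, capture_pcm):
    """Render an .asoundrc whose default device uses name-based PCMs.

    Referring to cards by name (CARD=...) instead of by index keeps the
    mapping stable when card numbering changes between reboots.
    """
    return (
        "pcm.!default {\n"
        "    type asym\n"
        f'    playback.pcm "{playback_pcm}"\n'
        f'    capture.pcm "{capture_pcm}"\n'
        "}\n"
    )


# Contents for a modules-load.d file so that snd-aloop is loaded at boot,
# e.g. /etc/modules-load.d/snd-aloop.conf (file name chosen for illustration).
MODULES_LOAD_CONTENT = "snd-aloop\n"

ASOUNDRC = render_asoundrc(
    playback_pcm="plughw:CARD=sndrpihifiberry,DEV=0",
    capture_pcm="plughw:CARD=Loopback,DEV=0",
)
```

Writing `ASOUNDRC` to `~/.asoundrc` and `MODULES_LOAD_CONTENT` to a file under `/etc/modules-load.d/` (as root) should be enough on a systemd-based Raspberry Pi OS. `aplay -l` and `arecord -l` show the card names available on your system.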
After that, the Pi can successfully:
start the call headlessly
receive and play the AI voice through the handset speaker
record the incoming audio locally as WAV
be controlled by the hook switch logic
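The hook switch logic can be kept independent of both GPIO and SIP. The sketch below is an illustration, not the exact code running on the device: `HookSwitchController` is a made-up name, and `start_call` / `end_call` are callbacks you supply (for example, one that launches the SIP client and one that terminates it).

```python
class HookSwitchController:
    """Start a call when the handset is lifted and end it when replaced."""

    def __init__(self, start_call, end_call):
        self._start_call = start_call  # callable returning a session handle
        self._end_call = end_call      # callable taking that session handle
        self._session = None

    @property
    def in_call(self):
        return self._session is not None

    def on_hook_change(self, off_hook):
        """Handle a hook switch event; repeated identical events are ignored."""
        if off_hook and self._session is None:
            self._session = self._start_call()
        elif not off_hook and self._session is not None:
            session, self._session = self._session, None
            self._end_call(session)
```

With gpiozero, for instance, a `Button` on the hook switch pin could call `controller.on_hook_change(True)` from `when_pressed` and `controller.on_hook_change(False)` from `when_released`; which of the two means "handset lifted" depends on how the switch is wired.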
So for anyone attempting a similar embedded setup: the SIP path works, but on Raspberry Pi the main difficulty is usually ALSA / device persistence, not the Retell SIP flow itself.