Building a headless hardware prototype with a Raspberry Pi Zero 2 W

Hi everyone, email support suggested I post here.

I’m building a hardware prototype using a Raspberry Pi Zero 2 W (headless, no GUI) integrated inside a vintage telephone handset. The device uses a GPIO hook switch to start and end calls.

My goal is to create a fully standalone AI-powered phone, where:

  • Lifting the handset starts a call

  • The user hears the AI voice through the speaker

  • (Later) the user speaks through a microphone

  • All audio is handled directly on the Raspberry Pi (no browser, no WebRTC client)
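Because lifting and replacing the handset drives the whole call lifecycle, the hook switch needs debouncing so that contact bounce does not start and end several calls. Below is a minimal sketch in plain Python. It assumes you poll the switch level yourself and pass in True when the handset is lifted. `HookSwitch`, `on_lift` and `on_hang_up` are hypothetical names, not part of any library:

```python
import time
from typing import Callable, Optional


class HookSwitch:
    """Debounced off-hook/on-hook detector (hypothetical helper).

    Feed it raw readings (True = handset lifted). It fires on_lift / on_hang_up
    only after a new level has been stable for `debounce_s` seconds.
    """

    def __init__(self, on_lift: Callable[[], None], on_hang_up: Callable[[], None],
                 debounce_s: float = 0.05) -> None:
        self.on_lift = on_lift
        self.on_hang_up = on_hang_up
        self.debounce_s = debounce_s
        self.off_hook = False                  # last confirmed state
        self._candidate: Optional[bool] = None  # level waiting to be confirmed
        self._since = 0.0                      # when the candidate level appeared

    def update(self, lifted: bool, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if lifted == self.off_hook:
            self._candidate = None             # back to confirmed state: it was a bounce
            return
        if self._candidate != lifted:
            self._candidate = lifted           # new level seen: start the debounce timer
            self._since = now
            return
        if now - self._since >= self.debounce_s:
            self.off_hook = lifted             # level held long enough: commit it
            self._candidate = None
            (self.on_lift if lifted else self.on_hang_up)()
```

If you use gpiozero, one option is to read `Button.is_pressed` in a loop and pass the result to `update()`. gpiozero's `Button` also has its own `bounce_time` parameter. Whether "pressed" means lifted or resting depends on how the switch is wired.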

Right now I’m using the Python SDK with client.call.create_web_call(), but I understand it is designed for browser-based WebRTC use and does not expose raw audio streams.

What I need instead is:

  • Access to a real-time audio streaming API (e.g. WebSocket)

  • Ability to send and receive raw audio (PCM 16kHz) directly from my Python application

  • A way to connect to a call session without requiring a browser client
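For reference, here is the byte arithmetic for the raw format mentioned above. It assumes 16-bit signed little-endian mono samples, since the post only says "PCM 16kHz". At that format a 20 ms frame, a common voice packet size, is 320 samples or 640 bytes:

```python
from typing import Iterator

SAMPLE_RATE = 16_000  # Hz, from the post
SAMPLE_WIDTH = 2      # bytes per sample: 16-bit signed little-endian (assumption)
CHANNELS = 1          # mono handset audio (assumption)


def frame_bytes(frame_ms: int, rate: int = SAMPLE_RATE,
                width: int = SAMPLE_WIDTH, channels: int = CHANNELS) -> int:
    """Bytes in one frame of `frame_ms` milliseconds of interleaved PCM."""
    samples_per_channel = rate * frame_ms // 1000
    return samples_per_channel * width * channels


def iter_frames(pcm: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-size frames; a trailing partial frame is dropped."""
    size = frame_bytes(frame_ms)
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]
```

One second of audio at this format is 32,000 bytes, so it splits into 50 frames of 20 ms.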

In short, I need to run Retell in a fully headless embedded environment.

Could you please clarify:

  1. Do you provide a real-time / streaming audio API (WebSocket or similar) for this use case?

  2. What is the correct flow to create and connect to a call using this API?

Any guidance would be extremely helpful.

Thanks a lot!

Hi @fabryz

Retell AI does not provide a raw WebSocket audio streaming API for headless/embedded devices. The two supported approaches are:

  1. Web Call: requires the browser-based JavaScript SDK (WebRTC), so it is not suitable for your headless Pi.

  2. Custom Telephony (SIP): this is your best path. Use the “Dial to SIP URI” method. Call the Register Phone Call API to get a call_id, then connect via SIP to sip:{call_id}@sip.retellai.com. On your Raspberry Pi, run a lightweight SIP client (e.g., PJSIP or Opal) that decodes the RTP audio streams to raw PCM and plays/captures it through the Pi’s audio interface via ALSA (for example an I2S DAC on the GPIO header), which you then wire to the handset speaker and mic.

This lets you operate fully headless — no browser needed. Your Pi acts as a SIP endpoint.
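The SIP URI in this reply is just the call_id placed in the user part of the address. Here is a small helper that builds it. It assumes call_id contains only letters, digits, `_`, `.` and `-`; the actual format is not stated in this thread, so the check is illustrative:

```python
import re

RETELL_SIP_HOST = "sip.retellai.com"  # host given in the reply

# Assumed call_id alphabet; the real format is not specified in this thread.
_CALL_ID = re.compile(r"[A-Za-z0-9_.\-]+")


def retell_sip_uri(call_id: str, host: str = RETELL_SIP_HOST) -> str:
    """Build the Dial-to-SIP-URI target sip:{call_id}@{host}."""
    if not _CALL_ID.fullmatch(call_id):
        # Refuse to build a malformed URI from an unexpected id.
        raise ValueError(f"unexpected call_id format: {call_id!r}")
    return f"sip:{call_id}@{host}"
```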

See the full guide: Custom Telephony Overview.

Thank You


Hi, thanks for the guidance. This clarified a lot.

I’m now moving forward with the SIP (custom telephony) approach.
My plan is:

  • Use register_phone_call to get the call_id

  • Connect from my Raspberry Pi using a SIP client (pjsua) to sip:{call_id}@sip.retellai.com

  • Handle audio directly via ALSA (headless, no browser)
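Here is a sketch of the pjsua invocation for this plan. It is built as an argv list so it can be started with `subprocess`. The flags (`--clock-rate`, `--no-vad`, `--log-level`, `--rec-file`, `--auto-rec`, `--duration`) come from pjsua's built-in help; check them against your build with `pjsua --help`. No audio device indexes are passed. This assumes pjsua's default device follows the ALSA default configured on the Pi:

```python
from typing import List, Optional


def pjsua_command(sip_uri: str,
                  rec_file: Optional[str] = None,
                  clock_rate: int = 16000,
                  max_duration_s: Optional[int] = None) -> List[str]:
    """Build the argv for a one-shot pjsua call to `sip_uri`."""
    cmd = [
        "pjsua",
        f"--clock-rate={clock_rate}",  # conference bridge clock rate
        "--no-vad",                    # disable the silence detector
        "--log-level=3",               # moderate log verbosity
    ]
    if rec_file:
        # Record the conversation to a WAV file automatically.
        cmd += [f"--rec-file={rec_file}", "--auto-rec"]
    if max_duration_s is not None:
        cmd.append(f"--duration={max_duration_s}")  # safety cap on call length
    cmd.append(sip_uri)  # destination URL is the positional argument
    return cmd
```

For the manual test, `shlex.join(pjsua_command(...))` gives a string you can paste into a shell.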

I’ll first test the SIP connection manually to confirm audio is working, and then integrate it with my GPIO-based phone hook logic.

I’ll report back once I have results. I really appreciate the help!

Thanks, this was the right direction.

I can confirm that the working approach for my headless Raspberry Pi phone is indeed:

  1. use the Register Phone Call API to create a call
  2. get the call_id
  3. place a SIP call to sip:{call_id}@sip.retellai.com
  4. handle audio locally on the Pi through a SIP client (pjsua / PJSIP)

So the final architecture is fully headless and does not require any browser/WebRTC client.

What was blocking me was not Retell itself, but the local audio setup on Raspberry Pi / ALSA:

  • RTP packets from Retell were actually arriving correctly
  • the SIP call was established correctly
  • the real issue was ALSA device mapping and default playback/capture selection on reboot
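To observe the index shuffle described above, you can map each ALSA card ID to its current index and compare the result across reboots. The card ID is the name in brackets in `/proc/asound/cards`. Here is a minimal sketch:

```python
import re
from typing import Dict

# Card lines look like " 2 [Loopback       ]: Loopback - Loopback";
# the indented second line of each entry does not match.
_CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(\S+)\s*\]:")


def alsa_card_indexes(cards_text: str) -> Dict[str, int]:
    """Map ALSA card IDs to the index they were given at this boot."""
    cards: Dict[str, int] = {}
    for line in cards_text.splitlines():
        m = _CARD_LINE.match(line)
        if m:
            cards[m.group(2)] = int(m.group(1))
    return cards


def read_alsa_cards(path: str = "/proc/asound/cards") -> Dict[str, int]:
    with open(path, encoding="utf-8") as f:
        return alsa_card_indexes(f.read())
```

If `read_alsa_cards()["Loopback"]` differs from one boot to the next, any `hw:N,0` reference to it will break, while `CARD=Loopback` keeps working.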

In particular, after enabling snd-aloop, card indexes changed across reboots, so using numeric ALSA devices like hw:0,0 became unreliable. The fix was to use stable card names in .asoundrc, for example:

  • playback → plughw:CARD=sndrpihifiberry,DEV=0
  • capture → plughw:CARD=Loopback,DEV=0

and make sure snd-aloop is loaded automatically at boot.
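The .asoundrc approach can be sketched as a small renderer. It sets the default PCM to a standard ALSA `asym` device built from the stable card names and refuses numeric `hw:N` indexes. The helper itself and the choice of control card are assumptions. For loading snd-aloop at boot, one common way on systemd-based Raspberry Pi OS is a file in /etc/modules-load.d/ containing the module name. The constants below hold that path and content:

```python
import re
from pathlib import Path


def render_asoundrc(playback: str, capture: str, ctl_card: str) -> str:
    """Render an ~/.asoundrc whose default PCM uses name-based devices."""
    for dev in (playback, capture):
        # hw:0 / plughw:1,0 style names depend on enumeration order at boot.
        if re.search(r"hw:\d", dev):
            raise ValueError(f"numeric card index in {dev!r}; use CARD=<name>")
    return (
        "pcm.!default {\n"
        "    type asym\n"
        f'    playback.pcm "{playback}"\n'
        f'    capture.pcm "{capture}"\n'
        "}\n"
        "ctl.!default {\n"
        "    type hw\n"
        f"    card {ctl_card}\n"
        "}\n"
    )


# systemd-modules-load.service loads the modules listed in this directory at boot.
MODULES_LOAD_FILE = Path("/etc/modules-load.d/snd-aloop.conf")
MODULES_LOAD_CONTENT = "snd-aloop\n"
```

After writing the rendered text to `~/.asoundrc`, `aplay -L` and `arecord -L` list the PCM names ALSA actually sees.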

After that, the Pi can successfully:

  • start the call headlessly
  • receive and play the AI voice through the handset speaker
  • record the incoming audio locally as WAV
  • be controlled by the hook switch logic
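To confirm that a recorded WAV actually contains the AI voice rather than silence, you can run a quick check with the standard library `wave` module. This sketch reports format and duration, plus the peak sample for 16-bit files. A peak of 0 means the recording is silent. It reads the whole file into memory, which is fine for short test calls:

```python
import array
import sys
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width: int   # bytes per sample
    duration_s: float
    peak: int           # max |sample| for 16-bit audio; 0 if silent or not computed


def inspect_wav(path: str) -> WavInfo:
    with wave.open(path, "rb") as w:
        channels, width, rate = w.getnchannels(), w.getsampwidth(), w.getframerate()
        nframes = w.getnframes()
        raw = w.readframes(nframes)
    peak = 0
    if width == 2 and raw:
        samples = array.array("h", raw)  # WAV sample data is little-endian
        if sys.byteorder == "big":
            samples.byteswap()
        peak = max(abs(s) for s in samples)
    duration = nframes / rate if rate else 0.0
    return WavInfo(channels, rate, width, duration, peak)
```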

So for anyone attempting a similar embedded setup: the SIP path works, but on Raspberry Pi the main difficulty is usually ALSA / device persistence, not the Retell SIP flow itself.

Thanks again