Add voice chat with real-time speech-to-text and model responses #71

Open
devin-ai-integration[bot] wants to merge 9 commits into main from devin/1777482929-voice-chat


devin-ai-integration[bot] commented Apr 29, 2026

Summary

Adds a new Voice Chat tab that enables voice-driven conversations with AI models. Users tap a microphone button to record speech, which is transcribed in real time via OpenAI Whisper and sent to the selected AI model (Claude, GPT, or Gemini). Model responses stream back and are read aloud via text-to-speech.

What's included

App (React Native / Expo)

  • New VoiceChat screen (app/src/screens/voice.tsx) with:
    • Microphone recording via expo-av
    • Audio upload to server for transcription
    • Streaming AI responses via existing SSE infrastructure
    • Text-to-speech playback of responses via expo-speech
    • Animated pulsing mic button with state indicators (idle → recording → transcribing → responding)
    • Conversation history with chat bubbles and markdown rendering
    • Clear conversation and stop speaking controls
  • New "Voice" tab in bottom navigation with mic icon
  • Dependencies: expo-av, expo-speech
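
The mic button's state indicators (idle → recording → transcribing → responding) can be read as a small state machine. The sketch below is one possible way to model it and is not taken from voice.tsx; the names VoiceState, VoiceEvent, and nextState are hypothetical, and the assumption that a second tap ends the recording is mine.

```typescript
// Hypothetical model of the mic button states listed above.
type VoiceState = "idle" | "recording" | "transcribing" | "responding";

// Hypothetical events; the real screen may be driven differently.
type VoiceEvent = "micTapped" | "transcriptReady" | "responseDone" | "failed";

// Allowed transitions. Any event not listed for a state is ignored.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { micTapped: "recording" },
  // Assumption: tapping the mic again stops recording and starts transcription.
  recording: { micTapped: "transcribing", failed: "idle" },
  transcribing: { transcriptReady: "responding", failed: "idle" },
  responding: { responseDone: "idle", failed: "idle" },
};

// Returns the next state, or the current state if the event does not apply.
function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in a table makes it harder to reach an inconsistent state, for example starting a second recording while a transcription is still in flight.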

Server (Node.js / Express)

  • New /chat/transcribe endpoint (server/src/chat/transcribe.ts) that:
    • Accepts audio file uploads via multer
    • Forwards to OpenAI Whisper API for transcription
    • Returns transcribed text
    • Cleans up temporary files after processing
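
The endpoint's flow (accept upload, transcribe, return text, clean up) can be sketched independently of Express. The code below is a minimal illustration, not the contents of transcribe.ts: handleTranscription, TranscribeDeps, and the response shape are hypothetical, and the Whisper call and file deletion are injected so the cleanup logic can be exercised without network or disk access.

```typescript
// Hypothetical dependencies, injected so the flow can be tested in isolation.
interface TranscribeDeps {
  // Stand-in for the call that forwards the audio file to Whisper.
  transcribe: (filePath: string) => Promise<string>;
  // Stand-in for deleting the temporary upload (e.g. fs.promises.unlink).
  removeFile: (filePath: string) => Promise<void>;
}

// Hypothetical response shape; the real endpoint may differ.
interface TranscribeResult {
  status: number;
  body: { text: string } | { error: string };
}

async function handleTranscription(
  filePath: string | undefined,
  deps: TranscribeDeps
): Promise<TranscribeResult> {
  // Assumption: the upload middleware exposes the temp file path, if any.
  if (!filePath) {
    return { status: 400, body: { error: "No audio file provided" } };
  }
  try {
    const text = await deps.transcribe(filePath);
    return { status: 200, body: { text } };
  } catch {
    return { status: 500, body: { error: "Transcription failed" } };
  } finally {
    // Runs on success and failure alike; a failed delete is swallowed
    // so it cannot mask the transcription result.
    await deps.removeFile(filePath).catch(() => undefined);
  }
}
```

Putting the deletion in `finally` is the design point: the temporary file is removed whether transcription succeeds or throws.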

Works with all existing chat providers and respects the user's model selection from Settings.

Review & Testing Checklist for Human

  • Test voice recording and transcription on iOS and Android (requires microphone permissions)
  • Verify OPENAI_API_KEY is set in server .env for Whisper transcription
  • Test with different AI models (Claude, GPT, Gemini) to confirm SSE streaming works in voice mode
  • Verify text-to-speech reads responses aloud and can be stopped mid-speech

Notes

  • The transcription uses OpenAI's Whisper API (whisper-1 model), which requires the OPENAI_API_KEY already used by GPT chat
  • Audio is recorded in high-quality m4a format via expo-av
  • The voice screen maintains per-session conversation history, including Claude prompt context tracking
  • TTS strips markdown formatting before speaking for cleaner audio output
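
To illustrate the last note, here is a rough sketch of stripping markdown before handing text to a speech engine. It is not the PR's implementation: stripMarkdown is a hypothetical helper, it covers only a handful of common constructs, and it deliberately leaves underscores alone so identifiers such as snake_case names are not mangled.

```typescript
// Hypothetical helper: reduce common markdown to plain text for TTS.
function stripMarkdown(markdown: string): string {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, " ")        // drop fenced code blocks entirely
    .replace(/`([^`]+)`/g, "$1")              // inline code: keep the text
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images: keep the alt text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")  // links: keep the link text
    .replace(/^\s{0,3}#{1,6}\s+/gm, "")       // heading markers
    .replace(/^\s*[-*+]\s+/gm, "")            // bullet markers
    .replace(/\*\*(.*?)\*\*/g, "$1")          // bold (asterisk form only)
    .replace(/\*(.*?)\*/g, "$1")              // italic (asterisk form only)
    .replace(/\s+/g, " ")                     // collapse whitespace and newlines
    .trim();
}

const spoken = stripMarkdown(
  "# Title\n\nSome **bold** and *italic* with `code` and a [link](https://example.com)."
);
// spoken === "Title Some bold and italic with code and a link."
```

A regex pass like this is approximate; nested or unusual markdown would need a real parser.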

Link to Devin session: https://app.devin.ai/sessions/e8849f2537fb43108fa7294521fe0ae7
Requested by: @dabit3



- Add VoiceChat screen with microphone recording, real-time transcription
  via OpenAI Whisper, and streaming AI model responses
- Add server-side /chat/transcribe endpoint for audio transcription
- Add text-to-speech for reading AI responses aloud (expo-speech)
- Support all chat providers (Claude, GPT, Gemini)
- Add Voice tab to bottom navigation with mic icon
- Install expo-av for audio recording and expo-speech for TTS

Co-Authored-By: Nader Dabit <dabit3@gmail.com>
devin-ai-integration[bot] (Contributor, Author) commented

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration[bot] left eight comments, all since marked as resolved.

The timeline interleaves these comments with follow-up commits carrying the trailer Co-Authored-By: Nader Dabit <dabit3@gmail.com>. The surviving commit title fragments (truncated in the source) are:

  • …nner
  • …cription preview code
  • …recording state