Realtime Transcription

Realtime transcription lets you receive live transcript data as the meeting progresses, rather than waiting for the meeting to end. It is enabled by creating the bot with any of the realtime transcription models.

Key Features

  • Live transcript streaming via WebSocket connection
  • Immediate access to spoken content as it happens
  • Suitable for applications requiring real-time processing
  • Optional realtime audio streaming for raw audio access

How it works

After creating a new meeting bot with a realtime transcription model, you'll receive websocket_url and websocket_read_only_url. Both deliver the same events, but websocket_read_only_url rejects all Actions, so it is safe to hand directly to a customer's front-end. Connect to either URL and you'll start receiving live updates. Keep in mind that only live events are delivered; events that occurred before you connected are not replayed.
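The connection step could be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: pickUrl, listen, BotUrls and SocketLike are hypothetical names, and the socket is created through an injected factory so the sketch does not depend on a particular WebSocket library. Only the two URL field names come from the text above.

```typescript
// The two URL fields named in the text; the rest of the bot response is omitted.
interface BotUrls {
  websocket_url: string;
  websocket_read_only_url: string;
}

// Minimal socket surface this sketch needs (an assumption, not an SDK type).
interface SocketLike {
  onmessage: ((event: { data: unknown }) => void) | null;
}

// Hypothetical helper: untrusted clients (e.g. a customer front-end) get the
// read-only URL, which cannot trigger Actions.
function pickUrl(bot: BotUrls, trusted: boolean): string {
  return trusted ? bot.websocket_url : bot.websocket_read_only_url;
}

// Hypothetical helper: open a socket and forward every text message.
// Only events that happen after connecting will arrive; nothing is replayed.
function listen(
  url: string,
  openSocket: (url: string) => SocketLike,
  onMessage: (raw: string) => void,
): SocketLike {
  const socket = openSocket(url);
  socket.onmessage = (event) => {
    if (typeof event.data === "string") {
      onMessage(event.data);
    }
  };
  return socket;
}
```

Passing the factory in keeps the choice of WebSocket implementation with the caller and makes the handler easy to exercise without a live meeting.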

Realtime Audio Streaming

In addition to transcript events, you can also receive raw audio data in realtime via a separate WebSocket connection.

Enabling Realtime Audio

Realtime audio is enabled by default when you use a realtime model. To stream audio with a non-realtime model, set realtime_audio: true in your request:

{
  "transcription_model": "none",
  "meeting_url": "https://meet.google.com/abc-defg-hij",
  "service": "gmeet",
  "bot_name": "My Meeting Bot",
  "realtime_audio": true
}

The bot response will include a websocket_audio_url field containing the WebSocket URL for receiving audio data.

Using the SDK

// Create bot with realtime audio enabled
const bot = await client.createBot({
  transcription_model: "none",
  meeting_url: "https://meet.google.com/abc-defg-hij",
  service: "gmeet",
  bot_name: "My Audio Bot",
  realtime_audio: true,
});

// Get the realtime client (includes audio by default)
const realtimeClient = bot.getRealtimeClient();

// Listen to audio events
realtimeClient.on("audio", (buffer: Buffer) => {
  // buffer is 16-bit PCM audio at 16kHz sample rate
  processAudio(buffer);
});

await realtimeClient.connect();

// Check audio connection status
console.log("Audio connected:", realtimeClient.audioConnected);

Without Audio Streaming

If you don't need audio streaming, you can get a realtime client without it:

// Get realtime client WITHOUT audio streaming
const transcriptOnlyClient = bot.getRealtimeClient(true);
await transcriptOnlyClient.connect();

Audio Format

The audio data received via the audio event is:

  • Format: 16-bit PCM (signed, little-endian)
  • Sample Rate: 16kHz
  • Channels: Mono

Websocket Events

Events arrive as websocket messages containing strings. Parse each message as JSON to obtain the payload. The JSON is always structured as follows (data is omitted for events that carry no payload, such as ping):

{
  "type": "[event]",
  "data": {...}
}
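Parsing that envelope defensively might look like the sketch below. RealtimeEvent and parseEvent are hypothetical names; the function returns null rather than throwing when a message is not valid JSON or lacks a string type.

```typescript
// Envelope shared by all events; `data` is absent for events such as ping.
interface RealtimeEvent {
  type: string;
  data?: unknown;
}

// Parse one raw websocket message into an event envelope, or null if malformed.
function parseEvent(raw: string): RealtimeEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null; // JSON, but not an object
  }
  const candidate = parsed as { type?: unknown; data?: unknown };
  if (typeof candidate.type !== "string") {
    return null; // missing or non-string type
  }
  return { type: candidate.type, data: candidate.data };
}
```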

Ping

This event keeps the connection alive. We ping all active clients once every minute. It carries no payload and can safely be ignored.

{
  "type": "ping"
}

Start

Once the meeting bot has joined the meeting and started recording, it sends this event to signal that transcripts will follow from this point on. Refer to the Bot Lifecycle documentation for a better understanding of the joining process.

{
  "type": "start"
}

Transcript

This event contains the live transcript of the meeting along with timestamps and speaker information. speaker_name may be null at the start because the mapping from speaker id to name is computed live. After a couple of sentences, speaker_name should be provided.

{
  "type": "ts",
  "data": {
    "transcript": "This contains the spoken text.",
    "start": 1.23,
    "end": 4.56,
    "speaker": 0,
    "speaker_name": "John Doe"
  }
}
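Because speaker_name can be null early on, consumers may want a fallback label. The sketch below is one way to do that; TranscriptData mirrors the example payload, formatTranscript is a hypothetical helper, and the timestamps are only assumed to be seconds since the unit is not stated here.

```typescript
// Mirrors the `data` object of the "ts" event shown above.
interface TranscriptData {
  transcript: string;
  start: number; // assumed to be seconds (unit not stated in the docs)
  end: number;
  speaker: number;
  speaker_name: string | null;
}

// Render one transcript event as a single line, falling back to the
// numeric speaker id while speaker_name is still null.
function formatTranscript(data: TranscriptData): string {
  const who = data.speaker_name ?? `Speaker ${data.speaker}`;
  const range = `${data.start.toFixed(2)}-${data.end.toFixed(2)}`;
  return `[${range}] ${who}: ${data.transcript}`;
}
```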

Chat Message

This event is triggered when a new chat message is sent.

{
  "type": "chat-message",
  "data": {
    "username": "John Doe",
    "content": "Foo bar.",
    "user_avatar": null // Either a URL or null. Do not rely on this field for permanent access.
  }
}

Participant Tracked

Whenever we detect a new participant in the meeting, we emit the participant-tracked event. It is also triggered for every existing participant at the start of the meeting.

{
  "type": "participant-tracked",
  "data": {
    "participantId": "John Doe",
    "participantName": "John Doe"
  }
}

Started Speaking

When we detect that a participant is actively speaking, we emit this event.

{
  "type": "started-speaking",
  "data": {
    "participantId": "John Doe",
    "participantName": "John Doe"
  }
}

Stopped Speaking

When we detect that a participant has stopped speaking, we emit this event.

{
  "type": "stopped-speaking",
  "data": {
    "participantId": "John Doe",
    "participantName": "John Doe"
  }
}
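The started-speaking and stopped-speaking events can be folded into a set of currently active speakers. The following is a minimal sketch under the assumption that participantId uniquely identifies a participant; SpeakingEvent and applySpeakingEvent are hypothetical names.

```typescript
// The two speaking events share the same payload shape.
interface SpeakingEvent {
  type: "started-speaking" | "stopped-speaking";
  data: { participantId: string; participantName: string };
}

// Update a map of participantId -> participantName for everyone who is
// speaking right now. The map is mutated in place.
function applySpeakingEvent(active: Map<string, string>, event: SpeakingEvent): void {
  if (event.type === "started-speaking") {
    active.set(event.data.participantId, event.data.participantName);
  } else {
    active.delete(event.data.participantId);
  }
}
```

Keying by participantId rather than by name keeps the state correct if two participants happen to share a display name.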

Stop

Once the meeting is over and the recording has therefore stopped, this event is fired. It informs you that no further transcripts will be sent and that it is safe to disconnect from the websocket server.

{
  "type": "stop"
}

Error

If anything goes wrong with our transcription provider, this error event is fired. Treat it the same as stop: no further transcripts will be sent. We are alerted right away so we can resolve the issue.

{
  "type": "error",
  "data": {
    "message": "Error message"
  }
}
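Taken together, start, stop and error describe a small lifecycle on the client side. One way to model it is sketched below; SessionState and nextState are hypothetical names, and treating both stop and error as final follows the descriptions above.

```typescript
// Client-side view of a realtime session.
type SessionState = "waiting" | "recording" | "ended" | "failed";

// Advance the session state for one incoming event type.
// "ended" and "failed" are terminal: no further transcripts are expected,
// so later events do not change the state.
function nextState(state: SessionState, eventType: string): SessionState {
  if (state === "ended" || state === "failed") {
    return state;
  }
  switch (eventType) {
    case "start":
      return "recording";
    case "stop":
      return "ended";
    case "error":
      return "failed";
    default:
      return state; // ping, ts, chat-message, etc. leave the state unchanged
  }
}
```

Once the state is "ended" or "failed", the client can close its websocket connection.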

Websocket Actions

You can also interact with the bot via the websocket to provide feedback as soon as possible. As with events, this works via a predefined JSON structure that you send as a message. The examples below show JSON, but keep in mind that websocket messages are strings, so you'll need to serialize the JSON to a string before sending. Validating websocket messages is complicated, so there is currently no validation: if your action does not contain the required data, the action will not be performed.

{
  "action": "[action]",
  "data": {...}
}

Important to remember

If you connect to the websocket via websocket_read_only_url then these actions will not be executed. Only connecting via websocket_url will make actions available to you.

Send Chat Message

This action sends a message to the meeting chat.

{
  "action": "chat-message",
  "data": {
    "content": "Welcome to the meeting!"
  }
}

Stop the bot

This will simply stop the bot.

{
  "action": "stop"
}
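Since incomplete actions are not executed and no error is reported, it can help to build action messages through typed helpers. The sketch below is an illustration only: BotAction, serializeAction and chatMessage are hypothetical names, and rejecting empty content on the client side is an assumption, not documented server behaviour.

```typescript
// The two actions documented above.
type BotAction =
  | { action: "chat-message"; data: { content: string } }
  | { action: "stop" };

// Websocket messages are strings, so every action is serialized before sending.
function serializeAction(action: BotAction): string {
  return JSON.stringify(action);
}

// Build a chat-message action. The empty-content check is a client-side
// assumption to catch mistakes early, since the server does not report them.
function chatMessage(content: string): string {
  if (content.length === 0) {
    throw new Error("content must not be empty");
  }
  return serializeAction({ action: "chat-message", data: { content } });
}
```

The resulting string is what gets sent over a socket connected via websocket_url; a connection made through websocket_read_only_url would not execute it.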

More events coming soon. If you have specific feature requests, please let us know via Discord.