
TwiML™ Voice: <Transcription>

(warning)

Legal Notice and Public Beta

Real-Time Transcriptions, including the <Transcription> TwiML noun and API, use artificial intelligence or machine learning technologies. By enabling or using any of the features or functionalities within Programmable Voice that are identified as using artificial intelligence or machine learning technology, you acknowledge and agree that your use of these features or functionalities is subject to the terms of the Predictive and Generative AI/ML Features Addendum.

Real-Time Transcriptions is not PCI compliant or a HIPAA Eligible Service and should not be used in Voice Intelligence workflows that are subject to HIPAA or PCI.

Real-Time Transcriptions is currently available as a Public Beta product, and the information in this document is subject to change. Some features are not yet implemented, and others may change before the product is declared Generally Available. Public Beta products are not covered by a Twilio Service Level Agreement.

The <Transcription> TwiML noun allows you to transcribe live calls in near real-time. It is used in conjunction with <Start>. When Twilio executes the <Start><Transcription> instruction during a call, it forks the raw audio stream to a speech-to-text transcription engine that can provide streaming responses almost instantly.

This page covers <Transcription>'s supported attributes and provides sample code.

(information)

Other Transcriptions at Twilio

The <Transcription> TwiML noun is part of Twilio's Real-Time Transcriptions product. It is not to be confused with Recording Transcriptions.

During Public Beta, Twilio does not store Real-Time Transcriptions. If you need to keep transcripts, capture them from the requests Twilio sends to your statusCallbackUrl.

Below is a basic example of <Start><Transcription>:


<Start>
  <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
</Start>

Noun attributes

The table below lists <Transcription>'s supported attributes, which modify the <Transcription> behavior. All attributes are optional.

Attribute Name	Allowed Values	Default Value
name	Unique name for the Real-Time Transcription	none
statusCallbackUrl	Valid relative or absolute URL	none
languageCode	A BCP-47 standard code (e.g. "en-US")	`en-US`
track	`inbound_track`, `outbound_track`, `both_tracks`	`both_tracks`
inboundTrackLabel	An alphanumeric label to associate to the inbound track being transcribed	none
outboundTrackLabel	An alphanumeric label to associate to the outbound track being transcribed	none
transcriptionEngine	Name of speech-to-text transcription provider. Valid values are: `google`	`google`
speechModel	(Google only) Any speechModel value	`telephony`
profanityFilter	(Google only) `true` or `false`	`true`
partialResults	(Google only) `true` or `false`	`false`
hints	(Google only) Comma-separated list of expected phrases or keywords for recognition	none
enableAutomaticPunctuation	(Google only) `true` or `false`	`true`
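
The attributes in the table can also be assembled into TwiML without the helper library. The following is a minimal sketch, not Twilio's implementation: `buildStartTranscription` and `escapeXmlAttr` are hypothetical helpers, and the only validation performed is the `track` value check taken from the table above.

```javascript
// Allowed values for the track attribute, as listed in the attribute table.
const ALLOWED_TRACKS = ['inbound_track', 'outbound_track', 'both_tracks'];

// Escape characters that are unsafe inside an XML attribute value.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper (not part of any Twilio SDK): renders an attribute
// object as a <Start><Transcription> TwiML document.
function buildStartTranscription(attrs = {}) {
  if (attrs.track !== undefined && !ALLOWED_TRACKS.includes(attrs.track)) {
    throw new Error(`Unsupported track value: ${attrs.track}`);
  }
  const rendered = Object.entries(attrs)
    .filter(([, value]) => value !== undefined && value !== null)
    .map(([key, value]) => ` ${key}="${escapeXmlAttr(value)}"`)
    .join('');
  return `<Response><Start><Transcription${rendered}/></Start></Response>`;
}
```

Because every attribute is optional, calling the helper with no arguments yields a bare `<Transcription/>` element that relies on the defaults listed in the table.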

name

The user-specified name of this Real-Time Transcription. This name can be used to stop the Real-Time Transcription.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', name: 'Contact center transcription'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" name="Contact center transcription" />
    </Start>
</Response>

statusCallbackUrl

The statusCallbackUrl attribute is the relative or absolute URL of an endpoint. Twilio sends Real-Time Transcription status updates and the call's transcript data to this URL.

Twilio sends a POST request to this URL whenever one of the following occurs:

A Real-Time Transcription session starts. This is called the transcription-started event.
An utterance (partial or final) of transcribed audio is available. This is called the transcription-content event.
A Real-Time Transcription session stops. This is called the transcription-stopped event. This event occurs when a Real-Time Transcription session is stopped via API or TwiML, or when the call ends.
An error occurs. This is called the transcription-error event.
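
The four events above can be routed from a single endpoint by inspecting the TranscriptionEvent property. The sketch below assumes the POST body arrives form-encoded (application/x-www-form-urlencoded), which this page does not state explicitly; `parseCallbackBody`, `dispatchTranscriptionEvent`, and the `handlers` map are hypothetical names for illustration.

```javascript
// Parse an application/x-www-form-urlencoded body into a plain object.
function parseCallbackBody(rawBody) {
  return Object.fromEntries(new URLSearchParams(rawBody));
}

// Route a callback to a handler keyed by event name. `handlers` is a
// hypothetical map such as { 'transcription-started': fn, ... }.
function dispatchTranscriptionEvent(rawBody, handlers) {
  const params = parseCallbackBody(rawBody);
  const event = params.TranscriptionEvent;
  const known = Object.prototype.hasOwnProperty.call(handlers, String(event));
  if (!known) {
    // Unknown or missing event type: report it instead of throwing.
    return { handled: false, event };
  }
  handlers[event](params);
  return { handled: true, event };
}
```

Returning a small result object rather than throwing on unknown events keeps the endpoint tolerant of event types added while the product is in beta.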

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
    </Start>
</Response>

The transcription-started event

When a Real-Time Transcription is started and a session is created, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-started event. This event provides initial details about the transcription session.

These HTTP requests contain the properties listed below.

Property	Description	Example
AccountSid	Twilio Account SID	`AC11b76cdc7d217e72a72be6422d46a7ca`
CallSid	Twilio Call SID	`CA57af2620f427810cb4e430371e8d6e0f`
TranscriptionSid	Unique identifier for this Real-Time Transcription session	`GT20dfa03c8cf8aa8d0c4aeccde5558b66`
Timestamp	Time of the event in UTC ISO 8601 timestamp	`2023-10-19T22:33:22.611Z`
SequenceId	Integer sequence number of the event	`1`
TranscriptionEvent	The event type	`transcription-started`
ProviderConfiguration	JSON configuration of the transcription provider	`{\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"}`
TranscriptionEngine	The name of the transcription engine	`google`
Name	Friendly name of the Real-Time Transcription session	`session1`
Track	The track being transcribed: `inbound_track`, `outbound_track`, or `both_tracks`	`inbound_track`
InboundTrackLabel	Label associated with the inbound track	`customer`
OutboundTrackLabel	Label associated with the outbound track	`agent`
PartialResults	Whether partial results are enabled (`true` or `false`)	`true`
LanguageCode	The language code for the transcription	`en-US`

Example of a transcription-started event payload:


{
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "Timestamp": "2024-06-25T18:45:12.135751Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "ProviderConfiguration": "{\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"}",
  "Name": "Chris Transcription",
  "OutboundTrackLabel": "agent",
  "LanguageCode": "en-US",
  "PartialResults": "false",
  "InboundTrackLabel": "customer",
  "TranscriptionEvent": "transcription-started",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionEngine": "google",
  "Track": "both_tracks",
  "SequenceId": "1"
}
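
In the payload above, ProviderConfiguration is a JSON string nested inside the event, and its boolean settings are themselves encoded as strings. A minimal sketch of unpacking it follows; `parseProviderConfiguration` is a hypothetical helper, and `params` is assumed to be the already-parsed callback properties.

```javascript
// Decode the ProviderConfiguration property of a transcription-started event.
function parseProviderConfiguration(params) {
  const config = JSON.parse(params.ProviderConfiguration || '{}');
  return {
    // Booleans are delivered as the strings "true" / "false".
    profanityFilter: config.profanityFilter === 'true',
    enableAutomaticPunctuation: config.enableAutomaticPunctuation === 'true',
    speechModel: config.speechModel,
    // hints is one comma-separated string; split it into trimmed phrases.
    hints: config.hints ? config.hints.split(',').map((h) => h.trim()) : [],
  };
}
```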

The transcription-content event

When an individual utterance (partial or final) of audio is transcribed, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-content event. This event provides TranscriptionData results for the transcribed audio.

(information)

Stability and Confidence

Stability and Confidence depend on partialResults. When partialResults is true, the event payload includes the Stability property and omits confidence. When partialResults is false, the payload includes confidence and omits Stability. Refer to Google's documentation for more details on each of these properties.

These HTTP requests contain the properties listed below.

Property	Description	Example
AccountSid	Twilio Account SID	`AC11b76cdc7d217e72a72be6422d46a7ca`
CallSid	Twilio Call SID	`CA57af2620f427810cb4e430371e8d6e0f`
TranscriptionSid	Unique identifier for this Real-Time Transcription session	`GT20dfa03c8cf8aa8d0c4aeccde5558b66`
Timestamp	Time of the event in UTC ISO 8601 timestamp	`2023-10-19T22:33:22.611Z`
SequenceId	Integer sequence number of the event	`2`
TranscriptionEvent	The event type	`transcription-content`
LanguageCode	A BCP-47 standard language code (e.g. "en-US")	`en-US`
Track	The track being transcribed: `inbound_track` or `outbound_track`	`inbound_track`
TranscriptionData	JSON string containing the transcription content. Note that `confidence` inside `TranscriptionData` is a decimal number.	`{"transcript": "to be or not to be", "confidence": 0.96823084}`
Stability	String representing an estimate of the likelihood that Google will not change its guess for this partial result transcript, ranging from 0.0 (unstable) to 1.0 (stable). This property is only provided when `partialResults` is `true`.	`0.8`
Final	Whether this event contains a final utterance (`true`) or a partial utterance (`false`)	`false`

Example of a transcription-content event payload when partialResults is equal to false:


{
  "LanguageCode": "en-US",
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "TranscriptionEvent": "transcription-content",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for quality purposes. How can I assist you today?\",\"confidence\":0.9956335}",
  "Timestamp": "2024-06-25T18:45:21.454203Z",
  "Final": "true",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Track": "outbound_track",
  "SequenceId": "2"
}

Example of a transcription-content event payload when partialResults is equal to true:


{
  "LanguageCode": "en-US",
  "TranscriptionSid": "GT6ebb54a123f0c86b70605a4925836f69",
  "Stability": "0.9",
  "TranscriptionEvent": "transcription-content",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for\"}",
  "Timestamp": "2024-06-25T16:30:21.600697Z",
  "Final": "false",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Track": "outbound_track",
  "SequenceId": "70"
}
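
Both content payload shapes can be normalized into one structure so that downstream code does not need to know whether partialResults was enabled. This is a sketch under the assumption that `params` holds the callback properties as strings, as in the examples above; `parseContentEvent` is a hypothetical helper.

```javascript
// Normalize a transcription-content event into a single shape.
function parseContentEvent(params) {
  // TranscriptionData is a JSON string nested inside the event.
  const data = JSON.parse(params.TranscriptionData);
  return {
    sequenceId: Number(params.SequenceId),
    track: params.Track,
    isFinal: params.Final === 'true',
    transcript: data.transcript,
    // confidence is present when partialResults is false.
    confidence: typeof data.confidence === 'number' ? data.confidence : null,
    // Stability is present when partialResults is true.
    stability: params.Stability !== undefined ? Number(params.Stability) : null,
  };
}
```

Using `null` for whichever of the two scores is absent makes the difference between the modes explicit to callers.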

The transcription-stopped event

When a Real-Time Transcription session is stopped or ends, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-stopped event. This event provides final details about the transcription session.

These HTTP requests contain the properties listed below.

Property	Description	Example
AccountSid	Twilio Account SID	`AC11b76cdc7d217e72a72be6422d46a7ca`
CallSid	Twilio Call SID	`CA57af2620f427810cb4e430371e8d6e0f`
TranscriptionSid	Unique identifier for this Real-Time Transcription session	`GT20dfa03c8cf8aa8d0c4aeccde5558b66`
Timestamp	Time of the event in UTC ISO 8601 timestamp	`2023-10-19T22:33:22.611Z`
SequenceId	Integer sequence number of the event	`3`
TranscriptionEvent	The event type	`transcription-stopped`

An example of the transcription-stopped event payload:


{
  "TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",
  "TranscriptionEvent": "transcription-stopped",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "Timestamp": "2024-06-25T18:45:23.839266Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "SequenceId": "3"
}
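
Because Twilio does not store Real-Time Transcriptions during Public Beta, the stopped event is a natural point to persist whatever your application has collected. The sketch below keeps final utterances in memory per TranscriptionSid and assembles them when the session stops. `recordUtterance`, `finishSession`, and the in-memory `Map` are illustrative assumptions; sorting by SequenceId is a precaution, since this page does not say whether callbacks are guaranteed to arrive in order.

```javascript
// In-memory store keyed by TranscriptionSid; replace with durable storage
// in a real application.
const sessions = new Map();

// Call for each transcription-content event.
function recordUtterance(params) {
  if (params.Final !== 'true') return; // ignore partial results
  const { transcript } = JSON.parse(params.TranscriptionData);
  const utterances = sessions.get(params.TranscriptionSid) || [];
  utterances.push({
    seq: Number(params.SequenceId),
    track: params.Track,
    transcript,
  });
  sessions.set(params.TranscriptionSid, utterances);
}

// Call for the transcription-stopped event: returns the ordered transcript
// and releases the session's memory.
function finishSession(params) {
  const utterances = sessions.get(params.TranscriptionSid) || [];
  sessions.delete(params.TranscriptionSid);
  return utterances
    .sort((a, b) => a.seq - b.seq)
    .map((u) => `${u.track}: ${u.transcript}`)
    .join('\n');
}
```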

The transcription-error event

When an error occurs during a Real-Time Transcription session, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-error event.

(information)

Error Documentation

Real-Time Transcription errors are documented in the Error and Warning Dictionary under codes 32650 through 32655. Errors are also viewable in the Twilio Console.

These HTTP requests contain the properties listed below.

Property	Description	Example
AccountSid	Twilio Account SID	`ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX`
CallSid	Twilio Call SID	`CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX`
TranscriptionSid	Unique identifier for this Real-Time Transcription session	`GT20dfa03c8cf8aa8d0c4aeccde5558b66`
Timestamp	Time of the event in UTC ISO 8601 timestamp	`2023-10-19T22:33:22.611Z`
SequenceId	Integer sequence number of the event	`3`
TranscriptionEvent	The event type	`transcription-error`
TranscriptionErrorCode	Error code	`32655`
TranscriptionError	Error description	`Provider Unavailable`

Example of a transcription-error event payload:


{
  "TranscriptionSid": "GT20dfa03c8cf8aa8d0c4aeccde5558b66",
  "Timestamp": "2023-10-19T22:33:22.611Z",
  "AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "SequenceId": "3",
  "TranscriptionEvent": "transcription-error",
  "CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "TranscriptionErrorCode": "32655",
  "TranscriptionError": "Provider Unavailable"
}
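
An error payload like the one above can be reduced to a small summary for logging or alerting. This is a sketch only: `summarizeTranscriptionError` is a hypothetical helper, and the range check simply reflects the 32650 to 32655 codes mentioned in the Error Documentation note.

```javascript
// Documented code range for Real-Time Transcription errors.
const RTT_ERROR_MIN = 32650;
const RTT_ERROR_MAX = 32655;

// Summarize a transcription-error event for logging.
function summarizeTranscriptionError(params) {
  const code = Number(params.TranscriptionErrorCode);
  return {
    transcriptionSid: params.TranscriptionSid,
    code,
    message: params.TranscriptionError,
    // true when the code falls inside the documented range
    inDocumentedRange: code >= RTT_ERROR_MIN && code <= RTT_ERROR_MAX,
  };
}
```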

languageCode

The languageCode attribute specifies the language in which the transcription should be performed. It accepts a BCP-47 standard language code, such as en-US for American English. This attribute is useful for ensuring that the transcription engine correctly understands and processes the spoken language.

The following TwiML example demonstrates how to specify the languageCode attribute for a transcription for Mexican Spanish. This ensures that the transcription is performed in the specified language, which is particularly useful for calls in languages other than English.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', languageCode: 'es-MX'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" languageCode="es-MX" />
    </Start>
</Response>

track

The track attribute specifies which audio track should be transcribed. It can take one of the following values: inbound_track, outbound_track, or both_tracks. This attribute is useful for determining whether to transcribe the audio coming from the caller, the callee, or both.

The following TwiML example demonstrates how to specify the track attribute for a transcription.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', track: 'inbound_track'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" track="inbound_track" />
    </Start>
</Response>

inboundTrackLabel

The inboundTrackLabel attribute allows you to associate an alphanumeric label with the inbound track being transcribed. This can be useful for identifying and differentiating the inbound audio stream in the transcription results. Using labels helps to clearly identify who is speaking, especially in multi-party conversations or call center scenarios.

Refer to the Track labels section below to understand the importance of using labels.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
    </Start>
</Response>

Example 1: Inbound Call

In an inbound call scenario, the call is initiated by the customer and received by the agent or business person. Here, the inbound audio track (agent's speech) is labeled for clarity in the transcription results.


<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="agent" />
  </Start>
</Response>

In this example, the inbound audio track is labeled as "agent". This is useful for scenarios like customer support calls, where distinguishing the agent's responses from the customer's speech is crucial for understanding the interaction.

Example 2: Outbound Call

In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the inbound audio track (customer's speech) is labeled for clarity in the transcription results.


<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="customer" />
  </Start>
</Response>

In this example, the inbound audio track is labeled as "customer". This is useful for scenarios like sales calls, where distinguishing the customer's speech in the transcription can help in analyzing customer feedback and engagement.

outboundTrackLabel

The outboundTrackLabel attribute allows you to associate an alphanumeric label with the outbound track being transcribed. This can be useful for identifying and differentiating the outbound audio stream in the transcription results. Using labels helps to clearly identify who is speaking, especially in multi-party conversations or call center scenarios.

Refer to the Track labels section below to understand the importance of using labels.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
    </Start>
</Response>

Example 1: Inbound Call

In an inbound call scenario, the call is initiated by the customer and received by the agent or business person. Here, the outbound audio track (customer's speech) is labeled for clarity in the transcription results.


<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="customer" />
  </Start>
</Response>

In this example, the outbound audio track is labeled as "customer". This is useful for scenarios like customer support calls, where distinguishing the customer's speech from the agent's responses is crucial for understanding the interaction.

Example 2: Outbound Call

In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the outbound audio track (agent's speech) is labeled for clarity in the transcription results.


<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="agent" />
  </Start>
</Response>

In this example, the outbound audio track is labeled as "agent". This is useful for scenarios like sales calls, where distinguishing the agent's speech in the transcription can help in analyzing the effectiveness of the sales pitch.

transcriptionEngine

The transcriptionEngine attribute allows you to specify the name of the speech-to-text transcription provider to be used. This can be useful for leveraging specific features or optimizations provided by different transcription engines.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', transcriptionEngine: 'google'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" transcriptionEngine="google" />
    </Start>
</Response>

speechModel

The speechModel attribute allows you to specify which speech model to use for the transcription.

This attribute maps to the Transcription Model in Google's terminology. Different speech models can optimize for different use cases, such as phone calls, video, or enhanced models for higher accuracy.

Refer to the Google documentation to understand each speech model's specific capabilities and configurations.

The telephony speech model is optimized for phone call audio and can provide better accuracy for this type of audio.

The long speech model is optimized for long-form audio, such as lectures or extended conversations, and can provide better accuracy for lengthy audio.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', speechModel: 'telephony', transcriptionEngine: 'google'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" speechModel="telephony" transcriptionEngine="google" />
    </Start>
</Response>

profanityFilter

This attribute maps directly to profanityFilter in Google's RecognitionFeatures object. The profanityFilter attribute allows you to enable or disable the filtering of profane words in the transcription. When enabled, the transcription engine will attempt to mask or omit any detected profanities in the transcription results.

The example below demonstrates how to disable the profanity filter, which is enabled by default. With the filter disabled, profane language is no longer masked or omitted in the transcription output.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', profanityFilter: false, transcriptionEngine: 'google'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" profanityFilter="false" transcriptionEngine="google" />
    </Start>
</Response>

partialResults

This attribute corresponds to a StreamingRecognitionResult with is_final set to false in Google's terminology. The partialResults attribute allows you to enable or disable the delivery of interim transcription results. When enabled, the transcription engine will send partial (interim) results as the transcription progresses, providing more immediate feedback before the final result is available.

The example below demonstrates how to enable partial results for the transcription.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', partialResults: true, transcriptionEngine: 'google'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" partialResults="true" transcriptionEngine="google" />
    </Start>
</Response>

hints

The hints attribute contains a list of words or phrases that the transcription provider can expect to encounter during a Real-Time Transcription. Using the hints attribute can improve the transcription provider's recognition of words or phrases you expect from your callers.

You may provide up to 500 words or phrases in this list, separating each entry with a comma. Each hint may be up to 100 characters long, and words within a phrase should be separated by spaces, as in the following example:

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: 'Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback" />
    </Start>
</Response>
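
When hints come from application data rather than a hand-written string, it can help to check the limits described above (500 entries, 100 characters each) before building the attribute value. The following is a minimal sketch; `buildHints` is a hypothetical helper, and rejecting phrases that contain a comma is an added assumption, since the comma is the entry separator.

```javascript
const MAX_HINTS = 500;        // maximum number of words or phrases
const MAX_HINT_LENGTH = 100;  // maximum characters per hint

// Join an array of phrases into a hints attribute value, enforcing limits.
function buildHints(phrases) {
  if (phrases.length > MAX_HINTS) {
    throw new RangeError(`Too many hints: ${phrases.length} > ${MAX_HINTS}`);
  }
  const cleaned = phrases.map((phrase) => phrase.trim());
  for (const phrase of cleaned) {
    if (phrase.length > MAX_HINT_LENGTH) {
      throw new RangeError(`Hint exceeds ${MAX_HINT_LENGTH} characters: ${phrase}`);
    }
    if (phrase.includes(',')) {
      // A comma would split this phrase into two separate hints.
      throw new Error(`Hint must not contain a comma: ${phrase}`);
    }
  }
  return cleaned.join(', ');
}
```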

The hints attribute also supports Google's class token list to improve recognition. You can pass a class token directly in the hints attribute, as shown in the example below.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: '$OOV_CLASS_ALPHANUMERIC_SEQUENCE'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="$OOV_CLASS_ALPHANUMERIC_SEQUENCE" />
    </Start>
</Response>

enableAutomaticPunctuation

This attribute maps to Automatic Punctuation in Google's terminology. The enableAutomaticPunctuation attribute allows you to enable or disable automatic punctuation in the transcription. When enabled, the transcription engine will automatically insert punctuation marks such as periods, commas, and question marks, improving the readability of the transcribed text.

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', enableAutomaticPunctuation: true, transcriptionEngine: 'google'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Start>
        <Transcription statusCallbackUrl="https://example.com/your-callback-url" enableAutomaticPunctuation="true" transcriptionEngine="google" />
    </Start>
</Response>

Supported language and model combinations

Twilio's transcription service supports a variety of languages and models. The examples provided below are specific to Google Speech-to-Text. Depending on the language, certain attributes like speechModel, profanityFilter, and enableAutomaticPunctuation may have different levels of support. For the most up-to-date and comprehensive information, please refer to the Google Speech-to-Text Supported Languages documentation.

(warning)

Warning

These examples are accurate as of June 2024 and are subject to changes. Customers should always refer back to the Google Speech-to-Text Supported Languages page for the most current information.

Example 1: Chinese (Simplified, China) with Chirp Model

This example demonstrates how to configure transcription for Chinese (Simplified, China) using the Chirp Model with support for automatic punctuation.


<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="cmn-Hans-CN"
      speechModel="chirp"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>

In this example, the profanityFilter attribute, hints attribute, and other advanced features are not supported for this configuration.

Example 2: Spanish (Spain) with Telephony Model

This example demonstrates how to configure transcription for Spanish (Spain) using the telephony model with the profanity filter and automatic punctuation enabled.


<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="es-ES"
      speechModel="telephony"
      profanityFilter="true"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>

In this example, the telephony model supports automatic punctuation and profanity filter, but not model adaptation (e.g., hints).

Example 3: Hindi (India) with Short Model

This example demonstrates how to configure transcription for Hindi (India) using the short model with support for specific attributes.


<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="hi-IN"
      speechModel="short"
      enableAutomaticPunctuation="true"
      profanityFilter="true"
      hints="संपर्क, सेवा, समर्थन, ग्राहक"
      modelAdaptation="true" />
  </Start>
</Response>

In this example, the short model supports automatic punctuation, profanity filter, model adaptation, and hints.

Example 4: French (Canada) with Long Model

This example demonstrates how to configure transcription for French (Canada) using the long model with support for specific attributes.


<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="fr-CA"
      speechModel="long"
      hints="service à la clientèle, rendez-vous, commande" />
  </Start>
</Response>

In this example, the long model supports model adaptation through hints, but does not support automatic punctuation, profanity filter, or spoken punctuation.
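
The four examples above can be captured as a small lookup that drops attributes a given language and model combination does not support. This is an illustrative sketch only: `SUPPORT_SNAPSHOT` and `dropUnsupportedAttributes` are hypothetical names, the table encodes just the four combinations described on this page as of June 2024, and Google's Supported Languages page remains the authoritative source.

```javascript
// Snapshot of the four example combinations on this page (June 2024).
// Keys are `${languageCode}/${speechModel}`.
const SUPPORT_SNAPSHOT = {
  'cmn-Hans-CN/chirp': { enableAutomaticPunctuation: true, profanityFilter: false, hints: false },
  'es-ES/telephony': { enableAutomaticPunctuation: true, profanityFilter: true, hints: false },
  'hi-IN/short': { enableAutomaticPunctuation: true, profanityFilter: true, hints: true },
  'fr-CA/long': { enableAutomaticPunctuation: false, profanityFilter: false, hints: true },
};

// Return a copy of attrs without attributes marked unsupported for the
// requested combination. Unknown combinations are returned unchanged.
function dropUnsupportedAttributes(attrs) {
  const support = SUPPORT_SNAPSHOT[`${attrs.languageCode}/${attrs.speechModel}`];
  if (!support) return { ...attrs };
  return Object.fromEntries(
    Object.entries(attrs).filter(([key]) => support[key] !== false)
  );
}
```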

Track labels

If specifying inboundTrackLabel or outboundTrackLabel, the call direction mapping table below can be used as a guide.

Track	Call Direction	Call Resource Mapping	TrackLabel
Inbound-track	Outbound	TO #	Label for "who is being called" in an outbound call from Twilio (e.g., `inboundTrackLabel`="customer").
Outbound-track	Outbound	FROM #	Label for "who is calling" in an outbound call from Twilio (e.g., `outboundTrackLabel`="agent").
Inbound-track	Inbound	FROM #	Label for "who is being called" in an inbound call to Twilio (e.g., `inboundTrackLabel`="agent").
Outbound-track	Inbound	TO #	Label for "who is calling" in an inbound call to Twilio (e.g., `outboundTrackLabel`="customer").

Note: A call that has an "outbound" direction is a call that is outbound from Twilio, i.e., from Twilio to a customer.
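
The mapping table above can be expressed as a small function that picks labels from the call direction. This sketch uses the "agent" and "customer" labels from the table; `trackLabelsFor` is a hypothetical helper, and treating any direction string that begins with "outbound" as an outbound call is an assumption about how your application represents direction.

```javascript
// Choose track labels from the call direction, following the table above.
function trackLabelsFor(direction) {
  if (direction === 'inbound') {
    // Inbound call to Twilio: inbound track is labeled "agent",
    // outbound track is labeled "customer".
    return { inboundTrackLabel: 'agent', outboundTrackLabel: 'customer' };
  }
  if (typeof direction === 'string' && direction.startsWith('outbound')) {
    // Outbound call from Twilio: inbound track is labeled "customer",
    // outbound track is labeled "agent".
    return { inboundTrackLabel: 'customer', outboundTrackLabel: 'agent' };
  }
  throw new Error(`Unknown call direction: ${direction}`);
}
```

The returned object can be spread into the attribute object passed to `start.transcription(...)` alongside `statusCallbackUrl`.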

Stop a Real-Time Transcription

If you provided a name attribute when starting a Real-Time Transcription session, you can stop a Real-Time Transcription using TwiML or via API.

Given a Real-Time Transcription that was started with the following TwiML instructions:


<Response>
  <Start>
    <Transcription name="Contact center transcription" />
  </Start>
</Response>

You can stop the Real-Time Transcription with the following TwiML instructions:

Node.js

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const stop = response.stop();
stop.transcription({name: 'Contact center transcription'});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Stop>
        <Transcription name="Contact center transcription" />
    </Stop>
</Response>

If a name was not provided, you can stop an in-progress Real-Time Transcription via API using the SID of the Transcription. See the RealtimeTranscription resource API reference page for more information.

AI nutrition facts

(information)

AI Nutrition Facts

Real-Time Transcriptions, including the <Transcription> TwiML noun and API, use third-party artificial intelligence and machine learning technologies.

Twilio's AI Nutrition Facts provide an overview of the AI feature you're using, so you can better understand how the AI is working with your data. Real-Time Transcriptions AI qualities are outlined in the following Speech to Text Transcriptions - Programmable Voice Nutrition Facts label. For more information and the glossary regarding the AI Nutrition Facts Label, please refer to Twilio's AI Nutrition Facts page.

AI Nutrition Facts

Speech to Text Transcriptions - Programmable Voice and Voice Intelligence

Description: Generate speech to text voice transcriptions (real-time and post-call) in Programmable Voice and Voice Intelligence.
Privacy Ladder Level: N/A
Feature is Optional: Yes
Model Type: Generative and Predictive - Automatic Speech Recognition
Base Model: Google Speech-to-Text, Amazon Transcribe
Base Model Trained with Customer Data: No
Customer Data is Shared with Model Vendor: No
Training Data Anonymized: N/A
Data Deletion: Yes
Human in the Loop: Yes
Data Retention: Until the customer deletes
Logging & Auditing: Yes
Guardrails: Yes
Input/Output Consistency: Yes
Other Resources: https://www.twilio.com/docs/voice/intelligence

Learn more about this label at nutrition-facts.ai
