There are several ways to insert captions into a television program. The method chosen depends on whether the program is recorded or live.
Recorded: This process combines precision and creativity. First, the audio is transcribed. Then caption software is used to synchronize the text with the audio. Captions should be positioned so that viewers can easily tell who is speaking, without covering images that are essential to the show. Relevant sounds are described as well, such as gunshots, airplanes, or people clapping. Finally, a copy of the program is made, and the captions are inserted into it using a caption encoder or specialized editing software.
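The synchronization step pairs each piece of transcribed text with the time span during which it should appear on screen. The sketch below shows that idea using the SubRip (`.srt`) text format as an illustrative assumption; broadcast workflows may use other caption formats, and the `Cue` type, function names, and sample dialogue are hypothetical. Note how the speaker label and the bracketed sound description are simply part of the cue text.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: hypothetical structure pairing text with its on-screen time."""
    start: float  # seconds from the start of the program
    end: float    # seconds from the start of the program
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list) -> str:
    """Render synchronized cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    return "\n".join(blocks)


# Hypothetical example: a speaker label followed by a sound description.
cues = [
    Cue(0.0, 2.5, "ANNA: Did you hear that?"),
    Cue(2.5, 4.0, "[GUNSHOT]"),
]
srt_text = to_srt(cues)
print(srt_text)
```

Keeping timing and text in one record makes it easy to adjust a cue's timing without touching its wording, which mirrors how caption software lets an editor nudge text against the audio.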
Live: Converting speech to text in real time requires appropriate technology. This can be done with speech recognition software or with a stenotype machine, which requires a specially trained operator. When captioning must be done in Spanish, the main limitations are a lack of technology for the language and a shortage of trained operators.
With a stenotype machine, accuracy can reach 99%; with speech recognition technology, it reaches 95%. The advantage of speech recognition is that it does not depend on a specialized operator, and the system can be made operational in less than a month.
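The text does not say how accuracy is measured. A common convention, assumed here, is word accuracy: one minus the number of word substitutions, insertions, and deletions divided by the number of words actually spoken. Under that assumption, over a 1,000-word stretch, 99% accuracy means about 10 wrong words and 95% means about 50. The sketch below computes this with a word-level edit distance; the function names and sample sentences are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + insertions + deletions to turn reference words into hypothesis words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: 1 - (word errors / number of reference words)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return 1 - word_errors(reference, hypothesis) / ref_len


spoken = "the storm will reach the coast tonight"
captioned = "the storm will reach a coast tonight"
print(word_accuracy(spoken, captioned))  # one substitution out of seven words
```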
The resulting text is sent to a caption encoder, which inserts it into the program signal.
For the analog NTSC television system, the format used in the United States until June 2009, captions are carried within the video signal itself, on line 21 of the vertical blanking interval. For digital television (DTV), the format used in the United States since June 2009, caption data is transmitted together with the program data in the digital signal.
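On analog NTSC, line 21 captions follow the CEA-608 (formerly EIA-608) standard: characters are sent two bytes at a time, and each byte holds a 7-bit character code plus an odd-parity bit in its most significant position so the receiver's decoder can detect transmission errors. The sketch below shows only that parity-and-pairing step. It assumes the text uses characters whose CEA-608 codes match ASCII (plain letters, digits, and space; a few other codes differ), ignores control codes, and uses hypothetical function names.

```python
def add_odd_parity(char_code: int) -> int:
    """Set bit 7 if needed so the byte contains an odd number of 1 bits."""
    if not 0 <= char_code <= 0x7F:
        raise ValueError("line 21 character codes are 7-bit values")
    ones = bin(char_code).count("1")
    return char_code | 0x80 if ones % 2 == 0 else char_code


def encode_pairs(text: str) -> list:
    """Split text into parity-protected byte pairs, one pair per video frame.

    Assumes every character's CEA-608 code equals its ASCII code.
    """
    codes = [add_odd_parity(ord(ch)) for ch in text]
    if len(codes) % 2:
        # Pad an odd-length run with a null character (0x00 becomes 0x80 with parity).
        codes.append(add_odd_parity(0x00))
    return [(codes[i], codes[i + 1]) for i in range(0, len(codes), 2)]


pairs = encode_pairs("CAT")
print([f"{a:02X} {b:02X}" for a, b in pairs])
```

Digital television does not reuse this line-based scheme; its captions (CEA-708) travel as data inside the digital stream, which is why the text above describes them as transmitted with the program data rather than on a particular video line.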