YouTube Will Automatically Caption Your Video

Adrian Roselli

Three years ago YouTube/Google gave video authors the ability to add captions to their videos. Over time they added support for multiple caption tracks, expanded search to consider the text in captions, and even added machine translation of captions (see my other post about the risks of machine translation).

Even with hundreds of thousands of captioned videos on YouTube, captioning cannot keep pace with uploads: new video is posted at a rate of 20 hours every minute. For many companies (and not-for-profits and government agencies), YouTube is the most cost-effective and ubiquitous way to distribute video content to users. Many of these organizations (particularly not-for-profits and government agencies) are required by law, in the US and elsewhere, to provide captions for video, but they lack the experience or tools to do so. As a result, users who are deaf are excluded from fully understanding this content.

This is where the automatic speech recognition (ASR) features of Google Voice come into play. This technology can parse the audio track of your video and create captions automatically. As with machine translation, the quality of these captions may not be the best. However, they can at least give a user who could not otherwise understand the video at all enough information to glean some meaning and value.

In addition, Google is launching "automatic caption timing," which lets authors easily create captions from a plain text file. The video creator supplies a text file containing every word spoken in the video. Google's speech recognition software then works out where each of those words is spoken and takes care of the timing. This technique can greatly improve the quality of captions with very little effort (or spending on tools) from the video creator.

You can read more in the YouTube Help Center article, and the blog post announcing this feature is on the Google Blog. The video below shows a short demo of the auto-captioning and auto-timing features.

Adrian Roselli (aardvark), a founder of this site, is the Senior Usability Engineer at Algonquin Studios, located in Buffalo, New York.

Adrian has years of experience in graphic design, web design and multimedia design, as well as extensive experience in internet commerce and interface design and usability. He has been developing for the World Wide Web since its inception, and working in the design field since 1993. Adrian is a founding member, board member, and writer for this site. In addition, Adrian sits on the Digital Media Advisory Committee for a local SUNY college and a local private college, as well as on the board of a local charter school.

You can see his brand-spanking-new blog, as well as his new web site promoting his writing and speaking.

Adrian authored the usability case study for this site in Usability: The Site Speaks for Itself, published by glasshaus. He has written three chapters for the book Professional Web Graphics for Non Designers, also published by glasshaus. Adrian also managed to get a couple of chapters written (and published) for The Web Professional's Handbook before glasshaus went under. They were really quite good. You should have bought more of the books.

This site is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use it elsewhere without checking with the author.