Broadcast Club: Audio Video Sync

Written on May 7, 2019

Understanding Audio Video Sync


As we transition into our more technical chapters on audiovisual equipment and workflow, let’s review the importance of audio video sync. The number one issue I find in newly set up video production systems is an audio and video sync problem. The good news is that all video production software systems include a way to compensate for audio and video delay. The bad news is that most people don’t know how to fix the problem because they cannot measure how many milliseconds the audio and video are out of sync.


In my experience, the audio is almost always processed by your computer faster than the video. This makes sense because audio is a much lower bitrate stream for your computer to process than video. To properly diagnose a possible audio and video sync issue, it is important to understand some of the most common root causes. When you plug an audio or video device into your computer via USB, the device uses drivers to connect and transfer its data stream to the operating system that your live streaming software runs on. Your computer has different ports, and these input ports process data at varying speeds. Because audio is generally a low bitrate data stream, most USB audio interfaces only use USB 2.0 for connectivity. Higher bandwidth devices such as a 1080p high definition video camera use USB 3.0 connections, which can support data transfer speeds roughly 10 times faster than USB 2.0.
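To make the bitrate gap concrete, here is a rough sketch that estimates uncompressed data rates for one audio stream and one video stream. The specific formats are assumptions chosen for illustration (48 kHz, 16-bit stereo audio, and 1080p video at 30 frames per second with 16 bits per pixel); they are not figures from a particular device, and many cameras compress their output before it reaches the computer.

```python
def audio_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed audio data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def video_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * fps


# Assumed example formats (illustrative only).
audio = audio_bitrate_bps(sample_rate_hz=48_000, bit_depth=16, channels=2)
video = video_bitrate_bps(width=1920, height=1080, bits_per_pixel=16, fps=30)

print(f"Audio: {audio / 1e6:.3f} Mbps")
print(f"Video: {video / 1e6:.1f} Mbps")
print(f"Video carries about {video // audio}x more data than audio")
```

Under these assumptions the audio stream needs about 1.5 Mbps while the video stream needs close to 1 Gbps, which is why the audio fits comfortably on USB 2.0 and the video does not.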


| Type of Connection | Bandwidth | Main Use |
| --- | --- | --- |
| USB 2.0 | 480 Mbps | Audio & non-video USB interface devices |
| USB 3.0 | 5 Gbps | High-bandwidth video devices |
| PCIe 3.0 | 8 Gbps (per lane) | Multiple high-quality video devices |
| Thunderbolt 3 | 40 Gbps | Multiple high-quality video devices |

As the chart above shows, USB 2.0 is generally used for lower bandwidth input devices. Most audio mixers and USB audio interfaces use the very popular USB 2.0 connection, which has been around since 2000. Most higher bandwidth devices such as cameras and video capture cards use USB 3.0. When you need multiple cameras or video inputs, many production systems are built around PCIe or Thunderbolt connections, which provide increased transfer speeds. Here is a chart that compares popular video cabling by the bandwidth each cable supports and its maximum functional distance.


| Cable Name | Bandwidth | Maximum Functional Distance |
| --- | --- | --- |
| Cat 5e | 1 Gbps | 328′ (100 meters) |
| Cat 6 | 1 Gbps | 328′ (100 meters) |
| Cat 6a | 10 Gbps | 328′ (100 meters) |
| Cat 7 | 10 Gbps | 328′ (100 meters) |
| HDMI 1.4 | 10.2 Gbps | 50′ (15 meters) |
| HDMI 2.0 | 18 Gbps | 50′ (15 meters) |
| SDI | 270 Mbps | 1000′ (300 meters) |
| HD-SDI | 1.5 Gbps | 300′ (90 meters) |
| 3G-SDI | 3 Gbps | 200′ (60 meters) |
| USB 2.0 | 480 Mbps | 15′ (5 meters) |
| USB 3.0 | 5 Gbps | 9′ (3 meters) |
| Thunderbolt 3 | 40 Gbps | 3′ (1 meter) |
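One way to use a chart like this is as a lookup: given the bandwidth a signal needs and the length of the cable run, filter for the cable types that satisfy both. The sketch below does that for a subset of the rows above. The `Cable` type and the `suitable_cables` helper are hypothetical names invented for this example, and the filter compares raw numbers only; it does not check whether a cable type can actually carry a given signal format.

```python
from typing import List, NamedTuple


class Cable(NamedTuple):
    name: str
    bandwidth_mbps: float
    max_distance_m: float


# A subset of the rows from the chart above.
CABLES = [
    Cable("Cat 5e", 1_000, 100),
    Cable("Cat 6a", 10_000, 100),
    Cable("HDMI 1.4", 10_200, 15),
    Cable("HDMI 2.0", 18_000, 15),
    Cable("SDI", 270, 300),
    Cable("HD-SDI", 1_500, 90),
    Cable("3G-SDI", 3_000, 60),
    Cable("USB 2.0", 480, 5),
]


def suitable_cables(required_mbps: float, run_length_m: float) -> List[str]:
    """Names of cables with enough bandwidth and enough reach."""
    return [
        cable.name
        for cable in CABLES
        if cable.bandwidth_mbps >= required_mbps
        and cable.max_distance_m >= run_length_m
    ]


# Example: a 1.5 Gbps signal over a 50 meter run.
print(suitable_cables(required_mbps=1_500, run_length_m=50))
```

For the example call, the HDMI rows drop out on distance and the slower rows drop out on bandwidth, leaving Cat 6a, HD-SDI, and 3G-SDI.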

With so many different connection types, it is no wonder that your computer may process one device slightly faster than another. Since you may have multiple devices using various drivers, it’s very common for production studios to use audio and video syncing tools to optimize their production’s performance. As you can see in the diagram above, we have a USB 2.0 audio interface with only 25 milliseconds of processing delay. We also have a camera and video capture card with a processing delay of 75 milliseconds. If you are lucky, the difference between your audio and video sources may be unnoticeable. In fact, many of the more advanced video production software systems make minor adjustments on your behalf by using the data produced by the computer’s operating system. If you think there is still room for improvement, you can add an audio delay so that the audio lines up with the slower video.
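The arithmetic behind the audio delay is a simple subtraction. The sketch below uses the 25 millisecond and 75 millisecond figures quoted above; the 30 frames per second and 48 kHz values used for the unit conversions are assumptions added for illustration, and the function names are hypothetical.

```python
def audio_delay_ms(video_latency_ms: float, audio_latency_ms: float) -> float:
    """Delay to add to the audio so it lines up with the video.

    A negative result means the video arrives first, so an audio
    delay alone cannot fix the offset.
    """
    return video_latency_ms - audio_latency_ms


def ms_to_frames(delay_ms: float, fps: float) -> float:
    """Express a delay in video frames at the given frame rate."""
    return delay_ms * fps / 1000


def ms_to_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Express a delay in audio samples at the given sample rate."""
    return round(delay_ms * sample_rate_hz / 1000)


delay = audio_delay_ms(video_latency_ms=75, audio_latency_ms=25)
print(f"Audio delay to add: {delay} ms")
print(f"At 30 fps (assumed): {ms_to_frames(delay, fps=30)} frames")
print(f"At 48 kHz (assumed): {ms_to_samples(delay, sample_rate_hz=48_000)} samples")
```

With these numbers the audio needs a 50 millisecond delay, which is one and a half frames of video at the assumed frame rate.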

A study by Dr. Magnotti and Michael Beauchamp, published in PLOS Computational Biology, explains the importance of matching audio and video correctly and its effect on speech detection. They found “by comparing mathematical models for how the brain integrates senses important in detecting speech… the brain uses vision, hearing, and experience when making sense of speech. If the mouth and voice are likely to come from the same person, the brain combines them; otherwise, they are kept separate.” Dr. Beauchamp summarized the study by saying “You may think that when you’re talking to someone you’re just listening to their voice… but it turns out that what their face is doing is actually profoundly influencing what you are perceiving” (Klien, 2017).

The best way to test your system is by using an audiovisual syncing tool. To make the process more accurate, I have designed an audio and video syncing tool that you can download from the course files. Play this video on a laptop, then capture the laptop’s audio and video output with the live streaming system you would like to test. The tool is handy because the video includes a color-coded scale with a millisecond counter, which you can use to determine the latency between audio and video in your system. Once you have recorded about 10 seconds of video, analyze the recording in video editing software. Look at the millisecond countdown and determine where each audio blip lines up with the real-time millisecond scale. With this information, adjust your audio delay to match your video. This process can take multiple attempts, and you should always check your work. Keep in mind that if you make any major changes to your camera or microphone setup, this test may need to be done again.
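Because a single reading from the recording can be off by a few milliseconds, it helps to read several blips and combine them. The sketch below is one possible way to do that, not a description of how the syncing tool itself works. It assumes each reading is the offset in milliseconds between a blip and its matching visual cue, positive when the audio is heard first. The readings and function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def estimate_offset_ms(readings_ms: Sequence[float]) -> float:
    """Combine several offset readings into one estimate.

    The median is used so that one misread blip does not skew the result.
    """
    if not readings_ms:
        raise ValueError("at least one reading is required")
    return median(readings_ms)


def updated_audio_delay_ms(current_delay_ms: float, offset_ms: float) -> float:
    """New audio delay after a test pass; never below zero."""
    return max(0.0, current_delay_ms + offset_ms)


# First pass: no delay applied yet, audio is heard early (hypothetical readings).
first_pass = [48, 52, 50, 47, 51]
delay = updated_audio_delay_ms(0, estimate_offset_ms(first_pass))
print(f"After first pass: {delay} ms")

# Second pass: re-record with the new delay and correct the small remainder.
second_pass = [4, 6, 5]
delay = updated_audio_delay_ms(delay, estimate_offset_ms(second_pass))
print(f"After second pass: {delay} ms")
```

Repeating the measurement after each adjustment mirrors the advice above: the first pass removes most of the offset, and later passes confirm or fine-tune the result.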


After a couple of attempts at adjusting your audio delay, you should have a good idea of how to accurately match up your audiovisual sources. Once you have made your adjustments, create some test video recordings with a person speaking on camera. Have them hold up a hand and count to 5, raising one finger at a time and speaking each number as the finger goes up. If your audio has too much delay, you will hear each number after the matching hand movement. If you can read lips, check whether the speech looks like it’s in sync with the speaker’s mouth. If the audio arrives too early, it will be very noticeable.

