Internet Video Streaming — ABR part 1
Background
Do you remember the old days when you had to choose between two or three quality levels of an internet video before you could play it? The choice usually depended on the internet connection you had, which back then was ADSL or some other modem-based connection.
Today you don't have to think about quality levels at all. You can watch any video stream anywhere, from any device and at any time, and expect top quality without worrying about how the technology behind it works.
This publication is part of a series of articles describing the principles of the technology behind video streaming. It can be read without any prior knowledge of the subject.
Glossary
- ABR streaming — Adaptive bitrate streaming
- Segments — parts of a video file
- Manifest file — a file listing references to all the files (segments) that together make up one video asset
- HTTP streaming — video streaming over the HTTP protocol
Quality video streaming
The history of ABR streaming goes back to 2002, but it was not until 2009 that the technology reached the broad consumer electronics market. Not having to choose between quality levels before playing the video was nice, but the more important advantage was that the connected device (PC, mobile, tablet etc.) could automatically switch between quality levels to ensure smooth playout and avoid buffering freezes. HTTP was chosen as the distribution protocol, which made the streams available to any device, whether a browser or a player application. Running over port 80, HTTP also ensured that video would pass smoothly through simpler proxies, firewalls and gateways.
So how does this work? You may have heard terms like HTTP streaming, manifest files and segments. Let us explain these further.
Progressively downloading the right quality content
Imagine a video file on disk. Don't mind the video encoding technique for now, or even that it is a video file; that will be explained in another tutorial. For now, just think of a video asset as an ordinary file of any given size, just like this:
The length of this file corresponds to the length of the video (or movie) in time. The height of the file is the video bitrate, stated in bits per second, which can simply be seen as the viewing quality of the video. A higher bitrate means better quality, or that more video data is available for each second of video. This holds for any media, whether audio or video. The area of the file therefore corresponds to the file size.
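As a rough illustration of that relationship, here is a minimal Python sketch (the bitrate and duration are made-up example values) that estimates a file size from bitrate and length in time:

```python
# Rough file size estimate: size (bytes) ≈ bitrate (bits/s) * duration (s) / 8.
# The bitrate and duration below are made-up example values.
bitrate_bps = 3_000_000      # 3 Mbit/s video
duration_s = 90 * 60         # a 90-minute movie

size_bytes = bitrate_bps * duration_s / 8
print(f"Approximate size: {size_bytes / 1_000_000_000:.1f} GB")
# -> Approximate size: 2.0 GB
```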
As we mentioned at the start of this tutorial, at least for those of us over 30, watching video meant first choosing between different video qualities. This meant that you were choosing between files like this:
Note that the three videos all have the same length but different heights, i.e. different quality/bitrate. Your player would simply download the selected quality file. Using a player to do the download meant that you could start watching the video while it was still downloading, as soon as the first bytes were received. Switching between the quality files was not supported.
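To make the idea of progressive download concrete, here is a minimal sketch in Python using the requests library. The URL is hypothetical, and a real player would feed the chunks to a decoder rather than write them to disk:

```python
import requests

# Hypothetical URL for the medium-quality file.
url = "https://example.com/movie_medium.mp4"

# Stream the response so data can be consumed as soon as the first
# bytes arrive, instead of waiting for the whole file to finish.
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open("movie_medium.mp4", "wb") as out:
        for chunk in response.iter_content(chunk_size=64 * 1024):
            out.write(chunk)  # a real player would decode and render here
```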
Note that download as referred to above is a pull based streaming method, as opposed to push based streaming. You can think of traditional broadcast TV as push based: a transmitting antenna sends out the video data, and anyone who wants to receive it can do so simply by picking up the signal. In pull based streaming, the video data is explicitly requested by the video playing device and served as a response. So push based streaming is initiated by the transmitter, while pull based streaming is initiated by the playing device, i.e. by the consumer.
Adaptive bitrate streaming
Then came the idea of adaptive streaming. What if we chopped up the video files into segments, like this:
It is important to understand that the segmentation of the video file is time based. Each cut is made at exactly the same point in time in each of the video files. Additionally, the cuts are positioned so that each segment starts with a complete picture. The "complete picture" concept is explained in more detail in the Encoding part of this tutorial series. Cutting segments time synchronized like this means that a player can, in the middle of a play session, freely pick and choose between the segments from any of the original quality level files. The only perceptible impact would be a sudden quality switch of the video.
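To illustrate the time alignment, here is a small Python sketch (the segment length and naming scheme are assumptions for the example). Because every quality level is cut at the same timestamps, the segment index for a given playback time is the same regardless of which quality the player picks:

```python
# Assumed example values: 4-second segments, three quality levels.
SEGMENT_DURATION_S = 4
QUALITIES = ["low", "medium", "high"]

def segment_index(playback_time_s: float) -> int:
    """Segment boundaries sit at the same timestamps in every quality,
    so the index depends only on time, not on the chosen quality."""
    return int(playback_time_s // SEGMENT_DURATION_S)

# At 37 seconds into the movie the player needs segment 9,
# and it can fetch that segment from any of the quality levels.
idx = segment_index(37)
for quality in QUALITIES:
    print(f"movie_{quality}_segment{idx}.ts")  # hypothetical segment names
```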
The player can be programmed to start aggressively with the 1st segment from the highest quality file, discover that the bitrate of that quality level is too high for the connection, and then pick the 2nd segment from the medium quality. In practice, players are usually programmed with a more modest strategy: they start by choosing the 1st segment from the lowest quality level and then work their way up the quality levels, with the goal of playing the highest quality possible without any freezes or buffering. More about this in the Player part of this tutorial series.
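As a simplified illustration of such a strategy (not how any specific player implements it), here is a sketch of a throughput-based rule: measure how fast the last segment downloaded and pick the highest bitrate that fits with some safety margin:

```python
# Assumed example bitrates in bits per second, lowest to highest.
BITRATES_BPS = [500_000, 1_500_000, 4_000_000]

def choose_quality(last_segment_bits: int, download_time_s: float) -> int:
    """Pick the highest quality whose bitrate fits within the measured
    throughput, keeping a safety margin to avoid buffering."""
    throughput_bps = last_segment_bits / download_time_s
    safety_margin = 0.8  # only use 80% of the measured throughput
    choice = 0  # default to the lowest quality
    for level, bitrate in enumerate(BITRATES_BPS):
        if bitrate <= throughput_bps * safety_margin:
            choice = level
    return choice

# Example: a 4-second, 1.5 Mbit/s segment (6 Mbit) downloaded in 2 seconds
# gives 3 Mbit/s throughput, so the player stays at the medium level.
print(choose_quality(6_000_000, 2.0))  # -> 1
```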
Let’s look at an example session. The downloaded segments are coloured green. From left to right:
Having watched some internet videos, especially on a larger screen like your TV, you have probably noticed that the quality is pretty bad at the start and then suddenly improves. Next time it happens, try counting the seconds from the start of the video until the quality improves. This gives you an indication of the segment length, which usually varies between 2 and 10 seconds depending on the format.
Again, keep in mind that HTTP ABR streaming is pull based, meaning that the client chooses which segment it wants to download, and this happens for every single segment. Shorter segments therefore result in a lot more HTTP/TCP traffic between the client and the streaming server.
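To get a feeling for the numbers (purely illustrative values), a quick calculation of how many segment requests a single viewing session generates:

```python
# Made-up example: a 2-hour movie.
movie_duration_s = 2 * 60 * 60

for segment_duration_s in (2, 10):
    requests_needed = movie_duration_s // segment_duration_s
    print(f"{segment_duration_s}s segments -> {requests_needed} HTTP requests")

# 2s segments  -> 3600 HTTP requests
# 10s segments ->  720 HTTP requests
```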
Manifest files
So far so good! But how does the player know which quality levels and segments exist?
Enter manifest files.
When segmenting your three video files you will also need to create a manifest file. You can picture the manifest file as a menu declaring all segments for all the quality levels that exist for the video. The manifest file contains a lot more information than that, but let's save that treat for a later tutorial. Since the manifest is a plain text file (XML syntax in some formats), the next time you stumble upon one, go ahead and open it in a text editor and have a look.
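As a hedged illustration only (real manifest formats each have their own syntax and carry much more detail), the information a player reads out of a manifest can be pictured roughly as a structure like this:

```python
# Simplified, made-up picture of what a parsed manifest tells the player.
manifest = {
    "segment_duration_s": 4,
    "segment_count": 1800,          # e.g. a 2-hour movie in 4 s segments
    "quality_levels": [
        {"name": "low",    "bitrate_bps":   500_000,
         "segment_url_template": "movie_low_segment{index}.ts"},
        {"name": "medium", "bitrate_bps": 1_500_000,
         "segment_url_template": "movie_medium_segment{index}.ts"},
        {"name": "high",   "bitrate_bps": 4_000_000,
         "segment_url_template": "movie_high_segment{index}.ts"},
    ],
}

# The player can now build the URL for any segment at any quality:
url = manifest["quality_levels"][1]["segment_url_template"].format(index=42)
print(url)  # -> movie_medium_segment42.ts
```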
Every video viewing session starts with the client (your phone, TV, tablet or whatever) downloading the manifest file for the video, to learn which quality levels are available, what the segments are called, and much more. The client then starts downloading the segments one by one, hopefully with increasing quality level and without causing any buffering pauses.
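Putting the pieces together, a viewing session can be sketched like this. It is a simplification that reuses the hypothetical manifest structure and the choose_quality helper from the earlier sketches; a real player also manages a playback buffer, decoding and much more:

```python
import time
import requests

MANIFEST_URL = "https://example.com/movie/manifest"  # hypothetical URL

def play_session():
    # 1. Fetch the manifest to learn about quality levels and segments.
    #    (Assumes a JSON-shaped manifest purely for this sketch.)
    manifest = requests.get(MANIFEST_URL).json()

    quality = 0  # start modestly at the lowest quality level
    for index in range(manifest["segment_count"]):
        level = manifest["quality_levels"][quality]
        url = level["segment_url_template"].format(index=index)

        # 2. Pull the next segment over plain HTTP.
        start = time.monotonic()
        segment = requests.get(url).content
        elapsed = time.monotonic() - start

        # 3. Decide which quality to request next, based on how fast
        #    this segment arrived (see choose_quality above).
        quality = choose_quality(len(segment) * 8, elapsed)

        # 4. Hand the segment to the decoder / playback buffer (omitted).
```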
Final Words
Simple, right? Part 2 of this tutorial explains live streaming, and recording and storing ABR content. Various formats are explained briefly, together with some of the main challenges of ABR streaming.
Eyevinn Technology is the leading independent consultant firm specializing in video technology and media distribution, and proud organizer of the yearly Nordic conference Streaming Tech Sweden.