Demystifying the HTML5 Video Player
The HTML5 APIs that make it possible to build a video player are MSE (Media Source Extensions), EME (Encrypted Media Extensions) and VTTCue (for subtitles). MSE is an API to the media (audio and video) pipeline within the browser and provides the interface to the internal media codecs that decode the audio and video. MSE does not define which codecs are supported; that is specific to each browser. However, most browsers today support AVC (H.264) video and AAC audio, which are the most commonly used codecs. For next-generation codecs such as HEVC, VP9 and AV1, support is much more fragmented, but as long as you don’t intend to go 4K/UHD and above you are “safe” today.
The EME API provides access to DRM (Digital Rights Management) decryption modules, known as content decryption modules (CDMs). Similar to the situation with codecs, no specific CDM is mandated, and it is up to each browser which ones it supports. The VTTCue API is part of the WebVTT API, which provides a way to expose WebVTT cues in the DOM (Document Object Model).
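Because codec support differs between browsers, a player typically asks MSE which codecs it can handle before choosing what to play. The static method `MediaSource.isTypeSupported()` takes a MIME type with an RFC 6381 `codecs` parameter for that purpose. The sketch below picks the first supported entry from a preference list; the candidate strings and the helper name `firstSupported` are illustrative assumptions, and the support check is passed in as a function so the logic can run outside a browser.

```typescript
// A function that reports whether a MIME type (with codecs) can be played.
// In a browser this would wrap MediaSource.isTypeSupported().
type SupportCheck = (mimeType: string) => boolean;

// Illustrative preference list, most preferred first.
// Codec strings follow RFC 6381 ("avc1.PPCCLL" for AVC).
const VIDEO_CANDIDATES: string[] = [
  'video/mp4; codecs="hvc1.1.6.L93.B0"', // HEVC Main profile
  'video/mp4; codecs="avc1.64001F"', // AVC High profile, level 3.1
  'video/mp4; codecs="avc1.42E01E"', // AVC Baseline profile, level 3.0
];

// Returns the first candidate the browser claims to support, or undefined.
function firstSupported(candidates: string[], isSupported: SupportCheck): string | undefined {
  return candidates.find((mimeType) => isSupported(mimeType));
}
```

In a browser this would be called as `firstSupported(VIDEO_CANDIDATES, (t) => MediaSource.isTypeSupported(t))`. The EME counterpart for finding out which CDMs are available is `navigator.requestMediaKeySystemAccess()`.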
Simplified a bit, playing media means pushing the undecoded (and sometimes still encrypted) video and audio data to an MSE source buffer. If the media is encrypted, the EME API triggers events that the application uses to perform a handshake with a DRM license server and obtain the keys needed to decrypt the data before it is rendered on the screen.
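One detail the application has to handle when pushing data: `SourceBuffer.appendBuffer()` is asynchronous, and calling it again while the `updating` flag is still true throws an `InvalidStateError`. The buffer signals completion with an `updateend` event. Below is a minimal sketch of an append queue, assuming a hypothetical `AppendQueue` class and an `AppendTarget` interface describing the `SourceBuffer` members it uses (the EME license handshake is not covered here).

```typescript
// The subset of the SourceBuffer interface this sketch relies on.
// A real SourceBuffer provides these members.
interface AppendTarget {
  readonly updating: boolean;
  appendBuffer(data: Uint8Array): void;
  addEventListener(type: "updateend", listener: () => void): void;
}

// Serializes appends: only one appendBuffer() call is in flight at a time.
class AppendQueue {
  private readonly target: AppendTarget;
  private readonly pending: Uint8Array[] = [];

  constructor(target: AppendTarget) {
    this.target = target;
    // When an append finishes, try to start the next queued one.
    target.addEventListener("updateend", () => this.flush());
  }

  // Queue a chunk of (still encoded) media data for appending.
  push(chunk: Uint8Array): void {
    this.pending.push(chunk);
    this.flush();
  }

  // Number of chunks still waiting to be appended.
  get size(): number {
    return this.pending.length;
  }

  private flush(): void {
    // appendBuffer() throws InvalidStateError while updating is true,
    // so wait for the next updateend event instead.
    if (this.target.updating || this.pending.length === 0) {
      return;
    }
    const next = this.pending.shift() as Uint8Array;
    this.target.appendBuffer(next);
  }
}
```

In a browser you would create it with `new AppendQueue(sourceBuffer)` once the `MediaSource` has fired `sourceopen` and `addSourceBuffer()` has been called. Real players also need to handle `QuotaExceededError` by removing already-played media with `SourceBuffer.remove()`, which this sketch leaves out.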
So if all that is handled within the browser, what is actually left for our application to do? As it turns out, quite a lot. The browser APIs have no knowledge of what a video streaming format actually is. They are limited to handling encoded media; how the data is retrieved and collected is outside their scope. Let us now go through the components we need in our application.
The Manifest Parser is responsible for downloading and parsing the streaming manifest. Depending on the streaming format, the content of this manifest varies, but in general it provides the video player application with all the options offered by the streaming server, for example which audio and subtitle tracks and which video qualities are available. The manifest also describes how to access and download the available media segments. The parser can then provide the Player Engine with the available video quality levels and the bandwidth each quality level requires.
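As an illustration, the HLS version of such a manifest (a master playlist) lists each quality level as an `#EXT-X-STREAM-INF` tag whose attributes include `BANDWIDTH`, `RESOLUTION` and `CODECS`, followed by the URI of that level's media playlist. Below is a minimal parser sketch for just that part of the format, assuming well-formed input; the `Variant` type and the function names are hypothetical, and MPEG-DASH manifests (XML) would need a different parser.

```typescript
// One quality level ("variant") from an HLS master playlist.
interface Variant {
  bandwidth: number; // peak bit rate in bits per second (BANDWIDTH)
  resolution?: string; // e.g. "1280x720" (RESOLUTION)
  codecs?: string; // RFC 6381 codec list (CODECS)
  uri: string; // URI of the media playlist for this level
}

const STREAM_INF = "#EXT-X-STREAM-INF:";

// Parses an attribute list such as BANDWIDTH=800000,CODECS="avc1.42e01e,mp4a.40.2".
// Quoted values may contain commas, so a plain split(",") is not enough.
function parseAttributes(list: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const re = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(list)) !== null) {
    attrs[match[1]] = match[2].replace(/^"|"$/g, "");
  }
  return attrs;
}

// Returns the variants sorted from lowest to highest bandwidth.
function parseMasterPlaylist(text: string): Variant[] {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines[0] !== "#EXTM3U") {
    throw new Error("Not an HLS playlist: missing #EXTM3U header");
  }
  const variants: Variant[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (!lines[i].startsWith(STREAM_INF)) continue;
    const attrs = parseAttributes(lines[i].slice(STREAM_INF.length));
    // Simplification: the URI is assumed to be the next non-empty line.
    const uri = lines[i + 1];
    if (uri === undefined || uri.startsWith("#")) continue;
    variants.push({
      bandwidth: Number(attrs["BANDWIDTH"]),
      resolution: attrs["RESOLUTION"],
      codecs: attrs["CODECS"],
      uri,
    });
  }
  return variants.sort((a, b) => a.bandwidth - b.bandwidth);
}
```

Returning the levels sorted by bandwidth gives the Player Engine and the ABR Manager an ordered list to move up and down in.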
The Player Engine is basically the “heart” of this application. The engine decides which segments the Segment Downloader should download and, when the manifest is dynamically updated (for live content), when a new manifest should be fetched. Once a media segment is downloaded, the engine is responsible for pushing the audio and video data to the MSE source buffer, and if it is a subtitle segment, it passes the subtitle data to the subtitle parser. The Player Engine also decides when to switch to a higher (or lower) quality level, based either on the ABR Manager's recommendation or on a manual switch requested by the user.
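A core part of the engine's job is deciding whether to fetch another segment and which one. A common approach, sketched here under the assumption of a fixed buffer goal (how many seconds of media to keep ahead of the playhead), is to stop downloading once that goal is reached and otherwise request the first segment that ends after the currently buffered range. The `Segment` type, `nextSegmentToFetch` and `bufferGoal` are hypothetical names.

```typescript
// A media segment as described by the manifest (times in seconds).
interface Segment {
  start: number;
  duration: number;
  uri: string;
}

// Small tolerance for floating-point segment boundaries.
const EPSILON = 0.001;

// Decides which segment to download next, or null if nothing should be fetched now.
// currentTime: playhead position; bufferedEnd: end of buffered media ahead of it;
// bufferGoal: seconds of media the engine wants buffered ahead (a tuning assumption).
function nextSegmentToFetch(
  segments: Segment[],
  currentTime: number,
  bufferedEnd: number,
  bufferGoal: number,
): Segment | null {
  // Enough media buffered ahead of the playhead: wait before downloading more.
  if (bufferedEnd - currentTime >= bufferGoal) {
    return null;
  }
  // Otherwise pick the first segment that ends after what is already buffered.
  const next = segments.find((s) => s.start + s.duration > bufferedEnd + EPSILON);
  // null also means "end of the segment list"; for live content the engine
  // would refresh the manifest to learn about newer segments.
  return next ?? null;
}
```

In a browser, `currentTime` would come from the video element and `bufferedEnd` from `sourceBuffer.buffered.end(sourceBuffer.buffered.length - 1)`; the engine would re-run this check on a timer or after each append.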
The Segment Downloader is responsible for downloading the media segments as instructed by the Player Engine. It also provides feedback to the ABR Manager on how long each segment took to download and how large it was. This makes it possible for the ABR Manager to estimate the available bandwidth. The Segment Downloader can also handle the retry logic for segments that failed to download.
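A sketch of a downloader that retries failed requests with exponential backoff and reports the segment size and download time the ABR Manager needs. The fetch function is injected (an assumption made to keep the example self-contained and testable), as are the retry parameters; `downloadSegment`, `Fetcher` and `DownloadResult` are hypothetical names.

```typescript
// Fetches a URL and resolves with the response body. Injected so that the
// sketch does not depend on a particular HTTP API.
type Fetcher = (url: string) => Promise<ArrayBuffer>;

interface DownloadResult {
  data: ArrayBuffer;
  bytes: number; // segment size, reported to the ABR Manager
  elapsedMs: number; // duration of the successful attempt, reported to the ABR Manager
  attempts: number; // how many tries were needed
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Downloads one segment, retrying with exponential backoff on failure.
async function downloadSegment(
  url: string,
  fetcher: Fetcher,
  maxAttempts = 3,
  baseDelayMs = 500,
  now: () => number = () => Date.now(),
): Promise<DownloadResult> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const startedAt = now();
    try {
      const data = await fetcher(url);
      return { data, bytes: data.byteLength, elapsedMs: now() - startedAt, attempts: attempt };
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

In a browser the fetcher could be `(url) => fetch(url).then((r) => { if (!r.ok) throw new Error("HTTP " + r.status); return r.arrayBuffer(); })`, and `bytes` and `elapsedMs` from each result would be handed to the ABR Manager.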
The ABR Manager (Adaptive Bitrate Manager) is the brain of the ABR heuristics that the player application implements. ABR heuristics deserve a blog post of their own, but to simplify, they are about predicting changes in network bandwidth to avoid playback stalls (buffering). The ABR Manager continuously estimates the available bandwidth and gives suggestions to the Player Engine on when to switch quality level. The performance of this component has the most impact on the user experience.
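One common bandwidth-estimation heuristic keeps two exponentially weighted moving averages (EWMA) of the measured throughput, a fast one that reacts quickly to drops and a slow one that smooths out short spikes, and uses the lower of the two. Shaka Player's estimator works along these lines, but the sketch below is a simplified illustration rather than its implementation. The half-lives (2 and 5 seconds), the 0.8 safety factor and all names are assumptions.

```typescript
// Exponentially weighted moving average where each sample is weighted by its
// duration, so the half-life is measured in seconds of download time.
class Ewma {
  private readonly alpha: number;
  private estimate = 0;
  private totalWeight = 0;

  constructor(halfLifeSeconds: number) {
    // alpha^halfLife = 0.5: old samples lose half their weight per half-life.
    this.alpha = Math.exp(Math.log(0.5) / halfLifeSeconds);
  }

  sample(weight: number, value: number): void {
    const adjustedAlpha = Math.pow(this.alpha, weight);
    this.estimate = value * (1 - adjustedAlpha) + adjustedAlpha * this.estimate;
    this.totalWeight += weight;
  }

  getEstimate(): number {
    if (this.totalWeight === 0) return 0;
    // The average starts at 0; dividing by this factor removes that bias.
    const zeroFactor = 1 - Math.pow(this.alpha, this.totalWeight);
    return this.estimate / zeroFactor;
  }
}

class BandwidthEstimator {
  private readonly fast = new Ewma(2); // reacts quickly to drops
  private readonly slow = new Ewma(5); // smooths out short spikes

  // Fed by the Segment Downloader after each completed segment.
  addSample(elapsedMs: number, bytes: number): void {
    const seconds = elapsedMs / 1000;
    if (seconds <= 0) return;
    const bitsPerSecond = (8 * bytes) / seconds;
    this.fast.sample(seconds, bitsPerSecond);
    this.slow.sample(seconds, bitsPerSecond);
  }

  // The lower of the two averages keeps the estimate conservative.
  getEstimate(): number {
    return Math.min(this.fast.getEstimate(), this.slow.getEstimate());
  }
}

// Index of the highest level whose bandwidth fits within a safety margin of
// the estimate; levelBandwidths must be sorted in ascending order.
function chooseLevel(levelBandwidths: number[], estimatedBps: number, safetyFactor = 0.8): number {
  let chosen = 0;
  levelBandwidths.forEach((bandwidth, index) => {
    if (bandwidth <= estimatedBps * safetyFactor) chosen = index;
  });
  return chosen;
}
```

`chooseLevel` falls back to the lowest level when nothing fits. Taking the minimum of the fast and slow averages means the player steps down quickly when throughput drops but only steps up once higher throughput has been sustained.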
License Request Wrapper and Remuxer
The License Request Wrapper and Remuxer are optional components and are not always needed, depending on which content protection schemes or streaming formats you need to support. Some DRM providers require that you add extra data to the license handshake between the CDM and the DRM server, and this is what the License Request Wrapper takes care of. The Remuxer is needed if you aim to support streaming formats that use a segment container format not supported by MSE. It changes how the video data is packaged before it is pushed to the MSE source buffer. For example, HLS can carry AVC video data in an MPEG-TS container, which needs to be repackaged into an MP4 container before it is pushed to the MSE source buffer.
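Before deciding whether a segment must go through the Remuxer, the player has to know which container it uses. The two formats from the HLS example can be told apart from their first bytes: MPEG-TS consists of 188-byte packets that each start with the sync byte `0x47`, while MP4 (ISO BMFF) data starts with a box whose four-character type (such as `ftyp`, `styp` or `moof`) sits at byte offset 4. Below is a minimal sniffing sketch, assuming each segment starts on a packet or box boundary; the function names are hypothetical.

```typescript
type Container = "mpeg-ts" | "mp4" | "unknown";

const TS_PACKET_SIZE = 188; // MPEG-TS packets have a fixed size
const TS_SYNC_BYTE = 0x47; // first byte of every MPEG-TS packet
// Box types that commonly appear at the start of MP4 (ISO BMFF) media.
const MP4_LEADING_BOXES = ["ftyp", "styp", "moof", "moov", "sidx"];

// Guesses the container format of a segment from its first bytes.
// Assumes the segment starts on a packet (TS) or box (MP4) boundary.
function detectContainer(data: Uint8Array): Container {
  // Check up to three consecutive TS packets to reduce false positives.
  const packets = Math.min(3, Math.floor(data.length / TS_PACKET_SIZE));
  if (packets > 0) {
    let allSync = true;
    for (let i = 0; i < packets; i++) {
      if (data[i * TS_PACKET_SIZE] !== TS_SYNC_BYTE) {
        allSync = false;
        break;
      }
    }
    if (allSync) return "mpeg-ts";
  }
  // An MP4 box starts with a 4-byte size followed by a 4-character type.
  if (data.length >= 8) {
    const boxType = String.fromCharCode(data[4], data[5], data[6], data[7]);
    if (MP4_LEADING_BOXES.includes(boxType)) return "mp4";
  }
  return "unknown";
}

// Assumption: the target browser's MSE only accepts MP4, so TS must be remuxed.
function needsRemux(data: Uint8Array): boolean {
  return detectContainer(data) === "mpeg-ts";
}
```

HLS.js, for example, transmuxes MPEG-TS segments into fragmented MP4 before appending them to MSE. Whether remuxing is needed can also depend on the browser, which could be checked by passing a `video/mp2t` MIME type to `MediaSource.isTypeSupported()`.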
Several existing video players implement these components:
- Shaka Player (open source) supports MPEG-DASH and has limited support for HLS (under development)
- HLS.js (open source) supports HLS.
- Flowplayer (open source and commercial) supports HLS (based on HLS.js)
- THEOplayer (commercial) supports MPEG-DASH and HLS
- Bitmovin Player (commercial) supports MPEG-DASH and HLS
- RxPlayer (open source) supports MPEG-DASH and Microsoft Smooth Streaming
I hope this post has given you a better understanding of the fundamental components needed for playing back video streams.
If you have any further questions or comments on this post, drop a comment below or tweet me on Twitter (@JonasBirme).
Jonas Birmé is a Solution Architect at Eyevinn Technology, a Sweden-based consultancy company specializing in video and streaming technology.