How to build your own streaming video HTML player

In the article Demystifying HTML5 Video Player, we described what is under the hood of an HTML video player and explained that a streaming video HTML player uses the HTML5 MSE API (Media Source Extensions) for video decoding and playback. In this article, written by Jonas Rydholm Birmé, we show with JavaScript code how this is done.

This article is for web developers who want to get more familiar with the video domain. We recommend reading the article Demystifying HTML5 Video Player and our ABR tutorial (part 1, part 2 and part 3) first.

Playing video and audio in the browser has been possible for many years, but it was only when the Media Source Extensions API (MSE) was introduced that streaming media could be played without plugins such as Flash. What MSE introduced was the ability to attach a SourceBuffer to a media element track. Audio and video data can then be appended to this SourceBuffer in the form of an ArrayBuffer. This gives the application more control over how the audio and video is fetched and pushed onto the media pipeline, and it does not tie the application to a specific streaming format. The downside of this approach is that you have to handle the parsing of the streaming format, the ABR heuristics and so on in your application. Luckily there are a number of JavaScript libraries and players available so you don't have to reinvent the wheel. However, if you wish to understand the basic principles needed to build your own player or library, read on.

As explained in the ABR tutorial, every streaming format has a manifest that contains all the information about the media stream: where the media chunks can be downloaded, how the media is encoded and what tracks are available. The first thing our streaming video player must do is fetch and parse this manifest. Using the streaming format MPEG-DASH as an example, a snippet from such a manifest can look like this:

This snippet shows one of the representations of the content: a representation that is 1280 pixels wide and 720 pixels high, encoded with AVC1 (H.264) at an average bandwidth of 1660 kbps. Representations belonging to the same AdaptationSet are aligned so that it is possible to switch between them. By offering representations of different resolutions and bitrates, the player can select the one that the current network conditions can sustain and so avoid interruptions in playback. The syntax differs in other streaming formats but the principle is the same.
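To make the selection idea concrete, here is a minimal sketch of how a player could pick a representation from a measured throughput. It is not a real ABR algorithm. The function name selectRepresentation, the safety factor and the two extra representations are assumptions made for illustration; only the 1660 kbps entry comes from the manifest described above.

```javascript
// Hypothetical representation list: only the 1660 kbps entry comes from the
// manifest described in the text; the other two are made up for illustration.
const representations = [
  { id: "video=800000", width: 640, height: 360, bandwidth: 800000 },
  { id: "video=1660000", width: 1280, height: 720, bandwidth: 1660000 },
  { id: "video=3000000", width: 1920, height: 1080, bandwidth: 3000000 },
];

// Pick the highest-bandwidth representation that fits within a safety margin
// of the measured throughput. If none fits, fall back to the lowest one.
function selectRepresentation(reps, measuredBps, safetyFactor = 0.8) {
  const sorted = [...reps].sort((a, b) => a.bandwidth - b.bandwidth);
  const budget = measuredBps * safetyFactor;
  let chosen = sorted[0];
  for (const rep of sorted) {
    if (rep.bandwidth <= budget) {
      chosen = rep;
    }
  }
  return chosen;
}

// With 2.5 Mbps measured and a 0.8 safety factor the budget is 2.0 Mbps.
console.log(selectRepresentation(representations, 2500000).id); // prints "video=1660000"
```

A real ABR manager would also look at buffer level and throughput history, but the core decision has this shape.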

Instead of specifying the filename for each media chunk, the manifest gives us a pattern, in this case vinn-$RepresentationID$-$Time$.dash. The SegmentTimeline specifies the sequence of media chunks, where t is the time offset, d is the duration of the chunk and r is the number of repetitions. For example, to get the filename of the first media chunk we replace $RepresentationID$ with video=1660000 and $Time$ with t=0, resulting in vinn-video=1660000-0.dash. The next media chunk then gets the filename vinn-video=1660000-25600.dash, the following one vinn-video=1660000-51200.dash, and so on. The timescale is 12800, which means that t=12800 corresponds to 1 second. Since the time advances by 25600 for each chunk, we can conclude that every media chunk in this example is 25600 / 12800 = 2 seconds long. The initialization chunk is called vinn-video=1660000.dash and contains the actual mp4 header, as the media chunks only contain the encoded video data.
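The template expansion described above can be sketched as a small function. The function name expandSegmentTimeline and the shape of the timeline argument are our own choices for this illustration, and the repeat count r: 2 is an assumed value, since the actual timeline is not reproduced here.

```javascript
// Expand a SegmentTimeline into media chunk filenames.
// Each timeline entry mirrors an S element: t = start time (optional),
// d = duration, r = number of repetitions (optional, defaults to 0).
function expandSegmentTimeline(template, representationId, timeline) {
  const filenames = [];
  let time = 0;
  for (const { t, d, r = 0 } of timeline) {
    if (t !== undefined) {
      time = t;
    }
    // r repetitions means the entry describes r + 1 chunks in total.
    for (let i = 0; i <= r; i++) {
      filenames.push(
        template
          .replace("$RepresentationID$", () => representationId)
          .replace("$Time$", () => String(time))
      );
      time += d;
    }
  }
  return filenames;
}

const chunkNames = expandSegmentTimeline(
  "vinn-$RepresentationID$-$Time$.dash",
  "video=1660000",
  [{ t: 0, d: 25600, r: 2 }] // r: 2 is an assumed value for illustration
);
console.log(chunkNames);
// [ 'vinn-video=1660000-0.dash',
//   'vinn-video=1660000-25600.dash',
//   'vinn-video=1660000-51200.dash' ]

// Chunk duration in seconds: d divided by the timescale.
console.log(25600 / 12800); // prints 2
```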

Before creating the MediaSource object we need to check whether the browser supports this video encoding. The MediaSource class has a static function isTypeSupported() that we can use for this. We construct a MIME type string and pass it to this function. In our case it is video/mp4; codecs="avc1.4D401F":
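A minimal sketch of this check could look as follows. The wrapper function canPlay is our own name, and its second parameter exists only so the check can be tried outside a browser; in a page you would call MediaSource.isTypeSupported() directly.

```javascript
const mimeCodec = 'video/mp4; codecs="avc1.4D401F"';

// Returns true only when MSE is available and the type is supported.
// MediaSourceImpl is injectable purely for experimentation outside a browser.
function canPlay(type, MediaSourceImpl = globalThis.MediaSource) {
  return Boolean(MediaSourceImpl) && MediaSourceImpl.isTypeSupported(type);
}

if (!canPlay(mimeCodec)) {
  console.warn(`Unsupported MIME type or codec: ${mimeCodec}`);
}
```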

A newly created MediaSource object is initially in the state closed, which can be verified through the object attribute mediaSource.readyState.

The next thing to do is to attach this MediaSource object to an HTML media element, for example a video element. This is done with the static method URL.createObjectURL(), which creates a DOMString containing a URL that represents the MediaSource object. That URL is then set as the src of the media element.

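Update: Newer versions of the specification allow you to assign the object directly to the srcObject property of the media element instead of going through an object URL. Browser support for assigning a MediaSource this way may still be limited, so check it before relying on it.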

Once we have attached the MediaSource object to the Media Element we need to wait for the sourceopen event before we can continue.
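The steps so far (create, attach and wait for sourceopen) can be sketched as one function that returns a promise. The function name attachMediaSource is our own, and the urlApi parameter is only there so the function can be tried with stand-in objects outside a browser.

```javascript
// Attach a MediaSource to a media element and resolve once it is open.
// urlApi defaults to the global URL object.
function attachMediaSource(videoElement, mediaSource, urlApi = URL) {
  return new Promise((resolve) => {
    // Register the listener before attaching so the event cannot be missed.
    mediaSource.addEventListener("sourceopen", () => resolve(mediaSource), {
      once: true,
    });
    // The object URL represents the MediaSource; setting it as src attaches it.
    videoElement.src = urlApi.createObjectURL(mediaSource);
  });
}

// Browser usage (assumes a <video> element exists on the page):
// const mediaSource = new MediaSource();
// console.log(mediaSource.readyState); // "closed"
// const video = document.querySelector("video");
// attachMediaSource(video, mediaSource).then((ms) => {
//   console.log(ms.readyState); // "open"
// });
```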

The MediaSource is now open and we can add a SourceBuffer to it with the method mediaSource.addSourceBuffer(mimeCodec). Later we will use the method sourceBuffer.appendBuffer(arrayBuffer) to append video data to the MediaSource.

We also set the duration of the MediaSource. In this example we will append three chunks of 2 seconds each, which gives a duration of 6 seconds.
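As a sketch, adding the SourceBuffer and setting the duration could be wrapped like this. The function name setupSourceBuffer and its parameters are our own choices for this illustration.

```javascript
// Call this once the MediaSource is open: addSourceBuffer() throws if the
// MediaSource is not in the "open" state.
function setupSourceBuffer(mediaSource, mimeCodec, chunkCount, chunkSeconds) {
  const sourceBuffer = mediaSource.addSourceBuffer(mimeCodec);
  mediaSource.duration = chunkCount * chunkSeconds; // duration is in seconds
  return sourceBuffer;
}

// Browser usage, three chunks of 2 seconds each:
// const sourceBuffer = setupSourceBuffer(
//   mediaSource, 'video/mp4; codecs="avc1.4D401F"', 3, 2
// );
// mediaSource.duration is now 6
```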

To simplify and focus on the basic principles we will in this example assume that the manifest is parsed and we have three chunks that we want to append and play.

The first chunk (segment) we need to fetch is the initialization segment containing the mp4 header. It is important that the mime type of the SourceBuffer matches the container and video format in the initialization segment.

Once that chunk is fetched and appended to the SourceBuffer we can download and append the other segments. As the above code snippet shows we can start video playback before all segments are appended. The fetchSegmentAndAppend() function is presented in full here below.
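One way to write such a function is sketched below. This is our own minimal version and not necessarily identical to the original. It waits for the updateend event before resolving, because a SourceBuffer cannot accept a new append while it is still updating. The helper appendSegmentsInOrder and the fetchImpl parameter are assumptions added for illustration, and the base URL in the usage comment is hypothetical.

```javascript
// Fetch one segment and append it to the SourceBuffer. Resolves when the
// SourceBuffer has finished processing the data (the updateend event).
// fetchImpl defaults to the global fetch and is a parameter only for testing.
async function fetchSegmentAndAppend(segmentUrl, sourceBuffer, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(segmentUrl);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${segmentUrl}: ${response.status}`);
  }
  const data = await response.arrayBuffer();
  await new Promise((resolve, reject) => {
    const cleanup = () => {
      sourceBuffer.removeEventListener("updateend", onEnd);
      sourceBuffer.removeEventListener("error", onError);
    };
    const onEnd = () => {
      cleanup();
      resolve();
    };
    const onError = () => {
      cleanup();
      reject(new Error(`Failed to append ${segmentUrl}`));
    };
    sourceBuffer.addEventListener("updateend", onEnd);
    sourceBuffer.addEventListener("error", onError);
    sourceBuffer.appendBuffer(data);
  });
}

// Append segments one at a time, in the order given.
async function appendSegmentsInOrder(baseUrl, filenames, sourceBuffer, fetchImpl) {
  for (const name of filenames) {
    await fetchSegmentAndAppend(baseUrl + name, sourceBuffer, fetchImpl);
  }
}

// Browser usage (the base URL is hypothetical):
// const base = "https://example.com/vinn/";
// await fetchSegmentAndAppend(base + "vinn-video=1660000.dash", sourceBuffer);
// await fetchSegmentAndAppend(base + "vinn-video=1660000-0.dash", sourceBuffer);
// video.play(); // playback can start before the remaining segments arrive
// await appendSegmentsInOrder(base, [
//   "vinn-video=1660000-25600.dash",
//   "vinn-video=1660000-51200.dash",
// ], sourceBuffer);
```

Appending strictly one segment at a time keeps the sketch simple. A real player would also limit how far ahead of the playhead it buffers.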

What we have described here is basically the main task of the Player Engine: fetching segments and appending them to a MediaSource. Determining which representation to use is the task of the ABR Manager, and I will leave it as an exercise for the reader to extend this example to append chunks of various resolutions. The example code is available in full below.

If you have any further questions or comments on this blog, drop a comment below or tweet me on Twitter (@JonasBirme).

Eyevinn Technology is the leading independent consultant firm specializing in video technology and media distribution, and proud organizer of the yearly Nordic conference Streaming Tech Sweden.

We are consultants sharing the passion for the technology for a media consumer of the future.