Silverlight Adaptive Streaming: How it works

Monday, November 3, 2008

Akamai recently announced that it is partnering with Microsoft to provide an Adaptive Streaming solution for Silverlight and IIS 7.0. Since I work in the online video industry, I found the announcement very interesting, especially considering Move Networks' previous announcement that it had formed a "Strategic Relationship" with Microsoft to provide this exact functionality.

I'm not sure whether this represents the fruits of that relationship, but I can say that the two technologies are extremely similar in how they provide an almost "instant-on", high-definition stream of video to the client. Which brings me to the "meat" of this article - how the heck do they do that?

Specifically, you can see an example of what I'm talking about at www.smoothHD.com. You'll need Silverlight of course, but once you have it installed you'll notice that the time it takes the video to begin playing is almost negligible and the quality is outstanding... unless you have a crappy connection, in which case you'll notice that it still starts up quickly but has so-so to poor quality. That is by design: Adaptive Streaming "adapts" to your bandwidth so that you get the best experience available to you. The thought is that it is better for a client with slower access to get "something" rather than "buffering".

Step By Step - What makes this thing tick?

Let me start with the caveat that this is a high level overview of the technology - it's not perfect, but it should give you the gist of it. Now that I've said that, let's get started.

It all starts with encoding. In order to provide users with a stream appropriate for their bandwidth, the video needs to be encoded at various bitrates, from low to very high quality. The more versions created, the better the stream will be able to adapt.

Besides just creating several versions, the video needs to be "cut up" into many pieces. Most likely, this is done by cutting at particular times or frames in the video instead of cutting at particular sizes. So for example, if my video was encoded at 3 different bitrates (by the way, 3 is far too few but makes my example easier) I might slice each of them at 5 second intervals (I would probably make the interval shorter than that but again, this is an example). This way, the first slice of the video encoded using bitrate A would be 2kb, the first slice of the one using bitrate B would be 5kb, and the first slice of bitrate C would be 8kb. They are all different sizes but contain the exact same portion of video as their peers.
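To make the slicing idea concrete, here is a minimal sketch in Python. It assumes time-based cuts as described above; the function names, the 12 second duration, and the bitrates (300, 700 and 1500 kbps) are made up for illustration and are not taken from the Silverlight or Akamai tooling. The sizes are rough estimates from bitrate times duration, not what a real encoder would produce.

```python
import math

def slice_points(duration_s, segment_s):
    """Return (start, end) times in seconds for each time-based slice."""
    count = math.ceil(duration_s / segment_s)
    return [(i * segment_s, min((i + 1) * segment_s, duration_s))
            for i in range(count)]

def approx_slice_bytes(bitrate_kbps, start_s, end_s):
    """Rough slice size: bitrate (kilobits/s) times duration, in bytes."""
    return int(bitrate_kbps * 1000 / 8 * (end_s - start_s))

# Hypothetical encodings of the same 12 second video.
bitrates_kbps = {"A": 300, "B": 700, "C": 1500}
slices = slice_points(duration_s=12, segment_s=5)

for name, kbps in bitrates_kbps.items():
    sizes = [approx_slice_bytes(kbps, start, end) for start, end in slices]
    # Every encoding has the same cut points, only the byte sizes differ.
    print(name, slices, sizes)
```

The point to notice is that all three encodings share the same cut points, so slice number N of any bitrate covers the same stretch of video and the player can swap between them at a slice boundary.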

This is where Akamai comes in (although you could likely use any modern CDN). You need to ensure that access to these video "pieces" is fast and reliable, so using a CDN like Akamai puts the files closer to the video consumer, making delivery problems less likely. Once the video is on the CDN, it's all up to the player.

The player bears the brunt of the work of making sure that the video stays smooth regardless of network conditions. The first thing the player does when it starts up is grab a manifest file containing information about all of those video pieces. It then makes a quick determination of the user's network bandwidth, grabs the first piece of video that is appropriate for the client's connection, and begins playing. As that first piece plays, the player looks closer at the bandwidth of the client and adjusts the video quality when it grabs the next piece of video. It continues to monitor the client's performance as it plays, adjusting which piece of video to grab based on the information it has most recently gathered.
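The selection logic described above might look something like the following Python sketch. This is only an illustration of the general idea, not the actual Silverlight player code: the function names, the 0.8 safety factor, and the rule of using only the most recent throughput measurement are all assumptions on my part.

```python
def measure_kbps(bytes_downloaded, seconds):
    """Throughput of one download, in kilobits per second."""
    return bytes_downloaded * 8 / 1000 / seconds

def pick_bitrate(available_kbps, measured_kbps, safety=0.8):
    """Highest bitrate that fits in a fraction of the measured bandwidth.

    Falls back to the lowest bitrate so the viewer always gets "something".
    """
    budget = measured_kbps * safety
    fitting = [b for b in available_kbps if b <= budget]
    return max(fitting) if fitting else min(available_kbps)

def plan_playback(available_kbps, throughput_samples_kbps, startup_kbps):
    """Choose a bitrate per piece from the most recent measurement.

    throughput_samples_kbps[i] is the throughput observed while
    downloading piece i; it only influences piece i + 1 onward.
    """
    choices = []
    estimate = startup_kbps  # the quick initial bandwidth guess
    for sample in throughput_samples_kbps:
        choices.append(pick_bitrate(available_kbps, estimate))
        estimate = sample
    return choices

# Hypothetical bitrates from the manifest and a connection that dips.
print(plan_playback([300, 700, 1500], [2000, 2000, 500, 900], startup_kbps=400))
# prints [300, 1500, 1500, 300]
```

In this run the player starts cautiously at the lowest bitrate, steps up once it sees a fast connection, and drops back down one piece after the throughput falls. A real player would likely smooth its measurements and also watch its playback buffer, which this sketch leaves out.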

If you look at what it's doing here, you'll see that we aren't really talking about a "stream". Instead, the player is playing small video files in a seamless sequence so that there is no interruption to the client. Because the video files are very small, they load quickly and don't require waiting for "buffering". The video can begin playing as soon as the first "piece" is downloaded (or even before, if the codec is one that allows for partial file playback). This also allows a client to "jump" to another spot in the video without having to wait for the player to re-buffer, because it just has to grab the small video file at the point you want to jump to.
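Jumping to a spot in the video then comes down to simple arithmetic on the piece length. Here is a small Python sketch, assuming fixed-length pieces; the URL layout and the cdn.example.com host are invented for the example and do not reflect how Akamai or IIS actually name the files.

```python
def locate_piece(seek_s, segment_s):
    """Return (piece index, offset in seconds inside that piece)."""
    index = int(seek_s // segment_s)
    return index, seek_s - index * segment_s

def piece_url(base_url, bitrate_kbps, index):
    """Build a hypothetical URL for one piece at one bitrate."""
    return f"{base_url}/{bitrate_kbps}k/piece_{index:05d}.bin"

# Seek to 62 seconds in a video cut into 5 second pieces.
index, offset = locate_piece(seek_s=62, segment_s=5)
print(index, offset)  # prints 12 2
print(piece_url("http://cdn.example.com/video", 700, index))
# prints http://cdn.example.com/video/700k/piece_00012.bin
```

The player only needs to fetch that one small file and start 2 seconds into it, rather than downloading everything that came before the seek point.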

One cool thing to note here: if you think about how the Internet works, you'll realize that there are many more significant benefits to this than just smooth video for clients. As more and more clients watch a particular video, proxy caches all across the Internet will cache these small "pieces" of video (as they tend to do with smaller files that are requested frequently). In essence these proxies become an extension of your CDN, except that it's free, because the proxies will serve up these small cached files without ever talking to Akamai or whatever CDN you are working with to deliver your video.
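A toy Python model shows why this matters. It is a deliberately simplified sketch: the CachingProxy class is hypothetical, it caches everything forever, and it ignores the HTTP cache headers and eviction rules that real proxies follow.

```python
class CachingProxy:
    """Toy proxy that remembers every piece it has fetched once."""

    def __init__(self, origin_fetch):
        self.origin_fetch = origin_fetch  # callable standing in for the CDN
        self.cache = {}
        self.origin_requests = 0

    def get(self, url):
        if url not in self.cache:
            # Cache miss: this request actually reaches the CDN.
            self.origin_requests += 1
            self.cache[url] = self.origin_fetch(url)
        return self.cache[url]

# Three viewers behind the same proxy watch the same four pieces.
proxy = CachingProxy(origin_fetch=lambda url: b"video bytes for " + url.encode())
pieces = [f"/video/700k/piece_{i:05d}.bin" for i in range(4)]
for viewer in range(3):
    for url in pieces:
        proxy.get(url)

print(proxy.origin_requests)  # prints 4
```

Twelve piece requests were served, but only four of them reached the origin. Because each piece is an ordinary small file at a stable URL, the second and third viewers are served entirely from the proxy.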

Hopefully this has provided some insight for those of you who wondered about this technology - I am excited to see more of this coming soon.

If I wanted to use this technology inside my intranet environment, servicing outlying locations, what would I need to encode the video in this way?

Josh Lewis - Wednesday, December 10, 2008 6:57:48 PM

@Josh Lewis:

You're going to need Microsoft Expression Encoder 2 Service Pack 1. Unfortunately the beta for this won't be out until Q1 2009.

Freedom Dumlao - Wednesday, December 10, 2008 7:18:32 PM

As a follow-up question: I have downloaded Expression Encoder 2 SP1 and encoded a job using the default player. When it generated the .xap file, I renamed it to a .zip, extracted it, and noticed the adaptive streaming .dll in it. If this isn't technically using adaptive streaming, can I modify the template and remove the adaptive streaming .dll to cut down on the overall size of the .xap file? Thanks for your quick response.

Josh Lewis - Wednesday, December 10, 2008 7:25:36 PM
