This is part of a three-part series on demystifying transcoding. See part 1 here.
The Realities of Streaming in 2018
The true challenge of deciding which transcoding settings to use is understanding your target audience's streaming capabilities. In my time here at Kaltura I've encountered customers who wanted to appeal to viewers with the best possible conditions, and others who had to work within the limitations of their users. And though connectivity has improved over the last several years, streaming is still an untamed landscape filled with fast-flowing rivers and still ponds.
Media customers often want to appeal to those with fast connections and the latest televisions while still considering those with limited bandwidth or older devices. Since media companies tend to be less limited by budgetary restraints, they can serve up multiple variants/flavors spanning the whole range of bandwidths and devices.
Many customers, however, are not able to be so expansive in their flavor/variant offerings.
Enterprise and education customers, for example, both often struggle with the realities of limited bandwidth. The office Wi-Fi, and the sheer volume of devices accessing it from mobile users, guests, printers, and even the coffee machine, can make a corporate-wide video announcement stream like Max Headroom of the 1980s. (Points to those who get that reference!)
A university has the unique limitation of not only having thousands of users accessing its Wi-Fi at any given time, but the additional burden of an enormous volume of content being generated daily by both university staff and students. Each day, a typical university customer might be processing a hundred or more videos from various lecture captures and UGC (user-generated content). A media company, by contrast, might only have a handful of videos for the latest episodes or movies being released. This volume of content prevents universities from storing a full suite of flavors for every video, and the source files themselves (often lecture captures) are frequently of limited quality, so the upper bitrates and resolutions become unnecessary.
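To make the storage pressure concrete, here is a back-of-the-envelope sketch of one day's ingest. All the numbers (flavor bitrates, video counts, durations) are illustrative assumptions, not Kaltura defaults:

```python
# Rough storage estimate for a university ingesting many videos per day.
# All ladder bitrates and ingest figures below are illustrative assumptions.

def daily_storage_gb(videos_per_day, avg_minutes, flavor_kbps):
    """Total storage, in gigabytes, for one day's ingest across a flavor ladder."""
    total_bits = 0
    for kbps in flavor_kbps:
        # seconds of video * kilobits/sec * 1000 = bits for this flavor
        total_bits += videos_per_day * avg_minutes * 60 * kbps * 1000
    return total_bits / 8 / 1e9  # bits -> bytes -> gigabytes

full_ladder = [4000, 2500, 1500, 900, 600, 300]  # six flavors, media-style
trimmed     = [1500, 900, 600, 300]              # trimmed ladder for lecture capture

full = daily_storage_gb(100, 45, full_ladder)  # ~330 GB per day
trim = daily_storage_gb(100, 45, trimmed)      # ~111 GB per day
```

At a hundred 45-minute videos a day, dropping the upper flavors cuts daily storage by roughly two-thirds, which is exactly why trimmed ladders make sense when the sources are standard-quality lecture captures.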
But media companies are not out of the woods, either. The reality is bandwidth is not the same worldwide. It’s not even consistent within a single territory.
So even ginormous video content creators with seemingly limitless budgets, like Netflix, have to deal with the realities of bandwidth limitations. I've had customers who live in areas where ISPs cap bandwidth, or where the average bandwidth is lower than in other territories. I even ran into a cap on my own mobile devices recently: after upgrading to an iPhone 7+ and an iPhone 8, our outdated data plan got us throttled to 128 Kbps. Yes, 128 Kbps. You read that right. I've kept this limited plan in order to experiment with how well certain apps behave and how easy it is to still stream video, and my conclusion is…well…it's not easy at all. But I'm not alone in this capping, and many consumers out there have to deal with these limitations all the time. We still want to be able to serve them good video. And it is entirely possible to do so.
In the following excerpt from the Best Practices of Multi-Device Transcoding white paper, you'll read about where the world stands in bandwidth capabilities, and I'll detail the different streaming methodologies.
It’s Still About Balance
The challenge of streaming video is still to find the right balance between bitrate and resolution as it relates to an end user's connection speed and system capability. Though connectivity is improving globally, there are still a variety of factors that can affect a video's ability to play back smoothly, including the specific network a user is on, the device they use, and how they are using that device.
Though bandwidth is faster than ever, it is not an even landscape worldwide. According to Akamai's 2017 report on connection speeds, South Korea currently has the fastest average connection speed at 28.6 Mbps; the United States, by comparison, has just made its way into the top ten fastest nations at 18.7 Mbps. That is nearly a 10 Mbps spread!
The global average per Akamai’s report is 7.2 Mbps.
This means that the transcoding solutions for South Korea would be different than those for, say, Brazil, which has an average of 6.8 Mbps. This impacts not only bitrates but also the resolutions that are possible to stream in those regions. Higher resolutions demand higher bitrates to maintain good image quality, so while South Korea might be able to stream 4K, Brazil might only be able to handle 720p.
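A quick way to reason about this is to map an average connection speed to the highest resolution it can sustain. The rule-of-thumb bitrates and the 70% headroom factor below are my own assumptions for illustration, not a standard:

```python
# Rule-of-thumb minimum bitrates per resolution (illustrative assumptions).
RESOLUTION_KBPS = [
    ("2160p", 15000),  # 4K
    ("1080p", 5000),
    ("720p",  2500),
    ("480p",  1200),
    ("240p",  400),
]

def top_resolution(avg_mbps, headroom=0.7):
    """Highest resolution whose bitrate fits within a share of average bandwidth.

    Leaves ~30% headroom for overhead and sharing, an assumed safety margin.
    """
    budget_kbps = avg_mbps * 1000 * headroom
    for name, kbps in RESOLUTION_KBPS:
        if kbps <= budget_kbps:
            return name
    return "audio-only"

top_resolution(28.6)  # South Korea's average -> "2160p"
top_resolution(6.8)   # Brazil's average -> "720p"
```

Plugging in Akamai's averages reproduces the article's point: the South Korean average supports a 4K top rung, while the Brazilian average tops out around 720p.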
Adding to the challenge, some ISPs, and even some countries, can cap or limit bandwidth on top of the region's overall limit. Someone with a good 20 Mbps connection plan might, at certain points in the billing cycle, get capped to something much lower, say 10 Mbps or even as low as 1 Mbps. This is especially true of mobile consumers on smartphones.
And connection speed is just one variable that might impact a user's streaming experience.
The average user will connect to the internet over their home Wi-Fi or office network. A home network is shared by everyone in the household, and the number of devices connecting to it affects the overall bandwidth available.
For example, on this white paper writer's home Wi-Fi network, I currently get a solid 50 Mbps download speed with only the main living-room HD television streaming Netflix and one computer browsing the internet. However, if a second TV starts streaming, say on my home office Roku where I frequently stream Food Network, my download speed plummets to 36.7 Mbps. If my wife then starts streaming on her iPhone 7, our download speed bottoms out at 13.6 Mbps. These are all still good connection speeds, but it illustrates how a couple of devices can change the overall conditions on a home Wi-Fi network, even a fast one like mine. Interesting to note: if the computer that was already connected also starts playing HD content, the network plummets to 9.57 Mbps.
Now imagine that same scenario in a household that only has an overall bandwidth of 7.2 Mbps.
This problem of network sharing is even more challenging for corporations and universities, where a far larger number of people are trying to access the network, not just the wife and kids. This means that the solution for a premium media company will be vastly different than the solution for a university streaming class lectures or a corporation that streams internal training videos over the building's broadband. In a shared setting like an office or college campus, not only might a large number of users be on the Wi-Fi at the same time, but the building's physical layout could mean one person sits close to the access point while another is in a corner of the office that barely gets a good signal. Yet both must stream the same video.
A viewer may be accessing the content on a variety of devices—each device has its own advantages and disadvantages and potential ways by which the device may be limited. We will get into specific device settings later in this document.
The device’s age can play greatly into its ability to handle streaming content. An older device may not have the hardware, memory, or OS to handle HD content—a user with an older device might rely on lower bitrates and resolutions being available.
The browser they are using may not be compatible with the streaming options available for a given video, though this is becoming less of a problem with the adoption of HTML5 and MPEG-DASH delivery. The browser might also simply need an update, or an update may have limited certain playback scenarios; for example, many browsers have now limited or turned off Adobe Flash delivery as it is phased out due to its security risks.
Another factor that might limit a device's ability to stream video is how many other programs or apps are open on the device at the same time. Even having multiple tabs open in a browser can greatly tax a system's CPU. A Facebook or YouTube page sitting in a background tab is still draining resources; the more images and video on a page, the heavier the overall drain.
This problem is easy to solve as the user just needs to close the applications or browser windows they are not using.
There are two primary streaming methodologies: progressive download and adaptive streaming.
As the name might suggest, progressive download is when the video being viewed has to be downloaded to the user's computer prior to playback. Though not as common today as it was at the beginning of streaming video (say, 10 years ago), it is still useful when a good internet connection is not available. In the original days of streaming, when a user played a video on a website, the player first had to download a certain percentage of the file before playback began. If the file was large or high resolution, the player might buffer for a while, waiting for the next portion of the video to download. Typically, a user might select a video, pause it, and walk away to let the whole video download so that playback would be smooth. The limitations of this method are, of course, flexibility and consistent playback, as well as the ability to quickly seek/scrub.
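The wait-before-play behavior is simple arithmetic: time-to-start is the buffered portion of the file divided by the link speed. A minimal sketch, with the file size and pre-buffer fraction as assumed example values:

```python
# Progressive download time-to-start: seconds until a given fraction of the
# file has downloaded at a given link speed. Purely illustrative math.

def seconds_to_buffer(file_mb, link_mbps, buffer_fraction=0.1):
    """Seconds to download buffer_fraction of the file before playback starts."""
    bits_needed = file_mb * 8 * 1e6 * buffer_fraction  # megabytes -> bits
    return bits_needed / (link_mbps * 1e6)             # divide by bits/sec

seconds_to_buffer(300, 4, 0.1)  # 300 MB file, 4 Mbps link, 10% pre-buffer -> 60.0 s
```

A full minute of spinner before a 300 MB file even starts on a 4 Mbps link shows why viewers learned to pause and walk away, and why adaptive streaming took over.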
Both the antiquated Flash Player and the now widely adopted HTML5 video/MPEG-DASH standards utilize adaptive streaming, whereby a player or application monitors the user's network traffic and bandwidth, determines what device the user is on, and serves up the video and audio file that best matches those changing conditions. As conditions fluctuate (say, on a mobile device over Wi-Fi), the player/application requests a matching file, or VARIANT. This allows a user to stream video under a variety of network limitations without creating a huge load on the end user's device.
An Adaptive Set is a package of transcodes of the same video that span multiple bitrates and are meant to find a balance between connection speed and resolution. Every video you watch on the internet was most likely transcoded into 6-8 different MP4 files of various resolutions and bitrates.
In order for adaptive streaming to work, all the streams in an Adaptive Set must be in alignment. Each variant should have the same GOP/keyframe interval, audio sample rate, and, ideally, video frame rate; only resolutions, bitrates, and profile levels should change.
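That alignment requirement is easy to verify mechanically. A minimal sketch, where the field names and the example ladder values are illustrative assumptions rather than a Kaltura schema:

```python
# Sanity-check that every variant in an adaptive set is aligned:
# same GOP/keyframe interval, audio sample rate, and frame rate.
# Field names and ladder values are illustrative, not a real schema.

def is_aligned(variants):
    """True if all variants share GOP, audio sample rate, and frame rate."""
    keys = ("gop_seconds", "audio_sample_rate", "fps")
    first = variants[0]
    return all(all(v[k] == first[k] for k in keys) for v in variants)

ladder = [
    {"name": "1080p", "kbps": 5000, "gop_seconds": 2, "audio_sample_rate": 44100, "fps": 30},
    {"name": "720p",  "kbps": 2500, "gop_seconds": 2, "audio_sample_rate": 44100, "fps": 30},
    {"name": "360p",  "kbps": 900,  "gop_seconds": 2, "audio_sample_rate": 44100, "fps": 30},
]

is_aligned(ladder)  # True: only resolution and bitrate differ
```

If one variant had, say, a different frame rate, the check would fail, which in a real player would surface as glitches or audio drift at every quality switch.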
Each video is transcoded into a variety of variants (at Kaltura, we call them Flavors). Each variant represents a bandwidth level and resolution that incrementally scales down from a 1080 or 4K top level down to a low resolution 240p at 100 Kbps level or even lower. Apple iOS spec, for example, recommends including an audio only stream for the lowest level in order to preserve the stream and keep the player from crashing, requiring a player restart by the user.
What adaptive streaming does is switch which variant you might be seeing at any given time to one that fits your bandwidth and your device's capabilities. So, if you're on a mobile device, you might see your video go from high quality down to a lower quality as your mobile or Wi-Fi signal fluctuates.
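The switching logic can be sketched as a per-segment choice: pick the highest-bitrate variant that fits under the currently measured bandwidth. Real HLS/DASH players also weigh buffer level and switch history; this toy version, with an assumed ladder and an assumed 80% safety factor, uses bandwidth alone:

```python
# Toy adaptive-switching loop: for each segment, choose the highest-bitrate
# variant that fits under the measured bandwidth. Ladder values and the
# safety factor are assumptions; real players also consider buffer depth.

VARIANTS_KBPS = [4000, 2500, 1500, 800, 400]  # descending bitrate ladder

def pick_variant(measured_kbps, safety=0.8):
    """Highest variant bitrate fitting within 80% of measured bandwidth."""
    for kbps in VARIANTS_KBPS:
        if kbps <= measured_kbps * safety:
            return kbps
    return VARIANTS_KBPS[-1]  # fall back to the lowest flavor

# Fluctuating Wi-Fi measurements, one per segment:
samples = [6000, 3500, 1200, 900, 5000]
[pick_variant(s) for s in samples]  # -> [4000, 2500, 800, 400, 4000]
```

The dips and recoveries in the output are exactly the quality shifts you notice on a phone as the signal fluctuates.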
When most people refer to a video or audio file, what they are really referencing is what is called a container format. A container holds the metadata, video, and audio streams, along with subtitles or timecode when needed, and it can often contain a variety of video and audio formats. For example, a QuickTime container (.mov) could hold one video track using H.264 and one audio track using AAC, or it could hold one video track using ProRes (an Apple codec) and one audio track using Apple Lossless (the ALAC codec).
It is important to differentiate between the .MP4 container and the MPEG-4 codec. Many confuse the two terms and think that .MP4 is a codec. It is not. The actual MPEG-4 video codec (as discussed in the Codec section) is H.264, also known as AVC (Advanced Video Coding).
Streaming video today deals with two primary container formats, MP4 and WebM.
The MP4 container is for H.264/AVC-encoded content, often with AAC audio. WebM is a Google format that uses the VP9 video codec and the Vorbis/Opus audio codecs; it is the primary format on YouTube, given that Google/Alphabet is YouTube's parent company. Both formats deliver good compression, adaptive switching, and solid visual quality.
Source files, however, can arrive in a wide variety of container formats, and containers can hold multiple video and audio streams, so it's important to understand your source format/layout prior to building your transcoding workflow. Studio-level source files often use .mxf or .mov containers with a ProRes or DNxHD video codec and multiple audio streams. A typical studio-level layout might be a single video track with a stereo mix, while more complex masters can carry many discrete mono audio channels.
In the next and final blog post we’ll dive into specific devices and how to best approach them for success. Please practice safe streaming and remember to always check your blind spots, I mean, bandwidths.