Hybrid AI Video Codecs and Modern Streaming Techniques
Introduction
AI and Hybrid Codecs
Modern video codecs are all based on similar principles. Recently, they have been complemented by AI techniques, such as super-resolution, to form hybrid codecs. The field is currently in transition, with the eventual goal of codecs based only on AI.
Codecs using only AI are several years away. For the foreseeable future, we will use conventional codecs supplemented with AI.
This insights article will start by looking at conventional codecs and gradually build up a hybrid one.
H.264
H.264 Overview
The easiest way to understand modern codecs is to look at H.264, the most common codec in use today and the one on which most other codecs are based:
https://www.youtube.com/watch?v=ZXXDXZfEcAQ&t=17s
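One principle H.264 shares with its successors is motion-compensated prediction: each block of the current frame is predicted from a displaced block in an earlier frame, so the encoder only needs to send a motion vector plus whatever residual remains. The sketch below illustrates that idea under simplifying assumptions: integer-pixel full search with a sum-of-absolute-differences (SAD) cost, frames as plain lists of rows, and hypothetical function names. Real H.264 encoders use variable block sizes, quarter-pixel precision and far faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in the reference frame ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n, search):
    """Integer-pixel full search over displacements in [-search, search]."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Hypothetical 8x8 reference frame with a simple synthetic texture.
ref = [[(7 * x + 13 * y) % 32 for x in range(8)] for y in range(8)]
# Current frame: the same content moved one pixel to the right.
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]

mv, cost = best_motion_vector(cur, ref, bx=2, by=2, n=4, search=2)
print(mv, cost)  # (-1, 0) 0
```

Here the residual for the block is all zeros, so only the motion vector (-1, 0) would need to be coded for it.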
EVC Baseline
EVC Baseline Explained
Conventional codecs have moved on. For our purposes, we will start with the Baseline profile of MPEG-5 Essential Video Coding (EVC), a simple but effective codec:
https://thebroadcastknowledge.com/2021/02/18/video-mpeg-5-essential-video-coding-evc-standard/
It can immediately be improved by AI:
https://www.mdpi.com/1424-8220/24/4/1336/pdf?version=1708338117
ScaleNet
ScaleNet (Samsung)
The first use of AI is a system proposed by Samsung called ScaleNet.
Please view the video link at the end and read the paper below that link.
Note that EVC Baseline has about the same performance as HEVC but is royalty-free.
Invertible Image Rescaling
Invertible Rescaling (State of the Art)
ScaleNet uses convolutional neural networks (variations of TAD-TAU, task-aware downscaling and upscaling).
But things move on, and a newer downscaling and restoration method, Invertible Image Rescaling, is now the state of the art:
https://arxiv.org/abs/2210.04188
This is particularly useful for converting colour to greyscale while encoding the colour information within the greyscale image. To understand how colour is captured in the first place, we need the concept of the Bayer filter.
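Invertible rescaling methods such as the Invertible Rescaling Network (IRN) begin their downscaling with a Haar wavelet transform, which splits an image into a half-resolution approximation plus three detail bands without losing any information; learned invertible blocks then reshape the detail so that, at reconstruction time, it can be drawn from a simple latent distribution instead of being transmitted. The sketch below shows only an averaging variant of the 2D Haar step, in dependency-free Python (an assumption for illustration), to make concrete why this kind of downscaling is exactly reversible. It is not the IRN model itself, and the function names are hypothetical.

```python
def haar_forward(img):
    """Averaging 2D Haar transform of an image with even height and width.

    Returns four half-resolution bands: ll (a downscaled image) and three
    detail bands hd, vd, dd that hold what the averaging removed.
    """
    ll, hd, vd, dd = [], [], [], []
    for y in range(0, len(img), 2):
        rows = ([], [], [], [])
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]          # top-left, top-right
            c, d = img[y + 1][x], img[y + 1][x + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 4)
            rows[1].append((a - b + c - d) / 4)
            rows[2].append((a + b - c - d) / 4)
            rows[3].append((a - b - c + d) / 4)
        for band, row in zip((ll, hd, vd, dd), rows):
            band.append(row)
    return ll, hd, vd, dd


def haar_inverse(ll, hd, vd, dd):
    """Exactly undo haar_forward."""
    img = [[0.0] * (2 * len(ll[0])) for _ in range(2 * len(ll))]
    for y, row in enumerate(ll):
        for x, s in enumerate(row):
            h, v, d = hd[y][x], vd[y][x], dd[y][x]
            img[2 * y][2 * x] = s + h + v + d
            img[2 * y][2 * x + 1] = s - h + v - d
            img[2 * y + 1][2 * x] = s + h - v - d
            img[2 * y + 1][2 * x + 1] = s - h - v + d
    return img


image = [[10, 12, 50, 52], [11, 13, 51, 53], [90, 90, 20, 22], [92, 94, 21, 23]]
bands = haar_forward(image)
print(bands[0])                       # [[11.5, 51.5], [91.5, 21.5]]
print(haar_inverse(*bands) == image)  # True
```

With a plain Haar transform, discarding the three detail bands and inverting would just give a blocky upscale of the ll image; the learned part of invertible rescaling exists to make the discarded detail recoverable from a simple distribution.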
The Bayer Filter
Bayer Filter Basics
For concreteness, I will assume an 8K television system.
To understand how AI can help, we will start at the camera. It may be thought that each pixel contains a red, a green and a blue sensor.
That assumption is incorrect. Instead, each photosite sits under a single colour filter, arranged in a pattern called the Bayer filter, in which half the photosites are green and a quarter each are red and blue:
https://en.wikipedia.org/wiki/Bayer_filter
This means that an 8K camera (7680 × 4320 photosites) produces four 4K (3840 × 2160) streams: one red, two green and one blue. As we will see, this can be used to convert the output to a single 4K greyscale stream.
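The split into four quarter-size streams can be sketched directly. The example assumes an RGGB layout (sensors also use BGGR, GRBG or GBRG) and images stored as lists of rows; each plane has half the width and half the height of the mosaic, so a 7680 × 4320 sensor yields four 3840 × 2160 planes. For contrast, the hypothetical `naive_luma` collapses each 2×2 quad into one greyscale sample using the BT.2020 luma weights. That fixed conversion is only an illustrative baseline under a rough assumption (raw sensor values treated as BT.2020 RGB), and unlike invertible rescaling it throws the colour information away.

```python
# BT.2020 luma weights (assumption: treating raw sensor values as if they
# were BT.2020 R, G, B is only a rough approximation).
KR, KG, KB = 0.2627, 0.6780, 0.0593


def split_rggb(mosaic):
    """Split an RGGB Bayer mosaic (list of rows) into R, G1, G2 and B planes,
    each half the width and half the height of the mosaic."""
    r = [row[0::2] for row in mosaic[0::2]]   # even rows, even columns
    g1 = [row[1::2] for row in mosaic[0::2]]  # even rows, odd columns
    g2 = [row[0::2] for row in mosaic[1::2]]  # odd rows, even columns
    b = [row[1::2] for row in mosaic[1::2]]   # odd rows, odd columns
    return r, g1, g2, b


def naive_luma(r, g1, g2, b):
    """Fixed (non-learned, non-invertible) greyscale: one sample per 2x2 quad."""
    return [
        [KR * rv + KG * (g1v + g2v) / 2 + KB * bv
         for rv, g1v, g2v, bv in zip(*rows)]
        for rows in zip(r, g1, g2, b)
    ]


mosaic = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
r, g1, g2, b = split_rggb(mosaic)
print(r)                        # [[1, 3], [9, 11]]
print((7680 // 2, 4320 // 2))   # (3840, 2160): plane size for an 8K sensor
```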
Converting Bayer Filter Output To Greyscale
Greyscale Conversion Process
Invertible image rescaling can convert the four streams into a single 4K greyscale stream with nearly imperceptible degradation (a PSNR of about 40 dB). This stream could then be encoded with EVC.
As detailed in the Invertible Image Rescaling paper, this can be combined with a convolutional neural network (e.g. ScaleNet) for the best image reconstruction before converting back to colour, creating a hybrid AI-based system.
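The "about 40 dB" figure is a peak signal-to-noise ratio, PSNR = 10 · log10(peak² / MSE), where MSE is the mean squared error between the original and reconstructed images. The helper names below are illustrative. Inverting the formula shows what 40 dB means for 8-bit video (peak 255): an MSE of 6.5025, i.e. a root-mean-square error of 2.55 grey levels out of 255, which helps explain why the degradation is hard to see.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized images (lists of rows)."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, peak=255.0):
    """PSNR in dB. peak is 255 for 8-bit video and 1023 for 10-bit video."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def mse_at(db, peak=255.0):
    """Invert the PSNR formula: the MSE that gives a PSNR of db decibels."""
    return peak ** 2 / 10 ** (db / 10)


print(mse_at(40))             # ~6.5025
print(math.sqrt(mse_at(40)))  # ~2.55 grey levels (RMS error)
```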
Shot-Based Encoding
Shot-Based Encoding (Netflix)
Netflix has developed a major advance in encoding, called shot-based encoding, that makes efficient use of the available internet speed.
Using that alone, Netflix has reduced 4K streams to as low as 2 Mbps. It is very effective: I rarely have problems with 4K content and internet speed on Netflix, whereas other video services such as Prime have problems, even dropouts. Netflix handles this with sophisticated buffering algorithms.
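One common way to describe shot-based encoding is: split a title at shot boundaries, encode each shot at several candidate settings, then pick one setting per shot so the whole title gets the best quality for a target size. The sketch below illustrates only that selection step, using a standard Lagrangian trade-off: for a multiplier `lam`, each shot independently picks the encode maximising quality minus `lam` times size, and `lam` is bisected until the total fits the budget. The `Encode` records, their sizes and the quality scale are hypothetical, and this is a simplified illustration, not Netflix's production algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encode:
    label: str       # hypothetical encoder setting, e.g. a resolution rung
    kbits: float     # size of this shot at that setting (hypothetical)
    quality: float   # quality score on a hypothetical 0-100 scale


def pick_per_shot(shots, lam):
    """For each shot, choose the candidate maximising quality - lam * kbits."""
    return [max(cands, key=lambda e: e.quality - lam * e.kbits) for cands in shots]


def total_kbits(choice):
    return sum(e.kbits for e in choice)


def allocate(shots, budget_kbits, lam_hi=10.0, iters=60):
    """Bisect on lam to find the smallest size penalty whose choices fit."""
    best = pick_per_shot(shots, lam_hi)
    if total_kbits(best) > budget_kbits:
        # Even a heavy penalty does not fit: fall back to the smallest encodes.
        return [min(cands, key=lambda e: e.kbits) for cands in shots]
    lam_lo = 0.0
    for _ in range(iters):
        lam = (lam_lo + lam_hi) / 2
        choice = pick_per_shot(shots, lam)
        if total_kbits(choice) <= budget_kbits:
            lam_hi, best = lam, choice  # fits: try a smaller penalty
        else:
            lam_lo = lam                # too big: penalise size more
    return best


# Two hypothetical shots: a static dialogue scene and a fast action scene.
shots = [
    [Encode("720p", 500, 90), Encode("1080p", 900, 93), Encode("4K", 2000, 94)],
    [Encode("720p", 800, 70), Encode("1080p", 1500, 85), Encode("4K", 3000, 92)],
]
print([e.label for e in allocate(shots, 4000)])  # ['1080p', '4K']
```

With a 4000 kbit budget the static shot gets 1080p and the action shot gets 4K, because extra bits buy more quality in the action shot; encoding both at 4K would need 5000 kbit. A Lagrangian search like this only reaches combinations on the convex hull of the trade-off, so it can miss the best combination when the budget falls between hull points.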
Conclusion
Future Outlook
The above presents some state-of-the-art AI enhancements in streaming video.
Changes are coming thick and fast. Companies like Netflix will incorporate them into their streaming services. An overview of further possibilities can be found here:
https://arxiv.org/abs/2101.06341
My favourite interest is exactly how we can view the world so that what science tells us becomes intuitive.



