
Hybrid AI Video Codecs and Modern Streaming Techniques


Introduction

AI and Hybrid Codecs

Modern video codecs are all based on similar principles. Recently, these have been complemented by AI techniques, such as super-resolution, to form hybrid codecs. The field is in transition, with the eventual aim of codecs based entirely on AI:

https://deeprender.ai/

Codecs using only AI are several years away. For the foreseeable future, we will use conventional codecs supplemented with AI.

This article starts with conventional codecs and gradually builds up to a hybrid one.

H.264

H.264 Overview

The easiest way to understand modern codecs is to look at H.264, the most common codec in use today and the one on which most other codecs are based:

https://www.youtube.com/watch?v=ZXXDXZfEcAQ&t=17s

EVC Baseline

EVC Baseline Explained

Conventional codecs have moved on since H.264. For our purposes, we will start with the EVC Baseline profile, a simple but effective codec:

https://thebroadcastknowledge.com/2021/02/18/video-mpeg-5-essential-video-coding-evc-standard/

It can be improved by AI straight away:

https://www.mdpi.com/1424-8220/24/4/1336/pdf?version=1708338117

ScaleNet

ScaleNet (Samsung)

The first use of AI is a system proposed by Samsung called ScaleNet:

https://research.samsung.com/blog/VIDEO-SCALENET-VSN-TOWARDS-THE-NEXT-GENERATION-VIDEO-STREAMING-SERVICE

Please watch the video linked at the end of that page and read the paper linked below it.

Note that EVC Baseline is designed to be royalty-free. Its reported compression efficiency is well ahead of H.264 and approaches that of HEVC.

Invertible Image Rescaling

Invertible Rescaling (State of the Art)

ScaleNet uses convolutional neural networks (variations of TAD-TAU, the task-aware downscaling and upscaling networks):

https://openaccess.thecvf.com/content_ECCV_2018/papers/Heewon_Kim_Task-Aware_Image_Downscaling_ECCV_2018_paper.pdf

But things move on, and a newer downscaling and restoration method called Invertible Image Rescaling is now the state of the art:

https://arxiv.org/abs/2210.04188

This is particularly useful for converting colour to greyscale while embedding the colour information so that it can be restored later. To understand how colour is captured in the first place, the concept of the Bayer filter is needed.
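To give a feel for what "invertible" means in this context, here is a minimal sketch in plain Python of one 2x2 Haar-style split. It halves the resolution while keeping three detail bands, so the original can be rebuilt exactly. My understanding is that invertible rescaling networks pair a fixed split of this kind with learned invertible layers. The sketch covers only the fixed split and is not the paper's implementation. The function names haar_down and haar_up are my own.

```python
def haar_down(img):
    """One 2x2 Haar-style step on an image with even dimensions.

    Returns a half-size average image plus three half-size detail bands.
    """
    avg, horiz, vert, diag = [], [], [], []
    for y in range(0, len(img), 2):
        row_avg, row_h, row_v, row_d = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            row_avg.append((a + b + c + d) / 4)  # downscaled pixel
            row_h.append((a - b + c - d) / 4)    # left/right detail
            row_v.append((a + b - c - d) / 4)    # top/bottom detail
            row_d.append((a - b - c + d) / 4)    # diagonal detail
        avg.append(row_avg)
        horiz.append(row_h)
        vert.append(row_v)
        diag.append(row_d)
    return avg, horiz, vert, diag


def haar_up(avg, horiz, vert, diag):
    """Exact inverse of haar_down: rebuild the full-size image."""
    img = []
    for r in range(len(avg)):
        top, bottom = [], []
        for s, h, v, g in zip(avg[r], horiz[r], vert[r], diag[r]):
            top += [s + h + v + g, s - h + v - g]
            bottom += [s + h - v - g, s - h - v + g]
        img += [top, bottom]
    return img


image = [
    [10, 12, 200, 202],
    [14, 16, 204, 206],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
bands = haar_down(image)
print(bands[0])                  # the half-size image
print(haar_up(*bands) == image)  # round trip recovers the original
```

The half-size image is what a conventional codec would carry. The interesting part of the learned methods is what they do with the three detail bands so that they need not be transmitted in full.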

The Bayer Filter

Bayer Filter Basics

To be concrete, I will assume an 8K television system.

To understand how AI can help, we will start at the camera. It might be thought that each pixel of the sensor records red, green and blue values.

But that assumption is incorrect. Each photosite sits behind a single colour filter, and the filters are arranged in a pattern called the Bayer filter:

https://en.wikipedia.org/wiki/Bayer_filter

This means that the raw output of an 8K camera can be treated as four 4K streams: one red, two green and one blue. As we will see, these can be converted to a single 4K greyscale stream.
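The split into four streams can be sketched in a few lines of plain Python. This is an illustration only: it assumes an RGGB layout (other layouts exist) and a raw frame stored as a list of rows, and the function name split_bayer_rggb is my own rather than part of any camera API.

```python
def split_bayer_rggb(raw):
    """Split a raw Bayer mosaic (list of rows) into four half-resolution planes.

    Assumes an RGGB layout: even rows alternate R, G and odd rows alternate G, B.
    """
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("mosaic dimensions must be even")
    red = [row[0::2] for row in raw[0::2]]
    green1 = [row[1::2] for row in raw[0::2]]
    green2 = [row[0::2] for row in raw[1::2]]
    blue = [row[1::2] for row in raw[1::2]]
    return red, green1, green2, blue


# Tiny 4x4 stand-in for a 7680x4320 sensor readout.
raw = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
red, green1, green2, blue = split_bayer_rggb(raw)
print(red)   # [[10, 11], [12, 13]]
print(blue)  # [[40, 41], [42, 43]]
```

Applied to a 7680x4320 mosaic, the same slicing yields four 3840x2160 planes, which is where the four 4K streams come from.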

Converting Bayer Filter Output To Greyscale

Greyscale Conversion Process

Invertible image rescaling can convert the four streams into a single 4K greyscale stream from which colour can be restored with almost imperceptible degradation (a PSNR of about 40 dB). This greyscale stream could then be encoded as EVC video.

As detailed in the Invertible Image Rescaling paper, this can be combined with a convolutional neural network (e.g. ScaleNet) to get the best image reconstruction before inverting back to colour, creating a hybrid AI-based system.
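A figure of about 40 dB is most naturally read as a peak signal-to-noise ratio (PSNR), where higher means closer to the original. As a rough guide to what that number means, here is a minimal sketch of the standard PSNR formula for 8-bit images in plain Python. The sample pixel values are made up for illustration.

```python
import math


def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-sized images."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(original, restored)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 120], [140, 160]]
restored = [[102, 117], [141, 163]]  # each pixel is off by 1 to 3 levels
print(round(psnr(original, restored), 1))  # about 40.5
```

At 40 dB the mean squared error is about 6.5, which corresponds to a typical error of roughly 2.5 grey levels out of 255.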

Shot Based Encoding

Shot-Based Encoding (Netflix)

Netflix has developed a major advance in encoding, called shot-based encoding, that makes efficient use of the available internet bandwidth:

https://hackaday.com/2020/09/16/decoding-the-netflix-announcement-explaining-optimized-shot-based-encoding-for-4k/

Using that alone, Netflix has reduced 4K streams to as low as about 2 Mbps. It is very effective. I rarely have problems with 4K content on Netflix at my internet speed, whereas other video services such as Prime do have problems. Netflix even copes with internet dropouts, using sophisticated buffering algorithms.
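The core idea, that each shot gets its own encoding settings rather than one setting for the whole title, can be illustrated with a toy sketch in plain Python. This is not Netflix's algorithm. The per-frame "signature" (here a mean brightness value), the thresholds, the bitrates and the function names are all assumptions chosen for illustration.

```python
def split_into_shots(signatures, cut_threshold=30.0):
    """Group frame indices into shots; a large jump in signature starts a new shot."""
    if not signatures:
        return []
    shots = [[0]]
    for i in range(1, len(signatures)):
        if abs(signatures[i] - signatures[i - 1]) > cut_threshold:
            shots.append([])  # large jump: treat as a cut
        shots[-1].append(i)
    return shots


def pick_bitrate_kbps(shot, signatures, low=1000, high=6000, busy=2.0):
    """Toy rule: shots with more frame-to-frame change get more bits."""
    if len(shot) < 2:
        return low
    total = sum(abs(signatures[i] - signatures[i - 1]) for i in shot[1:])
    change = total / (len(shot) - 1)
    return high if change > busy else low


# Hypothetical mean brightness per frame: a calm shot, then a busier one.
mean_luma = [50, 50, 51, 50, 120, 125, 118, 124]
shots = split_into_shots(mean_luma)
bitrates = [pick_bitrate_kbps(shot, mean_luma) for shot in shots]
print(shots)     # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(bitrates)  # [1000, 6000]
```

A real system would choose resolution and bitrate per shot against a perceptual quality measure rather than a brightness difference, but the saving comes from the same place: calm shots do not need the bits that busy shots do.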

Conclusion

Future Outlook

The above presents some state-of-the-art AI enhancements in streaming video.

Changes are coming thick and fast. Companies like Netflix will incorporate them into their streaming services. An overview of further possibilities can be found here:

https://arxiv.org/abs/2101.06341
