Artificial Intelligence in Video

  • Thread starter: bhobba
  • Tags: Neural
Behind the scenes, artificial intelligence usually makes use of what is known as a Neural Network:
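
To make the idea concrete, here is a minimal sketch of a forward pass through a tiny network. The layer sizes and weights are hand-picked assumptions purely for illustration; a real network learns its weights from data.

```python
import math

def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs):
    """Forward pass through a 2-input, 2-hidden-neuron, 1-output network."""
    # Hand-picked illustrative weights (not trained on anything).
    hidden = dense(inputs, weights=[[0.5, -0.5], [1.0, 1.0]], biases=[0.0, -1.0])
    output = dense(hidden, weights=[[1.0, -1.0]], biases=[0.0])
    return output[0]

print(forward([1.0, 0.0]))
```

Each neuron is just a weighted sum followed by a non-linear function; stacking layers of them is what gives the network its expressive power.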



In image applications, an implementation called a Convolutional Neural Network is often used:
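
The core operation of a CNN can be sketched in a few lines. The example below slides a small kernel over an image and takes a weighted sum at each position. The kernel here is written by hand as an assumption for illustration; in a CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution as used in CNNs (technically cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-written horizontal-difference kernel; a CNN would learn its kernels.
edge_kernel = [[-1, 1]]

feature_map = conv2d(image, edge_kernel)
print(feature_map)
```

The output is non-zero only where the brightness changes, which is why such feature maps are useful for picking out edges and textures.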



In particular, for image super-resolution a Generative Adversarial Network or GAN is often used:
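
As a rough sketch of the adversarial idea, the two networks in a GAN are trained with opposing losses. The discriminator outputs below are made-up numbers, and the generator loss shown is the common non-saturating form; practical super-resolution GANs usually combine it with a pixel or perceptual loss, which is omitted here.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: D wants real scores near 1 and fake scores near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D to score its fakes near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Made-up discriminator outputs (probability that an image is real).
d_on_real = [0.9, 0.8]
d_on_fake_bad = [0.1, 0.2]   # generator easily caught
d_on_fake_good = [0.6, 0.7]  # generator fooling the discriminator more often

print(discriminator_loss(d_on_real, d_on_fake_bad))
print(generator_loss(d_on_fake_bad), generator_loss(d_on_fake_good))
```

As the generator's images become more convincing, its loss falls and the discriminator's loss rises, which is the tug-of-war that drives both networks to improve.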



These form the basis of modern super-resolution:



For those who are interested in the details, see:
https://arxiv.org/abs/2204.13620

But things move on. Someone thought of using a CNN to down-scale the image first, then using super-resolution to recover the original image. One example is TAD-TAU:
https://openaccess.thecvf.com/conte...k-Aware_Image_Downscaling_ECCV_2018_paper.pdf
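
The down-scale-then-recover pipeline can be sketched with fixed, non-learned stand-ins. This is not the method from the paper: block averaging and pixel repetition below are assumptions standing in for the learned down-scaling and super-resolution networks, and the image is toy data with even dimensions.

```python
def downscale_2x(image):
    """Encoder stand-in: average each 2x2 block into one pixel."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]

def upscale_2x(image):
    """Decoder stand-in: repeat every pixel into a 2x2 block."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def mse(a, b):
    """Mean squared error between two equally sized images."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

original = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [20, 40, 80, 80],
    [20, 40, 80, 80],
]
small = downscale_2x(original)
restored = upscale_2x(small)
print(small)
print(mse(original, restored))
```

The reconstruction error is what a learned pair of networks would be trained to minimise: the down-scaler learns to keep the detail that the up-scaler is able to restore.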

This is an example of an important AI concept - the Autoencoder:



Again, things move on. The approach has since been improved: it is simpler, performs better, allows down-scaling and super-resolution by arbitrary factors, and can encode the colour information in the resulting black-and-white image:
https://arxiv.org/pdf/2201.12576

So far, super-resolution has been described as working from a single lower-resolution image, but it can also use a sequence of frames from a video:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4088133
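
Why several frames help can be seen in an idealised one-dimensional sketch. This is not the method from the linked paper: it assumes two frames whose sampling grids differ by exactly one fine pixel, with no blur or noise, whereas real video super-resolution has to estimate the motion between frames.

```python
def sample(signal, offset, step=2):
    """Simulate a low-resolution frame: keep every `step`-th sample from `offset`."""
    return signal[offset::step]

def shift_and_add(frame_even, frame_odd):
    """Interleave two frames whose sampling grids differ by one fine pixel."""
    fused = []
    for a, b in zip(frame_even, frame_odd):
        fused.extend([a, b])
    return fused

# Toy high-resolution scanline (hypothetical data).
scene = [3, 7, 1, 8, 2, 9, 4, 6]

frame_0 = sample(scene, offset=0)  # camera at its original position
frame_1 = sample(scene, offset=1)  # camera shifted by one fine pixel

print(frame_0, frame_1)
print(shift_and_add(frame_0, frame_1))
```

Each frame on its own has lost half the samples, but together they contain the whole scanline. Neighbouring video frames carry complementary detail in the same way.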

As mentioned, we need a way to quantify how close a super-resolution image is to the original as perceived by a human being, and SSIM was invented for this purpose. However, further work has been done, and a newer measure called VMAF, developed and used extensively by Netflix, has largely replaced it:
https://visionular.ai/vmaf-ssim-psnr-quality-metrics/
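
For a feel of what these metrics compute, here is a minimal sketch of PSNR and a simplified SSIM. The SSIM below is evaluated once over the whole image rather than averaged over local windows as in the usual definition, and the two test images are toy data. VMAF is not reproduced here, since it fuses several elementary metrics with a trained model.

```python
import math

def flatten(image):
    """Turn a 2D list of pixels into a flat list of floats."""
    return [float(p) for row in image for p in row]

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB (higher is closer to the reference)."""
    x, y = flatten(ref), flatten(test)
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def global_ssim(ref, test, peak=255.0):
    """SSIM computed once over the whole image instead of over local windows."""
    x, y = flatten(ref), flatten(test)
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # standard stabilising constants
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

reference = [[0, 255], [255, 0]]
degraded = [[10, 245], [245, 10]]  # same pattern with reduced contrast

print(psnr(reference, degraded))
print(global_ssim(reference, degraded))
```

PSNR only measures pixel-wise error, while SSIM compares mean, contrast, and structure, which tracks human perception more closely.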

Image super-resolution is one of many proposals for reducing the bit rate of high-resolution images. iSIZE (recently acquired by Sony) preprocesses an image so that it encodes more efficiently while still achieving a high VMAF score:
https://discovery.ucl.ac.uk/10152967/1/SMPTE_v9_RPS.pdf

It produces substantial reductions in the bit rate of 8K video:
https://8kassociation.com/industry-info/8k-news/pre-encoding-8k-with-isize-bitsave/
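
To put the bit rates in perspective, here is a back-of-envelope sketch of the uncompressed rate of 8K video and of how a saving is expressed as a percentage. The frame rate, bit depth, and chroma subsampling are assumptions, and the two encoded bit rates are hypothetical numbers, not results from the linked article.

```python
def raw_bit_rate(width, height, fps, bit_depth, samples_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * samples_per_pixel * bit_depth * fps

def percent_saving(before, after):
    """Bit-rate reduction of `after` relative to `before`, in percent."""
    return 100.0 * (before - after) / before

# 8K UHD is 7680x4320. The frame rate, bit depth, and 4:2:0 chroma
# subsampling (1.5 samples per pixel) are assumed for illustration.
raw = raw_bit_rate(7680, 4320, fps=60, bit_depth=10, samples_per_pixel=1.5)
print(raw / 1e9)  # gigabits per second

# Hypothetical encoder outputs in bits per second, not measured results.
saving = percent_saving(before=80e6, after=60e6)
print(saving)
```

Under these assumptions the raw signal runs to tens of gigabits per second, which is why every percentage point saved by the codec or by preprocessing matters at 8K.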

A lot of ideas and concepts have been introduced in this post. If the reader has not seen them before, then, as with anything new, it may take a while to get up to speed. However, they form the basis of my proposed method for an all-AI video codec.

My next post will be an overview of current video codecs, including EVC baseline, which forms the basis of the AI codec.

Thanks
Bill
 
very interesting
 