2020-09-14

DXVA AV1 decoding in VLC

As AV1, the royalty-free video codec, is getting some traction, hardware decoding is finally arriving. On PC the new NVIDIA RTX 30 GPUs and Intel Tiger Lake/Xe GPUs can support hardware decoding of 8-bit and 10-bit 4:2:0 sources (the most common formats for online videos). That's sufficient to decode 4K HDR content with moderate power usage.

dav1d, the AV1 software decoder, has been adopted by almost everyone. It's the fastest on all CPU platforms. So it was only natural to add hardware decoding in there to make it even faster when the hardware can help. With the help of some test code by one of the DXVA AV1 spec authors, and after some fixing, tweaking, testing and wrapping in VLC, I managed to get AV1 decoding on the NVIDIA and Intel (test) hardware on Windows 10.

Since the DXVA AV1 spec is still marked as subject to change and Microsoft didn't add the relevant structures in its latest Windows 10.0.19041.0 SDK, the code is still in alpha status and not merged in dav1d/VLC (and wine/mingw64 which are the toolchains VLC uses).

If you have any of this hardware and want to test AV1 decoding you can download this special VLC 4.0 build. Only 64-bit Windows 7/8/10 is supported. It is signed by VideoLAN to make sure it's legit.

You also get a glimpse of the new VLC 4.0 refreshed UI and medialibrary as a bonus.

2019-12-08

Happy Birthday Matroska

On Friday Matroska turned 17. There was a celebration at the No Time To Wait 4 conference in Budapest with some nice cake with the MKV initials. I wanted to make a small speech for the occasion but didn't find the opportunity (and then I had a train back to France to catch). So here are a few of the things I would have said, and more.

NTTW 4 MKV17 Cake


It was nice to be back at No Time To Wait. It's such a fantastic gathering of archivists and developers and the atmosphere is always great. It's nice to feel welcome although I am not an archivist and know little of all the things they have to go through in their job. Although every time after NTTW I feel like I know a bit more and thus can help more.

The work they do with Matroska (combined with FFV1) is exactly one of the main uses Matroska was designed for. Except neither I nor the other creators of Matroska ever dreamed it would be used professionally for prestigious archival like at the BFI (British Film Institute) and possibly at the Library of Congress in the USA. Of course there was already traction in the "corporate" world as MKV is used for all kinds of video sharing as files and is also the basis of WebM. It's officially supported in OSes like Windows 10 or Android. But the archival world is a different thing. It's not only about sharing the latest movie/TV show you got, it's about keeping important content for a long time. IMO it has a deeper impact in the long term and some historical, political and artistic value that can't be beaten. This is also what makes it so special to me as I always try to find an extra bit of "soul" in whatever I do. It's not just about writing some code or documentation. It's also a great motivation knowing it will benefit something bigger.

It's amazing what a hobby project started (forked) 17 years ago has become. In my mind (and probably that of all the others involved) we were driven by the same spirit that was on the Internet during that time: creating something great for free and possibly challenging the "corporate" world. One of the Us in my robUx4 nickname stands for Utopia. That was always part of the goal.

Just talking about some detail of Matroska and how we came to decide on it always reminds me of the amount of work we put into this, and all the challenges we faced (like trying to be as good as Ogg for streaming). It's fun having to go back to all these memories when we're doing the IETF specifications and realize the things we got right as complete n00bs and some we got wrong (non-rational nanosecond timestamps that we called timecodes, to show how n00b we were; the clock was even in floating point because in the analog world clocks aren't perfect). Matroska was designed to last 10 years at a time when there was a new video codec every 3 months. It was still hard to predict the full evolution of things (no VR for example). The challenges posed by long-term archival are also interesting. Here we have to support all kinds of sources (analog and digital) with very specific characteristics (and keep everything, as RAWcooked does thanks to attachments). There are still plenty of areas to improve (timecodes, Bayer support). After 17 years, Matroska still has room to grow.

I am very grateful to have met all kinds of new people who care deeply about the work we do on Matroska and to see again familiar faces who care at least as much. You have no idea how great it is to have all of your support and that you took a leap of faith choosing Matroska when we had just started making proper specifications. From NTTW1 it was not very clear whether people would actually use Matroska at all.

I would like to thank in particular the organizers of the event: Dave, Jerome, Ashley, Alessandra and Zsuzsa. It's a privilege to be welcome in your community.

👓👓

Budapest ShowYourStripes

2019-03-10

Windows Video Playback Performance

As a VLC developer I have spent a lot of time working on video decoding and displaying for Windows, especially with Direct3D11. VLC 3.0 is the result of that work and we keep on improving it.

Despite that, people are still complaining about the performance of VLC compared to other Windows players. I know we did a good job, so I wanted to know where we're at regarding the performance.

I tested the following software with various source files on Windows 64:
All players have been used with their default settings, fresh from installation.

My system is an i7-8700 and I used the integrated Intel 630 GPU driving a 2560x1440 display at 120Hz over DisplayPort. It has 12 logical threads at up to 4 GHz, so CPU decoding is supposedly OK.

Here are the results for the various test files:

Sony Camp HEVC HDR 4K 60 fps


This is the main sample I used when working on HDR. It has a high bitrate that I have a hard time playing over my NAS and even locally it can stutter in the hardware decoder (we only found a fix for that recently). Apart from 8K and AV1 that's pretty much the hardest thing to decode right now. Not only that but the HDR content needs to be handled properly. In my case the screen is not HDR so tone mapping has to be applied in the player (the HDR mode of Windows is not enabled).


Player | CPU % | Memory Usage | GPU 3D | GPU Decode | GPU Processor | Smooth Playback | Colours | Realtime
VLC | 2184040400 | yes | ok | yes
MPV | 1008356500 | stutters | dark | yes
MPC-BE | 21250603041 | no | too bright | no
MPC-HC | 11054606053 | yes | washed out | yes
Movies & TV | 165060400 | yes | saturated | yes
Kodi | 21045404055 | yes | washed out | yes
MPV Ctrl+H | 293280450 | yes | dark | yes

The first thing noticeable is that by default MPV doesn't use the GPU to decode this file. You have to manually tell it to do it. The last line adds the performance of MPV with hardware decoding (d3d11va) on.

The second thing is that apart from VLC, no player displays the HDR colours/luminance correctly (there is an SDR version of the same file for comparison, but I don't know how official it is; I also compared to what my HDR TV does). It's surprising from MPV as the tone mapping in VLC is inspired by their code.

The third thing is that MPC-BE cannot play this file in real time, even though the CPU and GPU are not maxed out. Maybe a buffering issue. The audio stops every few seconds and then playback resumes.

The stuttering in MPV means the 60fps of the source is not respected. The frames are either skipped or not displayed at the right time (something we fixed in VLC after some hard work).

DNCE H264 1080i 29.97fps

(found on https://kodi.wiki/view/Samples)

This sample is simpler to decode but the deinterlacing still needs to be done, either by the CPU or the GPU.

Player | CPU % | Memory Usage | GPU 3D | GPU Decode | GPU Processor | Smooth Playback | Deinterlaced
VLC | 2399301218 | yes | yes
MPV | 7133700 | yes | no
MPC-BE | 1306601230 | very | yes
MPC-HC | 1340401230 | yes | yes
Movies & TV | 216201127 | yes | yes
Kodi | 230773928 | yes | yes
MPV Ctrl+H / D | 1160141135 | yes | yes

The last column shouldn't be there, but by default MPV doesn't deinterlace the file. You have to press the d key to enable deinterlacing. The last line adds the performance of MPV with hardware decoding and deinterlacing on.

MPC-BE seems to double the original frame rate by default and interpolate between frames (soap opera effect). It may be good for sport but this is not a sport sample...

Movies & TV is impressive as it manages to display the content with 0% GPU 3D usage. It's likely because they do all the processing in the video processor and nothing during display. That's an area we could improve in VLC.

Big Buck Bunny H264 1080p 30fps


This is the most common kind of file people are playing (apart from 720p files).

Player | CPU % | Memory Usage | GPU 3D | GPU Decode | GPU Processor
VLC | 1418990
MPV | 3148800
MPC-BE | 0280301013
MPC-HC | 1265301012
Movies & TV | 195090
Kodi | 223570625
MPV Ctrl+H | 01048100

As expected the CPU usage is negligible. The DirectShow-based players seem to take a lot of GPU to display this simple file. And Kodi even more, even though it's using less GPU to decode. Not sure why they need some GPU processing here, maybe color conversion, which VLC does in the shader. That would explain the extra GPU processor usage for the 1080i sample as well.

Freedom '90 Music Video Outtakes VP9 1080p

(from YouTube)

If you watch a lot of YouTube there's a chance you might be decoding VP9 so I tested that as well. This is decoded by the GPU.

Player | CPU % | Memory Usage | GPU 3D | GPU Decode | GPU Processor | Picture Quality
VLC | 1196860 | normal
MPV | 1100600 | normal
MPC-BE | 021540610 | macroblocks
MPC-HC | 218330013 | macroblocks
Movies & TV | 076167 | normal
Kodi | 328070525 | macroblocks
MPV Ctrl+H | 166660 | normal

In this case MPC-HC, MPC-BE and Kodi show noticeable macroblocks that the other players don't have.

LG 4K Tech Demo HEVC 60 fps 


A more regular 4K file that has no HDR, so should have less to do in the GPU.

Player | CPU % | Memory Usage | GPU 3D | GPU Decode | GPU Processor | Smooth Playback | Realtime
VLC | 4121518650 | yes | yes
MPV | 346153000 | yes | yes
MPC-BE | 2840704060 | no | no
MPC-HC | 2765605065 | yes | yes
Movies & TV | 129315700 | yes | yes
Kodi | 3485604555 | yes | yes
MPV Ctrl+H | 346933000 | yes | yes

As with the HDR sample, MPC-BE can't play this file in realtime. The audio stops once in a while.

Despite the request to enable hardware decoding, MPV doesn't seem to be using it.

Movies & TV does an impressive job of using little memory.

Conclusion


VLC seems to be the overall best player with Movies & TV for all this content. The main drawback of VLC is currently the memory usage. It's possible to decrease it by using --avcodec-threads=1 but if you set this, you may have problems playing files your GPU can't decode.

We are working on this memory consumption which should be reduced in all cases for VLC 4.0.

2017-11-19

The Many Fathers of Matroska

I think I'm done giving talks about Matroska for this year. And one of the things that bothers me each time (to the point I might embarrass people unwillingly, sorry Kieran O'Leary) is that I take credit for all of Matroska although there were many people involved almost as much as me during its long birth. So I would like to set the record straight for posterity.
Also I say fathers because it was all men (or boys) involved. Only Liisachan on Doom9 was involved in creating the original logo.

Lasse Kärkkäinen (FI)


Lasse is the creator of MCF, the project that Matroska was forked from. Although forks are usually not a great idea, there were so many differences between his original format and how we turned it into what is now Matroska that we couldn't continue working on the same project. We agreed to disagree and went our separate ways. But there were no hard feelings, we met on a few occasions after that. He even asked me for a letter of recommendation for a job in Finland once.

Frank Klemm (DE)


One of the key differences between MCF and Matroska is the use of EBML. And one of the key features of EBML is the way header values are coded in a UTF-8-like manner. This was Frank's idea. It gave a great boost to the format and is why going back to MCF was not possible after that.
Frank was one of the developers of the MPC (Musepack) codec which combined lossless and lossy audio compression in the same format. People were so happy with his work that there was a crowdfunding (which didn't exist at the time) on Doom9 to buy him a new PC.
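
To give an idea of what "UTF-8-like" means here, this is a minimal Python sketch of how an EBML element size field can be coded: the number of leading zero bits in the first byte tells how many bytes follow, a single 1 bit marks the end of that prefix, and the remaining bits hold the value. The function names are made up for this illustration and are not taken from libebml.

```python
def encode_vint(value):
    """Encode a non-negative integer as an EBML variable-length size."""
    for length in range(1, 9):
        # A vint of `length` bytes carries 7 * length data bits. The all-ones
        # pattern is reserved for "unknown size", so it is excluded here.
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)  # the single 1 bit after the zero prefix
            return (marker | value).to_bytes(length, "big")
    raise ValueError("value too large for an 8-byte EBML vint")


def decode_vint(data):
    """Return (value, length in bytes) of the vint at the start of data."""
    first = data[0]
    if first == 0:
        raise ValueError("vints longer than 8 bytes are not handled")
    length = 9 - first.bit_length()  # leading zero bits + 1
    if len(data) < length:
        raise ValueError("truncated vint")
    raw = int.from_bytes(data[:length], "big")
    value = raw & ((1 << (7 * length)) - 1)  # clear the marker bit
    return value, length


if __name__ == "__main__":
    for n in (5, 127, 500, 100000):
        coded = encode_vint(n)
        print(n, coded.hex(), decode_vint(coded))
```

Small values cost a single byte, and the reader knows how many bytes to read after looking at the first one, which is the same trick UTF-8 uses.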

Christian HJ Wiesner (DE)


Christian is not a developer. He's not really a technical guy either. But he liked so much what we were doing that he was organizing everything around the project. He was also the first to join me when I created the fork on Sourceforge. He's also the one who organized the crowdfunding for Frank Klemm and delivered him his PC. He also held the matroska.org domain safe for a long time which he then donated to the Matroska non-profit.

John Cannon, Paul Bryson, Jory Stone (USA)


Apart from Frank and me, they were the main input to make changes to MCF that ended up as Matroska. IIRC John Cannon was the one to suggest that the Matryoshka name we were planning to use was too complicated for USAns and to reduce it to Matroska.

Alexander Noe (DE)


Alexander was also developing an AVI muxer and a Matroska muxer at the same time we created libebml/libmatroska. He gave a lot of input on the format and some refinement which helped a lot. He later turned to artificial intelligence, so I guess he's a millionaire now.

Moritz Bunkus


Everyone who has dealt with Matroska has been using mkvtoolnix at some point. It's almost entirely done by Moritz. He joined the project a bit later after it was almost stable. At the time he was working on an OGM tool for Linux and got interested in doing the same for Matroska. It became mkvmerge. Since then he has been the main maintainer of the Matroska libraries and the main Matroska tool. He's also part of the non-profit.

Михаил "Haali" Мацнев (RU)


Mike created the famous Haali DirectShow demuxer based on his own C library. He also worked a lot on the Segment linking, even doing his own version that was easy to use with DirectShow (but not really clean standardwise). Most people have been playing Matroska files using his code for a long time.

Ludovic Vialle / Dan Marlin (FR/US)


Ludovic is the one that got me into this. I was looking for a container to replace AVI and MPEG PS and he pointed me in the MCF direction. He was working on his own DirectShow player at the time and later founded CoreCodec with Dan Marlin. CoreCodec has helped a lot in the Matroska development, helping with the website and mailing lists hosting. At some point we also had our own web forum. They also worked a lot on cleaning the specs that are currently on matroska.org. I also worked there for many years and later with Ludovic's other company LevelUp Studios. Ludovic is also part of the Matroska non-profit.

For reference there's also a longer list of people involved on our website. This list also contains a lot of people who helped develop the many pieces of software you might have used. It should be updated with all the people involved in CELLAR like Dave Rice, Ashley Blewer, Jerome Martinez, Reto Kromer, Michael Bradshaw, Martin Below or Tobias Rapp.

2017-11-12

Still No Time To Wait

The second edition of No Time To Wait was a success. It's a conference where archivists meet the developers of the software and formats they use or might use.

Since last year a lot has changed. We were encouraging people to use Matroska and FFV1 because they meet their needs very well. This year we heard many stories of people who actually made the move and are happy about it. Reto Kromer even made a presentation explaining he actually does the conversion on the fly when transferring between tapes.

One presentation particularly caught my attention: the search for the perfect player by Aghate Jarczyk (University Of The Arts, Bern). Working daily on improving VLC, that's certainly something we want to do, making every user happy, including professional use and not just casual file playback. It turns out many of the issues mentioned, preventing the switch from QuickTime Pro 7, are already solved. Here's the list:
  • display metadata from the file. It's there with Ctrl+i (Cmd+i on macOS I suppose) in the metadata tab. It's not at the MediaInfo level but useful nonetheless. It's also refreshed during playback so if you switch between formats midstream you can see it there. It won't tell you if the data comes from the codec or the container, it's aggregated by the player. If you really need that feature, file an issue in Trac.
  • the list of codecs used for playback. It's also available in a tab when you do Ctrl+m (Cmd+m) and can be refreshed during playback (for example with streams that have mixed interlacing). It's probably more an issue with QuickTime Pro where there might be plug-ins in the system you're not aware of. It's much less likely with VLC. It doesn't load modules compiled for an older version and usually doesn't have extra modules coming from third parties.
  • added black borders when opening a video. This is surprising as that's not the behavior on Windows or the Qt interface in general. It may be a mac version specific behavior or an option to use the "fit screen" aspect ratio. A reset of the preferences should fix that.
  • Can we display timecodes? It's technically possible, we decode them but they are not frame accurate because of our internal clock design. To be accurate it needs a redesign that we are going to do for VLC 4.0. And that version will take less time to be done than it took to do 3.0.
  • To go back one frame at a time: it's possible to use a LUA script to do that, see: https://forum.videolan.org/viewtopic.php?p=462937#p462937
  • Émilie Magnin who hosted the Format Implementation panel also mentioned the possibility of launching the player more than once at a time. It is an option that's possible on Windows and Linux but apparently it takes a little more work on macOS. You'll need an external AppleScript to do that.

There were a lot of talks about open source in general as well. Everyone is pretty much sold on the idea now and how crucial it is for archivists that they can rely on code that can be reused and tweaked for decades. A guarantee no proprietary software can offer. An interesting twist is that sometimes the software to play the content has to be archived as well. Usually when using proprietary solutions that might (will) die over time. Another good reason not to use that in the first place.

Some people are still not using Matroska. One of the reasons, which makes sense in their context, is that it's not (yet) a standard, that is, endorsed by a standards body you trust. As pointed out by Ethan Gates, that level of trust may vary and be totally arbitrary. For example some still use AVI even though the specification has never gone through any of the common standards bodies (AFAIK). It is on us, and particularly me, to make the standardization of Matroska happen and finish the work that is already under way. The main issue is that we all do that in our free time, so we may look for funding to get it done sooner rather than later. A crowdfunding was mentioned. We're going to discuss how we can make this happen (suggestions welcome). That would be a first for Matroska as we never received money for the project (apart from around $200 of PayPal donations over 15 years).

A big thanks to all the organizers and especially Dave Rice and Jerome Martinez and to Michael Loebenstein of the Austrian Film Museum for a great venue.
My apologies to Kieran O'Leary, I promised I'd bring the VLC hat on the second day and then I forgot.

2017-11-05

Matroska versus fragmented MP4

In an earlier post I was worried that Matroska might have lost its edge compared to MP4 when it comes to overhead size. So I dug a little deeper with some real life samples from none other than Apple to see what we could improve. It turns out that Matroska is still the best when it comes to overhead (and just about everything else).

Here are some comparisons from the Apple adaptive streaming sample page. I don't know how they compare to real life files; maybe they are improperly muxed, but the results are always in favor of Matroska even when large padding and tags are left in the file.

Advanced Stream

The lowest bitrate video is 530 kbps according to the manifest and 369 kbps according to MediaInfo. I remuxed it with mkvmerge, then ran it through mkclean. Here are the results:
  • 27 672 619 original fMP4 with H264
  • 27 449 794 mkvmerge with default options (we win already)
  • 27 447 068 mkclean with default options
  • 27 439 197 mkclean with --live
  • 27 357 090 mkclean with --remux --optimize
  • 27 349 220 mkclean with --remux --optimize --live
The normal usage when preparing for streaming would be mkclean with --remux --optimize and that gives a 1.1% size advantage that could be better used for the codec. That stream even includes checksums and tags, and is fully seekable.
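
For anyone who wants to check the percentages, here is a small Python sketch that computes the saving of each Matroska variant relative to the original fMP4 file. The byte counts are the ones listed above; the function and variable names are only there for the illustration.

```python
# File sizes in bytes, copied from the list above.
ORIGINAL_FMP4 = 27_672_619
MATROSKA_VARIANTS = {
    "mkvmerge (default options)": 27_449_794,
    "mkclean (default options)": 27_447_068,
    "mkclean --live": 27_439_197,
    "mkclean --remux --optimize": 27_357_090,
    "mkclean --remux --optimize --live": 27_349_220,
}


def saving_percent(original, remuxed):
    """Size reduction of `remuxed` relative to `original`, in percent."""
    return 100.0 * (original - remuxed) / original


if __name__ == "__main__":
    for name, size in MATROSKA_VARIANTS.items():
        saved = ORIGINAL_FMP4 - size
        print(f"{name}: {saved} bytes saved, "
              f"{saving_percent(ORIGINAL_FMP4, size):.2f}% smaller")
```

The same function can be applied to the other samples of this post by swapping in their byte counts.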

Advanced Stream HEVC

Here Matroska doesn't have the advantage of using Header Compression as with H264, which saves 3 bytes per frame as they are always the same. The 145 kbps is also closer to the limit of everyday files.
  • 11 492 052 original fMP4 with HEVC
  • 11 410 786 mkvmerge with default options (we win already)
  • 11 407 257 mkclean with default options
  • 11 371 266 mkclean with --remux --optimize
But we're still 1.1% smaller than the same content in fragmented MP4.

Advanced Stream H264

This is the same as above but in H264 format, so we get to use header compression.
  • 10 663 861 original fMP4 with H264
  • 10 558 115 mkvmerge with default options (we win already)
  • 10 554 002 mkclean with default options
  • 10 498 886 mkclean with --remux --optimize

Conclusion

So Matroska is still the best when it comes to overhead and still keeps all its advantages. Only very very small fine tuned files might actually go in favor of fMP4. I'd really like to have such real life samples if you have some.

2017-10-25

FOMS and Demuxed

On October 3rd and 4th I attended the FOMS workshop in San Francisco then Demuxed on the 5th. There were a lot of discussions about video, mostly distribution and playback via web browsers. It was interesting as it’s a different take from my daily work on VLC. Vendors developed very specific techniques targeted at their particular use case, often to get around bogus past decisions or competing solutions.

As Matroska (used by WebM) was primarily designed for playback over network connections (that were slow at the time of design) it was interesting to see if we can cover all these use cases in an optimal way. It is especially important to remain relevant as the AV1 codec is coming soon. It seems to be getting huge traction already and might end up being the main codec everyone uses in the years to come, especially for web videos. Even though it’s targeted at high quality it seems people want to use it ASAP for very low bitrates. I suppose the quality gain for the same bitrate is even more significant there.
FOMS 2017

Two subjects particularly caught my attention in terms of challenges for the container.

Extremely low latency

It seems a lot of companies are looking at reducing the time between the moment something happens and the time it’s displayed on your screen. In the age of Twitter it sucks to see a goal or other (e)sport event happening on your feed before you actually get to see it. In games it also means the people streaming the game can interact in real time with what people are seeing.

Due to the nature of video encoding you can hardly get lower than one frame delay (17 ms in 60fps) and the transmission latency (10 ms if you have an incredible ping). But right now the target is more around a few seconds or a single second. One of the issues here is how adaptive streaming is currently used. It encodes a bunch of frames and then tells the user it’s available (in various bitrates). That’s because the container needs to know all the frames it contains before it can actually be used. So they wrap about 1s of video to have a minimum latency.
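
To put numbers on that gap, here is a rough Python sketch comparing the theoretical floor (one frame plus the network trip) with a container that has to wait for a full segment. It deliberately ignores encoder, decoder and player buffering, and the function names are only illustrative.

```python
def frame_duration_ms(fps):
    """Duration of a single frame in milliseconds."""
    return 1000.0 / fps


def latency_floor_ms(fps, network_ms):
    """Best case: one frame of delay plus the transmission latency."""
    return frame_duration_ms(fps) + network_ms


def segmented_latency_ms(segment_seconds, network_ms):
    """A whole segment must be complete before it can be announced and sent."""
    return segment_seconds * 1000.0 + network_ms


if __name__ == "__main__":
    floor = latency_floor_ms(fps=60, network_ms=10)
    segmented = segmented_latency_ms(segment_seconds=1.0, network_ms=10)
    print(f"frame-level floor: {floor:.1f} ms")
    print(f"1 s segments:      {segmented:.1f} ms")
    print(f"ratio:             {segmented / floor:.0f}x")
```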

Matroska and EBML have a mode called live streaming. It allows writing frames as they come in and never rewriting the beginning of the file to tell how much data it contains or where the data actually are. So you can start reading the file even while it’s being written. Many years ago GStreamer was used to stream conferences that way (without even an actual file being written) and that’s how VLC 3.0 sends videos to the Chromecast. This is also how most Matroska/WebM muxers work. They write in “live streaming” mode by default: they write a special “unknown” value in the length field and when the size is known this value is overwritten. So a streamer can create files on the fly that people could start reading. And when the file is done write the proper values so that the next people reading from that file actually get proper values they can use to seek.
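
As a minimal sketch of that mechanism in Python: a Cluster header is written with the reserved "unknown" size (an 8-byte size field with all data bits set), data is appended as it comes, and the real size is patched in once it is known. The payload below is a placeholder rather than real Matroska blocks, and the helper names are invented for this example.

```python
import io

CLUSTER_ID = bytes.fromhex("1F43B675")              # Matroska Cluster element ID
UNKNOWN_SIZE_8 = bytes.fromhex("01FFFFFFFFFFFFFF")  # 8-byte size, all data bits set
SIZE_FIELD_LEN = 8


def start_cluster(out):
    """Write a Cluster header with unknown size; return the size field offset."""
    out.write(CLUSTER_ID)
    size_offset = out.tell()
    out.write(UNKNOWN_SIZE_8)
    return size_offset


def finish_cluster(out, size_offset):
    """Replace the unknown size with the real payload size, still on 8 bytes."""
    end = out.tell()
    payload_size = end - (size_offset + SIZE_FIELD_LEN)
    out.seek(size_offset)
    # 1 << 56 is the length marker bit of an 8-byte EBML size.
    out.write(((1 << 56) | payload_size).to_bytes(SIZE_FIELD_LEN, "big"))
    out.seek(end)
    return payload_size


stream = io.BytesIO()
offset = start_cluster(stream)
# A reader can already consume the stream at this point.
stream.write(b"\x00" * 1000)  # placeholder standing in for real blocks
size = finish_cluster(stream, offset)
print(size, stream.getvalue()[4:12].hex())
```

Because the size field keeps the same width before and after patching, nothing else in the file has to move.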

I hope the web people get a look at this as it would allow going way below the 1s latency target they currently have. It would also work for adaptive streaming as you still get Clusters that you can cut in many parts on a CDN as currently done for WebM. This solution is already compatible with most Matroska/WebM readers. It’s been in our basic test suite for at least 7 years.

CMAF

I learned the existence of a new MP4 variant called CMAF (Common Media Application Format). It’s an ISOBMFF profile based on Fragmented MP4 (fMP4). It was developed by Microsoft and Apple. The goal was to use a similar format between DASH and HLS to reduce the cost of storage on CDNs and get better caching. In the end it might not be of much use because the different vendors don’t support the same DRM systems and so at least 2 variants of the same content will still be needed.

This is an interesting challenge for Matroska as with AV1 coming there will be a battle for what container to use to distribute videos. It’s not the main adoption issue anymore though. For example Apple only supported HLS with MPEG TS until iOS 10 so many JavaScript frameworks remux the incoming fMP4 to TS on the fly and feed that to iOS.

Regular MP4 files were not meant to be good for progressive downloading, nor for the fragmented playback needed for adaptive streaming, as the index was needed for playback and so needed to be loaded beforehand, and it was not necessarily at the front of the file. The overhead (the amount of data the container adds on top of the actual codec data) wasn’t great either. So far it was a key advantage for Matroska/WebM as these were two of the main criteria when the format was designed 15 years ago. There were cases where MP4 could be smaller but at the price of using compressed headers. The situation changes with fMP4 and CMAF. In fact the overhead is slightly lower than Matroska/WebM. And that’s pretty much the only advantage it has over Matroska.

On a 25 MB file of 44 kbps (where overhead is really hurting) the difference between the fMP4 file and one passed through mkclean is 77 KB or 0.3%. It may seem peanuts, especially for such a small bitrate, but I think Matroska should do better.

Looking at the fMP4 file, it seems the frames are all packed in a blob and the boundaries between each frame in a separate blob (‘trun’ box). And that’s about it. It must only work with fixed frame rates and probably allows no frame drop. But that’s efficient for the use case of web video over CDNs that were encoded and muxed for that special purpose. There’s hardly any overhead apart from the regular track header.

One way Matroska could be improved for such a case would be to allow frame lacing for video. It is already used heavily for audio to reduce the overhead and since audio doesn’t need a timestamp for each block, the sampling rate is enough (except when there are drops during recording, in which case lacing is not used). We could allow lacing video frames as long as the default duration for the track is set (similar to a frame rate) and that each frame has the same characteristics in the Matroska Block, especially the keyframe flag. So keyframes would stand alone and many other video frames could be laced to reduce the overhead, the same way it’s done for audio. With such a small bitrate it could make a significant difference. On higher bitrates not really, but the overhead difference between fMP4 and Matroska is probably small if not at the advantage of Matroska in this case (thanks to header compression).
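
To get a feel for the potential gain, here is a back-of-the-envelope Python sketch comparing the container overhead of one SimpleBlock per frame against a single SimpleBlock using Xiph lacing, one of the lacing modes already used for audio. Lacing video frames is only the proposal described above, the frame sizes are made up, and Cluster and timestamp elements are left out, so take the numbers as an illustration rather than a measurement.

```python
SIMPLEBLOCK_ID_LEN = 1        # SimpleBlock element ID (0xA3) is one byte
BLOCK_HEADER_LEN = 1 + 2 + 1  # track number (below 127), 16-bit timestamp, flags


def vint_length(value):
    """Bytes needed to code `value` as an EBML size (all-ones is reserved)."""
    length = 1
    while value >= (1 << (7 * length)) - 1:
        length += 1
    return length


def unlaced_overhead(frame_sizes):
    """Container bytes when every frame gets its own SimpleBlock."""
    total = 0
    for size in frame_sizes:
        payload = BLOCK_HEADER_LEN + size
        total += SIMPLEBLOCK_ID_LEN + vint_length(payload) + BLOCK_HEADER_LEN
    return total


def xiph_laced_overhead(frame_sizes):
    """Container bytes when all frames share one Xiph-laced SimpleBlock."""
    # One byte for the frame count, then the size of every frame but the last,
    # each coded as a run of 255s followed by a final byte below 255.
    lace_sizes = sum(size // 255 + 1 for size in frame_sizes[:-1])
    lace_header = 1 + lace_sizes
    payload = BLOCK_HEADER_LEN + lace_header + sum(frame_sizes)
    return (SIMPLEBLOCK_ID_LEN + vint_length(payload)
            + BLOCK_HEADER_LEN + lace_header)


if __name__ == "__main__":
    frames = [200] * 8  # hypothetical: eight non-key frames of 200 bytes each
    print("one block per frame:", unlaced_overhead(frames), "bytes")
    print("one laced block:    ", xiph_laced_overhead(frames), "bytes")
```

With frames this small, the per-frame block header is a visible share of the stream, which is where lacing would help.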

I will submit the proposal to the CELLAR workgroup of the IETF (Internet Engineering Task Force), a group that is currently working on properly specifying EBML and Matroska, but also FFV1 and FLAC. This is not a big change, it’s just something that we didn’t allow before. And because it’s already in use for audio in just about every Matroska/WebM file that exists, the parsing already exists in current players and may work out of the box with video frame lacing. It doesn’t add any new element.

The advantages of Matroska over MP4 remain the same for fMP4.
Demuxed 2017

TL;DR

Matroska has a lot to offer to web distribution: one-frame latency at scale, which is not possible with ISOBMFF formats, no need for new designs for current and future use cases, and the most open and free solution.