2017-11-19

The Many Fathers of Matroska

I think I'm done giving talks about Matroska for this year. One of the things that bothers me each time (to the point I might embarrass people unwillingly, sorry Kieran O'Leary) is that I take credit for all of Matroska, although many people were involved almost as much as I was during its long birth. So I would like to set the record straight for posterity.
Also, I say fathers because it was all men (or boys) involved. The one exception was Liisachan on Doom9, who was involved in creating the original logo.

Lasse Kärkkäinen (FI)


Lasse is the creator of MCF, the project that Matroska was forked from. Although forks are usually not a great idea, there were so many differences between his original format and what we turned into Matroska that we couldn't continue working on the same project. We agreed to disagree and went our separate ways. But there were no hard feelings, and we met on a few occasions after that. He even asked me for a letter of recommendation for a job in Finland once.

Frank Klemm (DE)


One of the key differences between MCF and Matroska is the use of EBML. And one of the key features of EBML is the way header values are coded in a UTF-8-like manner. This was Frank's idea. It gave a great boost to the format, and it is why going back to MCF was not possible after that.
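
To give an idea of what this UTF-8-like coding looks like, here is a minimal sketch of a decoder in Python. The function name `read_vint` and its `keep_marker` parameter are made up for this illustration and are not taken from libebml. The only thing assumed from the format is the well-known rule that the number of leading zero bits in the first byte tells how many bytes follow.

```python
def read_vint(data: bytes, offset: int = 0, keep_marker: bool = False):
    """Decode one EBML variable-length integer starting at data[offset].

    Returns a (value, length_in_bytes) tuple. Hypothetical helper for
    illustration only; lengths above 8 bytes are not handled.
    """
    first = data[offset]
    if first == 0:
        raise ValueError("more than 8 bytes: not handled in this sketch")
    # Like UTF-8, the first byte announces the total length:
    # 1xxxxxxx -> 1 byte, 01xxxxxx -> 2 bytes, 001xxxxx -> 3 bytes, ...
    length = 8 - first.bit_length() + 1
    if offset + length > len(data):
        raise ValueError("truncated value")
    value = int.from_bytes(data[offset:offset + length], "big")
    if not keep_marker:
        # Sizes drop the marker bit; element IDs conventionally keep it.
        value &= (1 << (7 * length)) - 1
    return value, length


print(read_vint(b"\x82"))        # the value 2 coded on one byte
print(read_vint(b"\x40\x02"))    # the same value 2 coded on two bytes
print(hex(read_vint(bytes.fromhex("1A45DFA3"), keep_marker=True)[0]))  # EBML header ID
```

Small values cost a single byte and large ones grow only as needed, which is where a lot of the low overhead comes from.
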
Frank was one of the developers of the MPC (Musepack) codec, which combined lossless and lossy audio compression in the same format. People were so happy with his work that there was a crowdfunding campaign (a concept that didn't exist at the time) on Doom9 to buy him a new PC.

Christian HJ Wiesner (DE)


Christian is not a developer. He's not really a technical guy either. But he liked what we were doing so much that he organized everything around the project. He was also the first to join me when I created the fork on Sourceforge. He's the one who organized the crowdfunding for Frank Klemm and delivered his PC to him. He also kept the matroska.org domain safe for a long time before donating it to the Matroska non-profit.

John Cannon, Paul Bryson, Jory Stone (USA)


Apart from Frank and me, they provided the main input for the changes to MCF that ended up as Matroska. IIRC John Cannon was the one who suggested that the Matryoshka name we were planning to use was too complicated for USAns and that we reduce it to Matroska.

Alexander Noe (DE)


Alexander was also developing an AVI muxer and a Matroska muxer at the same time we created libebml/libmatroska. He gave a lot of input on the format and some refinements that helped a lot. He later turned to artificial intelligence, so I guess he's a millionaire now.

Moritz Bunkus


Everyone who has dealt with Matroska has used mkvtoolnix at some point. It's almost entirely done by Moritz. He joined the project a bit later, when it was almost stable. At the time he was working on an OGM tool for Linux and got interested in doing the same for Matroska. It became mkvmerge. Since then he has been the main maintainer of the Matroska libraries and the main Matroska tool. He's also part of the non-profit.

Михаил "Haali" Мацнев (RU)


Mike created the famous Haali DirectShow demuxer based on his own C library. He also worked a lot on Segment linking, even doing his own version that was easy to use with DirectShow (but not really clean standards-wise). Most people have been playing Matroska files using his code for a long time.

Ludovic Vialle / Dan Marlin (FR/US)


Ludovic is the one who got me into this. I was looking for a container to replace AVI and MPEG PS and he pointed me in the direction of MCF. He was working on his own DirectShow player at the time and later founded CoreCodec with Dan Marlin. CoreCodec has helped a lot with Matroska development, helping with website and mailing list hosting. At some point we also had our own web forum. They also worked a lot on cleaning up the specs that are currently on matroska.org. I also worked there for many years and later with Ludovic's other company, LevelUp Studios. Ludovic is also part of the Matroska non-profit.

For reference, there's also a longer list of people involved on our website. This list also contains a lot of people who helped develop the many pieces of software you might have used. It should be updated with all the people involved in CELLAR, like Dave Rice, Ashley Blewer, Jerome Martinez, Reto Kromer, Michael Bradshaw, Martin Below or Tobias Rapp.

2017-11-12

Still No Time To Wait

The second edition of No Time To Wait was a success. It's a conference where archivists meet the developers of the software and formats they use or might use.

Since last year a lot has changed. We were advocating that people use Matroska and FFV1 because they meet their needs very well. This year we heard many stories of people who actually made the move and are happy about it. Reto Kromer even made a presentation explaining that he does the conversion on the fly when transferring between tapes.

One presentation particularly caught my attention: the search for the perfect player by Agathe Jarczyk (University of the Arts, Bern). As I work daily on improving VLC, that's certainly something we want to deliver, making every user happy for professional use and not just for casual file playback. It turns out many of the issues mentioned, which are preventing the switch from QuickTime Pro 7, are already solved. Here's the list:
  • display metadata from the file. It's there with Ctrl+i (Cmd+i on macOS I suppose) in the metadata tab. It's not at the MediaInfo level but useful nonetheless. It's also refreshed during playback, so if you switch between formats midstream you can see it there. It won't tell you if the data come from the codec or the container, as it's aggregated by the player. If you really need that feature, file an issue in Trac.
  • the list of codecs used for playback. It's also available in a tab when you do Ctrl+m (Cmd+m) and can be refreshed during playback (for example with streams that have mixed interlacing). It's probably more an issue with QuickTime Pro where there might be plug-ins in the system you're not aware of. It's much less likely with VLC. It doesn't load modules compiled for an older version and usually doesn't have extra modules coming from third parties.
  • added black borders when opening a video. This is surprising, as that's not the behavior on Windows or in the Qt interface in general. It may be a behavior specific to the Mac version, or an option to use the "fit screen" aspect ratio. A reset of the preferences should fix that.
  • Can we display timecodes? It's technically possible: we decode them, but they are not frame accurate because of our internal clock design. To be accurate it needs a redesign that we are going to do for VLC 4.0. And that version will take less time to finish than 3.0 did.
  • To go back one frame at a time: it's possible to use a Lua script to do that, see: https://forum.videolan.org/viewtopic.php?p=462937#p462937
  • Émilie Magnin who hosted the Format Implementation panel also mentioned the possibility of launching the player more than once at a time. It is an option that's possible on Windows and Linux but apparently it takes a little more work on macOS. You'll need an external AppleScript to do that.

There were a lot of talks about open source in general as well. Everyone is pretty much sold on the idea now, and on how crucial it is for archivists that they can rely on code that can be reused and tweaked for decades. That's a guarantee no proprietary software can offer. An interesting twist is that sometimes the software to play the content has to be archived as well, usually when using proprietary solutions that might (will) die over time. Another good reason not to use them in the first place.

Some people are still not using Matroska. One of the reasons, which makes sense in their context, is that it's not (yet) a standard, that is, one endorsed by a standards body you trust. As pointed out by Ethan Gates, that level of trust may vary and be totally arbitrary. For example, some still use AVI even though its specification has never gone through any of the common standards bodies (AFAIK). It is on us, and particularly me, to make the standardization of Matroska happen and finish the work that is already under way. The main issue is that we all do that in our free time, so we may look for funding to get it done sooner rather than later. A crowdfunding campaign was mentioned. We're going to discuss how we can make this happen (suggestions welcome). That would be a first for Matroska, as we never received money for the project (apart from around $200 of PayPal donations over 15 years).

A big thanks to all the organizers and especially Dave Rice and Jerome Martinez and to Michael Loebenstein of the Austrian Film Museum for a great venue.
My apologies to Kieran O'Leary, I promised I'd bring the VLC hat on the second day and then I forgot.

2017-11-05

Matroska versus fragmented MP4

In an earlier post I was worried that Matroska might have lost its edge compared to MP4 when it comes to overhead size. So I dug a little deeper with some real life samples from none other than Apple to see what we could improve. It turns out that Matroska is still the best when it comes to overhead (and just about everything else).

Here are some comparisons from the Apple adaptive streaming sample page. I don't know how they compare to real life files, and maybe they are improperly muxed, but the results are always in favor of Matroska even when large padding and tags are left in the file.

Advanced Stream

The lowest bitrate video is 530 kbps according to the manifest and 369 kbps according to MediaInfo. I remuxed it with mkvmerge, then ran the result through mkclean. Here are the resulting file sizes in bytes:
  • 27 672 619 original fMP4 with H264
  • 27 449 794 mkvmerge with default options (we win already)
  • 27 447 068 mkclean with default options
  • 27 439 197 mkclean with --live
  • 27 357 090 mkclean with --remux --optimize
  • 27 349 220 mkclean with --remux --optimize --live
The normal usage when preparing a file for streaming would be mkclean with --remux --optimize, and that gives a 1.1% size advantage that could be put to better use by the codec. That stream even includes checksums and tags, and is fully seekable.

Advanced Stream HEVC

Here Matroska doesn't have the advantage of using Header Compression as with H264, which saves 3 bytes per frame as they are always the same. The 145 kbps is also closer to the limit of everyday files.
  • 11 492 052 original fMP4 with HEVC
  • 11 410 786 mkvmerge with default options (we win already)
  • 11 407 257 mkclean with default options
  • 11 371 266 mkclean with --remux --optimize
But we're still 1.1% smaller than the same content in fragmented MP4.

Advanced Stream H264

This is the same as above but in H264 format, so we get to use header compression.
  • 10 663 861 original fMP4 with H264
  • 10 558 115 mkvmerge with default options (we win already)
  • 10 554 002 mkclean with default options
  • 10 498 886 mkclean with --remux --optimize
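
For anyone who wants to redo the arithmetic, here is a small Python sketch that computes the size advantage from the numbers listed above. It compares each original fMP4 file with the mkclean --remux --optimize result. The function name and the dictionary labels are just names picked for this example.

```python
def saving_percent(original: int, remuxed: int) -> float:
    """Percentage of the original file size saved by the remuxed file."""
    return (original - remuxed) / original * 100


# (original fMP4 size, mkclean --remux --optimize size), in bytes, from the lists above
SAMPLES = {
    "Advanced Stream": (27_672_619, 27_357_090),
    "Advanced Stream HEVC": (11_492_052, 11_371_266),
    "Advanced Stream H264": (10_663_861, 10_498_886),
}

for name, (fmp4, mkv) in SAMPLES.items():
    print(f"{name}: {saving_percent(fmp4, mkv):.1f}% smaller than fMP4")
```

The first two come out at about 1.1% and the last one, where header compression kicks in at a low bitrate, at about 1.5%.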

Conclusion

So Matroska is still the best when it comes to overhead and still keeps all its advantages. Only very, very small, fine-tuned files might actually go in favor of fMP4. I'd really like to have such real life samples if you have some.

2017-10-25

FOMS and Demuxed

On October 3rd and 4th I attended the FOMS workshop in San Francisco, then Demuxed on the 5th. There were a lot of discussions about video, mostly about distribution and playback via web browsers. It was interesting as it’s a different take from my daily work on VLC. Vendors developed very specific techniques targeted at their particular use case, often to get around bogus past decisions or competing solutions.

As Matroska (used by WebM) was primarily designed for playback over network connections (that were slow at the time of design) it was interesting to see if we can cover all these use cases in an optimal way. It is especially important to remain relevant as the AV1 codec is coming soon. It seems to be getting huge traction already and might end up being the main codec everyone uses in the years to come, especially for web videos. Even though it’s targeted at high quality it seems people want to use it ASAP for very low bitrates. I suppose the quality gain for the same bitrate is even more significant there.

[Image: FOMS 2017]

Two subjects particularly caught my attention in terms of challenges for the container.

Extremely low latency

It seems a lot of companies are looking at reducing the time between the moment something happens and the time it’s displayed on your screen. In the age of Twitter it sucks to see a goal or other (e)sport event happening on your feed before you actually get to see it. In games it also means the people streaming the game can interact in real time with what people are seeing.

Due to the nature of video encoding you can hardly get lower than one frame of delay (17 ms at 60 fps) plus the transmission latency (10 ms if you have an incredible ping). But right now the target is more around a few seconds or a single second. One of the issues here is how adaptive streaming is currently used. It encodes a bunch of frames and then tells the user it’s available (in various bitrates). That’s because the container needs to know all the frames it contains before it can actually be used. So they wrap about 1s of video to have a minimum latency.

Matroska and EBML have a mode called live streaming. It allows writing frames as they come in, without ever rewriting the beginning of the file to tell how much data it contains or where the data actually are. So you can start reading the file even while it’s being written. Many years ago GStreamer was used to stream conferences that way (without even an actual file being written) and that’s how VLC 3.0 sends videos to the Chromecast. This is also how most Matroska/WebM muxers work. They write in “live streaming” mode by default: they write a special “unknown” value in the length field, and when the size is known this value is overwritten. So a streamer can create files on the fly that people can start reading right away. When the file is done, the muxer writes the proper values so that the next people reading that file get values they can use to seek.
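
The “unknown” length trick described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how libmatroska or any particular muxer is written: the helper names are hypothetical, the size field is always reserved on 8 bytes, and only the Segment element is handled (a real muxer also deals with Clusters, the SeekHead and the Cues). The Segment ID and the all-ones “unknown” size value are the ones used by the format.

```python
import io

SEGMENT_ID = bytes.fromhex("18538067")            # Matroska Segment element ID
UNKNOWN_SIZE_8 = bytes.fromhex("01FFFFFFFFFFFFFF")  # 8-byte size with all data bits set


def encode_size(size: int, length: int = 8) -> bytes:
    """Code a known element size as an EBML size field of `length` bytes."""
    max_value = (1 << (7 * length)) - 2  # all-ones is reserved for "unknown"
    if not 0 <= size <= max_value:
        raise ValueError("size does not fit in the requested length")
    marker = 1 << (7 * length)           # the length marker bit
    return (marker | size).to_bytes(length, "big")


def start_live_segment(out) -> int:
    """Write the Segment header with an unknown size; return the size position."""
    out.write(SEGMENT_ID)
    size_pos = out.tell()
    out.write(UNKNOWN_SIZE_8)            # readers can start parsing right away
    return size_pos


def finalize_segment(out, size_pos: int) -> None:
    """Once writing is over, overwrite the unknown size with the real one."""
    end = out.seek(0, io.SEEK_END)
    payload = end - (size_pos + len(UNKNOWN_SIZE_8))
    out.seek(size_pos)
    out.write(encode_size(payload, len(UNKNOWN_SIZE_8)))
    out.seek(end)


buf = io.BytesIO()
pos = start_live_segment(buf)
buf.write(b"x" * 10)                     # stand-in for Clusters written as frames arrive
finalize_segment(buf, pos)
print(buf.getvalue()[:12].hex())
```

Because the placeholder and the final value have the same width, fixing the size afterwards never moves any data already written.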

I hope the web people take a look at this, as it would allow going way below the 1s latency target they currently have. It would also work for adaptive streaming as you still get Clusters that you can cut in many parts on a CDN as currently done for WebM. This solution is already compatible with most Matroska/WebM readers. It’s been in our basic test suite for at least 7 years.

CMAF

I learned of the existence of a new MP4 variant called CMAF (Common Media Application Format). It’s an ISOBMFF profile based on Fragmented MP4 (fMP4). It was developed by Microsoft and Apple. The goal was to use a similar format between DASH and HLS to reduce the cost of storage on CDNs and get better caching. In the end it might not be of much use because the different vendors don’t support the same DRM systems and so at least 2 variants of the same content will still be needed.

This is an interesting challenge for Matroska as with AV1 coming there will be a battle for what container to use to distribute videos. It’s not the main adoption issue anymore though. For example Apple only supported HLS with MPEG TS until iOS 10, so many JavaScript frameworks remux the incoming fMP4 to TS on the fly and feed that to iOS.

Regular MP4 files were not designed for progressive download, nor for the fragmented playback needed for adaptive streaming: the index is required for playback, so it has to be loaded beforehand, and it is not necessarily at the front of the file. The overhead (the amount of data the container adds on top of the actual codec data) wasn’t great either. So far that was a key advantage of Matroska/WebM, as these were two of the main criteria when the format was designed 15 years ago. There were cases where MP4 could be smaller, but at the price of using compressed headers. The situation changes with fMP4 and CMAF. In fact the overhead is slightly lower than with Matroska/WebM. And that’s pretty much the only advantage it has over Matroska.

On a 25 MB file at 44 kbps (where overhead is really hurting) the difference between the fMP4 file and one passed through mkclean is 77 KB or 0.3%. It may seem like peanuts, especially for such a small bitrate, but I think Matroska should do better.

Looking at the fMP4 file, it seems the frames are all packed in a blob and the boundaries between each frame in a separate blob (‘trun’ box). And that’s about it. It must only work with fixed frame rates and probably allows no frame drop. But that’s efficient for the use case of web video over CDNs that were encoded and muxed for that special purpose. There’s hardly any overhead apart from the regular track header.

One way Matroska could be improved for such a case would be to allow frame lacing for video. It is already used heavily for audio to reduce the overhead, since audio doesn’t need a timestamp for each block: the sampling rate is enough (except when there are drops during recording, in which case lacing is not used). We could allow lacing video frames as long as the default duration for the track is set (similar to a frame rate) and each frame has the same characteristics in the Matroska Block, especially the keyframe flag. So keyframes would stand alone and many other video frames could be laced to reduce the overhead, the same way it’s done for audio. With such a small bitrate it could make a significant difference. On higher bitrates not really, but the overhead difference between fMP4 and Matroska is probably small, if not to the advantage of Matroska in this case (thanks to header compression).
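
To get a feel for what lacing could save, here is a rough Python estimate of the container bytes spent on SimpleBlock headers, with and without lacing. It is a back-of-the-envelope sketch with assumptions of my choosing: the function names are made up, the track number is assumed to fit in one byte, Xiph-style lacing is used because it is the simplest to count (EBML lacing would usually be a bit smaller), and Cluster headers, Cues and header compression are ignored. Video lacing is only a proposal here, so this only counts bytes.

```python
def vint_length(value: int) -> int:
    """Bytes needed to code `value` as an EBML size (all-ones is reserved)."""
    length = 1
    while value > (1 << (7 * length)) - 2:
        length += 1
    return length


def simpleblock_overhead(frame_sizes, laced: bool) -> int:
    """Container bytes (everything but the frame data) for the given frames."""
    # Block header: track number (assumed 1 byte) + 2-byte relative timestamp + flags
    block_header = 1 + 2 + 1
    if not laced:
        # One SimpleBlock per frame: 1-byte element ID + size field + block header
        return sum(1 + vint_length(block_header + s) + block_header
                   for s in frame_sizes)
    # Xiph lacing: every frame but the last stores its size as a run of
    # 255-valued bytes followed by one final byte below 255
    lace_sizes = sum(s // 255 + 1 for s in frame_sizes[:-1])
    lace_header = 1 + lace_sizes  # frame count byte + the coded sizes
    payload = block_header + lace_header + sum(frame_sizes)
    return 1 + vint_length(payload) + block_header + lace_header


frames = [200] * 8  # eight small non-keyframes, a made-up example
print(simpleblock_overhead(frames, laced=False))  # 56 bytes of headers
print(simpleblock_overhead(frames, laced=True))   # 15 bytes of headers
```

With these made-up frame sizes the per-frame cost drops from 7 bytes to about 2, which is why it mostly matters at very low bitrates where frames are tiny.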

I will submit the proposal to the CELLAR working group of the IETF (Internet Engineering Task Force), a group that is currently working on properly specifying EBML and Matroska, but also FFV1 and FLAC. This is not a big change, it’s just something that we didn’t allow before. And because it’s already in use for audio in just about every Matroska/WebM file that exists, the parsing already exists in current players and may work out of the box with video frame lacing. It doesn’t add any new element.

The advantages of Matroska over MP4 remain the same for fMP4.

[Image: Demuxed 2017]

TL;DR

Matroska has a lot to offer for web distribution: it allows one-frame latency at scale, which is not possible with ISOBMFF formats, it doesn’t require new designs for current and future use cases, and it is the most open and free solution.