2017-11-19

The Many Fathers of Matroska

I think I'm done giving talks about Matroska for this year. One of the things that bothers me each time (to the point I might embarrass people unwillingly, sorry Kieran O'Leary) is that I take credit for all of Matroska although there were many people involved, almost as much as me, during its long birth. So I would like to set the record straight for posterity.
Also, I say fathers because it was all men (or boys) involved. The one exception was Liisachan on Doom9, who was involved in creating the original logo.

Lasse Kärkkäinen (FI)


Lasse is the creator of MCF, the project that Matroska was forked from. Although forks are usually not a great idea, there were so many differences between his original format and what we turned it into (what is now Matroska) that we couldn't continue working on the same project. We agreed to disagree and went our separate ways. But there were no hard feelings, and we met on a few occasions after that. He even asked me for a letter of recommendation for a job in Finland once.

Frank Klemm (DE)


One of the key differences between MCF and Matroska is the use of EBML. And one of the key features of EBML is the way header values are coded in a UTF-8-like manner. This was Frank's idea. It gave a great boost to the format and is why going back to MCF was not possible after that.
Frank was one of the developers of the MPC (Musepack) codec, which combined lossless and lossy audio compression in the same format. People were so happy with his work that there was a crowdfunding (the word didn't exist at the time) on Doom9 to buy him a new PC.
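To give an idea of what "UTF-8 like" means here, below is a minimal Python sketch of an EBML-style variable-size integer, the kind used to code element sizes. The function names `encode_vint` and `decode_vint` are made up for this example and are not the libebml API. It assumes the usual layout: the number of leading zero bits in the first octet gives the total length, a single marker bit follows, and the remaining bits hold the value.

```python
def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer as an EBML-style variable-size integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for length in range(1, 9):
        # Each octet contributes 7 usable bits. The all-ones pattern is
        # reserved (it signals an unknown size), hence the "- 1".
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)  # the single 1 bit that ends the length prefix
            return (marker | value).to_bytes(length, "big")
    raise ValueError("value too large for an 8-octet integer")


def decode_vint(data: bytes) -> tuple:
    """Return (value, length_in_octets) for the integer at the start of data."""
    if not data or data[0] == 0:
        raise ValueError("empty input or a length above 8 octets")
    # Leading zero bits of the first octet, plus one, give the total length.
    length = 9 - data[0].bit_length()
    if len(data) < length:
        raise ValueError("truncated input")
    raw = int.from_bytes(data[:length], "big")
    value = raw & ((1 << (7 * length)) - 1)  # drop the marker bit
    return value, length


if __name__ == "__main__":
    for n in (5, 127, 500, 2_000_000):
        coded = encode_vint(n)
        print(n, coded.hex(), decode_vint(coded))
```

Small values cost a single octet and larger ones grow only as needed, which is a big part of why the overhead stays low.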

Christian HJ Wiesner (DE)


Christian is not a developer. He's not really a technical guy either. But he liked what we were doing so much that he organized everything around the project. He was also the first to join me when I created the fork on Sourceforge. He's also the one who organized the crowdfunding for Frank Klemm and delivered his PC to him. He also kept the matroska.org domain safe for a long time, and later donated it to the Matroska non-profit.

John Cannon, Paul Bryson, Jory Stone (USA)


Apart from Frank and me, they were the main source of the changes to MCF that ended up as Matroska. IIRC John Cannon was the one who suggested that the Matryoshka name we were planning to use was too complicated for USAns, and that we reduce it to Matroska.

Alexander Noe (DE)


Alexander was developing an AVI muxer and a Matroska muxer at the same time we created libebml/libmatroska. He gave a lot of input on the format and some refinements which helped a lot. He later moved into artificial intelligence, so I guess he's a millionaire now.

Moritz Bunkus


Everyone who has dealt with Matroska has used mkvtoolnix at some point. It's almost entirely done by Moritz. He joined the project a bit later, when it was almost stable. At the time he was working on an OGM tool for Linux and got interested in doing the same for Matroska. It became mkvmerge. Since then he has been the main maintainer of the Matroska libraries and the main Matroska tool. He's also part of the non-profit.

Михаил "Haali" Мацнев (RU)


Mike created the famous Haali DirectShow demuxer based on his own C library. He also worked a lot on Segment linking, even doing his own version that was easy to use with DirectShow (but not really clean standards-wise). Most people have been playing Matroska files using his code for a long time.

Ludovic Vialle / Dan Marlin (FR/US)


Ludovic is the one who got me into this. I was looking for a container to replace AVI and MPEG PS and he pointed me in the MCF direction. He was working on his own DirectShow player at the time and later founded CoreCodec with Dan Marlin. CoreCodec has helped a lot in the Matroska development, helping with the website and mailing list hosting. At some point we also had our own web forum. They also worked a lot on cleaning up the specs that are currently on matroska.org. I also worked there for many years and later with Ludovic's other company, LevelUp Studios. Ludovic is also part of the Matroska non-profit.

For reference, there's also a longer list of people involved on our website. That list also contains a lot of people who helped develop much of the software you might have used. It should be updated with all the people involved in CELLAR like Dave Rice, Ashley Blewer, Jerome Martinez, Reto Kromer, Michael Bradshaw, Martin Below or Tobias Rapp.

2017-11-12

Still No Time To Wait

The second edition of No Time To Wait was a success. It's a conference where archivists meet the developers of the software and formats they use or might use.

A lot has changed since last year. Back then we were advocating that people use Matroska and FFV1 because they meet their needs very well. This year we heard many stories of people who actually made the move and are happy about it. Reto Kromer even gave a presentation explaining that he does the conversion on the fly when transferring between tapes.

One presentation particularly caught my attention: the search for the perfect player by Aghate Jarczyk (University Of The Arts, Bern). As I work daily on improving VLC, that's certainly something we want to deliver, making every user happy, including professional users and not just those doing casual file playback. It turns out many of the issues mentioned as preventing the switch from QuickTime Pro 7 are already solved. Here's the list:
  • display metadata from the file. It's there with Ctrl+i (Cmd+i on macOS I suppose) in the metadata tab. It's not at the MediaInfo level but useful nonetheless. It's also refreshed during playback so if you switch between formats midstream you can see it there. It won't tell you if the data come from the codec or the container, it's aggregated by the player. If you really need that feature, file an issue in Trac.
  • the list of codecs used for playback. It's also available in a tab when you do Ctrl+m (Cmd+m) and can be refreshed during playback (for example with streams that have mixed interlacing). It's probably more an issue with QuickTime Pro where there might be plug-ins in the system you're not aware of. It's much less likely with VLC. It doesn't load modules compiled for an older version and usually doesn't have extra modules coming from third parties.
  • added black borders when opening a video. This is surprising as that's not the behavior on Windows or in the Qt interface in general. It may be a macOS-specific behavior or an option to use the "fit screen" aspect ratio. A reset of the preferences should fix that.
  • Can we display timecodes? It's technically possible, we decode them, but they are not frame accurate because of our internal clock design. To be accurate it needs a redesign that we are going to do for VLC 4.0. And that version will take less time to finish than 3.0 did.
  • To go back one frame at a time: it's possible to use a Lua script to do that, see: https://forum.videolan.org/viewtopic.php?p=462937#p462937
  • Émilie Magnin who hosted the Format Implementation panel also mentioned the possibility of launching the player more than once at a time. It is an option that's possible on Windows and Linux but apparently it takes a little more work on macOS. You'll need an external AppleScript to do that.

There were a lot of talks about open source in general as well. Everyone is pretty much sold on the idea now and on how crucial it is for archivists that they can rely on code that can be reused and tweaked for decades. That's a guarantee no proprietary software can offer. An interesting twist is that sometimes the software to play the content has to be archived as well, usually when it's a proprietary solution that might (will) die over time. Another good reason not to use that in the first place.

Some people are still not using Matroska. One of the reasons, which makes sense in their context, is that it's not (yet) a standard, that is, one endorsed by a standards body you trust. As pointed out by Ethan Gates, that level of trust may vary and be totally arbitrary. For example some still use AVI even though the specification has never gone through any of the common standards bodies (AFAIK). It is on us, and particularly me, to make the standardization of Matroska happen and finish the work that is already under way. The main issue is that we all do that in our free time, so we may look for funding to get it done sooner rather than later. A crowdfunding was mentioned. We're going to discuss how we can make this happen (suggestions welcome). That would be a first for Matroska as we never received money for the project (apart from around $200 of PayPal donations over 15 years).

A big thanks to all the organizers and especially Dave Rice and Jerome Martinez and to Michael Loebenstein of the Austrian Film Museum for a great venue.
My apologies to Kieran O'Leary, I promised I'd bring the VLC hat on the second day and then I forgot.

2017-11-05

Matroska versus fragmented MP4

In an earlier post I was worried that Matroska might have lost its edge compared to MP4 when it comes to overhead size. So I dug a little deeper with some real life samples from none other than Apple to see what we could improve. It turns out that Matroska is still the best when it comes to overhead (and just about everything else).

Here are some comparisons from the Apple adaptive streaming sample page. I don't know how they compare to real life files, maybe they are improperly muxed, but the results are always in favor of Matroska even when large padding and tags are left in the file.

Advanced Stream

The lowest bitrate video is 530 kbps according to the manifest and 369 kbps according to MediaInfo. I remuxed it with mkvmerge, then ran it through mkclean. Here are the resulting file sizes:
  • 27 672 619 original fMP4 with H264
  • 27 449 794 mkvmerge with default options (we win already)
  • 27 447 068 mkclean with default options
  • 27 439 197 mkclean with --live
  • 27 357 090 mkclean with --remux --optimize
  • 27 349 220 mkclean with --remux --optimize --live
The normal usage when preparing a file for streaming would be mkclean with --remux --optimize, and that gives a 1.1% size advantage that could be better used for the codec. That stream even includes checksums and tags, and is fully seekable.

Advanced Stream HEVC

Here Matroska doesn't have the advantage of using Header Compression as with H264, which saves 3 bytes per frame as they are always the same. The 145 kbps is also closer to the limit of everyday files.
  • 11 492 052 original fMP4 with HEVC
  • 11 410 786 mkvmerge with default options (we win already)
  • 11 407 257 mkclean with default options
  • 11 371 266 mkclean with --remux --optimize
But we're still about 1.1% smaller than the same content in fragmented MP4.
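For readers wondering what that Header Compression does, here is a minimal sketch of the idea behind Matroska's header stripping: a byte prefix shared by every frame is stored once for the track and removed from each frame. The helper names and the frame bytes below are invented for the illustration; only the principle comes from the format.

```python
def strip_header(frames, header):
    """Remove a shared prefix from every frame; it is stored once elsewhere."""
    stripped = []
    for frame in frames:
        if not frame.startswith(header):
            raise ValueError("frame does not start with the shared header")
        stripped.append(frame[len(header):])
    return stripped


def restore_header(stripped, header):
    """What a demuxer does: put the shared prefix back in front of each frame."""
    return [header + frame for frame in stripped]


# Made-up data: three frames that all begin with the same 3 bytes.
header = b"\x00\x00\x00"
frames = [header + b"\x2aframe-one", header + b"\x2bframe-two", header + b"\x2cframe-three"]

stripped = strip_header(frames, header)
saved = sum(len(f) for f in frames) - sum(len(f) for f in stripped)
print(f"saved {saved} bytes over {len(frames)} frames")
```

At 3 bytes per frame the saving is small for each frame, but it adds up over a whole file, and it matters more the lower the bitrate is.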

Advanced Stream H264

This is the same as above but in H264 format, so we get to use header compression.
  • 10 663 861 original fMP4 with H264
  • 10 558 115 mkvmerge with default options (we win already)
  • 10 554 002 mkclean with default options
  • 10 498 886 mkclean with --remux --optimize
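If you want to check the percentages yourself, the arithmetic is simple. This small Python sketch uses the file sizes listed in this post (original fMP4 versus mkclean with --remux --optimize); the function name and the sample labels are only there for the example.

```python
def saving_percent(original_size: int, remuxed_size: int) -> float:
    """Size reduction of the remuxed file, as a percentage of the original."""
    return 100.0 * (original_size - remuxed_size) / original_size


# (original fMP4 size, size after mkclean --remux --optimize), in bytes
samples = {
    "Advanced Stream": (27_672_619, 27_357_090),
    "Advanced Stream HEVC": (11_492_052, 11_371_266),
    "Advanced Stream H264": (10_663_861, 10_498_886),
}

for name, (fmp4, mkv) in samples.items():
    print(f"{name}: {fmp4 - mkv} bytes saved, {saving_percent(fmp4, mkv):.2f}% smaller")
```

The H264 version of the last sample comes out a bit better than the HEVC one, which is where header compression shows.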

Conclusion

So Matroska is still the best when it comes to overhead and still keeps all its advantages. Only very, very small fine-tuned files might actually go in favor of fMP4. I'd really like to have such real life samples if you have some.