PseudoTV Development Blog: Video Formats

One of the things that PseudoTV relies on in order to populate the EPG correctly is knowing the runtime of all of your videos. For the most part, XBMC reports these in its JSON calls. Sometimes, though, it will return the oh-so-useful value of zero. This has absolutely no meaning to me: I can't put a video of unknown length in the EPG. So I have to determine the length myself by reading the video file directly. It took me quite a bit of research to figure out how to do this for the 4 file formats that I support (MKV, MP4, FLV, and AVI), so I figured I'd post something as a reference for anyone doing this sort of thing.

All of these video formats are composed of blocks. In some cases, you need to read a block to figure out what the next block will be (and hence the amount of data to read). In other cases, the block has a size field that tells you how big the block is.

Let's start with the easiest of the 4 file types:

MP4
MP4 Parser Python code

Each block in MP4 looks like this (all numbers are stored big-endian):

Block size - 4 bytes. This is the entire size of this block, including the standard fields (size, type, etc).
Block type - 4 bytes. A short string of 4 characters that tells you what type of block it is (header, video, audio, etc).

If the block size is 1, then there is another field here holding the real size:
Real block size - 8 bytes. This just allows the block size to be bigger than 4 bytes can hold.

If the block type is 'uuid', then there is another field here:
Real block type - 16 bytes. Again, this is so the block type can be expanded if necessary.

Block Data - x bytes, based on the block size. Remember, the block size includes the stuff you've already read in.
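The header layout above can be sketched in a few lines of Python. This is only a minimal illustration, assuming Python 3 and a file opened in binary mode; the function name read_box_header is made up for this example and is not taken from the PseudoTV parser. A block size of 0 (which means the block runs to the end of the file) is not handled here.

```python
import struct

def read_box_header(f):
    """Read one MP4 block header from a binary file object.

    Returns (block_type, total_size, header_size), or None at end of file.
    The name and return shape are assumptions made for this sketch.
    """
    start = f.read(8)
    if len(start) < 8:
        return None
    # 4-byte big-endian size followed by the 4-character type
    size, block_type = struct.unpack(">I4s", start)
    header_size = 8
    if size == 1:
        # The real size is stored in the next 8 bytes
        size = struct.unpack(">Q", f.read(8))[0]
        header_size += 8
    if block_type == b"uuid":
        # 16-byte extended type; skipped because we never need it here
        f.read(16)
        header_size += 16
    return block_type, size, header_size
```

The amount of block data left to read (or skip) is then total_size minus header_size.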

To verify that you're reading an MP4 file, the first block type in the file should be 'ftyp'.

In MP4, the Block Data can actually contain other blocks. You must understand the format of that particular block type in order to use this. If you don't know what the block type is, then you can skip over it using the block size. While determining the video duration, though, I generally don't care about most blocks or their sub-blocks. What I'm looking for is the block type 'moov'. This contains the movie information.
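Skipping over blocks you don't care about might look something like the sketch below. It assumes Python 3 and a seekable binary file object; find_box is a hypothetical helper name, and it ignores the 'uuid' extended type since the blocks searched for here never use it.

```python
import struct

def find_box(f, wanted, start, end):
    """Scan the blocks stored between offsets start and end.

    Returns (data_start, data_end) for the first block whose type equals
    `wanted`, or None if it is not found. Hypothetical helper for this sketch.
    """
    pos = start
    while pos + 8 <= end:
        f.seek(pos)
        size, block_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:
            # 64-bit real size follows the type
            size = struct.unpack(">Q", f.read(8))[0]
            header = 16
        if size < header:
            break  # size of 0 or a corrupt block: give up rather than loop
        if block_type == wanted:
            return pos + header, pos + size
        pos += size  # unknown or unwanted block: skip it using its size
    return None
```

Because sub-blocks live inside the block data, the same function can be called again on the range it returns, for example searching for b"mvhd" inside the range returned for b"moov".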

So keep going through blocks until you've found the 'moov' block. After it has been found, look for the sub-block 'mvhd'. In order to do this, just start your block search inside of the "Block Data" section of the 'moov' block. After you've found 'mvhd', the block data format is:

Standard block junk - Between 8 and 32 bytes (see above).
Version - 1 byte.
Flags - 3 bytes. Don't care about these.

If the version is 1:
Created - 8 bytes
Modified - 8 bytes
Scale - 4 bytes
Duration - 8 bytes
otherwise:
Created - 4 bytes
Modified - 4 bytes
Scale - 4 bytes
Duration - 4 bytes

Now that you have the duration, you just need to use the scale value (the number of time units per second) to figure out the length in seconds:
Total Length = duration / scale
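As a rough sketch of the 'mvhd' layout and the division above, assuming Python 3 and that you already hold the bytes that follow the standard block header (mvhd_duration_seconds is a name invented for this example):

```python
import struct

def mvhd_duration_seconds(payload):
    """payload: the bytes of the 'mvhd' block after its standard header.

    Hypothetical helper; returns the length in seconds as a float.
    """
    version = payload[0]  # 1 byte of version, then 3 bytes of flags
    if version == 1:
        # created (8) + modified (8), then scale (4) and duration (8)
        scale, duration = struct.unpack_from(">IQ", payload, 4 + 16)
    else:
        # created (4) + modified (4), then scale (4) and duration (4)
        scale, duration = struct.unpack_from(">II", payload, 4 + 8)
    if scale == 0:
        return 0.0  # avoid dividing by zero on a broken header
    return duration / scale
```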

FLV
FLV Parser Python code

The FLV format is a bit different than the others in that there is no header to just grab the duration from. What you need to do is grab the last video block and just see what the timestamp is on it. So here we go:

The first 3 bytes of an FLV should read 'FLV'. From there, jump to the end of the file. You need to go backwards and find the last video block. To do that, read the final 4 bytes of the file. This tells you how big the last block was, header included. So you go back that amount (on top of the 4 bytes you just read) and read the header of the block:

Tag type - 1 byte. This tells what type of block this is. A value of 9 means video (what we're looking for).
Data size - 3 bytes. The size of the block data. This does not include the 11 bytes of header.
Timestamp - 3 bytes. The time stamp of the block in ms.
Timestamp ext - 1 byte. This is actually the upper 8 bits of the timestamp value.
Stream ID - 3 bytes. Always 0. Don't care about this.

First of all, you need a single real timestamp value. So voltron it up between the timestamp and timestamp ext values (final timestamp = (timestamp ext << 24) | timestamp). Now that you've grabbed the block header, check the tag type: you're looking for a 9, which makes this the final video block. If you have found it, then the length of the video in seconds is:

Total Length = final timestamp / 1000

If you haven't found the video block, then the 4 bytes just before the block header tell you how big the previous block was. Skip backwards that amount (plus those 4 bytes) and see if that's a video block. Keep doing this until you find what you're looking for.
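The backwards walk can be sketched like this. It is a simplified illustration, assuming Python 3, a seekable binary file object, and a standard 9-byte file header followed by a 4-byte zero size field (so the first block starts at offset 13); flv_duration_seconds is a made-up name.

```python
import os
import struct

FIRST_TAG_OFFSET = 13  # assumption: 9-byte FLV header + 4-byte size field

def flv_duration_seconds(f):
    """Walk backwards from the end of the file to the last video block."""
    f.seek(0)
    if f.read(3) != b"FLV":
        return 0.0
    pos = f.seek(0, os.SEEK_END)  # pos always points just past a size field
    while pos > FIRST_TAG_OFFSET:
        f.seek(pos - 4)
        prev_size = struct.unpack(">I", f.read(4))[0]
        if prev_size == 0:
            break
        pos = pos - 4 - prev_size  # start of the block that size describes
        if pos < FIRST_TAG_OFFSET:
            break  # corrupt size value
        f.seek(pos)
        header = f.read(8)
        if len(header) < 8:
            break
        tag_type = header[0]
        timestamp = int.from_bytes(header[4:7], "big")
        timestamp_ext = header[7]
        final_timestamp = (timestamp_ext << 24) | timestamp
        if tag_type == 9:  # video block
            return final_timestamp / 1000.0
    return 0.0
```

Returning 0.0 when nothing is found is just a choice made for the sketch; a caller could treat that as "length still unknown".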

MKV
MKV Parser Python code

The Matroska Media Container is very flexible, but kind of a pain in the ass. Here we go.

Each block starts with an EBML ID. The length of the EBML ID depends on the first nibble of its first byte. Here's the craziness:

Read the first byte. If the most significant bit (MSB) is 1, then you have the whole ID.
If the MSB is zero, read another byte onto the end of your ID. Now check the next bit in the nibble you read up above. If it's 1, then stop. Otherwise read yet another byte and look at the next bit. In summary, the position of the first 1 bit in the nibble gives the length: if the nibble is 8 or higher (1xxx), the number of bytes in the ID is 1. If it's 4 to 7 (01xx), then the number of bytes is 2. A nibble of 2 or 3 (001x) is 3 bytes, and 1 (0001) is 4 bytes. The marker bits stay in place as part of the ID.

Good times. The first EBML ID of the entire file should be 0x1A45DFA3. If you don't see that, then you don't have an MKV.
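Here is one way the ID rule could be written down, as a small sketch assuming Python 3 and a binary file object (read_ebml_id is a name chosen for this example):

```python
def read_ebml_id(f):
    """Read one EBML ID and return it as an integer, or None at end of file."""
    first = f.read(1)
    if not first:
        return None
    byte = first[0]
    length = 1
    mask = 0x80
    # Walk down the top nibble until the first 1 bit is found
    while length <= 4 and not (byte & mask):
        mask >>= 1
        length += 1
    if length > 4:
        raise ValueError("first nibble is 0, not a valid EBML ID")
    rest = f.read(length - 1)
    # The marker bits are kept, so 1A 45 DF A3 becomes 0x1A45DFA3
    return int.from_bytes(first + rest, "big")
```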

Next is the data size of the block. This is pretty much the same thing as the ID, except that the whole first byte (not just the first nibble) denotes how long the data size is, so it can be up to 8 bytes long. In this case, though, don't count the 1 as part of the data size. For example, if you see:

1000 0010

Then your data size is 2. Since there is a 1 as the MSB, you only read 1 byte. You then mask that 1 out, so you're left with a value of 2. Hopefully this makes sense...

Finally, the block has the data portion. The data size field does not include the bytes for itself or the EBML ID, so just read data size bytes to get the block data.
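The data size field can be read with nearly the same loop as the ID, the difference being that the marker bit is masked off. A minimal sketch, assuming Python 3 and a binary file object (read_ebml_size is a hypothetical name):

```python
def read_ebml_size(f):
    """Read one EBML data size field and return its value, or None at EOF."""
    first = f.read(1)
    if not first:
        return None
    byte = first[0]
    length = 1
    mask = 0x80
    # The first 1 bit in the first byte gives the field length (1 to 8 bytes)
    while length <= 8 and not (byte & mask):
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("first byte is 0, not a valid data size")
    value = byte & (mask - 1)  # drop the marker bit, keep the bits below it
    for extra in f.read(length - 1):
        value = (value << 8) | extra
    return value
```

With the example byte 1000 0010 from above, this reads a single byte and returns 2.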

So skip through blocks until you find a block with an EBML ID of 0x18538067 (the Segment). Once you've found it, you need to look for the proper sub-block. This part is similar to the embedded blocks in MP4: sub-blocks are found by looking into the block data itself. Find the sub-block with the EBML ID of 0x1549A966 (the segment Info block). This is the video header block.

Each field inside of the video header is actually contained in a sub-block of the header itself. So search inside of the block data of the video header for another block with the ID of 0x2AD7B1. This is the timecode scale: the number of nanoseconds in one timecode unit. It's an unsigned integer, so just read the number of bytes in the data size field to get it. You also need the duration field, which is again in the video header, with a block ID of 0x4489. It will be either a 4-byte float or an 8-byte double, depending on the data size.

Whew, so now you have the timecode scale and duration of the video. The video duration in seconds:
Total Length = (duration * timecode scale) / 1,000,000,000
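Putting the two fields together could look like the sketch below. It assumes Python 3 and that you already have the raw block data of the 0x2AD7B1 and 0x4489 sub-blocks as bytes; mkv_duration_seconds is a name invented for this example.

```python
import struct

def mkv_duration_seconds(timecode_data, duration_data):
    """Combine the raw timecode and duration block data into seconds.

    Both arguments are the raw bytes of the respective sub-blocks.
    """
    # The timecode value is a big-endian unsigned integer of variable length
    timecode = int.from_bytes(timecode_data, "big")
    if len(duration_data) == 4:
        duration = struct.unpack(">f", duration_data)[0]  # float
    elif len(duration_data) == 8:
        duration = struct.unpack(">d", duration_data)[0]  # double
    else:
        raise ValueError("duration must be 4 or 8 bytes")
    return duration * timecode / 1_000_000_000
```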

AVI
AVI Parser Python code

This is probably one of the more common video formats right now. Also, one of the more annoying to parse. There are 2 types of items in this file: chunks and lists. A chunk is just some data, while a list is a collection of other chunks and lists. Whether a block is a list or chunk is determined by a 4 byte string, similar to the block type in MP4. This type is called the fourCC value. If the fourCC value is RIFF or LIST then you have a list on your hands, otherwise it's a chunk.

Since a list is the first thing you'll get in an AVI file, here's the format (AVI numbers are little-endian, unlike MP4 and FLV):
Initial fourCC - 4 bytes. This will be either RIFF or LIST for a list.
Data size - 4 bytes. The total size of everything in the list after this field, including the 4-byte fourCC that comes next.
FourCC - 4 bytes. This is the real fourCC of the list. In the case of the first list of the file, this is "AVI " (note the space at the end, to make it 4 bytes).
Data - x bytes. The amount is the "Data size" value minus the 4 bytes of the fourCC above.

The format for a chunk:
FourCC - 4 bytes. This is the same as above, but for a chunk this will NOT be RIFF or LIST.
Data size - 4 bytes. Again, the size of the data inside of the chunk. This value does not include these chunk header bytes.
Data - x bytes. The amount is the "Data size" value. If that value is odd, a single padding byte follows the data so the next item starts on an even offset.
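Telling lists and chunks apart can be sketched as follows, assuming Python 3 and a binary file object. The function name read_riff_item and its return shape are made up for this example, and the sketch assumes the list's size field counts the 4-byte list fourCC.

```python
import struct

def read_riff_item(f):
    """Read the header of the next list or chunk.

    Returns (kind, fourcc, data_size) where kind is "list" or "chunk" and
    data_size is the number of bytes still to read (or skip) after the
    header, or None at end of file.
    """
    head = f.read(8)
    if len(head) < 8:
        return None
    # AVI stores its numbers little-endian
    fourcc, size = struct.unpack("<4sI", head)
    if fourcc in (b"RIFF", b"LIST"):
        list_type = f.read(4)  # the real fourCC of the list
        # Assumption: size includes the 4 bytes of list_type just read
        return "list", list_type, size - 4
    return "chunk", fourcc, size
```

For a list you would keep reading items inside its data; for a chunk you would either parse the data or seek past it.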

The first thing in the file is a list with the fourCC of "AVI ". The data size of this list will just be the size of the file...not helpful. So start reading inside of that all-encompassing list's block data.

Inside of the "AVI " list you should find another list with a fourCC of "hdrl". Inside of the "hdrl" list is a chunk with the fourCC of "avih". This is the header you're looking for. Excellent. Let's look at the format for the data inside of "avih":

Micro-sec per frame - 4 bytes.
Max bytes per sec - 4 bytes.
Padding Granularity - 4 bytes.
Flags - 4 bytes.
Total frames - 4 bytes.
Initial frame - 4 bytes.
Number of streams - 4 bytes.
Suggested buffer size - 4 bytes.
Width - 4 bytes.
Height - 4 bytes.

For our purposes, we don't care about most of this. One thing we do care about is the number of streams. What needs to be done is to go through all of the streams until we find the video one. So keep getting data from inside of "hdrl". For each stream you'll get a list of type "strl", which contains the description of that one stream. Inside of each "strl" list, you'll first get a header chunk with the fourCC of "strh":

Type - 4 bytes.
Handler - 4 bytes.
Flags - 4 bytes.
Priority - 2 bytes.
Language - 2 bytes.
Initial Frame - 4 bytes.
Scale - 4 bytes.
Rate - 4 bytes.
Start - 4 bytes.
Length - 4 bytes.
Suggested Buffer - 4 bytes.
Quality - 4 bytes.
Sample size - 4 bytes.

Again, don't care about this stuff for the most part. After this stream header, there may or may not be some number of other chunks related to this stream (the stream format, for example)...ignore them. What you're looking for is a stream header where the Type field is "vids". This is the video header. Ah ha! So take the video stream header. Rate / Scale gives you the frames per second and Length is the number of frames, so the duration of the file in seconds is:

Total Length = Length / (Rate / Scale)
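The stream header fields and the formula above can be sketched like this, assuming Python 3 and that you already hold the data bytes of one stream header chunk (avi_stream_duration is a name made up for this example):

```python
import struct

def avi_stream_duration(strh_data):
    """strh_data: the data bytes of one stream header chunk.

    Returns the length in seconds for a video stream, or None if the
    stream is not video. Hypothetical helper for this sketch.
    """
    if strh_data[0:4] != b"vids":
        return None
    # Field offsets: type 0, handler 4, flags 8, priority 12, language 14,
    # initial frame 16, then scale 20, rate 24, start 28, length 32
    scale, rate, start, length = struct.unpack_from("<IIII", strh_data, 20)
    if scale == 0 or rate == 0:
        return 0.0  # avoid dividing by zero on a broken header
    return length / (rate / scale)
```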

Hopefully this has all been helpful to you!

Wednesday, June 8, 2011
