Wednesday, November 17, 2010

Podcast file name convention

What's in a name? If you're publishing a podcast, the file name you use makes a difference. File names must be unique; otherwise your subscribers will overwrite older shows that they may not have listened to yet. The file name should also distinguish your show from the subscriber's other subscriptions, for much the same reason. So don't use 'episode'. Someone else may do that. Better is something derived from the show's name: All In The Mind becomes aim, for example. And this show-unique part should come first in the name, so that a subscriber using a sorted list sees all your shows together.

After the show title part of the name should come a numeric part that makes each episode unique. One way to do that is to number them. The first show could be '1', and the second could be '2'. But such numbers should have leading zeros, so that in a sorted list the lexicographic order is also the sequence order. So use '01' at least, so that the first nine episodes sort properly with the tenth. It may seem like arrogance or optimism to use '001' or '0001' for your first show, since that suggests you expect more than a hundred or a thousand shows. But there are plenty of shows out there with more than one hundred episodes already, and even some monthly shows are getting close.
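A quick sketch in Python shows why the leading zeros matter. The aim prefix comes from the example above; the helper name episode_name, the three-digit width, and the episode numbers are only assumptions for illustration.

```python
def episode_name(prefix: str, number: int, width: int = 3) -> str:
    """Build a name like 'aim007.mp3' with a zero-padded episode number."""
    return f"{prefix}{number:0{width}d}.mp3"

# Without padding, a plain string sort puts episode 10 before episode 2.
unpadded = sorted(f"aim{n}.mp3" for n in (1, 2, 10))

# With padding, string order and episode order agree.
padded = sorted(episode_name("aim", n) for n in (1, 2, 10))

print(unpadded)  # ['aim1.mp3', 'aim10.mp3', 'aim2.mp3']
print(padded)    # ['aim001.mp3', 'aim002.mp3', 'aim010.mp3']
```

The width is the one decision to make up front: changing it later changes the pattern, and with it the sort order.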

Another way to do this is to encode the date. Some shows use a 2 digit year, 2 digit month, and 2 digit day. For example, 100823 is 2010, August 23rd. This has the advantage that the lexicographic sort order is also the date order, and the sequence won't break for another 90 years. Of course, a 4 digit year such as 20100823 also sorts properly, and won't break the sort order for nearly eight thousand years. Either is fine. But I find that the four digit year is easier for a human to read. With a two digit year, one hopes that it's a date, and one hopes that it's in year, month, day order for sorting, but one must still guess that 10 is the year and not October. Dates are written in every permutation of order in different cultures. IMO, the military gets it right with YYYYMMDD, which is also the ISO 8601 basic format. While 2010Aug23 may be easier for a human to read, it fails the sort order requirement, and is therefore unacceptable.
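Here is a small Python sketch of the difference between YYYYMMDD and a month-name style. The helper names dated_name and month_name_style and the three show dates are hypothetical; only the aim prefix and the two formats come from the text above.

```python
from datetime import date

# Fixed English abbreviations, so the result does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def dated_name(prefix: str, day: date) -> str:
    """Build a name like 'aim20100823.mp3' (YYYYMMDD)."""
    return f"{prefix}{day:%Y%m%d}.mp3"

def month_name_style(prefix: str, day: date) -> str:
    """Build a name like 'aim2010Aug23.mp3', for comparison only."""
    return f"{prefix}{day.year}{MONTHS[day.month - 1]}{day.day:02d}.mp3"

# Three hypothetical shows, listed in chronological order.
shows = [date(2010, 7, 26), date(2010, 8, 23), date(2010, 9, 20)]

numeric = [dated_name("aim", d) for d in shows]
abbreviated = [month_name_style("aim", d) for d in shows]

print(sorted(numeric) == numeric)  # True: string order is date order
print(sorted(abbreviated))         # ['aim2010Aug23.mp3', 'aim2010Jul26.mp3', 'aim2010Sep20.mp3']
```

With month names, the August show sorts ahead of the July show, which is exactly the failure described above.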

Underscores are optional in file names. aim100823.mp3 is OK. But they must be used consistently. You can't use aim100823.mp3 one week and aim_100830.mp3 the next week. Mixing the two breaks the sort order. It's best to name these things with a script. Does it matter if the date in the file name is the recording date or the publish date? Probably not. There should be a publishing script that gets all the RSS details right. If there is, it could get the file name right as one of those details.
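A publishing script could guard against this kind of slip with a simple pattern check. This is only a sketch: the pattern below assumes the aim prefix, a six digit date and no underscore, and the function name inconsistent and the third file name are made up for the example.

```python
import re

# Assumed convention for this sketch: prefix, 6-digit date, no underscore.
PATTERN = re.compile(r"aim\d{6}\.mp3")

def inconsistent(names):
    """Return the names that do not match the one agreed pattern."""
    return [n for n in names if not PATTERN.fullmatch(n)]

published = ["aim100823.mp3", "aim_100830.mp3", "aim100906.mp3"]

print(inconsistent(published))  # ['aim_100830.mp3']

# The stray underscore pushes the August 30 show after the September 6 show.
print(sorted(published))  # ['aim100823.mp3', 'aim100906.mp3', 'aim_100830.mp3']
```

Running a check like this before each upload catches the odd file name before any subscriber downloads it.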

Speaking of underscores, are there characters that should not go into file names? Yes. No colons (:), no slashes (/), and no backslashes (\), because these characters are directory separators on various operating systems. But really, one should stick to alphanumerics, hyphen (-) and underscore (_). In command line environments, parentheses, dots (.), quotes ("'`), brackets (<{[]}>), pipes (|) and so on (~!@#$%^&*+=;?) can all be interpreted, making them difficult (but almost never impossible) to cope with. Simply avoid these.
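The conservative rule is easy to enforce in a script. The sketch below assumes a whitelist of letters, digits, hyphen and underscore, plus a single dot before the extension; the names is_safe and make_safe and the sample title are hypothetical.

```python
import re

# Letters, digits, hyphen and underscore, then one dot and an extension.
SAFE = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")

def is_safe(name: str) -> bool:
    """True if the whole file name uses only the conservative character set."""
    return SAFE.fullmatch(name) is not None

def make_safe(text: str) -> str:
    """Replace each run of other characters with one hyphen."""
    return re.sub(r"[^A-Za-z0-9_-]+", "-", text).strip("-")

print(is_safe("aim20100823.mp3"))             # True
print(is_safe("aim: what/why?.mp3"))          # False
print(make_safe("Memory & Sleep (part 1)"))   # Memory-Sleep-part-1
```

make_safe is meant for the descriptive part of the name only; the extension is added afterwards.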

After the sequence number or date, a very brief description of the show may be included. This information can very easily be included in ID3 tags within the file, and it should be there. But one or two words in the file name will often help the subscriber. Don't make it too long. Windows may have long file names but DOS does not. And, like it or not, there are mp3 players out there that only handle 8.3 file names. So long file names show up as micros~1.mp3 on these players.

What can be included as text within mp3 files? Some of the shows I listen to have complete transcripts. It's incredible.

What if you got it wrong? Should one rename old shows? Absolutely not. Once you've made an error, changing an old file name risks having thousands of subscribers' podcast clients download these old shows again.

This podcast file name convention should also work for any other RSS-published material, such as a blog. However, for blogs, the file name length does not have to observe the 8.3 convention. Short file names have mostly gone the way of the dinosaurs. You do use Rock Ridge extensions on your CDs, right?
