pyid3lib 0.5

This is a short tutorial on using the pyid3lib module. For instructions on installing it, see the README file included in the distribution.

Getting started

You start by using the tag function to create a tag object for a given MP3 file.

>>> import pyid3lib
>>> x = pyid3lib.tag( 'track01.mp3' )
>>> x
<pyid3lib.tag object at 0x8155b70>
>>>

You can read and change the data contained in this object. To write any changes back into the original MP3 file, use the update() method.

>>> x.update()
>>>

There are two ways to access the data: the basic way, and the advanced way.

The basic way

The most common pieces of tag data can be accessed via attributes on the tag object. For example:

>>> x = pyid3lib.tag( 'track01.mp3' )
>>> x.artist
'Aphex Twin'
>>> x.title
'Jynweythek'
>>> x.year
2001
>>> x.track
(1, 15)
>>>

You can also assign new values to them:

>>> x.artist = 'Meat Beat Manifesto'
>>>

Don't forget that you have to call update() to actually write changes out to the file!
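
If you're changing several attributes at once, it's natural to set them all and then call update() a single time. Here is a minimal sketch of that pattern. The helper name apply_attributes and the StandInTag class are made up for this illustration (they are not part of pyid3lib); the stand-in only exists so the example runs without the module or an MP3 file. With the real module you would pass in the object returned by pyid3lib.tag instead.

```python
class StandInTag(object):
    """Hypothetical stand-in for a pyid3lib tag object (not part of pyid3lib)."""

    def __init__(self):
        self.updates = 0

    def update(self):
        # The real tag object writes to the MP3 file here; this one only counts calls.
        self.updates += 1


def apply_attributes(tag, attrs):
    """Assign every name/value pair in attrs to tag, then call update() once."""
    for name, value in sorted(attrs.items()):
        setattr(tag, name, value)
    tag.update()
    return tag


demo = apply_attributes(StandInTag(), {'artist': 'Aphex Twin', 'title': 'Jynweythek'})
```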

Most attributes require a string value; assigning anything else raises ID3Error:

>>> x.artist = 12
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
pyid3lib.ID3Error: 'artist' attribute must be string
>>>

Deleting an attribute (with "del x.attr") or assigning None to it deletes the corresponding piece of tag data.

Here are all the attributes that take strings:

album	artist	band
bpm	composer	conductor
contentgroup	contenttype	copyright
date	encodedby	encodersettings
fileowner	filetype	initialkey
involvedpeople	isrc	language
leadartist	lyricist	mediatype
mixartist	netradioowner	netradiostation
origalbum	origartist	origfilename
origlyricist	origyear	playlistdelay
publisher	recordingdates	size
songlen	subtitle	time
title	wwwartist	wwwaudiofile
wwwaudiosource	wwwcommercialinfo	wwwcopyright
wwwpayment	wwwpublisher	wwwradiopage

Note that artist is a synonym for leadartist — the two attributes modify the same underlying piece of data.

There are a few other attributes that get special processing. "year" is treated as an int, even though it is stored in the tag as a string. You can assign either a string or an int to it, but its value will always be returned as an int.

"tracknum" and "partinset" can be either a 1- or 2-tuple of ints. Setting tracknum to (4,17) indicates that this is track 4 of 17 on the original album. partinset works similarly when an album is divided into several pieces of media (e.g., a double-CD album). "track" is a synonym for "tracknum".

Here are some examples of using the track attribute:

>>> x.track = '4'          # all these are equivalent ways 
>>> x.track = 4            # of saying "track 4"
>>> x.track = (4,)
>>>

>>> x.track = (4,17)       # these two are equivalent ways
>>> x.track = '4/17'       # of saying "track 4 of 17"
>>>

>>> x.track = 9            # no matter how it is set, the value
>>> x.track                # is returned as a 1- or 2-tuple of ints.
(9,)
>>> x.track = '10/12'
>>> x.track
(10, 12)
>>>
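
To make the conversion rules concrete, here is a pure-Python sketch of the normalization the examples above show. It is only a model of the observed behavior, not pyid3lib's actual code; the function name normalize_track is invented for this illustration, and raising ValueError on bad input is this sketch's own choice.

```python
def normalize_track(value):
    """Return value as a 1- or 2-tuple of ints, following the examples above."""
    if isinstance(value, str):
        parts = value.split('/')      # '4' -> ['4'], '4/17' -> ['4', '17']
    elif isinstance(value, int):
        parts = [value]               # 4 -> [4]
    else:
        parts = list(value)           # (4,) or (4, 17)
    if len(parts) not in (1, 2):
        raise ValueError('expected a track number or a (track, total) pair')
    return tuple(int(part) for part in parts)
```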

The advanced way

There are some kinds of tag data you can't access via the basic method. Most of them are pretty obscure tags that don't seem to be used much, but two important ones are the "Comments" frame and the "Attached picture" frame. (FYI, "Attached picture" is what is shown by Windows Media Player if you select the "Album art" visualization.)

To get at all the data, you have to use a different access method. First, I'll give a very brief introduction to how ID3v2 tags are structured (version 2.3 and higher, at least).

An ID3 tag consists of a header, plus one or more frames. Each frame has a four-character frame ID identifying what's stored in that frame, plus some data. There are a bunch of standardized frame IDs defined in the standard at id3.org. For instance, the "TALB" frame is used to store the name of the album that the track came from. The "TPE1" frame stores the name of the artist, and so on.

pyid3lib "tag" objects support Python's sequencing and iteration protocols. Accessing an item of this sequence gives you a dictionary with the contents of the corresponding frame. For instance:

>>> x = pyid3lib.tag( 'track01.mp3' )
>>> for i in x: print i     # iterate over all frames, printing them out
... 
{'text': 'Aphex Twin', 'textenc': 0, 'frameid': 'TPE1'}
{'text': 'Drukqs [1/2]', 'textenc': 0, 'frameid': 'TALB'}
{'text': '1/2', 'textenc': 0, 'frameid': 'TPOS'}
{'text': '2001', 'textenc': 0, 'frameid': 'TYER'}
{'text': 'Jynweythek', 'textenc': 0, 'frameid': 'TIT2'}
{'text': '1/15', 'textenc': 0, 'frameid': 'TRCK'}
{'text': '(26)', 'textenc': 0, 'frameid': 'TCON'}
{'text': '143386', 'textenc': 0, 'frameid': 'TLEN'}
>>> x[4]                    # access a single frame 
{'text': 'Jynweythek', 'textenc': 0, 'frameid': 'TIT2'}
>>> x[:2]                   # access a slice of frames 
[{'text': 'Aphex Twin', 'textenc': 0, 'frameid': 'TPE1'},
 {'text': 'Drukqs [1/2]', 'textenc': 0, 'frameid': 'TALB'}]
>>>

You can modify the tag in all the usual ways you can manipulate a list: assign to an element or slice, or via the append, extend, insert, pop, and remove methods. In each case the thing you put into the tag must be a dictionary, and the dictionary must contain a 'frameid' key whose value is a legal frame ID. (Of course, extend and slice assignment both require a sequence of legal dictionaries.)

>>> d = { 'frameid' : 'TPE1', 'text' : 'New Artist Name' }
>>> x[0] = d
>>> x.pop()
{'text': '143386', 'textenc': 0, 'frameid': 'TLEN'}
>>> [i['frameid'] for i in x]
['TPE1', 'TALB', 'TPOS', 'TYER', 'TIT2', 'TRCK', 'TCON']
>>>

The methods index and remove, which search the sequence for a value, take a frame ID string as their argument.

>>> i = x.index( 'TIT2' )
>>> print i
4
>>> x[i]
{'text': 'Jynweythek', 'textenc': 0, 'frameid': 'TIT2'}
>>>
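
index gives you a single position, so if a tag might hold several frames with the same ID, one option is to iterate and collect them yourself. This is a small sketch; frames_with_id is a hypothetical helper name, and a plain list of dictionaries stands in for the tag here so the example runs without pyid3lib. It relies only on the iteration support described above.

```python
def frames_with_id(tag, frameid):
    """Return every frame dictionary in tag whose 'frameid' matches."""
    return [frame for frame in tag if frame['frameid'] == frameid]


# A plain list standing in for a tag object.
frames = [
    {'text': 'Aphex Twin', 'textenc': 0, 'frameid': 'TPE1'},
    {'text': 'Drukqs [1/2]', 'textenc': 0, 'frameid': 'TALB'},
    {'text': 'Jynweythek', 'textenc': 0, 'frameid': 'TIT2'},
]
titles = frames_with_id(frames, 'TIT2')
```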

It's important to remember that the dictionaries you get out of a tag object are merely copies of the frame data — modifying the dictionary does not modify the tag! To change the tag, you have to explicitly assign back into it. For instance:

>>> x.title                       # here is the track's title
'Jynweythek'
>>> d = x[x.index('TIT2')]        # access the corresponding frame
>>> d
{'text': 'Jynweythek', 'textenc': 0, 'frameid': 'TIT2'}
>>> d['text'] = 'New Title'       # modify the returned dictionary
>>> x.title                       # see? the tag data hasn't changed.
'Jynweythek'
>>> x[x.index('TIT2')] = d        # set the frame based on the modified dictionary
>>> x.title                       # now the tag data reflects the change.
'New Title'
>>>
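
The read, modify, assign-back steps can be wrapped in a small helper. The sketch below is one way to do it; change_frame and StandInTag are names invented for this illustration. The stand-in imitates just two behaviors described above (index takes a frame ID string, and item access hands out a copy) so the example runs without pyid3lib. Raising ValueError when the frame is missing is an assumption made for the stand-in.

```python
class StandInTag(list):
    """Hypothetical stand-in for a tag: a list of frame dicts that returns copies."""

    def index(self, frameid):
        for position, frame in enumerate(self):
            if frame['frameid'] == frameid:
                return position
        raise ValueError('no frame with ID %r' % (frameid,))

    def __getitem__(self, position):
        # Integer positions only; return a copy, as the tag object does.
        return dict(list.__getitem__(self, position))


def change_frame(tag, frameid, **fields):
    """Update fields of the first frame with the given ID and store it back."""
    position = tag.index(frameid)
    frame = tag[position]        # a copy; editing it does not touch the tag
    frame.update(fields)
    tag[position] = frame        # assign back so the tag sees the change
    return frame


tag = StandInTag([{'text': 'Jynweythek', 'textenc': 0, 'frameid': 'TIT2'}])
change_frame(tag, 'TIT2', text='New Title')
```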

Modifying the tag through attributes works on exactly the same data as modifying it through the sequence operations. The attributes are provided simply for convenience; it's easier to remember names like "artist" than sometimes-cryptic frame IDs like "TPE1".

Setting the value of an attribute first deletes all frames with the corresponding frame ID, then appends a new frame with the new value. So saying:

x.artist = 'Aphex Twin'

is roughly equivalent to:

try:
    while 1: x.remove( 'TPE1' )
except ValueError:
    pass
x.append( { 'frameid' : 'TPE1', 'text' : 'Aphex Twin' } )

What goes in the dictionary?

pyid3lib has a query function that looks up a frame ID and gives you a short description, plus a tuple of the keys it looks for in a dictionary (in addition to the required 'frameid' key):

>>> pyid3lib.query( 'TALB' )
(24, 'Album/Movie/Show title', ('textenc', 'text'))
>>> pyid3lib.query( 'WOAR' )
(69, 'Official artist/performer webpage', ('url',))
>>> pyid3lib.query( 'APIC' )
(2, 'Attached picture', ('textenc', 'imageformat', 'mimetype', 'picturetype', 'description', 'data'))
>>> pyid3lib.query( 'QQQQ' )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
pyid3lib.ID3Error: frame ID 'QQQQ' is not supported by id3lib
>>>

The return value from query is a tuple of three values. The first can be ignored; it's used internally. The second is a string with a brief description of that frame's purpose. The third is a tuple of strings naming the individual fields of that frame. Hopefully many of these will be self-explanatory; for more information, look at the standard.
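
One use for the field tuple is catching misspelled keys before you put a dictionary into a tag. The sketch below assumes you pass in the tuple returned by pyid3lib.query; unexpected_keys is a hypothetical helper, and the query result is written out by hand (copied from the TALB example above) so the code runs without the module.

```python
def unexpected_keys(frame, query_result):
    """Return the keys of frame that query does not list for this frame ID."""
    _internal, _description, fields = query_result
    allowed = set(fields) | {'frameid'}
    return sorted(key for key in frame if key not in allowed)


# Hand-copied result of pyid3lib.query('TALB') from the session above.
talb_info = (24, 'Album/Movie/Show title', ('textenc', 'text'))

good = {'frameid': 'TALB', 'text': 'Drukqs [1/2]'}
typo = {'frameid': 'TALB', 'txt': 'Drukqs [1/2]'}
```

Here unexpected_keys(good, talb_info) comes back empty, while the misspelled 'txt' key is reported for the second dictionary.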

Attached pictures

Here's how to embed a JPEG image in the tag. Assume you have the image in a file "pic.jpg":

>>> f = open( 'pic.jpg', 'rb' )
>>> d = { 'frameid' : 'APIC', 'mimetype' : 'image/jpeg',
...       'description' : 'A pretty picture.',
...       'picturetype' : 3,
...       'data' : f.read() }
>>> f.close()
>>> x.append( d )
>>>

See? Nothing to it. The value 3 assigned to 'picturetype' identifies the picture as the front of the album cover. For a list of all the picture types, see section 4.14 of the standard.

To extract an embedded picture, you do pretty much the opposite thing:

>>> d = x[x.index('APIC')]          # this finds the first embedded picture in the tag
>>> d['mimetype']
'image/jpeg'
>>> f = open( 'output.jpg', 'wb' )
>>> f.write( d['data'] )
>>> f.close()
>>>

For maximum compatibility, you should limit your pictures to JPEGs and PNGs (mimetypes "image/jpeg" and "image/png", respectively). Most software that reads picture tags will be able to support at least these two image formats (and your software should, too!).
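
When extracting pictures, you may want to choose the output file's extension from the frame's 'mimetype' instead of hard-coding ".jpg". Here is a small sketch that handles only the two recommended formats. The names EXTENSIONS and picture_filename are made up for this example, and rejecting other mimetypes with ValueError is this sketch's choice, not pyid3lib behavior.

```python
# Only the two formats recommended above.
EXTENSIONS = {'image/jpeg': '.jpg', 'image/png': '.png'}


def picture_filename(frame, stem):
    """Build an output filename for an APIC frame dictionary from its mimetype."""
    mimetype = frame['mimetype']
    if mimetype not in EXTENSIONS:
        raise ValueError('unsupported picture type: %r' % (mimetype,))
    return stem + EXTENSIONS[mimetype]


# A hand-built frame dictionary with empty image data, for illustration only.
frame = {'frameid': 'APIC', 'mimetype': 'image/png', 'picturetype': 3, 'data': b''}
name = picture_filename(frame, 'cover')
```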

Known issues

To be fixed before I can call it version 1.0:

Unknown frames. If you read in a tag with frames that id3lib doesn't parse, and then write it back out again, the unknown frames are dropped — even if you make no other changes to the tag data. Even this:
```
>>> x = pyid3lib.tag( 'song.mp3' )
>>> x.update()
```
causes unknown frames to be stripped from the tag. This is a limitation of the underlying id3lib library.

Unicode support. There is none to speak of yet. All string data is assumed to be ISO-8859-1 (i.e., extended ASCII). Accordingly, the "textenc" field of all frames should be set to zero.
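
Until that changes, it can help to check that a piece of text fits in ISO-8859-1 before you put it in a tag. The following is a minimal sketch of such a check, written for text (Unicode) strings; is_latin1 is a hypothetical helper name, not something pyid3lib provides.

```python
def is_latin1(text):
    """Return True if text can be encoded as ISO-8859-1, False otherwise."""
    try:
        text.encode('iso-8859-1')
    except UnicodeError:
        return False
    return True


fits = is_latin1(u'Bj\xf6rk')                    # o-umlaut is in ISO-8859-1
does_not_fit = is_latin1(u'\u65e5\u672c\u8a9e')  # CJK characters are not
```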