This is part 3 of my explanation of my gmail_imap (python) example library, please refer to parts 1, 2, 4.
Next we want to load brief forms of our messages so that we could, say, display a list of messages in our inbox.
gmail.messages.process("INBOX")
print gmail.messages
Unfortunately, the code is a bit too long to post in full, so instead, pull the code for this file up here, and I’ll post the important snippets as we go along.
Reading through the code, the first weird bit is:
self.metadataExtracter = re.compile(r'(?P<id>\d*) \(UID (?P<uid>\d*) FLAGS \((?P<flags>.*)\)\s')
What huh…?
This is an example of an incredibly powerful and incredibly irritating tool known as a regular expression parser. A full rant on what regular expressions are and why I hate them would probably run a bit long. But, to amuse myself, some quick examples drawn from around the net:
“RegExp” : “Translation”
- “/\s.*\s/” : “Match any string of characters with two spaces around it”
- “/\S.*\S/” : “Match any string of characters with two NON-space characters around it, obviously”
- “[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}” : “Match an email address ... I think”
- “^([A-Z]{3}\s?(\d{3}|\d{2}|\d{1})\s?[A-Z])|([A-Z]\s?(\d{3}|\d{2}|\d{1})\s?[A-Z]{3})$” : “You don’t want to know.”
It’s basically absurd, impossible to validate, and just plain unreadable. It’s also the only way to do string parsing in a decent fashion.
The line above takes the string burped out by the IMAP server, something like:
“55 (UID 82 FLAGS (\Seen) BODY[HEADER.FIELDS (SUBJECT FROM)] {65}”
and turns it into a python dictionary, like:
{ 'id': '55', 'uid': '82', 'flags': '\\Seen' }
To further your education, let me explain in words what each part of the regexp does:
- (?P<id>\d*) : “create a python dictionary entry ‘id’ and fill it with the first number you find”
- \(UID (?P<uid>\d*) : “create entry ‘uid’ and fill it with the first number you find after the string ‘(UID ’”
- FLAGS \((?P<flags>.*)\)\s : “fill entry ‘flags’ with anything inside these two strings: ‘FLAGS (‘ and ‘) ‘ “.
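If you want to poke at the pattern without an IMAP server handy, here is a minimal sketch that runs it against the sample line from above. The parse_metadata function is my own stand-in for illustration, not the library's parseMetadata method, and it is written so it runs on Python 3 as well.

```python
import re

# Same pattern as metadataExtracter, with all three named groups spelled out.
METADATA_RE = re.compile(r'(?P<id>\d*) \(UID (?P<uid>\d*) FLAGS \((?P<flags>.*)\)\s')


def parse_metadata(line):
    """Return a dict with 'id', 'uid' and 'flags', or None if the line does not match.

    Hypothetical helper for illustration only.
    """
    match = METADATA_RE.match(line)
    # groupdict() is what turns the (?P<name>...) groups into dictionary entries.
    return match.groupdict() if match else None


sample = r'55 (UID 82 FLAGS (\Seen) BODY[HEADER.FIELDS (SUBJECT FROM)] {65}'
print(parse_metadata(sample))
```

Note that the greedy .* in the flags group only stops where it does because the next ‘)’ in the line is followed by ‘]’ rather than whitespace; a differently shaped response could make it swallow more than the flags.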
The worst part about using regexp is that with the wrong sort of programmer around, it becomes a pissing contest where they insist with a straight face that their 180 character monstrosity is both ‘intuitive’ and ‘unlikely to fail’. And I’m the queen of France.
Back to the code,
typ, data = self.server.imap_server.search(None, '(UNDELETED)')
fetch_list = data[0].split()[-10:]  # limit to N (here 10) most recent messages in mailbox
fetch_list = ','.join(fetch_list)
imap_server.search(character-set, search_string) is decently well documented in the python library, or you can always refer to the RFC 3501 docs. If reading an RFC memo doesn’t fill your heart with dread, you haven’t been doing this long enough … or too long, I dunno. In any case, search() hands back a list whose first element is a space-separated string of message IDs, which we split, trim to the last few, and rejoin into a comma delimited string. Once that’s done, you can actually fetch the message excerpt data,
f = self.server.imap_server.fetch(fetch_list, '(UID FLAGS BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE)])')
for fm in f[1]:
    if len(fm) > 1:
        metadata = self.parseMetadata(fm[0])
        headers = self.parseHeaders(fm[1])
imap_server.fetch takes the comma delimited fetch_list we’ve prep’d before, as well as a list of the IMAP metadata and RFC822 headers we want. Note we ask for the headers using BODY.PEEK rather than plain BODY so that fetching does not mark the messages as read (that is, it does not set the \Seen flag).
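The search-then-fetch preparation can be sketched as a small standalone function. The name build_fetch_args and the limit parameter are mine, not the library's, and the bytes handling is an assumption for running this on Python 3, where imaplib returns bytes rather than str.

```python
# The parts we ask for: IMAP metadata plus three headers, fetched with PEEK
# so the messages are not marked as read.
FETCH_PARTS = '(UID FLAGS BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE)])'


def build_fetch_args(search_data, limit=10):
    """Turn the data part of a search() result into arguments for fetch().

    Hypothetical helper: search_data is expected to look like ['1 2 3'] or [b'1 2 3'].
    """
    raw_ids = search_data[0].split()
    # Normalise bytes to str so the join below works on both kinds of input.
    ids = [i.decode('ascii') if isinstance(i, bytes) else i for i in raw_ids]
    # Keep only the most recent `limit` messages (the highest sequence numbers).
    message_set = ','.join(ids[-limit:])
    return message_set, FETCH_PARTS


print(build_fetch_args([b'1 2 3 4 5'], limit=3))
```

The two returned values would then be passed straight through, as in imap_server.fetch(message_set, message_parts).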
Once we’ve got the huge-ass array of messages in f[1] (with f[0] containing the status, ‘OK’ or ‘NO’), we loop through the fetched message (fm) entries. The length test “len(fm) > 1” is there because each message tuple is followed by a stray ‘)’ entry. It looks like a Gmail bug, but it appears to be simply how imaplib hands back the closing parenthesis of each response; either way, those entries carry no headers and have to be skipped. Now we parse the metadata contained in fm[0] and the RFC822 headers in fm[1]. With that, the rest of the code should be fairly readable. We populate a gmail_message object and toss it onto the messages list.
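To make the shape of that loop concrete, here is a rough sketch that walks a hand-made fetch result. Everything in it is an assumption for illustration: the summarize function, the plain dicts standing in for gmail_message objects, and the fake data, which uses str where Python 3's imaplib would give you bytes.

```python
import re
from email.parser import HeaderParser

METADATA_RE = re.compile(r'(?P<id>\d*) \(UID (?P<uid>\d*) FLAGS \((?P<flags>.*)\)\s')


def summarize(fetch_data):
    """Build one small dict per message from the data part of a fetch() result."""
    summaries = []
    for item in fetch_data:
        # Real messages arrive as (metadata, headers) tuples; the stray ')'
        # is a bare string, so checking for a tuple skips it.
        if not isinstance(item, tuple):
            continue
        match = METADATA_RE.match(item[0])
        if match is None:
            continue
        summary = match.groupdict()
        # Let the standard library parse the RFC822 header block.
        headers = HeaderParser().parsestr(item[1])
        summary['from'] = headers['From']
        summary['subject'] = headers['Subject']
        summaries.append(summary)
    return summaries


# Hand-written stand-in for f[1]: one message tuple plus the trailing ')'.
fake_fetch_data = [
    ('55 (UID 82 FLAGS (\\Seen) BODY[HEADER.FIELDS (FROM SUBJECT DATE)] {45}',
     'From: alice@example.com\r\nSubject: Hello\r\n\r\n'),
    ')',
]
print(summarize(fake_fetch_data))
```

Checking for a tuple does the same job as the length test here, since the ‘)’ entry is the only item that is not a two-element tuple.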