I need an implementation of the Knuth-Morris-Pratt algorithm (or similar) that handles streaming input. I’m pretty sure I’ve implemented this before, and I thought I might find my code in an old .zip file I’d e-mailed to someone.
But Gmail refused to let me download the attachment directly, I guess because it contains a .jar file. I was able to get the full contents of the e-mail via the “Download Original” link, though, which gave me a .eml file.
Peeking at the contents of it, I could see it had a pretty simple format:
...
Date: Mon, 9 Aug 2004 22:59:46 -0500
...
Subject: Fwd: latest version of applet
...
Mime-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_97_19043715.1092110386307"
...
------=_Part_97_19043715.1092110386307
Content-Type: application/zip; name=SearchToHTML.zip; x-unix-mode=0644
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="SearchToHTML.zip"
UEsDBAoAAAAAAE...
...NQA9EAAA50wCAAAA
------=_Part_97_19043715.1092110386307--
I’m pretty sure there are existing tools to extract these attachments. In fact, I would’ve used Apple’s Mail program to extract it, except it annoyed me by demanding that I set up an e-mail provider before opening the file.
I decided writing a parser for .eml files would be a good exercise for getting back into Go a bit.
You can see the results here: https://github.com/fadend/eml.
I was a little surprised by how many bumps I hit along the way, given that I wrote a fair amount of Go during my time at Google. On the other hand, I wrote it sporadically, so there was a lot of forgetting between rounds.
I’m reading Ricardo Gerardi’s Powerful Command-Line Applications in Go, which was helpful; I happened across it while looking for something else in the library. I’m enjoying it, and despite the bumps, I had a lot of fun working with Go this time. I’ve shared one other (much larger) command-line utility written in Go before: https://github.com/fadend/go-photos. Now I’ll try to keep up with the language, using it for more command-line utilities where I probably would’ve used Python otherwise. I also have some server use cases in mind.
I did extract that attachment:
% ./eml_dump --input_eml $HOME/Downloads/search_to_html_latest.eml --output_dir $HOME/Downloads/searchtohtml_attachment
And the KMP code isn’t there 🙂 Doh. The closest thing is this comment:
//XXX! If this step were replaced with a look up table,
//SearchSieves would roughly implement the Knuth-Morris-Pratt
//algorithm...
Looking at this old stuff, I can see I’ve definitely learned a lot since then. (If you’re curious, see https://revfad.com/SearchToHTML/, which started as a search engine for the high school paper I founded.)
Looking forward to learning more.