MHTML is a handy format for archiving pages. In Chrome, you can get it by selecting “Save Page As…” and choosing “Webpage, single file” as the output format. In a Chrome extension, you can get it via chrome.pageCapture.saveAsMHTML, which is what I use in the gg_download extension I’ve been hacking on.
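For reference, here is a minimal sketch of what calling that API from an extension can look like. It is not the code from gg_download: the names captureTabAsMHTML and PageCaptureLike are hypothetical, and the API object is passed in as a parameter (an assumption made so the sketch can run outside a browser). In a real extension you would pass chrome.pageCapture, and the captured data would be a Blob.

```typescript
// Minimal shape of the part of chrome.pageCapture used here.
// T stands in for the captured data type (a Blob in Chrome).
interface PageCaptureLike<T> {
  saveAsMHTML(details: { tabId: number }, callback: (data?: T) => void): void;
}

// Hypothetical helper: wraps the callback-style call in a Promise and
// rejects when no data comes back (for example, when the capture fails).
function captureTabAsMHTML<T>(api: PageCaptureLike<T>, tabId: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    api.saveAsMHTML({ tabId }, (data) => {
      if (data === undefined) {
        reject(new Error(`No MHTML data returned for tab ${tabId}`));
      } else {
        resolve(data);
      }
    });
  });
}

// Usage inside an extension (assumes the "pageCapture" permission):
//   const blob = await captureTabAsMHTML(chrome.pageCapture, tabId);
```

In a real extension, the error message itself (such as the one quoted below) shows up in chrome.runtime.lastError inside the callback; the sketch only checks whether any data came back.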
In the project I’m using it for, that extension mostly worked but failed for 13 pages (for reasons that I’m guessing have nothing to do with the pages themselves) with an error that I believe should be impossible: “Tab navigated before capture could complete.” Probably my fault somehow, but it led me to start looking at the Chromium source code: chrome/browser/extensions/api/page_capture/page_capture_api.cc.
I didn’t want to run the full capture via the Chrome extension again. I could’ve pretty easily downloaded each of the pages manually using the GUI in probably under 20 minutes, but I decided it would be more fun to try using AppleScript for it instead.
I have a bit of a soft spot for AppleScript. It’s different and a little fun.
Also, a hero of mine, Jens Alfke, whom I first got to know of through the Apple Java developer mailing list (no longer online?), was “one of the creators of Apple’s Script Editor” (Archive). (One regret I have is that when I saw him in person at WWDC as a college student, I was too shy to say hello.) He talks more about his work on AppleScript in his post Jens’s Tangled Job History (Archive).
And in some forum (LiveJournal?), someone offered a Gmail signup in exchange for writing a little AppleScript code. That’s how I got my Gmail address.
Here’s the code I came up with to complete my task, after spending a lot of time with Apple’s documentation and conversing with Gemini (though Gemini did lie to me a few times). This forum post helped too: AppleScript. Split the string into parts by the separator character.
One problem with this, though: Chrome doesn’t currently support MHTML as an output type for the AppleScript “save” action.
You can also see that in the source code here: chrome/browser/ui/cocoa/applescript/tab_applescript.mm
content::SavePageType savePageType = content::SAVE_PAGE_TYPE_AS_COMPLETE_HTML;
if (saveType) {
  if ([saveType isEqualToString:@"only html"]) {
    savePageType = content::SAVE_PAGE_TYPE_AS_ONLY_HTML;
  } else if ([saveType isEqualToString:@"complete html"]) {
    savePageType = content::SAVE_PAGE_TYPE_AS_COMPLETE_HTML;
  } else {
    AppleScript::SetError(AppleScript::Error::kInvalidSaveType);
    return;
  }
}
So, I fixed that, at least in a locally built copy of Chromium. It was really just a two-line change to add it, but I had a lot of fun. (Now I’ll just have to see if I can get someone to approve it. It has at least worked well enough for me to fetch those last 13 pages that I was missing before, though.)
Here’s a guide on contributing in case you want to give it a try too: Contributing to Chromium.
This was my first time trying to contribute something to Chrome from outside Google, but I also added a thing or two while I was inside, like https://codereview.chromium.org/25752007. (That’s what I found when I searched just now.)