Adding support for MHTML in Chrome’s AppleScript “save” action

MHTML is a handy format for archiving pages. In Chrome, you can get it by selecting “Save Page As…” and choosing “Webpage, single file” as the output format. In a Chrome extension, you can get it via chrome.pageCapture.saveAsMHTML, which is what I use in the gg_download extension I’ve been hacking on.

In the project I’m using it for, that extension mostly worked but failed for 13 pages (for reasons I’m guessing had nothing to do with the pages themselves) in a way that I believe should be impossible: “Tab navigated before capture could complete.” Probably my fault somehow, but it led me to start looking at the Chromium source code: chrome/browser/extensions/api/page_capture/page_capture_api.cc.

I didn’t want to run the full capture via the Chrome extension again. I could’ve pretty easily downloaded each of the pages manually using the GUI in probably under 20 minutes, but I decided it would be more fun to try AppleScript instead.

I have a bit of a soft spot for AppleScript. It’s different and a little fun.

Also, a hero of mine, Jens Alfke, who I first got to know of through the Apple Java developer mailing list (no longer online?), was “one of the creators of Apple’s Script Editor” (Archive). (One regret I have is that when I saw him in person at WWDC as a college student, I was too shy to say hello.) He talks more about his work on AppleScript in his post Jens’s Tangled Job History (Archive).

And in some forum (LiveJournal?), someone offered a Gmail signup in exchange for writing a little AppleScript code. That’s how I got my Gmail address.

Here’s the code I came up with to complete my task, after spending a lot of time with Apple’s documentation and conversing with Gemini (though Gemini did lie to me a few times). This forum post helped too: AppleScript. Split the string into parts by the separator character.

on split(s, newDelimiter)
	-- Save the delimiters before the try block so the error handler can always restore them.
	set oldDelimiters to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to {newDelimiter}
		set pieces to text items of s
		set AppleScript's text item delimiters to oldDelimiters
		return pieces
	on error
		set AppleScript's text item delimiters to oldDelimiters
		return {s}
	end try
end split

on getURLFileName(urlString)
	set lastPart to item -1 of split(urlString, "/")
	set lastPart to item 1 of split(lastPart, "?")
	return item 1 of split(lastPart, "#")
end getURLFileName

display dialog "Comma-separated URLs to be saved: " default answer ""
set urlsStr to text returned of result

if urlsStr is equal to "" then
	error "No input"
end if

set downloadsDir to POSIX path of (path to downloads folder)

set counter to 0
repeat with aUrl in split(urlsStr, ",")
	set baseName to getURLFileName(aUrl)
	if (counter) > 0 then
		set randomSeconds to random number from 0 to 20
		delay 10 + randomSeconds
	end if
	tell application "Chromium"
		open location aUrl
		set t to active tab of front window
		-- At least for my locally built Chromium instance, the location is
		-- populated as expected but does nothing till we
		-- take some other actions, like stopping and reloading the tab.
		-- Reloading by itself doesn't work either.
		t stop
		t reload
		repeat until not t's loading
			delay 0.1
		end repeat
		save t in (POSIX file (downloadsDir & baseName & ".mhtml")) as "single file"
	end tell
	set counter to counter + 1
end repeat

Oops… Script Editor doesn’t seem to actually reformat the saved code; it just pretty-prints it to the screen. And the default WordPress code block doesn’t handle even that output very nicely. So, an image for now, with the code hopefully making it through via the alt text. I release this code into the public domain under https://unlicense.org/.

One problem with this though: Chrome doesn’t currently support MHTML as an output type for the “save” action:

Chrome's dictionary entry for "save"

You can also see that in the source code here: chrome/browser/ui/cocoa/applescript/tab_applescript.mm

  content::SavePageType savePageType = content::SAVE_PAGE_TYPE_AS_COMPLETE_HTML;
  if (saveType) {
    if ([saveType isEqualToString:@"only html"]) {
      savePageType = content::SAVE_PAGE_TYPE_AS_ONLY_HTML;
    } else if ([saveType isEqualToString:@"complete html"]) {
      savePageType = content::SAVE_PAGE_TYPE_AS_COMPLETE_HTML;
    } else {
      AppleScript::SetError(AppleScript::Error::kInvalidSaveType);
      return;
    }
  }

So, I fixed that, at least in a locally built copy of Chromium. It was really just a two-line change to add it, but I had a lot of fun. (Now I’ll just have to see if I can get someone to approve it. It has at least worked well enough for me to fetch those last 13 pages that I was missing before though.)
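
For a rough idea of what a change like that amounts to, here is a minimal standalone C++ sketch of the same branching with one extra case. It is a model under stated assumptions, not the actual patch: the enum is a local stand-in for content::SavePageType, ParseSaveType is a hypothetical helper name, an empty string stands in for a missing save type, and the "single file" string is simply the one the script above passes to save. In Chromium itself, the new branch would presumably select the MHTML member of content::SavePageType (SAVE_PAGE_TYPE_AS_MHTML, as far as I can tell).

```cpp
#include <string>

// Local stand-in for Chromium's content::SavePageType (an assumption for
// illustration; this is not the real header).
enum class SavePageType { kOnlyHtml, kCompleteHtml, kMhtml };

// Hypothetical helper that maps the AppleScript "as" string to a save type.
// Returns false for an unrecognized string, which models the
// kInvalidSaveType error path in the excerpt above.
bool ParseSaveType(const std::string& save_type, SavePageType* out) {
  // An empty string models "no save type given": default to complete HTML.
  if (save_type.empty() || save_type == "complete html") {
    *out = SavePageType::kCompleteHtml;
    return true;
  }
  if (save_type == "only html") {
    *out = SavePageType::kOnlyHtml;
    return true;
  }
  // The extra branch: accept "single file" and map it to MHTML.
  if (save_type == "single file") {
    *out = SavePageType::kMhtml;
    return true;
  }
  return false;
}
```

The extra case is only the last if block; everything else mirrors the existing branches, which is why the real change can be so small.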

Here’s a guide on contributing in case you want to give it a try too: Contributing to Chromium.

This was my first time trying to contribute something to Chrome from outside Google, though I also added a thing or two while I was inside, like https://codereview.chromium.org/25752007 (which is what I found when I searched just now).

