I have a meeting coming up where some new members are joining a team I’ve been in for some time. The facilitator has asked us to introduce ourselves, “Pecha Kucha” style, with one-minute-per-slide pictures. It’s a big team, so we only get three slides each. I’m an obsessive logger of everything I read, watch, and post — so I thought a handy way to pictorialise me is with “photo mosaics” or “photo collages”. A table of thumbnails of movies I’ve seen, covers of books I’ve read, and Instagram posts of cocktails I’ve made (and enjoyed) summarises three of my hobbies nicely.
Movies are easy — I use Letterboxd for my movie logging, and they have a beautiful stats view, year-by-year. Screen-shots of the mosaics for the last two-and-a-half years of my viewing with a quick paste into Gimp do the trick (side-bar: I’ve watched more movies in 9 months of 2021 than I watched all 2020!).
A couple of years ago, after spending a couple of weeks ordering everything on the cocktail menu at a resort in the Philippines, I wanted to recapture the cocktail magic at home. Like a number of my obsessions, this led to working through a list — in this case the International Bartenders Association 77 official cocktails. Since I was going to the trouble of buying ingredients and making the drinks, I figured I would post them to Instagram as well. A couple of years later I have a well-stocked bar, and have posted over 200 cocktails.
Making mosaics from Instagram is a little bit tricky. Being a Facebook company, Instagram likes to keep its users inside the walls of its garden. A Google search will reveal lots of web-based SaaS tools that prod the Instagram API, but they are all someone trying to make a buck, and this doesn’t seem worth handing over the bucks for. As privacy has steadily become more and more of “a thing” in the last few years, the big platforms have all had to provide some kind of “export all my stuff” feature. GDPR was a big turning point for this, and like everyone else who works in technology, I spent part of my day job a few years ago finding ways to help users get their data out as GDPR rolled out. Instagram hides their export as a “Data Download” in the “Privacy and Security” section of their information architecture. The site offers a JSON or HTML format download and emails you a link once it has been compiled. I ordered mine as HTML and an email link arrived ten minutes later. The zip contains a “media” directory with a “posts” sub-directory and date-based folders.
A quick search in Finder limited to “jpg” type files pulls up all of the photos that I’ve posted to Instagram, ever, so I took these and popped them flat in a directory.
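For anyone who would rather script this step than use Finder, the same flattening can be sketched in Node.js. This is an illustration only, not what was done above: the function names `walk` and `flattenJpgs` are made up for the example, and the date-folder layout is assumed from the export described earlier.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively list every file under `dir`.
function walk(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...walk(full));
    } else {
      found.push(full);
    }
  }
  return found;
}

// Copy every .jpg/.jpeg under `srcDir` into the single flat directory `destDir`.
// The relative path is folded into the file name so that two photos with the
// same name in different date folders do not overwrite each other.
function flattenJpgs(srcDir, destDir) {
  fs.mkdirSync(destDir, { recursive: true });
  const copied = [];
  for (const file of walk(srcDir)) {
    if (!/\.jpe?g$/i.test(file)) continue;
    const flatName = path.relative(srcDir, file).split(path.sep).join("_");
    fs.copyFileSync(file, path.join(destDir, flatName));
    copied.push(flatName);
  }
  return copied.sort();
}
```

Calling `flattenJpgs("media/posts", "instaimages")` from the unzipped export would leave one flat directory of photos and return the list of copied file names.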
I recalled from many years ago the de facto standard GNU/Linux tool for image manipulation: ImageMagick. Where there is GNU there are generally Homebrew ports for Mac, so I Googled about and found the ImageMagick montage tool. Installing ImageMagick is easy with Homebrew:
brew install imagemagick
The montage tool accepts a raft of parameters, but my needs were pretty straightforward. 210 photos look nice in a table of ten rows of 21 tiles, and 128 by 128 pixels is about the right size for my purposes. The 128x128 in the geometry sets the tile size that each image is scaled to fit, and the +0+0 removes any spacing between the tiles.
montage -geometry 128x128+0+0 -tile 21x instaimages/*.jpg insta.jpg
The output is a single jpg, just what the doctor ordered.
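The arithmetic behind the tile choice is simple enough to sketch. The helper names `montageLayout` and `montageArgs` below are hypothetical, and the size calculation assumes every tile occupies exactly the requested square with no spacing.

```javascript
// Work out the grid produced for a fixed number of columns.
// Assumes each tile is exactly tileSize x tileSize pixels with zero spacing.
function montageLayout(imageCount, columns, tileSize) {
  const rows = Math.ceil(imageCount / columns);
  return { columns, rows, width: columns * tileSize, height: rows * tileSize };
}

// Build the argument list for the montage command shown above.
function montageArgs(files, columns, tileSize, output) {
  return [
    "-geometry", `${tileSize}x${tileSize}+0+0`,
    "-tile", `${columns}x`,
    ...files,
    output,
  ];
}
```

For 210 photos in 21 columns at 128 pixels, `montageLayout(210, 21, 128)` gives 10 rows and a 2688 by 1280 pixel canvas. The array from `montageArgs` could be handed to `child_process.execFileSync("montage", args)` if the whole job were driven from Node.js.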
As I’ve written about before, Goodreads is a sad shadow of what it could have been and is a nightmare to extend now that it has no API. I use web-scraping for an app I created that figures out which books on award lists and user lists are available in digital form at my local library, but to do the same thing for my to-read list, I needed to make use of the Goodreads GDPR-style export. This outputs a CSV file that includes International Standard Book Numbers (ISBNs). Goodreads has book cover images, but they don’t follow a standard addressing scheme, so the only way to download all of the images for books I’ve read is to visit the book detail pages and scrape. This is very slow and unreliable, so this time I thought I’d try other book cover sources.
The Open Library project has a covers “API” with a defined URL addressing scheme, e.g. https://covers.openlibrary.org/b/isbn/9780374158460-S.jpg for a small thumbnail for Jonathan Franzen’s “Freedom” (ISBN 9780374158460). I wrote a quick Node.js script, using csv/parse and nodejs-file-downloader to drop the cover images in a directory so I could process them with ImageMagick. This worked, but the hit rate on available images wasn’t where I was hoping: of 778 books, 453 had an image.
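A minimal sketch of the cover download follows. It is not the script described above: it swaps csv/parse and nodejs-file-downloader for the `fetch` built into Node.js 18 and later, and the function names are invented for the example. Two assumptions are worth flagging: that Goodreads export cells can wrap the number as `="9780374158460"`, and that appending `?default=false` makes Open Library answer with a 404 instead of a blank placeholder when it has no cover.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumption: export cells may look like ="9780374158460"; keep only ISBN characters.
function cleanIsbn(raw) {
  return String(raw ?? "").replace(/[^0-9Xx]/g, "").toUpperCase();
}

// Build the cover URL using the addressing scheme quoted above (S, M or L size).
function openLibraryCoverUrl(isbn, size = "S") {
  return `https://covers.openlibrary.org/b/isbn/${cleanIsbn(isbn)}-${size}.jpg`;
}

// Download one cover. Resolves to the saved path, or null when no cover is served.
async function downloadCover(isbn, destDir) {
  // Assumption: ?default=false requests a 404 rather than a placeholder image.
  const response = await fetch(`${openLibraryCoverUrl(isbn)}?default=false`);
  if (!response.ok) return null;
  fs.mkdirSync(destDir, { recursive: true });
  const target = path.join(destDir, `${cleanIsbn(isbn)}.jpg`);
  fs.writeFileSync(target, Buffer.from(await response.arrayBuffer()));
  return target;
}
```

Looping `downloadCover` over the ISBN column and counting the non-null results is one way to measure a hit rate like the one above.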
Next up, I tried the Google Books API. Google also has a key-free defined URL scheme for getting book information. The response includes a thumbnail image definition. E.g. https://www.googleapis.com/books/v1/volumes?q=isbn:1925603148 will return the information for Olga Tokarczuk’s book “Flights”. The JSON record returned by the endpoint includes the path volumeInfo.imageLinks.smallThumbnail, which points to an image URL.
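Reading that path out of a response can be sketched as below. The function names are illustrative, and the sketch assumes the matching records arrive in an `items` array with the best match first, and that a query with no match comes back without `items` at all.

```javascript
// Build the key-free lookup URL for one ISBN.
function googleBooksUrl(isbn) {
  return `https://www.googleapis.com/books/v1/volumes?q=isbn:${encodeURIComponent(isbn)}`;
}

// Pull the small thumbnail URL out of a parsed response, or null if absent.
// Assumption: records are listed under `items`, best match first.
function smallThumbnailFrom(response) {
  const first = response?.items?.[0];
  return first?.volumeInfo?.imageLinks?.smallThumbnail ?? null;
}

// Fetch and parse in one step (needs the global fetch of Node.js 18+).
async function lookUpThumbnail(isbn) {
  const res = await fetch(googleBooksUrl(isbn));
  if (!res.ok) return null;
  return smallThumbnailFrom(await res.json());
}
```

Optional chaining keeps the lookup from throwing on records that have no `imageLinks`, so a missing cover simply shows up as `null`.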
I tweaked my script and set it looping through my books. At this point I noticed just how many books in my Goodreads export don’t include an ISBN — about 180 of them. “Strange”, I thought — these books aren’t that obscure. I took a look at an example and discovered I had selected the Kindle edition when I was adding it to my Goodreads log. When I examined all available editions, it became apparent that Kindle editions are usually recorded with an Amazon Standard Identification Number (ASIN) instead of an ISBN. The Google Books API does include ASIN as an available query field, but the Goodreads export doesn’t include ASINs! 30 books with ISBNs also returned no results. For these, I hunted through the editions in Goodreads until I found one with an ISBN that worked with Google Books. I didn’t feel much like manually switching editions for 200 books, so at this point I decided I’d live with the 498 that had matched on Google.
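A quick way to spot the ISBN-less rows before querying anything is to partition the parsed export first. The sketch below assumes the rows have already been parsed into objects by a CSV library, that the columns are named `ISBN` and `ISBN13`, and that an empty cell may appear as `=""`; all three are assumptions about the export, and the function names are made up.

```javascript
// A row counts as having an ISBN if either column contains at least one digit.
// Assumption: columns are named "ISBN" and "ISBN13"; empty cells may be ="".
function hasIsbn(row) {
  return [row["ISBN"], row["ISBN13"]].some((cell) => /[0-9]/.test(String(cell ?? "")));
}

// Split parsed rows into those that can be looked up by ISBN and those that cannot.
function partitionByIsbn(rows) {
  const withIsbn = [];
  const withoutIsbn = [];
  for (const row of rows) {
    (hasIsbn(row) ? withIsbn : withoutIsbn).push(row);
  }
  return { withIsbn, withoutIsbn };
}
```

Printing the titles in `withoutIsbn` gives a list of editions to check by hand.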
For a popular book like Hilary Mantel’s “Bring Up the Bodies” there are 140 editions on Goodreads. Of those, many have no ISBN or ASIN, and finding one with a working ISBN means expanding record after record by hand and copying and pasting the numbers into Google Books queries.
ImageMagick montage is awesome.
Book metadata is hard.