Projects/drmcapture·2026-04-23

A research tool I built this afternoon

I'm putting together an atlas of human emotions. The raw material lives in other people's books. I own them — on Kindle, which means I can highlight and export a few snippets, but I can't get structured data out of the text.

Searching "shame" across a shelf of books is a non-task.

What I wanted was a table. Columns: term, book, page, definition, quote. Rows: one per term × book. So I could put every author's take on shame side by side, or surface terms I didn't know to look for.

I built it this afternoon. Four stages:

  1. Drive Amazon Kindle via AppleScript. Screenshot every page.
  2. OCR with Apple Vision. Save as markdown, one ## Page NNNN section each.
  3. Hand the whole markdown file to Claude with prompt caching. Ask for rows two ways: term-driven (from a list I care about) and discovery (whatever the author defines).
  4. Merge to CSV. Generate an INDEX.md.

First book through: Tiffany Watt Smith's The Book of Human Emotions.

75kindle pages 3,732lines of OCR 76definitions ~10 minhuman time

Ten minutes of measuring the reading rectangle and keeping my hands off the keyboard during the four-minute capture loop. That's it.

What I learned

Most "impossible" integration problems are permission dialogs. DRM isn't a wall — it's a slow API whose only endpoint is your screen.

Apple Vision is excellent at rendered book text. No fine-tuning needed.

Prompt caching makes rerunning extractions with a better prompt effectively free. Claude caught that longing is defined implicitly through viraha, saudade, and hiraeth — and flagged the confidence as implicit on its own. I didn't prompt for that nuance.

What I won't do

Distribute it. Capturing DRM content sits in a gray zone even when you own the book; this only makes sense for fair-use personal research on your own library. If you want the same thing, build it — the whole pipeline is four small Python modules, Apple Vision, the Anthropic SDK, and a justfile. I'll send the design spec to anyone who asks.

What's next

More books. Expand the term list. Compare authors' definitions of the same feeling. The structured data is the real product; the tool is just a shovel.

— Oleksandr

Source: private · Design spec on request — kryklia@gmail.com