A research tool I built this afternoon
I'm putting together an atlas of human emotions. The raw material lives in other people's books. I own them — on Kindle, which means I can highlight and export a few snippets, but I can't get structured data out of the text.
Searching "shame" across a shelf of books is a non-task.
What I wanted was a table. Columns: term, book, page, definition, quote. Rows: one per term × book. So I could put every author's take on shame side by side, or surface terms I didn't know to look for.
I built it this afternoon. Four stages:
- Drive Amazon Kindle via AppleScript. Screenshot every page.
- OCR with Apple Vision. Save as markdown, one
## Page NNNNsection each. - Hand the whole markdown file to Claude with prompt caching. Ask for rows two ways: term-driven (from a list I care about) and discovery (whatever the author defines).
- Merge to CSV. Generate an
INDEX.md.
First book through: Tiffany Watt Smith's The Book of Human Emotions.
Ten minutes of measuring the reading rectangle and keeping my hands off the keyboard during the four-minute capture loop. That's it.
What I learned
Most "impossible" integration problems are permission dialogs. DRM isn't a wall — it's a slow API whose only endpoint is your screen.
Apple Vision is excellent at rendered book text. No fine-tuning needed.
Prompt caching makes rerunning extractions with a better prompt effectively free. Claude caught that longing is defined implicitly through viraha, saudade, and hiraeth — and flagged the confidence as implicit on its own. I didn't prompt for that nuance.
What I won't do
Distribute it. Capturing DRM content sits in a gray zone even when you own the book; this only makes sense for fair-use personal research on your own library. If you want the same thing, build it — the whole pipeline is four small Python modules, Apple Vision, the Anthropic SDK, and a justfile. I'll send the design spec to anyone who asks.
What's next
More books. Expand the term list. Compare authors' definitions of the same feeling. The structured data is the real product; the tool is just a shovel.
— Oleksandr
Source: private · Design spec on request — kryklia@gmail.com