
Indexatron

Active

"What's in my photos?"

Started 22 February 2026 · GitHub
Tags: llm, python, ollama, privacy

I have thousands of family photos. Finding specific ones is a nightmare. “That photo from the wedding with Uncle Dave” - good luck.

Cloud services can do this, but uploading family photos to third parties feels wrong. This experiment tests whether locally-run LLMs can extract useful metadata from photos - no cloud required.

The hypothesis

Local LLMs can extract useful metadata from family photos.

Status: Confirmed. Now integrated with the-mcculloughs.org for automated photo analysis.

Current state

The service fetches pending uploads from the Rails app, analyses them with LLaVA, generates embeddings, and posts results back. Key learnings:

  • Context matters: Injecting photo metadata (title, caption, date, gallery) into prompts dramatically improves results
  • LLaVA > Llama 3.2 Vision: For structured JSON extraction, the smaller model is more reliable (no repetition loops)
  • Override when you know better: Use actual date_taken instead of AI guessing from visual cues
  • Defensive parsing: Vision models produce unexpected outputs; robust JSON repair is essential
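The defensive-parsing learning can be sketched as a small helper. This is illustrative, not the project's actual code: the repair rules shown (stripping markdown fences, isolating the outermost braces, removing trailing commas) are common LLaVA failure modes, but the real service may handle more.

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Best-effort extraction of a JSON object from vision-model output.

    Models sometimes wrap JSON in markdown fences or prose, or emit
    trailing commas; strip the common failure modes before parsing.
    """
    # Drop markdown code fences if present
    text = re.sub(r"```(?:json)?", "", raw)
    # Keep only the outermost {...} span, discarding surrounding prose
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    text = text[start : end + 1]
    # Remove trailing commas before a closing } or ]
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```

Anything the repair step can't salvage raises, so the caller can retry the model rather than store garbage.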

The stack

  • Runtime: Ollama
  • Vision Model: LLaVA:7b (~4.7GB)
  • Embeddings: nomic-embed-text (~274MB)
  • Language: Python 3.11+ with pydantic, httpx, Pillow, Rich

What it does

Feed it a photo, get back:

  • Subject identification (people, objects, brands)
  • Scene categorisation
  • Era estimation (or override with actual date)
  • Family nickname resolution (“Mamie” -> “Isobel McCullough”)
  • 768-dimensional semantic embeddings
  • Context-aware analysis using photo metadata

Next

  • Make this a proper background service (systemd/launchd)
  • Search API - query by person, category, decade
  • Semantic search using embeddings - find similar photos
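The semantic-search idea reduces to ranking stored embeddings by cosine similarity to a query embedding. A minimal sketch (function names are illustrative; a real index would use a vector store rather than a dict):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest(query: list[float], photos: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the k photo names whose embeddings best match the query."""
    ranked = sorted(photos.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

Embedding the search text with the same nomic-embed-text model that embedded the photos keeps query and index in the same vector space.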