
Backyard Bestiary

A wildlife camera system built for a seven-year-old naturalist

Status: Live

Every morning, a seven-year-old checks what visited the backyard overnight. The Backyard Bestiary is a motion-triggered wildlife camera system built entirely from an old smartphone and a home server — no dedicated hardware required. It turns a suburban backyard into a continuously monitored nature station, with AI-assisted species identification to help catalog whatever wanders through.

The project started from a single motivation: nurture a kid's love of animals by making the invisible visible. Raccoons, possums, birds, and occasional surprises now have a record. The archive grows nightly.

Total captures: ~3,000
Pass motion filter: 46%
Dedup rate: 74%
Curated keepers: 107

Architecture

The system is deliberately split: a dumb sensor at the edge, all intelligence in the server. An old iPhone, propped in a window, runs a browser-based motion detection page. The browser accesses the camera via getUserMedia, captures frames to a canvas, and runs pixel-diff motion detection entirely in JavaScript. When a threshold is crossed, it POSTs the capture to a FastAPI server running on the home server. The phone does nothing else.
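The phone-side logic lives in browser JavaScript, but the arithmetic behind frame-diff motion detection is small enough to sketch in Python for illustration. The version below is a rough approximation under stated assumptions: frames are flat lists of grayscale values, and `pixel_delta`, `threshold_pct`, and `cooldown_s` are hypothetical names and values, not the project's actual settings.

```python
from typing import Optional, Sequence


def motion_score(prev: Sequence[int], curr: Sequence[int], pixel_delta: int = 25) -> float:
    """Return the percentage of pixels that changed by more than pixel_delta.

    Both frames are flat sequences of grayscale values (0-255) of equal length.
    pixel_delta is a hypothetical per-pixel noise floor.
    """
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be the same non-zero size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return 100.0 * changed / len(prev)


def should_capture(
    score_pct: float,
    threshold_pct: float,
    now_s: float,
    last_capture_s: Optional[float],
    cooldown_s: float,
) -> bool:
    """Trigger only when the score crosses the threshold and the cooldown has elapsed."""
    if score_pct < threshold_pct:
        return False
    return last_capture_s is None or now_s - last_capture_s >= cooldown_s


# Example: 5 of 100 pixels change sharply between two frames.
previous_frame = [0] * 100
current_frame = [0] * 95 + [200] * 5
score = motion_score(previous_frame, current_frame)
print(score, should_capture(score, threshold_pct=1.0, now_s=100.0,
                            last_capture_s=None, cooldown_s=30.0))
```

Keeping the trigger decision this simple fits the "dumb sensor" split: the phone only decides whether something moved, and everything else is left to the server.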

Phone (browser)                     Home Server (FastAPI)
getUserMedia → canvas               ┌─────────────────────────────┐
frame-diff motion detect     ──▶    │ capture storage             │
configurable sensitivity            │ gallery + lightbox viewer   │
cooldown timer                      │ curate.py pipeline          │
wake lock + stream recovery         │ SQLite classification db    │
                                    └─────────────────────────────┘
                                                   │
                                    Gemini Vision API
                                    (species classification)

The server stores captures in date-bucketed directories, serves a gallery with lightbox viewer and date navigation, and runs a three-stage curation pipeline on demand. The setup page generates a QR code so onboarding a new phone takes about thirty seconds.
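As a rough illustration of date-bucketed storage, the sketch below derives a capture's path from its timestamp. The directory layout (`captures/YYYY-MM-DD/HHMMSS_microseconds.jpg`) and the function name `capture_path` are assumptions made for this example; the source does not specify the actual naming scheme.

```python
from datetime import datetime
from pathlib import Path


def capture_path(root: Path, taken_at: datetime, suffix: str = ".jpg") -> Path:
    """Build a path of the form root/YYYY-MM-DD/HHMMSS_microseconds.jpg (assumed layout)."""
    bucket = taken_at.strftime("%Y-%m-%d")           # one directory per calendar day
    name = taken_at.strftime("%H%M%S_%f") + suffix   # time of day keeps names sortable
    return root / bucket / name


example = capture_path(Path("captures"), datetime(2024, 5, 3, 2, 14, 9, 123000))
print(example)
```

One directory per day makes date navigation in a gallery a plain directory listing rather than a database query.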

The Curation Pipeline

Raw motion captures are noisy — a blowing leaf triggers the same event as a raccoon. The pipeline filters aggressively before any AI classification runs:

  1. Motion score filter — captures below 3.0% pixel change are discarded as ambient noise. About 46% of raw captures survive this stage.
  2. Perceptual hash deduplication — captures within a two-minute window are compared by image hash. Near-duplicates (Hamming distance ≤8) are dropped, keeping only one frame per animal visit. This removes 74% of the survivors — otherwise you get five hundred photos of the same raccoon walking across the frame.
  3. Gemini Vision classification — the remaining captures go to Gemini for species identification. Results, confidence scores, and curator decisions are stored in a SQLite database.
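The first two stages can be sketched in a few lines of Python. This is a minimal illustration, not the actual curate.py: the `Capture` record, the precomputed 64-bit integer hash, and the choice to compare each capture only against frames already kept are assumptions. The thresholds (3.0% motion, two-minute window, Hamming distance of 8 or less) come from the description above.

```python
from dataclasses import dataclass
from typing import List

MOTION_THRESHOLD_PCT = 3.0  # captures below this are treated as ambient noise
DEDUP_WINDOW_S = 120        # two-minute comparison window
MAX_HAMMING = 8             # hashes this close count as near-duplicates


@dataclass
class Capture:
    """Hypothetical record; phash is assumed to be a precomputed 64-bit perceptual hash."""
    name: str
    timestamp_s: float
    motion_pct: float
    phash: int


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe(captures: List[Capture]) -> List[Capture]:
    """Keep a capture unless a kept frame within the window has a near-identical hash."""
    kept: List[Capture] = []
    for c in sorted(captures, key=lambda cap: cap.timestamp_s):
        is_duplicate = any(
            c.timestamp_s - k.timestamp_s <= DEDUP_WINDOW_S
            and hamming(c.phash, k.phash) <= MAX_HAMMING
            for k in kept
        )
        if not is_duplicate:
            kept.append(c)
    return kept


def curate(captures: List[Capture]) -> List[Capture]:
    """Stage 1 (motion filter) followed by stage 2 (deduplication)."""
    survivors = [c for c in captures if c.motion_pct >= MOTION_THRESHOLD_PCT]
    return dedupe(survivors)
```

Only the frames returned by `curate` would go on to the classification stage, which is the point of filtering first: the expensive vision call runs on a small fraction of the raw captures.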

Choosing Gemini over other vision models was deliberate: for wildlife recognition, especially in low light, model choice matters. Claude's vision was tested and found weaker for this specific task, so Gemini was the right tool for the job.

Nighttime Capture

The hardest engineering problem turned out to be keeping an iOS browser tab running a camera stream through the night. iOS power management suspends getUserMedia video streams after extended idle periods — the tab stays alive, the gallery polling continues, but the camera track goes silent. This created a gap where the system appeared healthy but was capturing nothing.

Stream Recovery System

Two independent signals monitor stream health:

Both paths call recoverStream(), which releases the old track and re-calls getUserMedia with exponential backoff (2s → 60s cap). Every recovery event is logged server-side for monitoring.
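The backoff schedule is easy to picture as a sequence of delays. The real recoverStream() runs in browser JavaScript; the Python sketch below only illustrates the timing, and it assumes the delay doubles on each failed attempt (the source gives the 2 s start and the 60 s cap but not the growth factor).

```python
from itertools import islice
from typing import Iterator


def backoff_delays(base_s: float = 2.0, cap_s: float = 60.0, factor: float = 2.0) -> Iterator[float]:
    """Yield retry delays that grow geometrically from base_s and never exceed cap_s.

    factor=2.0 (doubling) is an assumption made for this sketch.
    """
    delay = base_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)


# First seven delays under the doubling assumption.
print(list(islice(backoff_delays(), 7)))
```

Capping the delay keeps the worst-case blind spot bounded: once the schedule saturates, a stalled stream is retried at least once a minute.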

Design Decisions

Using a phone instead of a dedicated wildlife camera was a deliberate choice: zero upfront cost, immediate start, and the ability to tune parameters in a browser rather than reflashing firmware. The known limitation is that smartphone cameras have IR-cut filters, which block the infrared light that dedicated trail cameras rely on at night and so hurt nighttime performance. Two things mitigate this: the ambient light of a suburban neighborhood at night, and an upgrade path to dedicated IR hardware if needed.

The sensitivity slider is inverted from the raw threshold (higher = easier to trigger), which matches the intuitive mental model of "how sensitive is the camera." A cooldown slider prevents the same moving object from generating hundreds of captures.
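The inversion amounts to a small mapping from slider position to raw threshold. The sketch below assumes a linear mapping and hypothetical ranges (slider 1 to 100, threshold 20.0% down to 0.5% of pixels changed); the source states only the direction of the relationship.

```python
def threshold_from_sensitivity(
    sensitivity: float,
    s_min: float = 1.0,
    s_max: float = 100.0,
    t_min: float = 0.5,
    t_max: float = 20.0,
) -> float:
    """Map a sensitivity slider value to a motion threshold (percent of pixels changed).

    Higher sensitivity yields a lower threshold, so the camera triggers more easily.
    All ranges and the linear shape are hypothetical.
    """
    if not s_min <= sensitivity <= s_max:
        raise ValueError("sensitivity out of range")
    fraction = (sensitivity - s_min) / (s_max - s_min)
    return t_max - fraction * (t_max - t_min)


print(threshold_from_sensitivity(1.0), threshold_from_sensitivity(100.0))
```

Exposing sensitivity rather than the raw threshold means nobody adjusting the slider has to remember that a smaller number makes the camera more trigger-happy.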

The Human Element

The system was built to answer a specific question a kid asks every morning: what came to visit last night? That framing shaped everything: the gallery is designed for browsing by date, the curation pipeline flags the best captures, and the whole system runs through the night without maintenance. The naturalist should be able to focus on the animals, not the infrastructure.

Taxonomy is a serious interest. Species identification, habitat, behavior — the Bestiary is a data source for that curiosity, not just a novelty. Over time, the classified archive becomes a record of what actually lives in and moves through one suburban backyard across seasons.
