biolift exercise catalog — pipeline status

scrape · transcode · upload · index
2026-05-22  ·  biolift/feat/all  ·  Goal: .agents/goals/biolift-exercise-catalog-comprehensive.md
Exercises in Firestore: 2089 (+2049 vs 40-seed)
Animated GIFs: 1326 (in /exercises/sources/)
Derived WebPs: 1326 (in /exercises/derived/)
Failed (retry on rerun): 1 (transient · idempotent retry)

Highlights


Pilot: hero exercises — live on Firebase Storage


Source coverage after dedup


Source                         Records  Animated GIFs    Role
hasaneyldrm/exercises-dataset  1324     1324             PRIMARY animated source
yuhonas/free-exercise-db       873      0 (stills only)  fallback / form cues
unique after dedup             2048     1315             by canonical slug
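
Dedup by canonical slug could look roughly like the sketch below. It is not the code in lib/slug: the normalization rules, the `canonicalSlug` and `mergeBySlug` names, the record shape, and the policy of keeping the first record while only noting later sources are all assumptions made for illustration.

```javascript
// Hypothetical sketch of slug-based dedup; the real lib/slug rules may differ.

// Lowercase, strip accents, collapse runs of non-alphanumerics into one hyphen.
function canonicalSlug(name) {
  return name
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Merge records from several sources; records sharing a slug become one entry.
function mergeBySlug(records) {
  const bySlug = new Map();
  for (const rec of records) {
    const slug = canonicalSlug(rec.name);
    const existing = bySlug.get(slug);
    if (!existing) {
      bySlug.set(slug, {
        slug,
        name: rec.name,
        gifUrl: rec.gifUrl || null,
        sources: [rec.source],
      });
      continue;
    }
    if (!existing.sources.includes(rec.source)) existing.sources.push(rec.source);
    // Take an animated GIF from whichever source has one.
    if (!existing.gifUrl && rec.gifUrl) existing.gifUrl = rec.gifUrl;
  }
  return [...bySlug.values()];
}

// Made-up sample records, not real catalog data.
const merged = mergeBySlug([
  { source: 'hasaneyldrm', name: 'Barbell Bench Press', gifUrl: 'https://example.invalid/bench.gif' },
  { source: 'free-exercise-db', name: 'barbell bench-press' },
  { source: 'free-exercise-db', name: 'Side Plank' },
]);
console.log(merged.map((m) => `${m.slug}: ${m.sources.join('+')}`));
```

Keying the map on the slug rather than the display name is what lets differently punctuated names from the two sources collapse into one catalog entry.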

Architecture


scripts/catalog/
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| lib/                                                         |
|   admin     firebase-admin (Storage + Firestore)             |
|   slug      canonical slug + dedupe                          |
|   transcode gif2webp + ffmpeg frame extraction               |
|   storage   md5-guarded idempotent uploads                   |
|   manifest  resumable state (state/manifest.jsonl)           |
|   pipeline  per-record download->transcode->index            |
| sources/                                                     |
|   hasaneyldrm        1324 animated GIFs (primary)            |
|   free-exercise-db   873 stills (fallback)                   |
| run-pilot.cjs             5 hero exercises                   |
| run-all.cjs               full long-tail                     |
| regenerate-bootstrap.cjs  emit mobile/utils/bootstrapData.ts |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
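
The resumable state kept in state/manifest.jsonl could work along these lines. This is a sketch under assumptions rather than the actual lib/manifest: the entry fields (`slug`, `status`, `error`), the status values, and the function names are hypothetical. It shows an append-only JSONL file where the last entry per slug wins, so a rerun skips finished slugs and retries failed ones.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Append one JSON object per line; appending never rewrites earlier history.
function appendEntry(file, entry) {
  fs.appendFileSync(file, JSON.stringify(entry) + '\n');
}

// Replay the file top to bottom; the last entry for each slug wins.
function loadManifest(file) {
  const state = new Map();
  if (!fs.existsSync(file)) return state;
  for (const line of fs.readFileSync(file, 'utf8').split('\n')) {
    if (!line.trim()) continue;
    try {
      const entry = JSON.parse(line);
      state.set(entry.slug, entry);
    } catch {
      // Ignore a line truncated by a crash mid-write; that slug is simply redone.
    }
  }
  return state;
}

// Slugs that still need work: never seen, or whose last attempt did not finish.
function pending(slugs, state) {
  return slugs.filter((s) => (state.get(s) || {}).status !== 'done');
}

// Demo against a throwaway file instead of the real state/manifest.jsonl.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'manifest-')), 'manifest.jsonl');
appendEntry(file, { slug: 'squat', status: 'done' });
appendEntry(file, { slug: 'deadlift', status: 'failed', error: 'ETIMEDOUT' });
const todo = pending(['squat', 'deadlift', 'lunge'], loadManifest(file));
console.log(todo);
```

An append-only log keeps each write small and leaves at most one damaged line after an interrupted run, which is why the loader tolerates a line that fails to parse.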

Firebase Storage layout:
  gs://<bucket>/exercises/sources/<slug>.gif         source-of-truth
  gs://<bucket>/exercises/sources/<slug>-frame0.jpg  start-frame still
  gs://<bucket>/exercises/sources/<slug>-frame1.jpg  end-frame still
  gs://<bucket>/exercises/derived/<slug>.webp        animated WebP (shipped)
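
The layout above maps each slug to four object names. A small helper along the following lines could derive them and the matching public URLs. The `objectPaths` and `publicUrl` names and the `my-bucket` value are hypothetical; the URL shape is the standard path-style form for publicly readable Cloud Storage objects.

```javascript
// Hypothetical helpers; object names follow the storage layout listed above.
function objectPaths(slug) {
  return {
    sourceGif: `exercises/sources/${slug}.gif`,
    startFrame: `exercises/sources/${slug}-frame0.jpg`,
    endFrame: `exercises/sources/${slug}-frame1.jpg`,
    derivedWebp: `exercises/derived/${slug}.webp`,
  };
}

// Public URL for a world-readable object; each path segment is URL-encoded.
function publicUrl(bucket, objectName) {
  const encoded = objectName.split('/').map(encodeURIComponent).join('/');
  return `https://storage.googleapis.com/${bucket}/${encoded}`;
}

const paths = objectPaths('barbell-bench-press');
console.log(publicUrl('my-bucket', paths.derivedWebp));
```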

Firestore document:
  exercise_catalog/<slug> -> {
    name, category, primaryMuscles, instructions, equipment,
    imageUrl, sourceGifUrl, startFrameUrl, endFrameUrl,
    sources: ['hasaneyldrm', ...], externalIds, scrapedAt
  }
The mobile app reads the Firestore exercise_catalog collection and follows the fully qualified storage.googleapis.com URLs stored in each document, so no app-side bucket configuration is needed. Storage objects ship with cache-control: max-age=31536000, immutable (one year).
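
The md5-guarded upload and the cache header could be combined roughly as follows. This is a sketch, not the code in lib/storage: `uploadIfChanged` and `fakeFile` are hypothetical names, and the `file` argument is only assumed to behave like a `@google-cloud/storage` File object (`getMetadata()` resolving to `[metadata]`, `save(data, options)`). Cloud Storage reports an object's MD5 as a base64 string in `md5Hash`, so the local digest is encoded the same way before comparing.

```javascript
const crypto = require('crypto');

// Base64 MD5, the encoding Cloud Storage uses for metadata.md5Hash.
function md5Base64(bytes) {
  return crypto.createHash('md5').update(bytes).digest('base64');
}

// Upload only when the remote object is missing or its MD5 differs.
// Assumption: `file` behaves like a @google-cloud/storage File.
async function uploadIfChanged(file, bytes, contentType) {
  let remoteMd5 = null;
  try {
    const [metadata] = await file.getMetadata();
    remoteMd5 = metadata.md5Hash;
  } catch (err) {
    if (err.code !== 404) throw err; // only "not found" means "upload it"
  }
  if (remoteMd5 === md5Base64(bytes)) return 'skipped';
  await file.save(bytes, {
    contentType,
    metadata: { cacheControl: 'max-age=31536000, immutable' },
  });
  return 'uploaded';
}

// In-memory stand-in so the sketch runs without credentials or network.
function fakeFile() {
  let stored = null;
  return {
    async getMetadata() {
      if (!stored) throw Object.assign(new Error('Not Found'), { code: 404 });
      return [{ md5Hash: md5Base64(stored) }];
    },
    async save(bytes) {
      stored = bytes;
    },
  };
}

const demo = (async () => {
  const file = fakeFile();
  const bytes = Buffer.from('fake webp bytes');
  return [
    await uploadIfChanged(file, bytes, 'image/webp'),
    await uploadIfChanged(file, bytes, 'image/webp'),
  ];
})();
demo.then((r) => console.log(r)); // first call uploads, second is skipped
```

Because unchanged bytes are skipped, rerunning the pipeline after a transient failure repeats only the work that did not complete.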

How to run


# One-time setup: gif2webp (from the webp package) and ffmpeg, credentials, public-read bucket
brew install webp ffmpeg
gcloud auth application-default login
gcloud storage buckets create gs://<BUCKET> --project=<PROJECT> --location=us-central1
gcloud storage buckets add-iam-policy-binding gs://<BUCKET> \
  --member=allUsers --role=roles/storage.objectViewer

# Pilot
BIOLIFT_CATALOG_BUCKET=<BUCKET> NODE_PATH=functions/node_modules \
  node scripts/catalog/run-pilot.cjs

# Long-tail (~10-15 min, resumable)
BIOLIFT_CATALOG_BUCKET=<BUCKET> NODE_PATH=functions/node_modules \
  node scripts/catalog/run-all.cjs --concurrency=8

# Regenerate the app's seed data
BIOLIFT_CATALOG_BUCKET=<BUCKET> NODE_PATH=functions/node_modules \
  node scripts/catalog/regenerate-bootstrap.cjs
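
How run-all.cjs applies --concurrency is not shown here. One common way to bound parallel work is a fixed number of worker lanes pulling from a shared index, sketched below. `parseConcurrency`, `runPool`, and the fallback of 4 are assumptions for illustration, not the script's actual behavior.

```javascript
// Parse --concurrency=N from an argv-style array; the fallback is an assumption.
function parseConcurrency(argv, fallback = 4) {
  const flag = argv.find((a) => a.startsWith('--concurrency='));
  const n = flag ? Number.parseInt(flag.split('=')[1], 10) : fallback;
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Run `worker` over `items` with at most `limit` calls in flight.
// Failures are recorded instead of aborting, so a rerun can retry them.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two lanes share an index
      try {
        results[i] = { ok: true, value: await worker(items[i]) };
      } catch (err) {
        results[i] = { ok: false, error: err.message };
      }
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

const limit = parseConcurrency(['--concurrency=8']);
const done = runPool(['squat', 'lunge', 'bad'], limit, async (slug) => {
  if (slug === 'bad') throw new Error('transient failure');
  return `${slug}.webp`;
});
done.then((r) => console.log(limit, r));
```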