Input Pipeline

Audio to transcript to structured notes, daily

(English original; Dutch translation follows.)

What it does

The input pipeline is the data backbone behind the rest of the OpenClaw + Papegaai stack. Every day it runs end-to-end:

Capture: voice memos, conversation recordings, and dictated notes land in a watched bucket.
Language-detect + transcribe: Whisper detects the language from a 60-second sample, then the full recording is transcribed with the detected language code (no defaulting to Dutch).
Structure: the LLM extractor emits zero or more typed items per topic (fleeting note, todo, or CRM fact), using a single shared prompt across all sources.
Redact: a pipeline-side pass strips PII before anything is forwarded.
Route to KBs: extracted items are routed to the relevant knowledge bases by topic.
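
The structure, redact, and route steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `Item` fields, the kind names, the KB names in `KB_BY_TOPIC`, and the two PII patterns are all assumptions made for the example, and a real redaction pass would cover far more than emails and phone numbers.

```python
import re
from dataclasses import dataclass

# Hypothetical item kinds, based on the three examples named in the list above.
KINDS = {"fleeting_note", "todo", "crm_fact"}

# Hypothetical topic -> knowledge-base mapping; the real KB names are not stated.
KB_BY_TOPIC = {"work": "kb-work", "family": "kb-family"}
DEFAULT_KB = "kb-inbox"

# Deliberately narrow PII patterns (emails and phone-like digit runs), for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


@dataclass(frozen=True)
class Item:
    kind: str   # one of KINDS
    topic: str  # drives routing
    text: str   # extracted content, possibly containing PII


def redact(item: Item) -> Item:
    """Return a copy of the item with email addresses and phone numbers masked."""
    text = EMAIL.sub("[EMAIL]", item.text)
    text = PHONE.sub("[PHONE]", text)
    return Item(item.kind, item.topic, text)


def route(items):
    """Redact each item, then group the results by destination KB."""
    routed = {}
    for item in items:
        if item.kind not in KINDS:
            raise ValueError(f"unknown item kind: {item.kind}")
        kb = KB_BY_TOPIC.get(item.topic, DEFAULT_KB)
        # Redaction happens before the item is handed to any KB.
        routed.setdefault(kb, []).append(redact(item))
    return routed
```

Calling `route([])` returns an empty dict, which matches the "zero or more items" contract: a recording that yields nothing simply routes nothing. Items whose topic has no mapping fall through to the assumed `kb-inbox` default rather than being dropped.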

Workhorse VPS: the host is 178.104.129.209. It runs the LLM calls, Postgres, and scripts; no agents run on it.
GCS archiving: every raw artefact is archived to Google Cloud Storage, so re-extraction is always possible.
Daily 06:17 sync: a KB rsync pulls processed notes to the local machine for offline browsing.
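
One way to keep re-extraction reliable is to give every archived artefact a stable, content-derived object name, so the same recording always maps to the same key. The sketch below assumes a `raw/YYYY/MM/DD/<sha256><suffix>` layout; that layout and the `archive_key` helper are hypothetical, since the actual bucket naming is not described here, and the upload call itself is left out.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath


def archive_key(data: bytes, filename: str, captured: date) -> str:
    """Build a hypothetical GCS object name: raw/YYYY/MM/DD/<sha256><suffix>."""
    # Hash the raw bytes so identical artefacts always get the same name.
    digest = hashlib.sha256(data).hexdigest()
    # Keep only the original extension, lowercased, so the file type stays visible.
    suffix = PurePosixPath(filename).suffix.lower()
    return f"raw/{captured:%Y/%m/%d}/{digest}{suffix}"
```

With a content-derived name, archiving the same file twice is harmless, and a transcript can record the key of the exact bytes it was produced from.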

Deployed and still being iterated on. GCS archiving and the KB sync are live, and the pipeline is used in production every day.