Input Pipeline
Audio to transcript to structured notes, daily
(English original; Dutch translation to follow.)
What it does
The input pipeline is the data backbone behind the rest of the OpenClaw + Papegaai stack. Every day it runs end-to-end:
- Capture — voice memos, conversation recordings, and dictated notes land in a watched bucket.
- Language-detect + transcribe — a 60-second sample drives Whisper language detection, then the full recording is transcribed in the detected language (no defaulting to Dutch).
- Structure — the LLM extractor emits zero or more typed items per topic (fleeting note, todo, or CRM fact), using a single shared prompt across all sources.
- Redact — a pipeline-side pass strips PII before anything is forwarded.
- Route to KBs — extracted items are routed to the relevant knowledge bases by topic.
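The last three stages above (structure, redact, route) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Item` fields, the two regex patterns, and the idea that each topic maps to exactly one knowledge base are all hypothetical, and the real redaction pass presumably covers more kinds of PII than email addresses and phone numbers.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Literal

# The three item kinds come from the text; the identifiers are assumptions.
Kind = Literal["fleeting_note", "todo", "crm_fact"]


@dataclass(frozen=True)
class Item:
    kind: Kind
    topic: str
    text: str


# Hypothetical PII patterns: email addresses and phone-like digit runs only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact(item: Item) -> Item:
    """Return a copy of the item with matched PII replaced by placeholders."""
    text = EMAIL.sub("[EMAIL]", item.text)
    text = PHONE.sub("[PHONE]", text)
    return replace(item, text=text)


def route(items):
    """Redact each item, then group by topic (one topic stands in for one KB)."""
    by_kb = defaultdict(list)
    for item in items:
        by_kb[item.topic].append(redact(item))
    return dict(by_kb)


items = [
    Item("todo", "sales", "Mail jan@example.com about the quote"),
    Item("crm_fact", "sales", "Prefers calls on +31 6 1234 5678"),
    Item("fleeting_note", "research", "Try a shorter sample window"),
]
routed = route(items)
```

Redaction runs inside `route`, so nothing reaches a knowledge base without passing through it first, which mirrors the "before anything is forwarded" ordering described above.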
Architecture
- Workhorse VPS — 178.104.129.209 is the host. LLM calls, Postgres, scripts, no agents.
- GCS archiving — every raw artefact is archived to Google Cloud Storage so re-extraction is always possible.
- Daily 06:17 sync — KB rsync pulls processed notes locally for offline browsing.
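A daily sync like the one above could be wired up along these lines. This is a minimal sketch, not the deployed script: the remote user, both paths, and the `--delete` flag are assumptions, and only the host address and the 06:17 schedule come from the text.

```python
import subprocess

# 06:17 every day; cron fields are minute, hour, day, month, weekday.
CRON_SCHEDULE = "17 6 * * *"


def build_rsync_cmd(remote: str, local: str) -> list[str]:
    """Build an rsync invocation that mirrors the remote KB directory locally."""
    # -a recurses and preserves attributes; -z compresses in transit.
    # --delete (an assumption) removes local files that vanished remotely.
    return ["rsync", "-az", "--delete", remote, local]


def sync(remote: str, local: str) -> None:
    """Run the sync and raise CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_cmd(remote, local), check=True)


# Hypothetical user and paths; only the host address is taken from the text.
cmd = build_rsync_cmd("kb@178.104.129.209:/srv/kb/", "/home/me/kb/")
```

Dropping `--delete` would keep notes locally even after they are removed on the server, which may be preferable for an offline archive.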
Status
Deployed and under active iteration. GCS archiving and KB sync are live. Used in production every day.