Papegaai.ai

Input Pipeline

Audio to transcript to structured notes — runs daily

Category: Data pipelines
Status: active
Phase: iteration

What it does

The input pipeline is the data backbone behind the rest of the OpenClaw + Papegaai stack. Every day it runs end-to-end:

  1. Capture — voice memos, conversation recordings, and dictated notes land in a watched bucket.
  2. Language-detect + transcribe — a 60-second sample drives Whisper language detection, then the full recording is transcribed in the detected language (no defaulting to Dutch).
  3. Structure — the LLM extractor emits zero-or-more typed items per topic — fleeting note, todo, CRM fact — using a single shared prompt across all sources.
  4. Redact — a pipeline-side pass strips PII before anything is forwarded.
  5. Route to KBs — extracted items are routed to the relevant knowledge bases by topic.
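
The flow of steps 2 to 5 can be sketched roughly as follows. This is a minimal illustration, not the production code: `process`, `Item`, `redact`, and the e-mail-only PII pattern are hypothetical names introduced here, and the Whisper and LLM calls are passed in as plain callables so the sketch makes no claims about their actual APIs. Only the 60-second sample length and the item types come from the description above.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Sequence

SAMPLE_SECONDS = 60  # length of the language-detection sample


@dataclass(frozen=True)
class Item:
    kind: str   # e.g. "fleeting_note", "todo", "crm_fact"
    topic: str  # decides which knowledge base receives the item
    text: str


# Assumption: only e-mail addresses are treated as PII in this sketch.
# A real redaction pass would cover far more (names, phone numbers, ...).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace every e-mail address with a placeholder."""
    return EMAIL.sub("[REDACTED]", text)


def process(
    samples: Sequence[float],
    sample_rate: int,
    detect_language: Callable[[Sequence[float]], str],
    transcribe: Callable[[Sequence[float], str], str],
    extract: Callable[[str], List[Item]],
) -> Dict[str, List[Item]]:
    # Step 2: detect the language on the first 60 seconds only,
    # then transcribe the whole recording in that language.
    head = samples[: SAMPLE_SECONDS * sample_rate]
    language = detect_language(head)
    transcript = transcribe(samples, language)

    # Step 3: the extractor may return zero or more typed items.
    items = extract(transcript)

    # Steps 4 and 5: redact each item, then group by topic for routing.
    routed: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        routed[item.topic].append(replace(item, text=redact(item.text)))
    return dict(routed)
```

Passing the detector, transcriber, and extractor in as arguments keeps the ordering of the steps visible and lets each stage be swapped or stubbed in tests.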

Architecture

  • Workhorse VPS (178.104.129.209) is the host. It runs LLM calls, Postgres, and scripts; no agents.
  • GCS archiving — every raw artefact is archived to Google Cloud Storage so re-extraction is always possible.
  • Daily 06:17 sync — KB rsync pulls processed notes locally for offline browsing.
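
As an illustration of the daily sync, the sketch below builds an rsync pull and shows the cron expression for 06:17. The function names, the script path in the crontab comment, and any host or directory passed in are assumptions made for this example; the source only states that an rsync of the KB runs daily at 06:17.

```python
import subprocess
from typing import List

# Crontab entry for a daily 06:17 run (the script path is hypothetical):
#   17 6 * * * /usr/bin/python3 /opt/kb_sync.py


def build_rsync_command(remote: str, local_dir: str) -> List[str]:
    """Return the rsync argument list that pulls `remote` into `local_dir`.

    -a preserves the directory tree, permissions, and timestamps;
    -z compresses data in transit.
    """
    return ["rsync", "-az", remote, local_dir]


def sync(remote: str, local_dir: str) -> None:
    # check=True raises CalledProcessError if rsync exits non-zero,
    # so a failed sync is not silently ignored.
    subprocess.run(build_rsync_command(remote, local_dir), check=True)
```

A call such as `sync("workhorse:/srv/kb/", "/home/me/kb/")` (both paths hypothetical) would copy the contents of the remote directory, since a trailing slash on the rsync source means "the contents of" rather than the directory itself.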

Status

Deployed and in iteration. GCS archiving + KB sync are live. Used in production every day.

