How Fluidworks Keeps Hallucinations < 5 %

Jose Kuttan

Aug 27, 2025

At Fluidworks, factual accuracy is a product requirement, not an afterthought. Our agent runs every answer through layered defenses; whenever confidence drops, it reformulates, asks for clarification, or hands off to a human.

1. Retrieval-First Architecture

Grounding in data

  • Every doc, call transcript, and UI screenshot you upload is chunk-indexed with pgvector on GCP Postgres.

  • The agent retrieves top-k passages (semantic similarity + recency bias) and assembles a context window.

  • The LLM must cite which chunks it used. If none pass the relevance threshold, it replies “I’m not sure” instead of guessing.

Why it matters: Most hallucinations come from ungrounded prompts. Retrieval-first ensures the model rarely generates in a vacuum.
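
A minimal sketch of that retrieval step, assuming a hypothetical chunks table (id, content, embedding, updated_at) indexed with pgvector and an OpenAI embedding model for the query; the schema, threshold, and recency weight are illustrative, not Fluidworks' actual setup.

```python
import psycopg2
from openai import OpenAI

client = OpenAI()

def retrieve_chunks(question: str, k: int = 5, threshold: float = 0.75):
    # Embed the question with the same model used at index time
    # (model choice here is illustrative).
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    emb_literal = "[" + ",".join(str(x) for x in emb) + "]"  # pgvector text format

    conn = psycopg2.connect("dbname=kb")  # GCP Cloud SQL Postgres in practice
    with conn.cursor() as cur:
        # <=> is pgvector's cosine-distance operator; the small per-day
        # penalty on older chunks supplies the recency bias.
        cur.execute(
            """
            SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
            FROM chunks
            ORDER BY (embedding <=> %s::vector)
                     + 0.0005 * EXTRACT(EPOCH FROM now() - updated_at) / 86400
            LIMIT %s
            """,
            (emb_literal, emb_literal, k),
        )
        rows = cur.fetchall()

    # Keep only passages above the relevance threshold; an empty list means
    # the agent should answer "I'm not sure" rather than guess.
    return [(chunk_id, text) for chunk_id, text, sim in rows if sim >= threshold]
```

The returned chunk IDs are what the LLM is asked to cite, so every claim can be traced back to an uploaded doc, transcript, or screenshot.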

2. Model Ensemble Voting

  • Primary model (e.g., GPT-4, Gemini 2 Pro) drafts an answer.

  • Verifier models (lighter, Mistral-class models) draft their own answers independently.

  • If ≥ 2 models agree (0.8+ cosine similarity), we proceed; otherwise confidence drops, triggering a rerun or user clarification.


Why it matters: Models hallucinate independently of one another, so an ensemble makes a matching, simultaneous hallucination unlikely.
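
A rough sketch of the voting check, assuming each model sits behind a simple generate-style callable and answers are compared with embedding cosine similarity; the 0.8 threshold comes from above, while the function names are hypothetical.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def ensemble_vote(prompt: str, primary, verifiers, embed, threshold: float = 0.8):
    draft = primary(prompt)                    # primary model's answer
    rivals = [v(prompt) for v in verifiers]    # independent verifier answers

    draft_vec = np.asarray(embed(draft))
    agreeing = sum(
        1 for ans in rivals
        if cosine(draft_vec, np.asarray(embed(ans))) >= threshold
    )

    # The draft itself counts as one vote, so "at least 2 models agree"
    # means the primary plus one or more verifiers land within 0.8 cosine
    # similarity of each other.
    if 1 + agreeing >= 2:
        return {"answer": draft, "confident": True}
    # Otherwise confidence drops: rerun with a reformulated prompt or ask
    # the user to clarify.
    return {"answer": None, "confident": False}
```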

3. Guardrail Filtering

  • Rule-based layer: Blocks unsupported claims (“guaranteed ROI”), speculative numbers, or PII leaks.

  • Regex + semantic filters: Catch brand-new entities not in your KB, forcing a grounding pass.

  • Safe complete: Strips risky output or replaces it with a clarifying fallback.


Why it matters: Even correct answers can over-promise or leak. Guardrails enforce compliance and brand safety.
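
A simplified sketch of the rule-based layer; the regex patterns and the proper-noun check are naive stand-ins for the production rules and semantic filters, shown only to make the flow concrete.

```python
import re

BLOCKED_PATTERNS = [
    r"guaranteed\s+ROI",                  # unsupported claims
    r"\b\d{1,3}\s?%\s*(ROI|return)\b",    # speculative numbers
    r"\b\d{3}-\d{2}-\d{4}\b",             # PII, e.g. SSN-shaped strings
]

FALLBACK = ("I don't have verified data for that yet. "
            "Would you like me to check with the team?")

def guardrail_filter(answer: str, known_entities: set[str]):
    # Rule-based layer / "safe complete": replace risky output outright.
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, answer, flags=re.IGNORECASE):
            return FALLBACK, True    # (text to send, ready to send)

    # Naive stand-in for the entity filter: any proper noun the knowledge
    # base has never seen forces another grounding pass before sending.
    unknown = [e for e in re.findall(r"\b[A-Z][a-z]{2,}\b", answer)
               if e not in known_entities]
    if unknown:
        return answer, False         # caller re-grounds, then re-checks

    return answer, True              # safe to send as-is
```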

4. User Flagging & Human-in-the-Loop

  • Inline “Report Issue” button in every transcript.

  • Triage queue: issues appear in a JIRA-like board within minutes.

  • Flagged turns auto-feed into nightly fine-tune jobs and prompt updates.


Why it matters: Edge cases surface fastest in the wild. A 24-hour SLA keeps the feedback loop tight.
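
As a sketch, the payload behind that "Report Issue" button might look like the structure below; every field name here is hypothetical, chosen just to show what the triage board and nightly fine-tune job need in order to reproduce the flagged turn.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FlaggedTurn:
    conversation_id: str
    turn_id: str
    user_prompt: str
    agent_answer: str
    retrieved_chunk_ids: list[str]  # so triage can replay the grounding context
    reporter_note: str
    flagged_at: str

def flag_turn(conversation_id, turn_id, prompt, answer, chunk_ids, note):
    event = FlaggedTurn(
        conversation_id, turn_id, prompt, answer, chunk_ids, note,
        flagged_at=datetime.now(timezone.utc).isoformat(),
    )
    # In production this lands on the triage board and in the nightly
    # fine-tune dataset; here we just serialize it.
    return json.dumps(asdict(event))
```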

5. Continuous Evaluation & Shadow Tests

  • Shadow prompts: 10 % of live turns are replayed on a staging model to check for deviations.

  • Monthly red-team sweeps: Adversarial prompts (e.g., “Invent an ROI metric”) test resilience. Failures trigger new guardrails.

  • Metric tracked: Hallucination Incident Rate = (confirmed incidents ÷ total responses) × 100.

    Last 90-day rolling avg: 0.74 %.
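
Spelled out in code (with made-up counts that happen to land on the published 0.74 %; "confirmed" means incidents verified by triage, "total" means all agent responses in the window):

```python
def hallucination_incident_rate(confirmed_incidents: int, total_responses: int) -> float:
    """Percentage of responses confirmed as hallucinations."""
    if total_responses == 0:
        return 0.0
    return confirmed_incidents / total_responses * 100

# Illustrative only: 37 confirmed incidents over 5,000 responses is about 0.74 %.
print(round(hallucination_incident_rate(37, 5000), 2))  # 0.74
```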


When Confidence Drops

  • Explicit disclaimer: “I’m not 100 % sure—want me to connect you to a teammate?”

  • Human handoff: Escalate to live chat or Slack.

  • Auto-log: If no one’s available, a ticket is filed with full context.
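
A tiny sketch of that fallback ladder; escalate_to_human and file_ticket are hypothetical stand-ins for the live-chat/Slack and ticketing integrations.

```python
DISCLAIMER = "I'm not 100% sure. Want me to connect you to a teammate?"

def handle_low_confidence(context, teammate_available, escalate_to_human, file_ticket):
    if teammate_available:
        escalate_to_human(context)   # hand off to live chat or Slack
    else:
        file_ticket(context)         # auto-log a ticket with full context
    return DISCLAIMER                # the user always sees the explicit disclaimer
```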

Takeaway

By layering grounding, model voting, guardrails, and human review, Fluidworks keeps hallucinations both rare and recoverable, delivering onboarding you can trust.
