Production · 2025

AI Chat

Production RAG chatbot with Membrain CRM integration, privacy-first architecture (GDPR/CCPA compliant), FAISS vector search, and Railway deployment.

  • Embedding size: 33 MB vs 420 MB
  • Inference speed: 10x faster
  • Doc formats: 5
  • Hosting cost: $5–20/mo

Architecture

  • Flask backend with FAISS + FastEmbed (ONNX-based) for vector search
  • Multi-LLM support (OpenAI GPT + Groq) with automatic fallback
  • PostgreSQL for persistent document storage surviving Railway redeployments
  • Documents stored as BYTEA, FAISS index serialized via faiss.serialize_index(), chunks as JSONB
  • Progressive conversation stages (Anonymous → Engaged → Qualified → Captured) control CTA timing
  • Keyword-based CRM qualification (pricing, demo, quote — 2+ signals threshold) keeps PII local

Key Decisions

FastEmbed (ONNX) over sentence-transformers

Why: 33MB model vs 420MB, 10x faster inference, no PyTorch dependency

Tradeoff: Fewer model options, but embedding quality is sufficient for RAG

Keyword-based qualification over LLM intent detection

Why: Privacy requirement — no PII sent to external LLM APIs

Tradeoff: Less accurate intent detection, but predictable and GDPR-safe
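The qualification logic can be sketched in a few lines of pure Python; the signal keywords here are illustrative (the production list isn't shown in this writeup), but the 2+ distinct-signals threshold matches the rule above:

```python
import re

# Illustrative buying-signal keywords; the real set is assumed, not shown here
SIGNALS = {"pricing", "price", "demo", "quote", "trial"}

def is_qualified(messages: list[str], threshold: int = 2) -> bool:
    """Return True once 2+ distinct buying signals appear in the conversation.

    Runs entirely locally, so no conversation content (or PII) ever
    leaves the server for an external LLM API.
    """
    seen: set[str] = set()
    for msg in messages:
        seen.update(SIGNALS & set(re.findall(r"[a-z]+", msg.lower())))
    return len(seen) >= threshold

print(is_qualified(["What's your pricing?"]))                        # False
print(is_qualified(["What's your pricing?", "Can I book a demo?"]))  # True
```

Counting *distinct* signals (a set, not a running tally) is what makes the threshold meaningful: repeating "pricing" three times is still one signal.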

Technologies

Flask · FAISS · FastEmbed · Membrain CRM · PostgreSQL · Railway

What I Learned

  • FastEmbed (ONNX runtime) is a much better choice than sentence-transformers for production — smaller, faster, no PyTorch.
  • Keyword-based qualification is crude but predictable and privacy-safe; LLM-based detection would require sending conversation content to external APIs.
  • Railway hands out DATABASE_URLs with the legacy postgres:// scheme, which modern drivers (e.g. SQLAlchemy 1.4+) reject; auto-converting it to postgresql:// at startup avoids a common deployment gotcha.
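That last fix is a one-line rewrite applied before the connection string reaches the driver; a minimal version:

```python
import re

def normalize_db_url(url: str) -> str:
    # Rewrite the legacy scheme only at the start of the string,
    # leaving already-correct postgresql:// URLs untouched
    return re.sub(r"^postgres://", "postgresql://", url, count=1)

print(normalize_db_url("postgres://user:pw@host:5432/app"))
# postgresql://user:pw@host:5432/app
```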