Production · 2025
AI Chat
Production RAG chatbot with Membrain CRM integration, privacy-first architecture (GDPR/CCPA compliant), FAISS vector search, and Railway deployment.
- Embedding Size: 33MB vs 420MB
- Inference Speed: 10x faster
- Doc Formats: 5
- Hosting Cost: $5–20/mo
Architecture
- Flask backend with FAISS + FastEmbed (ONNX-based) for vector search
- Multi-LLM support (OpenAI GPT + Groq) with automatic fallback
- PostgreSQL for persistent document storage surviving Railway redeployments
- Documents stored as BYTEA, FAISS index serialized via faiss.serialize_index(), chunks as JSONB
- Progressive conversation stages (Anonymous → Engaged → Qualified → Captured) control CTA timing
- Keyword-based CRM qualification (pricing, demo, quote — 2+ signals threshold) keeps PII local
Key Decisions
FastEmbed (ONNX) over sentence-transformers
Why: 33MB model vs 420MB, 10x faster inference, no PyTorch dependency
Tradeoff: Fewer model options, but embedding quality is sufficient for RAG
Keyword-based qualification over LLM intent detection
Why: Privacy requirement — no PII sent to external LLM APIs
Tradeoff: Less accurate intent detection, but predictable and GDPR-safe
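The keyword-based qualification above can be sketched as a local signal counter; the exact keyword list, threshold, and function names here are assumptions for illustration. The key property is that message text never leaves the process.

```python
# Minimal sketch of keyword-based CRM qualification: count distinct intent
# signals locally and only qualify after 2+ of them. No conversation content
# is sent to an external LLM API, so PII stays on the server.
QUALIFICATION_KEYWORDS = {"pricing", "demo", "quote"}  # assumed keyword set
THRESHOLD = 2

def count_signals(messages: list[str]) -> int:
    """Count distinct qualification keywords seen across the conversation."""
    seen = set()
    for msg in messages:
        lowered = msg.lower()
        seen.update(kw for kw in QUALIFICATION_KEYWORDS if kw in lowered)
    return len(seen)

def is_qualified(messages: list[str]) -> bool:
    return count_signals(messages) >= THRESHOLD

print(is_qualified(["What's your pricing?", "Can I book a demo?"]))  # True
print(is_qualified(["Hello there"]))                                 # False
```

Counting *distinct* keywords (a set, not a tally) keeps one repeated word from triggering qualification, which is what makes the 2+ threshold predictable.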
Technologies
Flask · FAISS · FastEmbed · Membrain CRM · PostgreSQL · Railway
What I Learned
- FastEmbed (ONNX runtime) is a much better choice than sentence-transformers for production — smaller, faster, no PyTorch.
- Keyword-based qualification is crude but predictable and privacy-safe; LLM-based detection would require sending conversation content to external APIs.
- Railway hands out a `DATABASE_URL` with the `postgres://` scheme, which newer database drivers (e.g. SQLAlchemy 1.4+) reject; converting it to `postgresql://` is a common gotcha that needs auto-handling at startup.
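The URL auto-handling mentioned above can be a one-line rewrite at startup; this sketch assumes `DATABASE_URL` arrives via the environment, as on Railway.

```python
import os

def normalize_db_url(url: str) -> str:
    """Railway (like Heroku) provides postgres:// URLs, but SQLAlchemy 1.4+
    only accepts the postgresql:// scheme; rewrite the prefix once at boot."""
    if url.startswith("postgres://"):
        return url.replace("postgres://", "postgresql://", 1)
    return url

# Example with a placeholder URL (real value comes from the environment):
DATABASE_URL = normalize_db_url(
    os.environ.get("DATABASE_URL", "postgres://user:pw@host:5432/db")
)
```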