Production · 2025
AI Chat
Production RAG chatbot with Membrain CRM integration, privacy-first architecture (GDPR/CCPA compliant), FAISS vector search, and Railway deployment.
- Embedding Size: 33MB vs 420MB
- Inference Speed: 10x faster
- Doc Formats: 5
- Hosting Cost: $5–20/mo
Architecture
- Flask backend with FAISS + FastEmbed (ONNX-based) for vector search
- Multi-LLM support (OpenAI GPT + Groq) with automatic fallback
- PostgreSQL for persistent document storage surviving Railway redeployments
- Documents stored as BYTEA, FAISS index serialized via faiss.serialize_index(), chunks as JSONB
- Progressive conversation stages (Anonymous → Engaged → Qualified → Captured) control CTA timing
- Keyword-based CRM qualification (pricing, demo, quote — 2+ signals threshold) keeps PII local
Key Decisions
FastEmbed (ONNX) over sentence-transformers
Why: 33MB model vs 420MB, 10x faster inference, no PyTorch dependency
Tradeoff: Fewer model options, but embedding quality is sufficient for RAG
Keyword-based qualification over LLM intent detection
Why: Privacy requirement — no PII sent to external LLM APIs
Tradeoff: Less accurate intent detection, but predictable and GDPR-safe
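The keyword-based qualification above can be sketched as a local signal counter; the exact keyword list, threshold, and function names here are assumptions for illustration. The key property is that message text never leaves the process.

```python
# Minimal sketch of keyword-based CRM qualification: count distinct intent
# signals locally and only qualify after 2+ of them. No conversation content
# is sent to an external LLM API, so PII stays on the server.
QUALIFICATION_KEYWORDS = {"pricing", "demo", "quote"}  # assumed keyword set
THRESHOLD = 2

def count_signals(messages: list[str]) -> int:
    """Count distinct qualification keywords seen across the conversation."""
    seen = set()
    for msg in messages:
        lowered = msg.lower()
        seen.update(kw for kw in QUALIFICATION_KEYWORDS if kw in lowered)
    return len(seen)

def is_qualified(messages: list[str]) -> bool:
    return count_signals(messages) >= THRESHOLD

print(is_qualified(["What's your pricing?", "Can I book a demo?"]))  # True
print(is_qualified(["Hello there"]))                                 # False
```

Counting *distinct* keywords (a set, not a tally) keeps one repeated word from triggering qualification, which is what makes the 2+ threshold predictable.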
Technologies
Flask · FAISS · FastEmbed · Membrain CRM · PostgreSQL · Railway
What I Learned
- FastEmbed (ONNX runtime) is a much better choice than sentence-transformers for production — smaller, faster, no PyTorch.
- Keyword-based qualification is crude but predictable and privacy-safe; LLM-based detection would require sending conversation content to external APIs.
- Railway hands out a `DATABASE_URL` with the `postgres://` scheme, which newer database drivers (e.g. SQLAlchemy 1.4+) reject; converting it to `postgresql://` is a common gotcha that needs auto-handling at startup.
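The URL auto-handling mentioned above can be a one-line rewrite at startup; this sketch assumes `DATABASE_URL` arrives via the environment, as on Railway.

```python
import os

def normalize_db_url(url: str) -> str:
    """Railway (like Heroku) provides postgres:// URLs, but SQLAlchemy 1.4+
    only accepts the postgresql:// scheme; rewrite the prefix once at boot."""
    if url.startswith("postgres://"):
        return url.replace("postgres://", "postgresql://", 1)
    return url

# Example with a placeholder URL (real value comes from the environment):
DATABASE_URL = normalize_db_url(
    os.environ.get("DATABASE_URL", "postgres://user:pw@host:5432/db")
)
```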