amoOS
AI-powered personal operating system with RAG-based semantic search, distributed microservices on Cloudflare edge, and a Telegram bot for natural-language knowledge capture.
Architecture
- Cloudflare Pages (Next.js 16 + React 19) frontend, Cloudflare Workers gateway (Hono.js) with KV sessions, Railway FastAPI backend
- Three-layer memory model: personal KB (Layer 1), project working memory (Layer 2), episodic activity context (Layer 3)
- PostgreSQL + pgvector (384-dim BGE vectors) for similarity search, Neo4j for relationship traversal
- 10-step document ingestion: upload → load → semantic chunk → embed → LLM entity extraction → hash dedup → entity consolidation → graph sync → relationship detection → 3D visualization
- Tiered LLM strategy: GPT-4o for PRDs, gpt-4o-mini for chat, Groq llama-3.3-70b for planning, local embeddings
- Content-hash caching on LLM calls reduced costs ~60% by skipping unchanged documents
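The content-hash caching above can be sketched as a hash-keyed lookup in front of the LLM call. This is a minimal illustration, not the production code: the in-memory dict and the `llm_call` parameter are hypothetical stand-ins for the real cache store and entity-extraction call.

```python
import hashlib

# Hypothetical in-memory cache; production would use a persistent store.
_cache: dict[str, list[str]] = {}

def content_hash(text: str) -> str:
    """Stable key for a document's exact contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def extract_entities_cached(text: str, llm_call) -> list[str]:
    """Skip the LLM call entirely when an identical document was seen before."""
    key = content_hash(text)
    if key in _cache:
        return _cache[key]      # cache hit: zero LLM cost
    result = llm_call(text)     # cache miss: pay for one LLM call
    _cache[key] = result
    return result
```

Because the key is derived from the document bytes, any edit produces a new hash and forces re-extraction, while re-ingesting unchanged documents is free.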
Key Decisions
Cloudflare free tier + Railway (~$10/month) over AWS/GCP

Why: Keep infrastructure under $15/month for a personal tool
Tradeoff: Limited compute on free tier, but sufficient for single-user workload
Local BGE-small embeddings over OpenAI ada-002
Why: Eliminate per-request embedding costs entirely
Tradeoff: 384-dim vs 1536-dim — lower dimensionality, but retrieval quality is adequate for personal KB size
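At query time, retrieval over these embeddings reduces to a cosine-similarity scan. A toy sketch, assuming 4-dim stand-ins for real 384-dim BGE-small vectors (in production pgvector performs this comparison server-side with an index):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], docs: list[dict], k: int = 2) -> list[dict]:
    """Rank stored chunk vectors by similarity to the query vector."""
    return sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)[:k]
```

Lower dimensionality shrinks both index size and scan cost, which is part of why 384-dim vectors stay fast at personal-KB scale.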
Semantic chunking (800 tokens, 150 overlap) with LLM boundary detection
Why: Produces better retrieval results than naive fixed-size chunking
Tradeoff: 3–5x slower ingestion, but ingestion is a background task
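The window-and-overlap mechanics behind this chunking can be sketched as a sliding token window. This is the naive fallback only, under two stated assumptions: list items stand in for real tokenizer tokens, and the LLM boundary-detection step (which shifts chunk edges to semantic breaks) is omitted.

```python
def chunk_tokens(tokens: list[str], size: int = 800, overlap: int = 150) -> list[list[str]]:
    """Sliding window: each chunk starts `size - overlap` tokens after the previous one,
    so the last 150 tokens of one chunk reappear at the start of the next."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

The overlap keeps context that straddles a chunk boundary retrievable from either side; semantic boundary detection then trades ingestion speed for cleaner breaks.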
What I Learned
- SSE fits one-way LLM token streaming better than WebSocket: it passes through CDNs and proxies, the browser's EventSource reconnects automatically, and it is simpler to implement.
- pgvector is sufficient for under 100K chunks; Neo4j adds value for relationship traversal but not similarity search.
- SQLAlchemy PostgreSQL enums are case-sensitive, which caused subtle bugs during migration.