RAGBOT Specifications
Physical AI Humanoid Robotics Book – Integrated RAG Chatbot System
Version: 1.0.0
Date: December 2024
Status: Production Ready
Table of Contents
- System Architecture
- Data Flow
- API Specifications
- Vector Ingestion Pipeline
- Database Schema
- Security Checklist
- Deployment Instructions
- Rate Limiting Strategy
- Maintenance Guide
System Architecture
ASCII Diagram
┌─────────────────────────────────────────────────────────────────┐
│ FRONTEND LAYER │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Docusaurus Site + React Components │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ ChatInterface Component │ │ │
│ │ │ • Mode Selector (fullbook | selected) │ │ │
│ │ │ • Message History Management │ │ │
│ │ │ • Source Display │ │ │
│ │ │ • Real-time Streaming │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↓
HTTP/HTTPS REST API
↓
┌─────────────────────────────────────────────────────────────────┐
│ BACKEND API LAYER │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ FastAPI Application (main.py) │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Routers │ │ │
│ │ │ • /api/chat (POST) - Chat endpoint │ │ │
│ │ │ • /api/chat/history (GET) - Chat history │ │ │
│ │ │ • /api/chat/rate (POST) - Rating system │ │ │
│ │ │ • /api/ingest/docs (POST) - Ingest documents │ │ │
│ │ │ • /api/ingest/status (GET) - Ingestion status │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↙ ↓ ↘
[Mode Selection] [LLM Service] [Vector Store]
↙ ↓ ↘
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Mode Routing │ │ OpenAI API │ │ Qdrant Cloud │
│ ┌────────────┐ │ │ (GPT-4 Turbo) │ │ Vector Store │
│ │ FULLBOOK │ │ │ │ │ │
│ │ Mode RAG │ │ │ • Embeddings │ │ • Collections │
│ │ │ │ │ • Chat Completions│ │ • Vectors │
│ │ + Qdrant │ │ │ • Responses │ │ • Metadata │
│ └────────────┘ │ └──────────────────┘ └──────────────────┘
│ ┌────────────┐ │ ↑
│ │ SELECTED │ │ │
│ │ Mode Only │ │ [FULLBOOK MODE]
│ │ No Vector │ │ • Query Embedding
│ │ Search │ │ • Vector Search
│ └────────────┘ │ • Top K Retrieval
└──────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ DATABASE LAYER │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Neon Serverless PostgreSQL │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Tables: │ │ │
│ │ │ • user_sessions (session management) │ │ │
│ │ │ • chat_messages (conversation history) │ │ │
│ │ │ • ingestion_logs (document processing logs) │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Component Overview
| Component | Technology | Purpose |
|---|---|---|
| Frontend | React + Docusaurus | Chat UI, document viewing |
| Backend API | FastAPI | Request routing, business logic |
| LLM | OpenAI GPT-4 Turbo | Answer generation, embeddings |
| Vector Store | Qdrant Cloud | Document vector storage |
| Database | Neon PostgreSQL | Session & chat logging |
| Hosting | Vercel (FE) / Render/Fly (BE) | Deployment targets |
Data Flow
Flow 1: FULLBOOK RAG Mode
User Query
↓
[ChatInterface Component]
↓
POST /api/chat {mode: "fullbook", query: "..."}
↓
[FastAPI Router: chat.py]
↓
[Generate Query Embedding] (OpenAI API)
↓
[Vector Search in Qdrant] (retrieve top-5 chunks)
↓
[Build Context] (combine relevant sections)
↓
[OpenAI Agent] (GPT-4 with retrieved context)
↓
[Generate Response] + [Source References]
↓
[Store in Neon] (session, messages, sources)
↓
ChatResponse {response, sources, session_id, tokens_used}
↓
[Display in UI] + [Show Sources]
Flow 2: SELECTED Mode
User Selects Text
↓
[SelectedTextBox Component]
↓
User Enters Query
↓
POST /api/chat {mode: "selected", query: "...", selected_text: "..."}
↓
[FastAPI Router: chat.py]
↓
[Skip Qdrant Search] (no vector lookup)
↓
[Build Context] (ONLY from selected text)
↓
[OpenAI Agent] (GPT-4 with selected context only)
↓
[Generate Response] (constrained to selection)
↓
[Store in Neon] (mark mode as 'selected')
↓
ChatResponse {response, sources: [], session_id, tokens_used}
↓
[Display in UI]
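The branch between the two modes is small. A minimal sketch of the routing logic in routers/chat.py might look like the following; the helper name vector_search and the payload field names are assumptions based on the flows above, not the repo's actual identifiers:

```python
# Hypothetical sketch of the mode branch in routers/chat.py.
if request.mode == "selected":
    context = request.selected_text  # no Qdrant lookup in SELECTED mode
    sources = []
else:  # fullbook: embed the query and retrieve top-k chunks
    hits = vector_search(request.query, top_k=5)
    context = "\n\n".join(hit.payload["content"] for hit in hits)
    sources = [hit.payload for hit in hits]
```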
Flow 3: Document Ingestion
Admin Triggers Ingestion
↓
[ragbot-ingest.py script OR POST /api/ingest/docs]
↓
[Read all .md/.mdx files from /docs]
↓
[Extract Sections] (by headers, chunk size 500 tokens)
↓
[Generate Embeddings] (OpenAI text-embedding-3-small)
↓
[Batch Upload] (chunked, 100 vectors per batch)
↓
[Qdrant Upsert] (with metadata: filename, module, section, content)
↓
[Log to Neon] (ingestion_logs table)
↓
✓ Vectors Ready for Search
API Specifications
Base URL
- Development: http://localhost:8000
- Production: https://ragbot-api.example.com
Authentication
All endpoints are currently open (for development). For production, add API keys:
```python
# In config.py, add:
API_KEY = os.getenv("RAGBOT_API_KEY")

# In routers, use Depends(verify_api_key)
```
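The repo does not define verify_api_key yet. A minimal sketch, assuming API_KEY from config.py and a constant-time comparison from the standard library:

```python
import secrets
from fastapi import HTTPException

# Hypothetical helper; not yet part of the codebase.
def verify_api_key(provided_key: str) -> None:
    if not API_KEY or not secrets.compare_digest(provided_key, API_KEY):
        raise HTTPException(status_code=401, detail="Invalid API key")
```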
Endpoints
1. POST /api/chat
Chat Request
{
"query": "What is ROS2?",
"mode": "fullbook",
"selected_text": null,
"session_id": "session_123...",
"user_id": "user_456..."
}
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | User question |
| mode | enum | Yes | "fullbook" or "selected" |
| selected_text | string | No | Required if mode="selected" |
| session_id | string | No | Session UUID (auto-generated if null) |
| user_id | string | No | User identifier (default: "anonymous") |
Response:
{
"response": "ROS2 is the Robot Operating System version 2...",
"session_id": "session_123...",
"sources": [
{
"score": 0.92,
"filename": "module01-ros2/02-ros2-nodes-topics.md",
"module": "module01-ros2",
"section": "ROS2 Fundamentals",
"content": "ROS2 (Robot Operating System version 2) is a middleware...",
"full_path": "module01-ros2/02-ros2-nodes-topics.md"
}
],
"tokens_used": 487,
"mode": "fullbook"
}
Status Codes:
- 200 OK - Successful response
- 400 Bad Request - Invalid query or missing required field
- 500 Internal Server Error - Processing failed
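For reference, a minimal Python client call against the development base URL above:

```python
import requests

# Minimal client sketch; only query and mode are required.
resp = requests.post(
    "http://localhost:8000/api/chat",
    json={"query": "What is ROS2?", "mode": "fullbook"},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["response"])
for src in data["sources"]:
    print(f'{src["score"]:.2f}  {src["filename"]}')
```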
2. GET /api/chat/history/{session_id}
Response:
{
"session_id": "session_123...",
"messages": [
{
"query": "What is ROS2?",
"response": "ROS2 is the Robot Operating System...",
"mode": "fullbook",
"sources": [...],
"created_at": "2024-12-06T10:30:00Z"
}
]
}
3. POST /api/chat/rate
Parameters:
{
"message_id": "msg_123...",
"rating": 5
}
Response:
{
"status": "success",
"message_id": "msg_123...",
"rating": 5
}
4. POST /api/ingest/docs
Request:
{
"force_reingest": false
}
Response:
{
"status": "success",
"message": "Documents ingested successfully",
"files_processed": 15,
"chunks_created": 247,
"vectors_stored": 247,
"point_ids": ["123456789", "123456790", ...]
}
5. GET /api/ingest/status
Response:
{
"status": "success",
"collection": {
"name": "robotics_docs",
"point_count": 247,
"vector_size": 1536
}
}
6. GET /api/health
Response:
{
"status": "healthy",
"service": "RAG Chatbot API",
"version": "1.0.0"
}
Vector Ingestion Pipeline
Pipeline Steps
1. Document Discovery
   - Scan /docs recursively
   - Identify .md and .mdx files
   - Extract module name from path
2. Text Chunking (a chunking sketch follows this list)
   - Split by headers (H1, H2, H3)
   - Chunk size: 500 tokens
   - Overlap: 100 tokens
   - Preserve section hierarchy
3. Metadata Extraction
   - filename: relative file path
   - module: extracted from folder structure
   - section: header/title
   - content: chunk text (up to 1000 chars for preview)
   - full_path: full relative path
   - created_at: ingestion timestamp
4. Embedding Generation
   - Model: text-embedding-3-small
   - Vector dimension: 1536
   - Batch size: 25 texts per request
   - Rate limit: 0.1s between batches
5. Vector Storage
   - Upload to Qdrant Cloud
   - Batch size: 100 vectors per upsert
   - Payload: full metadata
   - Distance metric: COSINE
6. Logging
   - Record in ingestion_logs table
   - Status: success/failed
   - Error message if failed
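A minimal sketch of the header-based chunking in step 2, assuming tiktoken for token counting (the actual ragbot-ingest.py implementation may differ):

```python
import re
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_markdown(text: str, chunk_size: int = 500, overlap: int = 100):
    """Split on H1-H3 headers, then window each section by token count."""
    sections = re.split(r"(?m)^(?=#{1,3} )", text)
    for section in filter(None, sections):
        tokens = enc.encode(section)
        start = 0
        while start < len(tokens):
            yield enc.decode(tokens[start:start + chunk_size])
            if start + chunk_size >= len(tokens):
                break
            start += chunk_size - overlap  # 100-token overlap between chunks
```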
Configuration
# In config.py
chunk_size: int = 500
chunk_overlap: int = 100
top_k_results: int = 5
qdrant_vector_size: int = 1536
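Putting steps 4 and 5 together with this configuration, a hedged sketch using the openai and qdrant-client packages. IDs, metadata, and batching are simplified: the real pipeline buffers 100 vectors per upsert, while this sketch upserts each 25-text embedding batch directly.

```python
import os
import time

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"))

def embed_and_store(chunks: list[dict]) -> None:
    """chunks: [{'content': ..., 'filename': ..., 'module': ..., ...}]"""
    for start in range(0, len(chunks), 25):  # 25 texts per embedding request
        batch = chunks[start:start + 25]
        resp = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=[c["content"] for c in batch],
        )
        points = [
            PointStruct(id=start + j, vector=d.embedding, payload=batch[j])
            for j, d in enumerate(resp.data)
        ]
        qdrant.upsert(collection_name="robotics_docs", points=points)
        time.sleep(0.1)  # pause between batches to respect rate limits
```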
Database Schema
Table: user_sessions
```sql
CREATE TABLE user_sessions (
    id VARCHAR PRIMARY KEY,
    user_id VARCHAR NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    metadata TEXT
);

-- PostgreSQL has no inline INDEX clause; create indexes separately.
CREATE INDEX idx_user_sessions_user_id ON user_sessions (user_id);
CREATE INDEX idx_user_sessions_created_at ON user_sessions (created_at);
```
Purpose: Track user sessions and conversation context
Table: chat_messages
```sql
CREATE TABLE chat_messages (
    id VARCHAR PRIMARY KEY,
    session_id VARCHAR NOT NULL,
    user_query TEXT NOT NULL,
    bot_response TEXT NOT NULL,
    mode VARCHAR DEFAULT 'fullbook',
    selected_text TEXT,
    source_sections TEXT, -- JSON array
    tokens_used INTEGER,
    created_at TIMESTAMP DEFAULT NOW(),
    user_rating INTEGER CHECK (user_rating >= 1 AND user_rating <= 5)
);

CREATE INDEX idx_chat_messages_session_id ON chat_messages (session_id);
CREATE INDEX idx_chat_messages_mode ON chat_messages (mode);
CREATE INDEX idx_chat_messages_created_at ON chat_messages (created_at);
```
Purpose: Store conversation history and enable analytics
Table: ingestion_logs
```sql
CREATE TABLE ingestion_logs (
    id VARCHAR PRIMARY KEY,
    doc_path VARCHAR NOT NULL,
    chunks_created INTEGER NOT NULL,
    vectors_stored INTEGER NOT NULL,
    status VARCHAR DEFAULT 'success',
    error_message TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_ingestion_logs_doc_path ON ingestion_logs (doc_path);
CREATE INDEX idx_ingestion_logs_created_at ON ingestion_logs (created_at);
```
Purpose: Track document ingestion operations for debugging
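As an example of the analytics these tables enable, a hedged SQLAlchemy snippet that averages ratings per mode; SessionLocal is assumed to be the session factory from db.py:

```python
from sqlalchemy import text

# Hypothetical analytics query over chat_messages.
with SessionLocal() as session:
    rows = session.execute(text("""
        SELECT mode, AVG(user_rating) AS avg_rating, COUNT(*) AS n
        FROM chat_messages
        WHERE user_rating IS NOT NULL
        GROUP BY mode
    """)).all()
    for mode, avg_rating, n in rows:
        print(f"{mode}: {avg_rating:.2f} over {n} rated messages")
```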
Security Checklist
✓ Implementation Status
- Environment variables for secrets (.env)
- CORS configuration
- Rate limiting support (configurable)
- Database password encryption ready
- API key injection ready
- Secure headers middleware ready
🔒 Recommended Security Measures
For Production:
1. API Authentication

   ```python
   # Add to config.py
   API_KEY = os.getenv("RAGBOT_API_KEY")

   # Add to main.py
   from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
   security = HTTPBearer()

   # Use in routers
   @router.post("/api/chat")
   async def chat(
       request: ChatRequest,
       credentials: HTTPAuthorizationCredentials = Depends(security),
   ):
       verify_api_key(credentials.credentials)
   ```

2. Database Security
   - Use connection pooling (✓ configured in db.py)
   - Neon provides SSL by default
   - Add IP whitelisting at the database level

3. Rate Limiting

   ```python
   from slowapi import Limiter
   from slowapi.util import get_remote_address

   limiter = Limiter(key_func=get_remote_address)
   app.state.limiter = limiter

   @router.post("/api/chat")
   @limiter.limit("10/minute")
   async def chat(...):
       pass
   ```

4. Input Validation
   - All Pydantic models with validation ✓
   - Query length limit: 2000 chars
   - Response length limit: 5000 chars

5. HTTPS/TLS
   - Enforce in frontend
   - Backend behind reverse proxy
   - Certificate management via platform

6. Logging & Monitoring

   ```python
   import logging
   logger = logging.getLogger(__name__)
   logger.info(f"Chat request from {user_id}")
   ```

7. Data Privacy
   - Chat history encrypted at rest (Neon)
   - GDPR compliance for user data
   - Data retention policy: 90 days
🚨 Secrets Management
# .env (local development only)
OPENAI_API_KEY=<REDACTED>
QDRANT_API_KEY=<REDACTED>
DATABASE_URL=postgresql://...
# Production (use platform secrets):
# Vercel: Settings → Environment Variables
# Render/Fly: Secrets management UI
Deployment Instructions
Prerequisites
- Python 3.9+
- Node.js 18+
- Git
- Accounts: OpenAI, Qdrant Cloud, Neon, Vercel, Render/Fly
Step 1: Prepare Backend Environment
# Navigate to ragbot-api
cd /workspaces/hacks022/ragbot-api
# Create Python virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create .env file with secrets
cp .env.example .env
# Edit .env with actual credentials
Step 2: Initialize Database & Qdrant
# Run migrations and create tables
python -c "from db import init_db; init_db()"
# Ingest documents (from project root)
cd /workspaces/hacks022
python ragbot-ingest.py
Step 3: Test Backend Locally
cd ragbot-api
python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
Visit: http://localhost:8000/docs for OpenAPI UI
Step 4: Deploy Backend to Render
1. Create Repository

   ```bash
   git add .
   git commit -m "Add RAG chatbot system"
   git push origin main
   ```

2. Connect to Render
   - Go to render.com
   - Click "New +" → "Web Service"
   - Connect GitHub repository
   - Select /ragbot-api as the root directory

3. Configure
   - Build Command: pip install -r requirements.txt
   - Start Command: gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app
   - Environment Variables: add from .env.example

4. Deploy
   - Click Deploy
   - Monitor build logs
   - Get service URL
Step 5: Deploy Frontend to Vercel
1. Prepare Docusaurus

   ```bash
   cd /workspaces/hacks022
   npm install
   ```

2. Update API URL
   - In src/pages/ragbot.jsx, update apiUrl to the backend service URL
   - Or use the environment variable REACT_APP_API_URL

3. Connect to Vercel
   - Go to vercel.com
   - Click "New Project"
   - Import GitHub repository
   - Root directory: / (not needed if at project root)
   - Framework: Docusaurus
   - Environment variables: REACT_APP_API_URL=https://your-render-service-url

   If you plan to host the backend in Vercel serverless functions instead of Render or another platform, set the following additional Vercel environment variables in the project settings (Environment → Production / Preview):

   - GROQ_API_KEY: your Groq API key for model/chat completions
   - QDRANT_URL: your Qdrant Cloud URL
   - QDRANT_API_KEY: your Qdrant API key
   - DATABASE_URL: e.g. sqlite:///./chat.db for small deployments, or a Postgres URL for production
   - FRONTEND_URL: e.g. https://<your-vercel-site>.vercel.app
   - EMBEDDINGS_PROVIDER: set to hf to use the Hugging Face Inference service for embeddings (recommended on Vercel to avoid heavy torch installs)
   - HUGGINGFACE_API_KEY: Hugging Face Inference token, used when EMBEDDINGS_PROVIDER=hf
   - EMBEDDINGS_HF_MODEL: (optional) the HF embeddings model name; defaults to sentence-transformers/all-MiniLM-L6-v2

   Notes:
   - The Vercel serverless environment may struggle to install large binary packages like torch. Setting EMBEDDINGS_PROVIDER=hf lets the function use the Hugging Face Inference API for query embeddings (no torch required). Make sure HUGGINGFACE_API_KEY is configured as a project secret.
   - The serverless function for this repo is under /api/index.py and uses api/requirements.txt (lighter requirements). For local development or dedicated backend servers, keep ragbot-api/requirements.txt for full installs including sentence-transformers and torch.

4. Deploy
   - Click Deploy
   - Wait for build completion
   - Get deployment URL
Step 6: Post-Deployment Checklist
# 1. Verify API health
curl https://your-render-service-url/api/health
# 2. Test chat endpoint
curl -X POST https://your-render-service-url/api/chat \
-H "Content-Type: application/json" \
-d '{"query":"test","mode":"fullbook"}'
# 3. Check Docusaurus at Vercel URL
# 4. Test RAG chatbot at /ragbot
# 5. Monitor logs for errors
Rate Limiting Strategy
Configuration
# In config.py
rate_limit_calls: int = 100
rate_limit_period: int = 3600 # 1 hour
Implementation with SlowAPI
```python
from fastapi import Request
from fastapi.responses import JSONResponse
from slowapi import Limiter
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter

@app.exception_handler(RateLimitExceeded)
async def ratelimit_handler(request: Request, exc: RateLimitExceeded):
    return JSONResponse(
        status_code=429,
        content={"detail": "Rate limit exceeded. Max 100 requests per hour."},
    )

# In routers/chat.py
# slowapi needs a starlette Request parameter in the decorated endpoint,
# so keep it alongside the Pydantic body model.
@router.post("/api/chat")
@limiter.limit("100/hour")
async def chat(request: Request, body: ChatRequest, ...):
    pass
```
Rate Limit Tiers
| Tier | Calls/Hour | Concurrent | Use Case |
|---|---|---|---|
| Free | 100 | 5 | Development |
| Standard | 500 | 10 | Testing |
| Premium | 5000 | 50 | Production |
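The tiers above are not yet wired into code; one hypothetical way to enforce them (names are illustrative):

```python
# Hypothetical tier-to-limit mapping; the API currently applies a single
# global limit, so this is only a sketch of enforcing the table above.
RATE_LIMITS = {
    "free": "100/hour",
    "standard": "500/hour",
    "premium": "5000/hour",
}

def limit_for(tier: str) -> str:
    return RATE_LIMITS.get(tier, RATE_LIMITS["free"])
```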
Monitoring
```python
# Log rate limit hits
if rate_limited:
    logger.warning(f"Rate limit exceeded for {user_id}")

# Track in database (SQLAlchemy models need a table name and primary key)
class RateLimitLog(Base):
    __tablename__ = "rate_limit_logs"
    id = Column(Integer, primary_key=True)
    user_id = Column(String)
    timestamp = Column(DateTime)
    endpoint = Column(String)
```
Maintenance Guide
Monitoring
Health Checks
# API health
curl https://your-api-url/api/health
# Database connection
# Check Neon dashboard for connection metrics
# Qdrant status
curl https://your-qdrant-url/healthz -H "api-key: ..."
# OpenAI API status
# Monitor in OpenAI dashboard
Key Metrics
- API Response Time: Target < 3s
- Qdrant Search Latency: Target < 500ms
- Embedding Generation: Depends on batch size
- Database Query Time: Target < 100ms
Logging
# Structured logging
import logging
import json
logger = logging.getLogger(__name__)
# Log format
logger.info(json.dumps({
"event": "chat_request",
"user_id": user_id,
"mode": mode,
"duration_ms": elapsed_ms,
"tokens_used": tokens
}))
Backup & Recovery
# PostgreSQL backup (Neon handles automatically)
# Access backups in Neon dashboard
# Qdrant snapshot
curl -X POST https://your-qdrant-url/snapshots \
-H "api-key: ..."
# Restore from backup
# Contact Neon support for recovery
Troubleshooting
| Issue | Solution |
|---|---|
| Slow chat responses | Check Qdrant search time; reduce top_k_results to shrink prompt size |
| Embedding failures | Verify OpenAI API key, check rate limits |
| Database connection drops | Neon auto-reconnects; check connection pooling |
| Vectors not stored | Verify Qdrant collection exists, check payload size |
| UI not loading | Check CORS configuration, verify API URL |
Update Procedure
1. Test locally

   ```bash
   git checkout -b feature/update
   # Make changes
   # Test thoroughly
   ```

2. Deploy to staging

   ```bash
   git push origin feature/update
   # Create pull request
   # Deploy to staging environment
   ```

3. Production deployment

   ```bash
   git checkout main
   git merge feature/update
   # Automated deployment triggers
   # Monitor logs for errors
   ```
Regular Maintenance Tasks
- Weekly: Review logs for errors
- Monthly: Check API usage and costs
- Monthly: Update dependencies
- Quarterly: Review security settings
- Quarterly: Archive old chat logs
Cost Optimization
# Monitor token usage
# Optimize chunk size for better search
# Batch embedding generation
# Use smaller model for simple queries
# Example: fallback to GPT-3.5 for simple questions
if query_complexity < 0.5:
model = "gpt-3.5-turbo"
else:
model = "gpt-4-turbo"
Scaling Considerations
As usage grows:
1. Add caching layer (Redis)

   ```python
   from redis import Redis
   redis = Redis(host='localhost', port=6379)

   # Cache embeddings
   cached = redis.get(f"embedding:{query}")
   ```

2. Increase vector batch size

   ```python
   # Adjust in config
   batch_size = 50  # increased from 25
   ```

3. Add database read replicas (Neon)
   - Create a read-only replica for analytics

4. Implement request queuing

   ```python
   from celery import Celery
   app = Celery('ragbot')

   @app.task
   def process_chat(query, mode):
       # Long-running chat processing
       pass
   ```
Appendix
Environment Variables
# OpenAI
OPENAI_API_KEY=<REDACTED>
OPENAI_MODEL=gpt-4-turbo
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Qdrant
QDRANT_URL=<REDACTED>
QDRANT_API_KEY=<REDACTED>
QDRANT_COLLECTION_NAME=robotics_docs
# Database
DATABASE_URL=postgresql://...
# Frontend
FRONTEND_URL=https://...
REACT_APP_API_URL=https://...
# API Configuration
CHUNK_SIZE=500
TOP_K_RESULTS=5
RATE_LIMIT_CALLS=100
RATE_LIMIT_PERIOD=3600
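A sketch of how config.py might load these variables, assuming the pydantic-settings package (the actual config.py may differ):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    openai_api_key: str
    openai_model: str = "gpt-4-turbo"
    openai_embedding_model: str = "text-embedding-3-small"
    qdrant_url: str
    qdrant_api_key: str
    qdrant_collection_name: str = "robotics_docs"
    database_url: str
    frontend_url: str = ""
    chunk_size: int = 500
    top_k_results: int = 5
    rate_limit_calls: int = 100
    rate_limit_period: int = 3600

    # Field names map to the environment variables above, case-insensitively.
    model_config = SettingsConfigDict(env_file=".env")

settings = Settings()
```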
Dependencies
Backend:
- FastAPI 0.104.1
- Uvicorn 0.24.0
- SQLAlchemy 2.0.23
- Qdrant Client 1.7.0
- OpenAI 1.3.5
Frontend:
- React 18+
- Docusaurus 3+
References
- FastAPI Documentation
- Qdrant Documentation
- OpenAI API Documentation
- Neon Documentation
- Docusaurus Documentation
Last Updated: December 2024
Maintained By: Engineering Team