LakeFlow Documentation
Data Lake pipeline for RAG: ingest documents, extract text, generate embeddings, and store them in Qdrant. Use it for semantic search, Q&A, or AI Portal agents.
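Between text extraction and embedding, pipelines like this typically split each document into overlapping chunks so embeddings stay within model limits. A minimal sketch of such a chunker; the size and overlap values are illustrative assumptions, not LakeFlow's actual defaults:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so context that
    spans a chunk boundary is not lost at embedding time."""
    if size <= overlap:
        raise ValueError("size must be greater than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```

Each chunk would then be embedded and upserted into Qdrant with a pointer back to its source file.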
Prerequisites
- Docker and Docker Compose
- Disk space: backend ~2GB, Qdrant ~500MB
- Optional: Ollama for embeddings and Q&A
- Optional: Python 3.10+ for local dev
Recommended reading order
- Getting Started: install and first run
- Backend API: REST endpoints
- Data Lake: zones and pipeline steps
- Configuration: environment variables
Quick checklist
- Set HOST_LAKE_PATH in .env
- Create Data Lake zones (000_inbox, 100_raw, etc.)
- Run docker compose up
- Add files to 000_inbox or use POST /inbox/upload
- Run pipeline steps step0 through step4, then use semantic search or Q&A
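The zone-creation step of the checklist can be scripted. A minimal sketch: 000_inbox and 100_raw come from the checklist above, while the remaining zone names are hypothetical placeholders; see the Data Lake section for the actual layout.

```python
from pathlib import Path

# 000_inbox and 100_raw appear in the checklist; the other names
# are hypothetical placeholders for the later pipeline zones.
ZONES = ["000_inbox", "100_raw", "200_text", "300_chunks", "400_vectors"]

def create_zones(lake_root: str) -> list[Path]:
    """Create each Data Lake zone directory under lake_root (idempotent)."""
    root = Path(lake_root)
    paths = []
    for zone in ZONES:
        p = root / zone
        p.mkdir(parents=True, exist_ok=True)  # safe to re-run
        paths.append(p)
    return paths
```

Point lake_root at the directory you set as HOST_LAKE_PATH before running docker compose up.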
Documentation sections
- Getting Started: install with Docker, create zones, first run.
- Backend API: auth, search, pipeline, inbox, admission agent.
- Frontend (Streamlit): Pipeline Runner, Semantic Search, Q&A, System Settings.
- Data Lake: zone layout, pipeline steps, supported formats.
- Configuration: environment variables, .env example.
- Deployment: Portainer, manual deploy, GitHub Actions.
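The variables mentioned on this page can be collected into a .env file. An illustrative fragment; the values shown are placeholders, not LakeFlow defaults, so check the Configuration section for the real ones:

```
# Host path mounted as the Data Lake root (required)
HOST_LAKE_PATH=/srv/lakeflow/lake

# Embedding model used by the embedding steps (example value)
EMBED_MODEL=nomic-embed-text

# Base URL of the Ollama server for embeddings and Q&A (example value)
LLM_BASE_URL=http://localhost:11434
```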
Troubleshooting
- Compose fails: check that the HOST_LAKE_PATH directory exists
- Frontend connection refused: ensure the backend is running
- Search returns no results: run step3 and step4, and check EMBED_MODEL
- Ollama not found: set LLM_BASE_URL and run ollama pull
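A quick way to tell "backend not running" apart from other failures is to probe the Swagger endpoint. A minimal sketch, assuming the default port 8011 from the Quick links; adjust if you changed it:

```python
import urllib.request
import urllib.error

def backend_healthy(base_url: str = "http://localhost:8011") -> bool:
    """Return True if the backend's Swagger UI answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=3) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: backend unreachable.
        return False
```

If this returns False, start (or restart) the backend container before debugging the frontend or search.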
Tips
- Use the sidebar to jump between sections.
- Run locally with Docker and follow Getting Started.
- Swagger UI is available at /docs while the backend is running.
- The admission agent is an example of AI Portal integration.
Quick links
- GitHub: Lampx83/LakeFlow
- PyPI: lake-flow-pipeline
- Swagger UI: http://localhost:8011/docs (when backend is running)
- ReDoc: http://localhost:8011/redoc
Next: Getting Started →