LakeFlow
Get Started
Data Lake pipeline for RAG. Self-host in minutes.

From documents to semantic search

Ingest PDF, Word, Excel → embed → Qdrant. Q&A, agents, AI Portal integration.

docker compose up --build

or pipx run lake-flow-pipeline init


Data Lake pipeline for RAG

Inbox → raw → staging → processed → embeddings → Qdrant. Run step0 to step4. Use with AI Portal agents.

⚙️

Backend (FastAPI)

REST API: Auth, Search (embed, semantic, Q&A), Pipeline (step0–step4), System, Qdrant proxy, Inbox, Admission agent.

🖥️

Frontend (Streamlit)

Control UI: Dashboard, Data Lake Explorer, Pipeline Runner, Semantic Search, Q&A with AI, Qdrant Inspector.

📁

Layered Data Lake

Zones: 000_inbox → 100_raw → 200_staging → 300_processed → 400_embeddings → 500_catalog. Hash, dedup, catalog.
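
To make the zone layout concrete, here is a minimal sketch that creates the six zone directories and skips inbox files whose content has already been seen. The zone names come from the list above; the function names, the SHA-256 choice, and the in-memory set of hashes are assumptions for illustration and may not match how LakeFlow implements hashing and dedup.

```python
import hashlib
from pathlib import Path

# Zone names as listed above. The root directory is an assumption.
ZONES = [
    "000_inbox",
    "100_raw",
    "200_staging",
    "300_processed",
    "400_embeddings",
    "500_catalog",
]


def create_zones(root: Path) -> list[Path]:
    """Create one directory per zone under root, in pipeline order."""
    paths = [root / zone for zone in ZONES]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def file_sha256(path: Path) -> str:
    """Hash file contents in chunks so large documents are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_new_files(inbox: Path, seen_hashes: set[str]) -> list[Path]:
    """Return inbox files with unseen content and record their hashes.

    Hypothetical helper: a real catalog would persist the hashes.
    """
    fresh = []
    for path in sorted(p for p in inbox.iterdir() if p.is_file()):
        digest = file_sha256(path)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(path)
    return fresh
```

Hashing by content rather than by filename means a document uploaded twice under different names is only processed once.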

🔍

Semantic search & embed

POST /search/embed (text → vector), /search/semantic (Qdrant), /search/qa (RAG). Qdrant vector store.
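
As a rough sketch of calling the three search endpoints from Python, the snippet below builds JSON POST requests with the standard library. Only the endpoint paths come from the text above. The base URL, the payload field names (`text`, `query`, `question`), and the `send` helper are assumptions, and the Auth API may require a token header that is not shown here.

```python
import json
import urllib.request

# Assumption: the backend listens here; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_search_request(
    endpoint: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a /search/* endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/search/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payloads: the real field names may differ.
embed_req = build_search_request("embed", {"text": "What is a data lake?"})
semantic_req = build_search_request("semantic", {"query": "admission rules"})
qa_req = build_search_request("qa", {"question": "Which zones exist?"})
```

With the backend running, `send(embed_req)` would return the parsed JSON response. Check the API reference for the actual request and response schemas.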

🐳

Docker-first

Backend, frontend, and Qdrant run via Docker Compose, so no Python is needed on the host. On a Mac M1, use a venv instead for GPU acceleration (Metal/MPS).

🐍

Python & FastAPI

Python 3.10+, FastAPI, sentence-transformers, Qdrant. Easy to extend. PyPI: lake-flow-pipeline.

Solutions for every use case

From research to regulations. LakeFlow adapts to your document types.

For developers

Integrate LakeFlow into your Python stack. REST APIs, Docker, FastAPI backend, Streamlit UI.

Quick start →

For data teams

Ingest documents and run pipelines via the UI or API. Embedding and semantic search for RAG and LLM applications.

Documentation →

Enterprise

Self-host on your infrastructure. NAS compatible (SQLite without WAL). Full data control.

Deployment guide →
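
The NAS note above reflects a general SQLite constraint: WAL mode relies on shared memory and does not work reliably on network filesystems. A minimal sketch of opening a database with the classic rollback journal instead is shown below. The function name and timeout are assumptions, and this is not necessarily how LakeFlow configures its own database.

```python
import sqlite3


def open_nas_safe(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database using the rollback journal instead of WAL.

    Hypothetical helper for illustration only.
    """
    # A longer busy timeout helps on slow network storage (value is an assumption).
    conn = sqlite3.connect(db_path, timeout=30.0)
    # The PRAGMA returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    if mode.lower() != "delete":
        conn.close()
        raise RuntimeError(f"could not disable WAL, journal_mode is {mode!r}")
    return conn
```

Reading back the mode returned by the PRAGMA, rather than assuming it was applied, catches the case where another open connection prevents the switch.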

Ready to deploy?

Start with LakeFlow. Run with Docker in minutes.

Open source. You own your data.