From documents to semantic search
Ingest PDF, Word, Excel → embed → Qdrant. Q&A, agents, AI Portal integration.
docker compose up --build or pipx run lake-flow-pipeline init
Data Lake pipeline for RAG
Inbox → raw → staging → processed → embeddings → Qdrant. Run step0 to step4. Use with AI Portal agents.
Backend (FastAPI)
REST API: Auth, Search (embed, semantic, Q&A), Pipeline (step0→step4), System, Qdrant proxy, Inbox, Admission agent.
Frontend (Streamlit)
Control UI: Dashboard, Data Lake Explorer, Pipeline Runner, Semantic Search, Q&A with AI, Qdrant Inspector.
Layered Data Lake
Zones: 000_inbox → 100_raw → 200_staging → 300_processed → 400_embeddings → 500_catalog. Hash, dedup, catalog.
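To make the hash and dedup idea concrete, here is a minimal sketch of moving files from 000_inbox to 100_raw while skipping content that has already been seen. It is not LakeFlow's actual implementation: the function names, the choice of SHA-256, and the hash-based file naming are assumptions made for illustration only.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large documents are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_inbox(inbox: Path, raw: Path, seen: set[str]) -> list[Path]:
    """Copy each new file from the inbox zone to the raw zone (hypothetical helper).

    `seen` holds the content hashes already ingested; files whose hash is
    already present are treated as duplicates and skipped.
    """
    raw.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    for src in sorted(p for p in inbox.iterdir() if p.is_file()):
        content_hash = file_sha256(src)
        if content_hash in seen:
            continue  # duplicate content, even if the file name differs
        seen.add(content_hash)
        # Assumption: name the raw copy after its hash, keeping the extension.
        dest = raw / f"{content_hash[:16]}{src.suffix}"
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

Keying on the content hash rather than the file name means a renamed copy of the same document is ingested only once.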
Semantic search & embed
POST /search/embed (text→vector), /search/semantic (Qdrant), /search/qa (RAG). Qdrant vector store.
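As a rough sketch of how a Python client might call these three endpoints, the snippet below builds JSON POST requests with the standard library. Only the endpoint paths come from the text above. The base URL and every request field (`text`, `query`, `top_k`, `question`) are assumptions, and any authentication header the API may require is left out, so check the Backend API docs for the real schema.

```python
import json
import urllib.request

# Assumption: the backend is reachable locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a search endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical payloads; the field names are illustrative, not the documented schema.
embed_req = build_post("/search/embed", {"text": "tuition fee policy"})
semantic_req = build_post("/search/semantic", {"query": "tuition fee policy", "top_k": 5})
qa_req = build_post("/search/qa", {"question": "What is the tuition fee policy?"})
```

With the stack running, `send(semantic_req)` would perform the call. Building and sending are kept separate so the requests can be inspected without a live server.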
Docker-first
Backend, frontend, Qdrant via Docker Compose. No Python needed on the host. On a Mac M1, run in a venv for GPU acceleration (Metal/MPS).
Python & FastAPI
Python 3.10+, FastAPI, sentence-transformers, Qdrant. Easy to extend. PyPI: lake-flow-pipeline.
Solutions for every use case
From research to regulations. LakeFlow adapts to your document types.
For developers
Integrate LakeFlow into your Python stack. REST APIs, Docker, FastAPI backend, Streamlit UI.
Quick start →
For data teams
Ingest documents, run pipelines via UI or API. Embedding and semantic search for RAG and LLM applications.
Documentation →
Enterprise
Self-host on your infrastructure. NAS compatible (SQLite without WAL). Full data control.
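The NAS note matters because SQLite's WAL mode depends on shared memory and does not work reliably over network filesystems. A minimal sketch of opening a catalog database with the classic rollback journal instead might look like this. The function name and database path are hypothetical, and this is not necessarily how LakeFlow configures SQLite.

```python
import sqlite3
from pathlib import Path


def open_catalog(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite database with the rollback journal (DELETE) instead of WAL."""
    conn = sqlite3.connect(db_path)
    # The PRAGMA returns the journal mode actually in effect after the change.
    mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    if mode.lower() != "delete":
        conn.close()
        raise RuntimeError(f"could not switch journal mode, still {mode!r}")
    return conn
```

Checking the returned mode guards against a database that was previously left in WAL mode and could not be switched back.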
Deployment guide →
Built for developers
Docker Compose, REST API, Streamlit UI. Full control of your pipeline.
Quick start
Create a LakeFlow project in one command and run with Docker.
pipx run lake-flow-pipeline init
Documentation
Full docs: Backend API, Frontend UI, Data Lake, Configuration, Deployment.
GitHub
Source code, issues, and contributions.
PyPI
Package lake-flow-pipeline: install with pip, available on pypi.org.
Ready to deploy?
Start with LakeFlow. Run with Docker in minutes.
Open source. You own your data.