LakeFlow Documentation

LakeFlow is a Data Lake pipeline for RAG: it ingests documents, extracts text, generates embeddings, and stores them in Qdrant. Use it for semantic search, Q&A, or with AI Portal agents.

Prerequisites

  • Docker and Docker Compose
  • Disk space: backend ~2GB, Qdrant ~500MB
  • Optional: Ollama for embeddings and Q&A
  • Optional: Python 3.10+ for local dev

Recommended reading order

  1. Getting Started – install and first run
  2. Backend API – REST endpoints
  3. Data Lake – zones and pipeline steps
  4. Configuration – environment variables

Quick checklist

  • Set HOST_LAKE_PATH in .env
  • Create Data Lake zones (000_inbox, 100_raw, etc.)
  • Run docker compose up
  • Add files to 000_inbox or use POST /inbox/upload
  • Run pipeline steps step0→step4, then semantic search or Q&A
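The checklist above can be sketched as a shell session. The lake location, the backend port (8000), and the pipeline endpoint path are assumptions, and only the two zone names spelled out in this document are created:

```shell
# 1. Point HOST_LAKE_PATH at a host directory and record it in .env.
export HOST_LAKE_PATH="$PWD/lake"
echo "HOST_LAKE_PATH=$HOST_LAKE_PATH" > .env

# 2. Create the Data Lake zones (only the two names given in the docs;
#    your deployment may define additional zones).
mkdir -p "$HOST_LAKE_PATH/000_inbox" "$HOST_LAKE_PATH/100_raw"

# 3. Start the stack (commented out: requires Docker).
# docker compose up -d

# 4. Drop a file into the inbox, or upload via the API (port assumed).
echo "hello lakeflow" > "$HOST_LAKE_PATH/000_inbox/example.txt"
# curl -F "file=@example.txt" http://localhost:8000/inbox/upload

# 5. Run the pipeline steps in order (endpoint path is an assumption).
# for step in step0 step1 step2 step3 step4; do
#   curl -X POST "http://localhost:8000/pipeline/$step"
# done
```

The Docker and curl lines are commented out so the sketch stays inert; uncomment them once the stack is configured.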

Troubleshooting

  • Compose fails: check that the directory in HOST_LAKE_PATH exists
  • Frontend connection refused: make sure the backend is running
  • Search returns nothing: run step3 and step4 first, then check EMBED_MODEL
  • Ollama not found: set LLM_BASE_URL and pull the model with ollama pull
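A few of these checks can be scripted. The backend port, the fallback Ollama URL, and the /docs and /api/tags paths below are assumptions, not confirmed defaults:

```shell
# Is the configured lake path present? (compose bind mounts fail without it)
LAKE_DIR="${HOST_LAKE_PATH:-./lake}"
if [ -d "$LAKE_DIR" ]; then
  echo "lake path ok: $LAKE_DIR"
else
  echo "lake path missing: $LAKE_DIR"
fi

# Backend reachable? Swagger UI answers on /docs (port 8000 assumed).
# curl -sf http://localhost:8000/docs >/dev/null && echo "backend ok"

# Ollama reachable? (standard Ollama port assumed as the fallback)
# curl -sf "${LLM_BASE_URL:-http://localhost:11434}/api/tags" >/dev/null && echo "ollama ok"

# Pull the embedding model named in EMBED_MODEL if it is missing.
# ollama pull "$EMBED_MODEL"
```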

Tips

  • Use the sidebar to jump between sections.
  • Run locally with Docker, following the Getting Started guide.
  • Swagger UI is available at /docs while the backend is running.
  • The admission agent is an example of AI Portal integration.
