LakeFlow
Get Started

Configuration

LakeFlow reads its configuration from a .env file in the repo root. Copy env.example (or .env.example) to .env, then edit the values.

Environment variables

Variable                    Description
HOST_LAKE_PATH              Required (Docker). Host path for the volume bind mount. Maps to /data in the container. Must exist before running docker compose up.
LAKE_ROOT                   Data Lake root path inside the container or process. Docker: /data. Local: a path you choose (e.g. /Users/me/datalake).
QDRANT_HOST                 Qdrant host. Docker Compose: lakeflow-qdrant. Local: localhost. Portainer: qdrant.
QDRANT_PORT                 Qdrant port. Default: 6333.
API_BASE_URL                Backend URL used by the Frontend for all API calls. Docker: http://lakeflow-backend:8011. Local: http://localhost:8011.
LAKEFLOW_MODE               DEV: show the Pipeline Runner in the UI and prefill the default password in the login form. Omitted or any other value: hide both (production).
LLM_BASE_URL                Ollama URL for Q&A, the Admission Agent, and embedding (step3). E.g. http://host:11434. The backend must be able to reach it.
LLM_MODEL                   LLM model used for Q&A and the Admission Agent. Default: qwen3:8b.
EMBED_MODEL                 Ollama embedding model for step3 and the Search API. Default: qwen3-embedding:8b. Must match the model used in step3, otherwise search will not work.
EMBED_MODEL_OPTIONS         Model list for the step3 dropdown. Format: comma-separated, e.g. qwen3-embedding:8b,nomic-embed-text,mxbai-embed-large.
OLLAMA_EMBED_URL            Ollama embed API URL. Default: $LLM_BASE_URL/api/embed.
OPENAI_API_KEY              If set, Q&A uses OpenAI instead of Ollama. For a custom endpoint, also set OPENAI_BASE_URL and OPENAI_MODEL.
LAKEFLOW_MOUNT_DESCRIPTION  Description shown in System Settings (e.g. "Volume bind from /datalake/research").
QDRANT_SERVICES             Additional Qdrant instances for the UI dropdown. Format: comma-separated entries, each either URL or Label|URL.
LAKEFLOW_PIPELINE_BASE_URL  Backend URL that Inbox uses when it auto-runs the pipeline after an upload. Default: http://127.0.0.1:8011. In Docker, Inbox runs inside the backend container, so a localhost address is correct.
LAKEFLOW_DATA_PATH          Used in deploy: Data Lake path on the server. Overrides HOST_LAKE_PATH when using docker-compose.deploy.yml.
JWT_SECRET_KEY              Secret used to sign JWTs. The default is for development only; set a secure value in production.
QDRANT_API_KEY              Qdrant API key (needed for Qdrant Cloud or when authentication is required).
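
The defaults and list formats in the table can be read as a small resolution step. The following is a minimal sketch under stated assumptions, not LakeFlow's actual loader: resolve_settings, parse_qdrant_services and the DEV_MODE key are hypothetical names, the exact-match comparison against DEV is an assumption, and using a bare URL as its own label is an assumption.

```python
import os
from typing import Mapping, Optional


def parse_qdrant_services(raw: str) -> list:
    """Parse comma-separated 'URL' or 'Label|URL' entries into (label, url) pairs."""
    services = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        label, sep, url = item.partition("|")
        if sep:
            services.append((label.strip(), url.strip()))
        else:
            # Assumption: an entry without a label is shown under its own URL.
            services.append((item, item))
    return services


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Apply the defaults documented in the table to a mapping of variables."""
    env = os.environ if env is None else env
    llm_base_url = env.get("LLM_BASE_URL", "").rstrip("/")
    # OLLAMA_EMBED_URL falls back to $LLM_BASE_URL/api/embed when unset.
    embed_url = env.get("OLLAMA_EMBED_URL") or (
        f"{llm_base_url}/api/embed" if llm_base_url else None
    )
    return {
        "QDRANT_PORT": int(env.get("QDRANT_PORT", "6333")),
        "LLM_MODEL": env.get("LLM_MODEL", "qwen3:8b"),
        "EMBED_MODEL": env.get("EMBED_MODEL", "qwen3-embedding:8b"),
        "OLLAMA_EMBED_URL": embed_url,
        "LAKEFLOW_PIPELINE_BASE_URL": env.get(
            "LAKEFLOW_PIPELINE_BASE_URL", "http://127.0.0.1:8011"
        ),
        # Assumption: only the exact value DEV enables development mode.
        "DEV_MODE": env.get("LAKEFLOW_MODE") == "DEV",
        "QDRANT_SERVICES": parse_qdrant_services(env.get("QDRANT_SERVICES", "")),
    }
```

Passing a plain dict instead of os.environ makes it easy to check how a given .env would resolve before starting the services.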

Docker default values

In docker-compose.yml, the backend and frontend services receive:

  • LAKE_ROOT=/data
  • QDRANT_HOST=lakeflow-qdrant
  • QDRANT_PORT=6333
  • API_BASE_URL=http://lakeflow-backend:8011 (frontend)

Volume lakeflow_data uses device: $HOST_LAKE_PATH, read from .env.
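
Because the bind mount source must exist before docker compose up, a quick preflight check can catch a missing directory early. This is a minimal sketch, assuming the variables have already been read into a mapping; check_host_lake_path is a hypothetical helper, not part of LakeFlow.

```python
import os
from typing import Mapping


def check_host_lake_path(env: Mapping[str, str]) -> list:
    """Return a list of problems; an empty list means the mount source looks usable."""
    problems = []
    host_path = env.get("HOST_LAKE_PATH", "")
    if not host_path:
        problems.append("HOST_LAKE_PATH is not set")
    elif not os.path.isdir(host_path):
        problems.append(
            f"HOST_LAKE_PATH does not exist or is not a directory: {host_path}"
        )
    return problems
```

Calling it with a dict such as {"HOST_LAKE_PATH": "/Users/you/lakeflow_data"} returns an empty list only when that directory already exists on the host.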

Create zones

If zones don't exist, create them in the Data Lake directory:

  • Docker: Create under HOST_LAKE_PATH (maps to /data in container)
  • Local: Create under LAKE_ROOT
# Replace $DATA_DIR with HOST_LAKE_PATH (Docker) or LAKE_ROOT (local).
# Quoting keeps paths that contain spaces intact.
mkdir -p "$DATA_DIR/000_inbox" "$DATA_DIR/100_raw" "$DATA_DIR/200_staging" \
  "$DATA_DIR/300_processed" "$DATA_DIR/400_embeddings" "$DATA_DIR/500_catalog"
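
The same zone layout can be created from Python where a shell is not convenient. This is a minimal sketch equivalent to the mkdir -p command above; create_zones and ZONES are hypothetical names, and the zone list is copied from that command.

```python
from pathlib import Path

# Zone directory names, taken from the mkdir command above.
ZONES = (
    "000_inbox",
    "100_raw",
    "200_staging",
    "300_processed",
    "400_embeddings",
    "500_catalog",
)


def create_zones(data_dir: str) -> list:
    """Create every zone under data_dir, leaving existing directories untouched."""
    root = Path(data_dir)
    created = []
    for zone in ZONES:
        path = root / zone
        # parents=True and exist_ok=True mirror the behaviour of mkdir -p.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Pass HOST_LAKE_PATH for a Docker setup or LAKE_ROOT for a local one. Running it a second time is harmless, since existing directories are kept.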

Example .env

# Docker dev (Ollama on host Mac/Win: use host.docker.internal)
HOST_LAKE_PATH=/Users/you/lakeflow_data
LAKE_ROOT=/data
QDRANT_HOST=lakeflow-qdrant
API_BASE_URL=http://lakeflow-backend:8011
LAKEFLOW_MODE=DEV
LLM_BASE_URL=http://host.docker.internal:11434
EMBED_MODEL=qwen3-embedding:8b

# Local dev
LAKE_ROOT=/Users/you/lakeflow_data
QDRANT_HOST=localhost
API_BASE_URL=http://localhost:8011
LAKEFLOW_MODE=DEV
LLM_BASE_URL=http://localhost:11434
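
To inspect what a given .env would yield, the KEY=VALUE format can be parsed in a few lines. This is a minimal sketch, not the loader LakeFlow or Docker Compose actually uses: parse_env_text is a hypothetical helper that handles only comments, blank lines, and optional matching quotes. In this sketch a later assignment overrides an earlier one, so pasting both profiles above into one file would leave the local values in effect.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


docker_profile = """
# Docker dev
LAKE_ROOT=/data
QDRANT_HOST=lakeflow-qdrant
API_BASE_URL=http://lakeflow-backend:8011
"""
settings = parse_env_text(docker_profile)
```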