LakeFlow
Get Started

Deployment

Portainer Stack

Portainer stacks cannot build images from source, so build and push the images to Docker Hub first.

Step 1: Build and push images

cd LakeFlow
export DOCKERHUB_USER=your-username
DOCKER_BUILDKIT=1 docker build -t $DOCKERHUB_USER/lakeflow-backend:latest ./backend
docker build -t $DOCKERHUB_USER/lakeflow-frontend:latest ./frontend/streamlit
docker push $DOCKERHUB_USER/lakeflow-backend:latest
docker push $DOCKERHUB_USER/lakeflow-frontend:latest

Step 2: Create stack in Portainer

  1. Portainer → Stacks → Add stack
  2. Web editor → paste the contents of portainer-stack.yml
  3. Env vars: add DOCKERHUB_USER (e.g. lampx83); other variables from .env can be added if needed.
  4. Deploy the stack

Note: The stack uses the named volume lakeflow_data. To bind a host path instead, edit the stack and add driver_opts with device: /path/on/host to the volume definition.
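For orientation, a bind-style definition for that volume could look like the sketch below. It uses the standard bind options of Docker's local volume driver; /path/on/host is a placeholder for your actual host directory.

```yaml
# Sketch only: bind lakeflow_data to a host path via the local driver.
volumes:
  lakeflow_data:
    driver: local
    driver_opts:
      type: none       # no filesystem type; this is a bind, not a new mount
      o: bind          # mount option: bind-mount the device path
      device: /path/on/host   # placeholder: replace with your host directory
```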

Manual deploy to server

On a VPS or on-prem server (Ubuntu, Debian, ...):

  1. Clone and prepare env:
git clone https://github.com/Lampx83/LakeFlow.git
cd LakeFlow
cp env.example .env
nano .env   # Edit HOST_LAKE_PATH, QDRANT_HOST, API_BASE_URL, LLM_BASE_URL...
  2. Create the Data Lake directories: mkdir -p $HOST_LAKE_PATH/000_inbox $HOST_LAKE_PATH/100_raw $HOST_LAKE_PATH/200_staging $HOST_LAKE_PATH/300_processed $HOST_LAKE_PATH/400_embeddings $HOST_LAKE_PATH/500_catalog
  3. Run: DOCKER_BUILDKIT=1 docker compose up -d --build
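The directory-creation step can also be written as a loop, which is easier to read and extend. A minimal sketch, assuming HOST_LAKE_PATH is already set (the ./lake-demo fallback here is purely for illustration):

```shell
# Create the six Data Lake zone directories, mirroring the mkdir step above.
# HOST_LAKE_PATH would normally come from .env; ./lake-demo is a stand-in.
HOST_LAKE_PATH="${HOST_LAKE_PATH:-./lake-demo}"
for zone in 000_inbox 100_raw 200_staging 300_processed 400_embeddings 500_catalog; do
  mkdir -p "$HOST_LAKE_PATH/$zone"
done
ls "$HOST_LAKE_PATH"
```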

To use the deploy override (fixed bind mount): export LAKEFLOW_DATA_PATH=/datalake/research, then run docker compose -f docker-compose.yml -f docker-compose.deploy.yml up -d --build

Auto deploy (GitHub Actions)

The workflow .github/workflows/deploy.yml SSHs into the server and runs docker compose on each push to main.
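Its shape is roughly the following sketch. The action appleboy/ssh-action and the exact step layout are assumptions here; the real file in .github/workflows/ is authoritative.

```yaml
# Sketch of a push-to-main SSH deploy workflow (assumed action and layout).
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: SSH and redeploy
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          port: ${{ secrets.DEPLOY_SSH_PORT || 22 }}
          script: |
            cd ${{ secrets.DEPLOY_REPO_DIR || '~/lakeflow' }}
            git pull
            docker compose up -d --build
```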

Server setup (one-time)

  1. Install Docker:
    curl -fsSL https://get.docker.com | sh
    sudo usermod -aG docker $USER
    # Log out and log back in
  2. Clone repo: cd ~ && git clone https://github.com/Lampx83/LakeFlow.git lakeflow
  3. Create .env: cp env.example .env (or cp .env.example .env) then edit LAKE_ROOT, QDRANT_HOST, API_BASE_URL
  4. Create Data Lake directory: sudo mkdir -p /datalake/research && sudo chown $USER:$USER /datalake/research
  5. SSH key for GitHub Actions:
    ssh-keygen -t ed25519 -C "deploy" -f ~/.ssh/deploy_lakeflow -N ""
    cat ~/.ssh/deploy_lakeflow.pub >> ~/.ssh/authorized_keys
    # Get the private key: cat ~/.ssh/deploy_lakeflow → paste into GitHub Secret SSH_PRIVATE_KEY
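Step 5 can be dry-run locally before touching the server. The sketch below generates a throwaway ed25519 key pair in a temporary directory (paths here are illustrative; the real deploy key lives in ~/.ssh on the server):

```shell
# Generate a disposable ed25519 key pair, mirroring the ssh-keygen step.
KEYDIR="$(mktemp -d)"
ssh-keygen -t ed25519 -C "deploy" -f "$KEYDIR/deploy_lakeflow" -N "" -q
# The public half is what goes into authorized_keys on the server;
# the private half is what goes into the SSH_PRIVATE_KEY secret.
cat "$KEYDIR/deploy_lakeflow.pub"
```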

GitHub Secrets

Settings → Secrets and variables → Actions → New repository secret:

| Secret | Required | Description |
|---|---|---|
| DEPLOY_HOST | Yes | Server IP or hostname (e.g. 123.45.67.89) |
| DEPLOY_USER | Yes | SSH user (e.g. ubuntu) |
| SSH_PRIVATE_KEY | Yes | Full private key content (including BEGIN/END lines) |
| DEPLOY_REPO_DIR | No | Repo directory on the server; default ~/lakeflow |
| DEPLOY_SSH_PORT | No | SSH port if not 22 |

Data Lake mount

  • Docker Compose (dev): uses HOST_LAKE_PATH from .env. The directory must exist.
  • docker-compose.deploy.yml: binds LAKEFLOW_DATA_PATH (default ./data). On the server, export LAKEFLOW_DATA_PATH=/datalake/research before running compose.
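For orientation, the override's bind is roughly the sketch below. The service name and container-side path are assumptions; the real docker-compose.deploy.yml is authoritative.

```yaml
# Sketch of the deploy override's bind mount (assumed service/path names).
services:
  backend:
    volumes:
      # Host path from LAKEFLOW_DATA_PATH, falling back to ./data.
      - ${LAKEFLOW_DATA_PATH:-./data}:/data/lake
```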

CI/CD

| Workflow | Trigger | Action |
|---|---|---|
| ci.yml | Push/PR to main, develop | Lint (Ruff), Docker build |
| cd.yml | Release tag | Build + push images to GitHub Container Registry |
| push-dockerhub.yml | Push to main (when backend/frontend changed) | Push lakeflow-backend, lakeflow-frontend to Docker Hub. Needs DOCKERHUB_USER, DOCKERHUB_TOKEN |
| publish-pypi.yml | GitHub Release | Publish lake-flow-pipeline to PyPI |
| deploy.yml | Push to main | SSH → git pull → docker compose up |
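As one concrete example, the lint half of ci.yml might look like the sketch below; the job layout, Python version, and Ruff invocation are assumptions, not a copy of the real workflow.

```yaml
# Sketch of a Ruff lint job triggered like ci.yml (assumed layout).
name: ci
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff
      - run: ruff check .
```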