Frontend (Streamlit)
The LakeFlow frontend is a Streamlit control UI served at http://localhost:8012. It connects to the Backend API to run pipelines, explore the Data Lake, and test Semantic Search.
Login
Default credentials: admin / admin123. The JWT token is stored in the session and sent with every API request.
Pages requiring login: Q&A with AI, System Settings (some operations). Other pages can be used once backend is ready.
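A minimal sketch of how a stored token might be attached to each backend request. The helper name `auth_headers` and the `Bearer` scheme are assumptions; the text above only says that the JWT is kept in the session and sent with every API request.

```python
from typing import Optional


def auth_headers(token: Optional[str]) -> dict:
    """Build HTTP headers for a backend call.

    Assumption: the backend expects the JWT as a Bearer token.
    """
    headers = {"Accept": "application/json"}
    if token:
        # Only add the header when a token is present (user is logged in).
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

The returned dict can be passed as the `headers` argument of whichever HTTP client the frontend uses.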
Pages overview
Dashboard
Pipeline status overview, run history. Quick view of file count per zone, recent pipelines.
Data Lake Explorer
Browse zone directory tree: inbox → raw → staging → processed → embeddings → catalog. Select zone and path to view files; preview JSON content (validation.json, chunks.json).
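The zone layout can be illustrated with a short sketch that counts files per zone. It assumes each zone is a top-level directory under the Data Lake root; the function name `count_files_per_zone` is hypothetical and not taken from the LakeFlow code.

```python
from pathlib import Path

# Zone names as listed in the Data Lake Explorer description.
ZONES = ["inbox", "raw", "staging", "processed", "embeddings", "catalog"]


def count_files_per_zone(lake_root: Path) -> dict:
    """Return {zone: number of files}, or None for a zone that does not exist."""
    counts = {}
    for zone in ZONES:
        zone_dir = lake_root / zone
        if not zone_dir.is_dir():
            counts[zone] = None  # zone directory is missing
            continue
        # Count regular files at any depth below the zone directory.
        counts[zone] = sum(1 for p in zone_dir.rglob("*") if p.is_file())
    return counts
```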
Pipeline Runner
Only shown when LAKEFLOW_MODE=DEV. Manually run step0–step4. Options:
- Select folder (domain or file_hash): run on subset only
- Enable Force rerun: run again even if already processed
- Step3: choose embed model from dropdown (from EMBED_MODEL_OPTIONS)
- Step4: choose collection_name, qdrant_url
Results show returncode, stdout, stderr.
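As a rough sketch, a run request for one step could be assembled like this. Only the `/pipeline/run/` prefix comes from this page (see pipeline_service.py in the code structure); the per-step path suffix and the body field names `only_folder` and `force_rerun` are hypothetical.

```python
from typing import Optional, Tuple

VALID_STEPS = ("step0", "step1", "step2", "step3", "step4")


def build_run_request(
    api_base_url: str,
    step: str,
    only_folder: Optional[str] = None,
    force_rerun: bool = False,
) -> Tuple[str, dict]:
    """Return (url, json_body) for a pipeline run call.

    Hypothetical: the path suffix after /pipeline/run/ and the body fields.
    """
    if step not in VALID_STEPS:
        raise ValueError(f"unknown step: {step}")
    url = f"{api_base_url.rstrip('/')}/pipeline/run/{step}"
    body = {"force_rerun": bool(force_rerun)}
    if only_folder:
        # Restrict the run to one domain or file_hash folder.
        body["only_folder"] = only_folder
    return url, body
```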
SQLite Viewer
View SQLite databases in Data Lake (e.g. catalog, app DB). Select .db file, view tables and query.
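Listing tables and previewing rows can be done with the standard `sqlite3` module. This is a minimal sketch of that idea, not the viewer's actual code; the function names are illustrative.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def preview_table(conn: sqlite3.Connection, table: str, limit: int = 20):
    """Return (column_names, rows) for the first `limit` rows of a table."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the real table list before interpolating it.
    if table not in list_tables(conn):
        raise ValueError(f"no such table: {table}")
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()
```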
Qdrant Inspector
List collections, view points in a collection. Supports custom Qdrant URL (multi-Qdrant). Useful to verify vectors after step4.
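The same checks can be made against Qdrant's REST API directly. The sketch below only builds the requests, using Qdrant's `GET /collections` and `POST /collections/{name}/points/scroll` endpoints; whether the Inspector itself uses REST or a client library is not stated here, and the function names are illustrative.

```python
import json
import urllib.request


def collections_request(qdrant_url: str) -> urllib.request.Request:
    """Build a GET /collections request, which lists all collections."""
    return urllib.request.Request(
        f"{qdrant_url.rstrip('/')}/collections", method="GET"
    )


def scroll_request(
    qdrant_url: str, collection: str, limit: int = 10
) -> urllib.request.Request:
    """Build a scroll request that pages through the points of a collection."""
    body = json.dumps(
        {"limit": limit, "with_payload": True, "with_vector": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{qdrant_url.rstrip('/')}/collections/{collection}/points/scroll",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` sends it; pointing `qdrant_url` at a different instance is all that a multi-Qdrant check requires.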
Semantic Search
Enter natural language question, get results with score. Can select collection, Qdrant URL, top_k. Use to test search before integrating API.
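To make the meaning of score and top_k concrete, here is a small pure-Python ranking sketch. It assumes cosine similarity as the metric; in practice Qdrant computes the ranking server-side with whatever distance the collection was created with.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, points, k=3):
    """points: list of (id, vector). Return [(id, score)], highest score first."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in points]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```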
Q&A with AI
RAG Q&A: ask question → semantic search finds context → LLM (Ollama/OpenAI) answers. Login required. Displays contexts and answer.
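The step between retrieval and the LLM call is assembling a prompt from the retrieved contexts. A minimal sketch follows; the template wording and the `max_contexts` parameter are illustrative assumptions, not LakeFlow's actual prompt.

```python
def build_rag_prompt(question: str, contexts: list, max_contexts: int = 5) -> str:
    """Assemble an LLM prompt from retrieved chunks (illustrative template)."""
    selected = contexts[:max_contexts]
    # Number the chunks so the answer can refer back to them.
    numbered = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```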
System Settings
Full configuration: Connection status (Backend, Qdrant), runtime config table (Data Lake path, Qdrant URL, Embed/LLM model, OpenAI key set), zone status (file counts), create missing zones button, Data Lake path config. Does not display secrets (API key).
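One way to show that a key is set without revealing it is to replace secret values with a boolean flag before rendering the config table. This is a sketch of that idea only; the key name `OPENAI_API_KEY` and the `_SET` suffix are hypothetical.

```python
# Hypothetical set of config keys whose values must never be displayed.
SECRET_KEYS = {"OPENAI_API_KEY"}


def public_config(config: dict) -> dict:
    """Return a copy of the config that is safe to show in the UI."""
    shown = {}
    for key, value in config.items():
        if key in SECRET_KEYS:
            # Report only whether the secret is set, never its value.
            shown[f"{key}_SET"] = bool(value)
        else:
            shown[key] = value
    return shown
```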
Multi-Qdrant
Semantic Search and Qdrant Inspector allow entering custom Qdrant URL and collection. Use when testing multiple vector stores or environments.
Frontend code structure
frontend/streamlit/
├── app.py                    # Entry, sidebar, routing
├── pages/                    # Each file = one page (Streamlit auto-detect)
│   ├── pipeline_dashboard.py
│   ├── data_lake_explorer.py
│   ├── pipeline_runner.py
│   ├── sqlite_viewer.py
│   ├── qdrant_inspector.py
│   ├── semantic_search.py
│   ├── qa.py                 # Q&A with AI
│   ├── system_settings.py
│   ├── admin.py
│   └── login.py
├── state/
│   ├── session.py            # Session init
│   └── token_store.py        # Auth token storage
└── services/
    ├── api_client.py         # HTTP client for backend
    ├── pipeline_service.py   # Calls /pipeline/run/*
    └── qdrant_service.py     # Qdrant API calls
Run locally
# From repo root
# dev_with_reload auto-loads .env from repo root
python frontend/streamlit/dev_with_reload.py

# Or run Streamlit directly (need .env or export vars)
streamlit run frontend/streamlit/app.py
When running the backend locally, set API_BASE_URL=http://localhost:8011 in .env. If the hostname lakeflow-backend does not resolve, the frontend automatically falls back to localhost; in Docker, it uses the service name.
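The hostname fallback can be sketched as follows. This is an assumption about how such a fallback might work, not the frontend's actual code: the function name `resolve_api_base_url` is hypothetical, and the sketch falls back for any unresolvable host rather than only lakeflow-backend.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def resolve_api_base_url(url: str, resolver=socket.gethostbyname) -> str:
    """Return `url` unchanged if its host resolves, else point it at localhost."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        return url
    try:
        resolver(host)  # raises socket.gaierror (an OSError) on failure
        return url
    except OSError:
        # Keep scheme, port, and path; swap only the hostname.
        netloc = "localhost" if parts.port is None else f"localhost:{parts.port}"
        return urlunsplit(parts._replace(netloc=netloc))
```

The `resolver` parameter exists only so the lookup can be replaced in tests; by default the system DNS lookup is used.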
Troubleshooting
- Connection refused: Check that the backend is running and API_BASE_URL is correct. In Docker, the frontend calls lakeflow-backend:8011.
- Pipeline Runner not showing: Set LAKEFLOW_MODE=DEV in .env.
- Q&A 401 error: Log in again; token may have expired.