End-to-End Analytics Platform — Databricks, Streamlit & GPT-4o
From Raw Data to Revenue Decisions
Client: Global manufacturing company (e-commerce + B2B dealer network)
Timeline: 12 weeks
Scope: Full-stack analytics platform — data pipelines, ML models, AI querying, 10 production dashboards
The client had petabytes of e-commerce, marketing, and customer data in a Databricks lakehouse. None of it was being used by the people making budget decisions. I built the platform that changed that.
The Stack
| Layer | Technology |
|---|---|
| Data warehouse | Databricks — Delta Lake, Unity Catalog, SQL Warehouses |
| Orchestration | Databricks Workflows (multi-task DAGs) |
| Frontend | Streamlit — multi-page, fragment-based tab isolation |
| LLM | Azure OpenAI GPT-4o via LiteLLM, Databricks AI/BI Genie fallback |
| ML | Databricks-hosted churn scoring, LTV cohort modeling, product affinity |
| CI/CD | GitHub Actions, containerized deployment |
| Monitoring | z-score anomaly detection, Databricks Jobs API health checks |
What I Built
10 production dashboards across two business units
B2C: Paid media ROI, revenue analysis (14 markets), LTV & cohort analysis, out-of-stock impact, multi-touch attribution, ML intelligence, live notification centre.
B2B: Dealer performance, distributor pipeline, market scorecard.
AI-powered natural language querying
Users ask questions in plain English. GPT-4o generates validated SQL against Databricks, results stream back with a narrative explanation and follow-up suggestions — in one pass. Successful query patterns are saved as few-shot examples that improve future accuracy.
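One piece of that pipeline worth sketching is the validation step between GPT-4o and the warehouse. The write-up doesn't show the actual implementation, so this is a minimal sketch of one plausible guardrail, assuming "validated SQL" means rejecting anything that isn't a single read-only query; the function name and rules are illustrative:

```python
import re

# Hypothetical guardrail applied to LLM-generated SQL before it runs
# against the SQL Warehouse: single statement, read-only keywords only.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|grant)\b",
    re.IGNORECASE,
)

def validate_generated_sql(sql: str) -> bool:
    """Return True if the generated SQL looks like a safe, read-only query."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    return not FORBIDDEN.search(stripped)
```

Queries that fail the check can be sent back to the model with the rejection reason, which also makes a natural source of negative few-shot examples.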
Anomaly detection engine
Z-score-based system comparing each day’s metrics against a 28-day rolling baseline: critical at |z| > 3, warning at |z| > 2. Runs as a scheduled job and surfaces severity-ranked alert cards when a dashboard opens.
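The core of that engine fits in a few lines. A minimal sketch of the scoring logic described above, assuming a sample standard deviation over the 28-day window (the function name and flat-baseline handling are illustrative choices, not the production code):

```python
from statistics import mean, stdev

def zscore_severity(today: float, baseline: list[float]) -> tuple[float, str]:
    """Score today's metric against a rolling baseline (e.g. 28 days).

    Returns (z, severity): critical at |z| > 3, warning at |z| > 2, else ok.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:  # flat baseline: any deviation at all is anomalous
        return (0.0, "ok") if today == mu else (float("inf"), "critical")
    z = (today - mu) / sigma
    if abs(z) > 3:
        return z, "critical"
    if abs(z) > 2:
        return z, "warning"
    return z, "ok"
```

Run per metric per market each morning, then sort the non-"ok" results by |z| to get the severity-ranked cards.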
Out-of-stock revenue impact model
Flags products with high view/search intent but zero purchases. Quantifies lost revenue per market as lost_atc (estimated lost add-to-carts) × CVR (conversion rate) × AOV (average order value). Has directly driven supply chain escalations.
Pipeline monitoring
Instrumented 29 tables with freshness checks. Identified and fixed two production failures programmatically via REST API — a race condition in the DAG and a quota exhaustion death spiral in a third-party ingest notebook.
Results
- 29 live tables powering dashboards across 14 European markets
- 150M+ rows queryable in under 3 seconds
- 7 scheduled pipelines with dependency validation
- 2 production failures caught and fixed during audit
- Teams have answers before the meeting — not after
Full Write-Up
Read the full breakdown — architecture decisions, the AI querying pipeline, anomaly detection math, and what I’d do differently.
Interested in similar projects?
I help companies build modern data solutions and web applications. Let's discuss your next project!
Contact Me