End-to-End Analytics Platform — Databricks, Streamlit & GPT-4o

From Raw Data to Revenue Decisions

Client: Global manufacturing company (e-commerce + B2B dealer network)
Timeline: 12 weeks
Scope: Full-stack analytics platform — data pipelines, ML models, AI querying, 10 production dashboards

The client had petabytes of e-commerce, marketing, and customer data in a Databricks lakehouse. None of it was being used by the people making budget decisions. I built the platform that changed that.


The Stack

  • Data warehouse: Databricks (Delta Lake, Unity Catalog, SQL Warehouses)
  • Orchestration: Databricks Workflows (multi-task DAGs)
  • Frontend: Streamlit (multi-page, fragment-based tab isolation)
  • LLM: Azure OpenAI GPT-4o via LiteLLM, with Databricks AI/BI Genie as fallback
  • ML: Databricks-hosted churn scoring, LTV cohort modeling, product affinity
  • CI/CD: GitHub Actions, containerized deployment
  • Monitoring: z-score anomaly detection, Databricks Jobs API health checks

What I Built

10 production dashboards across two business units

B2C: Paid media ROI, revenue analysis (14 markets), LTV & cohort analysis, out-of-stock impact, multi-touch attribution, ML intelligence, live notification centre.

B2B: Dealer performance, distributor pipeline, market scorecard.

AI-powered natural language querying

Users ask questions in plain English. GPT-4o generates validated SQL against Databricks, results stream back with a narrative explanation and follow-up suggestions — in one pass. Successful query patterns are saved as few-shot examples that improve future accuracy.
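The "validated SQL" step can be sketched as a read-only guardrail applied to model-generated queries before execution. This is an illustrative sketch, not the production implementation: the function name, keyword list, and rules are assumptions about what such a check typically enforces (single statement, read-only).

```python
import re

# Generated SQL must begin with SELECT or a WITH (CTE) clause.
ALLOWED_START = re.compile(r"^\s*(select|with)\b", re.IGNORECASE)

# Reject anything that mutates data or schema.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|merge|grant|truncate)\b",
    re.IGNORECASE,
)

def validate_sql(sql: str) -> bool:
    """Guardrail for LLM-generated SQL: single read-only statement."""
    # A semicolon anywhere except trailing means multiple statements.
    if ";" in sql.rstrip().rstrip(";"):
        return False
    return bool(ALLOWED_START.match(sql)) and not FORBIDDEN.search(sql)
```

Only queries that pass a check like this would be executed against the warehouse; rejected ones can be fed back to the model for regeneration.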

Anomaly detection engine

A z-score-based system compares each day’s metrics against a 28-day rolling baseline: critical at |z| > 3, warning at |z| > 2. It runs as a scheduled job and surfaces severity-ranked alert cards when a dashboard is opened.
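The scoring logic reduces to a few lines. A minimal sketch, using the thresholds stated above (the function name and baseline handling are illustrative, and the baseline here is any rolling window of prior daily values):

```python
import statistics

def classify_anomaly(today_value: float, baseline: list[float],
                     warn: float = 2.0, critical: float = 3.0):
    """Score today's metric against a rolling baseline (e.g. prior 28 days).

    Returns (z, severity) where severity is 'critical', 'warning', or 'ok'.
    """
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # Flat baseline: no meaningful z-score.
        return 0.0, "ok"
    z = (today_value - mean) / stdev
    if abs(z) > critical:
        severity = "critical"
    elif abs(z) > warn:
        severity = "warning"
    else:
        severity = "ok"
    return z, severity
```

In practice each metric/market pair gets its own baseline, so a seasonal dip in one market does not mask a genuine drop in another.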

Out-of-stock revenue impact model

Flags products with high view/search intent but zero purchases, and quantifies lost revenue as lost_atc × CVR × AOV by market. The model has directly driven supply chain escalations.
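The formula is a straight multiplication of blocked demand by historical conversion economics. A minimal sketch (the figures in the example are illustrative, not client data):

```python
def lost_revenue(lost_atc: float, cvr: float, aov: float) -> float:
    """Estimate revenue lost to an out-of-stock product in one market.

    lost_atc: add-to-cart events that could not convert (stock was zero)
    cvr:      historical add-to-cart -> purchase conversion rate
    aov:      average order value for the market
    """
    return lost_atc * cvr * aov

# e.g. 1,200 blocked add-to-carts at a 35% cart conversion rate and €80 AOV
estimate = lost_revenue(1200, 0.35, 80.0)  # 33,600
```

Because CVR and AOV vary by market, the same stock-out can rank very differently across the 14 markets, which is what makes the per-market breakdown actionable.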

Pipeline monitoring

Instrumented 29 tables with freshness checks. Identified and fixed two production failures programmatically via the Databricks REST API: a race condition in the DAG and a quota-exhaustion death spiral in a third-party ingest notebook.


Results

  • 29 live tables powering dashboards across 14 European markets
  • 150M+ rows queryable in under 3 seconds
  • 7 scheduled pipelines with dependency validation
  • 2 production failures caught and fixed during audit
  • Teams have answers before the meeting — not after

Full Write-Up

Read the full breakdown — architecture decisions, the AI querying pipeline, anomaly detection math, and what I’d do differently.


Interested in something similar? Get in touch.