From Raw Data to Revenue Decisions: Building an End-to-End Analytics Platform
TL;DR: 12-week engagement. Global manufacturer. Petabytes of untouched e-commerce and marketing data. I built a real-time analytics platform covering 14 markets — Databricks medallion architecture, Streamlit multi-page app, GPT-4o natural language querying, z-score anomaly detection, and ML models for churn and LTV. Two production pipeline failures found and fixed. Teams now have answers before the meeting starts.
Every large company has data. Very few actually use it.
When I started this engagement, the client had petabytes of e-commerce, marketing, and customer data sitting in a data lakehouse — largely untouched by the people who needed it most. Marketing managers were making budget decisions based on gut feel. No one could answer basic questions like “which channel actually drives revenue when you account for the full customer journey?” or “what’s happening to conversions in Germany right now?” without waiting days for a data analyst to pull a report.
Twelve weeks later, they had a real-time analytics platform their teams use every day. Here’s what I built and how.
The Stack
- Data warehouse: Databricks (Delta Lake, Unity Catalog, SQL Warehouses)
- Orchestration: Databricks Workflows — multi-task DAGs with explicit dependency graphs
- Frontend: Streamlit — multi-page app, fragment-based tab isolation, session state management
- LLM integration: Azure OpenAI GPT-4o via LiteLLM, with Databricks AI/BI Genie as fallback
- ML: Databricks-hosted models for churn scoring, LTV cohort modeling, product affinity
- CI/CD: GitHub Actions, containerized deployment
- Monitoring: Custom z-score anomaly detection, live pipeline health checks
What I Built
10 production dashboards across two product lines
The platform covers two distinct business units — direct-to-consumer e-commerce and B2B dealer/distributor sales — each with its own data models, KPIs, and user personas. Rather than one monolithic dashboard, I built a navigation-driven multi-page app where each page is purpose-built for a specific decision type.
B2C dashboards:
- Paid media ROI — ROAS by channel, dayparting heatmaps, demographic bid optimisation
- Revenue analysis — 14 markets, WoW/YoY comparisons, anomaly highlighting
- LTV & cohort analysis — retention curves, LTV/CAC ratio, acquisition channel quality
- Out-of-stock impact — lost revenue estimation, supply chain alerts
- Multi-touch attribution — linear vs last-click, identifying under/overvalued channels
- ML Intelligence — churn risk scoring, product affinity, session feature analysis
- Live notification centre — anomaly alerts, ROAS alerts, funnel collapse detection
AI-powered natural language querying
One of the flagship features is a Chat with Data interface — users ask questions in plain language and get SQL-backed answers with narrative explanations.
The pipeline:
- GPT-4o generates SQL from the user’s question using a dynamically built system prompt with live schema context, actual column sample values, and few-shot examples
- A safeguard layer validates the SQL — read-only, allowlisted tables, LIMIT wrapper
- Results are executed against Databricks and cached by MD5 hash
- A streaming narrative plus follow-up suggestions are generated in one pass (saves 2–3s vs two separate calls)
- If the LLM produces no valid SQL, Databricks Genie handles the fallback
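The safeguard and caching steps above can be sketched as follows. This is a minimal illustration, not the production code — the table allowlist, keyword list, and default row cap are placeholder assumptions:

```python
import hashlib
import re

# Hypothetical allowlist — in production this is built from live schema discovery
ALLOWED_TABLES = {"gold_daily_sales", "gold_paid_roas_daily", "gold_funnel_daily"}

# Deliberately conservative: any write/DDL keyword anywhere rejects the query
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|MERGE|GRANT)\b", re.I
)

def validate_sql(sql: str, max_rows: int = 1000) -> str:
    """Reject anything that isn't a read-only SELECT over allowlisted tables,
    and append a LIMIT if the model didn't supply one."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed")
    if FORBIDDEN.search(stripped):
        raise ValueError("Write/DDL keyword detected")
    referenced = set(re.findall(r"\b(?:from|join)\s+([a-z0-9_.]+)", stripped, re.I))
    unknown = {t.split(".")[-1] for t in referenced} - ALLOWED_TABLES
    if unknown:
        raise ValueError(f"Table(s) not allowlisted: {unknown}")
    if not re.search(r"\blimit\s+\d+\s*$", stripped, re.I):
        stripped += f" LIMIT {max_rows}"  # cap result size defensively
    return stripped

def cache_key(sql: str) -> str:
    """MD5 of whitespace-normalised SQL — the result-cache key."""
    return hashlib.md5(" ".join(sql.lower().split()).encode()).hexdigest()
```

The point of the allowlist check is that the LLM never gets to decide what is queryable — the schema it sees in the prompt and the tables the safeguard accepts are the same set.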
The system learns over time — thumbs-up feedback saves successful query patterns as few-shot examples that improve future responses.
Anomaly detection engine
Built a z-score-based system that compares each day’s metrics against a 28-day rolling baseline:
z = (today_value − 28d_mean) / 28d_stddev
critical: |z| > 3
warning: |z| > 2
This runs as a scheduled Databricks job and writes to a gold table that powers both the dedicated anomaly dashboard and the live notification centre. Users see cards ranked by severity the moment they open the app.
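The scoring logic is small enough to sketch in full. This is an illustrative version — the production job computes the baseline in SQL over the gold tables, but the classification is the same:

```python
from statistics import mean, stdev

def classify_anomaly(today_value: float, baseline: list[float]) -> tuple[float, str]:
    """Grade today's metric against a rolling baseline (e.g. the prior 28 days)
    using the z-score thresholds from the engine: |z| > 3 critical, |z| > 2 warning."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return 0.0, "ok"  # flat baseline: z-score is undefined, treat as normal
    z = (today_value - mu) / sigma
    if abs(z) > 3:
        return z, "critical"
    if abs(z) > 2:
        return z, "warning"
    return z, "ok"
```

A 28-day window is long enough to absorb day-of-week noise but short enough to track seasonality; weekly-seasonal metrics can alternatively be compared against the same weekday's history.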
Out-of-stock revenue impact model
Built an estimation model that flags products with high search/view intent but zero purchases, quantifies lost revenue as lost_atc × CVR × AOV (lost add-to-carts × conversion rate × average order value), and surfaces actionable alerts by market. This has directly driven supply chain escalations — with the data attached.
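A stripped-down sketch of the flagging logic, with an assumed intent threshold (the real model derives thresholds and CVR/AOV per product from history):

```python
def flag_out_of_stock(products: list[dict]) -> list[dict]:
    """Flag products with high intent but zero purchases and estimate the
    revenue left on the table as lost_atc * CVR * AOV.
    Each product dict carries: sku, market, views, purchases, lost_atc, cvr, aov."""
    alerts = []
    for p in products:
        if p["views"] >= 100 and p["purchases"] == 0:  # 100 is a placeholder threshold
            impact = p["lost_atc"] * p["cvr"] * p["aov"]
            alerts.append({
                "sku": p["sku"],
                "market": p["market"],
                "lost_revenue": round(impact, 2),
            })
    # Highest estimated loss first, so the worst offenders top the alert feed
    return sorted(alerts, key=lambda a: a["lost_revenue"], reverse=True)
```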
Pipeline monitoring and self-healing
Part of the engagement was keeping the platform reliable, not just building it. I instrumented all 29 data tables with freshness checks and built a health audit that runs against the Databricks Jobs API to validate:
- Last successful run timestamp per table
- Days since last write
- Job dependency graph integrity
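The freshness side of the audit reduces to a simple grading pass once you have last-write timestamps (from Delta table history or the Jobs API). A minimal sketch, with an assumed two-day staleness threshold:

```python
from datetime import datetime, timezone

STALE_AFTER_DAYS = 2  # assumed grace period for daily pipelines

def audit_freshness(last_writes, now=None):
    """Grade each table by days since its last successful write.
    `last_writes` maps table name -> last-write datetime (UTC); in production
    these timestamps come from Delta history / the Databricks Jobs API."""
    now = now or datetime.now(timezone.utc)
    report = []
    for table, ts in sorted(last_writes.items()):
        age_days = (now - ts).days
        report.append({
            "table": table,
            "days_since_write": age_days,
            "status": "stale" if age_days > STALE_AFTER_DAYS else "fresh",
        })
    return report
```

Dependency-graph integrity is the harder half: it means walking the job's task graph and confirming every table a task reads is produced by a task it actually depends on — which is exactly the check that would have caught the race condition below before it fired.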
During this I identified and fixed two production pipeline failures:
- Race condition — a market scorecard notebook tried to read a table 6 minutes before a parallel task finished creating it. Fixed by adding an explicit task dependency in the Databricks job DAG.
- Quota exhaustion death spiral — a third-party API ingest notebook kept retrying after hitting rate limits, burning through the daily quota. Fixed by patching the notebook to cap daily API calls within quota limits, then redeploying via the Databricks Workspace API.
Both fixes applied programmatically via REST API. No manual UI intervention.
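The quota fix boils down to budgeting calls up front instead of retrying into a wall. A simplified sketch of the pattern (the quota value and `fetch_page` callable are stand-ins for the real third-party client):

```python
DAILY_QUOTA = 10_000  # assumed third-party daily call limit

def fetch_with_quota(fetch_page, pages_needed: int, calls_used_today: int) -> list:
    """Stop before the daily quota is exhausted rather than retrying into it.
    `fetch_page(n)` stands in for one API call; remaining pages are deferred."""
    results = []
    budget = max(0, DAILY_QUOTA - calls_used_today)
    for page in range(min(pages_needed, budget)):
        results.append(fetch_page(page))
    if pages_needed > budget:
        # Defer the remainder to tomorrow's run instead of burning retries
        print(f"Deferred {pages_needed - budget} pages: daily quota reached")
    return results
```

The key design point is that the budget check happens before any call is made — a retry loop that only notices the quota error after the fact is what produced the death spiral in the first place.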
The Data Architecture
The platform follows a medallion architecture:
| Source | Bronze | Silver | Gold |
|---|---|---|---|
| GA4 / BigQuery | bronze_ga4_events | silver_ga4_session | gold_funnel_daily |
| Google Ads API | bronze_gads_raw | silver_gads_hourly | gold_paid_roas_daily |
| Salesforce / SFCC | bronze_orders | silver_order_lines | gold_daily_sales |
| Microsoft Clarity | bronze_clarity | — | gold_cwv_page_daily |
| 3rd party feeds | bronze_feeds | silver_* | gold_market_scorecard |
One non-obvious challenge: several tables use STRING dates in yyyyMMdd format (GA4’s native format) while others use proper DATE types. Getting this wrong silently returns zero rows — no error, just missing data. I documented a canonical filter model covering every table’s quirks, including micro-currency encoding in the Google Ads cost column (divide by 1,000,000 for USD) and malformed market codes that need filtering before any aggregation.
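The shape of the canonical filter model can be sketched like this — per-table metadata drives which predicate gets emitted, so no query ever compares a STRING date to a DATE literal. Table and column names here are illustrative:

```python
from datetime import date
from decimal import Decimal

# Assumed per-table metadata: which column holds the date and how it's encoded
DATE_FORMATS = {
    "bronze_ga4_events": ("event_date", "yyyyMMdd_string"),  # GA4 native: '20240601'
    "gold_daily_sales": ("sale_date", "date"),
}

def date_predicate(table: str, day: date) -> str:
    """Emit the correct SQL filter for a table's date encoding. Comparing a
    STRING yyyyMMdd column against a DATE literal matches nothing — silently —
    so the encoding metadata has to drive the predicate."""
    column, encoding = DATE_FORMATS[table]
    if encoding == "yyyyMMdd_string":
        return f"{column} = '{day.strftime('%Y%m%d')}'"
    return f"{column} = DATE'{day.isoformat()}'"

def micros_to_currency(cost_micros: int) -> Decimal:
    """Google Ads reports cost in micro-units: divide by 1,000,000."""
    return Decimal(cost_micros) / Decimal(1_000_000)
```

Centralising these quirks in one module means every dashboard and every LLM-generated query goes through the same predicates, instead of each page rediscovering the zero-rows failure mode independently.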
Results
The platform went from zero to covering 14 markets across Europe:
- 29 live Databricks tables powering the dashboards
- 150M+ rows of session and user feature data queryable in under 3 seconds
- 7 scheduled pipelines running daily, with dependency validation
- 2 production failures identified and fixed during the audit phase
- Complete stakeholder documentation covering methodology, data lineage, and chart interpretation
More importantly: the teams using it now have answers before the meeting starts — not after.
What I Learned
Schema assumptions will hurt you. I found column names that differed between what the documentation said and what was actually in production. Live schema discovery on every deploy — not hardcoded column names — is non-negotiable.
LLM accuracy depends more on context quality than model quality. Injecting actual sample values, date ranges, and known-bad query patterns as few-shot examples improved SQL accuracy far more than any prompt engineering trick.
Data pipelines fail silently. A notebook that “succeeded” but wrote zero rows to a gold table looks identical to a successful write unless you add explicit row-count checks. Build data quality assertions into the pipeline itself, not just the dashboard.
Interested in building something like this for your team? I work with companies that have the data but aren’t using it. Get in touch.