
From Raw Data to Revenue Decisions: Building an End-to-End Analytics Platform


TL;DR: 12-week engagement. Global manufacturer. Petabytes of untouched e-commerce and marketing data. I built a real-time analytics platform covering 14 markets — Databricks medallion architecture, Streamlit multi-page app, GPT-4o natural language querying, z-score anomaly detection, and ML models for churn and LTV. Two production pipeline failures found and fixed. Teams now have answers before the meeting starts.


Every large company has data. Very few actually use it.

When I started this engagement, the client had petabytes of e-commerce, marketing, and customer data sitting in a data lakehouse — largely untouched by the people who needed it most. Marketing managers were making budget decisions based on gut feel. No one could answer basic questions like “which channel actually drives revenue when you account for the full customer journey?” or “what’s happening to conversions in Germany right now?” without waiting days for a data analyst to pull a report.

Twelve weeks later, they had a real-time analytics platform their teams use every day. Here’s what I built and how.


The Stack

  • Data warehouse: Databricks (Delta Lake, Unity Catalog, SQL Warehouses)
  • Orchestration: Databricks Workflows — multi-task DAGs with explicit dependency graphs
  • Frontend: Streamlit — multi-page app, fragment-based tab isolation, session state management
  • LLM integration: Azure OpenAI GPT-4o via LiteLLM, with Databricks AI/BI Genie as fallback
  • ML: Databricks-hosted models for churn scoring, LTV cohort modeling, product affinity
  • CI/CD: GitHub Actions, containerized deployment
  • Monitoring: Custom z-score anomaly detection, live pipeline health checks

What I Built

10 production dashboards across two product lines

The platform covers two distinct business units — direct-to-consumer e-commerce and B2B dealer/distributor sales — each with its own data models, KPIs, and user personas. Rather than one monolithic dashboard, I built a navigation-driven multi-page app where each page is purpose-built for a specific decision type.

B2C dashboards:

  • Paid media ROI — ROAS by channel, dayparting heatmaps, demographic bid optimisation
  • Revenue analysis — 14 markets, WoW/YoY comparisons, anomaly highlighting
  • LTV & cohort analysis — retention curves, LTV/CAC ratio, acquisition channel quality
  • Out-of-stock impact — lost revenue estimation, supply chain alerts
  • Multi-touch attribution — linear vs last-click, identifying under/overvalued channels
  • ML Intelligence — churn risk scoring, product affinity, session feature analysis
  • Live notification centre — anomaly alerts, ROAS alerts, funnel collapse detection

AI-powered natural language querying

One of the flagship features is a Chat with Data interface — users ask questions in plain language and get SQL-backed answers with narrative explanations.

The pipeline:

  1. GPT-4o generates SQL from the user’s question using a dynamically built system prompt with live schema context, actual column sample values, and few-shot examples
  2. A safeguard layer validates the SQL — read-only, allowlisted tables, LIMIT wrapper
  3. Results are executed against Databricks and cached by MD5 hash
  4. A streaming narrative plus follow-up suggestions are generated in one pass (saves 2–3s vs two separate calls)
  5. If the LLM produces no valid SQL, Databricks Genie handles the fallback
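The safeguard layer in step 2 is the part most worth getting right — never execute model output unvalidated. A minimal sketch of that layer plus the MD5 cache key from step 3 (the table allowlist and regexes here are illustrative, not the production rules):

```python
import hashlib
import re

# Hypothetical allowlist; in production this would come from catalog metadata.
ALLOWED_TABLES = {"gold_funnel_daily", "gold_paid_roas_daily", "gold_daily_sales"}

def guard_sql(sql: str, max_rows: int = 1000) -> str:
    """Validate LLM-generated SQL: read-only, allowlisted tables, LIMIT wrapper."""
    stripped = sql.strip().rstrip(";")
    # Read-only: reject anything that is not a plain SELECT.
    if not re.match(r"(?is)^\s*select\b", stripped) or re.search(
        r"(?i)\b(insert|update|delete|drop|alter|create|merge|truncate)\b", stripped
    ):
        raise ValueError("Only read-only SELECT statements are allowed")
    # Allowlist: every referenced table must be known.
    for table in re.findall(r"(?i)\b(?:from|join)\s+([a-zA-Z0-9_.]+)", stripped):
        if table.split(".")[-1] not in ALLOWED_TABLES:
            raise ValueError(f"Table not allowlisted: {table}")
    # LIMIT wrapper: cap result size regardless of what the model wrote.
    return f"SELECT * FROM ({stripped}) AS q LIMIT {max_rows}"

def cache_key(sql: str) -> str:
    """MD5 hash of the normalised SQL, used as the result-cache key."""
    return hashlib.md5(sql.strip().lower().encode()).hexdigest()
```

Wrapping the query as a subselect means the LIMIT applies even when the model emits its own (larger) limit.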

The system learns over time — thumbs-up feedback saves successful query patterns as few-shot examples that improve future responses.

Anomaly detection engine

Built a z-score-based system that compares each day’s metrics against a 28-day rolling baseline:

z = (today_value − 28d_mean) / 28d_stddev

critical: |z| > 3
warning:  |z| > 2

This runs as a scheduled Databricks job and writes to a gold table that powers both the dedicated anomaly dashboard and the live notification centre. Users see cards ranked by severity the moment they open the app.
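The logic above can be sketched in a few lines of pandas — a simplified standalone version, not the production job (column names and the exclusion of the current day from the baseline are my assumptions):

```python
import pandas as pd

def flag_anomalies(daily: pd.DataFrame, metric: str = "revenue") -> pd.DataFrame:
    """Score each day against a 28-day rolling baseline of the prior days."""
    df = daily.sort_values("date").copy()
    # shift(1) keeps today out of its own baseline.
    baseline = df[metric].shift(1).rolling(28, min_periods=28)
    df["z"] = (df[metric] - baseline.mean()) / baseline.std()
    # |z| in [2, 3) -> warning, |z| >= 3 -> critical.
    df["severity"] = pd.cut(
        df["z"].abs(),
        bins=[0, 2, 3, float("inf")],
        labels=["ok", "warning", "critical"],
        right=False,
    )
    return df
```

The scheduled job writes the scored rows to a gold table; the app only ever reads that table, so the dashboard and notification centre stay cheap.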

Out-of-stock revenue impact model

Built an estimation model that flags products with high search/view intent but zero purchases, quantifies lost revenue using lost_atc × CVR × AOV, and surfaces actionable alerts by market. This has directly driven supply chain escalations — with the data attached.
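The flagging-plus-estimation step reduces to something like the sketch below (the product schema and thresholds are illustrative; the production model derives CVR and AOV from each product’s trailing baseline):

```python
def flag_out_of_stock(products: list[dict], min_views: int = 100) -> list[dict]:
    """Flag high-intent, zero-purchase products and estimate lost revenue.

    Each product dict is assumed to carry sku, market, views, purchases,
    lost_atc (add-to-carts that could not convert), cvr, and aov.
    """
    alerts = []
    for p in products:
        if p["views"] >= min_views and p["purchases"] == 0:
            alerts.append({
                "sku": p["sku"],
                "market": p["market"],
                # lost_atc x CVR x AOV, as described above.
                "lost_revenue": p["lost_atc"] * p["cvr"] * p["aov"],
            })
    # Worst offenders first, so the biggest losses surface at the top.
    return sorted(alerts, key=lambda a: -a["lost_revenue"])
```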

Pipeline monitoring and self-healing

Part of the engagement was keeping the platform reliable, not just building it. I instrumented all 29 data tables with freshness checks and built a health audit that runs against the Databricks Jobs API to validate:

  • Last successful run timestamp per table
  • Days since last write
  • Job dependency graph integrity
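The core of the freshness check is simple once the last-write timestamps are in hand. A minimal sketch, assuming per-table SLAs and timestamps already fetched (in the real audit these come from the Databricks Jobs API and table history, which I don’t reproduce here):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLAs: maximum tolerated days since last write, per table.
FRESHNESS_SLA_DAYS = {"gold_daily_sales": 1, "gold_market_scorecard": 2}

def audit_freshness(last_writes: dict[str, datetime]) -> list[dict]:
    """Return one finding per table that has breached its freshness SLA."""
    now = datetime.now(timezone.utc)
    findings = []
    for table, sla_days in FRESHNESS_SLA_DAYS.items():
        last = last_writes.get(table)
        age = (now - last).days if last else None
        if last is None or age > sla_days:
            findings.append(
                {"table": table, "days_stale": age, "sla_days": sla_days}
            )
    return findings
```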

During this I identified and fixed two production pipeline failures:

  1. Race condition — a market scorecard notebook tried to read a table 6 minutes before a parallel task finished creating it. Fixed by adding an explicit task dependency in the Databricks job DAG.
  2. Quota exhaustion death spiral — a third-party API ingest notebook kept retrying after hitting rate limits, burning through the daily quota. Fixed by patching the notebook to cap daily API calls within quota limits, then redeploying via the Databricks Workspace API.

Both fixes applied programmatically via REST API. No manual UI intervention.


The Data Architecture

The platform follows a medallion architecture:

Source               Bronze               Silver               Gold
GA4 / BigQuery       bronze_ga4_events    silver_ga4_session   gold_funnel_daily
Google Ads API       bronze_gads_raw      silver_gads_hourly   gold_paid_roas_daily
Salesforce / SFCC    bronze_orders        silver_order_lines   gold_daily_sales
Microsoft Clarity    bronze_clarity       —                    gold_cwv_page_daily
3rd-party feeds      bronze_feeds         silver_*             gold_market_scorecard

One non-obvious challenge: several tables use STRING dates in yyyyMMdd format (GA4’s native format) while others use proper DATE types. Getting this wrong silently returns zero rows — no error, just missing data. I documented a canonical filter model covering every table’s quirks, including micro-currency encoding in the Google Ads cost column (divide by 1,000,000 for USD) and malformed market codes that need filtering before any aggregation.
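Two of those quirks are small enough to show directly — a sketch of the kind of helpers the canonical filter model standardises on (function names are mine, not the codebase’s):

```python
from datetime import date

def ga4_date_filter(d: date) -> str:
    """GA4 bronze tables store dates as yyyyMMdd STRINGs; filtering them
    with a DATE literal silently matches zero rows, so the filter value
    must use the same string encoding."""
    return d.strftime("%Y%m%d")

def micros_to_currency(cost_micros: int) -> float:
    """Google Ads reports cost in micro-units: divide by 1,000,000."""
    return cost_micros / 1_000_000
```

Centralising these in one module means a new dashboard can’t quietly reintroduce the zero-rows bug by comparing types directly.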


Results

The platform went from zero to covering 14 markets across Europe:

  • 29 live Databricks tables powering the dashboards
  • 150M+ rows of session and user feature data queryable in under 3 seconds
  • 7 scheduled pipelines running daily, with dependency validation
  • 2 production failures identified and fixed during the audit phase
  • Complete stakeholder documentation covering methodology, data lineage, and chart interpretation

More importantly: the teams using it now have answers before the meeting starts — not after.


What I Learned

Schema assumptions will hurt you. I found column names that differed between what the documentation said and what was actually in production. Live schema discovery on every deploy — not hardcoded column names — is non-negotiable.

LLM accuracy depends more on context quality than model quality. Injecting actual sample values, date ranges, and known-bad query patterns as few-shot examples improved SQL accuracy far more than any prompt engineering trick.

Data pipelines fail silently. A notebook that “succeeded” but wrote zero rows to a gold table looks identical to a successful write unless you add explicit row-count checks. Build data quality assertions into the pipeline itself, not just the dashboard.
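The assertion itself is tiny — the point is that it runs inside the pipeline task, so a zero-row write fails the job instead of quietly shipping an empty gold table. A minimal sketch (names are illustrative):

```python
def assert_gold_write(table: str, rows_written: int, min_rows: int = 1) -> None:
    """Fail the pipeline task loudly when a 'successful' write produced no data."""
    if rows_written < min_rows:
        raise RuntimeError(
            f"{table}: wrote {rows_written} rows (expected >= {min_rows}); "
            "failing the task so a zero-row write is not mistaken for success"
        )
```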


Interested in building something like this for your team? I work with companies that have the data but aren’t using it. Get in touch.
