— Capability demonstration · Reference build
A reference build using the exact stack delivered to every Shopify Data Hub client. Built end-to-end with production-grade modeling, peer-reviewed improvements, and full documentation.
Dataset: Walmart retail sales (public dataset)
Rows processed: 10M+
Data sources: 4
Build time: 28 days
The brief was simple: ingest a fragmented retail sales dataset across four sources, model it for analytical use, and surface the views a business operator would actually look at. Build it the way you'd build it for a client.
Real retail data — Walmart's or Shopify's — never arrives clean. Sales tables don't speak the same language as inventory feeds. Promotional data lives in a separate system. External factors (weather, economic indicators, holidays) sit in entirely different sources. Stitching them together is most of the work, and it's where most internal teams stall.
This build deliberately replicated that mess: four sources, inconsistent grain, conflicting timestamps, missing joins. The kind of state every Shopify operator with Klaviyo + Shopify Analytics + Meta Ads + Google Ads is sitting in right now.
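To make "inconsistent grain" concrete, here is a minimal sketch of one way to reconcile it: roll daily sales rows up to a weekly grain before joining them to a weekly external-factors feed. The function names, the tuple layout, and the sample values are all hypothetical and chosen for illustration; they are not taken from the reference build.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def rollup_daily_to_weekly(daily_rows):
    """Sum (store_id, day, amount) rows to a (store_id, week_start) grain."""
    totals = defaultdict(float)
    for store_id, day, amount in daily_rows:
        totals[(store_id, week_start(day))] += amount
    return dict(totals)


def join_weekly(sales_by_week, factors_by_week):
    """Left-join weekly sales to weekly factors; unmatched weeks keep None."""
    joined = []
    for (store_id, wk), total in sorted(sales_by_week.items()):
        joined.append({
            "store_id": store_id,
            "week_start": wk,
            "sales": total,
            "factors": factors_by_week.get(wk),  # None marks a missing join
        })
    return joined


# Hypothetical sample data: daily sales vs. a weekly holiday feed.
daily = [
    (1, date(2012, 2, 6), 100.0),   # a Monday
    (1, date(2012, 2, 8), 50.0),    # same week
    (1, date(2012, 2, 13), 70.0),   # following Monday
]
factors = {date(2012, 2, 6): {"is_holiday": False}}

weekly = rollup_daily_to_weekly(daily)
rows = join_weekly(weekly, factors)
```

The left join is deliberate in this sketch: weeks with no matching factor row stay visible with `None` rather than silently dropping out of the result, which is how missing joins tend to hide.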
fct_sales, dim_product, dim_store, dim_date. Documented and exposed.
The first version worked. The second version was production-grade. The difference came from a structured peer review that surfaced three categories of improvement:
dbt test coverage on every model: uniqueness on primary keys, not-null on critical fields, referential integrity on every join. CI now blocks broken models from reaching the mart layer.
The output isn't pretty charts. It's specific business questions answered at a glance:
Translate those into Shopify language: real LTV by channel, real product profitability after refunds, real attribution after platform noise, real cohort retention. Same questions. Same architecture. Same answers.
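As one illustration of what "product profitability after refunds" could mean in practice, the sketch below nets refunds against gross margin per product. The field names (`product_id`, `revenue`, `cost`, `amount`) are hypothetical, amounts are integer cents, and it assumes refunded goods are not restocked, so cost is not recovered. A real engagement would adjust that rule to the store's returns policy.

```python
from collections import defaultdict


def net_profit_by_product(order_lines, refunds):
    """Gross margin per product minus refunded amounts (all in cents).

    Assumption: a refund reduces revenue but does not recover cost.
    """
    profit = defaultdict(int)
    for line in order_lines:
        profit[line["product_id"]] += line["revenue"] - line["cost"]
    for refund in refunds:
        profit[refund["product_id"]] -= refund["amount"]
    return dict(profit)


# Hypothetical sample data.
order_lines = [
    {"product_id": "A", "revenue": 5000, "cost": 2000},
    {"product_id": "A", "revenue": 5000, "cost": 2000},
    {"product_id": "B", "revenue": 3000, "cost": 2500},
]
refunds = [{"product_id": "A", "amount": 5000}]

net = net_profit_by_product(order_lines, refunds)
```

In this sample, product A looks far stronger than B on gross margin (6000 vs. 500 cents), but a single full refund cuts A's net figure to 1000 cents.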
The retail dataset is a stand-in. The stack, the modeling approach, the testing discipline, and the layered architecture are identical to what's delivered in every Shopify Data Hub engagement. Source connectors swap (Shopify, Klaviyo, Meta Ads, Google Ads replace the retail sources), business logic shifts to e-commerce KPIs, but the engineering rigor is the same.
That rigor — testing, documentation, incremental loads, layered models — is what separates "a dashboard a junior built once" from "infrastructure your business can actually trust." It's the difference you're paying for.
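The testing part of that rigor can be sketched outside dbt. dbt ships generic tests named `unique`, `not_null`, and `relationships`; the Python below is only an illustration of what those three checks assert, not how dbt implements them. The helper names and sample rows are hypothetical.

```python
def check_unique(rows, key):
    """Return key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def check_not_null(rows, column):
    """Return indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_relationship(child_rows, fk, parent_rows, pk):
    """Return foreign-key values with no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return sorted({row[fk] for row in child_rows if row[fk] not in parent_keys})


# Hypothetical sample rows with one failure of each kind.
dim_store = [{"store_id": 1}, {"store_id": 2}]
fct_sales = [
    {"sale_id": 10, "store_id": 1, "amount": 5.0},
    {"sale_id": 11, "store_id": 3, "amount": None},  # orphan store, null amount
    {"sale_id": 11, "store_id": 2, "amount": 7.5},   # duplicate sale_id
]

failures = {
    "unique_sale_id": check_unique(fct_sales, "sale_id"),
    "not_null_amount": check_not_null(fct_sales, "amount"),
    "fk_store_id": check_relationship(fct_sales, "store_id", dim_store, "store_id"),
}
build_ok = not any(failures.values())
```

A CI gate in this style would refuse to promote the model whenever `build_ok` is false, and the contents of `failures` tell the reviewer which keys to inspect.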
— Disclosure
This is a reference build on a public dataset, not a paid client engagement. The Shopify Data Hub service applies the same architecture, stack, and modeling rigor to your live store data. The first paid engagements are onboarding now.