
Data Warehouse Models: Star, Snowflake, Data Vault & More

Data warehouse models define how information is structured inside a data warehouse. They determine which tables exist, how they connect, and how queries are executed at scale, making them a core element of any data warehouse architecture.
The choice of model influences speed, cost, governance, and how easily teams can adapt the data warehouse as business needs evolve.
This guide explains the major approaches — dimensional schemas such as star and snowflake, the galaxy (constellation) schema, Data Vault, and normalized ER/3NF structures. Each model serves a different purpose. Some simplify reporting with predictable dimensions, while others prioritize flexibility, auditability, or integration across many sources.
You will also find a comparison matrix, worked examples, and industry-specific blueprints. These are designed to help architects and decision-makers evaluate which model best fits their environment.
Data Warehouse Model Comparison
Data warehouse models differ in how they balance query speed, storage efficiency, governance, and scalability. The table below highlights the main approaches side by side.
Functional Comparison:
Model | Best For | Pros | Cons | Complexity | Tooling Fit |
---|---|---|---|---|---|
Star schema | Standard BI dashboards, KPIs | Simple, fast queries, widely supported | Can oversimplify hierarchies | Low | Power BI, Tableau, Looker |
Snowflake schema | Complex hierarchies, normalized dims | Reduces redundancy, smaller storage | Slower joins, more complex queries | Medium | BI tools, Exasol |
Galaxy (Constellation) | Multi-fact schemas, cross-domain BI | Supports shared dimensions, flexible | Complexity increases sharply | High | Often custom |
Data Vault 2.0 | Regulated industries, audit trails | Strong lineage, historization, scalable | Needs marts for BI, verbose structure | High | dbt, Exasol |
ER/3NF | Operational warehouses, staging | Strong normalization, low redundancy | Poor for BI, many joins required | Medium | RDBMS, limited BI |
Anchor Modeling | Extreme agility, schema volatility | Temporal modeling, flexible schema evolution | Niche skills, limited adoption | High | Niche support; few mainstream tools |
Wide Tables | ML feature stores, prototyping | Simple access, flat structure | Redundant, poor for BI | Low | ML platforms, notebooks |
Each data warehouse model comes with trade-offs. Star and snowflake schemas remain the most common choices for BI because they balance speed and usability. Data Vault is preferred in regulated industries for its strong audit trails, while Galaxy models support complex cross-domain reporting at the cost of added complexity. Emerging approaches like Anchor Modeling and Wide Tables serve niche use cases such as agile schema changes or machine learning feature stores. The right model depends on your balance of governance, performance, and business needs.
Operational Trade-offs:
Model | Change Handling | Team Skills Needed | Typical BI Latency | Auditability and Governance |
---|---|---|---|---|
Star schema | Type 1–3 SCD manageable | SQL / BI analysts | Low (fast) | Limited (basic lineage) |
Snowflake schema | Handles deep SCDs well | Strong SQL, DBAs | Medium | Better lineage |
Galaxy (Constellation) | Complex | Advanced SQL/modeling | Medium-High | Limited out of the box |
Data Vault 2.0 | Built for churn + CDC | Data engineers | Higher (reduced with PIT views) | Excellent (audit, compliance) |
ER/3NF | Schema changes hard | DBAs / engineers | High latency | Good lineage |
Anchor Modeling | Best for schema churn | Specialist knowledge | Variable | Decent lineage, tool gaps |
Wide Tables | Overwrites only | Data scientists | Very low (fast) | Minimal (no lineage) |
Operational trade-offs show that not all models handle change, latency, or governance equally. Star schemas deliver fast BI but limited lineage, while Snowflake schemas provide better normalization at the cost of query speed. Data Vault offers the strongest compliance and auditability but increases latency without PIT views. Wide tables enable rapid prototyping for data science but sacrifice governance. Choosing the right model means aligning technical strengths with your organization’s priorities in agility, control, and performance.
Dimensional Modeling in Data Warehousing
Dimensional modeling organizes data into facts and dimensions to support fast, predictable queries. It is the foundation of many business intelligence and reporting systems, and one of the core data warehouse concepts that inform model selection.
Star Schema
A star schema places a single fact table at the center, surrounded by dimension tables.
- Use case: Dashboards and BI queries with large volumes of fact data and repeatable analysis.
- Advantages: Simple to understand, fast joins, widely supported by BI tools.
- Limitations: Redundant data in dimensions, limited flexibility for complex hierarchies.
Example:
A retail data warehouse may store sales in a fact table with foreign keys to dimensions such as Date, Product, Store, and Customer.
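A typical star query touches the fact table once and joins each dimension directly. A minimal sketch of such a query (table and column names are illustrative):
-- Revenue by month and product category: one join per dimension
SELECT d.month_name, p.category, SUM(f.sales_amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.month_name, p.category;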

Snowflake Schema
A snowflake schema normalizes dimensions into multiple related tables. In a modern data warehouse, this trade-off can reduce storage but adds complexity through extra joins.
- Use case: Environments with complex hierarchies, such as geography (Country → Region → City).
- Advantages: Reduces data duplication, enforces consistency across attributes.
- Limitations: Joins become more complex; queries may run slower than in a star schema.
Example:
A Product dimension may split into Product, Category, and Supplier tables, linked by keys instead of a single flat table.
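The same measure now requires traversing the normalized chain, one extra join per level. A sketch, again with illustrative names:
-- Category sits in its own table, so the query needs an additional join
SELECT c.category_name, SUM(f.sales_amount) AS revenue
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category_name;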

Galaxy (Constellation) Schema
A galaxy schema, also called a constellation, contains multiple fact tables that share dimension tables.
- Use case: Enterprises that need reporting across business processes (e.g., sales and finance).
- Advantages: Integrates multiple subject areas; allows comparisons across domains.
- Limitations: Design and maintenance are more complex; requires careful governance of shared dimensions.
Example:
A Customer dimension could link to both a Sales fact table and a Support fact table, enabling analysis across revenue and service.
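When two fact tables share a dimension, aggregate each fact separately before joining; joining both facts to the dimension in one pass would multiply rows and inflate the measures. A minimal sketch with illustrative names:
-- Pre-aggregate each fact, then join on the shared customer key
SELECT c.customer_key, r.total_revenue, t.ticket_count
FROM dim_customer c
LEFT JOIN (
SELECT customer_key, SUM(sales_amount) AS total_revenue
FROM fact_sales GROUP BY customer_key
) r ON r.customer_key = c.customer_key
LEFT JOIN (
SELECT customer_key, COUNT(*) AS ticket_count
FROM fact_support GROUP BY customer_key
) t ON t.customer_key = c.customer_key;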
Slowly Changing Dimensions (SCD Types)
Dimensional models must handle how attributes change over time. SCDs define strategies for updating dimension data. Done well, they deliver clear data warehouse advantages in accuracy, compliance, and long-term usability.
Type | Description | Use Case |
---|---|---|
Type 0 | Retain original value, no changes. | Static attributes (e.g., Date of Birth). |
Type 1 | Overwrite old value with new. | Non-critical changes (e.g., correcting a spelling error). |
Type 2 | Add a new row with start/end dates to preserve history. | Tracking history of Customer Address or Job Title. |
Type 3 | Keep previous value in additional column(s). | Storing “current” and “previous” values for limited history. |
Type 4 | Store history in a separate history table. | When historical data is rarely queried. |
Type 5/6 | Hybrids combining Types 1–3. | Complex history needs in regulated industries. |
Example SQL — SCD Type 2 (simplified, two steps):
-- Step 1: close the current row when a tracked attribute changes
MERGE INTO dim_customer t
USING staging_customer s
ON (t.customer_id = s.customer_id AND t.current_flag = 1)
WHEN MATCHED AND t.attribute <> s.attribute THEN
UPDATE SET current_flag = 0, end_date = CURRENT_DATE;
-- Step 2: insert a new current row for new and changed customers
INSERT INTO dim_customer (customer_id, attribute, start_date, end_date, current_flag)
SELECT s.customer_id, s.attribute, CURRENT_DATE, NULL, 1
FROM staging_customer s
LEFT JOIN dim_customer t
ON t.customer_id = s.customer_id AND t.current_flag = 1
WHERE t.customer_id IS NULL; -- no open row left: customer is new or was just expired
Data Vault Modeling
Data Vault 2.0 is a modeling method designed for scalability, auditability, and long-term adaptability. Instead of storing all data in a single star or snowflake schema, Data Vault separates the business keys, relationships, and context into three distinct structures.
Unlike a traditional enterprise data warehouse, which often locks design choices early, Data Vault is built for incremental change and auditability.
Core Components
Hubs
Contain the unique business keys of core entities (e.g., Customer ID, Product ID).
- Attributes: surrogate key, business key, load date, record source.
- Purpose: provide a consistent anchor for all related data.
Links
Represent relationships between hubs (e.g., Customer ↔ Order).
- Attributes: surrogate key, foreign keys to hubs, load date, record source.
- Purpose: enable many-to-many relationships without redesign.
Satellites
Store descriptive attributes and changes over time (e.g., Customer Name, Address).
- Attributes: surrogate key, foreign key to hub/link, effective start/end dates, record source.
- Purpose: capture history with full lineage and auditability.
Example Hub Definition
CREATE TABLE hub_customer (
hub_customer_id BIGINT GENERATED ALWAYS AS IDENTITY,
customer_key VARCHAR(100) NOT NULL, -- business key
load_date TIMESTAMP NOT NULL,
record_source VARCHAR(50) NOT NULL,
PRIMARY KEY (hub_customer_id)
);
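Example Link Definition
A link between customers and orders follows the same pattern; this sketch assumes a hub_order table defined analogously to hub_customer:
CREATE TABLE link_customer_order (
link_customer_order_id BIGINT GENERATED ALWAYS AS IDENTITY,
hub_customer_id BIGINT NOT NULL,
hub_order_id BIGINT NOT NULL, -- assumes hub_order exists
load_date TIMESTAMP NOT NULL,
record_source VARCHAR(50) NOT NULL,
PRIMARY KEY (link_customer_order_id),
FOREIGN KEY (hub_customer_id) REFERENCES hub_customer(hub_customer_id),
FOREIGN KEY (hub_order_id) REFERENCES hub_order(hub_order_id)
);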
Example Satellite with History
CREATE TABLE sat_customer (
sat_customer_id BIGINT GENERATED ALWAYS AS IDENTITY,
hub_customer_id BIGINT NOT NULL,
customer_name VARCHAR(200),
customer_address VARCHAR(200),
effective_start TIMESTAMP NOT NULL,
effective_end TIMESTAMP,
record_source VARCHAR(50) NOT NULL,
PRIMARY KEY (sat_customer_id),
FOREIGN KEY (hub_customer_id) REFERENCES hub_customer(hub_customer_id)
);
This design allows multiple versions of the same customer attributes to exist, preserving a full timeline.
PIT and Bridge Views
Point-in-Time (PIT) Views
Join hubs, links, and satellites to provide a snapshot of data at a specific time. Useful for simplifying reporting queries without losing historical context.
Bridge Views
Pre-join multiple satellites or complex relationships to improve query performance. Especially helpful when BI tools need flattened structures.
Example PIT View (simplified):
CREATE VIEW pit_customer AS
SELECT
h.customer_key,
s.customer_name,
s.customer_address,
s.effective_start,
s.effective_end
FROM hub_customer h
JOIN sat_customer s
ON h.hub_customer_id = s.hub_customer_id
WHERE s.effective_end IS NULL; -- current record
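The view above returns only the current record. For a true point-in-time snapshot, filter on the effective date range instead; the date literal below is just an example:
-- State of every customer as of 2024-06-30
SELECT h.customer_key, s.customer_name, s.customer_address
FROM hub_customer h
JOIN sat_customer s ON s.hub_customer_id = h.hub_customer_id
WHERE s.effective_start <= DATE '2024-06-30'
AND (s.effective_end IS NULL OR s.effective_end > DATE '2024-06-30');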
When to Use Data Vault
Best suited for:
- Large, evolving data environments.
- Enterprises needing full audit history and regulatory compliance.
- Scenarios with frequent schema changes or multiple source systems.
Trade-offs:
- More complex to implement than star or snowflake.
- Requires downstream marts or views for BI consumption.
Entity-Relationship (3NF / Inmon)
In the Inmon approach, data warehouses are built in 3rd Normal Form (3NF). This reduces redundancy and enforces integrity, but it requires many joins and makes BI queries slower compared to dimensional models.
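Even a simple report illustrates the join overhead. In this sketch (illustrative 3NF tables), revenue by city needs three tables:
-- Customer addresses are normalized out, so the report walks two foreign keys
SELECT a.city, SUM(o.amount) AS revenue
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
JOIN address a ON a.address_id = c.address_id
GROUP BY a.city;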
Anchor Modeling
Anchor modeling is designed for highly volatile schemas. It uses temporal anchors to handle change efficiently. While flexible, it requires niche skills and has limited tool adoption.
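A minimal sketch of the pattern: an anchor holding only identity, plus one historized attribute table. Naming conventions (often short mnemonics) vary by implementation:
-- Anchor: nothing but the entity's identity
CREATE TABLE cu_customer (
cu_id BIGINT PRIMARY KEY
);
-- Historized attribute: each change appends a row keyed by anchor + change time
CREATE TABLE cu_nam_customer_name (
cu_id BIGINT NOT NULL REFERENCES cu_customer(cu_id),
customer_name VARCHAR(200) NOT NULL,
changed_at TIMESTAMP NOT NULL,
PRIMARY KEY (cu_id, changed_at)
);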
Wide Tables
Wide tables flatten many attributes into a single denormalized structure. This is common in ML feature stores and prototyping, but it increases storage costs and reduces governance.
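A sketch of a flattened feature table built with CREATE TABLE AS; all names are illustrative:
-- One row per customer with precomputed features for ML
CREATE TABLE features_customer AS
SELECT
c.customer_id,
COUNT(o.order_id) AS order_count,
SUM(o.amount) AS total_revenue,
MAX(o.order_date) AS last_order_date
FROM customer c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id;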
Data Models in a Lakehouse Architecture
Many teams first encounter the data lake vs data warehouse debate when weighing storage options. The medallion approach reframes that choice by layering raw, curated, and business-ready data.
In this setup, data is organized into Bronze, Silver, and Gold layers, each with a different role. Understanding where data warehouse models fit helps align design choices with performance and governance needs.
Bronze Layer – Raw Data
- Role: Ingested data in its original format (files, streams, CDC feeds).
- Modeling: Minimal structure. Typically landing zones or staging tables.
- Best practices: Keep lineage, source metadata, and apply only essential data quality checks here.
Silver Layer – Cleansed and Integrated
- Role: Curated data with standardized keys, deduplicated records, and business context applied.
- Modeling approaches here:
- 3NF / ER models for source integration.
- Data Vault hubs, links, and satellites for flexible, auditable history.
- Value: Provides a governed foundation without constraining how marts are built.
Gold Layer – Optimized for Analytics
- Role: Business-ready datasets, tuned for dashboards and analytics.
- Modeling approaches here:
- Star and Snowflake schemas for BI tools.
- Galaxy schemas when multiple subject areas share common dimensions.
- Value: Delivers fast queries, user-friendly structures, and predictable measures.

Worked Example
- Bronze: Capture customer change data from a CRM feed.
- Silver: Store changes in Data Vault Satellites linked to a Customer Hub. Maintain full history.
- Gold: Expose a Customer Dimension (Type 2 SCD) with attributes flattened for BI queries.
This path combines auditability at Silver with usability at Gold.
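A sketch of the Gold step, flattening the hub and satellite defined earlier into a Type 2 dimension (the target table name is illustrative):
-- Every satellite version becomes a dimension row with a validity window
CREATE TABLE dim_customer_gold AS
SELECT
h.customer_key,
s.customer_name,
s.customer_address,
s.effective_start AS valid_from,
s.effective_end AS valid_to,
CASE WHEN s.effective_end IS NULL THEN 1 ELSE 0 END AS current_flag
FROM hub_customer h
JOIN sat_customer s ON s.hub_customer_id = h.hub_customer_id;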
When to Map Models Across Layers
- Use Data Vault or 3NF in Silver when audit trails or flexible integration are required.
- Use Star/Snowflake in Gold to simplify analytics.
- For small projects or proofs of concept, you can move from Bronze directly to Gold, but this sacrifices lineage and scalability.
Best Practices for Data Warehouse Modeling
Modeling is not only about schemas. Governance, transformation patterns, and change handling all influence whether a data warehouse remains reliable and scalable over time.
ETL vs. ELT
In short, ETL runs transformations before loading; ELT pushes them into the data warehouse.
- ETL (Extract–Transform–Load):
Data is transformed before loading into the data warehouse. Suitable for traditional on-premises systems with limited compute inside the data warehouse.
- ELT (Extract–Load–Transform):
Data is loaded first, then transformed using data warehouse compute. This is the standard for modern cloud warehouses.
Best practice:
- Use ELT where compute elasticity allows pushing heavy transformations inside the data warehouse.
- Retain ETL for cases requiring sensitive pre-processing or when source systems demand it.
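A minimal ELT transformation expressed as SQL running inside the warehouse: the raw data is already loaded, and the cleanup uses warehouse compute (table and column names are illustrative):
-- Cleansed Silver table derived from a raw Bronze landing table
CREATE TABLE silver_customer AS
SELECT DISTINCT
TRIM(customer_key) AS customer_key,
UPPER(country_code) AS country_code
FROM bronze_customer_raw
WHERE customer_key IS NOT NULL;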
Change Data Capture (CDC)
CDC captures and applies only changed records rather than reloading entire tables.
Methods:
- Log-based CDC (reads database logs).
- Timestamp or version-based CDC.
- Trigger-based CDC (less common).
Integration with models:
- Dimensional models: drive SCD Type 2 updates for dimension history.
- Data Vault: populate satellites with every change, preserving lineage.
Best practice:
- Prefer log-based CDC where available for efficiency and accuracy.
- Apply CDC patterns consistently across Bronze/Silver layers to ensure integrity.
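A sketch of the timestamp-based approach, assuming the source table carries a last_modified column and the pipeline stores a high-water mark per table:
-- Pull only rows changed since the last successful load
SELECT *
FROM src_customer
WHERE last_modified > (
SELECT last_loaded_at FROM etl_watermark WHERE table_name = 'src_customer'
);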
Governance and Access Control
- Environment separation: keep development, staging, and production isolated.
- Role-based access control (RBAC): restrict access to raw vs. curated vs. business-ready layers, following the NIST RBAC standard.
- Data lineage and cataloging: document models and transformations for transparency.
- Testing: enforce schema checks, null constraints, and referential integrity at load time.
Best practice:
Align governance to model choice:
- Data Vault/3NF in Silver → strong audit, lineage, and compliance.
- Dimensional schemas in Gold → access tuned for analysts and BI tools.

Performance and Cost Management
- Partitioning and clustering: improve query speed on large fact tables.
- Materialized views: speed up heavy joins (e.g., snowflake or constellation queries).
- Storage considerations: Snowflake schemas and Data Vaults reduce duplication but increase joins. Wide tables increase storage but reduce query time for simple ML workloads.
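As an illustration of partitioning a large fact table by date, here is a PostgreSQL-style sketch; the exact syntax varies by platform:
CREATE TABLE fact_sales (
sale_date DATE NOT NULL,
store_id INT NOT NULL,
product_id INT NOT NULL,
amount DECIMAL(12,2)
) PARTITION BY RANGE (sale_date);
-- Queries filtered on sale_date scan only the matching partition
CREATE TABLE fact_sales_2024 PARTITION OF fact_sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');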
Industry-Specific Data Warehouse Models
Different industries face unique requirements. While the modeling principles remain the same, domain-specific facts and dimensions change. Below are simplified blueprints that highlight how core models adapt in healthcare, banking, and retail.
Healthcare Data Warehouse Model
- Fact tables: Patient Encounters, Procedures, Billing Claims.
- Dimensions: Patient, Provider, Diagnosis, Facility, Date.
- Modeling considerations:
- Track patient history with SCD Type 2 to preserve address, insurance, or physician changes.
- Regulatory compliance (HIPAA, GDPR) requires auditability → Data Vault satellites fit well in Silver layer.
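As a sketch, the blueprint above might translate into an encounter fact table like this (column names are illustrative; the _sk columns are surrogate keys into the dimensions listed):
CREATE TABLE fact_patient_encounter (
encounter_id BIGINT PRIMARY KEY,
patient_sk BIGINT NOT NULL, -- points at a Type 2 patient dimension
provider_sk BIGINT NOT NULL,
diagnosis_sk BIGINT NOT NULL,
facility_sk BIGINT NOT NULL,
date_sk INT NOT NULL,
encounter_cost DECIMAL(12,2)
);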
Case Example – Piedmont Healthcare
Piedmont Healthcare manages data from 16 hospitals, 1,400 physician practices, and over 2 million patient visits annually. Their environment includes electronic health records (EHR), claims, ERP, CRM, and survey systems, generating more than 15 TB of data.
By migrating from SQL Server to Exasol, Piedmont Healthcare reduced dashboard refresh times from 10 minutes to seconds and scaled to deliver around 20 new metrics per month.
This illustrates a core modeling challenge: healthcare data warehouses must integrate and query vast, heterogeneous sources — EHRs, claims, and operational systems. Choosing a model that balances history tracking, auditability, and query speed is critical in these contexts.
Banking Data Warehouse Model
- Fact tables: Account Transactions, Loan Applications, Payments.
- Dimensions: Customer, Account, Product, Branch, Time.
- Modeling considerations:
- Strong lineage for compliance with Basel/DORA → Data Vault recommended for full history.
- Reporting marts in Gold can use star schema for KPIs such as loan approval rates or NPL ratios.
Retail Data Warehouse Model
- Fact tables: Sales, Inventory Movements, Returns.
- Dimensions: Product, Store, Promotion, Supplier, Date.
- Modeling considerations:
- High-volume sales facts → partition fact tables by Date/Store.
- Promotions and price changes fit SCD handling for Product Dimension.
Case Example – Flaconi
Flaconi, a leading European online beauty retailer, migrated from a legacy analytics system to a cloud-based Exasol deployment. Their environment integrates Python and Pentaho ETL pipelines with AWS services, processing around 3 TB of sales, product, and customer data.
With Exasol, Flaconi accelerated ETL and reporting, enabling ~120 self-service BI users to access real-time dashboards. They also improved forecasting models for stock levels and campaign performance. This demonstrates a common retail challenge: data warehouses must handle large data volumes and diverse sources while still delivering fast, user-ready insights for marketing, inventory, and operations.

How to Choose the Right Model
Different models solve different problems. The choice depends on data volume, complexity, governance requirements, and the skills of the team building and maintaining the data warehouse. Established research on data modeling shows how different approaches trade off flexibility and complexity.
Decision Factors
Data Volume and Complexity
- Small to medium volumes with straightforward reporting → Star schema.
- High complexity or multiple subject areas → Snowflake or Galaxy.
- Petabyte scale, multi-source integration, and frequent schema changes → Data Vault.
Governance and Compliance
- Strong auditability or regulatory reporting (e.g., banking, healthcare) → Data Vault or 3NF/ER in Silver layer.
- Business intelligence with less regulatory overhead → Star/Snowflake in Gold layer.
Team Skills and Tooling
- BI analysts and SQL developers → Dimensional models.
- Advanced data engineering teams comfortable with automation → Data Vault or Anchor Modeling.
- Data science and ML teams → Wide Tables or curated Gold layer extracts.
Performance and Cost Priorities
- Fast query response for dashboards → Star schema.
- Storage efficiency and reduced duplication → Snowflake schema.
- Balanced performance and auditability → Hybrid (Data Vault in Silver + Dimensional marts in Gold).
Recommended Hybrid
Most modern data warehouses blend approaches:
- Silver layer: Data Vault or 3NF for history, auditability, and integration.
- Gold layer: Star or Snowflake marts for BI usability.
- Optional: Wide Tables for ML feature stores.
This hybrid addresses both compliance requirements and business agility.
FAQs about Data Warehouse Models
What is a model in a data warehouse?
A model in a data warehouse defines how information is structured for storage and querying. It specifies fact tables, dimension tables, and relationships, so that data can be integrated from multiple sources and queried efficiently.
What are the four types of data models?
In data management, the four standard types of data models are Conceptual, Logical, Physical, and Dimensional. Conceptual shows high-level entities, logical defines relationships and keys, physical specifies implementation in a database, and dimensional structures data for analytics. These explain how data is represented at different levels. In the context of data warehouses, “modeling” usually refers to schema patterns such as star, snowflake, or Data Vault.
What is the data model of a data warehouse?
The data model of a data warehouse is its schema design — how fact tables and dimension tables are defined and linked. It governs how raw data becomes structured for reporting, and can be dimensional (star, snowflake) or historical (Data Vault, 3NF).
What are the three data warehouse models?
The three commonly referenced data warehouse models are the Enterprise Data Warehouse, Data Mart, and Operational Data Store (ODS). Each serves a different purpose: enterprise data warehouses cover the whole organization, data marts focus on a single department, and ODS supports short-term operational reporting.
What are the four components of a data warehouse?
The four core components are:
Data sources (operational systems, external feeds),
ETL/ELT processes (extraction and transformation),
Storage (centralized data warehouse schema), and
Presentation layer (dashboards, BI, analytics tools).
What is data warehousing modeling?
Data warehousing modeling is the process of designing fact and dimension tables in a structure that supports analytics. It includes approaches like star schema, snowflake schema, galaxy schema, and Data Vault, depending on governance and performance needs.
What are the four stages of data warehousing?
The four stages are Data Extraction, Data Staging, Data Storage, and Data Presentation. Data is pulled from sources, cleansed and transformed, stored in the data warehouse schema, and then exposed for analytics and BI.
What is the difference between a data warehouse and a data model?
A data warehouse is the system where integrated, historical data is stored for analysis. A data model is the blueprint of how that data is organized, defining tables, keys, and relationships. The model is the design; the data warehouse is the implementation.
Is data modeling the same as ETL/ELT?
No. Data modeling defines how data is structured in the data warehouse, while ETL/ELT are the processes that move and transform the data before it lands in that structure. Modeling comes first; ETL/ELT implements the pipeline.