
DWH Concepts: Basics & Hands-On Tutorial

In this DWH Concepts guide, you’ll get up to speed on the basics behind modern and high-performance data warehousing, walk through a hands-on mini-project to build and query your first warehouse, and discover best practices for performance, governance, and cost management—no fluff, just exactly what you need.
Let’s dive straight into the heart of data warehouse concepts, tutorials, and foundations.
The 4 Foundational Concepts of Data Warehousing
Before we dive into the technical concepts of a modern data warehouse, it’s important to understand its original design philosophy. These four foundational concepts come from Bill Inmon, one of the earliest pioneers of data warehousing. They define what makes a warehouse fundamentally different from a regular database or data lake, even today.
The Four Inmon Concepts
Subject-Oriented
A data warehouse is organized around key business domains like sales, finance, or customers, not individual applications. This structure enables consistent, company-wide reporting.
Integrated
Data from different systems is standardized: formats, naming conventions, and data types are aligned. Integration ensures that customer IDs, currencies, and timestamps mean the same thing everywhere.
Time-Variant
Unlike operational databases, a data warehouse retains historical data. Each record includes time context (e.g., order_date, snapshot_date) to enable trend analysis over months or years.
Non-Volatile
Once data is loaded into the warehouse, it’s not updated or deleted by transactions. This stability ensures consistency and traceability over time.
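To make time-variance and non-volatility tangible, here is a minimal SQL sketch (table names and values are purely illustrative): rather than updating a customer record in place, each load appends a new row with a snapshot date, so history is preserved.
-- Append-only snapshot table: rows are inserted, never updated or deleted
CREATE TABLE customer_snapshot (
    customer_id INT,
    segment VARCHAR(50),
    snapshot_date DATE
);
-- February's load adds a new row instead of overwriting January's state
INSERT INTO customer_snapshot VALUES (42, 'standard', DATE '2024-01-01');
INSERT INTO customer_snapshot VALUES (42, 'premium', DATE '2024-02-01');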
These concepts laid the foundation for how data warehouses are still defined today, even in cloud-native and real-time environments.
The next section covers the more technical concepts of warehouse design: schemas, OLAP patterns, and data modeling strategies.

Try Exasol Free: No Costs, Just Speed
Run Exasol locally and test real workloads at full speed.
Core DWH Concepts & Components: Building Blocks
A Data Warehouse (DWH) is a centralized repository optimized for analytical queries and historical reporting.
Think of it as your company’s “traffic control center”—it collects data from all sources, organizes it, and provides a single pane of glass for insight.
It’s best understood by starting with the data warehouse definition in our glossary.
Before building or evaluating a data warehouse, it’s essential to understand the core concepts that shape how data is structured, queried, and scaled. Whether you’re setting up a traditional on-premises warehouse or a cloud-native platform, these principles remain the foundation of any successful implementation.
This section covers the building blocks of a DWH: how your data is organized (schemas), how it’s stored (facts and dimensions), and how it’s used (analytical vs. transactional). These aren’t just technical definitions—they influence everything from query speed and storage costs to usability for analysts and business stakeholders.
If you’re evaluating solutions or preparing to build your first DWH, this is the strategic knowledge that helps you make the right decisions from the start.
Let’s begin with the first key decision in any data warehouse: how to structure your data using Star or Snowflake schemas.
Schema Patterns: Star vs. Snowflake
One of the most important data warehouse concepts is how you structure your schema—how your fact and dimension tables relate. Two dominant patterns exist: the Star schema and the Snowflake schema. Both define how data is logically organized for analysis, and both impact query performance, maintenance effort, and scalability.
Star Schema
A Star schema places a central fact table (e.g., sales transactions) at the center, with each dimension table (e.g., time, product, customer) connected directly to it. These dimension tables are denormalized, meaning they contain all the information needed without splitting into sub-tables.
Key Advantages:
- Fewer joins = faster query performance
- Easy for BI tools and analysts to understand
- Simpler to build and maintain
Drawbacks:
- Data redundancy (e.g., region info duplicated in multiple rows)
- Doesn’t scale well with large or hierarchical dimensions
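For illustration, a denormalized star-schema store dimension might look like this (column names are examples, not a fixed standard). Note how the geography attributes sit directly on the dimension row:
CREATE TABLE dim_store (
    store_id INT PRIMARY KEY,
    store_name VARCHAR(100),
    city VARCHAR(100),
    region VARCHAR(100),
    country VARCHAR(100) -- repeated on every store row in that country
);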
Snowflake Schema
In a Snowflake schema, dimension tables are normalized—split into sub-tables to remove redundancy. For example, instead of storing country, region, and city in one table, they would each exist in separate linked tables.
Key Advantages:
- Reduces storage by avoiding duplicate data
- Easier to enforce data integrity
- Better for managing complex hierarchies (e.g., product categories, geographies)
Drawbacks:
- Requires more joins in queries
- Slightly higher query latency, especially on large datasets
- Can be harder for business users to navigate
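The same store dimension, snowflaked (again, illustrative names): geography is split into linked sub-tables so each region and country is stored exactly once.
CREATE TABLE dim_country (
    country_id INT PRIMARY KEY,
    country VARCHAR(100)
);
CREATE TABLE dim_region (
    region_id INT PRIMARY KEY,
    region VARCHAR(100),
    country_id INT,
    FOREIGN KEY (country_id) REFERENCES dim_country(country_id)
);
CREATE TABLE dim_store (
    store_id INT PRIMARY KEY,
    store_name VARCHAR(100),
    region_id INT,
    FOREIGN KEY (region_id) REFERENCES dim_region(region_id)
);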

Which Should You Use?
- Choose Star schema if performance and simplicity are top priorities (most reporting scenarios).
- Choose Snowflake schema if your data has deep hierarchies, needs strong normalization, or if storage is a concern.
In modern DWH systems like BigQuery or Exasol, performance differences between the two patterns can be minimized, but schema design still matters for usability, maintainability, and clarity.
Facts, Dimensions & Granularity
In any data warehouse, the two most important table types are fact tables and dimension tables. Together, they define how data is stored, queried, and aggregated.
Fact Tables
A fact table contains the measurable, quantitative data you want to analyze—things like:
- sales_amount
- units_sold
- page_views
- revenue
Each row represents a single event or measurement at a defined level of detail (known as the “grain”).
Example: One row might represent the total sales of a product in a single store on a specific day.
Dimension Tables
A dimension table adds context to your facts. It stores descriptive attributes like:
- product_name, product_category
- store_location, region
- customer_segment, age_group
- year, quarter, weekday
These attributes help analysts filter, group, and slice the data during analysis.
Dimensions often answer questions like: “By which product?” or “From which region?”
Granularity: Choosing the Right Level of Detail
Granularity (or “grain”) defines how detailed your data is in the fact table. It affects everything from storage size to how flexible your analysis can be.

Important: Define your grain before you load data. Changing it later usually requires a complete rebuild of your warehouse.
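To see why grain matters, here is a quick illustration (it assumes the retail star schema we build in the hands-on tutorial later in this guide): a fact table stored at daily store-product grain can always be rolled up to coarser levels on demand, but data loaded at monthly grain can never be drilled back down to individual days.
-- Rolling a daily-grain fact table up to monthly totals on demand
SELECT d.year, d.month, SUM(f.revenue) AS monthly_revenue
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY d.year, d.month;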
Fact-to-Dimension Relationships
- Fact tables often reference multiple dimensions via foreign keys.
- These relationships form the core of your schema, whether star or snowflake.
- A good practice is to include surrogate keys in your dimension tables for performance and consistency.
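A quick sketch of that surrogate-key practice (illustrative names): the warehouse generates its own compact integer key, while the source system's natural key is kept as an ordinary attribute.
CREATE TABLE dim_customer (
    customer_sk INT PRIMARY KEY, -- surrogate key generated by the warehouse
    customer_id VARCHAR(50),     -- natural key from the source system
    customer_segment VARCHAR(50)
);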
By understanding how facts and dimensions interact—and choosing the right granularity—you lay the groundwork for a clean, performant, and scalable warehouse.
OLTP vs. OLAP: What Makes a Warehouse Different?
A data warehouse is designed for OLAP (Online Analytical Processing), which is very different from the OLTP (Online Transaction Processing) systems used in operational databases.
While OLTP handles day-to-day transactions like customer orders or bank transfers, OLAP enables complex queries, aggregations, and multi-dimensional analysis.

Why this matters: If you try to run OLAP-style queries on an OLTP system, you’ll hit performance and scalability issues fast. That’s exactly why data warehouses exist.
Real-World Example
OLTP scenario:
- A customer orders a product online. The system updates the order and stock quantity in real time.
OLAP scenario:
- A data analyst queries sales from the past 12 months, broken down by product category, region, and marketing channel.
These are different engines for different purposes—DWHs are built to analyze, not transact.
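In SQL terms, the contrast looks roughly like this (the inventory table is a hypothetical OLTP table; the analytical query assumes the retail schema from the tutorial below):
-- OLTP: a small, targeted write that must complete in milliseconds
UPDATE inventory SET stock = stock - 1 WHERE product_id = 123;

-- OLAP: a wide scan and aggregation across months of history
SELECT p.category, s.region, SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_store s ON f.store_id = s.store_id
GROUP BY p.category, s.region;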
Next up: how these concepts come together in concrete data modeling patterns.
Data Modeling Patterns: When to Use What
Once you’ve defined your schema structure and granularity, the next step is choosing a data modeling pattern. This decision affects how flexible, scalable, and maintainable your data warehouse will be over time.
There’s no one-size-fits-all—each model fits different levels of complexity and business requirements.
Star/Snowflake Modeling (Dimensional)
The most common and straightforward approach, especially for BI tools.
Best for:
- Quick setup
- Reporting and dashboards
- Clear, consistent logic
Strengths:
- Simple joins
- Excellent performance
- Easy to explain to non-technical stakeholders
Limitations:
- Hard to track data history or structural changes
- Less flexible as data sources evolve
Pro tip: use this when you need fast time-to-value and your data is relatively stable.
Data Vault
A flexible, auditable model designed for long-term enterprise use.
Structure:
- Hubs = business entities (e.g. Customer, Product)
- Links = relationships (e.g. Customer_Purchased_Product)
- Satellites = contextual data + history (e.g. name, location, change timestamps)
Best for:
- Regulatory environments
- Evolving, unclean, or decentralized data
- Full auditability & change tracking
Strengths:
- Separation of raw data from transformations
- Historical tracking out of the box
- Easily scalable as sources change
Limitations:
- Slower to implement
- Requires strong data governance
- BI queries are more complex (they require a transformation layer)
Pro tip: use this when you need resilience, compliance, and long-term scalability.
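Here is a deliberately minimal Data Vault sketch with one hub, one satellite, and one link. All names and column choices are illustrative rather than a complete, standard-compliant implementation:
-- Hub: stores the business key for each customer
CREATE TABLE hub_customer (
    customer_hk VARCHAR(64) PRIMARY KEY, -- hash key derived from the business key
    customer_id VARCHAR(50),             -- business key from the source
    load_ts TIMESTAMP,
    record_source VARCHAR(50)
);
-- Satellite: descriptive attributes plus full change history
CREATE TABLE sat_customer (
    customer_hk VARCHAR(64),
    load_ts TIMESTAMP,
    name VARCHAR(100),
    city VARCHAR(100),
    PRIMARY KEY (customer_hk, load_ts)
);
-- Link: the Customer_Purchased_Product relationship
CREATE TABLE link_customer_product (
    link_hk VARCHAR(64) PRIMARY KEY,
    customer_hk VARCHAR(64),
    product_hk VARCHAR(64),
    load_ts TIMESTAMP
);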
Anchor Modeling
A highly normalized and temporal model optimized for change.
Core principle: Every attribute is stored independently, allowing time-versioned changes to any part of a record.
Best for:
- Agile teams
- Metadata-driven warehouses
- Projects with frequently evolving data models
Strengths:
- Fully temporal (tracks every change to every attribute)
- Easy to extend or modify structure
- Pairs well with automation
Limitations:
- Higher technical complexity
- Less intuitive for reporting/BI teams
- Requires advanced tooling or custom frameworks
Pro tip: use this when your schema changes often, and you need full version control at the attribute level.
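To make “every attribute is stored independently” concrete, here is a tiny, simplified anchor-modeling sketch (illustrative names): the anchor carries identity only, and each attribute gets its own time-versioned table.
-- Anchor: identity only
CREATE TABLE ac_customer (
    customer_id INT PRIMARY KEY
);
-- One attribute, historized independently of every other attribute
CREATE TABLE at_customer_name (
    customer_id INT,
    name VARCHAR(100),
    valid_from TIMESTAMP,
    PRIMARY KEY (customer_id, valid_from)
);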

Not sure which to pick?
Start simple. Use Star schema for quick wins. As complexity grows, layer in Data Vault or Anchor Modeling to support scale, compliance, or agility.
For additional modeling guidance, the Kimball Group’s dimensional techniques remain a go-to industry reference.
Related Concepts in Modern Data Warehousing
While schema design and modeling patterns form the core of a data warehouse, a complete understanding of data warehousing concepts includes the surrounding processes and tools that support storage, access, and governance.
Here’s how the data warehouse fits into the broader analytics stack:
ETL / ELT
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are data pipeline processes that move raw data from source systems into the warehouse.
- ETL performs transformations before loading.
- ELT loads data first, then transforms it inside the DWH; this approach is common in cloud-native platforms.
These pipelines form the foundation of how data enters your warehouse.
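As a small illustration of the ELT half of that split (stg_sales_raw is a hypothetical staging table), the transformation happens in plain SQL inside the warehouse after loading:
-- Transform inside the DWH: cast, trim, and conform the raw load
CREATE TABLE sales_clean AS
SELECT
    CAST(order_id AS INT) AS order_id,
    UPPER(TRIM(country_code)) AS country_code,
    CAST(amount AS DECIMAL(10,2)) AS amount
FROM stg_sales_raw;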
Data Marts
A data mart is a focused subset of a data warehouse, typically scoped to a department like marketing, finance, or HR.
- They improve performance and simplify access by isolating relevant data.
- Data marts are often fed by the central warehouse, not separate systems.
Think of a warehouse as the backend, and data marts as team-specific views.
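In practice, a data mart can be as lightweight as a set of views over the central warehouse. Here is a hypothetical marketing-scoped example built on the retail schema from the tutorial below:
-- A marketing-scoped view over the central warehouse
CREATE VIEW mart_marketing_revenue AS
SELECT
    d.year,
    d.month,
    p.category,
    SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY d.year, d.month, p.category;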
Business Intelligence (BI) Tools
BI tools are what business users interact with, not the warehouse itself.
- Tools like Power BI, Tableau, Looker, and Superset connect to the DWH to run queries, generate reports, and build dashboards.
- A clean schema (star/snowflake) improves self-service exploration.
A good warehouse makes BI tools faster, not more complicated.
Metadata & Data Governance
Modern DWHs track metadata like schema definitions, data lineage, load timestamps, and access logs.
Governance features include:
- Role-based access control (RBAC)
- Change tracking (e.g., dbt docs or Exasol virtual schemas)
- Data quality checks and monitoring
These systems are critical for auditing, security, and compliance, especially in enterprise environments.
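Much of RBAC comes down to standard SQL grants. A minimal sketch using Exasol-style schema grants (role, schema, and user names are examples; exact syntax varies by platform):
-- Analysts may read the reporting schema but not modify it
CREATE ROLE analyst;
GRANT SELECT ON SCHEMA reporting TO analyst;
-- Assign the role to a (hypothetical) user
GRANT analyst TO jane_doe;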
Together, these supporting layers turn a data warehouse from a database into a business-ready analytics platform. And now that you have the full picture, we’ll shift from concepts to fundamentals and practical implementation in the hands-on tutorial.

DWH Fundamentals
Let’s zoom out and cover the fundamentals of a data warehouse—why organizations build them, what problems they solve, and where they deliver value.
This is your crash course in DWH basics.
Why Organizations Use Data Warehouses
A data warehouse acts as your company’s central nervous system for analytics. It collects, stores, and organizes data from various operational systems, making it easier to generate reports, track performance, and make decisions.
Common goals:
- Eliminate inconsistent data across systems
- Consolidate reporting into one trusted source
- Enable faster, deeper analysis over long timeframes
- Support self-service BI for business teams
Top 5 Use Cases for Data Warehousing

These aren’t just “big company” use cases. Even small teams benefit from unified, query-ready data.
Key Benefits of a DWH

“Modern data warehouses are increasingly hybrid in nature. Rather than choosing between traditional on-premises systems and cloud-native platforms, many organizations adopt a mixed architecture. This allows them to keep critical or sensitive data on-prem while leveraging cloud services for elasticity, analytics, and storage at scale.”
Florian Wenzel, VP of Product
The decision between on-prem vs cloud depends on factors like latency, compliance, data sovereignty, and cost structure.
Hands-On Tutorial: Build Your First DWH
Let’s put theory into practice. In this tutorial, you’ll walk through a simplified but realistic data warehouse setup—from schema creation to running your first query. Whether you’re testing a cloud platform or working locally, these steps will help you understand how the pieces fit together.
You’ll need:
- A SQL-compatible data warehouse (e.g. Exasol, PostgreSQL)
- Basic SQL and Python knowledge
- Optional: Airflow or another orchestration tool
Project Setup: Define Your Schema
We’ll start with a star schema for retail sales analysis. Our fact table will track transactions, and dimension tables will provide context.
Tables:
- fact_sales
- dim_product
- dim_store
- dim_date
SQL snippet:
-- Dimension tables hold the descriptive attributes
CREATE TABLE dim_product (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50)
);
CREATE TABLE dim_store (
    store_id INT PRIMARY KEY,
    region VARCHAR(50),
    store_type VARCHAR(50)
);
CREATE TABLE dim_date (
    date_id INT PRIMARY KEY,
    full_date DATE, -- "date" itself is a reserved word on some platforms
    year INT,
    month INT,
    weekday VARCHAR(10)
);
-- The fact table holds the measures, linked to each dimension by foreign key
CREATE TABLE fact_sales (
    sales_id INT PRIMARY KEY,
    product_id INT,
    store_id INT,
    date_id INT,
    units_sold INT,
    revenue DECIMAL(10,2),
    FOREIGN KEY (product_id) REFERENCES dim_product(product_id),
    FOREIGN KEY (store_id) REFERENCES dim_store(store_id),
    FOREIGN KEY (date_id) REFERENCES dim_date(date_id)
);
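If you want something queryable right away, you can seed a few illustrative rows by hand before wiring up a real pipeline (the values below are made up):
INSERT INTO dim_product VALUES (1, 'Espresso Machine', 'Kitchen');
INSERT INTO dim_store VALUES (1, 'North', 'Flagship');
INSERT INTO dim_date VALUES (20240115, DATE '2024-01-15', 2024, 1, 'Monday');
INSERT INTO fact_sales VALUES (1, 1, 1, 20240115, 3, 899.97);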
Load and Transform Data
You can load your data in batches or in real time. For this tutorial, we’ll use a simple Python script to simulate a batch load.
Python + SQL Example:
import pandas as pd
import sqlalchemy

# Connect to the warehouse (replace with your own connection string)
engine = sqlalchemy.create_engine("your_connection_string")

# Load CSV data into the product dimension
df_products = pd.read_csv("products.csv")
df_products.to_sql("dim_product", engine, if_exists="append", index=False)
Pro Tip: Use tools like dbt or Airflow if your pipeline gets more complex.
Run Your First Query
Now that your tables are set up and populated, you can start analyzing.
Example query:
SELECT
    d.year,
    p.category,
    SUM(f.revenue) AS total_revenue
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY d.year, p.category
ORDER BY d.year, total_revenue DESC;
This gives you a revenue breakdown by product category over time, exactly the kind of insight OLAP systems are built for.
Visualize the Results
Load the query output into a BI tool like Tableau, Power BI, or even a Jupyter notebook chart.
Try plotting revenue over time by category to identify seasonal trends or top performers.
This hands-on example gives you a working foundation for building, loading, and querying your first data warehouse. To wrap up, the FAQ below answers the questions we hear most often.
Frequently Asked Questions
What are the four foundational concepts of a data warehouse?
The four foundational concepts of a data warehouse are: subject-oriented, integrated, time-variant, and non-volatile. These principles ensure the data is reliable, unified, historically accurate, and stable for reporting and analysis.
What are the main components of a data warehouse?
The main components are:
- BI tools (dashboards, reports)
- Data sources (CRM, ERP, etc.)
- ETL processes (extract, transform, load)
- Data storage (fact and dimension tables)
What are the key features of a data warehouse?
Key features include:
- Integrated data across sources
- Historical (time-variant) storage
- Subject-oriented organization
- Non-volatility, meaning data remains stable after loading
These traits make DWHs ideal for long-term analytical use.
What are the main data warehouse models?
The three primary data warehouse models are:
- Star schema – simple and fast
- Snowflake schema – normalized dimensions
- Data Vault – scalable and auditable
Each model suits different levels of complexity and control.
What is the difference between a data warehouse and a data mart?
A data warehouse stores enterprise-wide data across all functions. A data mart is a focused subset, scoped to a specific team or department for faster, simpler access to relevant information.