I recently posted a blog on the topic of self-service business intelligence, where I provided a clear definition for self-service BI, how it differs from traditional BI and some of the key benefits organizations can take advantage of when investing in self-service analytics. In this blog I want to focus on a critical design best practice that will help ensure a successful self-service BI deployment.
The dimensional data model is a database design technique that is specifically used to support reporting and analytics. It’s a more simplistic, intuitive design compared to a typical ER model used for ERP business applications. They are easy to understand for business users, flexible to adapt to changing business needs and perform well against large amounts of data, returning result sets quickly and accurately.
Today’s self-service reporting solutions can allow users to access a variety of data sources, such as excel spreadsheets, flat files, and a variety of external data feeds from social media and websites. However, the main data source for most organizations still comes from the enterprise data warehouse or data mart(s). It is here, within the data warehouse environment, where dimensional data models can be accessed and queried by BI analytical tools for data exploration and ad-hoc reporting.
What is a dimensional data model?
A dimensional model (also referred to as a star schema) is a database design optimized for getting data out of a database. It provides a predictable, standard framework that allows business users to easily understand the information at hand, make processing more efficient, and avoid querying back erroneous data.
The star schema architecture earned its name because it’s a diagram that resembles a star, with points radiating from a center. The center of the star consists of the fact table, and the points of the star are dimension tables. A schema is a database term describing how a set of tables are arranged.
Difference between entity-relationship models and dimensional data models
The dimensional data model has several import advantages over an entity-relationship model (ERP). In the relational mode, normalization and entity-relationship models reduce redundancy in data. On the contrary, the dimensional data model arranges data in such a way that it is easier to retrieve information and generate reports. Hence, dimensional models are used in data warehouse systems and are not a good fit for relational ERP systems. In short, entity-relationship models are good for getting data into a system and dimensional data models are good for getting data out of a system.
Benefits of dimensional data model
There are many benefits to implementing dimensional data models in your data warehouse and I’ve summarized them into three categories: Flexibility & Scalability, Ease of Use, and Query Performance.
Flexibility & Scalability
Dimensional models can easily accommodate change and provide the ability to react to quickly-changing business needs. It is easy to add new data attributes into the model or new rows to the tables. These changes will also have little impact to the query tool. The flexible design is important because it allows different users to slice/dice the data in different ways.
Easy to Understand & Use
One key benefit that makes dimensional data models so attractive is that they are easy to understand and use, because they provide a more simplified structure where the data is grouped into clear and simple business categories. The models also work very well with most analytics tools which feature a “drag-and-drop” approach to data exploration and report-building, such as when using Power BI and a Microsoft Excel pivot table.
Dimensional data models are optimized for high query performance. Since there are fewer tables, there will be fewer joins, which yields less processing and higher performance compared to writing the same query against the OLTP database. Many relational database platforms will recognize this model and optimize query execution plans to aid in performance. Finally, a well-designed model can help prevent certain types of joins, which can cause run-away queries that hog resources and bog down the system.
What are some of the key characteristics of a data model?
As we mentioned above, the dimensional data model has a simple design, it’s easy to use and it works well for quick query performance. Let’s take a quick look at some of the key features that contribute to these benefits.
There are only two main types of tables: dimensions and facts. A typically schema contains 1 to 3 fact tables and 4 to 10 dimension tables.
It has a limited number of one-to-many joins between the dimension tables and the fact table.
A fact table is the primary table in a dimensional model and should only contain keys and facts or metrics from your business process.
Dimension tables provide the context surrounding a business process event.
Dimensions are de-normalized and can contain hierarchies, which will contain repeatable data values.
The attributes describe the various characteristics of the dimension.
Conformed dimensions are dimension tables that can be shared across multiple fact tables.
In summary, when designing a data model, follow these key steps:
Identify the business process (sale, invoice, payment, enrollment etc.)
Identify the grain (level of detail, such as a purchase transaction, order detail line item, etc.)
Identify the dimensions (time, product, geography, customer)
Identify facts (sales amount, units, price)
Build your dimensional model
If you’re interested in learning more about dimensional data models or how to implement or improve self-service business intelligence for your organization, please feel free to reach out and we can discuss and provide recommendations for deploying self-service BI, based on your unique needs.