Your shopping cart is empty!
The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used:
The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The basics in the design build on the actual business process which the data warehouse should cover. Therefore the first step in the model is to describe the business process which the model builds on. This could for instance be a sales situation in a retail store. To describe the business process, one can choose to do this in plain text or use basic Business Process Modeling Notation (BPMN) or other design guides like the Unified Modeling Language (UML).
After describing the Business Process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on. This could for instance be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, you should pick the central process and describe it with one sentence. Furthermore the grain (sentence) is what you are going to build your dimensions and fact table from. You might find it necessary to go back to this step to alter the grain due to new information gained on what your model is supposed to be able to deliver.
The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table, and is where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory etc. These dimensions are where all the data is stored. For example, the date dimension could contain data such as year, month and weekday.
After defining the dimensions, the next step in the process is to make keys for the fact table. This step is to identify the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the fact table rows are numerical, additive figures such as quantity or cost per unit, etc.
Dimensional normalization or snowflaking removes redundant attributes, which are known in the normal flatten de-normalized dimensions. Dimensions are strictly joined together in sub dimensions.
Snowflaking has an influence on the data structure that differs from many philosophies of data warehouses. Single data (fact) table surrounded by multiple descriptive (dimension) tables
Developers often don't normalize dimensions due to several facts:
There are some arguments on why normalization can be useful. It can be an advantage when part of hierarchy is common to more than one dimension. For example, a geographic dimension may be reusable because both the customer and supplier dimensions use it.
Benefits of the dimensional modeling are following:
To build a dimensional database, you start with a dimensional data model. The dimensional data model provides a method for making databases simple and understandable. You can conceive of a dimensional database as a database cube of three or four dimensions where users can access a slice of the database along any of its dimensions. To create a dimensional database, you need a model that lets you visualize the data.
Suppose your business sells products in different markets and evaluates the performance over time. It is easy to conceive of this business process as a cube of data, which contains dimensions for time, products, and markets. Figure 35 shows this dimensional model. The various intersections along the lines of the cube would contain the measures of the business. The measures correspond to a particular combination of product, market, and time data.
Another name for the dimensional model is the star-join schema. The database designers use this name because the diagram for this model looks like a star with one central table around which a set of other tables are displayed. The central table is the only table in the schema with multiple joins connecting it to all the other tables. This central table is called the fact table and the other tables are called dimension tables. The dimension tables all have only a single join that attaches them to the fact table, regardless of the query. Figure 36 shows a simple dimensional model of a business that sells products in different markets and evaluates business performance over time.
To build a dimensional data model, you need a methodology that outlines the decisions you need to make to complete the database design. This methodology uses a top-down approach because it first identifies the major processes in your organization where data is collected. An important task of the database designer is to start with the existing sources of data that your organization uses. After the processes are identified, one or more fact tables are built from each business process. The following steps describe the methodology you use to build the data model.
Although a dimensional database can be based on multiple business processes and can contain many fact tables, the data model that this section describes is based on a single business process and has one fact table.
A business process is an important operation in your organization that some legacy system supports. You collect data from this system to use in your dimensional database. The business process identifies what end users are doing with their data, where the data comes from, and how to transform that data to make it meaningful. The information can come from many sources, including finance, sales analysis, market analysis, customer profiles. The following list shows different business processes you might use to determine what data to include in your dimensional database:
Suppose your organization wants to analyze customer buying trends by product line and region so that you can develop more effective marketing strategies. In this scenario, the subject area for your data model is sales.
After many interviews and thorough analysis of your sales business process, suppose your organization collects the following information:
Previously, sales districts were divided by city. Now the customer base corresponds to two regions: Region 1 for California and Region 2 for all other states.
To develop a working data model, you can assume that the relational database of sales information has the following properties:
The region information has not yet been added to the database.
An important characteristic of the dimensional model is that it uses business labels familiar to end users rather than internal tables or column names. After the business process is completed, you should have all the information you need to create the measures, dimensions, and relationships for the dimensional data model. This dimensional data model is used to implement the sales_demo database that Implementing a Dimensional Database (XPS) describes.
The stores_demo demonstration database is the primary data source for the dimensional data model that this chapter develops. For detailed information about the data sources that are used to populate the tables of the sales_demo database,
After you gather all the relevant information about the subject area, the next step in the design process is to determine the granularity of the fact table. To do this you must decide what an individual low-level record in the fact table should contain. The components that make up the granularity of the fact table correspond directly with the dimensions of the data model. Thus, when you define the granularity of the fact table, you identify the dimensions of the data model.
The granularity of the fact table also determines how much storage space the database requires. For example, consider the following possible granularities for a fact table:
The size of a database that has a granularity of product by day by region would be much greater than a database with a granularity of product by month by region because the database contains records for every transaction made each day as opposed to a monthly summation of the transactions. You must carefully determine the granularity of your fact table because too fine a granularity could result in an astronomically large database. Conversely, too coarse a granularity could mean the data is not detailed enough for users to perform meaningful queries against the database.
A careful review of the information gathered from the business process should provide what you need to determine the granularity of the fact table. To summarize, your organization wants to analyze customer-buying trends by product line and region so that you can develop more effective marketing strategies.
The granularity of the fact table always represents the lowest level for each corresponding dimension. When you review the information from the business process, the granularity for customer and product dimensions of the fact table are apparent. Customer and product cannot be reasonably reduced any further: they already express the lowest level of an individual record for the fact table. (In some cases, product might be further reduced to the level of product component because a product could be made up of multiple components.)
Because the customer buying trends your organization wants to analyze include a geographical component, you still need to decide the lowest level for region information. The business process indicates that in the past, sales districts were divided by city, but now your organization distinguishes between two regions for the customer base: Region 1 for California and Region 2 for all other states. Nonetheless, at the lowest level, your organization still includes sales district data, so district represents the lowest level for geographical information and provides a third component to further define the granularity of the fact table.
Customer-buying trends always occur over time, so the granularity of the fact table must include a time component. Suppose your organization decides to create reports by week, accounting period, month, quarter, or year. At the lowest level, you probably want to choose a base granularity of day. This granularity allows your business to compare sales on Tuesdays with sales on Fridays, compare sales for the first day of each month, and so forth. The granularity of the fact table is now complete.
The decision to choose a granularity of day means that each record in the time dimension table represents a day. In terms of the storage requirements, even 10 years of daily data is only about 3,650 records, which is a relatively small dimension table.
After you determine the granularity of the fact table, it is easy to identify the primary dimensions for the data model because each component that defines the granularity corresponds to a dimension. Figure 39 shows the relationship between the granularity of the fact table and the dimensions of the data model.
With the dimensions (customer, product, geography, time) for the data model in place, the schema diagram begins to take shape.
You can now map out dimension elements and hierarchies for each dimension. Figure 40 shows the relationships among dimensions, dimension elements, and the inherent hierarchies.
In most cases, the dimension elements need to express the lowest possible granularity for each dimension, not because queries need to access individual low-level records, but because queries need to cut through the database in precise ways. In other words, even though the questions that a data warehousing environment poses are usually broad, these questions still depend on the lowest level of product detail.
The measures for the data model include not only the data itself, but also new values that you calculate from the existing data. When you examine the measures, you might discover that you need to make adjustments either in the granularity of the fact table or the number of dimensions.
Another important decision you must make when you design the data model is whether to store the calculated results in the fact table or to derive these values at runtime.
The question to answer is, "What measures are used to analyze the business?" Remember that the measures are the quantitative or factual data that tell how much or how many. The information that you gather from analysis of the sales business process results in the following list of measures:
Use these measures to complete the fact table in Figure 41.
Assume, for the moment, that the schema of Figure 41 shows both the logical and physical design of the database. The database contains the following five tables:
Each of the dimensional tables includes a primary key (product, time_code, customer, district_code), and the corresponding columns in the fact table are foreign keys. The fact table also has a primary (composite) key that is a combination of these four foreign keys. As a rule, each foreign key of the fact table must have its counterpart in a dimension table. Furthermore, any table in a dimensional database that has a composite key must be a fact table, which means that every table in a dimensional database that expresses a many-to-many relationship is a fact table.