Metadata and data warehousing
Data warehouse (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to manage and store the data whereas the Business Intelligence (BI) focuses on the usage of data to facilitate reporting and analysis.
The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, cleansed and timely data, extracted from various operational systems in an organization. The extracted data is integrated in the data warehouse environment in order to provide an enterprise wide perspective, one version of the truth. Data is structured in a way to specifically address the reporting and analytic requirements. The design of structural metadata commonality using a data modeling method such as entity relationship model diagraming is very important in any data warehouse development effort.
An essential component of a data warehouse/business intelligence system is the metadata and tools to manage and retrieve metadata. Ralph Kimball describes metadata as the DNA of the data warehouse as metadata defines the elements of the data warehouse and how they work together.
Kimball et al. refers to three main categories of metadata: Technical metadata, business metadata and process metadata. Technical metadata is primarily definitional while business metadata and process metadata are primarily descriptive. Keep in mind that the categories sometimes overlap.
- Technical metadata defines the objects and processes in a DW/BI system, as seen from a technical point of view. The technical metadata includes the system metadata which defines the data structures such as: Tables, fields, data types, indexes and partitions in the relational engine, and databases, dimensions, measures, and data mining models. Technical metadata defines the data model and the way it is displayed for the users, with the reports, schedules, distribution lists and user security rights.
- Business metadata is content from the data warehouse described in more user-friendly terms. The business metadata tells you what data you have, where it comes from, what it means and what its relationship is to other data in the data warehouse. Business metadata may also serves as documentation for the DW/BI system. Users who browse the data warehouse are primarily viewing the business metadata.
- Process metadata is used to describe the results of various operations in the data warehouse. Within the ETL process all key data from tasks are logged on execution. This includes start time, end time, CPU seconds used, disk reads, disk writes and rows processed. When troubleshooting the ETL or query process, this sort of data becomes valuable. Process metadata is the fact measurement when building and using a DW/BI system. Some organizations make a living out of collecting and selling this sort of data to companies - in that case the process metadata becomes the business metadata for the fact and dimension tables. Process metadata is in interest of business people who can use the data to identify the users of their products, which products they are using and what level of service they are receiving.
Data warehousing has specific metadata requirements. Metadata that describes tables typically includes:
- Physical Name
- Logical Name
- Type: Fact, Dimension, Bridge
- Role: Legacy, OLTP, Stage,
- DBMS: DB2, Informix, MS SQL Server, Oracle, Sybase
Metadata describes columns within tables:
- Physical Name
- Logical Name
- Order in Table
- Decimal Positions
- Default Value
- Edit Rules
The concept of metadata is divided into 3 categories: Descriptive, Administrative, and Structural. However, metadata can further be classified into 14 types. The first type is the relational database metadata which is obviously used for relational database files. This type of metadata may include data about the tables, their sizes, names, and the number of rows, and the columns available in each database. Another metadata type is the data warehouse metadata which may be seen in two separate sections: the back room and the front room. There is also the business intelligence metadata that is used in explaining on how the corporate finance is reported. General IT metadata is another type of metadata. There is also the File system metadata that is utilized for different filing systems.
Image metadata is also available and is obviously used for different image files. Audio metadata is also included, which is also an important element in many audio files. There is also the program metadata which assists the executable files. Included to the list of metadata types is the existing software metadata, document metadata, and geospatial metadata. Digital library metadata is also a type of metadata, which may be separated into three categories: descriptive, administrative, and structural. Other metadata types are the metamodels and the meta-metadata.
There are three main types of metadata:
• Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
• Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
• Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that are sometimes listed as separate metadata types are:
− Rights management metadata, which deals with intellectual property rights,
− Preservation metadata, which contains information needed to archive and preserve a resource
|Administrative||Metadata used in managing and administering information resources||
- Acquisition information
- Rights and reproduction tracking
- Documentation of legal access requirements
- Location information
- Selection criteria for digitization
- Version control and differentiation between similar information objects
- Audit trails created by record keeping systems
|Descriptive||Metadata used to describe or identify information resources||
- Cataloging records
- Finding aids
- Specialized indexes
- Hyperlinked relationships between resources
- Annotations by users
- Metadata for record keeping systems generated by records creators
|Preservation||Metadata related to the preservation management of information resources||
- Documentation of physical condition of resources
- Documentation of actions taken to preserve physical and digital versions of resources, e.g., data refreshing and migration
|Technical||Metadata related to how a system functions or metadata behave||
- Hardware and software documentation
- Digitization information, e.g., formats, compression ratios, scaling routines
- Tracking of system response times
- Authentication and security data, e.g., encryption keys, passwords
|Use||Metadata related to the level and type of use of information resources||
- Exhibit records
- Use and user tracking
- Content re-use and multi-versioning information