Certified Data Mining and Warehousing Professional Data Quality

Data Quality
 


Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, data are deemed of high quality if they correctly represent the real-world construct to which they refer. Apart from these definitions, as data volume increases, the internal consistency of the data becomes paramount, regardless of its fitness for any external purpose; for example, a person's age and birth date may conflict in different parts of a database. The first two views (fitness for use and correct representation of the real world) can often disagree, even about the same set of data used for the same purpose. This article discusses data quality as it relates to business data processing, although other kinds of data have quality issues as well.
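As an illustration of the internal-consistency problem, the following Python sketch flags records whose stored age disagrees with the age implied by their birth date. The record layout (the `id`, `age` and `birth_date` keys) and the fixed as-of date are hypothetical assumptions chosen for the example, not part of any particular database schema.

```python
from datetime import date


def age_on(birth_date: date, as_of: date) -> int:
    """Return the number of completed years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def find_age_conflicts(records: list[dict], as_of: date) -> list[int]:
    """Return the ids of records whose stored age contradicts their birth date."""
    return [r["id"] for r in records if r["age"] != age_on(r["birth_date"], as_of)]


# Hypothetical records; record 2 stores an age that its birth date does not support.
records = [
    {"id": 1, "age": 34, "birth_date": date(1990, 5, 17)},
    {"id": 2, "age": 41, "birth_date": date(1990, 5, 17)},
    {"id": 3, "age": 24, "birth_date": date(1999, 12, 31)},
]
print(find_age_conflicts(records, as_of=date(2024, 6, 1)))  # [2]
```

Deriving the age from the birth date at query time, rather than storing both, removes this class of conflict altogether.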

Definitions

1. Degree of excellence exhibited by the data in relation to the portrayal of the actual scenario.

2. The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.

3. The totality of features and characteristics of data that bears on their ability to satisfy a given purpose; the sum of the degrees of excellence for factors related to data. 

4. The processes and technologies involved in ensuring that data values conform to business requirements and acceptance criteria.

5. Complete, standards-based, consistent, accurate and time-stamped.
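Definitions 2 and 4 treat quality as conformance to explicit acceptance criteria. A minimal Python sketch of that idea checks each record against per-field predicates and reports a conformance rate. The field names, the `RULES` table, the simplified email pattern and the country whitelist are all hypothetical and stand in for whatever business rules an organization actually defines.

```python
import re
from typing import Callable

# Hypothetical acceptance criteria: each field maps to a predicate that a
# conforming value must satisfy. The email pattern is deliberately simple.
RULES: dict[str, Callable[[object], bool]] = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: (
        isinstance(v, str)
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None
    ),
    "country": lambda v: v in {"US", "GB", "IN", "DE"},
}


def violations(record: dict) -> list[str]:
    """Return the names of fields that are missing or fail their rule."""
    return [
        field
        for field, rule in RULES.items()
        if field not in record or not rule(record[field])
    ]


def conformance_rate(records: list[dict]) -> float:
    """Return the fraction of records that satisfy every rule."""
    if not records:
        return 1.0
    ok = sum(1 for r in records if not violations(r))
    return ok / len(records)


# Hypothetical sample data.
records = [
    {"customer_id": 101, "email": "ana@example.com", "country": "US"},
    {"customer_id": -5, "email": "bob@example", "country": "US"},
    {"customer_id": 102, "email": "cy@example.org"},
]
for r in records:
    print(violations(r))
print(conformance_rate(records))
```

Keeping the rules in one table, separate from the checking code, lets them be reviewed and changed as business requirements change.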

 

Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context. 

Aspects of data quality include: 

  • Accuracy
  • Completeness
  • Update status
  • Relevance
  • Consistency across data sources
  • Reliability
  • Appropriate presentation
  • Accessibility
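Some of these aspects can be measured directly. The Python sketch below computes completeness as the share of non-empty cells and checks consistency across two sources by comparing a shared field for records with the same key. The sources (`crm`, `billing`), their field names and their contents are hypothetical assumptions for illustration.

```python
def completeness(records: list[dict], fields: list[str]) -> float:
    """Return the share of (record, field) cells that are present and non-empty."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total


def cross_source_mismatches(source_a, source_b, key, field):
    """Return sorted keys present in both sources whose `field` values disagree."""
    b_by_key = {r[key]: r for r in source_b}
    return sorted(
        r[key]
        for r in source_a
        if r[key] in b_by_key and r.get(field) != b_by_key[r[key]].get(field)
    )


# Hypothetical extracts from two systems that describe the same customers.
crm = [
    {"id": 1, "phone": "555-0100", "city": "Austin"},
    {"id": 2, "phone": "", "city": "Boston"},
    {"id": 3, "phone": None, "city": "Denver"},
]
billing = [
    {"id": 1, "city": "Austin"},
    {"id": 2, "city": "Cambridge"},
    {"id": 4, "city": "Miami"},
]
print(completeness(crm, ["phone", "city"]))
print(cross_source_mismatches(crm, billing, "id", "city"))  # [2]
```

Treating empty strings as missing is a design choice in this sketch; other definitions of completeness may count them as present.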

Within an organization, acceptable data quality is crucial to operational and transactional processes and to the reliability of business analytics (BA) / business intelligence (BI) reporting. Data quality is affected by the way data is entered, stored and managed. Data quality assurance (DQA) is the process of verifying the reliability and effectiveness of data. 

Maintaining data quality requires going through the data periodically and scrubbing it. Typically this involves updating it, standardizing it, and de-duplicating records to create a single view of the data, even if it is stored in multiple disparate systems. Many vendor applications on the market make this job easier.
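The standardize-then-deduplicate step can be sketched in Python as below. The field names, the normalization rules (whitespace and case folding, digits-only phone numbers) and the choice of the email address as the match key are all assumptions for illustration; commercial tools typically add fuzzy matching and rules for choosing which duplicate survives, which this sketch omits.

```python
import re


def standardize(record: dict) -> dict:
    """Normalize hypothetical name/email/phone fields so equivalent values compare equal."""
    return {
        # Collapse repeated whitespace and use title case.
        "name": " ".join(record["name"].split()).title(),
        # Email addresses are compared case-insensitively here.
        "email": record["email"].strip().lower(),
        # Keep digits only, so "(555) 010-0100" and "555.010.0100" match.
        "phone": re.sub(r"\D", "", record["phone"]),
    }


def deduplicate(records: list[dict], key_fields=("email",)) -> list[dict]:
    """Standardize records, then keep the first one seen for each match key."""
    seen: dict[tuple, dict] = {}
    for rec in map(standardize, records):
        key = tuple(rec[f] for f in key_fields)
        seen.setdefault(key, rec)
    return list(seen.values())


# Hypothetical raw rows, e.g. collected from two different systems.
raw = [
    {"name": "  jane   DOE", "email": "Jane.Doe@Example.com ", "phone": "(555) 010-0100"},
    {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": "555.010.0100"},
    {"name": "Raj Patel", "email": "raj@example.com", "phone": "555-010-0199"},
]
for row in deduplicate(raw):
    print(row)
```

Standardizing before matching matters: the first two rows only become recognizable duplicates once case and whitespace have been normalized.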
