Data Basics

Before collecting data, decide what you need to know, from whom you need to know it, and what you will do with the data. Factors that help ensure the data are relevant to the project include:

  • Person collecting the data, such as a team member, associate, or subject matter expert
  • Type of data to collect, such as cost, errors, or ratings
  • Time interval for collection, such as hourly, daily, or batch-wise
  • Data source, such as reports, observations, or surveys
  • Cost of collection

Types of data

There are two types of data: attribute (discrete) and variable (continuous).

  • Attribute or discrete data – Based on counting, such as the number of processing errors or the count of customer complaints. Discrete data values can only be non-negative integers such as 0, 1, 2, 3, etc. They can be expressed as a proportion or percent (e.g., percent of x, percent good, percent bad). Attribute data include:
    • Count or percentage – Counts of errors, or the percentage of output with errors.
    • Binomial data – Data that can take only one of two values, such as yes/no or pass/fail.
    • Attribute-nominal – The “data” are names or labels with no inherent order. Examples are Dept A, Dept B, and Dept C in a company, or Machine 1, Machine 2, and Machine 3 in a shop.
    • Attribute-ordinal – The names or labels represent some value inherent in the object or item, so there is an order to the labels. Examples are performance ratings (excellent, very good, good, fair, poor) or tastes (mild, hot, very hot).
  • Variable or continuous data – Measured on a continuum or scale that can be infinitely divided. Continuous data values can be any real number, such as 2, 3.4691, or -14.21. Continuous data can be recorded at many different points and are typically physical measurements such as volume, length, size, width, time, temperature, or cost. Continuous data are more powerful than attribute data because they are more precise: the decimal places convey the level of accuracy and specificity.
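To make these categories concrete, here is a minimal Python sketch showing how each kind of data might be recorded and summarized. All data values and variable names are hypothetical, chosen only for illustration. The ordinal order is encoded by position in a list, which is one possible convention rather than a standard.

```python
from collections import Counter

# All values below are hypothetical, for illustration only.

# Count / percentage: errors found on each of five processed forms
errors_per_form = [0, 2, 0, 1, 0]
total_errors = sum(errors_per_form)  # a discrete count
pct_with_errors = 100 * sum(e > 0 for e in errors_per_form) / len(errors_per_form)

# Binomial: each unit has only one of two possible outcomes
results = ["pass", "fail", "pass", "pass"]
fail_count = results.count("fail")

# Nominal: labels with no inherent order; only frequencies are meaningful
machines = ["Machine 1", "Machine 2", "Machine 1", "Machine 3"]
machine_freq = Counter(machines)

# Ordinal: labels with an inherent order, encoded by position in this list
PERFORMANCE = ["poor", "fair", "good", "very good", "excellent"]
ratings = ["good", "excellent", "fair", "good"]
ratings_in_order = sorted(ratings, key=PERFORMANCE.index)

# Continuous (variable): real-valued measurements, e.g. cycle time in minutes
cycle_times_min = [3.42, 2.97, 3.15, 3.61]

print(total_errors, pct_with_errors, fail_count)
print(machine_freq.most_common(1), ratings_in_order)
```

Note that sorting the nominal labels alphabetically would be possible but meaningless. The ordinal labels sort by their inherent rank.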

Data are said to be discrete when they can take on only a finite (or countably infinite) set of values that can be represented by the non-negative integers. An example of discrete data is the number of defects in a sample. Data are said to be continuous when they can take any value within an interval, or within several intervals. An example of continuous data is the measurement of pH. Quality methods based on probability distributions exist for both discrete and continuous data.
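As an illustration of the difference, the sketch below uses a Poisson distribution for a discrete defect count and a normal distribution for a continuous pH reading. These two distributions are common choices picked here for the example, not ones prescribed by the text. The mean of 2 defects per sample and the pH parameters (mean 7.0, standard deviation 0.2) are hypothetical. A discrete distribution assigns probability to individual values. A continuous distribution assigns probability only to intervals; any single exact value has probability zero.

```python
import math
from statistics import NormalDist

# Discrete example: defects per sample modeled as Poisson
# with a hypothetical mean of 2 defects per sample.
def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

p_zero_defects = poisson_pmf(0, 2.0)                        # probability of one exact value
p_at_most_two = sum(poisson_pmf(k, 2.0) for k in range(3))  # P(X <= 2)

# Continuous example: pH modeled as normal with hypothetical mean 7.0 and sd 0.2.
ph = NormalDist(mu=7.0, sigma=0.2)
p_within_spec = ph.cdf(7.2) - ph.cdf(6.8)  # probability over an interval

print(round(p_zero_defects, 4), round(p_at_most_two, 4), round(p_within_spec, 4))
```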

Counted data can often be presented as variables data instead. For example, 10 scratches could be reported as a total scratch length of 8.37 inches. The ultimate purpose of the data collection and the type of data are the most significant factors in deciding whether to collect attribute or variables data.
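The scratch example can be sketched as two views of the same defects. The individual scratch lengths below are hypothetical; they were chosen only so that their total matches the 8.37 inches quoted above.

```python
# Hypothetical lengths (inches) of the 10 scratches found on one part.
scratch_lengths_in = [0.52, 1.10, 0.75, 0.98, 0.64, 1.21, 0.83, 0.47, 0.92, 0.95]

scratch_count = len(scratch_lengths_in)              # attribute view: number of scratches
total_length_in = round(sum(scratch_lengths_in), 2)  # variables view: total scratch length

print(scratch_count, total_length_in)
```

The variables view can distinguish a part with ten short scratches from one with ten long ones; the count alone cannot.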

Converting Data Types – Continuous data tend to be more precise because of their decimal places, but they sometimes need to be converted into discrete data. Because continuous data contain more information than discrete data, some information is lost during the conversion.

Discrete data cannot be converted to continuous data. Even when a characteristic could be measured, the user may choose to record discrete data because it is easier to use, rather than measuring how far the value deviates from a standard. Converting variables data to attribute data may allow a quicker assessment, but the risk is that information is lost when the conversion is made.
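The loss of information can be seen in a small sketch that converts measurements to pass/fail results against specification limits. The limits (9.5 to 10.5 mm), the samples, and the helper name `to_attribute` are all hypothetical. Two samples with clearly different measurements collapse to identical attribute data.

```python
from statistics import mean

# Hypothetical specification limits for a dimension, in mm.
LSL, USL = 9.5, 10.5

def to_attribute(measurements):
    """Convert variables data to pass/fail attribute data against the spec limits."""
    return ["pass" if LSL <= x <= USL else "fail" for x in measurements]

# Two hypothetical samples with different variables data...
sample_a = [9.60, 10.00, 10.40, 10.70]
sample_b = [9.51, 9.52, 10.49, 11.90]

# ...become identical attribute data after conversion.
attr_a = to_attribute(sample_a)
attr_b = to_attribute(sample_b)

print(attr_a == attr_b)                    # same pass/fail pattern
print(mean(sample_a), mean(sample_b))      # but different centers
```

Sample B sits right against the limits and has a far larger out-of-spec value. The pass/fail summary hides both facts.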

Measurement Scales

A measurement assigns a numerical value to something, usually a continuous characteristic. Measurement is a mapping from an empirical system to a selected numerical system. The numerical system is manipulated, and the results of the manipulation are studied to help the manager better understand the empirical system. Measured data are generally regarded as better than counted data because they are more precise and contain more information. Sometimes data occur only as counted data. However, if the information can be obtained as either attribute or variables data, it is generally preferable to collect variables data.

The information content of a number depends on the scale of measurement used, which also determines the types of statistical analysis that are appropriate. Hence, the validity of an analysis also depends on the scale of measurement. The four measurement scales are nominal, ordinal, interval, and ratio, summarized as follows:

| Scale | Definition | Example | Statistics |
|---|---|---|---|
| Nominal | Records only the presence or absence of an attribute; items can only be counted. Data consist of names or categories only, and no ordering scheme is possible. Central location: mode; dispersion: information only. | go/no-go, success/fail, accept/reject | percent, proportion, chi-square tests |
| Ordinal | Data are arranged in some order, but differences between values cannot be determined or are meaningless. One item can be said to have more or less of an attribute than another, so a set of items can be ordered. Central location: median; dispersion: percentages. | taste, attractiveness | rank-order correlation, sign or run test |
| Interval | Data are arranged in order and differences can be found, but there is no inherent starting point, so ratios are meaningless. The difference between any two successive points is equal. The scale is often treated as a ratio scale even if the assumption of equal intervals is incorrect. Values can be added, subtracted, and ordered. Central location: arithmetic mean; dispersion: standard deviation. | calendar time, temperature | correlations, t-tests, F-tests, multiple regression |
| Ratio | An extension of the interval level that includes an inherent zero starting point, so both differences and ratios are meaningful. A true zero indicates absence of the attribute. Values can be added, subtracted, multiplied, and divided. Central location: geometric mean; dispersion: percent variation. | elapsed time, distance, weight | t-tests, F-tests, correlations, multiple regression |
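The central-location and dispersion measures in the table can be sketched with Python's standard `statistics` module. The data are hypothetical. Two representation choices are assumptions: "percent variation" is taken here to mean the coefficient of variation (standard deviation divided by mean, times 100), and ordinal labels are ranked by their position in a list. `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# All data below are hypothetical, for illustration only.

# Nominal: go/no-go gauge results -> mode and proportion
gauge = ["go", "go", "no-go", "go", "no-go", "go"]
nominal_mode = statistics.mode(gauge)
proportion_go = gauge.count("go") / len(gauge)

# Ordinal: taste ratings; the order is given by position in TASTE_ORDER
TASTE_ORDER = ["mild", "hot", "very hot"]
tastes = ["hot", "mild", "very hot", "hot", "very hot"]
ranks = [TASTE_ORDER.index(t) for t in tastes]
# median_low always returns an actual rank, so it maps back to a label
ordinal_median = TASTE_ORDER[statistics.median_low(ranks)]
pct_very_hot = 100 * tastes.count("very hot") / len(tastes)

# Interval: temperatures in degrees Celsius -> arithmetic mean and standard deviation
temps_c = [20.0, 22.0, 24.0, 26.0]
interval_mean = statistics.mean(temps_c)
interval_sd = statistics.stdev(temps_c)  # sample standard deviation

# Ratio: weights in kg -> geometric mean and percent variation (assumed: coefficient of variation)
weights_kg = [2.0, 4.0, 8.0]
ratio_gmean = statistics.geometric_mean(weights_kg)
percent_variation = 100 * statistics.stdev(weights_kg) / statistics.mean(weights_kg)

print(nominal_mode, ordinal_median, interval_mean, round(ratio_gmean, 6))
```

Applying a statistic from a higher scale to lower-scale data (e.g., averaging the ordinal ranks) would still run, but the result would not be meaningful. This is the sense in which the validity of an analysis depends on the scale.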