ETL Testing


ETL is commonly associated with Data Warehousing projects, but in reality any form of bulk data movement from a source to a target can be considered ETL. Large enterprises often need to move application data from one system to another for data integration or data migration purposes. ETL testing is a data-centric testing process that validates that the data has been transformed and loaded into the target as expected.

ETL Testing Types

ETL or Data warehouse testing is categorized into four different engagement types, irrespective of the technology or ETL tools used:

  • New Data Warehouse Testing – A new DW is built and verified from scratch. Data input is taken from customer requirements and different data sources, and the new data warehouse is built and verified with the help of ETL tools.
  • Migration Testing – In this type of project the customer has an existing DW and ETL performing the job, but is looking to adopt a new tool in order to improve efficiency.
  • Change Request – In this type of project new data is added to an existing DW from different sources. The customer may also need to change an existing business rule or integrate a new rule.
  • Report Testing – Reports are the end result of any Data Warehouse and the basic purpose for which the DW is built. A report must be tested by validating its layout, the data in the report and the calculations.

ETL Testing Techniques

  • Data transformation Testing: Verify that data is transformed correctly according to various business requirements and rules.
  • Source to Target count Testing: Make sure that the count of records loaded in the target matches the expected count.
  • Source to Target Data Testing: Make sure that all projected data is loaded into the data warehouse without any data loss and truncation.
  • Data Quality Testing: Make sure that the ETL application appropriately rejects invalid data, replaces it with default values where applicable, and reports it.
  • Performance Testing: Make sure that data is loaded in data warehouse within prescribed and expected time frames to confirm improved performance and scalability.
  • Production Validation Testing: Validate the data in production system & compare it against the source data.
  • Data Integration Testing: Make sure that the data from various sources has been loaded properly to the target system and all the threshold values are checked.
  • Application Migration Testing: This testing ensures that the ETL application continues to work correctly after being moved to a new server or platform.
  • Data & constraint Check: The datatype, length, index, constraints, etc. are tested in this case.
  • Duplicate Data Check: Test if there is any duplicate data present in the target systems. Duplicate data can lead to wrong analytical reports.

Apart from the above ETL testing methods, other testing methods like system integration testing, user acceptance testing, incremental testing, regression testing, retesting and navigation testing are also carried out to make sure everything is smooth and reliable.

ETL Testing Process

Similar to any other testing that falls under Independent Verification and Validation, ETL testing goes through the same phases.

  • Requirement understanding
  • Validating the requirements
  • Test estimation based on the number of tables, the complexity of rules, data volume and the performance of a job.
  • Test planning based on the inputs from test estimation and business requirements. Here we need to identify what is in scope and what is out of scope. We also look out for dependencies, risks and mitigation plans in this phase.
  • Designing test cases and test scenarios from all the available inputs. We also need to design the mapping document and SQL scripts.
  • Once all the test cases are ready and approved, the testing team proceeds to perform pre-execution checks and test data preparation.
  • Lastly, execution is performed till the exit criteria are met. The execution phase includes running ETL jobs, monitoring job runs, SQL script execution, defect logging, defect retesting and regression testing.
  • Upon successful completion, a summary report is prepared and the closure process is completed. In this phase, sign-off is given to promote the job or code to the next phase.

Metadata Testing

The purpose of Metadata Testing is to verify that the table definitions conform to the data model and application design specifications.

Data Type Check – Verify that the table and column data type definitions are as per the data model design specifications.

Example: Data Model column data type is NUMBER but the database column data type is STRING (or VARCHAR).

Data Length Check – Verify that the length of each database column is as per the data model design specifications.

Example: Data Model specification for the ‘first_name’ column is of length 100 but the corresponding database table column is only 80 characters long.

Index/Constraint Check

Verify that proper constraints and indexes are defined on the database tables as per the design specifications.

  • Verify that the columns that cannot be null have the ‘NOT NULL’ constraint.
  • Verify that the unique key and foreign key columns are indexed as per the requirement.
  • Verify that the table was named according to the table naming convention.
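The metadata checks above can be automated. The following is a minimal sketch using SQLite as a stand-in database; the `customer_dim` table, its columns and the expected definitions are illustrative assumptions, not part of any real data model.

```python
import sqlite3

# Hypothetical target table, for illustration only; in practice the
# expected definitions come from the data model design specification.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_dim (
        customer_id INTEGER PRIMARY KEY,
        first_name  VARCHAR(100) NOT NULL,
        last_name   VARCHAR(100) NOT NULL
    )
""")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
# per column, which lets us check data types and NOT NULL constraints.
expected = {
    "customer_id": ("INTEGER", 0),       # (declared type, notnull flag)
    "first_name":  ("VARCHAR(100)", 1),
    "last_name":   ("VARCHAR(100)", 1),
}
actual = {row[1]: (row[2], row[3])
          for row in conn.execute("PRAGMA table_info(customer_dim)")}

# Any entry here is a gap between the data model and the actual table.
mismatches = {col: actual.get(col) for col, spec in expected.items()
              if actual.get(col) != spec}
print(mismatches)  # an empty dict means the table matches the model
```

The same pattern works against other databases by querying their catalog views (for example `information_schema.columns`) instead of the SQLite pragma.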

Data Completeness Testing

The purpose of Data Completeness tests is to verify that all the expected data is loaded into the target from the source. Some of the tests that can be run are: compare and validate counts, aggregates (min, max, sum, avg) and actual data between the source and target.

Record Count Validation – Compare count of records of the primary source table and target table. Check for any rejected records.

Example: A simple count of records comparison between the source and target tables.

Source Query

SELECT count(1) src_count FROM customer

Target Query

SELECT count(1) tgt_count FROM customer_dim
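The two queries above can be wired into a small automated check. This is a hedged sketch using SQLite in memory; the `customer` and `customer_dim` tables and their sample rows are made up for illustration.

```python
import sqlite3

# Illustrative source and target tables matching the queries above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE customer_dim (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cal")])
conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cal")])

src_count = conn.execute("SELECT count(1) FROM customer").fetchone()[0]
tgt_count = conn.execute("SELECT count(1) FROM customer_dim").fetchone()[0]

# A mismatch points at rejected or dropped records during the load.
assert src_count == tgt_count, f"count mismatch: {src_count} vs {tgt_count}"
print(src_count, tgt_count)  # 3 3
```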

Column Data Profile Validation – Column or attribute level data profiling is an effective tool to compare source and target data without actually comparing the entire data. It is similar to comparing the checksum of your source and target data. These tests are essential when testing large amounts of data.

Some of the common data profile comparisons that can be done between the source and target are:

  • Compare unique values in a column between the source and target
  • Compare max, min, avg, max length, min length values for columns depending on the data type
  • Compare null values in a column between the source and target
  • For important columns, compare data distribution (frequency) in a column between the source and target
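The profile comparisons above can be sketched as a single query run against both sides. The `orders_src` and `orders_tgt` tables and the `amount` column below are illustrative assumptions, with SQLite standing in for the real source and target databases.

```python
import sqlite3

# One profiling query computes distinct count, min, max and null count
# for a column; comparing the tuples avoids comparing the full data.
PROFILE_SQL = """
    SELECT COUNT(DISTINCT amount),
           MIN(amount), MAX(amount),
           SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END)
    FROM {table}
"""

conn = sqlite3.connect(":memory:")
for table in ("orders_src", "orders_tgt"):
    conn.execute(f"CREATE TABLE {table} (amount REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)",
                     [(10.0,), (25.5,), (None,), (10.0,)])

src_profile = conn.execute(PROFILE_SQL.format(table="orders_src")).fetchone()
tgt_profile = conn.execute(PROFILE_SQL.format(table="orders_tgt")).fetchone()
print(src_profile == tgt_profile)  # True: the column profiles agree
```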

Data Quality Testing

The purpose of Data Quality tests is to verify the accuracy of the data. Data profiling is used to identify data quality issues, and the ETL is designed to fix or handle these issues. However, source data keeps changing and new data quality issues may be discovered even after the ETL is in production. Automating data quality checks in the source and target systems is an important aspect of ETL execution and testing.

Duplicate Data Checks – Look for duplicate rows with same unique key column or a unique combination of columns as per business requirement.

Example: Business requirement says that a combination of First Name, Last Name, Middle Name and Date of Birth should be unique.

Sample query to identify duplicates

SELECT fst_name, lst_name, mid_name, date_of_birth, count(1) FROM Customer GROUP BY fst_name, lst_name, mid_name, date_of_birth HAVING count(1)>1

Data Validation Rules – Many database fields can contain a range of values that cannot be enumerated. However, there are reasonable constraints or rules that can be applied to detect situations where the data is clearly wrong. Instances of fields containing values violating the validation rules defined represent a quality gap that can impact ETL processing.

Example: Date of birth (DOB). This is defined as the DATE datatype and can assume any valid date. However, a DOB in the future, or more than 100 years in the past, is probably invalid. Also, a child's date of birth should be later than that of their parents.
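A validation rule like the one in the example can be expressed as a query that returns the offending rows. This is a minimal sketch with SQLite and an illustrative `customer` table storing dates as ISO text; the rows and the 100-year threshold follow the example above.

```python
import sqlite3

# Flag DOB values that are in the future or more than 100 years ago.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, date_of_birth TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    (1, "1985-06-14"),   # valid
    (2, "2130-01-01"),   # future date: invalid
    (3, "1890-03-02"),   # more than 100 years in the past: invalid
])

bad = conn.execute("""
    SELECT id FROM customer
    WHERE date_of_birth > date('now')
       OR date_of_birth < date('now', '-100 years')
""").fetchall()
print(sorted(bad))  # [(2,), (3,)]
```

Each row returned represents a quality gap to be reported before it disturbs ETL processing.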

Data Integrity Checks – This measurement addresses “keyed” relationships of entities within a domain. The goal of these checks is to identify orphan records in the child entity with a foreign key to the parent entity.

  • Count of records with null foreign key values in the child table
  • Count of invalid foreign key values in the child table that do not have a corresponding primary key in the parent table
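Both counts above can be obtained with a null check and an outer join. The sketch below uses SQLite and made-up `customer` (parent) and `orders` (child) tables to illustrate the technique.

```python
import sqlite3

# Referential integrity check: find child rows whose foreign key is
# NULL, and orphan rows with no matching primary key in the parent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (100, 1),     # valid reference
    (101, None),  # null foreign key
    (102, 9),     # orphan: no parent row with customer_id 9
])

null_fk = conn.execute(
    "SELECT count(1) FROM orders WHERE customer_id IS NULL").fetchone()[0]
orphans = conn.execute("""
    SELECT count(1) FROM orders o
    LEFT JOIN customer c ON o.customer_id = c.customer_id
    WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL
""").fetchone()[0]
print(null_fk, orphans)  # 1 1
```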

Data Transformation Testing

Data is transformed during the ETL process so that it can be consumed by applications on the target system. Transformed data is generally what the target systems rely on, and hence it is important to test transformations. There are two approaches for testing transformations – white box testing and black box testing.

Transformation testing using the White Box approach – White box testing is a testing technique that examines the program structure and derives test data from the program logic/code.

For transformation testing, this involves reviewing the transformation logic from the mapping design document and the ETL code to come up with test cases.

The steps to be followed are listed below:

  • Review the source to target mapping design document to understand the transformation design
  • Apply transformations on the data using SQL or a procedural language such as PL/SQL to reflect the ETL transformation logic
  • Compare the results of the transformed test data with the data in the target table.

The advantage of this approach is that the test can easily be rerun on larger source data sets. The disadvantage is that the tester has to reimplement the transformation logic.

Example: In a financial company, the interest earned on a savings account depends on the daily balance in the account for the month.

  1. Review the requirement and design for calculating the interest.
  2. Implement the logic using your favorite programming language.
  3. Compare your output with data in the target table.
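The three steps above can be sketched as follows. The interest rule (average daily balance times a monthly rate), the rate itself and the `interest_fact` table are all hypothetical, invented only to illustrate the white box approach; the real rule comes from the requirement and design documents.

```python
import sqlite3

# Step 2: reimplement the (assumed) transformation rule independently.
MONTHLY_RATE = 0.01  # illustrative rate, not from any real requirement

daily_balances = [1000.0, 1000.0, 1200.0, 1200.0]  # sample month
expected_interest = round(sum(daily_balances) / len(daily_balances)
                          * MONTHLY_RATE, 2)

# Step 3: compare with what the ETL loaded into the target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interest_fact (account_id INTEGER, interest REAL)")
conn.execute("INSERT INTO interest_fact VALUES (1, 11.0)")  # ETL output

loaded = conn.execute(
    "SELECT interest FROM interest_fact WHERE account_id = 1").fetchone()[0]
assert loaded == expected_interest, (loaded, expected_interest)
print(expected_interest)  # 11.0
```

Because the check is a script, it can be rerun unchanged whenever the source data grows, which is the main advantage noted above.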

Transformation testing using the Black Box approach – Black box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. For transformation testing, this involves reviewing the transformation logic from the mapping design document and setting up the test data appropriately.

The steps to be followed are listed below:

  • Review the requirements document to understand the transformation requirements
  • Prepare test data in the source systems to reflect different transformation scenarios
  • Come up with the transformed data values or the expected values for the test data from the previous step
  • Compare the results of the transformed test data in the target table with the expected values.

The advantage of this approach is that the transformation logic does not need to be reimplemented during testing. The disadvantage is that the tester needs to set up test data for each transformation scenario and come up with the expected values for the transformed data manually.
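The black box steps reduce to comparing hand-worked expected values against the loaded target rows. In the sketch below the transformation rule (upper-casing and joining names), the `customer_dim` table and its rows are purely illustrative; the expected values are worked out manually from the requirements, not computed by the test.

```python
import sqlite3

# Expected values derived by hand for each prepared test record,
# per the (assumed) rule that names are upper-cased and concatenated.
expected = {
    1: "ANN SMITH",
    2: "BOB JONES",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (id INTEGER, full_name TEXT)")
# Rows as the ETL job would have loaded them for the prepared test data.
conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                 [(1, "ANN SMITH"), (2, "BOB JONES")])

actual = dict(conn.execute("SELECT id, full_name FROM customer_dim"))
print(actual == expected)  # True: target matches the expected values
```

Note that, unlike the white box sketch, no transformation code appears here: only test inputs and their manually derived outputs.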

 
