Channel: Infosys-Oracle Blog

Data Quality Overview

 
  • In this blog we discuss Data Quality and how to overcome issues such as data redundancy, data duplication, inconsistent data and junk data.

     

  • Every organisation struggles to achieve total data quality: creating and maintaining information in top condition that is fit for its intended business purpose. Data quality rules are required to filter exceptional data and maximize the trust factor of the Common Data Model (CDM). Data quality mainly serves to govern and judge the information.

     

  • The diagram below gives an idea of the data quality flow.

     

 

The architecture diagram below describes a Data Quality analysis as part of Enterprise Data Integration (EDI).

EDI consists of architectural components such as data sources, an Operational Data Store (ODS), a data profiler, Data Quality Analysis (DQA), ETL and a target database. The target database should be compliant with the industry-standard SID framework. A SID-based CDM is a catalyst in the integration architecture since it enables CSPs to integrate disparate data sources and data from mergers and acquisitions with minimal rework.

 

Data Profiling

 

Data profiling reveals the state of the data; based on that state, we create rules and apply them to the data.

The table below shows some discrepancies in the data, highlighted in blue, red and green, which are classic examples of data anomalies.

Data profiling also helps to discover data problems such as inconsistency, missing data, duplicate data, data that does not meet business rules and orphaned data.

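| Name | Age | DOB | Gender | Height | Anomalies |
|---|---|---|---|---|---|
| Bob | 35 | 13-1-81 | M | 6 | Nil |
| Rob | 34 | 27-8-82 | | 5.5 | Lexical Error |
| Madan | 34 | 15-1-82 | M | 5-9-2 | Domain Format Error |
| Diana | 33 | | | | Duplicates |
| Jim | 0 | 12-7-88 | M | 5.1 | Integrity Constraint Violation |
| ~@$ | ^^ | @ | # | | Missing Tuple |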

 

A data profiler helps data analysts find metadata information such as data type, decimal places, data length, cardinality, primary key, unique count, unique percent, pattern count, minimum value, maximum value, minimum length, maximum length, null counts, blank counts, etc. Such analysis helps determine whether the data adheres to the metadata definitions and the business expectations.
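A few of the metrics listed above can be computed with a short script. The sketch below is only an illustration and not how any particular profiling tool works: the function name `profile_column`, the sample values, and the choice to count distinct non-empty values as the "unique count" are all assumptions.

```python
from collections import Counter

def profile_column(values):
    """Return basic profiling metrics for one column of raw string values.

    Assumption: None represents a null, and a whitespace-only string a blank.
    """
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    blanks = sum(1 for v in values if v is not None and v.strip() == "")
    present = [v for v in values if v is not None and v.strip() != ""]
    distinct = Counter(present)
    lengths = [len(v) for v in present]
    return {
        "row_count": total,
        "null_count": nulls,
        "blank_count": blanks,
        # Assumption: "unique count" means the number of distinct non-empty values.
        "unique_count": len(distinct),
        "unique_percent": round(100.0 * len(distinct) / total, 1) if total else 0.0,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }

# Hypothetical gender column containing one null and one blank entry.
sample = ["M", "F", None, "", "M", "m"]
sample_profile = profile_column(sample)
print(sample_profile)
```

Comparing such a profile with the expected metadata (for example, a gender column that should have at most two distinct values and no nulls) shows where cleansing rules are needed.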

The benefits of data profiling include:

  • Checking that data meets business expectations

  • Determining the level of quality of each attribute

  • Deciding the type of cleansing, standardization or de-duplication required to meet business expectations

Pattern matching is an advanced analysis on any given attribute whose number of distinct values is very low. The table below shows the result of a typical pattern matching analysis on the phone number attribute. It helps a subject matter expert frame Data Quality rules on phone numbers.

| Pattern | No. of Rows | Percentage |
|---|---|---|
| 222222 | 1456 | 80 |
| 21365-Aaa-1542 | 3498 | 9.5 |
| [2345]1234-999 | 1000 | 12.1 |
| 7777-123-7777 | 1500 | 9.8 |
| 6666-aaa | 45 | 0.1 |
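One way to produce this kind of pattern summary is to reduce each value to a character-class pattern and count how often each pattern occurs. The sketch below assumes a simple encoding (digits become `9`, uppercase letters `A`, lowercase letters `a`, everything else is kept); the function names and the sample phone numbers are hypothetical.

```python
from collections import Counter

def to_pattern(value):
    """Replace each character with its class: 9 for digits, A/a for letters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep separators such as '-' or brackets
    return "".join(out)

def pattern_frequencies(values):
    """Return (pattern, row count, percentage) tuples, most frequent first."""
    total = len(values)
    if total == 0:
        return []
    counts = Counter(to_pattern(v) for v in values)
    return [(p, n, round(100.0 * n / total, 1)) for p, n in counts.most_common()]

# Hypothetical phone numbers, not taken from the table above.
phones = ["555-0100", "555-0101", "5550102", "555-ABC", "555-0103"]
phone_patterns = pattern_frequencies(phones)
for pattern, rows, percent in phone_patterns:
    print(pattern, rows, percent)
```

Rare patterns, such as the one containing letters, are the natural candidates for a new Data Quality rule.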

 

Here are a few examples of Data Quality rules.

  • Text should be stripped from phone numbers so that they contain only digits.

  • Gender should be either M or F, and any other entries should be defaulted to blank.

  • Records with a null name should not be loaded into the database.
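The three example rules could be expressed in code roughly as follows. This is a minimal sketch under stated assumptions: records are plain dictionaries, and the field names `name`, `phone` and `gender` as well as the function `apply_rules` are hypothetical.

```python
import re

def apply_rules(record):
    """Apply the three example rules; return a cleaned copy, or None to reject."""
    # Rule 3: records with a null name are not loaded.
    if record.get("name") is None:
        return None
    cleaned = dict(record)
    # Rule 1: strip every non-digit character from the phone number.
    cleaned["phone"] = re.sub(r"\D", "", record.get("phone") or "")
    # Rule 2: gender must be exactly M or F; anything else becomes blank.
    gender = record.get("gender")
    cleaned["gender"] = gender if gender in ("M", "F") else ""
    return cleaned

# Hypothetical input records.
incoming = [
    {"name": "Bob", "phone": "555-ABC-0100", "gender": "M"},
    {"name": "Rob", "phone": "5550101", "gender": "5.5"},
    {"name": None, "phone": "5550102", "gender": "F"},
]
# Keep only the records that pass rule 3.
loaded = [r for r in (apply_rules(rec) for rec in incoming) if r is not None]
print(loaded)
```

Returning `None` for rejected records keeps the decision to drop or log them with the caller, for example to route them to an exception store.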

 

Poor-quality, incomplete and inaccurate data can be overcome through data auditing.

A thorough data audit detects key quality metrics, missing data, incorrect values, duplicate records, and inconsistencies. When used in combination with Oracle Enterprise Data Quality Parsing and Standardization, it can deliver a unique understanding of your data.

Results of these profiling and audit processes are presented in easy-to-understand executive dashboards. Data quality dashboards allow problems to be identified quickly, before they start to cause significant business impact. Graphical views show data quality trends over time, helping your organization protect its investment in data quality by giving visibility to the right people.

 

In this part we discuss how to cleanse the data. The data cleansing task removes duplicates, anomalies and inaccurate data from the source.

Data that has passed quality testing can be passed on for ETL processing.

Data cleansing is the method of locating and repairing data anomalies by comparing the domain of values within the database. Data is scrubbed or cleansed as per the corrective measures suggested for exceptions such as misspellings, missing information, invalid data, varying value representations, etc. Anomalies are fixed automatically by ETL processing.
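As a small illustration of cleansing, the sketch below standardizes varying representations of a name and then removes duplicates. It assumes that records are dictionaries with `name` and `dob` fields and that two records with the same normalized name and date of birth are duplicates; both the field names and the matching rule are hypothetical simplifications.

```python
def normalize_name(name):
    """Collapse repeated whitespace and use consistent capitalization."""
    return " ".join(name.split()).title()

def cleanse(records):
    """Standardize names, then drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for rec in records:
        fixed = dict(rec, name=normalize_name(rec["name"]))
        key = (fixed["name"], fixed["dob"])  # assumed duplicate-matching key
        if key in seen:
            continue  # same person already kept
        seen.add(key)
        result.append(fixed)
    return result

# Hypothetical source rows; the first two differ only in representation.
rows = [
    {"name": "diana  prince", "dob": "1983-03-22"},
    {"name": "Diana Prince", "dob": "1983-03-22"},
    {"name": "Bob Stone", "dob": "1981-01-13"},
]
cleaned_rows = cleanse(rows)
print(cleaned_rows)
```

Standardizing before de-duplicating matters here: without the normalization step the first two rows would not be recognized as the same record.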

The DQ and ETL framework must understand, interpret and re-organize data automatically within any context, from any source, to the standard target form. 

 

When the Data Quality engine finds that a data item "stands out" (holds a statistically significant variance from the population mean), the engine flags it as an exception and stores it in the exception schema. Such exceptions are thoroughly analyzed by the data governance team or subject matter experts. Depending on the category, exceptions are communicated to:

1.  Data stewards to fix data anomalies at the source database.

2.  Quality experts and business users to frame new quality rules/corrective measures.

3.  Exceptions captured during data auditing and ETL processing are analyzed to generate scorecards and dashboard reports.

4.  The data quality framework automatically measures, logs, collects, communicates and presents the results to those entrusted with data stewardship and the source data owners.
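The idea of flagging items that "stand out" from the population can be sketched with a z-score test. This is only one possible interpretation of a statistically significant variance: the function name `flag_exceptions`, the threshold of 2.5 standard deviations and the sample ages are assumptions, not details of any specific Data Quality engine.

```python
from statistics import mean, pstdev

def flag_exceptions(values, threshold=2.5):
    """Return (index, value) pairs lying more than `threshold` standard
    deviations away from the population mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical age column; the last entry mimics an age of 0.
ages = [35, 34, 34, 33, 36, 35, 34, 33, 35, 0]
age_exceptions = flag_exceptions(ages)
print(age_exceptions)  # [(9, 0)]
```

In a real pipeline the flagged items would be written to the exception schema along with the rule that caught them, so that they can be reviewed and reported on.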

 

Quality dimensions: Accuracy, Uniqueness, Integrity, Consistency, Density, Completeness, Validity, Schema Conformance, Uniformity

Anomalies: Vocabulary Errors, Format Error, Irregularities, Missing Value

 

Indicates direct downgrading of the quality dimension

 

Indicates that the occurrence of this anomaly hampers the detection of other anomalies, downgrading the quality dimension

 

