APPENDIX B

Content of a Data Profiling Repository

This appendix lists information that can be included in a data profiling repository. It shows what needs to be collected and maintained in order to record everything you need to know about a data source. This information is invaluable when modifying applications or planning new uses of the data. A specific organization may find additional information to add to this list to make it even more comprehensive. Some of this information could be transferred to a formal metadata repository for permanent storage. However, be aware that formal repositories do not have constructs for all of the information listed in this appendix.

B.1 Schema Definition

• Identification of the collection of data that is profiled together
• Business description
• Data steward
• Business analysts with knowledge of objects

B.2 Business Objects

• Name
• Business description
• Tables used to store object data
• Data model of business object

B.3 Domains

• Name
• Description
• Data type
• Length boundaries
• Numeric precision
• Value properties
  • Discrete value list
    • Encoded value meanings
  • Range of values permitted
  • Skip-over rule
  • Character patterns required
  • Character set
  • Character exclusions
  • Text field restrictions
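As a rough illustration, a domain entry with these fields could be held in a small record and used to test candidate values. The sketch below is in Python. The class name `Domain`, its field names, the `violations` method, and the sample `ORDER_STATUS` domain are all hypothetical, and only a few of the properties listed above are covered.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    """Hypothetical repository record for one domain (subset of the B.3 fields)."""
    name: str
    description: str
    data_type: str
    min_length: int = 0                      # length boundaries
    max_length: Optional[int] = None
    discrete_values: Optional[dict] = None   # encoded value -> meaning
    pattern: Optional[str] = None            # required character pattern (regex)

    def violations(self, value: str) -> list:
        """Return the names of the domain properties that `value` breaks."""
        broken = []
        if len(value) < self.min_length:
            broken.append("length boundaries")
        elif self.max_length is not None and len(value) > self.max_length:
            broken.append("length boundaries")
        if self.discrete_values is not None and value not in self.discrete_values:
            broken.append("discrete value list")
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            broken.append("character patterns required")
        return broken


# Illustrative domain (invented for this example): a two-letter status code.
status = Domain(
    name="ORDER_STATUS",
    description="Processing state of an order",
    data_type="CHARACTER",
    min_length=2,
    max_length=2,
    discrete_values={"OP": "open", "SH": "shipped", "CL": "closed"},
    pattern=r"[A-Z]{2}",
)

print(status.violations("OP"))   # no properties broken
print(status.violations("op1"))  # breaks length, value list, and pattern
```

Keeping the encoded value meanings in the same mapping as the discrete value list is one possible design; a repository could equally store them as separate entries.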

B.4 Data Source

• Type: IMS/VSAM/ORACLE/...
• Physical location
• Application name
• Application description
• Database administrator name
• Key dates
  • First deployment date
  • Major change dates
• Extraction information
  • Data conversions needed
  • Overloaded field definitions
  • Tables that result from extraction
• Extraction executions
  • Date of extraction
  • Type: full or sample

B.5 Table Definitions

• Name
• Descriptive name
• Business meaning
• Columns
  • Name
  • Longer descriptive name
  • Business definition
  • Confidence indicators
    • Trusted
    • Susceptibility to decay
    • Enforcement processes
  • Domain names if inherited
  • Data type
    • Data type discovered
  • Length boundaries
    • Maximum length discovered
    • Minimum length discovered
    • Length distributions discovered
  • Numeric precision
    • Maximum precision discovered
  • Value properties
    • Discrete value list
      • Values discovered
      • Inaccurate values
      • Values not used
      • Value frequency pair list discovered
      • Encoded value meanings

    • Range of values permitted
      • Range of values discovered
    • Skip-over rule
      • Skip-over rule violations
    • Character patterns required
      • Patterns discovered
    • Character set
    • Character exclusions
    • Text field restrictions
      • Keywords discovered
      • Text constructs discovered (embedded blanks, special characters)
      • Upper/lowercase conventions discovered
      • Leading/trailing blanks discovered

  • Property rules
    • Unique rule
      • Uniqueness percentage discovered
    • Consecutive rule
      • Consecutive rule discovered
    • Null rule
      • Null indications
      • Null indications discovered
      • Blank or zero rule

  • Inconsistency points
    • Date of change
    • Description of change
• Functional dependencies
  • LHS columns
  • RHS columns
  • Type
    • Primary key
      • Token
      • Natural
    • Denormalized key
    • Derived column
      • Rule or formula
  • Discovered percentage true
  • Violation values
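To make the "discovered" entries more concrete, the following Python sketch computes a handful of them for one column and estimates the discovered percentage true for one functional dependency. It is only a sketch under stated assumptions: the function names, the use of `None` for nulls, the definition of uniqueness percentage as distinct values over non-null values, and the majority-vote treatment of violations are choices made for this example, not definitions taken from this appendix.

```python
from collections import Counter, defaultdict


def profile_column(values):
    """Compute a few discovered properties for one column.

    `values` is a list of strings; None stands for a null (an assumption).
    """
    present = [v for v in values if v is not None]
    lengths = [len(v) for v in present]
    frequencies = Counter(present)
    return {
        "minimum_length_discovered": min(lengths) if lengths else None,
        "maximum_length_discovered": max(lengths) if lengths else None,
        "value_frequency_pairs": sorted(frequencies.items()),
        # Assumed definition: distinct values as a share of non-null values.
        "uniqueness_percentage": 100.0 * len(frequencies) / len(present) if present else None,
        "null_count": len(values) - len(present),
    }


def dependency_percentage_true(rows, lhs, rhs):
    """Percentage of rows consistent with the dependency lhs -> rhs.

    For each LHS value combination, the most common RHS combination is
    assumed to be correct; the remaining rows are returned as violations.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in lhs)].append(row)
    violations = []
    for group in groups.values():
        counts = Counter(tuple(r[c] for c in rhs) for r in group)
        majority = counts.most_common(1)[0][0]
        violations.extend(r for r in group if tuple(r[c] for c in rhs) != majority)
    percent = 100.0 * (len(rows) - len(violations)) / len(rows) if rows else 100.0
    return percent, violations


# Invented sample data: one misspelled city breaks zip -> city.
rows = [
    {"zip": "78701", "city": "Austin"},
    {"zip": "78701", "city": "Austin"},
    {"zip": "78701", "city": "Autsin"},
    {"zip": "10001", "city": "New York"},
]
column_profile = profile_column([r["city"] for r in rows])
percent_true, violation_rows = dependency_percentage_true(rows, ["zip"], ["city"])
print(column_profile)
print(percent_true, violation_rows)
```

The dictionary keys mirror the repository entries they would populate, so the output could be stored alongside the documented (expected) properties for comparison.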

B.6 Synonyms

• Primary table and columns
• Secondary table and columns
• Type
  • Primary key/foreign key
  • Redundant
  • Domain
  • Merge
• Value correspondence
  • Same
  • Transform
• Inclusivity
  • Inclusive
  • Bidirectional inclusive
  • Exclusive
  • Mixed
• Degree of overlap
  • One-to-one
  • One-to-many
  • Many-to-many
• Value lists
• Violation data
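One way to derive the inclusivity entry and its violation data from the value lists of a synonym pair is sketched below in Python. The function names and the sample data are hypothetical, and the working definitions are assumptions made for this example: "inclusive" means every secondary value also appears in the primary column, "bidirectional inclusive" means the two value sets are identical, "exclusive" means they share no values, and anything else is "mixed".

```python
def classify_inclusivity(primary_values, secondary_values):
    """Classify the overlap actually found between two synonym columns."""
    primary, secondary = set(primary_values), set(secondary_values)
    if primary == secondary:
        return "bidirectional inclusive"
    if secondary <= primary:
        return "inclusive"
    if primary.isdisjoint(secondary):
        return "exclusive"
    return "mixed"


def inclusive_violations(primary_values, secondary_values):
    """Secondary values with no match in the primary column (orphans)."""
    return sorted(set(secondary_values) - set(primary_values))


# Invented sample: customer keys and the customer keys used on orders.
customer_ids = ["C1", "C2", "C3"]
order_customer_ids = ["C1", "C1", "C3", "C9"]

print(classify_inclusivity(customer_ids, order_customer_ids))
print(inclusive_violations(customer_ids, order_customer_ids))
```

If the repository records the pair as inclusive but the data classifies as mixed, the orphan values returned by `inclusive_violations` are candidates for the violation data entry.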

B.7 Data Rules

• Name of rule
• Description of business meaning
• Table names and column names used
• Execution logic
  • Rule logic expression or program or procedure name
• Execution results
  • Date executed
  • Number of rows
  • Row ID list of violations with data
• Remedy implementation
  • Date/time implemented
  • Type of implementation
    • Data entry
    • Transaction program
    • Database-stored procedure
    • Periodic checker execution
    • Business process procedure
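The data rule entries above could be captured in a record that keeps the rule logic together with the results of each execution. The Python sketch below is one possible shape, not a prescribed design. The class name `DataRule`, its fields, the sample rule, and the sample rows are hypothetical, and "number of rows" is assumed here to mean the number of rows tested.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class DataRule:
    """Hypothetical repository record for one data rule."""
    name: str
    business_meaning: str
    columns_used: List[str]            # "table.column" names
    logic: Callable[[dict], bool]      # returns True when a row satisfies the rule
    results: list = field(default_factory=list)

    def execute(self, rows: List[Tuple[int, dict]]) -> dict:
        """Test every (row_id, row) pair and record one execution result."""
        violations = [(row_id, row) for row_id, row in rows if not self.logic(row)]
        result = {
            "date_executed": datetime.now(),
            "number_of_rows": len(rows),   # assumed: rows tested
            "violations": violations,      # row ID list of violations with data
        }
        self.results.append(result)
        return result


# Invented rule: an order cannot ship before it is placed.
# ISO-formatted date strings compare correctly as plain text.
rule = DataRule(
    name="SHIP_AFTER_ORDER",
    business_meaning="Ship date must not precede order date",
    columns_used=["ORDERS.ORDER_DATE", "ORDERS.SHIP_DATE"],
    logic=lambda row: row["ship_date"] >= row["order_date"],
)
sample_rows = [
    (1, {"order_date": "2003-01-02", "ship_date": "2003-01-05"}),
    (2, {"order_date": "2003-01-10", "ship_date": "2003-01-08"}),
]
result = rule.execute(sample_rows)
print(result["number_of_rows"], [row_id for row_id, _ in result["violations"]])
```

Appending each result to `results` keeps a history of executions, which makes it possible to compare violation counts before and after a remedy is implemented.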

B.8 Value Rules

• Name of rule
• Description of business meaning
• Table names and column names used
• Execution logic
  • Rule logic expression or program or procedure name
  • Result expectations
• Execution results
  • Date executed
  • Number of rows
  • Row ID list of violations with data
• Remedy implementation
  • Date/time implemented
  • Type of implementation
    • Data entry
    • Transaction program
    • Database-stored procedure
    • Periodic checker execution
    • Business process procedure
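The "result expectations" entry suggests that a value rule's output is compared against what is considered reasonable. The Python sketch below assumes, for illustration only, that a value rule computes an aggregate over the data (here, each value's share of the rows) and checks it against expected bounds. The function name, the bounds, and the sample data are hypothetical.

```python
from collections import Counter


def run_value_rule(rows, column, expected_share):
    """Compute each value's share of rows and compare it with expectations.

    `expected_share` maps a value to (low, high) bounds on its share of
    all rows, expressed as fractions between 0 and 1.
    """
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    findings = {}
    for value, (low, high) in expected_share.items():
        share = counts.get(value, 0) / total if total else 0.0
        findings[value] = {"share": share, "as_expected": low <= share <= high}
    return findings


# Invented sample: 7 shipped, 2 open, and 1 closed order.
orders = [{"status": "SH"}] * 7 + [{"status": "OP"}] * 2 + [{"status": "CL"}]
findings = run_value_rule(
    orders,
    "status",
    expected_share={"SH": (0.5, 0.9), "OP": (0.0, 0.1)},
)
print(findings)
```

A result outside its bounds does not prove the data is wrong; it flags an outcome that someone with business knowledge should review.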

B.9 Issues

• Date/time created
• Description of problem
• Supporting evidence
  • Column properties violations
  • Structure analysis violations
  • Data rule violations
  • Value rule violations
• Remedies recommended
• Remedies accepted
• Remedies implemented
• Evidence supporting improvements
  • Reduction in violations