This appendix lists information that can be included in a data profiling repository. The list shows what needs to be collected and maintained in order to record everything you need to know about a data source. This information is invaluable when modifying applications or putting the data to new uses. An organization may find additional information to add to this list to make it more comprehensive. Some of this information could be transferred to a formal metadata repository for permanent storage. Be aware, however, that formal repositories do not have constructs for all of the information listed in this appendix.
B.1 Schema Definition
• Identification of the collection of data that is profiled together
• Business description
• Data steward
• Business analysts with knowledge of objects
B.2 Business Objects
• Name
• Business description
• Tables used to store object data
• Data model of business object
B.3 Domains
• Name
• Description
• Data type
• Length boundaries
• Numeric precision
• Value properties
  • Discrete value list
    • Encoded value meanings
  • Range of values permitted
    • Skip-over rule
  • Character patterns required
  • Character set
  • Character exclusions
  • Text field restrictions
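As an illustration of how a domain entry might be held in a repository, the following is a minimal sketch in Python. The class name `Domain`, its field names, and the `violations` method are assumptions made for this example; the appendix lists only the kinds of information, not a storage format. Only three of the listed properties (length boundaries, discrete value list, character patterns) are checked here.

```python
from dataclasses import dataclass
from typing import Optional
import re


@dataclass
class Domain:
    """One domain entry; field names are illustrative, not a fixed schema."""
    name: str
    description: str
    data_type: str
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    discrete_values: Optional[dict] = None  # encoded value -> meaning
    pattern: Optional[str] = None           # regex the whole value must match

    def violations(self, value: str) -> list:
        """Return the names of the domain properties that `value` breaks."""
        broken = []
        if self.min_length is not None and len(value) < self.min_length:
            broken.append("length boundaries")
        elif self.max_length is not None and len(value) > self.max_length:
            broken.append("length boundaries")
        if self.discrete_values is not None and value not in self.discrete_values:
            broken.append("discrete value list")
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            broken.append("character patterns required")
        return broken


# Hypothetical domain used only to exercise the sketch.
state = Domain(
    name="US_STATE_CODE",
    description="Two-letter state code",
    data_type="CHAR",
    min_length=2,
    max_length=2,
    discrete_values={"TX": "Texas", "CA": "California"},
    pattern=r"[A-Z]{2}",
)
```

Calling `state.violations("TX")` returns an empty list, while `state.violations("tx")` reports the discrete value list and the character pattern as broken.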
B.4 Data Source
• Type: IMS/VSAM/ORACLE/...
• Physical location
• Application name
• Application description
• Database administrator name
• Key dates
  • First deployment date
  • Major change dates
• Extraction information
  • Data conversions needed
  • Overloaded field definitions
  • Tables that result from extraction
APPENDIX B Content of a Data Profiling Repository
  • Extraction executions
    • Date of extraction
    • Type: full or sample
B.5 Table Definitions
• Name
• Descriptive name
• Business meaning
• Columns
  • Name
  • Longer descriptive name
  • Business definition
  • Confidence indicators
    • Trusted
    • Susceptibility to decay
    • Enforcement processes
  • Domain names if inherited
  • Data type
    • Data type discovered
  • Length boundaries
    • Maximum length discovered
    • Minimum length discovered
    • Length distributions discovered
  • Numeric precision
    • Maximum precision discovered
  • Value properties
    • Discrete value list
      • Values discovered
      • Inaccurate values
      • Values not used
      • Value frequency pair list discovered
      • Encoded value meanings
    • Range of values permitted
      • Range of values discovered
      • Skip-over rule
        • Skip-over rule violations
    • Character patterns required
      • Patterns discovered
    • Character set
    • Character exclusions
    • Text field restrictions
      • Keywords discovered
      • Text constructs discovered (embedded blanks, special characters)
      • Upper/lowercase conventions discovered
      • Leading/trailing blanks discovered
  • Property rules
    • Unique rule
      • Uniqueness percentage discovered
    • Consecutive rule
      • Consecutive rule discovered
    • Null rule
      • Null indications
        • Null indications discovered
      • Blank or zero rule
  • Inconsistency points
    • Date of change
    • Description of change
• Functional dependencies
  • LHS columns
  • RHS columns
  • Type
    • Primary key
      • Token
      • Natural
    • Denormalized key
    • Derived column
      • Rule or formula
  • Discovered percentage true
  • Violation values
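Several of the "discovered" column entries above can be computed directly from the extracted values. The following Python sketch shows one way this might look. The function name `profile_column`, the result keys, and the definition of uniqueness percentage as distinct values divided by non-null values are all assumptions for illustration; the appendix does not prescribe how these figures are calculated.

```python
from collections import Counter


def profile_column(values, null_indications=("", None)):
    """Compute a few 'discovered' column facts from a list of raw values."""
    # Values matching a null indication are counted separately and excluded.
    present = [v for v in values if v not in null_indications]
    lengths = [len(str(v)) for v in present]
    frequencies = Counter(present)
    return {
        "null_count": len(values) - len(present),
        "minimum_length_discovered": min(lengths) if lengths else None,
        "maximum_length_discovered": max(lengths) if lengths else None,
        "length_distribution_discovered": dict(Counter(lengths)),
        "value_frequency_pairs": frequencies.most_common(),
        # Assumed definition: distinct values as a percentage of non-null values.
        "uniqueness_percentage_discovered": (
            100.0 * len(frequencies) / len(present) if present else None
        ),
    }


# Hypothetical sample column with two null indications.
sample = ["A1", "B22", "A1", "", None, "C333"]
profile = profile_column(sample)
```

For this sample the sketch finds two null indications, lengths between 2 and 4, and a uniqueness percentage of 75.0, since three distinct values occur among four non-null values.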
B.6 Synonyms
• Primary table and columns
• Secondary table and columns
• Type
  • Primary key/foreign key
  • Redundant
  • Domain
  • Merge
• Value correspondence
  • Same
  • Transform
• Inclusivity
  • Inclusive
  • Bidirectional inclusive
  • Exclusive
  • Mixed
• Degree of overlap
  • One-to-one
  • One-to-many
  • Many-to-many
• Value lists
• Violation data
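The inclusivity entry of a synonym pair can be suggested by comparing the distinct values found in the two columns. The sketch below is one possible reading, written in Python. The function name `classify_inclusivity` and the set-based interpretation of the four terms are assumptions; the list above names the categories without defining them, and an analyst would still have to confirm that an observed relationship is an intended rule.

```python
def classify_inclusivity(primary_values, secondary_values):
    """Suggest an inclusivity category from the values observed in two columns."""
    primary, secondary = set(primary_values), set(secondary_values)
    if not primary & secondary:
        # No value appears in both columns.
        return "exclusive"
    if primary == secondary:
        # Every value appears in both columns.
        return "bidirectional inclusive"
    if secondary <= primary or primary <= secondary:
        # All values of one column appear in the other, but not the reverse.
        return "inclusive"
    # Some values are shared and each column also has values of its own.
    return "mixed"
```

For example, `classify_inclusivity([1, 2, 3], [2, 3])` returns `"inclusive"`, and `classify_inclusivity([1, 2], [2, 3])` returns `"mixed"`.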
B.7 Data Rules
• Name of rule
• Description of business meaning
• Table names and column names used
• Execution logic
  • Rule logic expression or program or procedure name
• Execution results
  • Date executed
  • Number of rows
  • Row ID list of violations with data
• Remedy implementation
  • Date/time implemented
  • Type of implementation
    • Data entry
    • Transaction program
    • Database-stored procedure
    • Periodic checker execution
    • Business process procedure
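To make the relationship between a data rule and its execution results concrete, here is a minimal Python sketch. The names `DataRule`, `ExecutionResult`, and `execute` are hypothetical, the rule logic is held as a Python callable rather than a stored expression or procedure name, and "number of rows" is assumed to mean the number of rows examined.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class DataRule:
    name: str
    description: str
    columns_used: list
    logic: Callable[[dict], bool]  # returns True when a row satisfies the rule


@dataclass
class ExecutionResult:
    date_executed: datetime
    number_of_rows: int  # assumed here to be the number of rows examined
    violations: list = field(default_factory=list)  # (row ID, row data) pairs


def execute(rule, rows):
    """Run `rule` over `rows` (a dict of row ID -> row dict) and record the outcome."""
    result = ExecutionResult(date_executed=datetime.now(), number_of_rows=len(rows))
    for row_id, row in rows.items():
        if not rule.logic(row):
            result.violations.append((row_id, row))
    return result


# Hypothetical rule and data; ISO date strings compare correctly as text.
rule = DataRule(
    name="SHIP_AFTER_ORDER",
    description="An order cannot ship before it is placed",
    columns_used=["ORDERS.order_date", "ORDERS.ship_date"],
    logic=lambda row: row["ship_date"] >= row["order_date"],
)
rows = {
    1: {"order_date": "2003-01-05", "ship_date": "2003-01-07"},
    2: {"order_date": "2003-02-10", "ship_date": "2003-02-01"},
}
result = execute(rule, rows)
```

Here `result.violations` holds the single offending row (row ID 2) together with its data, which corresponds to the "Row ID list of violations with data" entry.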
B.8 Value Rules
• Name of rule
• Description of business meaning
• Table names and column names used
• Execution logic
  • Rule logic expression or program or procedure name
  • Result expectations
• Execution results
  • Date executed
  • Number of rows
  • Row ID list of violations with data
• Remedy implementation
  • Date/time implemented
  • Type of implementation
    • Data entry
    • Transaction program
    • Database-stored procedure
    • Periodic checker execution
    • Business process procedure
B.9 Issues
• Date/time created
• Description of problem
• Supporting evidence