Data Staging Area

The Data Staging Area:

It is often the most complex part of the architecture and involves...

• Extraction (E)
• Transformation (T)
• Load (L)
• Indexing

Either ETL tools are used, or scripts for extraction, transformation and load are implemented by hand.
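For the hand-written route, the four steps listed above can be sketched as one small script. This is a minimal illustration, not a reference implementation: it assumes that both the source system and the warehouse are SQLite databases, and the table names (`orders`, `fact_orders`), the column names and the index name are hypothetical. The index is built only after the load has finished.

```python
import sqlite3

# Minimal hand-written ETL script. Assumption: source and warehouse are both
# SQLite databases; all table, column and index names are hypothetical.

def extract(source):
    # E: read only the columns the warehouse needs from the source system.
    return source.execute("SELECT id, name, amount FROM orders ORDER BY id").fetchall()

def transform(rows):
    # T: convert to a common format (trimmed upper-case names, numeric amounts).
    return [(oid, name.strip().upper(), float(amount)) for oid, name, amount in rows]

def load(warehouse, rows):
    # L: write the transformed rows into a warehouse fact table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)

def build_indexes(warehouse):
    # Indexing: build the index once the load has finished.
    warehouse.execute(
        "CREATE INDEX IF NOT EXISTS idx_fact_orders_customer "
        "ON fact_orders (customer)"
    )
    warehouse.commit()

def run_etl(source, warehouse):
    rows = transform(extract(source))
    load(warehouse, rows)
    build_indexes(warehouse)
    return len(rows)

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER, name TEXT, amount TEXT, note TEXT)")
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, " alice ", "10.50", "internal"), (2, "Bob", "3", "internal")],
    )
    wh = sqlite3.connect(":memory:")
    print(run_etl(src, wh))  # 2
    print(wh.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

An ETL tool would typically replace these functions with configured components, but the order of the steps stays the same.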




Extraction:

Extraction means reading and understanding the source data, then copying the data needed for the data warehouse into the staging area for further manipulation, i.e. transformation.
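Extraction on its own can be sketched as follows. The example assumes that the source system delivers a CSV export and that the staging area is a SQLite table. The column names, the `NEEDED` list and the `stg_customer` table are hypothetical. Only the needed columns are copied, and the values are stored unchanged as text, because cleaning them is left to the transformation step.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; the warehouse needs only some columns.
SOURCE_CSV = """customer_id,name,postal_code,phone,last_login
17,Alice Smith,1010,555-0101,2023-01-05
42,bob jones,01010,555-0199,2023-02-11
"""

# Hypothetical list of the columns the data warehouse actually needs.
NEEDED = ["customer_id", "name", "postal_code"]

def extract_to_staging(source_text, staging):
    """Copy the needed columns, unmodified, into a staging table."""
    staging.execute(
        "CREATE TABLE IF NOT EXISTS stg_customer "
        "(customer_id TEXT, name TEXT, postal_code TEXT)"
    )
    reader = csv.DictReader(io.StringIO(source_text))
    rows = [tuple(record[col] for col in NEEDED) for record in reader]
    staging.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", rows)
    staging.commit()
    return len(rows)

if __name__ == "__main__":
    stg = sqlite3.connect(":memory:")
    print(extract_to_staging(SOURCE_CSV, stg))  # 2
```

Here the inconsistent casing of "bob jones" and the differing postal-code formats are deliberately left untouched in the staging table.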

Transformation involves…

• data conversion/transformation
(specifying transformation rules that convert the data to a common data format
and to common terms/semantics)

• data cleaning/cleansing

– data scrubbing (using domain-specific knowledge, e.g. about postal
addresses, to check the data)
– data auditing (discovering suspicious patterns and violations of
stated rules)

• combining data from multiple sources
• assigning warehouse (surrogate) keys
• data aggregation
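The transformation steps listed above can be illustrated with a small in-memory sketch. Everything concrete in it is an assumption: a CRM source with "C-" prefixed customer ids, a billing source with plain integer ids, the `STATUS_TERMS` mapping, the `VALID_POSTAL_CODES` set standing in for domain knowledge, and the rule "amount must not be negative". Each function corresponds to one bullet: conversion, scrubbing, auditing, combining sources, assigning surrogate keys and aggregation.

```python
from collections import defaultdict

# Hypothetical conversion rules: map source-specific terms to common terms.
STATUS_TERMS = {"A": "ACTIVE", "active": "ACTIVE", "I": "INACTIVE", "inactive": "INACTIVE"}

# Hypothetical domain knowledge used for scrubbing: postal codes known to exist.
VALID_POSTAL_CODES = {"1010", "4020", "8010"}

def convert(crm_row):
    """Data conversion: common key format, common casing, common status terms."""
    cid, name, postal, status = crm_row
    return {
        "customer_id": int(cid.removeprefix("C-")),
        "name": " ".join(name.split()).title(),
        "postal_code": postal.strip(),
        "status": STATUS_TERMS.get(status, "UNKNOWN"),
    }

def scrub(customer):
    """Data scrubbing: flag postal codes that domain knowledge does not confirm."""
    customer["postal_ok"] = customer["postal_code"] in VALID_POSTAL_CODES
    return customer

def audit(payments):
    """Data auditing: report rows violating the stated rule 'amount >= 0'."""
    return [p for p in payments if p[1] < 0]

def combine(customers, payments):
    """Combine CRM and billing data on the shared customer_id.
    Payments without a matching customer are simply dropped in this sketch."""
    by_id = {c["customer_id"]: c for c in customers}
    return [(by_id[cid], amount) for cid, amount in payments if cid in by_id]

class SurrogateKeys:
    """Assign warehouse keys in order of first appearance. The natural-to-
    surrogate mapping is only kept in memory here (an assumption)."""
    def __init__(self):
        self._keys = {}

    def key_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = len(self._keys) + 1
        return self._keys[natural_key]

def aggregate(combined, keys):
    """Data aggregation: total payment amount per surrogate customer key."""
    totals = defaultdict(float)
    for customer, amount in combined:
        totals[keys.key_for(customer["customer_id"])] += amount
    return dict(totals)

def transform(crm_rows, payments):
    customers = [scrub(convert(row)) for row in crm_rows]
    violations = audit(payments)
    clean_payments = [p for p in payments if p not in violations]
    keys = SurrogateKeys()
    for customer in customers:
        customer["customer_key"] = keys.key_for(customer["customer_id"])
    totals = aggregate(combine(customers, clean_payments), keys)
    return customers, violations, totals

if __name__ == "__main__":
    crm = [("C-17", "  alice   smith ", "1010", "A"), ("C-42", "BOB JONES", "9999", "inactive")]
    billing = [(17, 10.0), (17, 5.5), (42, -3.0), (42, 2.0), (99, 1.0)]
    customers, violations, totals = transform(crm, billing)
    print(violations)  # [(42, -3.0)]
    print(totals)      # {1: 15.5, 2: 2.0}
```

Surrogate keys decouple the warehouse from the source systems' own identifiers. In practice the mapping would be persisted (for example in a key-map table) so that the same customer receives the same key on every load.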
