Archives‎ > ‎Interesting Articles‎ > ‎

What is a Data Warehouse?

posted Jun 16, 2013, 4:57 AM by Leroy Dyer   [ updated Apr 21, 2016, 5:19 PM ]

What is a Data Warehouse?

A data warehouse is a centralized data store collating data, from multiple sources, making it easier to share data across multiple platforms.  Data from various data stores can be pulled or pushed, reports created for rich media presentations, websites, also manages big data.

Figure 1

DATA: these points can be: sql server, sas enterprise miner, oracle, and spread sheet data

Typically, a relational database designed for query and analysis, rather than transactional queries.

  • A place where historical data is stored, for archival purposes, analysis and security.
  • Contains raw or formatted data
  • Combines data from multiple sources.

Why

  • Data is stored in various types of format and in different locations.
  • Data in various sources is not easily accessed such as operational databases (OLTP).
  • Online transactional processing systems are and not designed for analysing data, this data is also often live data which is constantly updating and changing.
  • Data is often stored in different platforms and different locations.
  • Subject orientated –Decision based
  • Integrated – various platforms
  • Non-volatile – not changing
  • Time variant – showing changes over time

Defined as; too much volume, velocity, variability, potentially 5 petabyte or more, to work on normal database architectures. Such as Google, Facebook, twitter etc.

Testing / developing a data warehouse

  • Business analysts ------ get requirements
  • Q-A tester ----------- develops and executes test cases.
  • Architects ----------- set up test environment
  • Developers ------------- reform test units
  • Database administrators ------- reform functionality (user acceptance tests)
  • Sources
  • ETL layer
  • The data warehouse
  • The business intelligence front end (dashboards) (Scorecard)
  • Test all entry point
  • Testing the dashboard
  • Logical calculations – compare data M art to report
  • Data validation –create equivalent sql queries from dashboard and data mart.
  • Make sure create reports based on user requirements
  • Verify Field lengths data elements
  • Use automation tools
  • Test speed + performance + workloads, workflows and wait time.

The data ware house needs to load data from various sources, the data EXTRACTED needs to be TRANSFORMED into a format which can be used by the data warehouse for analysis.

EXTRACT:     from OLTP SYSTEM and copied to the ware house

Transform:   remove inconsistences, format to a common format, add missing fields derive new fields to store calculated data.

Load:        Map data into the warehouse

  • Source to target map: required to plan the etl process.
  • Capture business rules
  • Data flow mapping
  • Data movement requirements
  • Mapping document specification:
  • SOURCE: Input Definition
  • TARGET: output formats
  • Business and data transformation rules
  • Data quality requirements
  • Source & Target SQL is available here also.