Data warehouse Architecture and Process Flow.
Depending upon the business requirements and the budget , different data Warehouse may have different architectures Types
Data Warehouse Architecture – Type 1 :
Source (OLTP) —–> Staging Area ——> Data Warehouse ——> Reporting Layer
Data Warehouse Architecture – Type 2 :
Source (OLTP) —–> Staging Area ——> Data Marts ——> Reporting Layer
Data Warehouse Architecture – Type 3 :
Source (OLTP) –> Staging Area –>Data Warehouse –>Data Marts –>Reporting Layer
Data Warehouse Architecture – Type 4 :
Source (OLTP) –> Staging Area –>Data Marts –>Data Warehouse–>Reporting Layer
Basic Components of Data Warehouse are :
1. Source Systems (OLTP) :
- These Systems include the Operational databases , which contains the current day to day transaction.
- They act as the source for the data to be supplied to data warehouse for storage.
- The Source could be in different formats e.g. Flat files , Relational databases , Excels , other databases etc.
- These Sources could be internal , as well as external. For e.g. , A Samsung store may be interested in knowing the total sale of TV in all its stores(internal) . In addition to this it may also be interested in knowing the total sale of TV in the entire city ( external) in order to study the trend for future forecasting.
The process of ‘Data Extraction from the source ‘ is explained in detail under ‘ETL Process’. Read more….
2. Staging Area :
- It act as a mid-ware platform between the source and the target systems. It provides a platform where data could undergo the process of cleaning and transformation before being loaded into the target.
- The Staging area is a temporary database which could be either relational database , flat file or other database.
- Staging Area is a part of Data warehouse server.
- The data in the staging area is cleaned just prior to new ETL Process or just after the completion of current ETL process and successful loading.
Design of Staging Area :
Actually Staging area consist of 2 temporary tables. In first table ( mostly flat files or may be relational database or other database) raw data from single / multiple sources is just dumped by straight load without any modifications. From first table , data undergoes the process of cleaning and transformation one by one and moved to the second table . After all the records are aggregated in this second database , in one shot from here data is loaded into the target. The utility of this second database is that if this is not there , then data needs to be loaded into the target one by one instead of one shot i.e one record cleaned , transformed and loaded into data warehouse. similarly for second record and so on. This is not an efficient way.
Not necessary staging area always follows this architecture of two temporary tables., it may vary as per the business need. But basically it act as the stage for the data to rest and get processed.
Need of Staging area :
- There may be situations where data from multiple sources needs to be loaded into the data warehouse. For this , some platform is needed where data coming from multiple sources can reside , cleaned and transformed. Staging area provides that platform.
- Besides data coming from multiple sources , there could be situations where data from multiple sources are coming in different time zones. Hence in this situation , also a platform is needed for holding the data unless data from all the sources can be integrated.
- If staging area is not there then data from the source (OLTP) needs to be directly cleaned ,transformed and loaded into OLAP systems . This will take a lot of time as 1 -1 record needs to be processed. This will require the OLTP systems to be kept on hold until loading completes, which is not possible in real- time. It will also hamper the performance of the OLTP systems badly.
- Moreover, direct loading data from OLTP to OLAP systems would mess up both the systems as data to be loaded in OLAP is in different format and has business rules applied.This would hamper the OLTP systems badly.
The process of ‘Cleaning and Transformation ‘ is explained in detail under ‘ETL Process’. Read more….
3. Data Presentation / Storage Area (Target or OLAP Systems)
- Data Marts
- Data Warehouse
The process of ‘Loading Data in Target Systems’ is explained in detail under ‘ETL Process’. Read more….
4. Data Access / Reporting Layer
In this layer the Business Intelligence (BI) people uses the Data from the target systems which may either be data warehouse or data mart for analysis , performing ad – hoc queries , generating reports. These Reports help in taking right decisions and proper business forecasting , they help to find out the overall statistics of the company , the trend and thus play a key role for survival of the business organization in the world of fast changing trends and competitors.
Read More ….under ‘BI Reporting.’