Advantages of having a Data warehouse
- A data warehouse provides a common data model for all data of interest regardless of the data’s source. This makes it easier to report and analyse information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
- Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
- Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
- Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
- Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.
- Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
Disadvantages of data warehouses
- Data warehouses are not the optimal environment for unstructured data.
- Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.
- Over their life, data warehouses can have high costs. The data warehouse is usually not static. Maintenance costs are high.
- Data warehouses can get out-dated relatively quickly. There is a cost of delivering suboptimal information to the organization.
- There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems and vice versa.
ETL stands for extract, transform, and load. Each term means a process in the context of data warehousing.
- Extracting data from outside sources
- Transforming it to fit operational needs (which can include quality levels)
- Loading it into the end target (database or data warehouse)
Important function of ETL is “cleansing” data. The ETL consolidation protocols also include the elimination of duplicate or fragmentary data, so that what passes from the E portion of the process to the L portion is easier to assimilate and/or store. Such cleansing operations can also include eliminating certain kinds of data from the process. If you don’t want to include certain information, you can customize your ETL to eliminate that kind of information from your transformation.
The first part of an ETL process involves extracting the data from the source systems. Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization / format. Common data source formats are relational databases and flat files, but may include non-relational database structures such as Information Management System (IMS) or other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM), or even fetching from outside sources such as web spidering or screen-scraping.
The transform stage applies a series of rules or functions to the extracted data from the source to derive the data for loading into the end target. Some data sources will require very little or even no manipulation of data. In other cases, specific transformation types may be required to meet the business and technical needs of the target database. Some standard transformation types are – filtering, sorting, aggregation, adding derived columns etc.
The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely. Some data warehouses may overwrite existing information with cumulative, updated data every week, while other DW (or even other parts of the same DW) may add new data in a histories form, for example, hourly. The timing and scope to replace or append are strategic design choices dependent on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data loaded in the DW.
OLTP and OLAP forms two major type of database systems. The focus on both the system is entirely different ones, OLTP caters for typical end user applications needs like faster insert, update etc. Whereas OLAP system is designed for faster query response. The article lists various other differences between these designs.
OLTP Vs OLAP: Online analytical processing systems VS Oline transactional systems
|OLTP System||OLAP System|
|Source of data||Operational data; OLTPs are the original source of the data.||Consolidation data; OLAP data comes from the various OLTP Databases|
|Purpose of data||To control and run fundamental business tasks||To help with planning, problem solving, and decision support|
|What the data||Reveals a snapshot of ongoing business processes||Multi-dimensional views of various kinds of business activities|
|Inserts and Updates||Short and fast inserts and updates initiated by end users||Periodic long-running batch jobs refresh the data|
|Queries||Relatively standardized and simple queries Returning relatively few records||Often complex queries involving aggregations|
|Processing Speed||Typically very fast||Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes|
|Space Requirements||Can be relatively small if historical data is archived||Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP|
|Database Design||Highly normalized with many tables||Typically de-normalized with fewer tables; use of star and/or snowflake schemas|
|Backup and Recovery||Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability||Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method|
All data warehouses are databases, not all databases are data warehouses.
A database is an integrated collection of logically related records or files consolidated into a common pool that provides data for one or multiple uses. On the other hand, Data warehouse is a repository of an organization’s electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
The primary difference between you application database and a data warehouse is that while the former is designed (and optimized) to record , the latter has to be designed (and optimized) to respond to analysis questions that are critical for your business.
Khan – SQLDBA – MCTS – www.addarr.com