Whats a Data Mart?
21 January, 2010
The term ?Data Mart? refers to a sub-set of Data Warehouse or OLTP (Transaction) systems data, formatted specifically for a particular business view.
To create an analogy, if data warehouses or OLTP systems were to be viewed as actual retail warehouses, then datamarts can be viewed as the showroom. A retail warehouse is packed full of items with the main consideration being well-organized storage with the ability to get items in and out. In contrast, the showroom?s consideration is presentation, showing the value of the items, and taking care of overall customer satisfaction. The warehouse has to be suited to store items of all different uses, while the showroom will focus on items of a particular use.
A datamart is simply defined as a logically related subset of data from the complete data warehouse or OLTP system. The subset of data is related to a single business process or a group of related business processes. Datamarts can be seen as meeting a certain criteria, such as all data relating to purchase orders that falls within the date range of the last three months, or all the data relating to shipping over the last two years.
?
Data Mart Design Objectives
The key design objective for Data Marts is to provide the business user with the data that is most relevant, in the shortest possible amount of time ? in order to support On-line, Analytical Processing (referred to as OLAP). This allows users to develop and follow a train of thought, without needing to wait long periods for queries to complete.
Benefits of a Datamart:
1. Incremental development
Datamarts can be designed and built separately and can fit into an incremental development strategy where only one datamart at a time will be delivered.
2. Customer understandability of data
By the datamart only suppling data that matters to a specific area of the business, there is less confusion for the customers. The customers do not need to sift through data that they are not interested in. The datamart ETL process does the filtering at the datamart level.
3. Manageable pieces
Datamarts break down the complicated data design into small, manageable pieces. This is helpful to the end user, but this is also helpful to the development teams. By the mart being simplistic in design it is easy to communicate across teams and maintain them.
4. Manipulation of data in the mart
Inside the datamart the data can be aggregated, summarized, averaged, etc., to meet the specific needs of the business area. Since the mart is separate from its source system, there is freedom to work with the numbers as the business chooses, without having to consider impact on related systems.
5. Better reporting performance
By having only a subset of the data in the mart, the database system can manage the data faster and easier. Also, since the filtering has already been applied at the mart ETL, the reporting queries become much smaller. Smaller queries performing on a small subset of data are, of course, easier to tune. Therefore there will be improved reporting performance.
6. Use of distributed technology
Since the datamarts are smaller they can be placed on smaller distributed machines to allow users to break away from massively powered machines and still handle processing of the reports.
Data Mart Architecture
Data Mart design involves identifying those measures or facts in which the user is interested, and the ways that the user would like to ?slice and dice? that data. For ease of understanding, these data concepts are presented in the form of a ?Star Schema?. That is, collections of facts or measures in a central data table, surrounded by other tables that contain the dimensions of interest (thus resembling a Star).
In the example of a video rental store data mart below, the REVENUE table is a fact table, and the Customer, Movie, Market, and Time are dimension tables. The REVENUE fact table contains revenue data for movies rented by each customer, in a geographic market, over a period of time. The dimension tables in this data mart define the customers, movies, markets, and time periods used in the fact table.
The creation of a Data Mart inevitably involves summarisation of data. This yields the highest performance for queries. A much broader range of data will be available in the under-lying Data Warehouse, but generally this data is not summarised, and as such query times may be longer. For specific needs, it may be possible to link data in Data Marts to the data in the Data Warehouse. This access can be provided transparently by data access tools.
To create an analogy, if data warehouses or OLTP systems were to be viewed as actual retail warehouses, then datamarts can be viewed as the showroom. A retail warehouse is packed full of items with the main consideration being well-organized storage with the ability to get items in and out. In contrast, the showroom?s consideration is presentation, showing the value of the items, and taking care of overall customer satisfaction. The warehouse has to be suited to store items of all different uses, while the showroom will focus on items of a particular use.
A datamart is simply defined as a logically related subset of data from the complete data warehouse or OLTP system. The subset of data is related to a single business process or a group of related business processes. Datamarts can be seen as meeting a certain criteria, such as all data relating to purchase orders that falls within the date range of the last three months, or all the data relating to shipping over the last two years.
?
Data Mart Design Objectives
The key design objective for Data Marts is to provide the business user with the data that is most relevant, in the shortest possible amount of time ? in order to support On-line, Analytical Processing (referred to as OLAP). This allows users to develop and follow a train of thought, without needing to wait long periods for queries to complete.
Benefits of a Datamart:
1. Incremental development
Datamarts can be designed and built separately and can fit into an incremental development strategy where only one datamart at a time will be delivered.
2. Customer understandability of data
By the datamart only suppling data that matters to a specific area of the business, there is less confusion for the customers. The customers do not need to sift through data that they are not interested in. The datamart ETL process does the filtering at the datamart level.
3. Manageable pieces
Datamarts break down the complicated data design into small, manageable pieces. This is helpful to the end user, but this is also helpful to the development teams. By the mart being simplistic in design it is easy to communicate across teams and maintain them.
4. Manipulation of data in the mart
Inside the datamart the data can be aggregated, summarized, averaged, etc., to meet the specific needs of the business area. Since the mart is separate from its source system, there is freedom to work with the numbers as the business chooses, without having to consider impact on related systems.
5. Better reporting performance
By having only a subset of the data in the mart, the database system can manage the data faster and easier. Also, since the filtering has already been applied at the mart ETL, the reporting queries become much smaller. Smaller queries performing on a small subset of data are, of course, easier to tune. Therefore there will be improved reporting performance.
6. Use of distributed technology
Since the datamarts are smaller they can be placed on smaller distributed machines to allow users to break away from massively powered machines and still handle processing of the reports.
Data Mart Architecture
Data Mart design involves identifying those measures or facts in which the user is interested, and the ways that the user would like to ?slice and dice? that data. For ease of understanding, these data concepts are presented in the form of a ?Star Schema?. That is, collections of facts or measures in a central data table, surrounded by other tables that contain the dimensions of interest (thus resembling a Star).
In the example of a video rental store data mart below, the REVENUE table is a fact table, and the Customer, Movie, Market, and Time are dimension tables. The REVENUE fact table contains revenue data for movies rented by each customer, in a geographic market, over a period of time. The dimension tables in this data mart define the customers, movies, markets, and time periods used in the fact table.
The creation of a Data Mart inevitably involves summarisation of data. This yields the highest performance for queries. A much broader range of data will be available in the under-lying Data Warehouse, but generally this data is not summarised, and as such query times may be longer. For specific needs, it may be possible to link data in Data Marts to the data in the Data Warehouse. This access can be provided transparently by data access tools.
As far as I understood Data Mart is some kind of online data room that has all the features of cloud repository service. The difference is only in security level. Thanks for insightful article.
Hi Guest,
Thank you for the useful information,
I will pass this information over to the relevant teams to update our information.
Thanks
Adam
Thank you for the useful information,
I will pass this information over to the relevant teams to update our information.
Thanks
Adam