Data warehouse as a foundation for business analysis
Source: Äri-IT Spring 2022
Author: Mihkel Nugis, business analysis developer and consultant at BCS Itera
Data warehouse and business analytics belong together. One without the other is an incomplete solution. Although we have already written about the data warehouse more than once in previous issues, let’s explain once more what it is.
WHAT IS A DATA WAREHOUSE?
A data warehouse is technology that allows us to collect, edit, and store data collected in the course of a company’s operations. In the data warehouse, data loaded from different sources is standardised, converted into a format understandable for the data consumer, and access to business analysis applications is enabled.
ARCHITECTURE OF THE DATA WAREHOUSE
Three layers are usually distinguished in data warehouses.
In the first layer (database layer), data is read from external data sources and data is stored (ETL: Extract – Transform – Load). Data sources can be various applications that support the company’s business activities, for example ERP software (Enterprise Resource Planning), CRM software (Customer Relationship Management), HRM software (Human Resource Management), as well as APIs (Application Programming Interfaces) from other internal or external applications, etc.
In the database layer, data is extracted, transformed, and loaded as tables.
The main operation for processing data in the data warehouse is adding it. Typically, data warehouse tables are updated with new data without the need to change or delete previously stored data. In this case, we are talking about incremental loading. Data is updated regularly, the time interval can be set according to requirements. In most cases, it is reasonable to load data once a day at night, so as not to burden the work of operational bases when they are actively in use.
The database layer of the data warehouse not only consists of the raw data of business operations, but also of meta and aggregated data.
Metadata is data about data, i.e. technical information (program scripts, tables of correspondences and configurations), which contains instructions on how the database is created, what the table relationships are, etc. Metadata controls and organises the work of data warehouse processes.
Aggregate data is data derived from raw data. For example, it is often reasonable to use the data warehouse to calculate the remaining stock for each item, day, warehouse, and shelf in advance based on the movement of goods. In this case, subsequent queries about the stock levels are faster, because no additional calculations are needed.
The data is served to the user in the middle layer (analytics layer). The end user does not have direct access to the database layer of the data warehouse. In the analytics layer, a service has been set up through which applications can access business information. When we talk about data cubes in business analytics, this is the part of the data warehouse we mean. The term cube refers to the analysis layer’s special method of presenting data – a multidimensional model. In this layer, the data is dimensioned.
Each numerical value, or metric, has a relationship with different dimensions. With the help of dimensions, the user can limit the conditions for calculating the selected figures. Dimensions act as filters for calculations. Dimensions are, for example, characteristics of time, place, or object – attributes. The attributes of the time dimension are the date, month, year, or week number, while the attributes of the customer can be their code, name, address, or country of location.
In the analytics layer, the way the data is presented is aimed at making the purpose and content of the data as understandable as possible for the user. Usually, in the analytics layer, data is divided into separate data marts, which only contain data of a certain common nature. For example, it makes sense to keep data required for sales analysis and HR management data separate – this way it is more convenient to restrict users’ access rights only to the information necessary for them.
LATEST DEVELOPMENTS
The multidimensional model has started to be replaced by a newer data model technology called the tabular data model. Nothing much changes for the user, as various metrics and dimensions can still be used when creating reports. The way data is stored in the base of the analysis layer is changed. Because the tabular data model is loaded in its entirety into RAM, it allows the implementation of faster queries compared to the multidimensional model.
I emphasise that the data is stored in both the database and the analysis layer. You may wonder whether double storage of data is justified and if it wastes server storage space. The same data is indeed stored in two places, but thanks to effective data compression, the storage space used in the analytics layer is much smaller than in the database layer. Also, making queries based on the data in the analytics layer is several times faster. Thus, implementing an analytics layer in the data warehouse is not only recommended, but also essential, to ensure efficient and fast reporting.
Speaking of reporting, we can now discuss the third or presentation layer. It consists of several applications that allow the user to connect to the analytics layer service and generate reports. Because the output of the analytics layer is in a standard form, users have a choice of many applications to visualise and analyse the data. One of these applications everyone knows is Excel and many are also familiar with another Microsoft business analytics product, Power BI.
DEVELOPING A DATA WAREHOUSE
Developing a data warehouse is usually a time-consuming job, as the data sources are of a very different nature and structure. The availability of automated data warehouse solutions can greatly save time on data warehouse implementation. If the data mainly originates from one system, it is possible that a data warehouse already exists for this source and, to implement it, you may simply need to run a set of prepared scripts that automatically create the data warehouse.
Users can start using the analysis services created by the automated data warehouse on the day it is installed. Owing to the specifics of a company’s activities, the business analysis solution often needs to be upgraded and changed, however, the volume of this work is only a fraction compared to what would be required to develop a data warehouse from scratch.
BI4Dynamics, an automated data warehouse solution, has been developed for users of ERP software created on the Microsoft Dynamics platform. We have implemented it for dozens of MS Dynamics Business Central customers. On average, it takes a few hours to a few days to get a data warehouse with initial standard complexity up and running.
Such solutions make the implementation of a data warehouse a quick and painless process. We have seen that people who have become used to using business analytics based on a data warehouse will continue to trust it.
Customer stories can also be found on bi365.ee.