Data Quality Management
Accurate analyses and correct decisions thanks to clean data
The benefit of business intelligence stands or falls with the quality of the data used. Insights gained from unclean data are flawed, and decisions made on the basis of such data can lead to major problems, true to the motto «garbage in, garbage out». High data quality is therefore a decisive success factor for companies. However, although the importance of data quality is well known, the data in many companies is still flawed.
What is data quality and data quality management?
Data quality is a subjective term that must be defined individually for each company. It describes the degree to which the characteristics of a data set meet the requirements of its users.
Data quality management refers to all processes and procedures that deal with ensuring high data quality. This includes the identification, cleansing and provision of data.
The main reasons for poor data quality
Data is never 100 per cent clean and perfect. One reason is that data enters a company through many different channels, so it can be outdated, duplicated or inconsistent. To ensure the highest possible data quality, it helps to know the main reasons for faulty data, because many of them can be avoided in advance. These are the most important reasons:
- Manual data entry: In many companies, manual data entry takes place. However, this is very error-prone. Data can be entered in the wrong place or in the wrong format and a typo or number error can easily occur.
- Data conversion: When transferring data from one location to another, data can be accidentally lost or changed. This could be because the data is stored in different formats or the data structure is different.
- Real-time updates: In order to make good decisions, it is important to work with up-to-date data at all times. But even here, errors can occur if individual data sets have not yet been updated at the time of an analysis or if there has not been enough time to review the data.
- Merging data: If data has to be merged, for example in consolidations, company mergers or system changes, errors such as invalid formats, duplicates and conflicts can also occur.
- System upgrades: Frequent updates or upgrades of your software can also lead to errors, as it is possible that data is deleted or corrupted in the process.
- Indiscriminate data collection: Companies often collect all the data that is generated. This holds some potential, as the data might be needed in future. However, it also makes quality assurance and data analysis more difficult. Therefore, if possible, only the data that is really needed should be stored.
What are the criteria for data quality?
Various criteria show you how high the quality of your data is and whether the data is suitable for a particular task.
- Completeness: Are all data sets that are needed complete?
Incomplete data may not be usable or only partially usable. Therefore, it is important to ensure that a data set contains all necessary attributes and that the attributes in turn contain all necessary data.
- Relevance: Is the data that is collected actually needed for the intended purposes?
Not all data that is generated is relevant for your purposes. Data should therefore be collected deliberately, and only as much as necessary. This is especially true for customer data, which is subject to data privacy.
- Accuracy: Is the data collected accurate and stated as needed?
When collecting data, it is important to ensure that the data is accurate. At the same time, it should be stored at the level of detail required. This means, for example, that all necessary decimal places should be stored.
- Timeliness: Are the data sets up to date?
New data is constantly being created in a company. Therefore, it makes sense to always carry out analyses with up-to-date data in order to detect changes or problems at an early stage. In practice, we often recommend that our clients refer to data with a reliable status when making decisions. Depending on the situation, it may make sense to use data from the previous day, for example, as live data can change within a very short time.
- Validity: Is the origin of the data known, and does the data come from reliable sources?
The origin of the data sets should be traceable in order to assess whether the data is reliable.
- Availability and accessibility: Can users easily access the data they need? Is it available in the required format?
For example, if relevant data is spread across different tools or is not in the right format, easy accessibility is not always given.
- Consistency: Are there contradictions or duplicates in the data? Are there inconsistencies with other data?
Data must be unambiguous, without contradictions to itself or other data and without redundancies, as well as uniformly structured.
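Some of these criteria can be expressed as simple ratios. The following is a minimal sketch, not a standard definition: the sample records, the field names and the functions `completeness`, `timeliness` and `uniqueness` are assumptions made for illustration, and each company would have to define its own measures.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative assumptions.
records = [
    {"id": 1, "email": "a@example.com", "revenue": 120.50, "updated": date(2024, 5, 2)},
    {"id": 2, "email": None, "revenue": 80.00, "updated": date(2024, 5, 2)},
    {"id": 3, "email": "c@example.com", "revenue": 15.25, "updated": date(2023, 1, 10)},
    {"id": 3, "email": "c@example.com", "revenue": 15.25, "updated": date(2023, 1, 10)},
]

REQUIRED = ("id", "email", "revenue", "updated")


def completeness(rows, required=REQUIRED):
    """Share of rows in which every required attribute is filled."""
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)
    return complete / len(rows)


def timeliness(rows, as_of, max_age_days):
    """Share of rows updated within max_age_days of the reference date."""
    fresh = sum((as_of - r["updated"]).days <= max_age_days for r in rows)
    return fresh / len(rows)


def uniqueness(rows, key="id"):
    """Share of distinct key values among all rows (1.0 means no duplicates)."""
    return len({r[key] for r in rows}) / len(rows)


scores = {
    "completeness": completeness(records),
    "timeliness": timeliness(records, as_of=date(2024, 5, 3), max_age_days=30),
    "uniqueness": uniqueness(records),
}
print(scores)
```

For the sample records this yields a completeness of 0.75 (one record lacks an e-mail address), a timeliness of 0.5 and a uniqueness of 0.75. Criteria such as relevance or validity depend on context and cannot be reduced to a ratio this easily.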
What can be done to ensure high data quality?
In order for data to be of consistently high quality, it is first necessary to define how its quality can be measured. Then the data should be analysed, cleansed and monitored against the defined criteria. This process should be carried out regularly in order to maintain consistently high data quality and to be able to permanently eliminate sources of error.
1. Define criteria
The first step is to determine which criteria should be used to measure the quality of the data. For example, you define which data must be available for your purposes and in which format.
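One possible way to make such criteria explicit is to write them down as declarative rules that a program can check. The sketch below is only an illustration: the `CRITERIA` table, its field names and the helper `check_record` are hypothetical and not part of any particular product.

```python
import re

# Hypothetical quality criteria for an order record (assumed field names).
CRITERIA = {
    "order_id": {"required": True, "type": int},
    "customer_email": {
        "required": True,
        "type": str,
        "pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+",  # deliberately simple format check
    },
    "amount": {"required": True, "type": float},
    "note": {"required": False, "type": str},
}


def check_record(record, criteria=CRITERIA):
    """Return a list of human-readable violations for one record."""
    problems = []
    for field, rule in criteria.items():
        value = record.get(field)
        if value is None:
            if rule["required"]:
                problems.append(f"{field}: missing")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
            continue
        pattern = rule.get("pattern")
        if pattern and not re.fullmatch(pattern, value):
            problems.append(f"{field}: invalid format")
    return problems


good = {"order_id": 1, "customer_email": "a@example.com", "amount": 19.9}
bad = {"order_id": "1", "customer_email": "not-an-email", "amount": None}

print(check_record(good))  # no violations
print(check_record(bad))   # one violation per faulty field
```

Keeping the rules in a data structure rather than in scattered `if` statements makes it easier to review them with the people who use the data.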
2. Data profiling / data analysis
Data analysis is used to identify duplicate data, contradictions, errors and incomplete sets of information. In this way, the quality of the data can be measured and the data can be cleaned and updated in further steps. In addition, the data analysis can be used to identify sources of errors and thus take measures to ensure that the detected errors do not happen again in future.
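A very small profiling pass might look like the following sketch. The supplier records and the `profile` function are assumptions made for illustration; real profiling tools report far more than these three findings.

```python
from collections import defaultdict

# Hypothetical supplier records (assumed field names).
rows = [
    {"id": 1, "name": "Acme AG", "country": "CH"},
    {"id": 1, "name": "Acme AG", "country": "DE"},    # contradicts the row above
    {"id": 2, "name": "Beta GmbH", "country": "DE"},
    {"id": 2, "name": "Beta GmbH", "country": "DE"},  # exact duplicate
    {"id": 3, "name": "", "country": "CH"},           # incomplete
]


def profile(rows, key="id"):
    """Count missing values per field and list duplicate and conflicting keys."""
    fields = sorted({f for r in rows for f in r})
    missing = {f: sum(1 for r in rows if r.get(f) in (None, "")) for f in fields}

    # Group all versions of a record by its key.
    versions = defaultdict(list)
    for r in rows:
        versions[r[key]].append(tuple(sorted(r.items())))

    # A key is duplicated if the same version occurs more than once,
    # and conflicting if it occurs with differing content.
    duplicates = sorted(k for k, v in versions.items() if len(v) > len(set(v)))
    conflicts = sorted(k for k, v in versions.items() if len(set(v)) > 1)
    return {
        "rows": len(rows),
        "missing": missing,
        "duplicate_keys": duplicates,
        "conflicting_keys": conflicts,
    }


report = profile(rows)
print(report)
```

For the sample data the report flags one missing name, key 2 as an exact duplicate and key 1 as a contradiction, which are the kinds of findings the cleaning step then works through.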
3. Data cleaning
In the data cleaning step, the problems identified in the data analysis are corrected. This means that duplicates are deleted, missing values are added and contradictions are resolved.
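The sketch below illustrates these three corrections on a tiny example. It rests on assumptions that will not hold everywhere: contradictions are resolved by keeping the most recently updated version of a record, and missing values are filled from a trusted `reference` source. The `clean` function and all field names are hypothetical.

```python
# Hypothetical address records; "updated" holds ISO dates, which sort correctly as strings.
rows = [
    {"id": 1, "city": "Bern", "updated": "2024-01-01"},
    {"id": 1, "city": "Zurich", "updated": "2024-03-01"},  # contradicts the row above
    {"id": 2, "city": None, "updated": "2024-02-01"},      # incomplete
    {"id": 2, "city": None, "updated": "2024-02-01"},      # duplicate
]

# Assumed trusted source used to fill gaps, keyed by record id.
reference = {2: {"city": "Basel"}}


def clean(rows, reference, key="id"):
    """Keep the latest version per key, then fill missing values from reference."""
    latest = {}
    for row in rows:
        kept = latest.get(row[key])
        if kept is None or row["updated"] > kept["updated"]:
            latest[row[key]] = row

    cleaned = []
    for row in latest.values():
        fixed = dict(row)  # copy so the input rows stay untouched
        for field, value in reference.get(fixed[key], {}).items():
            if fixed.get(field) is None:
                fixed[field] = value
        cleaned.append(fixed)
    return cleaned


cleaned = clean(rows, reference)
print(cleaned)
```

Whether "latest wins" is the right rule depends on the data. In some cases a contradiction has to be resolved manually or at the source system.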
4. Data monitoring
The existing and new data should be continuously checked to ensure high data quality on a permanent basis.
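Continuous checking can be as simple as computing a few metrics for every new batch of data and comparing them with agreed thresholds. The following is a minimal sketch under stated assumptions: the `THRESHOLDS` values, the two metrics and the `monitor` function are illustrative, and a real setup would run on a schedule and notify the responsible persons.

```python
# Hypothetical minimum values a batch must reach; to be agreed per data set.
THRESHOLDS = {"completeness": 0.95, "uniqueness": 0.99}


def monitor(rows, required, key, thresholds=THRESHOLDS):
    """Compute quality metrics for a batch and list every threshold violation."""
    complete = sum(all(r.get(f) is not None for f in required) for r in rows)
    metrics = {
        "completeness": complete / len(rows),
        "uniqueness": len({r[key] for r in rows}) / len(rows),
    }
    alerts = [
        f"{name} {value:.2f} below {thresholds[name]:.2f}"
        for name, value in metrics.items()
        if value < thresholds[name]
    ]
    return metrics, alerts


# Hypothetical batch of newly arrived records.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

metrics, alerts = monitor(batch, required=("id", "email"), key="id")
for alert in alerts:
    print("ALERT:", alert)
```

Recording the metrics of every run also shows whether data quality improves or deteriorates over time.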
Tips for data quality management
1. Determine responsible persons
If responsibility for data quality is not explicitly assigned, no one will feel accountable for it. Therefore, it is important to define responsible persons. Depending on the data set, these can be different people or a single employee. The responsible persons are in charge of ensuring that the defined standards are adhered to when creating the data and that the data is regularly checked and maintained.
2. Dealing with quality deficiencies
There is no such thing as 100 per cent data quality, as errors can occur at any time. However, depending on the purpose of the application, it can be determined which data must be absolutely correct in order to be able to carry out correct analyses and thus make correct decisions.
Our tip: It is important that as many data sets as possible are correct. However, the cost-benefit ratio of the corrections can be poor, for example if it takes a lot of time to clean the data, but you use the data little afterwards or it has no relevance. Therefore, give priority to addressing quality deficiencies in the essential data.
3. Improve data quality directly at the source
Business intelligence solutions such as myPARM BIact allow you to manually change, correct or add to stored data. However, you should bear in mind that when you make such corrections, on the one hand the data source is still flawed and on the other hand manual corrections also have a high potential for errors. In addition, it could be that existing errors are overlooked. Therefore, the quality of the data should be improved at the data source if possible. In this way, high quality data is made available to the BI software.
4. Continuous data monitoring
The more often you detect errors, correct them and take measures against them, the higher the quality of your data will be in the future. Nevertheless, it is important to view data quality as an iterative process, since new errors can arise at any time, the requirements for the data can change or the amount of data created and its diversity can increase. The process of data quality management should therefore be continuous.
Making decisions based on data instead of gut feeling can contribute a lot to the success of your company. However, with this comes the risk that the data that led to a decision may be flawed. Good data quality management is therefore essential so that you can rely on the accuracy of your data.
Learn more about the Business Intelligence Software myPARM BIact:
Would you like to get to know myPARM BIact in a demo presentation? Then make an appointment with us right away!