The old axiom “garbage in, garbage out” applies to data analysis. Even the most sophisticated BI tools may deliver incorrect or incomplete stories when presented with data irregularities. The first step in any good BI process is to clean and transform the data. Most enterprise-class BI systems provide utilities to identify and correct bad or missing data.
Some examples of this include null values, blank values, bad dates, and other invalid values. Let’s look at each of these in turn. Null values are often misunderstood. A null value is not a blank; it is the absence of a value. Nulls create all sorts of headaches for filtering, sorting, and aggregating data. Nulls are not always bad, but they need to be taken into account. We recommend replacing nulls in numeric fields with a default value (typically 0). Like null values, blank values create a level of unpredictability. Defaulting some textual value like ‘N/A’ or ‘NONE’ usually takes care of the issue. Bad dates usually come from entry fields in the accounting system that do not validate input or restrict future dates. By way of example, a user accidentally enters 2/1/2129 instead of 2/1/2019. Future dates can filter out transactions, especially on dashboards with date range parameters or aging criteria.
A good BI tool highlights bad dates so that they can be corrected in the host system. Often firms want to group data for analysis by a field from the host system that is not validated – that is a field where users can enter whatever value they want. A common example would be a field-like state. If the state field fails to restrict values to a known list, users tend to key the same piece of data in different ways. California, for example, could have many different entries: CA, CA., CAL, or California. Name entry fields create a more pronounced version of this problem. Best practices recommend reporting on validated data, but exception reports can identify “similar values” for correction in the host accounting system. In addition to simply highlighting data inconsistency, a good BI tool will provide some data “cleansing” capabilities as a part of the creation of the data warehouse. Rules can be set to trap bad data and transform it as a part of the data creation process. When the data warehouse does not match the host system, balancing between the two systems presents challenges; correcting the data in the host system ensures fewer reporting headaches. As firms begin to move toward a more data-driven decision-making process, finding the correct BI platform is half the challenge. Ensuring that the system has good-quality data to report against is equally important.