This week, Eva Murray takes a look at:
- The impact poor data quality has on organizations
- Practical advice on how to overcome this challenge through the use of feedback loops
Poor data quality can cost organizations millions each year. It can lead to incorrect decisions, delays, and missed opportunities.
While data quality has been a topic that people across IT and business departments have talked about and focused on for well over a decade, the current state of data quality in many organizations suggests that these discussions haven’t been followed up with the necessary actions.
In this article I will outline some suggestions for getting traction with data quality in your organization. They are based on the idea of using feedback loops not just to distribute the workload of addressing data quality issues but also to share the responsibilities among data owners, data users and data producers.
Poor data – and lots of it
Organizations, regardless of their size, are becoming more and more data-rich by collecting data and information from every part of their business processes. If data quality is already below target and processes to rectify this are not in place, increasing the volume of data will only make the problem worse.
Data quality issues are often acutely felt by data analysts and data scientists. They have to tackle these issues through data cleansing and oftentimes lengthy processes to transform data into something useful for their analyses.
Each additional data source adds further complexity to the process and, at least in the short term before the process is automated, consumes additional time. That is time these data professionals could otherwise devote to answering business questions and identifying trends and insights in the data that can lead to new revenue opportunities. So, what’s the solution?
Where to address data quality problems
Data preparation and integration tools have enabled organizations to tackle the symptoms of poor data quality and rectify them as much as possible so that valid and reliable data can be used to support decision-making processes.
My suggestion is to establish effective feedback loops between those working with the data and those producing or entering the data at its origin. Connecting these two parts of the data value chain can help to address the causes of poor data quality in the long term by improving processes and changing behaviors.
When I moved to the UK last year and signed up with a mobile phone provider, I witnessed first-hand how poor data comes to exist.
During our conversation, the sales consultant had to ask me for personal details to complete the form that provided the basis for our contract. There were, however, at least two dozen additional fields he had to populate before he could proceed with the transaction. This information was not essential at that point in the process, and asking all these questions would not have created a positive experience for me as the customer, because I wanted a mobile phone contract, not a 30-minute interview.
To spare me the inconvenience and to speed up the process, the sales consultant simply entered dummy values in each of the mandatory fields and we finished the transaction.
I couldn’t help but feel for his colleagues in the data engineering and analytics departments who now have to clean up my messy data before analyzing my mobile phone usage.
How to create a feedback loop
If you want to tackle poor data quality at its source, it helps to connect those creating the data with the people who use it, so they understand each other’s needs and tasks better.
Going back to my example above, if we could facilitate a conversation between the sales consultant and a data analyst, I am sure the sales consultant would understand better how important high-quality data is for the data analyst. Similarly, the analyst could see opportunities to improve the data collection process in customer-facing roles to help her colleague produce the much-needed data.
In my work with analytics and data communities in organizations across the world, I have seen that bringing people from different roles together and encouraging them to learn from each other can make significant contributions to building a data culture.
Connect staff to improve data quality
For data quality, a similar approach can work. Why not connect the customer-facing staff who enter data with those who analyze it? Whether it is the sales consultant for a mobile phone provider, the nurse or front office staff in a hospital, or the bank teller, each of them gathers data from customers, patients and clients, and the better their process is, the better the resulting data quality.
To initiate the feedback loop and to help people in your organization build relationships and have constructive conversations, I recommend you start with a small group. It’s important that you are clear about what the aim is, how each side benefits and that you’d like them to have an ongoing connection so they can raise any issues, questions and solutions as necessary.
Using specific examples will help you and the group start the conversation. It can also provide the basis for ideas, suggestions and a shared understanding of the role everyone plays in the process from data creation to analysis and reporting.
Continue the conversation
Once the initial meeting has happened, help the participants stay in touch. An open channel in the communication tools your organization already uses lets people engage as needed. Once the process has gained traction, you can start inviting others to contribute.
Showing results is important, of course, and after the initial issues are discussed, people need to take action to make changes and improvements.
Provide the results as regular updates to the wider group of people directly affected and subsequently to the business as a whole as appropriate.
If you can show how constructive discussions about data quality can lead to improved, simplified processes for frontline staff and better data quality for analysts and data scientists, that is a powerful way to engage a broader audience and to achieve further improvements.
As you present the results, include quantifiable metrics where possible. Do the process changes save time (and frustration) when dealing with customers? How much has the improved data quality reduced the need for complex data cleansing workflows?
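One simple metric you could track is the share of mandatory fields that contain missing or dummy values, measured before and after a process change. The Python sketch below is only an illustration of that idea: the field names, the list of placeholder values and the sample records are all hypothetical, and you would need to adapt them to what your frontline staff actually enter.

```python
from typing import Mapping, Sequence

# Hypothetical dummy values that staff might type into mandatory fields.
PLACEHOLDERS = {"", "n/a", "na", "none", "unknown", "test", "xxx", "-"}

# Hypothetical mandatory fields on a sign-up form.
MANDATORY_FIELDS = ["occupation", "postcode"]


def is_placeholder(value: object) -> bool:
    """Return True if the value is missing or looks like a dummy entry."""
    if value is None:
        return True
    return str(value).strip().lower() in PLACEHOLDERS


def placeholder_rate(
    records: Sequence[Mapping[str, object]], fields: Sequence[str]
) -> float:
    """Share of checked field values that are missing or placeholders."""
    total = 0
    flagged = 0
    for record in records:
        for field in fields:
            total += 1
            # A field that is absent from the record counts as missing.
            if is_placeholder(record.get(field)):
                flagged += 1
    return flagged / total if total else 0.0


# Made-up sample records captured before and after a process change.
before = [
    {"occupation": "xxx", "postcode": "N/A"},
    {"occupation": "Nurse", "postcode": "AB1 2CD"},
]
after = [
    {"occupation": "Teacher", "postcode": "EF3 4GH"},
    {"occupation": "", "postcode": "IJ5 6KL"},
]

print(f"Before: {placeholder_rate(before, MANDATORY_FIELDS):.0%}")  # Before: 50%
print(f"After: {placeholder_rate(after, MANDATORY_FIELDS):.0%}")    # After: 25%
```

A number like this is easy to share in a regular update, and it gives frontline staff and analysts a common figure to discuss. It only catches the dummy values you already know about, so the placeholder list itself is a good topic for the feedback loop.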
Leading the change
Taking the initiative to improve the quality of the data in your organization is not only a good idea but also an important step to take right now as you face the ongoing growth of data volumes, the complexity of data sources and the constant evolution of your data architecture.
Too many of us continue to work in silos and sometimes forget that the people further upstream in the data value chain are our colleagues. We could talk to them, ask questions, listen and find solutions together. In large organizations, and even in small ones, this communication rarely happens. So I want to encourage you to facilitate it: bring together the people who play a crucial part in collecting, processing and analyzing data, and help them address data quality issues together in a constructive way.
Eva Murray, Technology Evangelist, Exasol