Data ethics: what, why now and where do we start? 

This is a guest blog by Exasol’s Chief Data and Analytics Officer, Peter Jackson. In it, he explores the big questions related to data ethics. In particular, he covers:

– Why we need to start taking action on data ethics now
– How to get started with data ethics
– The big questions to consider
– What steps to take now

We have to make ethical decisions every day when working with data. The problem is that, a lot of the time, we might not realize we are making them. Data professionals and organizations are often so focused on what can be done with data, and at what scale, that the critical questions get forgotten: should we be doing this, and how should we be doing it?

These are quite basic questions, but there’s no doubt that they slip out of the thought process far too often. What data should we collect and keep? What should we collect and delete? What should we leave alone completely? How should we collect the data? These aren’t compliance issues, or questions to be answered just to stay on the right side of GDPR, CCPA or any other data protection or privacy legislation. These are questions of ethics. Just because we can doesn’t mean we should.

Why do we need to start taking action on data ethics now?

The fact is that, too often, the interpretation of data ethics is limited to data science teams understanding bias in training sets and teams knowing just enough to keep the regulators happy. This shouldn’t be enough to satisfy those of us working in the industry. So, why not? Why should organizations be taking data ethics seriously?

There can be no clearer representation of the consequences of unethical behavior than Cambridge Analytica. Once it became public that the company had collected and used the data of millions of Facebook users without their consent, it collapsed.

This isn’t the only arena in which ethical considerations will need to be taken into account. Data ethics will have to play a part in the ongoing discussions about Lethal Autonomous Weapons (LAW), for example.

These are clearly extreme examples, but when it comes to handling personal data, the conduct of organizations will continue to be under the spotlight, and one public misstep could be extremely costly. The more conscious people become of the value of their personal data and how it should be handled, the more reputational damage organizations will suffer when they drop the ball with data ethics.

This is also being taken seriously at a governmental level. Late in 2018 the UK government announced the foundation of its Centre for Data Ethics and Innovation, which, as per its own website, is ‘tasked by the Government to connect policymakers, industry, civil society, and the public to develop the right governance regime for data-driven technologies.’ 

In the US, as part of the Federal Data Strategy, the General Services Administration announced late in 2020 that it had released a framework to help agencies make ethical decisions. The topic has also made it onto the agenda at the European Commission.

So, there’s no doubt that data ethics is on the radar, but how should we be turning this awareness into action? 

How to get started with data ethics

Data ethics is a very broad topic which can take us in multiple directions. So, if you’re to start making tangible progress, you need to focus on some key issues:

1. The collection of data

Organizations will naturally collect and keep hold of as much data as possible, just in case it might prove useful in future. We need to move away from this behavior being the norm and ingrain new approaches. We can do this by constantly asking simple questions:

  • How should we collect data?
  • What should we collect and keep?
  • What should we collect and delete?
  • What should we leave well alone?

This is not an exhaustive list by any means, but the basic premise that we need to question how data is being collected is crucial to making more ethically sound choices.
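
One way to make these questions routine is to write the answers down per field, so that nothing is held "just in case". The sketch below is purely illustrative and rests on assumptions that are not from any real system: the field names, the 90-day retention period, and the helper names `FieldRule` and `filter_record` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    KEEP = "collect and keep"
    DELETE_LATER = "collect and delete"
    LEAVE_ALONE = "leave well alone"


@dataclass(frozen=True)
class FieldRule:
    decision: Decision
    purpose: str                          # why the field is collected at all
    retention_days: Optional[int] = None  # only meaningful for DELETE_LATER

    def __post_init__(self):
        # A "collect and delete" rule is incomplete without a deadline.
        if self.decision is Decision.DELETE_LATER and self.retention_days is None:
            raise ValueError("DELETE_LATER rules need retention_days")


# Hypothetical policy for a hypothetical sign-up form.
POLICY = {
    "email": FieldRule(Decision.KEEP, "account login"),
    "delivery_address": FieldRule(Decision.DELETE_LATER, "ship one order", 90),
    "date_of_birth": FieldRule(Decision.LEAVE_ALONE, "no agreed purpose"),
}


def filter_record(record: dict, age_days: int, policy: dict = POLICY) -> dict:
    """Return only the fields the policy allows us to hold at this age.

    Fields without a rule are dropped rather than kept by default.
    """
    kept = {}
    for name, value in record.items():
        rule = policy.get(name)
        if rule is None or rule.decision is Decision.LEAVE_ALONE:
            continue
        if rule.decision is Decision.DELETE_LATER and age_days > rule.retention_days:
            continue
        kept[name] = value
    return kept


record = {
    "email": "a@example.com",
    "delivery_address": "1 High St",
    "date_of_birth": "1990-01-01",
    "favourite_colour": "blue",  # no rule, so it is never stored
}
fresh = filter_record(record, age_days=10)
stale = filter_record(record, age_days=120)
```

The design choice worth noting is the default: a field with no recorded purpose is discarded, which turns "should we collect this?" into a question someone has to answer explicitly before the data is stored.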

2. Reassess how we use algorithms

Organizations cannot absolve themselves of responsibility when something goes wrong and lay the blame at the door of an algorithm. There has to be human accountability with regard to whether algorithms should be used at all, whether human decisions would be preferable in some instances, and how we can find a healthy balance in collaborative intelligence, or human-machine collaboration.

When algorithms are used, governance is absolutely critical. There has to be a clear understanding of the bias in training data sets and that loops back to the collection of data. Compliance teams also have to be completely on top of how data ought to be deployed and used. 
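
As a small illustration of what "understanding the bias in training data sets" can mean in practice, the sketch below compares how often each group appears in a training set with the share you would expect to see. It covers only one narrow kind of bias (representation), and the function name `representation_gaps`, the `region` attribute, the reference shares and the 5% tolerance are all assumptions made for the example.

```python
from collections import Counter


def representation_gaps(rows, attribute, reference_shares, tolerance=0.05):
    """Compare each group's share of the training rows with a reference share.

    Returns {group: (observed_share, expected_share)} for every group whose
    observed share differs from the expected share by more than `tolerance`.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    counts = Counter(row[attribute] for row in rows)
    total = len(rows)
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps


# Hypothetical training rows: 8 from one region, 2 from another.
training_rows = [{"region": "north"}] * 8 + [{"region": "south"}] * 2
expected_shares = {"north": 0.5, "south": 0.5}  # assumed reference population

gaps = representation_gaps(training_rows, "region", expected_shares)
for group, (observed, expected) in gaps.items():
    print(f"{group}: {observed:.0%} of training data, expected {expected:.0%}")
```

A check like this does not say whether a model is fair. It only flags where the collected data departs from the population it is meant to describe, which is a prompt for the human accountability described above.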

3. Take a customer-centric approach to data ethics

Personally Identifiable Information is not ‘data’; it is a person. As consumers, and citizens, we’re becoming more value-driven in how we engage with organizations. This naturally extends to how we expect our personal data to be handled. In this context, organizations can’t afford to be reactive and merely do enough to satisfy regulators. Getting on the front foot with data ethics will play a huge role in boosting an organization’s, or a government’s, reputation with current and future customers and citizens. It can be a competitive advantage, not only because you can get a step ahead of the regulators and avoid potential fines related to poor practice, but because you can truly stand out as an organization that puts data ethics at the forefront of everything it does, rather than treating it as a compliance box-ticking exercise.

Where next?

This is just the start of what you need to consider to bake data ethics into your organization’s daily work. But we need to start somewhere. Over the coming weeks and months, we’re going to be investigating this topic in a lot more detail, digging into the core areas that demand our attention and guiding you through what really matters when it comes to data ethics. Stay tuned. 

Peter Jackson is Chief Data and Analytics Officer at Exasol. He started his career as a business analyst before transitioning into software development. He later became the first Head of Data at The Pensions Regulator, the first CDO of Southern Water, and the first Group Director of Data Science at Legal and General. Jackson then joined forces with Caroline Carruthers to write The Chief Data Officer’s Playbook and launch the CDO Summer School.