Ethical data governance practices are essential to driving business growth and positive social change. However, issues like misrepresented data or dark data can negatively impact underrepresented groups and contribute to environmental harm. How can companies effectively manage data complexities and uphold responsible practices to ensure a sustainable and equitable future?

In this episode, experts uncover the challenges of inclusive research, the evolving regulatory environment, and the future of data management. They highlight the importance of ethical governance and leveraging data for good, while also exploring the role of AI in identifying and addressing these issues.

Featured Experts
• Caroline Frankum, Global Chief Executive Officer, Profiles Division, Kantar
• Valeria Piaggio, Global Head, Diversity, Equity & Inclusion, Kantar
• Dr. Elena Huff, Director, Consult Partner, Kyndryl
Tom Rourke 00:03
Good afternoon, and welcome to the latest episode of The Progress Report. I'm Tom Rourke, your host for this afternoon and Global Leader for Kyndryl Vital. As organizations navigate the complexities of protecting and utilizing data, they face the dual challenges of ensuring sustainability and ethical integrity. Effective data utilization not only bolsters ESG and DEI outcomes, but also drives business performance and competitive advantage. Today, I'm delighted to be joined by Caroline Frankum, Global Chief Executive Officer of the Profiles Division of Kantar, her colleague, Valeria Piaggio, who is the Global Head of Diversity, Equity and Inclusion at Kantar, and my colleague, Dr. Elena Huff, who is Director and Consult Partner here at Kyndryl. So I'd like to start our conversation this afternoon by talking about, well, what is bad data? Because I've been fascinated as we prepare for this call to realize there's a difference between bad data and dark data. So maybe if I could start with you, Caroline. What is bad data and how is that impacting on your work in the area of market research?
Caroline Frankum 01:13
There's a saying, isn't there, that anything without data is just an opinion. So at Kantar, our purpose is to shape the brands of tomorrow by better understanding people everywhere, and that means that we have to ensure, as one of the world's leading market research agencies, that we truly represent the diverse world we serve. So in my book, bad data misleads people, it's misrepresentative, and it misinforms people. And we know there's a whole world of fake news out there, as we've all grown more and more into this always-on digital world. But for me, that's it: bad data misrepresents, misinforms, and misleads people.
Tom Rourke 01:53
And Valeria, you have a particular perspective on this which is that bad data isn't just frustrating in terms of kind of getting in the way of the work that your organization does, but it can be actively harmful to particular groups of people and particularly underrepresented groups.
Valeria Piaggio 02:08
Well, bad data is even worse when it is meant to be representing people who are already voiceless, because one of the things we know is that misrepresenting those who are underrepresented in the social discourse reinforces biases and stereotypes and leads to even more discrimination in society and in business. And one of the most worrying results of our recent Brand Inclusion Index is that discrimination is very widespread. It affects majorities of people around the world, and it is often happening in the business context. It is happening even more to those who are underrepresented, and it takes place at work, when people are looking for a job in the marketplace, when they're trying to buy something or visit a hospitality venue. So as we think about this problem, we need to think about it as a business problem.
Tom Rourke 03:17
Elena, if you could help us with some definitions here. Perhaps for our listeners, you could explain what does dark data mean?
Dr. Elena Huff 03:23
Absolutely. So, bad data refers to incomplete, inaccurate, misrepresented information, often resulting from poorly designed surveys, data entry errors, or even manipulation by malicious actors. Bad data costs the US economy up to $3.1 trillion annually through poor decision making, lost productivity, and missed opportunities. In market research, this bad data can lead to erroneous conclusions that affect everything from product development to customer experience strategies. But there is a bright side to the dark data. I have actually seen firsthand how unused data, the dark data, can actually be a goldmine for efficiency and environmental stewardship. I always love to talk to companies and clients about bringing data to life. Make data tell the story. But if the data is bad, well, that's one plot twist I don't want to hear.
Tom Rourke 04:26
Interesting. And maybe Elena, if I could push you to just elaborate a little bit, also on the ESG dimension of dark data, because my own minor contribution to dark data is the vast amounts of really bad photographs that I seem to have hanging around in my phone. But I think there are bigger issues around the implications for ESG in a sort of inappropriate response to dark data. Maybe you could elaborate a little on that?
Dr. Elena Huff 04:51
From an ESG perspective, just so you know, in 2020, digitization generated 4% of global greenhouse gas emissions. And collectively, data centers have a greater carbon footprint than the aviation industry. And if the data centers contain dark data, data that hasn't been used for years and years and years, then that's a direct impact on sustainability. One of my good friends and clients is a CIO for a big organization. He's been with this company for 27 years. Twenty-five years ago, he was part of a team that converted a lot of their data to archives. That data has been sitting there ever since. And he said, "Elena, I know it's still there, 25 years later." Imagine the emissions. Imagine all of the unnecessary implications for ESG reporting of that data, terabytes of it, sitting in data centers, and how it affects our planet. And in the context of planet stewardship, I always like to talk about how taking control of the data, and putting emphasis on being stewards of the data we use at the personal, individual level and at the organizational level, can positively help with the net zero goals we all have.
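The arithmetic Elena gestures at here can be sketched as a back-of-envelope estimate. Every figure below is an illustrative assumption, not a measured value from the episode: storage energy use and grid carbon intensity vary widely by facility, technology, and region.

```python
# Back-of-envelope estimate of the carbon cost of keeping unused ("dark") data archived.
# All constants are hypothetical illustrations; real values vary by data center and grid.

TB_STORED = 500          # assumed archive size in terabytes
KWH_PER_TB_YEAR = 5.0    # assumed energy to keep 1 TB stored for one year (kWh)
KG_CO2_PER_KWH = 0.4     # assumed grid carbon intensity (kg CO2e per kWh)
YEARS = 25               # how long the archive has sat untouched

annual_kwh = TB_STORED * KWH_PER_TB_YEAR          # yearly energy draw of the archive
annual_co2_kg = annual_kwh * KG_CO2_PER_KWH       # yearly emissions from that energy
total_co2_tonnes = annual_co2_kg * YEARS / 1000   # cumulative emissions over the period

print(f"Annual energy: {annual_kwh:.0f} kWh")
print(f"Annual emissions: {annual_co2_kg:.0f} kg CO2e")
print(f"Emissions over {YEARS} years: {total_co2_tonnes:.1f} tonnes CO2e")
```

The point of the sketch is not the specific numbers but the shape of the calculation: emissions scale linearly with volume stored and years retained, so deleting or consolidating archives that are never read is a direct, measurable sustainability lever.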
Tom Rourke 06:06
So if I just take us back a little bit, Valeria and Caroline, we had the benefit of having one of your former colleagues on an earlier podcast where we were talking about the concept of ethical and responsible AI, and that was very much focused on what we do with the sort of technology that analyzes and works with data. What we didn't actually spend any time talking about was that earlier part of the chain about what constitutes best practice in inclusivity and in accessibility in terms of research. So the quality, the rigor, and the thinking that needs to go into the quality of the research in the first place before it ever presents itself to an AI. What constitutes inclusivity and accessibility in terms of research from your organization's point of view?
Valeria Piaggio 06:49
Inclusive research is intentional research, and the aim is really to represent everyone in society. Inclusive research really entails everything from the team that is conducting the study, because they are the ones framing the questions, you know, "What is it that we're going to find?", through to the planning and the methodologies being used, and whether those methodologies are already intrinsically biased. And this is important when we're trying to think about expanding segments in society or the marketplace. And in some cases, it's not easy to reach these populations, because they may have a disability or a learning and thinking difference, or they might be in remote locations. And then we need to focus on the "how". You know, inclusive research has a lot to do with how we conduct the work. It needs to be very intentional about interrupting biases, interrupting our way of looking at the world, and making sure that we contemplate other perspectives. It's also critical that we create an environment where respondents feel first safe, and then respected and valued for what they have to say. It is important that we are very conscious about how we are asking questions, starting with how we ask about people's identity, because if we are not getting honest, transparent responses about that, then obviously we are already flawed.
Tom Rourke 08:32
So thank you, Valeria. And before we talk about the AI again, Caroline, I loved the way you introduced yourself during one of our earlier conversations, which, if I remember correctly, was that it's your job to ensure that people are who they say they are, that we are actually talking to authentic humans. But obviously, from a business point of view, that has implications as well. So maybe we could talk a little bit around that landscape of people not necessarily always being who they say they are, and what that might mean for business.
Caroline Frankum 09:01
As we all migrated to a more digital, online world during COVID, so did the fraudsters. We saw that online surveys were not exempt from being attacked by fraudsters trying to earn money from taking surveys. In fact, out-of-country hackers breaking into surveys and pretending to be people they actually aren't has increased by over 700 times in the last six years alone. It is very lucrative for these individuals, so they work in big groups, and I mean hundreds at a time doing one survey. They tend to be based in offshore countries, and they can be earning up to $250,000 each a year. We all know that data is very powerful, very meaningful, and very important in helping clients make big, strategic decisions, and they've got savvy to this. So it's a game of always getting better on our part to ensure we stay one step ahead. But yes, please don't underestimate how smart, innovative, creative, but also well-paid as a result, these individuals are.
Tom Rourke 10:06
In earlier conversations around AI, the issue of bias had arisen, and I think one of the more positive things we said was the power of AI actually may be able to help us detect and interfere with the ability of these people to kind of misrepresent. So Elena, maybe you can talk a little bit about what the technology landscape looks like in this space.
Dr. Elena Huff 10:26
Absolutely, I'll be happy to. To your point about bias in data, it's a critical issue. It really can have far-reaching implications, particularly when it comes from bad data or manipulated data. I know that Kantar has amazing technology that is able to address the immediate threat of this fraudulent data by detecting it and removing bad actors in real time. There is a dark data offering that's combined with this AI technology. The offering focuses on a comprehensive dark data governance strategy that complements the AI technology, and by providing this overall perspective, we can develop and deliver a long-term solution to manage and reduce dark data, obviously enhancing both operational efficiency and sustainability goals. Often, that bias is introduced from incomplete, inaccurate, and underrepresented data sets. When AI systems are trained on such data, they can perpetuate or even amplify those biases. This is particularly dangerous in decision-making processes across multiple industries. My current focus is on finance, healthcare, hiring, and law enforcement, and this is where biased data can lead to extremely discriminatory outcomes.
Tom Rourke 11:43
And Caroline, you mentioned, I think yesterday, and I was quite surprised, just how narrow the margins are here in terms of detecting these kinds of bad actors. I seem to remember what you said about this: there's quite a heavy investment on your part, but you really are having to work at quite a pace to stay ahead of the challenges.
Caroline Frankum 12:04
From what we've seen over the last two years, you have to be able to make a decision on whether a respondent to a survey is a bad or good actor in milliseconds, not seconds. That's a heavy lift for a human being to do, which is why we've invested heavily in AI as a purpose for good, to identify these bad actors before they do serious damage in surveys. So ideally, we stop them from joining the panel when they first sign up, because we can identify behaviors that just don't make sense. For example, we tend to see a lot of 19-year-old CEOs in some parts of the world. And while 19-year-old CEOs do exist, and that's amazing, it's not at the scale we often see them at in certain parts of the world. Our antifraud software solution is called Qubed, and it is powered by five deep neural networks. Why that's amazing is that its machine-learning algorithms have been trained on at least five years of survey behavior, and it will decide in milliseconds, at the very start of the survey, whether this panelist is a real panelist or a fraudulent actor, and then it routes them out in seconds. So that's the speed, unfortunately, we have to work at nowadays to really ensure that you have real people who really are who they say they are in your survey, so that you can source data from them that you can trust, rely, and depend on.
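The idea Caroline describes, scoring a panelist at survey entry and routing out likely fraudsters in milliseconds, can be sketched minimally. To be clear, this is not Kantar's Qubed system, which uses deep neural networks trained on years of survey behavior; the fields, rules, and threshold below are hypothetical illustrations of the entry-screening pattern, with the "19-year-old CEO" check taken from her example.

```python
# Hypothetical sketch of real-time respondent screening at survey entry.
# NOT Qubed: simple interpretable rules stand in for trained neural networks.

from dataclasses import dataclass

@dataclass
class Panelist:
    age: int
    job_title: str
    seconds_per_question: float   # implausibly fast answers suggest a bot
    country_mismatch: bool        # IP geolocation disagrees with stated country

def fraud_score(p: Panelist) -> float:
    """Return a 0..1 risk score from simple, interpretable signals."""
    score = 0.0
    # Profile implausibility: the "19-year-old CEO" pattern from the episode.
    if p.job_title.lower() == "ceo" and p.age < 25:
        score += 0.4
    # Speeding through questions faster than a human could read them.
    if p.seconds_per_question < 1.0:
        score += 0.4
    # Claimed location inconsistent with network location.
    if p.country_mismatch:
        score += 0.3
    return min(score, 1.0)

def route(p: Panelist, threshold: float = 0.5) -> str:
    """Decide at the very start of the survey whether to admit or route out."""
    return "route_out" if fraud_score(p) >= threshold else "admit"

suspect = Panelist(age=19, job_title="CEO", seconds_per_question=0.5, country_mismatch=True)
genuine = Panelist(age=42, job_title="Teacher", seconds_per_question=8.0, country_mismatch=False)
print(route(suspect))   # route_out
print(route(genuine))   # admit
```

The design point carries over to the real system: the decision function must be cheap enough to run on every panelist at entry, so all signals are computed from data already in hand rather than from follow-up checks.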
Tom Rourke 13:23
Thanks, Caroline. And Valeria, maybe you could use some examples to show where it isn't simply about these bad actors making a living off fraudulent data, but where there's probably a darker picture again, particularly around the representation of particular groups, where opinions can be swayed at a political level by people acting in this way.
Valeria Piaggio 13:44
Exactly. Imagine when these bad actors are representing people who are already voiceless, who are underrepresented, sometimes not well understood because people don't have references. Sometimes you know that information stands as the truth.
Tom Rourke 14:05
Maybe I could also ask about governance. From a technology point of view, information security has become a huge issue, and we all talk about how it's a boardroom issue. But do you think there's a need for a broader awareness in organizations around the importance of recognizing and dealing with bad data?
Caroline Frankum 14:22
There are three things that have been changing the market research world and everyone in it. The first one is that the fight for eyeballs is more intense than it ever has been. You know, expecting a person to give up their precious time and data to fill in a survey on a topic that suits us, when it suits us, is just not the norm anymore when you're up against the likes of TikTok and Netflix. So that's the first thing. The second thing is that data privacy legislation is changing rapidly and inconsistently from one market to the next, and there are more and more new laws coming in when it comes to AI from the tech side of things, literally on a weekly basis, particularly in China right now. So we have to ensure that we collect data compliantly, irrespective of which market, and which human in which market, we're talking to, which isn't easy. And then the third thing, of course, which we've talked about, is the substantial, exponential rise in fraud: hackers coming in and really behaving badly and unfairly and misrepresenting the world's population. So the triangulation of those three things actually makes our jobs more challenging than they ever have been. I've been doing this now for 14 years, and I have to say it's never been this challenging, but also this exciting, because, as I said, you can use AI as a purpose for good if you lean into it the right way and bring that right balance between machine and human collaboration.
Tom Rourke 15:38
Valeria, maybe I can come to you. Caroline mentioned there the kind of regulatory landscape, and often we find there's a game to try and stay ahead of or keep up with all the regulatory changes in whatever field we're in, but often that regulation can be quite disjointed and not always as helpful as the intention was behind it. What's your view on regulation at the moment in terms of whether it actually does genuinely help, or are there other things we need to be thinking about, or other directions we need to be moving in?
Valeria Piaggio 16:09
Yes, one of the things we are struggling with, from a diversity, equity and inclusion perspective, is our ability to even ask about certain characteristics of respondents. In many countries, it is illegal to ask about someone's ethnicity or sexual orientation, or to even ask about sensitive social topics, to the point that it creates limitations for global studies and for understanding some of these key issues. So those regulations, even though they are meant to protect people, do create challenges for researchers and those who are meant to use that information. But certainly they are positive in the sense that they are also designed to protect people's information and people's identity.
Tom Rourke 17:04
As we look to the future, at both the developments in technology and the challenges we need to be addressing: Elena, what does progress look like from your point of view in this domain?
Dr. Elena Huff 17:16
I think it's so important for us as humans to remember that the future is about leveraging the strengths of both humans and AI to drive innovation, efficiency, and growth. But for this collaboration to thrive, companies must prioritize ethical governance, upskilling, and AI transparency, making sure that AI systems are designed and used responsibly. This balance will help ensure that AI supports human workers rather than replacing them, and it will foster a future where technology empowers people to reach new heights. In the future, progress in data management will hinge on three key areas: data governance, innovation, and ethical use. With global data volume set to grow to, I believe, 180 zettabytes by 2025, companies truly must develop stronger governance to manage this explosion responsibly. AI-driven tools will be critical for harnessing data, but ethical oversight is essential to avoid the risks of biased or bad data, and progress will depend on building systems that balance technological advancement with robust frameworks for governance, inclusivity, and sustainability to meet future challenges.
Tom Rourke 18:32
Elena, I'm smiling here because zettabyte is a new one on me, but I can assume it's a really enormous number, and it lines up with the fact that data centers have as bad a footprint as the airline industry. Thank you for that. So Caroline, maybe I could turn to you next. As you look forward, acknowledging all of the challenges that you're facing and have faced, what does progress look like from your point of view?
Caroline Frankum 18:58
I just think, you know, our culture and our practices that shape the data we collect and interpret have a really important role to play in understanding how we properly measure what matters, and also in influencing how we measure it. So it's bringing that cultural and technological piece together. For example, in my sector, we're looking at synthetic panels and how they could be used to really bring more insights at scale to people faster. If you think about using synthetic panels, you have a smaller representation of real humans, but you know that they're real people who are who they say they are, and you can then build those out in a synthetic but meaningful way to really understand how you can get education to all kids in the world, or healthcare to more people across the world. So I do think we have a really important role to play as a purpose for good here, by bringing out the benefits and the collaboration and the strengths of humans and machines together.
Tom Rourke 19:58
So perhaps if I turn last to you, Valeria. Again, you have a unique position here in terms of the focus on underrepresented groups and on the concept of inclusion. From your perspective, what does real progress look like over the next two to five years?
Valeria Piaggio 20:15
Real progress would be to use data for good, for actually helping people be better understood, and for companies to also counter the misinformation we're seeing in societies today. One of the things that really worries me these days is the amount of extreme activism that is at play and how much uncertainty it generates, how much fear, to the point that business leaders and agents of change are really challenged to respond in the world of diversity, equity and inclusion. These extreme forces are actually pressuring business leaders to abandon their support for communities that are often marginalized, and that is what I hope will not take place, because what we know is that growth for organizations and for brands will come from these underrepresented yet high-growth populations. And we also know that diversity and inclusion is very important for younger generations: for millennials, for Gen Z, and the next one, Gen Alpha. So using this kind of data to make good decisions and bring about positive social change is what I see as the opportunity to make progress.
Tom Rourke 21:54
So, Caroline, Valeria, and Elena, thank you so much for your time this afternoon discussing a fascinating subject. I certainly have learned a great deal, including the differences between dark data and bad data, and also the consequences, to both businesses and groups of individuals, of not having appropriate governance and forethought in how we gather data and from whom we gather it. Certainly in previous editions of The Progress Report we've spent a lot of time talking about ethics and responsibility in the application of AI, but the quality of the inputs to those AI processes really is the key determinant of the quality of the outcomes. So thank you all for listening to the latest episode of The Progress Report. If you've enjoyed today's episode, please be sure to like, subscribe, and share so that you don't miss any of our future conversations.