Social Science Bites

Safiya Noble on Search Engines

January 8, 2024

The work of human hands retains evidence of the humans who created the works. While this might seem obvious in the case of something like a painting, where the artist’s touch is the featured aspect, it’s much less obvious in things that aren’t supposed to betray their humanity. Take the algorithms that power search engines, which are expected to produce unvarnished and unbiased results, but which nonetheless reveal the thinking and implicit biases of their programmers.

In an age when things like facial recognition or financial software algorithms are shown to uncannily reproduce the prejudices of their creators, this may seem unsurprising. It was much less obvious earlier in the century, when researchers like Safiya Umoja Noble were dissecting search engine results and revealing the sometimes appalling material they were highlighting.

In this Social Science Bites podcast, Noble — the David O. Sears Presidential Endowed Chair of Social Sciences and professor of gender studies, African American studies, and information studies at the University of California, Los Angeles — explains her findings, insights and recommendations for improvement with host David Edmonds.

And while we’ve presented this idea of residual digital bias as something somewhat intuitive, getting here was an uphill struggle, Noble reveals. “It was a bit like pushing a boulder up a mountain — people really didn’t believe that search engines could hold these kinds of really value-laden sensibilities that are programmed into the algorithm by the makers of these technologies. Even getting this idea that the search engine results hold values, and those values are biased or discriminatory or harmful, is probably the thrust of the contribution that I’ve made in a scholarly way.”

But through her academic work, such as directing the Center on Race & Digital Justice and co-directing the Minderoo Initiative on Tech & Power at the UCLA Center for Critical Internet Inquiry, and through books like the 2018 title Algorithms of Oppression: How Search Engines Reinforce Racism, the scale of the problem and the harm it leaves behind are becoming known. Noble’s own contributions have been recognized, too: she was named a MacArthur Foundation fellow in 2021 and the inaugural NAACP-Archewell Digital Civil Rights Award winner in 2022.

To download an MP3 of this podcast, right-click this link and save. The transcript of the conversation appears below.

David Edmonds: Type ‘social science podcasts’ into Google, and I’m pleased to say that Social Science Bites appears on the front page, evidence surely that Google is a search engine to be trusted. But Safiya Noble, who teaches at UCLA, is not so sure. She’s the author of Algorithms of Oppression. Safiya Noble, welcome to Social Science Bites.

Safiya Noble: Thank you. It’s such a pleasure and an honor to be here, I’m so grateful.

Edmonds: This interview is going to focus on search engines and bias. By a search engine, we mean what exactly?

Noble: Well, you know, a search engine is really, and I think for those who are listening who remember the internet before the search engine, we will remember that there were lots of websites, web addresses, all over. The way we organized information on the internet was to build complex directories, they were often built by librarians or subject matter experts. These were curated heavily by communities of practice we could say, or hobbyists, or people who knew a lot about different kinds of things. To find information, you’d need to know a web address.

And then the search engine comes along and it’s a kind of artificial intelligence or a set of algorithms that start to index all of these links all over the web and try to cohere, or make sense of them, in some type of rank order. You can figure out, allegedly, kind of the good stuff from the junk. And the way that the dominant search engine that most of us use, which is probably Google in the Western world, or for sure in the United States, would do that was through their crawlers. Their web crawlers would look and see which sites were pointing to other sites. And of course, this was the process called hyperlinking. So you had a blog, or you had a website, and then you would have links to other people’s information or websites or web addresses.

And the process of hyperlinking was kind of allegedly like a credibility factor. It kind of says, “Well, if I’m pointing to David Edmonds’ web site, then you know you can trust it, if you trust what I have to say.” And of course, now it’s so much more sophisticated, because we have these things we call ‘search engine optimization’ cottage industries, where you can purchase keywords to help optimize and help make sure that the algorithm finds your website out of millions of potential websites that are using the same kinds of words that you’re using on your site. So now we have a huge industry, an SEO industry. But I think for everyday people, we open up a browser, there’s a plain white screen with a box, we type in some words, and we get back what we think are the very best kinds of things that you can get back. And that’s really the way people experience search engines.

Edmonds: And do we know what kinds of searches are the most common? Is there data on that? What are people looking for generally? Or is that impossible to say?

Noble: No, I don’t think that’s impossible to say. I mean, the last time I looked at what Google was reporting out (in my work, I kind of focus on Google just because they’re the largest) the most frequent kinds of information and research are health information. I know this is probably hard for you to believe, but in the United States, where we don’t have a nationalized health care system, health care is extremely precarious for most people, even if you have insurance. You find that people really use search to diagnose themselves or get medical advice or help. And that’s one of the most prevalent series of search terms that are looked for.

Edmonds: Right, how fascinating. Now, obviously, when you do a Google search, you get dozens, even hundreds of links. How many of these links do people typically scroll through? Do they get beyond the first page of search results typically?

Noble: No. We know from search engine use studies, such as the ones Pew Research periodically conducts, that the majority of people who use search engines do not go past the first page. So what happens on the first page of search results is extremely important. As far as I’m concerned, that’s the place to look and study because that’s where most people are.

Edmonds: OK, so most of your research is on search engines and bias. Give us a couple of examples of the kinds of bias that search engines throw up.

Noble: Yeah well, let me just say that even the notion of search engines being biased is probably something that I helped introduce into our lexicon [through the work of] a handful of us who studied search engines more than a decade ago. I will say that now it’s kind of more of a common sense understanding that search engines favor, let’s say, companies that pay them the most money in the AdWords system that Google in particular, but other search engines also, use. So they certainly favor larger companies over smaller businesses, unless you know the keywords to search to find that small business, like you know the name of that business. So they’re always going to kind of favor the people who paid them to be made more visible.

In my work, you know, I conducted this study over a decade ago now that was really the first look at how people, especially racialized people, or ethnic minorities, racial minorities, and women, girls, were profoundly misrepresented in search engines. I took the U.S. Census, and all of the categories, racial and ethnic categories there, and I took the gender categories available then and I just kind of crossed them and did a lot of search, you know, dozens and dozens of searches. And what I found was that Black girls, Latina girls, Asian girls, in the U.S. were almost exclusively misrepresented as pornography, or as sexual commodities.

And, of course, this opened up for me a pathway to talking about what happens when we rely upon something like a search engine to tell us about other people, other communities, ourselves. And in this way, I was arguing that these are profoundly biased toward the interests of the most powerful industries, and of course, the most powerful industry in determining what women and girls of color seem like on the internet is the porn industry. And how unfair is that, that women and girls be misrepresented so profoundly? Even conceptualizing that, I will tell you, was a bit like pushing a boulder up a mountain. People really didn’t believe that search engines could hold these kinds of really value-laden sensibilities that are programmed into the algorithm by the makers of these technologies. Even getting this idea that search engine results hold values, and those values are biased or discriminatory or harmful, is probably the thrust of the contribution that I’ve made in a scholarly way.

Edmonds: So one types in ‘Black girls’ into Google, and one gets what, a whole bunch of links to pornographic sites?

Noble: Yeah, in the first pre-study that I did before kind of a more formal study, and this would have been in 2010, was ‘hotblackpussy.com.’ That was the first thing you got when you’d Google ‘Black girls.’ And then shortly thereafter, that got moved out, and it was ‘sugaryblackpussy.com.’ So very graphic websites.

I wrote about this in 2012, because I wanted to show that even when you were looking for things, and this was like a small article in this feminist magazine called Bitch magazine, even when you looked for, let’s say, women athletes, the first things that would come up on the first page of search results would be ‘the 50 Sexiest Women Athletes,’ you know, so this like hyper sexualized way. And of course, the sexualized commodified version of us is actually what’s profitable. And that over time is actually quite dangerous in holding up stereotypes and people act upon the stereotypes that they are subtly socialized by, without even really realizing it.

Edmonds: It sounds like though it’s not exactly intentional. There’s an algorithm which links to all sorts of other sites, and maybe it’s driven by some kind of business criteria? But it’s not like Google or the other search engines are doing it deliberately to spike traffic, or have I got that wrong?

Noble: Well, it’s really hard to think about the intention of programmers. I mean, I’m not really interested in their intention, whether they meant to or didn’t mean to. You know, we always want to look at the outcome. That’s a more powerful place to actually then solve from, rather than knowing the hearts and minds of people. But I will say that the incredible lack of diversity in Silicon Valley makes it unsurprising that you would get these kinds of results, and you still have a lot of problems with search. But I will just say, if 3 percent of your workforce is Black and Latino, and you have very few women, or women leave the tech industry from just the constant bullying and harassment and terrible working conditions for them, then I think we should expect that the kinds of questions, the kinds of testing, the kinds of curiosities about the product are just not going to happen.

Edmonds: So that explains why they might be indifferent to this phenomenon. But do you think they actually benefit from it in an economic sense?

Noble: Oh, for sure. I mean, there’s no question that if you decide that the shortcut to Black girls or Latina girls or Asian girls, this is a good example. I mean, you’re imagining who your primary customer or let’s say, user, not even a customer, it’s really a user, because the relationship between the user of the product which is an everyday person, and the customers, which are the porn industry, who pay the bill to Google who paid the fees, if you look at it from that vantage point, then they have a sense of who they imagine is using their product. And that majority, so to speak, is going to win in terms of how they model their products.

So if you are part of a racialized or ethnic minority, well, you’re going to lose on the democracy like, you know, ‘majority wins’ kind of paradigm. So that’s not a helpful way of developing a product. That’s for an imagined majority, when the people themselves who are misrepresented, who might be in a minority would never really have the same amount of money to shift the results, like certain industries, or political actors or political action committees or other kinds of groups that are deeply moneyed.

Edmonds: So one thing I’ve never understood, and perhaps you can clear it up for me, if I type ‘Black girls’ into my Google search engine on my laptop, and you type ‘Black girls’ into your search engine, on your laptop, will we get the same list of search results?

Noble: Well, you know, this is a great question. And when I was writing my book, and doing these studies, I was thinking about these digital traces that we leave and how much they influence what we find, because of course we know that every time we’re moving through the world, with a smartphone in our pocket that’s linked to, you know, it’s signed into a big platform, whether it’s Facebook, or Google or YouTube, there’s going to be trackers about where we are in the world, how we move, what we’re doing, there’s going to be all the previous searching that we’ve done, if our history is logged, which for the majority of people it is. So there’s definitely a sense that the results are hyper custom to you. I mean, this is really what the public generally believes.

But the truth is, and there was this great study done by Matthew Fuller et al, where they did a mass study of thousands of searches, so they had thousands of people doing searches, and they wanted to compare, did they get the same thing or not? How much were the digital traces affecting? And what they found was that for the most part, overwhelmingly, people got the same set of results, but they might have been in different order. So there might have been a few things that were unique. But really, what’s happening when we think about customization, or personalization in search, which is kind of like a buzzword that was very popular about six or seven years ago, this was really about a message intended for the customers or the paying clients of big tech companies that we could customize and personalize. And really what they’re doing is they’re kind of aggregating people who have similar interests, and doing better targeted marketing for those paying customers to better defined groups, rather than just kind of like scattershot. So that’s about as personalized as it gets.

I will say that if you get into social media companies, I think they do a much, much more targeted kind of marketing and outreach to people than a search company really can do.

Edmonds: That’s interesting, because I think I really want my search engines not to be customized, because I would feel that if they were customized just for me, I was being manipulated in some way. Whereas if everybody else is getting the same results, then somehow I can trust the search engine more.

Noble: Yeah, I mean, I think that the kinds of, if we’re gonna call it, customization or grouping that we fall into, we take for granted and we don’t even think about and we think of as just like normal. So for example, if I do a search for pizza near me, it’s going to be pizza near me. It’s not going to be the universe of pizza. Like it’s not pizza in London. Do you know? So I think those kinds of geographic bundles, let’s say, that we live in, people take for granted as completely normal and want that. So it’s a little bit of a tension.

There’s kind of two things happening. If you look for something extremely specific, and many computer programmers who are listening know what I’m talking about, let’s say you put in a big string of code into a search engine, it’s going to show you where the code is broken. And a lot of programmers, especially in the early days, really used search in that way to go like, let me just see, like someone has already fixed this. So if you’re looking for like a passage of literature, and you put that paragraph in, it’s going to point you to the book that it came from, right, or the source material.

So on one hand, if you’re very specific, and you have a lot of information, you’re going to really get something probably quite accurate. But if you are very general, then you’re much more likely to be under the influence of who has optimized for you to see them. And the thing we know from the information retrieval literature, now I’m just being a complete nerd when I say that, but there are scholars who just look at how people retrieve information, many people really just use the fewest keywords possible, because they also believe that that’s going to give them a bigger universe of things.

Edmonds: Right. So a Black woman who’s interested in athletics, and a white male, or a male who’s interested in pornography, they type in Black athletes, and they will get a similar result.

Noble: Well, if they type in Black athletes, they are for sure gonna get mostly men, because athletes, like many words, is always qualified by gender, the default of the noun will always be male. But if you type in Black women athletes, you are likely to get I believe, you may or may not. It’s interesting, I’m pausing, because so much of my work has been kind of attended to around Black women and girls at Google. I mean, people actually know that I’m out here talking about this. And so let’s say if you typed in Latina athletes, you’re probably gonna get sexualized kinds of commodified…

Edmonds: Whoever you are.

Noble: Yeah, whoever you are, yeah.

Edmonds: This is probably a stupid question. But talk me through the harms that this produces, it probably doesn’t need spelling out. But is there any evidence that it actually entrenches stereotypes and caricatures and so on?

Noble: Yes, we do know. We know from the work of researchers who study mass media, for example, that people who are exposed to stereotypic images, especially kind of racist stereotypes and sexist stereotypes, are more likely to be desensitized toward those groups. They act as a dehumanizing factor that allows for people to be less empathetic, sympathetic. We also know from the research and here we have people like Oscar Gandy, Jr., Herman Gray, people writing about race and media images, that when stereotypes are invoked in a society, and people live and are raised and reared under constant exposure to stereotypes, racist stereotypes, that that actually also affects the political climate of resource distribution.

So of course, this is one of the things we know powerfully all over the world, that when you dehumanize a community of people, you are more willing to accept their degradation and harm against those communities. This is one of the reasons why to me, it was extremely important. Because if you have people who have low contact and exposure to people that are different than them, and use these kinds of technologies as just like, “Let me just go find something out. Let me just learn something here about this group of people,” if those representations are misrepresentative, that will do damage. And I really believe that this is one of the reasons why we’re so passionate about talking about this.

And then more broadly, kind of what does that open up? And what does that mean? That we use advertising technology companies, like fact checkers, like knowledge machines, like a replacement for the library or other kinds of learning, because we will be susceptible. And I will tell you, there’s never been a time in more than a decade of talking to people that people have not scratched their head and said, “I never thought about what I got in a search engine as being biased in any particular way.” Because it’s so banal, in the majority of ways that we use it. Going back to the pizza. I used it, it gave me the pizza joint by me, you know, I used it and it gave me the truth, or the right thing or the thing I was looking for the majority of the time. So then, when I come up against stereotypes that I may not even know are stereotypes, I don’t really think twice about it, I don’t really question it. And that, to me, is what’s so dangerous.

Edmonds: Is there evidence on trust because, as you say, I know if I type into Google, best restaurants in London, or best places to stay in Budapest, and the first page comes up, I trust it. I kind of assume that those links provide me with the best restaurants to eat in or the best places to stay in Budapest. What evidence is there that people are skeptical or that they have blind trust in their search engines?

Noble: People have a high degree of trust. In the last search engine study that I looked at by Pew, over 70 percent of people believe that search engines are trustworthy sources of information. And if you were to contrast that with something like social media companies, they would have more trust in a search engine than even in social media. Because since things like Cambridge Analytica, since the kind of political upheavals in the US and other places, the purchase of Twitter by Elon Musk, and Facebook’s Mark Zuckerberg being in front of Congress, people really understand there have been so many whistleblowers and leakages about behavioral manipulation. And even just on the face of it, people know, “I’m in the world of just the people I’m connected to, I know that that’s just gonna give me a point of view.” By contrast, that then makes search engines feel much more trustworthy and one of the ways that we know this is because we see people, especially in political contexts, like electoral politics, they turn to search engines, like fact checkers, to help them figure out what’s fact and what’s fiction. And of course, we know that search engines are highly susceptible to manipulation by political action committees and others, again, other parties that can spend a lot to optimize certain kinds of stories and messages to the front page.

Edmonds: So $64 million question, what can be done about it?

Noble: Ah, well, you know, I’ve always felt that, just like my feeling about speech, and let’s say, hate speech and other kinds of damaging speech, that you meet speech with more speech, we need more search engines, we need to disambiguate what these products are, so that if we want to shop for the best deals, we know to go to Google. I mean, listen, I use Google Flights, because I know it’s just scouring through all of the airline sales. Putting these search engines in perspective, then we would have things like if librarians and scholars and teachers were curating certain kinds of search engines, that would be something very different, it wouldn’t be commingled with advertising. So we need more search engines that have a specific use, so that we can, I think, disambiguate the advertising from the trustworthy information. Right now, what we have is something called indexing everything from propaganda to evidence-based research. It’s unhealthy for democracy, it’s unhealthy for people, it doesn’t help us get the best to use the knowledge that we need for the future. And I think if you think about overlays, new forms of search that are coming into play now, like ChatGPT, we’re going in the wrong direction. It’s making it even harder to do that disambiguation.

Edmonds: So that’s one solution, more search engines, but that’s obviously a long-term solution and Google has a dominant place in the market. Couldn’t Google just change their algorithm? How complicated could that be? To make sure that when you type in ‘Black girls,’ you don’t get the results that you’re talking about?

Noble: Well, it does do that. Now, in 2024, if you type ‘Black girls,’ you’re not going to get all the porn. But the way that that has happened has been through the work of me and others who critique these systems. Our work just gets taken up by the companies, and then they just go and start fixing things or downranking things. Or let’s say there’s political pressure from governments, they’ll downrank information, so they kind of hand curate and solve these problems. And I have feelings about that because, you know, I’m a state worker, I’m a public servant, as a professor at UCLA. And we do the kind of cleanup, if you will, raising the alarm about our concerns and the things we see. And the companies do sometimes respond. But structurally, you’re not going to undo an ad tech business and turn it into a consortium of thousands of libraries, and scholars and deep thinkers who are going to try to surface the very best knowledge and information for you as librarians. You’re just not going to turn it. It’s a different thing. It’s an ad tech company.

Edmonds: “Don’t be evil” was Google’s former motto. You obviously don’t think it’s a motto that it’s lived up to.

Noble: Well, they don’t think it’s a motto that they lived up to because they changed it! They got rid of it. So that’s not really on me that’s on them.

I really try to not talk about these things purely like in a moral sense of bad or good. I’m just pointing to the evidence and saying, what do we do with this? What I’m most worried about is an overreliance upon YouTube, and search and social media to understand our world. And the many worlds we live in: TikTok, Instagram, I mean, I could really name a number of products. Of course, we need regulation, we have a lot of evidence of harm. Now, when we have evidence of harm in any other industry, pharmaceutical, automotive, you name it, we have regulations, we have fines, penalties, sometimes those products have to come off the market, they come off the shelves. The tech industry has been very, very effective at not having their projects seen as products that can be harmful in society. And therefore they are not subject to the kind of scrutiny that we put on consumer protection from harm. But I think it’s very important that we think of them in that way and stop using just words like “innovation” and “the future” around them, because those are marketing words that they use to eschew responsibility.

You know, I say, if Exxon has an oil spill here off the coast of California, they’re absolutely responsible for the cleanup. And they have to restore back to a state better than they found it. I mean, this is really, really important. If you think about the kind of propaganda that moves through our information ecosystems and the impact that it’s having on everything, really, there has to be accountability for that. And of course, we know that many of these companies, their products are directly implicated in the rise of authoritarian regimes, anti-democratic politics. Many Silicon Valley and other Silicon Corridor leaders don’t even believe in democracy. They believe in the technocracy, [where] the whims of billionaires are actually much more important than the messy project of democracy. That’s frightening. And I think we have to come to terms with that and do something about it.

Edmonds: Safiya Noble, thank you very much indeed.

Noble: Thank you so much for inviting me.

For a complete listing of past Social Science Bites podcasts, click HERE. You can follow Bites on Twitter @socialscibites and David Edmonds @DavidEdmonds100.
