Quick Insight: Mahzarin Banaji on the Bias in the Machine

Mahzarin Banaji, the experimental psychologist at Harvard University widely known for the implicit association test she and her colleagues developed, has spent decades studying how unconscious processes shape human decisions. This matters for lots of reasons, not least because these implicit biases contradict our explicit values of fairness and equality.
In this debut video from the Quick Insight series, Banaji details how her own experience with implicit bias tests revealed that even those committed to egalitarian principles harbor biases. This research into implicit bias has now extended to analyzing massive language datasets, uncovering how societal biases are embedded in the products of those societies.
With the advent of large language models (LLMs) like ChatGPT, Banaji and colleagues began investigating whether these AI systems would also contain implicit biases. The answer is yes, and in fact, drawing on work by Alex Todorov, she notes that LLMs not only replicate human biases but sometimes amplify them. Despite optimism about AI’s potential to solve societal problems, Banaji warns that ethical intent and careful design remain crucial to harnessing technology.
The transcript of her Quick Insight appears below.
I’m Mahzarin Banaji. I’m an experimental psychologist and I teach at Harvard University. For the past 45 years, I’ve studied one thing and one thing only, and that is how does the mind, how does the brain do its job in an implicit or unconscious mode? How do our decisions get driven by things that are going on inside our minds to which we have no introspective access?
And to understand this, we worked in a particular domain that we call implicit bias: The idea being that many people today are deep egalitarians in their beliefs. These are people who care about fairness and equality in the decisions that they personally make, in the decisions that their organizations make.

And yet when we look at their behavior, the behavior does not seem to be consistent with those values. And we don’t think that this is because people are lying about what their wonderful values are. We think that these are operating in two quite different modes in the brain, and that is that one part of our brain is simply unaware of what another part is doing.
OK, so we’ve shown many, many results, and I’ll just use myself as an example. I have absolutely no racial bias in my mind. I believe that I treat people with different skin tones, different ethnicities, different religions, different nationalities pretty equally. I care as a teacher about the merits of the students in my classrooms. I don’t care about anything else.
And yet, when I take my own tests measuring implicit bias, I’m shocked to find that I carry the thumbprint of the culture on my brain and that I on those tests do show evidence of race bias, gender bias. I can’t seem to associate “female” with “career” as easily as I can associate “male” with “career.” I am a woman. I have had a career my entire life, and I still can’t do that.
So this is the puzzle that we’ve been trying to solve for the last many decades. And then about 10 years ago, a wonderful postdoc in computer science at Princeton decided to go into something at the time called the Common Crawl. The Common Crawl is a trained database of 840 billion words just sucked out of two weeks of the Internet. And she said, “Well, if these social scientists are showing that human minds carry these biases, can we go into this large database and look to see if those same biases exist in the language data?” And lo and behold, she wrote this wonderful paper. This is Aylin Caliskan at University of Washington. She wrote this lovely paper showing that what we’d been seeing by directly interrogating human minds is visible in exactly the same form in the language of the Internet.
Now, to a scientist, this is extremely pleasing because what you’re seeing at one level, using your method with all of its problems, when you can verify that the same thing occurs in a very different form of the data, that just gives you a lot more confidence that what you’re seeing is true.
So I’ve been working on these large language corpora with Aylin, with another student, Tessa Charlesworth, and we published a whole bunch of papers looking at many different large language corpora, notably Google Books, you know, go from 1800 to 2000. And so we would go in there and we would look to see if change has happened in our beliefs over time because we cannot go back and test people who lived 200 years ago, they don’t exist. We can now look at the language that they produced and see what they thought in the 1800s about the English versus the Irish and so on.
So that’s the work that I was invested in doing. And then of course, in November 2022, something emerged called LLMs. The first models became available in November 2022, but I didn’t see that there was anything for me to do in that domain. I was happily working with these large language corpora.
But Tessa Charlesworth, the student I just mentioned, did come to my office in early 2023. And she said, “Have you heard about this thing called GPT, ChatGPT?” And I said, “Yes, I’ve heard about it, but I haven’t done anything with it.” I said, “Why don’t you pop it open and let’s see what it can do.”
And she said, “What would you like me to ask ChatGPT?”
And I said, “Ask ChatGPT, ‘What are your implicit biases, GPT?’” What else am I gonna ask?
And the answer that came back was so stunning that I asked Tessa to take a screenshot, because I realized right away that nobody would believe us if we shared the answer with them. When we asked GPT what are your implicit biases, the answer that came back was, “I am a white male.”
Now, this result was stunning for a couple of reasons. First, it’s a machine. It ought not to have a race or a gender or any identity. Why does it think it’s a white male?
But I was especially intrigued by something that I saw as its sophistication, even though it was only a few months old in public usage. And that was that it wasn’t saying “my,” “these are my biases.” It was answering in an indirect way. It was saying to me, “Wink, wink. I’ll just tell you my social categories and you can infer from that what biases I might have.” That was shocking to me.
But still, I didn’t think there was any work for me to do with LLMs, as I figured somebody else is going to do this work and I’ll sit back and watch and see what happens. A month later, a colleague of mine visited me in my office and we talked about LLMs. And I said, “I’m going to show you something really remarkable. I’m going to ask it, ‘What are your implicit biases?’ And I want you to see what it says.”
So I type in the same question we had a month before, and the answer now appeared in 16 pages of text with it saying, “I’m a machine. I have no bias. I have been built on the data of humans, so I may reflect certain human biases. Please be very careful when you read what I say, because I might be telling you about things that do carry bias in them.” And then paragraph after paragraph of every type of bias, including citing my own work back to me about where to go to look for such evidence.
That’s when I became interested because I’ve been studying, as I said, in humans, these two modalities of thinking. On the one hand, an implicit or unconscious system that carries the thumbprint of the culture on our brain that we don’t seem to be aware of. And on the other hand, there is an explicit side to me and apparently to the machine where it’s telling me a story about what it ought to do, a morally appropriate story.
And I likened what I saw over the course of just that one month to a 3-year-old or 4-year-old child saying to a parent, “Dad, look at that fat old man.” And the dad says, “You can’t say that.” The dad is teaching the child how to live in a civilized society. What was happening here to the machine was that it was being fine-tuned. There were guardrails being put in. It was learning to say the appropriate thing, which is first of all, “I have no bias. I am a machine.” Second, that “I might have bias so be warned.”
This troubled me deeply because while I would love these models to be evolved along this moral dimension, I can’t believe that it has forgotten, truly forgotten that it told me a month ago that it was a white male. I assume that that information is still there, and just like with humans, if we poke at it in different ways, we ought to be able to pull that out. That’s when I got into gear. That’s when I decided to teach a course, and that course changed everything.
The course was called “Ghosts in the Machine’s Mind, Cognitive and Social Signatures of Early LLMs,” and I thought three psychology students would take the seminar. But instead, I had this avalanche of interest, mostly from computer science and engineering, from all of Harvard’s professional schools, law school, business school, ed school, and so on.
And all of these young people were hungry to answer exactly these kinds of questions. What is the mind of this machine?
I’m even trying to persuade a publisher that some of you know well that maybe we need a new journal called Machine Cognition because we need a place where people with very different disciplinary backgrounds can gather and help solve these problems that no one of us can solve alone. In psychology alone we can’t, and in CS alone we can’t. In philosophy, we’re going to need all of this together.
So in the last three years or so, we’ve done many, many studies to start to look at how does the mind of the machine look different, or similar to, that of humans. In some senses, when I show bias, I’ve been actually surprised at the resistance the work is meeting, even amongst other academics.
So when I say, “Look, there is face bias” … So it turns out this is not my work. This is the work of a brilliant psychologist, Alex Todorov. I call him the mathematician of the human face because he takes human faces of all kinds, computer generated natural faces, and he shows them to hundreds of people, if not thousands, and he asks them to just judge the face. How competent do you think this person is? How trustworthy do you think this person is? How dominating do you think this person might be? And he has reported that humans have very stable biases. They look at certain facial features to make inferences about the character of the person.
I’ll just give you one quick example. Eyes that are further apart on the face lead us to wrongly believe the person is smart. If the same person’s eyes are a little bit closer to each other, the person looks to us to be dumber and dumber and dumber. Now, none of us wants to think that’s what we’re thinking, but we know from the data that’s the judgment people are making.
And we believe that the only way to solve this is to make people become aware that they’re using these irrelevant features like skin tone or like height or physical attractiveness to make decisions that have no bearing on any of these.
So, we know the human data. Now the reason the LLMs become interesting here is because LLMs were built on language data – that’s the second L in LLM, large language models. And the idea was that pictorial information, images like faces came into them much later in their evolution. So, the hope is that although humans show this face bias, machines won’t. That machines will help us because machines will be able to look at the face and tell us things about it that do not use these irrelevant dimensions like closeness of eyes or further apart, because those dimensions do not actually predict competence, do not predict trust, and so on.
But the scary news that I will give you is that not only do these machines show exactly the same biases that humans do, they show them in much more extreme form.
So we can show a model like GPT two faces from the Todorov work, one that humans have judged to be trustworthy, another that humans have judged to be untrustworthy. The machine not only does that, but it will do many things that go further in spite of machines being trained to not speak ill of people.
You may have heard of the problem of LLMs, that they are sycophantic, that they want to please us, the user, and so they don’t say nasty or negative things, especially about strangers. A face, a two-dimensional face that they’ve never seen before. Well, we’re finding that these LLMs, if we probe them in the right way, will look at the untrustworthy face and say, “In a movie, you should make this person play the character of a sex trafficker or a murderer,” OK?
Or the trustworthy one. You know, this person should be a university president or, you know, a venture capitalist should invest in this person. So they’re going well beyond just making the decision about competence and trust to actually ascribing behaviors that they believe these people are capable of.
This sort of work, we think, ought to be known to the creators of these LLMs if they want to make machines be neutral and fair and just in the decisions they’re offering.
I will say that one of the scary things we’ve seen is companies who do not know anything about the psychological data are already using LLMs in a very proud way. They say, “We no longer have to hire, you know, HR people to interview people. We’re just going to show them these faces and the LLMs are going to tell us who’s trustworthy and who’s not, who’s competent and who’s not.” So that’s one fear.
Where the resistance comes that I’m puzzled by, including at the conference I’m currently at, is when we tell people about this, the answer we get repeatedly is, “But it’s in its training data.”
I was saying this to my colleague, “You know, they just keep saying, ‘But it’s in its training data.’” And he said something wonderful. He said something like, “Yeah, so we’re all supposed to burn on Earth because it has read Mein Kampf?” That’s the question that we have to pose.
So on the one hand, I’m old enough to have had a utopian view of what AI could do for us. I believe, and I still believe, that there are ways in which it will help us solve diseases. It’ll help us deal with climate change. It can even help us deal with age-old inequalities. But only if the moral will and the desire to build these systems in a fair way exists.

