
(Photo: Keaiyac/Flickr/CC BY 2.0)

How Far Can Twitter Reach in Good Survey Research?

By Karen Hilyard, David Broniatowski and Mark Dredze

April 1, 2015


For decades, one of the mainstay tools for survey research has been the landline telephone. A whole science and a multimillion-dollar industry are based on recruiting and surveying research participants by phone. It’s the way we measure who people will vote for, what products they buy – and our attitudes about health and medicine.

But today, fewer people have landlines, and laws that ban robotic dialing of cellphones mean they aren’t easily used for this kind of research. The people with landlines are disproportionately white, older and rural, so surveying them doesn’t lead to a sample that is representative of the entire population. But there is a place where we can find people who these landline surveys are missing: Twitter.

This article by Karen Hilyard, David Broniatowski and Mark Dredze originally appeared at The Conversation, a Social Science Space partner site, under the title “Survey research can’t capture everyone’s opinion – but Twitter can.”

Our team – a public health researcher who studies vaccine refusal, a computer scientist who develops computational tools, and a systems integration expert – is working on a method to analyze tweets in real time. We plan to analyze attitudes about vaccines to figure out how to use Twitter as a social science research tool to provide an accurate, real-time sample of a much larger population.

What do we know about people on Twitter?

Wait, you say: Aren’t people already doing research using Twitter? Yes, but at the moment most Twitter research has significant limitations.

Researchers who lack computer science expertise may find tweets difficult to gather, categorize and quantify. Social scientists, in particular, need to be able to sort people by demographics and other characteristics.

But it is not always easy to glean that kind of information about people on Twitter. At the moment, we don’t have a good method for analyzing all of the information that people put out there every day. Our research team is trying to develop a way to do just that. The idea is that in the future other researchers can use the same method to study social science questions on social media too.

What can public opinion research do for health?

Think back to the early days of the AIDS epidemic, and how critical to the prevention effort it was for researchers to understand and correct people’s misconceptions about transmission of the disease, or to identify which groups were engaging in risky behaviors so they could be approached with tailored messages.

Imagine what a skewed picture we would have had if researchers had been talking about attitudes, beliefs and practices with too many older, white people in rural areas and not enough younger people and minorities in cities.

Policymakers and public health officials depend on accurate data to make decisions, and in a crisis, researchers can’t afford to get it wrong. This is why we are trying to find a way to use Twitter to fill in the gaps in our current surveys.

Getting a representative sample

Today we know that for a survey to be accurate, it needs to poll a large enough group of people. It also needs to sufficiently represent groups of people who tend to participate less often in research, such as men compared with women, or African-Americans compared with white Americans, and whose attitudes, beliefs and behaviors are therefore under-represented in traditional research.

In the early days of social research in the 1920s and 1930s, researchers typically just found people available on the street, or mailed surveys to people whose addresses were easy to get because they had telephones, owned cars, or subscribed to magazines. These samples were often great indicators of what mostly white, affluent Americans thought, but were wildly inaccurate when it came to taking the pulse of the nation.

But a few years before World War II, pollsters like Gallup, Harris and Roper started using a new technique called sampling. This careful selection of survey participants allowed pollsters to interview a small number of people and generalize their views to accurately represent people across the country, using estimates based on statistical probability.
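To make the idea of generalizing from a small sample concrete, here is a minimal sketch of the standard margin-of-error calculation for a proportion. It assumes a simple random sample and uses the usual normal approximation; the function name and the poll figures are hypothetical and are not drawn from any survey discussed in this article.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical poll: 1,000 respondents, half of whom give a particular answer.
moe = margin_of_error(p_hat=0.5, n=1000)
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
```

Under these assumptions, roughly a thousand well-chosen respondents are enough to estimate a national proportion to within a few percentage points, which is why careful selection matters more than sheer volume.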

Like those early polls, our current survey methods aren’t that good at generating a representative sample of the population to measure opinions and attitudes. We think our approach to using social media will go a long way toward correcting this sampling problem.

Surveys need to go where people are: online

Young people and minorities are the heaviest users of social media platforms like Twitter. And groups of all ages, races and socioeconomic levels are becoming active social media users. We can use the wealth of information they produce to fill in the gaps in existing social research surveys. But first, we need to figure out how to do that.

To develop a method for using Twitter to understand public opinion in real time, our team is going to take millions of tweets related to vaccinations, such as tweets from the 2009-2010 H1N1 (swine flu) pandemic, and we’ll compare them to past survey data collected during the pandemic, such as research about vaccine attitudes.

Then we are going to compare thousands of existing survey responses about H1N1 with these tweets until we can parse out patterns in the Twitter data that resemble proven patterns from the surveys.

Using geolocation, language recognition algorithms and existing knowledge of group attitudes and narratives, we are going to match data from Twitter with responses in surveys, paying special attention to demographic groups that are well represented online. If they match, then in the future we can go straight to the quicker, cheaper Twitter to get the information we’re looking for.

We want to get to the point where we can say “Here’s what we know from surveys: that during the swine flu pandemic, for example, Hispanic parents were far more worried about the vaccine than African-American or white parents, but still vaccinated their children in much higher numbers,” and then use algorithms, language recognition software and other analytical tools to detect the same attitudes from the same group on Twitter.
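The comparison described above can be sketched in a few lines. This is an illustration only, not our team's actual pipeline: the group labels, the tweet records and the survey benchmarks below are all made up, and in practice both the demographic label and the "worried" flag would be uncertain outputs of geolocation and language recognition tools.

```python
from collections import defaultdict

# Made-up records for illustration: (inferred group, tweet expresses worry?).
classified_tweets = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]

# Made-up survey benchmark: share of each group reporting worry.
survey_share_worried = {"group_a": 0.70, "group_b": 0.30}


def share_by_group(records):
    """Return the share of tweets expressing worry within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [worried tweets, all tweets]
    for group, worried in records:
        counts[group][0] += int(worried)
        counts[group][1] += 1
    return {group: w / n for group, (w, n) in counts.items()}


def compare_to_survey(twitter_share, survey_share):
    """Return the Twitter-minus-survey gap for groups present in both sources."""
    return {
        group: twitter_share[group] - survey_share[group]
        for group in survey_share
        if group in twitter_share
    }


twitter_share = share_by_group(classified_tweets)
gaps = compare_to_survey(twitter_share, survey_share_worried)
for group, gap in sorted(gaps.items()):
    print(f"{group}: Twitter {twitter_share[group]:.2f}, gap {gap:+.2f}")
```

Small gaps across many groups and time periods would suggest the Twitter signal tracks the survey; large or inconsistent gaps would point to problems with the classification or with who is tweeting.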

Such research could complement data from existing surveys, fill in our gaps in knowledge about groups who are often under-represented in surveys, and possibly be generalized to a larger population. And once we do all that comparison, going forward we could apply the same principles about group identity and opinion formation to Twitter data evolving in real time.

What will this data be good for?

In an unfolding public health situation, we can gather and analyze that data – statistically – the same way we could analyze the much slower, much more expensive survey data. This can help public health officials understand where and how to target messages on the fly. Although we are starting with attitudes and behaviors about vaccination, our tool could be used in the same way for any other health issue.

What this means to public health researchers, and to the taxpayers who often fund their studies, is reliable data, delivered much faster and at a much lower cost than what we have now. The bonus is that, because of the current demographics of Twitter, we will also have information on those hard-to-reach younger people and racial and ethnic minorities.


Karen Hilyard is an assistant professor of health communication at the University of Georgia. David Broniatowski is an assistant professor in the School of Engineering and Applied Science at George Washington University. Mark Dredze is an assistant research professor of computer science at Johns Hopkins University.
