Using Twitter to track epidemics

Monday, April 6, 2020

Researchers at Arizona State University are harnessing the power of technology to track and predict trends in everything from disaster response to epidemic outbreaks in real time, using data collected from Twitter.

Their website, where users can see visualizations of daily and weekly flu counts by city, state and region, recently went public.

While the technology is currently being used only to track general cases of flu, researchers say it has the potential to track the spread of specific strains of illness — such as COVID-19 — as well as misinformation related to outbreaks.

“Right now, we want to test it and see how people receive it,” said Feng Wang, associate professor at the School of Mathematical and Natural Sciences at ASU’s New College of Interdisciplinary Arts and Sciences. “We wanted to make it available to the public online so that we can be aware of how it’s performing and make it better. We don’t know what future collaborations might come out of it, but we wanted to open that door.”

Using Twitter to track flu outbreaks is not entirely novel. Scientists at other institutions, including Johns Hopkins University, have also been working in this space. What distinguishes the ASU work from those efforts is that its data output is in real time, is accessible to the public and incorporates a prediction module.

Twitter is used as the source for data mining because it is the quintessential model for information sharing and diffusion. Wang and her collaborative team, which includes mathematicians, statisticians and computer scientists, first received a National Science Foundation grant to track information diffusion via Twitter in 2013. In 2017, their focus turned to flu tracking.

“Information diffusion and flu diffusion are similar,” Wang said. “So we started thinking, no matter what type of flu is going around, it’s going to bother us for a long time; and each year, you see some new type of flu. So that’s why we think this has long-lasting research potential.”

The team received another NSF grant in 2017 to fund the project “An Integrated Framework of Network Theory, Data Mining and Partial Differential Equation for Early Detection of Epidemic Outbreaks” through 2020.

The process begins with the data collection engine, which communicates with Twitter’s server to collect tweets that contain keywords related to the flu. Then the data cleaning module analyzes those tweets and filters out the ones about flu awareness, keeping only those that report actual flu cases. The next step is to geotag the remaining tweets to determine the location of flu cases.
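The collection, cleaning and geotagging steps described above can be pictured with a minimal Python sketch. It is not the ASU team's code: the keyword list, the cue phrases used to tell awareness tweets from case reports, the city-to-state lookup and the sample tweets are all hypothetical stand-ins, and a real system would likely use a trained classifier and a proper geocoder rather than these simple rules.

```python
import re
from collections import Counter

# Hypothetical keyword list; the article does not say which terms are used.
FLU_KEYWORDS = ("flu", "influenza")

# Hypothetical cue phrases for telling awareness tweets from case reports.
AWARENESS_CUES = ("flu shot", "vaccine", "flu season", "prevent")
CASE_CUES = ("i have", "i've got", "caught the", "sick with", "down with")

# Hypothetical lookup from a free-text profile location to a state code.
CITY_TO_STATE = {"phoenix": "AZ", "glendale": "AZ", "seattle": "WA"}


def mentions_flu(text):
    """Collection step: keep tweets containing a flu keyword as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return any(keyword in words for keyword in FLU_KEYWORDS)


def reports_case(text):
    """Cleaning step: drop awareness tweets, keep apparent case reports."""
    lowered = text.lower()
    if any(cue in lowered for cue in AWARENESS_CUES):
        return False
    return any(cue in lowered for cue in CASE_CUES)


def geotag(location):
    """Geotagging step: map a profile location to a state, or None if unknown."""
    lowered = location.lower()
    for city, state in CITY_TO_STATE.items():
        if city in lowered:
            return state
    return None


def count_cases_by_state(tweets):
    """Run the three steps in order and tally case tweets per state."""
    counts = Counter()
    for tweet in tweets:
        text = tweet["text"]
        if not (mentions_flu(text) and reports_case(text)):
            continue
        state = geotag(tweet.get("location", ""))
        if state is not None:
            counts[state] += 1
    return counts


# Made-up tweets for illustration only.
SAMPLE_TWEETS = [
    {"text": "I have the flu and a fever, staying home", "location": "Phoenix, AZ"},
    {"text": "Get your flu shot before flu season starts!", "location": "Phoenix, AZ"},
    {"text": "Caught the flu from my roommate", "location": "Seattle"},
    {"text": "Sick with influenza again", "location": "somewhere"},
    {"text": "I have a new bike", "location": "Glendale"},
]

if __name__ == "__main__":
    print(dict(count_cases_by_state(SAMPLE_TWEETS)))
```

In this toy run, only the first and third tweets survive all three steps: the second is an awareness message, the fourth has no recognizable location and the fifth never mentions the flu.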

From there, the mathematical modeling module employs a partial differential equation to compare the number of flu case tweets collected with the Centers for Disease Control and Prevention’s daily report of flu cases to predict weekly flu trends. All of this data is then fed into the visualization module, which plots the data on a graph or a map.
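The article does not give the equation the team uses. As a rough illustration of how a partial differential equation can project case counts forward, the sketch below assumes a generic one-dimensional reaction-diffusion model, u_t = d·u_xx + r·u·(1 − u/k), in which cases spread between neighboring regions and grow locally up to a ceiling. The regions laid out along a line, the parameter values and the function names are all hypothetical; in practice the parameters would be fitted against reference data such as the CDC reports mentioned above.

```python
def step(u, d, r, k, dt, dx=1.0):
    """Advance the case density one explicit Euler step.

    u  : list of case densities, one per region along a line (assumed layout)
    d  : diffusion rate between neighboring regions (assumed parameter)
    r  : local growth rate (assumed parameter)
    k  : ceiling on local case density (assumed parameter)
    """
    n = len(u)
    nxt = []
    for i in range(n):
        # Zero-flux boundaries: mirror the edge value so nothing leaks out.
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        diffusion = d * (left - 2 * u[i] + right) / dx**2
        growth = r * u[i] * (1 - u[i] / k)
        nxt.append(u[i] + dt * (diffusion + growth))
    return nxt


def forecast(u0, d, r, k, days, dt=0.1):
    """Project the initial densities `days` ahead.

    The explicit scheme is only stable when d * dt / dx**2 <= 0.5.
    """
    u = list(u0)
    for _ in range(round(days / dt)):
        u = step(u, d, r, k, dt)
    return u


if __name__ == "__main__":
    # Made-up starting counts for five adjacent regions.
    today = [50.0, 5.0, 0.0, 0.0, 0.0]
    week_ahead = forecast(today, d=0.5, r=0.3, k=100.0, days=7)
    print([round(v, 1) for v in week_ahead])
```

The explicit finite-difference scheme was chosen for readability; it is one of the simplest ways to solve an equation of this kind numerically and is not a claim about the method the researchers use.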

Flu tracker infographic by Alex Cabrera/ASU

ASU undergraduates are involved in nearly every aspect of the project. Many of them are able to participate through the New College Undergraduate Inquiry and Research Experiences (NCUIRE) Program, which Wang credits for providing students with the opportunity to work on meaningful research.

“Every line of code was written by undergraduate students,” she said. “It’s real industry experience because we’re building this real tool. And I’m very proud of what they do, and I think they’re proud of themselves, too, because they see what they can do. It’s not something abstract. The output is real; it’s something people can actually use.”

Applied computing senior Andrew Lamontagne has been working on the project for a few semesters, handling raw data on the back end, from collection to processing to cleaning to visualization.

“A lot of times in the classroom, problems are already half-solved,” he said. “It’s like taking a puzzle box and half the puzzle is already solved; you just have to place the rest of the pieces in. With this, you don’t even know what the puzzle is or where the pieces are. You have to go and find them and build the puzzle yourself.”

And their participation is reciprocal — Wang often uses the challenges her undergraduate researchers experience to generate course material that allows all of her students to benefit.

“Sometimes they are able to identify the difficulties and the meaningful questions better than I am,” she said. “And then I can bring that back to the classroom and prepare students much better for the job market. It’s a good model of how we can integrate undergraduate research into teaching.”

Many publications have already come out of the flu tracking project, and Wang and her team are currently working on a paper that looks at user mobility via commuter networks (where and how far people travel from home each day for work, errands and so on) to reconsider how regions are defined in relation to flu outbreaks.

Going forward, they hope to explore how the technology might help track misinformation.

“We have already built a very solid foundation to support that kind of research,” Wang said. “So the possibility is there for this tool to benefit government agencies by acting as another source of flu indicator data.”

Top photo courtesy of Pixabay