Adaptive Information Extraction from Social Media for Actionable Inferences in Public Health

Computer Science Department
Columbia University

Project Summary

Social media is a major source of non-curated, user-generated feedback on virtually all products and services. Users increasingly rely on social media to disclose serious real-life incidents, such as a case of food poisoning at a restaurant, rather than reporting them through official government channels. This valuable user-generated information, if identified reliably, may have a dramatic positive impact on critical applications related to public health (the family of applications of interest in this project) and beyond. For example, a local health department might launch an investigation of a potential foodborne illness outbreak at a restaurant if compelling evidence supporting the investigation can be inferred from social media. This project addresses fundamental research challenges associated with processing social media data to produce actionable inferences, where the output of the process leads to concrete actions in the real world. In addition to producing broadly applicable research results, the project has as its centerpiece a critical public health application, namely, detecting and acting on foodborne illness outbreaks in restaurants.

Overall, this project develops (1) strategies for entity-centric modeling and selection of social media, to cover the vast volumes of user-produced content across sources; (2) non-traditional information extraction strategies over informal, noisy, and ungrammatical text, as well as learning-based approaches to produce actionable, entity-centric inferences for public health applications; and (3) methods for general online active learning and search that are tuned for detecting the rare occurrences required for actionable inferences. Furthermore, this project centers on (4) an application, detecting and acting on foodborne illness outbreaks, in a collaboration between Columbia University and the New York City Department of Health and Mental Hygiene (DOHMH). This collaboration provides a robust, real-world platform for a continuous, end-to-end evaluation of the novel research results as applied to a large-scale data science problem, a rare opportunity in the evaluation of Computer Science research. The collaboration also includes the development and deployment of a system with a direct impact on public health and society. A proof-of-concept prototype is already in use at DOHMH and has helped identify and act on several previously unknown outbreaks. The public health findings from the project are shared across governmental agencies, following DOHMH's best practices. Developed code and annotated datasets will be shared with other researchers and agencies.

Acknowledgments: This research is supported by the National Science Foundation under Grant No. IIS-15-63785. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Our colleagues at the New York City DOHMH are supported by the Alfred P. Sloan Foundation under Grant No. G-2015-14017, managed by the Fund for Public Health in New York, Inc.

We thank Yelp for generously providing us with access to its raw feed of business reviews for New York City and Los Angeles County.

People at Columbia

Collaborators at the New York City Department of Health and Mental Hygiene

Collaborators at the Los Angeles County Department of Public Health

Publications

Datasets

Presentations

Press, Press Releases, Tweets, etc.

Code

X (formerly Twitter) Account, Etc.


Last updated: February 2, 2024