Adaptive Information Extraction from Social Media
for Actionable Inferences in Public Health
Computer Science Department
Columbia University


Project Summary

Social media is a major source of non-curated, user-generated feedback on virtually all products and services. Users increasingly rely on social media to disclose serious real-life incidents, such as a food poisoning incident at a restaurant, rather than reporting them through official communication channels. This valuable user-generated information, if identified reliably, may have a dramatic positive impact on critical applications related to public health (the family of applications of interest in this project) and beyond. For example, a local health department might launch an investigation of a potential foodborne disease outbreak at a restaurant if compelling evidence supporting the investigation can be inferred from social media. This project addresses fundamental research challenges associated with processing social media to produce actionable inferences, where the output of the process leads to concrete actions in the real world. In addition to producing broadly applicable research results, the project has as its centerpiece a critical public health application, namely, detecting and acting on foodborne disease outbreaks in restaurants.

Overall, this project develops (1) strategies for entity-centric modeling and selection of social media, to cover the vast volumes of user-produced content across sources; (2) non-traditional information extraction strategies over informal, noisy, and ungrammatical text, as well as learning-based approaches to produce actionable, entity-centric inferences for public health applications; and (3) methods for general online active learning and search that are tuned for detecting the rare occurrences required for actionable inferences. Furthermore, this project centers on (4) an application, detecting and acting on foodborne disease outbreaks, in a collaboration between Columbia University and the New York City Department of Health and Mental Hygiene (DOHMH). This collaboration provides a robust, real-world platform for a continuous, end-to-end evaluation of the novel research results as applied to a large-scale data science problem, a rare opportunity in the evaluation of Computer Science research. The collaboration includes the development and deployment of a system with a direct impact on public health and society. A proof-of-concept prototype is already in use at DOHMH and has helped identify and act on several previously unknown outbreaks. The public health findings from the project are shared across governmental agencies, following DOHMH's best practices. Developed code and annotated datasets will be shared with other researchers and agencies.

Acknowledgments: This research is supported by the National Science Foundation under Grant No. IIS-15-63785. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.


At Columbia:

  • Luis Gravano (PI)
  • Daniel Hsu (Co-PI)

  • Tom Effland (PhD student)
  • Lampros Flokas (PhD student)
  • Yogesh Garg (MS student)
  • Mohip Joarder (MS student, graduated)
  • Anna Lawson (undergraduate student, graduated)
  • Alden Quimby (undergraduate student, graduated)
  • Vipul Raheja (MS student, graduated)
  • Henri Stern (undergraduate student, graduated)

Collaborators at the New York City Department of Health and Mental Hygiene:

  • Sharon Balter
  • Katelynn Devinney
  • Vasudha Reddy
  • and many others

