is a major source for non-curated, user-generated feedback on
virtually all products and services. Users increasingly rely
on social media to disclose serious real-life incidents, such
as a case of food poisoning at a restaurant, rather than
reporting them through official channels. This valuable
user-generated information, if identified reliably, may have a
dramatic positive impact on critical applications related to
public health---the family of applications of interest in this
project---and beyond. For example, a local health department
might launch an investigation of a potential foodborne disease
outbreak at a restaurant if compelling evidence supporting the
investigation can be inferred from social media. This project
addresses fundamental research challenges associated with
processing social media to produce actionable inferences,
where the output of the process leads to concrete actions in
the real world. In addition to producing broadly applicable
research results, the project has as its centerpiece a
critical public health application, namely, detecting and
acting on foodborne disease outbreaks in restaurants.
Overall, this project develops (1) strategies for entity-centric modeling and selection of social media, to cover the vast volumes of user-produced content across sources; (2) non-traditional information extraction strategies over informal, noisy, and ungrammatical text, as well as learning-based approaches to produce actionable, entity-centric inferences for public health applications; and (3) methods for general online active learning and search that are tuned for detecting the rare occurrences required for actionable inferences. Furthermore, this project centers on (4) an application, detecting and acting on foodborne disease outbreaks, in a collaboration between Columbia University and the New York City Department of Health and Mental Hygiene (DOHMH). This collaboration provides a robust, real-world platform for continuous, end-to-end evaluation of the novel research results as applied to a large-scale data science problem, a rare opportunity in the evaluation of Computer Science research. The collaboration includes the development and deployment of a system with a direct impact on public health and society. A proof-of-concept prototype is already in use at DOHMH and has helped identify and act on several previously unknown outbreaks. The public health findings from the project are shared across governmental agencies, following DOHMH's best practices. Code and annotated datasets developed in the project will be shared with other researchers and agencies.
Acknowledgments: This research is
supported by the National
Science Foundation under Grant No. IIS-15-63785. Any opinions, findings,
and conclusions or recommendations expressed in this
material are those of the authors and do not necessarily
reflect the views of the National Science Foundation.
- Luis Gravano (PI)
- Daniel Hsu (Co-PI)
- Tom Effland (PhD student)
- Lampros Flokas (PhD student)
- Yogesh Garg (MS student)
- Mohip Jorder (MS student, graduated)
- Anna Lawson (undergraduate student, graduated)
- Alden Quimby (undergraduate student, graduated)
- Vipul Raheja (MS student, graduated)
- Henri Stern (undergraduate student, graduated)
Collaborators at the New York City Department of Health and Mental Hygiene
- Sharon Balter
- Katelynn Devinney
- Vasudha Reddy
- and many others
- Kernel Ridge vs. Principal Component Regression: Minimax Bounds and Adaptability of Regularization Operators, L. Dicker, D. Foster, and D. Hsu, in Electronic Journal of Statistics, vol. 11, no. 1, pages 1022-1047, 2017.
- Correspondence Retrieval, A. Andoni, D. Hsu, K. Shi, and X. Sun, in Thirtieth Annual Conference on Learning Theory, 2017.
- Using Online Reviews by Restaurant Patrons to Identify Unreported Cases of Foodborne Illness — New York City, 2012–2013, C. Harrison, M. Jorder, H. Stern, F. Stavinsky, V. Reddy, H. Hanson, H. Waechter, L. Lowe, L. Gravano, and S. Balter, in Centers for Disease Control and Prevention Morbidity and Mortality Weekly Report (CDC MMWR), vol. 63, no. 20, pages 441-445, May 2014.
- Detecting Foodborne Disease Outbreaks Using Social Media (demonstration), F. Psallidas, L. Gravano, and many others, in NYC Media Lab's Annual Summit, 2014.
- Information Extraction from Social Media for Public Health, N. Elhadad, L. Gravano, D. Hsu, S. Balter, V. Reddy, and H. Waechter, in KDD at Bloomberg Workshop, Data Frameworks Track (KDD 2014), 2014.