With a federated data mesh infrastructure that allows access to high volumes of rich, interoperable data, a modernized public health surveillance system can deploy advanced analytics and novel technologies to optimize efficiency – all at sufficient scale to produce accurate, real-time insights.
1. Using natural language processing to analyze complex, unstructured data
A tremendous volume of valuable health data is buried in imaging files, lab reports, and clinical notes. Relatively recent advances in natural language processing (NLP) make it possible to analyze these types of unstructured data.
NLP enables computer systems to understand and interpret human language through topic modeling, sentiment analysis, and other techniques. By capturing complex linguistic relationships, NLP goes well beyond keyword searches to identify common themes or attitudes towards a particular topic from medical record notes, as well as social media data and other large, unstructured data sets.
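To make the contrast with keyword search concrete, here is a minimal sketch of lexicon-based sentiment scoring with simple negation handling. The word lists, the sample posts, and the function names (`tokenize`, `sentiment_score`, `label`) are hypothetical and chosen only for illustration. Production systems would typically rely on trained language models rather than a hand-written lexicon.

```python
import re

# Hypothetical toy lexicon, for illustration only; real systems learn these
# associations from data instead of relying on a hand-written word list.
POSITIVE = {"safe", "effective", "grateful", "relieved", "trust"}
NEGATIVE = {"afraid", "worried", "dangerous", "unsafe", "distrust"}
NEGATORS = {"not", "no", "never", "isn't", "don't"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum word polarities, flipping a word's sign if it directly follows a negator."""
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    return score


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample posts; a keyword search for "safe" would match the first two alike.
posts = [
    "The vaccine is safe and effective",
    "This vaccine is not safe",
    "The clinic opens at nine",
]
for post in posts:
    print(label(post), "|", post)
```

A keyword search for "safe" cannot tell the first two posts apart, while even this crude scorer assigns them opposite labels. The negation rule only looks one word ahead, which hints at why models that capture longer-range linguistic relationships perform better on real text.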
In recent years, the performance of NLP has improved significantly through what’s known as transfer learning – that is, taking a model already trained on one task and adapting it to a related task. Massive pre-trained language models such as Google’s BERT and OpenAI’s GPT-3 are driving the state of the art across the full range of NLP’s capabilities, enabling the development of more powerful models with less training data and fewer computing resources.
To date, public health researchers have successfully employed NLP models to monitor flu-like symptoms mentioned on Twitter, identify public sentiment related to COVID-19, and pursue other exciting studies. These applications only begin to scratch the surface of NLP’s potential – particularly when combined with a federated data infrastructure and extended interoperability – to revolutionize how public health surveillance is conducted on a national scale.
2. Large-scale modeling for robust, scenario-based insights
Agent-based modeling (ABM) is a computational method for simulating actions and interactions between people and their environment. Public health researchers use ABM to model disease transmission, social influences on health, and health behavior outcomes, and to evaluate the efficacy of interventions.
The utility of ABM depends on how well the environment and rules that govern agent behavior are understood. With more and better data, ABM simulations can be used to model increasingly complex scenarios.
For example, public health officials could:
- Examine the impact of immunization and introduction of new variants on community spread
- Identify at-risk populations
- Detect hotspots and conditions that promote the spread of disease
- Proactively evaluate the efficacy and impact of prevention and control strategies
Powered by sufficiently rich data such as demographics, social determinants, vaccination status, and geographic and other environmental data, sophisticated agent-based models can predict risk and outcomes, allowing agencies to effectively allocate resources in the interest of public health.
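As a rough illustration of the first scenario in the list above, the following is a minimal agent-based sketch of how vaccination coverage might affect community spread. It assumes a simple susceptible/infectious/recovered progression with random mixing. The function name `simulate` and every parameter value (population size, contact rate, transmission probability, vaccine efficacy) are hypothetical and not calibrated to any real disease or data set.

```python
import random


def simulate(n_agents=500, vaccinated_fraction=0.0, vaccine_efficacy=0.9,
             contacts_per_day=8, transmission_prob=0.05, infectious_days=5,
             initial_infected=5, days=120, seed=0):
    """Run one SIR-style agent-based simulation; return how many agents were ever infected."""
    rng = random.Random(seed)
    # Each agent has a state: "S" susceptible, "I" infectious, "R" recovered.
    state = ["S"] * n_agents
    vaccinated = [rng.random() < vaccinated_fraction for _ in range(n_agents)]
    days_left = [0] * n_agents
    # Seed the outbreak with a few infectious agents, regardless of vaccination.
    for i in rng.sample(range(n_agents), initial_infected):
        state[i] = "I"
        days_left[i] = infectious_days
    ever_infected = initial_infected

    for _day in range(days):
        infectious = [i for i in range(n_agents) if state[i] == "I"]
        if not infectious:
            break  # the outbreak has died out
        newly_infected = set()
        for i in infectious:
            # Random mixing: each infectious agent meets a few random agents per day.
            for _contact in range(contacts_per_day):
                j = rng.randrange(n_agents)
                if state[j] != "S" or j in newly_infected:
                    continue
                p = transmission_prob
                if vaccinated[j]:
                    p *= 1.0 - vaccine_efficacy  # vaccination lowers infection risk
                if rng.random() < p:
                    newly_infected.add(j)
            # Count down this agent's infectious period.
            days_left[i] -= 1
            if days_left[i] == 0:
                state[i] = "R"
        # New infections become infectious starting the next day.
        for j in newly_infected:
            state[j] = "I"
            days_left[j] = infectious_days
        ever_infected += len(newly_infected)
    return ever_infected


# Compare average outbreak size across hypothetical coverage levels.
for coverage in (0.0, 0.4, 0.8):
    runs = [simulate(vaccinated_fraction=coverage, seed=s) for s in range(5)]
    print(f"coverage={coverage:.0%}  mean ever infected={sum(runs) / len(runs):.1f}")
```

Averaging over several seeds matters because a single stochastic run can die out early by chance. A more realistic model would replace random mixing with contact structure drawn from the demographic, geographic, and social-determinant data described above.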