AI Foundation Models for Epidemic Intelligence: Integrating Heterogeneous Surveillance Streams through Pre-Training on Historical Outbreak Data

Authors

  • Dr. I Carol, Department of IT, St. Joseph's College, Trichy

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V6I4P137

Keywords:

Epidemic Intelligence, Foundation Model, Pre-Training, Multimodal Surveillance, Federated Learning, Outbreak Forecasting, Genomic Surveillance, Wastewater Epidemiology, Zero-Shot Transfer, Pandemic Preparedness

Abstract

Foundation models, pre-trained on massive and diverse datasets and subsequently fine-tuned for specific downstream tasks, have transformed natural language processing, computer vision, and molecular biology. Epidemic intelligence requires the integration of heterogeneous surveillance streams: clinical case time series, pathogen genomic sequences, human mobility patterns, environmental surveillance signals, and behavioural and social indicators. The domain therefore presents a compelling opportunity for the foundation model paradigm, yet no concrete pre-training methodology for epidemic foundation models has been proposed. This paper introduces the Epidemic Foundation Model, a novel pre-training approach that learns generalised representations of disease dynamics from historical outbreak data spanning more than forty diseases, six decades, and five surveillance modalities. It uses a federated pre-training architecture that assembles cross-border training data through privacy-preserving gradient aggregation rather than data centralisation. The pre-training methodology employs five complementary objectives: within-stream masked prediction, cross-stream masked prediction, outbreak boundary detection, geographic spread prediction, and cross-disease transfer. The resulting shared model can be fine-tuned for outbreak forecasting, variant detection, cross-border alert generation, and intervention decision support. Empirical evidence from deployed multimodal federated surveillance systems shows that integrating all five surveillance streams yields detection lead times substantially longer than single-stream baselines. This motivates a foundation model that learns these cross-stream correlations from historical data rather than rediscovering them from scratch for each new pathogen. The proposed framework is evaluated through leave-one-disease-out cross-validation on forty-two historical outbreaks, demonstrating zero-shot forecasting capability for held-out diseases that substantially outperforms epidemiological model baselines.
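
The leave-one-disease-out protocol named in the abstract can be illustrated with a minimal Python sketch. The function name `leave_one_disease_out`, the `(disease, outbreak_id)` record layout, and the toy records are assumptions made for illustration only; they are not taken from the paper, whose actual data pipeline is not described on this page.

```python
from collections import defaultdict


def leave_one_disease_out(outbreaks):
    """Yield (held_out_disease, train_ids, test_ids) splits.

    `outbreaks` is an iterable of (disease, outbreak_id) pairs.
    Every outbreak of the held-out disease goes to the test set, so the
    model never sees that disease during training (the zero-shot setting).
    """
    by_disease = defaultdict(list)
    for disease, outbreak_id in outbreaks:
        by_disease[disease].append(outbreak_id)

    for held_out in sorted(by_disease):
        test_ids = list(by_disease[held_out])
        train_ids = [
            outbreak_id
            for disease in sorted(by_disease)
            if disease != held_out
            for outbreak_id in by_disease[disease]
        ]
        yield held_out, train_ids, test_ids


# Hypothetical placeholder records, not real outbreak data.
toy_outbreaks = [
    ("disease_a", "a-1"),
    ("disease_a", "a-2"),
    ("disease_b", "b-1"),
    ("disease_c", "c-1"),
]

splits = list(leave_one_disease_out(toy_outbreaks))
for held_out, train_ids, test_ids in splits:
    print(held_out, "train:", train_ids, "test:", test_ids)
```

Grouping by disease rather than by individual outbreak is the point of the design: holding out single outbreaks would still let the model train on other outbreaks of the same pathogen, which would not test transfer to an unseen disease.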

Published

2025-12-30

Section

Articles

How to Cite

1. I C. AI Foundation Models for Epidemic Intelligence: Integrating Heterogeneous Surveillance Streams through Pre-Training on Historical Outbreak Data. IJAIBDCMS [Internet]. 2025 Dec. 30 [cited 2026 Jun. 13];6(4):312-8. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/565