University of Pittsburgh · VA Pittsburgh

Soumik Purkayastha

Assistant Professor, Department of Biostatistics & Health Data Science, University of Pittsburgh School of Public Health

Research Biostatistician, Center for Healthcare Evaluation, Research, and Promotion (CHERP), VA Pittsburgh Healthcare System

I develop statistical and AI methods for discovering structure, quantifying association, and extracting safety-relevant signal from complex health data. My methodological work spans weighted Bayesian network learning, information-theoretic association estimation, and federated inference. With colleagues at CHERP, I co-develop clinical NLP for medical device safety surveillance and serve as statistician/co-investigator on Veteran-focused health services research projects examining medication initiation, social risks, and equity in care.

i

Graphical models for causality and association in complex biomedical data

I develop graphical models for learning causal structure and quantifying directional association under realistic biomedical data conditions: survey weighting, mixed-type variables, and ordinal outcomes, without rigid parametric assumptions. Recent work spans copula-based mutual information, generative exposure models with cross-fitting inference, and weighted ordinal Bayesian networks.

structure learning · mutual information · ordinal Bayesian networks · cross-fitting

ii

AI models for medical device safety

With colleagues at VA Pittsburgh, I co-develop AI models for surveillance of medical-device adverse-event reports — combining rule-based classifiers for auditable extraction of known risks with deep-learning and unsupervised methods for detecting emerging patterns. The current focus is insulin pumps and continuous glucose monitors, with ongoing extensions to broader Veteran-use device safety infrastructure.

adverse-event surveillance · supervised classification · deep learning · trustworthy AI

iii

Veteran health services research

I serve as the statistician on a portfolio of research at CHERP examining medication initiation patterns, social determinants of health, and equity in care among Veterans. Active threads include alcohol use disorder treatment, guideline-directed therapy for heart failure, social risks among sexual and gender minority Veterans, and HPV vaccination.

causal inference · survey design · equity · pharmacoepidemiology

I am working on federated generalized estimating equations for distributed health data; survey-weighted ordinal Bayesian network learning for Veteran mental-health surveys; and the impact of declining drug-overdose mortality on deceased-donor organ transplantation.

Fast and consistent copula-based nonparametric estimator of mutual information.
R · GitHub
Bivariate causal discovery via the generative exposure model.
R · GitHub
Survey-weighted, covariate-adjusted ordinal Bayesian network learning.
R · in development
Federated generalized estimating equations for multicenter longitudinal data.
R · in development
SEIRfansy
Extended SEIR model with false-negative correction and symptom-based testing.
R · CRAN
Apr 2026
Work on medications for alcohol use disorder among hospitalized Veterans accepted at Annals of Internal Medicine.
Feb 2026
Paper on minimum Bregman divergence inference accepted at Mathematics.
Jan 2026
Paper on quantification and cross-fitting inference under generative exposure mapping models accepted at Statistica Sinica.
Dec 2025
Work on racial, ethnic, and sex differences in social-needs concordance among Veterans accepted at JAMA Network Open.