Research
If interested, please refer to Qiuran’s CV or contact me for more details.
(Ongoing work) LLM Feature Selection
—
We propose a method that fine-tunes LLMs to pinpoint key input variables. We partition the prompt into predefined feature groups (e.g., demographics, genetics, MRI features), each with a learnable scaling parameter in [0, 1]. Under sparsity-inducing regularization (L1 or entropy), features whose parameters shrink toward zero are replaced by special ⟨PAD⟩ tokens, so the model focuses on clinically impactful data while preserving predictive fidelity for AD outcomes.
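The gating mechanism can be sketched in a few lines. This is an illustrative toy, not the actual training code: the group names, logit values, and threshold are made up, and in practice the logits would be optimized jointly with the LLM loss plus the sparsity penalty.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical learnable logits, one per predefined feature group.
group_logits = {"demographics": 2.0, "genetics": 1.5, "mri": -4.0}

def l1_penalty(logits):
    # Gates lie in (0, 1), so the L1 penalty is simply their sum.
    return sum(sigmoid(v) for v in logits.values())

def mask_prompt(groups, logits, pad_token="<PAD>", threshold=0.1):
    # Feature groups whose gate fell below the threshold are replaced
    # by the pad token before the prompt reaches the LLM.
    return {name: (text if sigmoid(logits[name]) >= threshold else pad_token)
            for name, text in groups.items()}

groups = {"demographics": "age=72, sex=F",
          "genetics": "APOE4=1",
          "mri": "hippocampal_volume=3.1"}
masked = mask_prompt(groups, group_logits)
# sigmoid(-4.0) ≈ 0.018 < 0.1, so the MRI group is padded out here.
```

During fine-tuning, the L1 penalty pushes uninformative gates toward zero, and the masking step is what forces the model's predictions to rely only on the surviving groups.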
(Ongoing work) Generative AI Assisted Response Adaptive Factorial Designs
—
This project proposes a generative AI-assisted, response-adaptive factorial design framework to identify the most effective combinations of health intervention components. By adaptively learning from participant responses and optimizing experimental allocation, the method efficiently identifies impactful strategies while protecting vulnerable populations through in-silico experimentation. The design enables personalized, ethical, and data-efficient evaluation of interventions in resource-constrained settings. See poster.
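One common way to make allocation response-adaptive is Thompson sampling over the factorial arms; the sketch below is a generic illustration under assumed numbers (a hypothetical 2×2 design with made-up response rates), not the project's actual generative-AI framework.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Hypothetical 2x2 factorial design: two binary intervention components.
arms = list(product([0, 1], repeat=2))
true_p = {(0, 0): .30, (1, 0): .45, (0, 1): .40, (1, 1): .60}  # unknown in practice

# Beta(1, 1) priors on each arm's response probability.
alpha = {a: 1.0 for a in arms}
beta = {a: 1.0 for a in arms}

for _ in range(500):
    # Thompson sampling: allocate the next (in-silico) participant to
    # the arm with the highest sampled response probability.
    draws = {a: rng.beta(alpha[a], beta[a]) for a in arms}
    a = max(draws, key=draws.get)
    y = rng.random() < true_p[a]  # simulated participant response
    alpha[a] += y
    beta[a] += 1 - y

# Posterior-mean estimate of the best component combination.
best = max(arms, key=lambda a: alpha[a] / (alpha[a] + beta[a]))
```

Adaptive allocation of this flavor concentrates sample size on promising combinations, which is what makes the design data-efficient in resource-constrained settings.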
(Ongoing work) AI-driven generative digital twin cohort to emulate communication and behavioral dynamics in real-world ADRD patients
—
We built a platform of digital twins trained on video recordings that capture the linguistic, emotional, and behavioral nuances typifying ADRD symptoms. Our aims are (1) to mimic real-world interactions between caregivers and ADRD patients; (2) to provide better caregiver training; and (3) to run experiments on the created digital twins. We also aim to validate the digital twins so that they yield trustworthy results. See the current Platform.
An Enhanced Language Model for Predicting Alzheimer's Disease Pathology
—
We investigated different serialization formats (e.g., Markdown, plain text, feature-wise, and visit-wise) for longitudinal tabular data from ADNI, HABS-HD, and POINTER as LLM inputs, and used LoRA to fine-tune Llama 3 and Llama 3.1 for Alzheimer's disease pathology outcome prediction. The resulting model, ADLLM, outperforms existing ML models on the external A4 cohort. See preprint.
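The feature-wise vs. visit-wise distinction can be shown on a toy record. This is a generic sketch of the two layouts, not ADLLM's actual preprocessing; the visit labels and feature names are invented.

```python
# Hypothetical longitudinal record: visits mapped to measured features.
record = {
    "baseline": {"MMSE": 28, "hippocampal_volume": 3.4},
    "month_12": {"MMSE": 26, "hippocampal_volume": 3.2},
}

def visit_wise(record):
    """One line per visit, all features of that visit inline."""
    return "\n".join(
        f"{visit}: " + ", ".join(f"{k}={v}" for k, v in feats.items())
        for visit, feats in record.items()
    )

def feature_wise(record):
    """One line per feature, tracing its trajectory across visits."""
    features = sorted({k for feats in record.values() for k in feats})
    return "\n".join(
        f"{k}: " + ", ".join(f"{visit}={feats[k]}"
                             for visit, feats in record.items() if k in feats)
        for k in features
    )
```

Either string (optionally wrapped in Markdown table syntax) then becomes the LLM prompt; the choice of layout changes which temporal patterns sit close together in the token sequence.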
Mediation Analysis with Mendelian Randomization and Efficient Multiple GWAS Integration.
—
We used structural equations to model the relationships among the exposure, mediator, and outcome based on the causal diagram. We designed a three-step procedure for mediation analysis that integrates multiple GWAS, using joint rerandomization and Rao-Blackwellization to eliminate measurement-error bias, the winner's curse, the loser's curse, and imperfect IV selection. See preprint; links to code and package.
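As a toy illustration of one bias the procedure addresses, the simulation below shows the winner's curse: instrument effect estimates that pass a significance threshold overstate the true effects. All numbers are made up, and this is only the problem setup, not the rerandomization/Rao-Blackwellization correction itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated GWAS: true SNP-exposure effects plus estimation noise.
n_snps, se = 10_000, 0.02
true_beta = rng.normal(0.0, 0.05, n_snps)          # true effects
beta_hat = true_beta + rng.normal(0.0, se, n_snps)  # GWAS estimates

# Select instruments by a z-score threshold, as a naive pipeline would.
selected = np.abs(beta_hat / se) > 3.0

# Among selected SNPs, estimates exceed the truth on average:
bias = float(np.mean(np.abs(beta_hat[selected]) - np.abs(true_beta[selected])))
```

Because selection conditions on the noisy estimate being large, the noise is systematically tilted upward among the survivors; rerandomized selection with Rao-Blackwellized estimates removes exactly this conditioning bias.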
On the Theoretical Investigation of Mediation Analysis with Mendelian Randomization and Summary Data.
—
We provide a rigorous statistical analysis of two popular existing frameworks for conducting mediation analysis with Mendelian randomization. See preprint.
Benchmark of different QTL pipelines (including isoform-QTL, eQTL, and splicing-QTL).
—
We compared the performance of RSEM, Kallisto, Cufflinks, Salmon + FastQTL, eQTL, and LeafCutter on simulated datasets. We empirically demonstrated that isoform-QTL pipelines outperform all others; among them, Cufflinks performs best in terms of power and false discovery rate. See slides and preprint.
GMS training framework and WMMLP.
—
We constructed the weighted multiplicative MLP (WMMLP) in PyTorch based on a Taylor expansion of M-estimators, and used neural networks to solve the M-estimation problem in bootstrap and cross-validation contexts. See final summer research report.