Research

If interested, please refer to Qiuran’s CV or contact me for more details.

(Ongoing work) Nonfaithful Query in Lyrics Retrieval — A personal-interest project that I am currently exploring.
(Ongoing work) Heterogeneity-Aware Fine-Tuning — Fine-tuning with heterogeneity-aware adapters.
(Ongoing work) Generative AI-Assisted Response-Adaptive Factorial Designs — This project proposes a generative AI-assisted, response-adaptive factorial design framework to identify the most effective combinations of health intervention components. By adaptively learning from participant responses and optimizing experimental allocation, the method efficiently identifies impactful strategies while protecting vulnerable populations through in-silico experimentation. The design enables personalized, ethical, and data-efficient evaluation of interventions in resource-constrained settings. See poster.
TERRA: Transformer-Enabled Recursive R-learner — TERRA is a novel framework for estimating longitudinal heterogeneous treatment effects (HTEs) by combining the causal structure of structural nested mean models (SNMMs) with the representational power of Transformers. Many real-world settings—clinical interventions, digital experimentation, and marketing attribution—feature repeated treatments over time, where effects exhibit carryover, time-varying heterogeneity, and post-treatment bias. Standard HTE methods are not designed for these challenges; TERRA addresses them directly. See preprint.
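As background, the single-time-point R-learner that TERRA generalizes estimates heterogeneous effects via residual-on-residual regression. Below is a minimal sketch with illustrative synthetic data and linear nuisance models (not the TERRA implementation, which handles repeated treatments and uses Transformer-based effect models):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
e_true = 1 / (1 + np.exp(-X[:, 0]))                  # true treatment propensity
W = rng.binomial(1, e_true).astype(float)            # binary treatment
tau = 1.0 + X[:, 1]                                  # heterogeneous effect tau(x)
Y = X @ np.array([0.5, -0.3, 0.2]) + tau * W + rng.normal(size=n)

# Stage 1: nuisance estimates (cross-fitting omitted for brevity).
m_hat = LinearRegression().fit(X, Y).predict(X)               # E[Y | X]
e_hat = LogisticRegression().fit(X, W).predict_proba(X)[:, 1]  # E[W | X]

# Stage 2: residual-on-residual regression, with tau(x) assumed linear here.
y_res, w_res = Y - m_hat, W - e_hat
Z = np.column_stack([w_res, X * w_res[:, None]])     # (W - e_hat) * [1, x]
fit = LinearRegression(fit_intercept=False).fit(Z, y_res)
tau_hat = np.column_stack([np.ones(n), X]) @ fit.coef_
```

Thanks to Neyman orthogonality, the second-stage fit is first-order insensitive to errors in the nuisance models; TERRA nests this idea recursively across time points within the SNMM structure.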
Feature Selection in LLMs — We are interested in selecting variable groups from serialized (semi-)structured input, motivated by data availability and interpretability. We partition the prompt into predefined groups (e.g., demographics, genetics, MRI features), each with a learnable scaling parameter in [0, 1]. Under sparsity-inducing regularization (L1 or entropy), unimportant groups are replaced by special tokens (⟨PAD⟩), so the model focuses on clinically impactful data while preserving predictive fidelity for AD outcomes. We apply two layers of smoothing: first relaxing the combinatorial search over selection indicators to Bernoulli sampling, then using a gradient estimator to approximate the gradient of the discrete sampling step, enabling stochastic gradient search.
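The two-layer smoothing can be sketched on a toy linear problem with a straight-through gradient estimator, one common choice for backpropagating through Bernoulli samples (all data, hyperparameters, and the zero-masking stand-in for ⟨PAD⟩ replacement here are illustrative, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 feature groups of 2 columns each; only group 0 is predictive.
n, groups = 500, [(0, 2), (2, 4), (4, 6)]
X = rng.normal(size=(n, 6))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)

w = np.zeros(6)            # downstream predictor weights
theta = np.zeros(3)        # one gate logit per feature group
lr_w, lr_t, lam = 0.1, 0.5, 0.05
sigmoid = lambda t: 1 / (1 + np.exp(-t))

for step in range(2000):
    p = sigmoid(theta)
    g = (rng.uniform(size=3) < p).astype(float)   # Bernoulli gate sample
    mask = np.repeat(g, 2)                        # expand gates to columns
    Xm = X * mask                                 # masked input (stand-in for PAD)
    r = Xm @ w - y                                # residuals
    # Straight-through estimator: MSE gradient w.r.t. the hard gate,
    # passed back to the logits through sigmoid'(theta) = p(1-p).
    dL_df = (2 / n) * (r @ (X * w))               # per-feature gate gradient
    dL_dg = dL_df.reshape(3, 2).sum(axis=1)       # pool within each group
    w -= lr_w * (2 / n) * (Xm.T @ r)              # update downstream weights
    theta -= lr_t * (dL_dg + lam) * p * (1 - p)   # lam: L1 penalty on p

p = sigmoid(theta)                                # final selection probabilities
```

The L1 penalty steadily pushes all gates toward zero, while the task loss pushes the predictive group's gate back up, so only the informative group survives.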
An Enhanced Language Model for Predicting Alzheimer's Disease Pathology — We investigated different serialization formats (e.g., Markdown, plain text, feature-wise, and visit-wise) for longitudinal tabular data from ADNI, HABS-HD, and POINTER as LLM inputs, and used LoRA to fine-tune Llama 3 and Llama 3.1 for Alzheimer's disease pathology outcome prediction. The resulting model, ADLLM, outperforms existing ML models on the external A4 cohort. See preprint.
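To make the serialization choices concrete, here is a hypothetical sketch of turning one longitudinal record into visit-wise plain text, feature-wise plain text, and a Markdown table (the feature names and the actual ADLLM prompt templates are illustrative assumptions):

```python
# One patient's longitudinal record: visits -> feature/value pairs.
record = {
    "visit_1": {"age": 71, "MMSE": 29, "APOE4": 1},
    "visit_2": {"age": 72, "MMSE": 27, "APOE4": 1},
}

def visit_wise_plain(rec):
    # One line per visit, features listed inline.
    return "\n".join(
        f"{visit}: " + ", ".join(f"{k} = {v}" for k, v in feats.items())
        for visit, feats in rec.items()
    )

def feature_wise_plain(rec):
    # One line per feature, values in visit order.
    visits = list(rec)
    return "\n".join(
        f"{k}: " + ", ".join(str(rec[v][k]) for v in visits)
        for k in rec[visits[0]]
    )

def visit_wise_markdown(rec):
    # Markdown table: rows are visits, columns are features.
    visits = list(rec)
    cols = list(rec[visits[0]])
    header = "| visit | " + " | ".join(cols) + " |"
    sep = "|" + "---|" * (len(cols) + 1)
    rows = ["| " + v + " | " + " | ".join(str(rec[v][c]) for c in cols) + " |"
            for v in visits]
    return "\n".join([header, sep] + rows)
```

Visit-wise layouts keep each time point intact, while feature-wise layouts align a feature's trajectory on one line; which reads best to a fine-tuned LLM is exactly the empirical question the project studies.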
Mediation Analysis with Mendelian Randomization and Efficient Multiple GWAS Integration — We used structural equations to model the relationships among the exposure, mediator, and outcome based on the causal diagram. We designed a three-step procedure for mediation analysis with multiple integrated GWAS, using joint rerandomization and Rao–Blackwellization to eliminate measurement-error bias, the winner's curse, the loser's curse, and imperfect IV selection. See preprint, code, and package.
On the Theoretical Investigation of Mediation Analysis with Mendelian Randomization and Summary Data — We provide a rigorous statistical analysis of two popular existing frameworks for conducting mediation analysis with Mendelian randomization. See preprint.
Benchmark of QTL Pipelines (isoform-QTL, eQTL, and splicing-QTL) — We compared the performance of RSEM, Kallisto, Cufflinks, Salmon + FastQTL, eQTL, and Leafcutter on a simulated dataset. We empirically demonstrated that isoform-QTL pipelines outperform all others, and that among them Cufflinks performs best in terms of power and false discovery rate. See slides and preprint.
GMS Training Framework and WMMLP — We constructed a weighted multiplicative MLP (WMMLP) in PyTorch based on a Taylor expansion of M-estimators, and used neural networks to solve the M-estimation problem in bootstrap and cross-validation settings. See final summer research report.

Find me on WeChat with the ID LQRweixin1101, or scan my QR code:

QR code