Research
I am passionate about building machine learning systems and studying responsible foundation models (especially large language models). In the past, I worked on machine learning systems and deep learning for security issues.
|
On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding*,
Ken Ziyu Liu*,
Pura Peetathawatchai,
Berivan Isik,
Sanmi Koyejo (*equal contribution)
In the 1st Conference on Language Modeling (COLM 2024)
arXiv, code
In this study, we focus on fairness and ask whether LoRA has an unexamined impact on utility, calibration, and resistance to membership inference across different subgroups (e.g., genders, races, religions) compared to a full-model fine-tuning baseline. We present extensive experiments across vision and language domains and across classification and generation tasks using ViT-Base, Swin-v2-Large, Llama-2 7B, and Mistral 7B. Intriguingly, experiments suggest that while one can isolate cases where LoRA exacerbates model bias across subgroups, the pattern is inconsistent: in many cases, LoRA has equivalent or even improved fairness compared to the base model or its full fine-tuning baseline.
|
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection
Yizheng Chen,
Zhoujie Ding,
Lamya Alowain,
Xinyun Chen,
David Wagner
In the 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2023)
arXiv, dataset
We propose and release a new vulnerable source code dataset. Our new dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits. It covers 295 more projects than all previous datasets combined.
|
Continuous Learning for Android Malware Detection
Yizheng Chen,
Zhoujie Ding,
David Wagner
In the 32nd USENIX Security Symposium (USENIX Security 2023)
arXiv
Top 10 Finalist of the CSAW'23 Applied Research Competition
We propose a new hierarchical contrastive learning scheme, combined with active learning, to combat concept drift in Android malware classifiers. Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate from 0.86% to 0.48%.
|
Kernel-as-a-Service: A Serverless Interface to GPUs
Nathan Pemberton,
Anton Zabreyko,
Zhoujie Ding,
Randy Katz,
Joseph Gonzalez
arXiv, 2022
arXiv
We present Kernel-as-a-Service (KaaS), a serverless interface to GPUs. KaaS runs graphs of GPU-only code, while host code runs on traditional functions. Our results show that KaaS can deliver up to 50x higher throughput and 16x lower latency when GPU resources are contended.
|
Machine Learning Engineer Intern, Apple
June 2024 - September 2024
Worked on the AI/ML Siri Query Understanding team.
|
Misc
The pronunciation of my first name is "Chow-Jay". My Chinese name is 丁周节. I grew up in Hangzhou, China.
In my free time, I enjoy playing basketball, poker, and video games.