Zhoujie (Jason) Ding

I am currently a second-year MSCS student at Stanford University. I received my B.A. in computer science and applied mathematics from UC Berkeley.

At Stanford, I am working on trustworthy machine learning with Professor Sanmi Koyejo at Stanford Trustworthy AI Research. I have also had the good fortune to work on online data selection under Professor Chris Ré.

At Berkeley, I worked on the Skyplane project at the SkyLab with Professor Joseph Gonzalez and Professor Ion Stoica. I also researched deep learning for code vulnerability detection under the supervision of Professor Yizheng Chen and Professor David Wagner.

I am graduating in Spring 2025 and am actively looking for New Grad MLE/SWE positions. Please reach out if you know of any openings!

Email  /  Resume  /  Google Scholar  /  LinkedIn  /  GitHub

Research

I am passionate about building machine learning systems and studying responsible foundation models (especially large language models). In the past, I worked on machine learning systems and deep learning for security issues.

On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding*, Ken Ziyu Liu*, Pura Peetathawatchai, Berivan Isik, Sanmi Koyejo (*equal contribution)
In the 1st Conference on Language Modeling (COLM 2024)
arXiv, code

In this study, we focus on fairness and ask whether LoRA has an unexamined impact on utility, calibration, and resistance to membership inference across different subgroups (e.g., genders, races, religions) compared to a full-model fine-tuning baseline. We present extensive experiments across vision and language domains and across classification and generation tasks using ViT-Base, Swin-v2-Large, Llama-2 7B, and Mistral 7B. Intriguingly, experiments suggest that while one can isolate cases where LoRA exacerbates model bias across subgroups, the pattern is inconsistent—in many cases, LoRA has equivalent or even improved fairness compared to the base model or its full fine-tuning baseline.

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection
Yizheng Chen, Zhoujie Ding, Lamya Alowain, Xinyun Chen, David Wagner
In the 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2023)
arXiv, dataset

We propose and release a new vulnerable source code dataset. Our new dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits. It covers 295 more projects than all previous datasets combined.

Continuous Learning for Android Malware Detection
Yizheng Chen, Zhoujie Ding, David Wagner
In the 32nd USENIX Security Symposium (USENIX Security 2023)
arXiv
Top 10 Finalist of the CSAW'23 Applied Research Competition

We propose a new hierarchical contrastive learning scheme, combined with active learning, to combat the concept drift problem of Android malware classifiers. Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate from 0.86% to 0.48%.

Kernel-as-a-Service: A Serverless Interface to GPUs
Nathan Pemberton, Anton Zabreyko, Zhoujie Ding, Randy Katz, Joseph Gonzalez
arXiv, 2022
arXiv

We present Kernel-as-a-Service (KaaS), a serverless interface to GPUs. KaaS runs graphs of GPU-only code, while host code runs on traditional functions. Our results show that KaaS can drive up to 50x higher throughput and 16x lower latency when GPU resources are contended.

Industry

Machine Learning Engineer Intern, Apple

June 2024 - September 2024

Worked on the AI/ML Siri Query Understanding team.

Teaching

CS 224N: Natural Language Processing with Deep Learning

course website

TA, Spring 2024

CS 229: Machine Learning

course website

Head TA, Fall 2024; Head TA, Winter 2024; TA, Fall 2023

CS 189/289A: Introduction to Machine Learning

course website / my discussion session slides

TA, Spring 2023; Reader, Spring 2022

CS194-26/294-26: Intro to Computer Vision and Computational Photography

course website / class project I updated

Tutor, Fall 2022

CS61C: Great Ideas in Computer Architecture (Machine Structures)

course website / my discussion session slides

Tutor, Fall 2021; Tutor, Spring 2021

Projects

Skyplane

Facial Keypoint Detection with Neural Networks

Tour into the Picture

Pintos

Misc

The pronunciation of my first name is "Chow-Jay". My Chinese name is 丁周节. I grew up in Hangzhou, China.

In my free time, I enjoy playing basketball, poker, and video games.


Thanks to Jon Barron for the template