Weile Zheng - VP Education 2024-2025
The MDST Recruiting/Onboarding process includes a series of introductory Python, Data Analysis, and Machine Learning questions and tasks that new members must complete in order to join the club. There are two Jupyter Notebooks, Checkpoint0 and Checkpoint1, that interested applicants need to complete, alongside optional challenges that push students further into more advanced topics such as Deep Learning and Computer Vision.
Traditionally, the MDST Education Team manually looks over the completed checkpoints one by one, submitted through a Google Form with a link to a GitHub repository, and grades them on a five-point scale based on correctness and code quality. However, as MDST has grown to hundreds of code submissions every semester, this evaluation process has become increasingly difficult.
The initial goal was to design a solution that automates the checkpoint grading process by running a batch diff check over all submitted checkpoints after the deadline. Although this approach addresses the sheer volume of manual review, it leaves the other pain points unsolved. Furthermore, conducting one final evaluation after the deadline means there is no early feedback for applicants that would allow them to revise their submissions.
And since a diff check compares submissions against a hardcoded expected output, it is not sufficient for evaluating many of the open-ended data analysis questions that we ask.
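For concreteness, a batch diff job along these lines might look something like the sketch below, which flattens each notebook's cell outputs and compares them against a reference solution. The `submissions/` directory layout and `solution.ipynb` filename are assumptions for illustration, not part of any actual pipeline.

```python
import difflib
import json
from pathlib import Path

SUBMISSIONS_DIR = Path("submissions")   # hypothetical: one notebook per applicant
SOLUTION_PATH = Path("solution.ipynb")  # hypothetical reference solution

def cell_outputs(notebook_path: Path) -> list[str]:
    """Flatten the text outputs of every code cell in a notebook."""
    nb = json.loads(notebook_path.read_text())
    outputs = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for out in cell.get("outputs", []):
            # Stream outputs carry "text"; execute results carry a "data" dict.
            text = out.get("text") or out.get("data", {}).get("text/plain", "")
            outputs.append("".join(text))
    return outputs

solution_outputs = cell_outputs(SOLUTION_PATH)

for submission in sorted(SUBMISSIONS_DIR.glob("*.ipynb")):
    diff = list(difflib.unified_diff(solution_outputs, cell_outputs(submission), lineterm=""))
    status = "matches reference" if not diff else f"{len(diff)} differing lines"
    print(f"{submission.name}: {status}")
```

The exact-match comparison in `unified_diff` is precisely the "hardcoded-ness" described above: any deviation in wording or formatting registers as a mismatch, even when the underlying analysis is sound.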
A better way to solve the problem is to create a web submission portal that applicants can directly interact with to upload checkpoint files, run the necessary checks upfront (file format, file size, upload deadline), and utilize an LLM to automatically grade the submissions. Since the questions we ask in our checkpoints are all fairly beginner-level, a modern LLM is more than capable of examining code correctness and quality given the necessary prompts.
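As a rough sketch of what those two pieces could look like, the snippet below pairs the upfront validation checks with a grading call through the OpenAI Python client; the size limit, deadline, model name, and rubric prompt are all placeholder assumptions rather than the portal's actual configuration.

```python
from datetime import datetime, timezone
from pathlib import Path

from openai import OpenAI  # assumes the OpenAI client; any LLM API would work

MAX_BYTES = 2 * 1024 * 1024                            # hypothetical 2 MB cap
DEADLINE = datetime(2025, 1, 31, tzinfo=timezone.utc)  # hypothetical deadline

def validate_upload(path: Path, submitted_at: datetime) -> list[str]:
    """Run the upfront checks: file format, file size, upload deadline."""
    errors = []
    if path.suffix != ".ipynb":
        errors.append("submission must be a Jupyter Notebook (.ipynb)")
    if path.stat().st_size > MAX_BYTES:
        errors.append("file exceeds the size limit")
    if submitted_at > DEADLINE:
        errors.append("submitted after the deadline")
    return errors

GRADING_PROMPT = (
    "You are grading a beginner data-science checkpoint. "
    "Score the notebook below from 1 to 5 on correctness and code quality, "
    "then give two sentences of feedback.\n\n{notebook}"
)

def grade_submission(notebook_text: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user",
                   "content": GRADING_PROMPT.format(notebook=notebook_text)}],
    )
    return response.choices[0].message.content
```

Because the checks run at upload time, applicants learn about a rejected file immediately and can resubmit before the deadline, closing the early-feedback gap left by a single end-of-deadline batch job.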