Weile Zheng - VP Education 2024-2025
The MDST Recruiting/Onboarding process includes a series of introductory Python, Data Analysis, and Machine Learning questions and tasks that new members must complete in order to join the club. There are two Jupyter Notebooks, Checkpoint0 and Checkpoint1, that interested applicants need to complete, alongside optional challenges that push students further into more advanced topics such as Deep Learning and Computer Vision.
Traditionally, the MDST Education Team manually looks over the completed checkpoints one by one, submitted through a Google Form with a link to a GitHub repository, and grades them on a five-point scale based on correctness and code quality. However, as MDST has grown to hundreds of code submissions every semester, this evaluation process has become increasingly difficult.
The initial goal was to design a solution that automates the checkpoint grading process by running a batch diff check over all submitted checkpoints after the deadline. Although this approach addresses the sheer volume of manual review, it leaves the other pain points unsolved. Furthermore, conducting one final evaluation after the deadline means there is no early feedback for applicants that would allow them to revise their submissions.
And since a diff check compares submissions against a hardcoded expected output, it is not sufficient for evaluating many of the open-ended data analysis questions that we ask.
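For concreteness, a batch diff job along these lines might look something like the sketch below, which flattens each notebook's cell outputs and compares them against a reference solution. The `submissions/` directory layout and `solution.ipynb` filename are assumptions for illustration, not part of any actual pipeline.

```python
import difflib
import json
from pathlib import Path

SUBMISSIONS_DIR = Path("submissions")   # hypothetical: one notebook per applicant
SOLUTION_PATH = Path("solution.ipynb")  # hypothetical reference solution

def cell_outputs(notebook_path: Path) -> list[str]:
    """Flatten the text outputs of every code cell in a notebook."""
    nb = json.loads(notebook_path.read_text())
    outputs = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for out in cell.get("outputs", []):
            # Stream outputs carry "text"; execute results carry a "data" dict.
            text = out.get("text") or out.get("data", {}).get("text/plain", "")
            outputs.append("".join(text))
    return outputs

solution_outputs = cell_outputs(SOLUTION_PATH)

for submission in sorted(SUBMISSIONS_DIR.glob("*.ipynb")):
    diff = list(difflib.unified_diff(solution_outputs, cell_outputs(submission), lineterm=""))
    status = "matches reference" if not diff else f"{len(diff)} differing lines"
    print(f"{submission.name}: {status}")
```

The exact-match comparison in `unified_diff` is precisely the "hardcoded-ness" described above: any deviation in wording or formatting registers as a mismatch, even when the underlying analysis is sound.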
A better way to solve the problem is to create a web submission portal that applicants can directly interact with to upload checkpoint files, run the necessary checks upfront (file format, file size, upload deadline), and utilize an LLM to automatically grade the submissions. Since the questions we ask in our checkpoints are all fairly beginner-level, a modern LLM is more than capable of examining code correctness and quality given the necessary prompts.
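As a rough sketch of what those two pieces could look like, the snippet below pairs the upfront validation checks with a grading call through the OpenAI Python client; the size limit, deadline, model name, and rubric prompt are all placeholder assumptions rather than the portal's actual configuration.

```python
from datetime import datetime, timezone
from pathlib import Path

from openai import OpenAI  # assumes the OpenAI client; any LLM API would work

MAX_BYTES = 2 * 1024 * 1024                            # hypothetical 2 MB cap
DEADLINE = datetime(2025, 1, 31, tzinfo=timezone.utc)  # hypothetical deadline

def validate_upload(path: Path, submitted_at: datetime) -> list[str]:
    """Run the upfront checks: file format, file size, upload deadline."""
    errors = []
    if path.suffix != ".ipynb":
        errors.append("submission must be a Jupyter Notebook (.ipynb)")
    if path.stat().st_size > MAX_BYTES:
        errors.append("file exceeds the size limit")
    if submitted_at > DEADLINE:
        errors.append("submitted after the deadline")
    return errors

GRADING_PROMPT = (
    "You are grading a beginner data-science checkpoint. "
    "Score the notebook below from 1 to 5 on correctness and code quality, "
    "then give two sentences of feedback.\n\n{notebook}"
)

def grade_submission(notebook_text: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user",
                   "content": GRADING_PROMPT.format(notebook=notebook_text)}],
    )
    return response.choices[0].message.content
```

Because the checks run at upload time, applicants learn about a rejected file immediately and can resubmit before the deadline, closing the early-feedback gap left by a single end-of-deadline batch job.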