Abstract for HONS 08/15
A Similarity Ranking of Python Programs
Jonathan Wardell Avery
Department of Computer Science and Software Engineering
University of Canterbury
Abstract
Detecting similar programs is a well-studied problem. It is an important strategy for identifying poorly modularized code, finding vulnerabilities introduced by error-prone copy-and-paste programming, and detecting academic dishonesty in online code assignment submissions that follow the copy-paste-adapt-it pattern. The last of these is the impetus for this work.
A novel system is presented that is specifically adapted to programs that may be small and that are similar by virtue of being written to solve the same problem. The system is also adapted to specific expected behaviors of plagiarists, using custom-built algorithms that both recognize these behaviors and satisfy hierarchical properties. A defining and novel property of the proposed method is the categorical information it provides. A hierarchy of categories with an implication relationship is leveraged to produce descriptive, rankable results.