TY - GEN
T1 - Analysis and extraction of sentence-level paraphrase sub-corpus in CS education
AU - Alvi, Faisal
AU - El-Alfy, El Sayed M.
AU - Al-Khatib, Wasfi G.
AU - Abdel-Aal, Radwan E.
PY - 2012
Y1 - 2012
AB - Since the advent of the Internet, plagiarism has become a widespread problem in student submissions. Paraphrasing is one of several types of plagiarism that students use to mask the original source. In this work, we construct a sub-corpus of paraphrased sentences by extracting all lightly and heavily revised sentences from the Corpus of Plagiarized Short Answers, using modified criteria for sentences. We then apply document similarity measures to this sub-corpus and derive some of its notable features. Our findings suggest that this sub-corpus is better suited for testing paraphrase detection techniques because it provides sentence-level paraphrasing samples instead of the file-level classification provided in the original corpus. Additional sentence samples may also be added to this sub-corpus to increase its variety and scale.
KW - Paraphrasing
KW - Plagiarism
KW - Similarity measures
UR - http://www.scopus.com/inward/record.url?scp=84869179652&partnerID=8YFLogxK
U2 - 10.1145/2380552.2380566
DO - 10.1145/2380552.2380566
M3 - Conference contribution
AN - SCOPUS:84869179652
SN - 9781450314640
T3 - SIGITE'12 - Proceedings of the ACM Special Interest Group for Information Technology Education Conference
SP - 49
EP - 54
BT - SIGITE'12 - Proceedings of the ACM Special Interest Group for Information Technology Education Conference
ER -