TY - GEN
T1 - Statistical analysis of ML-based paraphrase detectors with lexical similarity metrics
AU - El-Alfy, El Sayed M.
PY - 2014
Y1 - 2014
AB - Paraphrase detection has several important applications in natural language processing. Examples of such applications include language translation, text summarization, question answering, plagiarism detection, and online information retrieval. A number of metrics have been proposed in the literature to quantify the textual similarity between two sentences. However, the accuracy of utilizing each similarity metric alone in detecting paraphrases is very low. Though some machine learning (ML) techniques have been deployed for paraphrase detection, there is no known study that intensively benchmarks their performance on this problem under similar conditions. In this paper, we evaluate the utility of integrating five lexical similarity metrics with three standard machine learning paradigms to detect paraphrases. We apply statistical tests to compare and benchmark the relative significance of the adopted ML-based paraphrase detectors on different datasets.
UR - https://www.scopus.com/pages/publications/84904498468
U2 - 10.1109/ICISA.2014.6847467
DO - 10.1109/ICISA.2014.6847467
M3 - Conference contribution
AN - SCOPUS:84904498468
SN - 9781479944439
T3 - ICISA 2014 - 2014 5th International Conference on Information Science and Applications
BT - ICISA 2014 - 2014 5th International Conference on Information Science and Applications
PB - IEEE Computer Society
ER -