Abstract
This paper describes a joint software entry by King Fahd University of Petroleum & Minerals and the University of Sheffield for the text-alignment task at PAN-2014. We follow the three steps of seeding, extension and filtering for text alignment. For seeding, we use character n-grams with a variant of the Rabin-Karp algorithm for multiple-pattern search. An elaborate merging mechanism with several cases then combines the individually found seeds, and a short filtering step removes extraneous passages. This approach achieved plagdet scores of 0.65954 and 0.73416 on test corpora 2 and 3, respectively, during the final test run.
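
To make the seeding step concrete, the following is a minimal sketch of character n-gram seeding with a Rabin-Karp style rolling hash used for multiple-pattern search. It is not the authors' implementation: the abstract does not specify which variant they use, so the function names `rolling_hashes` and `find_seeds`, the n-gram length `n=5`, and the constants `BASE` and `MOD` are all illustrative assumptions. The merging and filtering steps are not shown.

```python
from collections import defaultdict

# Hypothetical hashing constants, chosen for illustration only.
BASE = 257
MOD = (1 << 61) - 1


def rolling_hashes(text, n):
    """Yield (position, hash) for every character n-gram of text."""
    if n <= 0 or len(text) < n:
        return
    high = pow(BASE, n - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in text[:n]:
        h = (h * BASE + ord(ch)) % MOD
    yield 0, h
    for i in range(1, len(text) - n + 1):
        # Drop the leftmost character, then shift in the next one.
        h = (h - ord(text[i - 1]) * high) % MOD
        h = (h * BASE + ord(text[i + n - 1])) % MOD
        yield i, h


def find_seeds(source, suspicious, n=5):
    """Return sorted (source_pos, suspicious_pos) pairs of matching n-grams."""
    # Index every n-gram hash of the source so that all source n-grams
    # are searched for in a single pass over the suspicious text.
    index = defaultdict(list)
    for pos, h in rolling_hashes(source, n):
        index[h].append(pos)

    seeds = []
    for susp_pos, h in rolling_hashes(suspicious, n):
        for src_pos in index.get(h, ()):
            # Compare the actual characters to rule out hash collisions.
            if source[src_pos:src_pos + n] == suspicious[susp_pos:susp_pos + n]:
                seeds.append((src_pos, susp_pos))
    return sorted(seeds)
```

Each returned pair is one seed: a position in the source document and a position in the suspicious document where the same n-gram occurs. An extension step would then merge neighbouring seeds into longer aligned passages.
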
| Original language | English |
|---|---|
| Pages (from-to) | 939-946 |
| Number of pages | 8 |
| Journal | CEUR Workshop Proceedings |
| Volume | 1180 |
| State | Published - 2014 |
ASJC Scopus subject areas
- General Computer Science