Suggestion and Policy

Plagiarism Checker for TER Admin to check Reviews
impposter 49 Reviews 808 reads

Just read the "How long do you guys typically spend writing each of your reviews?" thread (General) and other threads about Reviews (below, "No, not another Review thread"). So here's my thought / suggestion.

1. Others have previously suggested finding a way to search the TEXT portion of Reviews for specific strings and it has been pointed out that that should be relatively fast and easy to do on the pre-indexed flat text.  (Still easy, but slower, if not pre-indexed.)

2. There are websites that provide free "plagiarism checking" so the code must be pretty easily available. TER does not have to compare a newly submitted review against Shakespeare, Portnoy's Complaint or the entire internet. They just have to compare a new submission to their own internal reviews.

If TER makes the TEXT of the reviews searchable, they can simultaneously do their own PLAGIARISM searching within their own Review Domain to see if FAKERS are just copying old reviews in an effort to get free days for new reviews.  

Suggestion: Hey, TER! Get a two-for-one for the effort. Searchable review text AND a way to check new review submissions for possible plagiarism before approval.

The code is out there. It just needs to be adapted to the TER Review Database.  



-- Modified on 12/20/2016 2:51:47 PM

NoYellowEnvelope 405 reads

... for free text search and plagiarism checking.  But there are plagiarism checkers with an open API, so it would be easy for TER to use one of those, either as a service or by running the code themselves.  As for the free text search, there are many tools for doing that also, from simple SQL pattern matching to more sophisticated methods.

But I wonder if a plagiarism checker would work with reviews. There are so many common acronyms and phrases used in reviews, with no intent to plagiarize, that a checker might get a lot of false positives, leading to either innocent reviewers' work being rejected, or more Admin time spent reviewing all "hits" to ensure they're really cases of plagiarism.
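As a rough illustration of the "simple SQL pattern matching" end of that range, here is a sketch using Python's built-in `sqlite3` module and a `LIKE` query. The table name `reviews`, its columns, and the sample rows are assumptions made up for the example, not TER's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical reviews table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO reviews (id, body) VALUES (?, ?)",
    [
        (1, "Great time, would repeat."),
        (2, "Hard to find but worth it."),
        (3, "Would repeat, 100% recommended."),
    ],
)

def find_reviews(conn, needle):
    """Return ids of reviews whose body contains needle as a substring."""
    # Escape LIKE wildcards so a literal % or _ in the search string is not
    # treated as a pattern character.
    escaped = (
        needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT id FROM reviews WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]

print(find_reviews(conn, "would repeat"))
print(find_reviews(conn, "100%"))
```

SQLite's `LIKE` ignores case for ASCII letters by default, which suits this kind of search. A pattern that starts with `%` generally forces a scan of every row, so a large review table would probably want one of the "more sophisticated methods", such as a full-text index.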

Besides, haven't you heard that imitation is the sincerest form of flattery?  :)

No, definitely not the same code, but the same properly formatted flat text: the General and Juicy Details of the reviews.  Once formatted, Code-1 is for searching reviews for specific strings and displaying the results to the searcher; Code-2 is for plagiarism checking and alerting the searcher (TER admin or even just automate it to search every submission and just flag the bad ones).  

I think the web-based plagiarism checkers look for copied term papers, short stories and stuff like that so they search the entire internet. TER doesn't need that. I think they could run their own code and only compare new reviews to the existing database of reviews.  250 petabytes versus a few tens of gigabytes.

I guess they'd have to set a threshold for pattern matching, beyond just common abbreviations.  And maybe something TOO common ("Treat her well.") can be relegated to the innocuous copying category. Less common multiples might be the trigger.  Yeah, there might be the need for some tweaking.  
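Here is one way the threshold idea might look, as a sketch only. It compares overlapping three-word runs ("shingles") with a Jaccard score, which is one common technique and not necessarily what any real plagiarism checker does. The sample reviews, the 0.5 threshold, and the "appears in more than half of all reviews" rule for stock phrases are all assumptions that would need the tweaking mentioned above.

```python
import re
from collections import Counter

def shingles(text, n=3):
    """Return the set of every run of n consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def common_shingles(existing, max_share=0.5):
    """Shingles found in more than max_share of all reviews are stock phrases."""
    counts = Counter()
    for text in existing.values():
        counts.update(shingles(text))  # each review counts a shingle once
    limit = max_share * len(existing)
    return {s for s, c in counts.items() if c > limit}

def flag_similar(new_text, existing, threshold=0.5, max_share=0.5):
    """Return (review_id, score) pairs at or above threshold, highest first."""
    ignore = common_shingles(existing, max_share)
    new = shingles(new_text) - ignore
    hits = []
    for rid, text in existing.items():
        old = shingles(text) - ignore
        union = new | old
        if not union:
            continue
        score = len(new & old) / len(union)  # Jaccard similarity
        if score >= threshold:
            hits.append((rid, round(score, 2)))
    return sorted(hits, key=lambda hit: -hit[1])

# Hypothetical existing reviews, all ending with the same stock phrase.
existing = {
    1: "She met me at the door with a big smile. Treat her well.",
    2: "Parking was easy and the place was spotless. Treat her well.",
    3: "Quick reply to my email and no rushing at all. Treat her well.",
}
copied = "She met me at the door with a big smile and a hug. Treat her well."
innocent = "Nice evening overall, no complaints. Treat her well."

print(flag_similar(copied, existing))    # flags review 1 only
print(flag_similar(innocent, existing))  # flags nothing
```

"Treat her well" shows up in every sample review, so it lands in the ignore set and never counts toward a match. Only the less common word runs can push a submission over the threshold.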

Maybe a TER-only online tool, for members only, to do the checking if they read something that sounds a little too familiar.

Anyway, two problems (searching reviews; cutting down on one category of fake reviews) might be solved using overlapping resources and the same flat Review text.

Posted By: NoYellowEnvelope
 Besides, haven't you heard that imitation is the sincerest form of flattery?  :)
Yes, thank you for the flattery. I've said that so many times myself. :-)

-- Modified on 12/20/2016 6:03:25 PM
