1

I have run into a situation where a "plagiarism detection software" (I believe the one being used is called iThenticate) is showing a 31% similarity index for my document. In the list at the end of the report, it shows 20 entries under "Primary sources" (all have the subtitle "Internet", which I think means that all the matches with my document were found in other documents on the Internet) where the similarity percentages for each entry ranges from 6%, 1240 words to 1%, 118 words. This document is in the Humanities field. How should my document be modified to reduce this similarity index?

  1. I have some sections in this document that have been paraphrased from another article (literally copy/pasted), but they have all been cited in the reference section. Is this one of the triggers for the similarity? The report shows some paraphrased material as being similar to something else (understandable) but not others. Should the paraphrased sections be formatted in some way to have the software ignore them?

  2. Some common words such as "Chapter", "Introduction", etc., and a few common phrases are flagged as being similar to something else (words such as "chapter" and "also" are flagged on one page, but not on another). How concerned should I be about these? If I need the similarity index to be below some threshold percentage, how do I ensure words like these do not take me above the threshold?

  3. Some sections of the document are my own words, but the phrasings used are common enough that the entire section is being flagged as being from someplace else. How should I modify sections such as these?

14
  • 1
    Why do you want to reduce this metric? Presumably anyone sane looking at it will be able to see that the detector is just detecting a lot of regular things and not plagiarism.
    – JoshuaZ
    Commented Jul 2 at 13:48
  • 1
    Are you running the software yourself, or is someone else running it and just giving you the output?
    – Anonymous
    Commented Jul 2 at 13:58
  • 7
    "paraphrased from another article (literally copy/pasted)" -- That doesn't really make sense. To be clear, you copy-pasted and then changed some words, or do you mean something else? Commented Jul 2 at 14:11
  • 9
    @user13267: That would of course be a "quote", not a "paraphrase" (I'm assuming you've actually got the copied text noted in quotation marks). It's, respectfully, kind of a red flag to not know that. Commented Jul 2 at 15:39
  • 1
    And is this mandatory? If it is, whoever is mandating it should have guidelines on how they set the software up (and I've used it-- it definitely has settings) and what you should do with or about the results.
    – Anonymous
    Commented 2 days ago

3 Answers

11

Ignore it. You've identified bad software.

Similarity is not plagiarism. It can be a clue that plagiarism has happened, but you do not need to "reduce similarity". Be certain you have not plagiarized, and then be happy with this and stop sending content you haven't plagiarized to a similarity detector; these tools are for people who don't know whether the content they are reading is plagiarized or not.

Some better things to focus on:

I have some sections in this document that have been paraphrased from another article (literally copy/pasted)

This does not make sense. If something is paraphrased, it is not literally copy/pasted. If it's literally copy/pasted, quote it and be clear that it's literally copy/pasted, and then do not worry about similarity. Generally this is best done sparingly; if you find yourself needing to quote long passages frequently without special reason (like critiquing a specific passage in a work of literature) then there is something wrong with your overall approach to the writing project rather than a problem of similarity: you're regurgitating something that already exists rather than creating something new. Take a couple steps back and reevaluate your goals. You may not need to include this at all and just direct your reader to the source if they need further information.

If you've copy/pasted and then just edited a few words or changed the grammar/phrasing to try to make it less "similarity index", you've just plagiarized. Making more of these trivial edits will not make it less plagiarized even if you make it less "similarity index". Either mentally digest the content and completely rewrite it from scratch from your own head while citing individual pieces of fact necessary to support your argument, or go back and verbatim quote the whole thing with quotation marks and a clear citation that it came from someone else's writing.

If I need the similarity index to be below some threshold percentage

If this is a rule by someone at your institution, I am sorry to inform you that either this person is an idiot or your entire institution is a joke. You cannot solve this problem by changing your similarity index; you're still attending an incompetent institution if this is how they treat plagiarism. Just think about it: if you were actually able to make your work "not plagiarized" by doing things like using some other word instead of "chapter" or "introduction", what is the purpose of this? There is no purpose. Absolutely asinine rule with no educational value. How can you possibly trust someone who imposes rules like this that have no value, when they are clearly showing how little they understand academia and writing and the meaning of the output of the software they are using? You'll have to decide how to handle that information, though.

3
  • 3
    "Ignore it, you've identified bad software" -- why? I agree with the rest of your answer, but according to the body of the question there are 20(!) instances that have been flagged, each ranging from ~100(!!) to over 1000(!!!) words. In this case I think the software is doing exactly what it was designed to do...
    – hiccups
    Commented 2 days ago
  • 1
    I must in good conscience downvote this, because per the OP's comments, its use is mandated. Ignoring it seems unwise.
    – Anonymous
    Commented 2 days ago
  • @hiccups The similarity score is not actually what's telling OP they've copied something, though. If they've copied without quoting, that's the problem, not the similarity. If they've copied without quoting but then edited enough to trick the similarity score, that's still equally a problem. Worse, the similarity score may encourage this obfuscation behavior that resolves similarity while preserving plagiarism. So, the similarity score provides no additional information that the author doesn't already have about how they wrote their paper. That's why they should not use a similarity score.
    – Bryan Krause
    Commented 2 days ago
4

It's hard to be sure, but it seems like the software is mis-configured.

If your institution's guidelines are, from your comment:

exclude quotes/bibliographies/phrases/small matches-10 words/source matches-by 1%

And it is still reporting those, then either someone is not setting up the exclusion criteria correctly, or you can ignore that part of the report. You still have other issues to work through.

But, as much as I hate the expression, this is a classic XY Problem.

You are asking X: How can I lower this score?

You should ask Y: What are the guidelines for using this tool, how does it work, and what do I need to do with or about the result?

And you need to ask your professor, or advisor, or editor, or whoever is mandating the use of this tool for this paper.

Because we can't tell you.

(As an example, for my dissertation, I was told how to set up the software to run it myself, and then had to write a brief justification for each of the software's squawks. Presumably someone read it-- I never heard another word about it. I have no expectation that your process will be the same.)

1

How should my document be modified to reduce this similarity index?

Don't plagiarise.

From one of your comments, "quotes/bibliographies/phrases/small matches-10 words/source matches-by 1%" can be ignored. So any matches for single words ("introduction", "chapter", etc.) as well as "common phrases" (each of which should be maybe ~10 words long?) can be ignored---sensibly so.
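
To make the effect of those exclusions concrete, here is a minimal sketch of how such filters could change a similarity percentage. It is not iThenticate's actual algorithm, which I don't know; the `Match` class, the `similarity_index` function, the way I read the two thresholds, and all the numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    source: str  # hypothetical label for the matched source
    words: int   # number of words in one contiguous matched run


def similarity_index(matches, total_words, min_match_words=10, min_source_pct=1.0):
    """Percentage of the document counted as matched, after exclusions.

    Assumed reading of the settings quoted above:
    - runs shorter than `min_match_words` are dropped ("small matches");
    - sources whose remaining total is below `min_source_pct` percent of
      the document are dropped ("source matches by 1%").
    Overlapping matches are not handled; this is only an arithmetic sketch.
    """
    # Drop short runs such as single flagged words ("chapter", "also").
    kept = [m for m in matches if m.words >= min_match_words]

    # Add up the remaining matched words per source.
    per_source = {}
    for m in kept:
        per_source[m.source] = per_source.get(m.source, 0) + m.words

    # Drop sources that contribute less than the per-source threshold.
    counted = sum(
        w for w in per_source.values()
        if 100.0 * w / total_words >= min_source_pct
    )
    return 100.0 * counted / total_words


# Made-up example: a 10,000-word document with four flagged runs.
example = [
    Match("source A", 600),  # one long passage
    Match("source B", 50),   # a short passage, 0.5% of the document
    Match("source C", 1),    # a single common word
    Match("source C", 3),    # a short common phrase
]

with_exclusions = similarity_index(example, 10_000)
without_exclusions = similarity_index(example, 10_000, min_match_words=1, min_source_pct=0.0)
```

Under these assumptions only the 600-word passage from "source A" survives the filters. The single words and short phrases barely move the number either way, which is why the long matches are the ones worth your attention.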

The bigger problem is where you have "20 entries... where the similarity percentages for each entry ranges from 6%, 1240 words to 1%, 118 words". Check each of these entries. Have you copied/pasted from those sources as suggested by the software? If yes, you have plagiarised. If no, follow the advice in Bryan Krause's answer.
