How to detect or eliminate duplicate lines on Linux and Windows

If a text file or a CSV file on Linux or Windows contains duplicate lines, we can detect or eliminate them with a few standard commands.

Create a text file which contains duplicates

In this article, we use the following file, which contains duplicates. As we can see, 456 and 789 appear twice.
$ cat dup.txt
123
456
ABC
789
DEF
GHI
789
111
222
333
456
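
To follow along, the sample file can be recreated with a Bash here-document (any text editor works just as well):

$ cat > dup.txt << 'EOF'
123
456
ABC
789
DEF
GHI
789
111
222
333
456
EOF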

Eliminate duplicates on Linux

To eliminate duplicates, we first sort the file contents with the sort command, then pipe the result to the uniq command, which removes the adjacent duplicate lines, as follows.

$ sort dup.txt | uniq
111
123
222
333
456
789
ABC
DEF
GHI
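
Alternatively, the -u option of sort combines both steps, so the following single command should produce the same output:

$ sort -u dup.txt
111
123
222
333
456
789
ABC
DEF
GHI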

Detect duplicates on Linux

To detect only the duplicates, we sort the file and then run "uniq -D", which prints every occurrence of each duplicated line, as follows.

$ sort dup.txt | uniq -D
456
456
789
789
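
If each duplicated value is only needed once rather than every occurrence, the lowercase "uniq -d" option prints one line per duplicated value:

$ sort dup.txt | uniq -d
456
789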

Get the number of duplicates on Linux

If the file is large and contains many duplicates, we can use "uniq -cd" to count how many times each duplicated line appears, as follows.

$ sort dup.txt | uniq -cd
      2 456
      2 789

"2" means "456" is duplicated and 2 lines are duplicated. Same thing for "789".

Eliminate duplicates on Windows PowerShell

We can eliminate duplicates with Windows PowerShell as follows (gc and sort are aliases for Get-Content and Sort-Object).

PS C:\> gc dup.txt | sort | get-unique
111
123
222
333
456
789
ABC
DEF
GHI
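
A shorter equivalent is the -Unique switch of Sort-Object, which sorts and removes duplicates in one step and should print the same list as above (the comparison is case-insensitive unless -CaseSensitive is added):

PS C:\> Get-Content dup.txt | Sort-Object -Unique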

Detect duplicates on Windows PowerShell

If we need to detect only the duplicate lines, we can use the following command in Windows PowerShell. It groups identical lines and keeps only the groups that contain more than one line.

PS C:\> Get-Content dup.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select -ExpandProperty Name
456
789
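
To also see how often each duplicated line occurs, similar to "uniq -cd" on Linux, we can select Count and Name from the groups; the output should look roughly like this:

PS C:\> Get-Content dup.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select-Object Count, Name

Count Name
----- ----
    2 456
    2 789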