How to detect or eliminate duplicate lines on Linux and Windows
If you have duplicated lines in a text file or CSV file on Linux or Windows, you can detect or eliminate the duplicates with standard tools.
Create a text file which contains duplicates
In this article, we use the following file, which contains duplicates. As we can see, 789 and 456 are duplicated.
$ cat dup.txt
123
456
ABC
789
DEF
GHI
789
111
222
333
456
Eliminate duplicates on Linux
In order to eliminate duplicates, we first sort the file contents with the sort command, then run the uniq command to remove the adjacent duplicates, as follows.
$ sort dup.txt | uniq
111
123
222
333
456
789
ABC
DEF
GHI
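The pipeline above can also be written as a single command: sort's -u option sorts and deduplicates in one pass. A minimal sketch, recreating the article's sample file inline so it runs standalone:

```shell
# Recreate the article's sample file (same contents as dup.txt above)
printf '%s\n' 123 456 ABC 789 DEF GHI 789 111 222 333 456 > dup.txt

# sort -u sorts and drops duplicate lines in one pass,
# producing the same result as "sort dup.txt | uniq"
sort -u dup.txt
```

The -u flag is specified by POSIX, so it works with both GNU and BSD sort.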
Detect duplicates on Linux
In order to detect only duplicates, sort the file and then use "uniq -D", which prints every occurrence of each duplicated line, as follows.
$ sort dup.txt | uniq -D
456
456
789
789
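If you only need to know which lines are duplicated, rather than seeing every repeated occurrence, uniq's lowercase -d option prints each duplicated line once. A small sketch with the sample data recreated inline:

```shell
# Recreate the article's sample file
printf '%s\n' 123 456 ABC 789 DEF GHI 789 111 222 333 456 > dup.txt

# uniq -d prints one copy of each duplicated line,
# whereas uniq -D prints all occurrences
sort dup.txt | uniq -d
```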
How to get the number of duplicates on Linux
If the file is large and has many duplicates, we can use "uniq -cd" to count the occurrences of each duplicated line, as follows.
$ sort dup.txt | uniq -cd
      2 456
      2 789
"2" means "456" is duplicated and 2 lines are duplicated. Same thing for "789".
Eliminate duplicates on Windows PowerShell
We can eliminate duplicates by using Windows PowerShell as follows. Note that gc and sort are the built-in aliases for Get-Content and Sort-Object, and Get-Unique requires sorted input.
PS C:\> gc dup.txt | sort | get-unique
111
123
222
333
456
789
ABC
DEF
GHI
Detect duplicates on Windows PowerShell
To detect duplicates, Group-Object groups identical lines, and Where-Object keeps only the groups with more than one member, as follows.
PS C:\> Get-Content dup.txt | Group-Object | Where-Object { $_.Count -gt 1 } | Select -ExpandProperty Name
456
789