6/28/22
9:00 am

2 Ways To Find Duplicate Files On a Mac

If you suspect that you have some large duplicate files on your Mac, you can find them without any special software. You can use the Finder to search for files and sort them so duplicates are together. You can also use the Terminal to find duplicates with a multi-part command.

Video Transcript

Hi, this is Gary with MacMost.com. Let me show you two ways to find duplicate files on your Mac.

MacMost is brought to you thanks to a great group of more than 1000 supporters. Go to MacMost.com/patreon. There you can read more about the Patreon Campaign. Join us and get exclusive content and course discounts.

So I'm often asked how you can find duplicate files on your Mac. After all, we may occasionally make a mistake and create a duplicate file somewhere, and then you've got two copies of the same file taking up a lot of space. Of course, if you find this is a problem that happens to you often, you're going to want to figure out what behavior leads to the duplicate files so it doesn't become a problem. But let's say you want to review what you've got now to see if there is a duplicate there.

Well, a simple way to do it in the Finder is to simply do a Search. So here I am in my Documents folder and that's where I want to start the Search. I want to look for duplicates in the Documents folder. So I'm going to do Command F to do a search and then I'm going to search for Other and look for Size because I only want to search for large files here. I'm not going to care about the tiny little files that may be identical. I want to see if there is anything big out there that I can get rid of. So I'm going to look for file size and then say, is greater than, and set it to something like, oh, maybe greater than 1MB. So now I'm only going to get files like that. I'm also going to want to search just the folder I'm in, so Documents. Then I'm going to want to Sort By Size. If you don't see Size up here at the top you can Control Click anywhere here in the Headers and select Size. Now I've got Size. Now I can Sort By Size and any files that are exactly the same size will appear right next to each other. Like, for instance, here are a bunch of files that seem like they may be duplicates because they are the same size and you can see they have similar names.

So, I could easily go and look here at where each one is located. Figure out if I've got a duplicate and get rid of the duplicate. Then I can quickly go through these large files in the order they are in so I can Quit at some point if I think the files are getting too small for me to care about.

Now this isn't a great method because you have to carefully inspect everything to make sure it's actually a duplicate and, of course, you're getting a list of all your files here. But the advantage is you don't need any special skills to do this. You can do it just right here in the Finder with a simple Search and it will spot anything, particularly anything that's pretty large.
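For anyone comfortable in Terminal, roughly the same size-sorted list can be sketched with find instead of a Finder search. This is not the method shown in the video. The function name list_by_size is made up for illustration, and the sketch assumes you have already moved into the folder you want to inspect.

```shell
# Hypothetical helper: list files over 1 MB under the current folder,
# biggest first, one "size-in-bytes path" pair per line.
# Files with exactly the same size end up on neighboring lines.
list_by_size() {
  find . -type f -size +1M -exec sh -c '
    for f in "$@"; do
      # wc -c counts bytes; $(( )) strips the padding some systems add
      printf "%s %s\n" "$(( $(wc -c < "$f") ))" "$f"
    done' sh {} + | sort -nr
}
```

Run list_by_size after changing to the Documents folder. As with the Finder method, a matching size is only a hint, so each pair still needs to be inspected by hand.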

But there's another way to do this using Terminal. You can actually have the Terminal search for duplicate files. This takes a multi-part command but it's not too difficult to understand. Now I've actually created some files in this test folder here where if I look in the subfolder there is a duplicate of this file here so we have something we can test this with. We're going to start off just running the test here in the Test Directory or Test Folder, and then we can understand how it works and we can apply that to all the files in the Documents folder.

So I'm going to run Terminal here. I'm going to make sure I'm in this folder here. So I can do pwd and it shows me I am indeed in my Test Folder. If not, you can do cd for Change Directory. Let's go up a level here and I'm going to drag Test in there and you can see it puts the full path so I don't have to type it and I can change directory to that directory. So that's how you know where you are and that's how you get to some new location.

Now that we are in here, while in Terminal we can run this Command that will find duplicates. Here's the command. So you can see it takes up a few lines. Note that every time you see an up and down line (the pipe character, |), like this here or this here or this here, that's basically saying take the output from the previous command and send it to this new command. So you have basically a bunch of different commands, each one sending output to the next. Now if we were to run this it would quickly tell us that there are two files here that are the same. What are these two numbers here? Well, this second number is actually the size of the file. You can see here that, indeed, this is 7.4M so you can see that in bytes right there. What's this first number? Well, this first number here is something called the Checksum. What's a checksum? Well, basically think about if you took a book and you assigned a number to every letter, like A was 1, B was 2, C was 3 and you added up all of the letters using those numbers. You'd come up with a really big number for the total of all the letters in the book. But it wouldn't be as big as the book. The book might be millions of letters long. But you'll probably end up with, you know, a 13 or 15 digit number. That would be a really unique number. The only way you're probably ever going to find that same number again is if you use exactly the same book and count them all again.

The Checksum works in a similar way. It's a little more complex than that but it analyzes every byte in the file and uses that to calculate a number. The chances of any two files having the same number are nearly zero unless the files are identical. So, this is the checksum for this file and this is the checksum for this file. The fact that they are the same points to the fact that they are exactly the same file and having the same exact size is even more confirmation. So it's telling us that these two files are identical files. They are duplicates. So let's take a look at how exactly this works right here.
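To see this for yourself, here is a small sketch that can be pasted into Terminal. The file names and the scratch folder are made up for the demonstration; cksum is the same program used in the full command.

```shell
# Make a scratch folder with an original, an exact copy, and a file
# that is the same length but has different contents.
demo=$(mktemp -d)
printf 'The quick brown fox\n' > "$demo/original.txt"
cp "$demo/original.txt" "$demo/copy.txt"
printf 'The quick brown cat\n' > "$demo/different.txt"

# Each output line is: checksum, size in bytes, path.
cksum "$demo/original.txt" "$demo/copy.txt" "$demo/different.txt"

# When finished, remove the scratch folder with: rm -rf "$demo"
```

The original and the copy should show the same checksum and the same size. The third file has the same size but a different checksum, which is why the size alone is not enough to call two files duplicates.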

I've broken the line down here so that every time it is sent to a new command it's on a new line. So we can now look at each individual thing here. The find command will find things. Where? At the current location. That's what the dot means. Of what type? Type f, which means files. It's going to look for things greater than 1M in size. Then it's going to take what it finds and run a program called cksum with no extra parameters, and then the backslash and semicolon terminate that part. That's going to basically assign that checksum to everything there. Let's take this and try running it here and we'll see what the results are. You can see here it looks at every file in that folder that's greater than 1M and assigns that checksum to it. So, now we have that. Now it's pretty easy here to see that we've got two duplicates. But that would not be so easy if we were looking through the entire Documents folder with thousands of files.
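If you don't have a test folder handy, a sketch like this builds one and runs just that first part of the command. The folder layout and file names are invented for the example, and the files are filled with zero bytes simply to make them big enough.

```shell
# Build a scratch test folder (names are placeholders).
demo=$(mktemp -d)
mkdir "$demo/sub"
head -c 2097152 /dev/zero > "$demo/video.mov"      # about 2 MB
cp "$demo/video.mov" "$demo/sub/video copy.mov"    # an exact duplicate
head -c 3145728 /dev/zero > "$demo/other.mov"      # about 3 MB, not a duplicate
echo "tiny" > "$demo/notes.txt"                    # under 1M, so find skips it

# First part only: checksum, size, and path for each file over 1M.
(cd "$demo" && find . -type f -size +1M -exec cksum {} \;)

# When finished, remove the scratch folder with: rm -rf "$demo"
```

You should get three lines, with the two video files sharing the same first two numbers, and no line for notes.txt.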

So the next thing we're going to do is this, which is going to take the results and do two things with them. One is, it's going to save them into a file. We're going to use the tmp directory, which is a directory on your Mac where you can save temporary files in little commands like this. It's going to save it to a file called filelist.tmp. The second thing it does is send it to the next command. So tee is like a t-joint in plumbing. Two different things with the same data. Then, we're going to cut and basically get two fields, 1 and 2, and it's going to look for divisions of a space. So, it's going to get this and this. The checksum and the file size for everything. That's what we've got after this part. So let's try that out.
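Here is a small sketch of just the tee and cut parts, fed with two made-up lines instead of real find output. The checksums, sizes, and paths are placeholders, and a file from mktemp stands in for /tmp/filelist.tmp.

```shell
# A temporary file to play the role of /tmp/filelist.tmp.
list=$(mktemp)

# tee saves the full lines to the file and also passes them along;
# cut then keeps only fields 1 and 2, splitting on spaces.
printf '%s\n' \
  '1111111111 7400000 ./report.pdf' \
  '2222222222 5000000 ./sub/my report.pdf' |
  tee "$list" |
  cut -f 1,2 -d ' '
# prints:
# 1111111111 7400000
# 2222222222 5000000

# The saved file still has the complete lines, paths included:
cat "$list"
```

Keeping the full lines in a file matters later, because the path is thrown away by cut and has to be looked up again at the end.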

So we can see that's exactly what we get. It takes away the file name. Great. What's the next part? Well, the next part is it is going to Sort. So let's try that. Now we can see it does it. It's sorted. It was already sorted so we're not going to see any change here, but if these weren't in order they would be now. By sorting, anything that's a duplicate is going to be right next to the other thing that's a duplicate because they are the same. So sorting puts duplicates next to each other. Now the next thing we're going to do is look for lines that are not unique. With uniq -d we're going to show one copy of any line that is repeated. So we just get this one line here because the other one was unique and this one was part of a duplicate. So it shows that. So great. So now we've found every duplicate.
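The sort and uniq -d steps can be tried on their own with a few invented checksum and size pairs:

```shell
# Four made-up "checksum size" lines; one pair appears twice.
printf '%s\n' \
  '333 5000' \
  '111 2000' \
  '222 9000' \
  '111 2000' |
  sort |
  uniq -d
# prints: 111 2000
```

Without the sort, the two matching lines would not be next to each other, and uniq only compares neighboring lines, so it would report nothing.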

Next thing we want to do is use grep to go back to that file. Remember we saved the original data from the find here. So we're going to go back and use grep to say, show us any lines that match whatever comes out of the uniq part. So in other words it's going to find the lines that are there. If there are two files that are the same it's going to find both of them because both will have the same checksum and the same file size. So we use that and we see that it found two lines that start with that. This one and this one. The last thing we want to do is sort again, because if we find a lot of duplicates they're not going to be in any particular order. It may list the first duplicate as the first line and the second duplicate as the fifteenth line. But by sorting it will put the duplicates together. We want to sort by number. We're going to sort by file size. We want the biggest ones to be at the top. We're going to do a reverse sort that puts the biggest at the top. k2 means the second key. So not sorting by this, the checksum, but sorting by the second thing.
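As a rough sketch of these last two parts, the example below uses a made-up saved list and feeds grep two made-up keys, the way uniq -d would. All numbers and paths are placeholders. It relies on grep accepting - after -f to read the patterns from the previous command, which is what the full command does.

```shell
# Stand-in for /tmp/filelist.tmp with five invented lines.
list=$(mktemp)
printf '%s\n' \
  '111 2000 ./a.txt' \
  '222 9000 ./unique.mov' \
  '333 5000 ./c.txt' \
  '111 2000 ./sub/a.txt' \
  '333 5000 ./old/c.txt' > "$list"

# The two keys stand in for the output of uniq -d.
# grep pulls every saved line containing a key; sort -nrk2 then
# orders the result by the second field (size), biggest first.
printf '%s\n' '111 2000' '333 5000' |
  grep -hif - "$list" |
  sort -nrk2
```

The result should be four lines: the two 5000-byte entries first, then the two 2000-byte entries, with unique.mov left out.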

So now we put that all together and we get this result. It finds these two duplicates. But what if we were to go up a level, do cd up here, and now we look where we are? We're at the Documents folder. So let's run that command again and we can see it actually found a bunch of things. Because they are sorted, everything is grouped together. We can see the first three things here are three identical files. Then we can see some duplicates right here in two different locations. Then we can see another set of three identical files. There's three. There's two. Now we've identified a bunch of places in the Documents folder here where we have found files that are duplicates.

I'll include this full code right here for the command at this post at MacMost.com. I hope you find it useful.

Here’s the code for the Terminal command. This should all be on one single long line.

find . -type f -size +1M -exec cksum {} \; | tee /tmp/filelist.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/filelist.tmp | sort -nrk2; rm /tmp/filelist.tmp

Note: I have added a part at the end that is not in the video to delete the temporary file to keep things cleaner.
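Several readers in the comments below report a "grep: -: No such file or directory" error on older versions of macOS, and one reader traces it to an older grep that does not accept - after -f. Here is a minimal sketch of the same steps with the keys written to a second temporary file instead, along the lines of that reader's suggestion. The function name find_duplicates and the use of mktemp are my own choices for illustration, not part of the original command.

```shell
# Hypothetical helper: same steps as the one-line command, but the
# duplicate keys go through a file instead of "grep -f -".
find_duplicates() {
  local list keys
  list=$(mktemp)   # plays the role of /tmp/filelist.tmp
  keys=$(mktemp)   # holds the "checksum size" pairs seen more than once

  # Step 1: checksum, size, and path for every file over 1M.
  find . -type f -size +1M -exec cksum {} \; > "$list"

  # Step 2: keep checksum and size, sort, keep only repeated pairs.
  cut -f 1,2 -d ' ' "$list" | sort | uniq -d > "$keys"

  # Step 3: if any keys were found, pull the matching full lines
  # and sort them by size, biggest first.
  [ -s "$keys" ] && grep -hif "$keys" "$list" | sort -nrk2

  rm -f "$list" "$keys"
}
```

After defining the function, change to the folder you want to check (for example with cd ~/Documents) and run find_duplicates. It only lists duplicates and does not delete or move any of your files.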

Comments: 38 Comments

Jacques Huot

3 years ago

Hello. Did a copy, paste of the code into "terminal" nothing happened. Apologies for what will, no doubt, turn out to be a dumb question.
Love your content.
Jacques

Scott Palluth

3 years ago

This is a great tool. Thank you. Is there a way that a shortcut can be made to run this as a shortcut command "duplicates"?

Gary Rosenzweig

3 years ago

Scott: You should be able to do it as a Shortcut, yes. Try it.

Gary Rosenzweig

3 years ago

Jacques: did it all appear as one long line? Did you press return after pasting it?

Wes W

3 years ago

The command as-written works, but was reading several hours on my flash at about 380-MB/s before I killed it (many TB). Swapping out cksum for md5, check sums are now reading at about 555-MB/s (46% faster). Installed md5 as part of "brew install md5sha1sum" - otherwise, work great, thanks Gary.

chris pearce

3 years ago

Is there a limitation of not working on the iCloud Drive data, or is there a trick to getting the path right? I can get the local documents working but the cloud-stored files seem to be overlooked. Possible path issues. Any hint?

Gary Rosenzweig

3 years ago

Chris: just start in the top level of iCloud Drive and it should work. But I wouldn't try it if you have "Optimize" turned on, as there is no way for the checksum to work right if the file isn't really local.

3 years ago

This is great. I can get a lot of use out if it. But I am getting an error: "grep: -: No such file or directory". I think this refers to the "grep -hif -". What is the purpose of the '-' at the end before the filename? Could that be my issue? I am on macOS 11.6, could that be the case?

Thanks.

Gary Rosenzweig

3 years ago

Al: Make sure the path after the dash in the grep is correct. That's what the error is telling you.

Murray Walker

3 years ago

I too have the same issue as Al above, in that it shows the same error "grep: -: No such file or directory"...I have cut and pasted from above so it's not a syntax error...any ideas Gary?? Thanks for all your great videos btw, they've been so helpful in getting my new MacBook Pro under control!!!

Gary Rosenzweig

3 years ago

Murray: If the error is the same, then it sounds like the /tmp/filelist.tmp is not being created. Maybe you just don't have any duplicates at all?

Dan

3 years ago

Gary,
I get zsh: permission denied: whenever I try the terminal command. Any ideas?

Gary Rosenzweig

3 years ago

Dan: Maybe try it piece by piece to see where the problem is.

Kay Fisher

3 years ago

I ran it. Quite an eye opener. Most of my duplicates were from FCP
3765589312 357053440 ./Movies/FCP Cache/FCP 2021 Cache.fcpcache/Sailing 28-Oct-2021/Render Files...
3765589312 357053440 ./Movies/FCP Cache/FCP 2021 Cache.fcpcache/Sailing 28-Oct-2021/Render Files...
But I got a lot of stuff from: ./Library/Trial/Treatments
I googled that but it looks like Apple is dropping the ball here!

Gary Rosenzweig

3 years ago

Kay: I would really just run this on your Documents folder, not your Home folder. You don't have to worry about things in the Library or inside of project/library files. I should have stressed that more in the video.

Taylor Francis

3 years ago

I'm getting the "grep: -: No such file or directory" error. I've double checked the file name spelling AND I've guaranteed there are duplicates.... any ideas??? Thanks!

Gary Rosenzweig

3 years ago

Taylor : perhaps you aren’t getting any duplicates?

Michael Cliff

3 years ago

Gary - I'm getting the same problem as Taylor, Murray and Al - "No such file or directory" error. I've also made sure there are duplicates and done it step by step as in your video. There are no errors until the "grep" command. I'm running Catalina 10.15.7 on a MacBook Pro. I am not used to using Unix so would appreciate any more help. In general thanks for all your ideas and suggestions.

Gary Rosenzweig

3 years ago

Michael: Wait, you are in Catalina? That could explain it. Also, are you using zsh, or bash in terminal? What I'm showing here is using zsh, the default shell in Terminal in Monterey.

Taylor Francis

3 years ago

I'm running zsh, but my OS is Big Sur...

Gary Rosenzweig

3 years ago

Taylor: Could be it, no way for me to test to be sure.

Michael Cliff

3 years ago

Gary - I'm using zsh in terminal and still can't get it to work. It works as far as the end of "uniq -d". I made a file called 888.txt with the output of "uniq -d" and then ran "% cat 888.txt | grep -hif - /tmp/filelist.tmp" and got the missing file error and realised it was looking for a file called "-" so I removed this "-" after the -hif and it ran with no error but also with no output.

Jim Owens

3 years ago

Hi Gary. Is there any way to eliminate duplicate photographs? Thanks!

Gary Rosenzweig

3 years ago

Jim: See https://macmost.com/forum/how-do-i-remove-duplicate-photos-from-imac.html

Charles Holder

3 years ago

Attempting to run this tool, I found difficulties because of “operation not permitted”. Changing privacy settings to allow terminal full disk access improved the situation but not completely. Running in documents, “operation not permitted” still appeared for an older machine ‘documents’ residing inside the current machine/user documents. Why would this older machine ‘documents’ not be accessible with full disk access enabled? It does contain a PW restricted folder containing health data.

Charles Holder

3 years ago

I was able to run this tool after ‘cleaning up’ the drive to remove seeming remnants of the old drive, PW protected stuff, and downloading all iCloud files. I will next change the size specification to get to duplicates smaller than 1M.

Your UTube guides are greatly appreciated; always clear and succinct.

Grant

3 years ago

Very useful thanks. Would be even better if the duplicate file names were saved to a file with 'clickable' locations to access them easily.

Steve Urich

3 years ago

I am also getting the error message about no file found. Believe me I have duplicates. I run Gemini duplicate finder and it shows plenty. But it does not make it easy to get rid of the files that are not where they belong.

Any recommended troubleshooting steps?

I am going to try and run it one step at a time like you did in the video, but if there are many users that are getting the same error then there might be a common problem.

Gary Rosenzweig

3 years ago

Steve: Are you sure you are running the Terminal command in the right place? If that other app finds something in folder A and you are running the shell script in folder B, then you are looking in two different places.

Rune Schjoenning

3 years ago

Hi Gary thanks for excellent instruction. Every line in the script works perfectly until the "grep" instruction.

find . -type f -size +1M -exec cksum {} \; | tee /tmp/filelist.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/filelist.tmp | sort -nrk2
grep: -: No such file or directory

The filelist.tmp is generated with duplicate files in it. I run OS High Sierra. Terminal Version 2.8.3 (404.1)

Thanks again kind regards Rune

Gary Rosenzweig

3 years ago

Rune: Probably because you are running such an old version of macOS.

Rune Schjoenning

3 years ago

Yes probably. I tried a simpler 'grep' (picking out a text from a txt file), first from a document folder, then I moved the txt file to the tmp folder. Dragged the folder to terminal, and got a path similar to what I had tried earlier. No trouble picking the text from there either. So the path seems to be working, but maybe a more complex grep command is the problem on my old OS? kind regards Rune

Sharyn

3 years ago

Hi Gary, will this work on Ventura? Great content by the way, thank you.

Gary Rosenzweig

3 years ago

Sharyn: The Terminal command? It should. Try it and see.

Steph

3 years ago

Hi Gary thanks so much for this great tutorial. It worked great. Is there a way to select and move the resulting list (or better the files named on that list) to another folder for review (or the bin)?

Gary Rosenzweig

3 years ago

Steph: Yes. You'd need to use mv instead of rm in the last part to move the file instead of remove it. Not sure of the exact syntax so you'd need to do some research and try some things out.

Bob Sander-Cederlof

3 years ago

The problem some are having with the grep command is because of an older version of grep which does not support the "-" as the file specifier after -f. By changing the uniq and grep parts to the following it will work on older OS/grep
| uniq -d > /tmp/keys.txt
| grep -hif /tmp/keys.txt /tmp/filelist.txt

This now works on my Mojave mac in bash.

Bob Sander-Cederlof

3 years ago

Sorry, I wrote the .txt file extension, when it should have been .tmp, in my previous comment.
| uniq -d > /tmp/keys.tmp
| grep -hif /tmp/keys.tmp /tmp/filelist.tmp

Comments are closed for this post.
