First signs of a problem
I copied my Movies directory from my old Mac to my new Mac, using target disk mode. On my old Mac, the Movies directory took up 84G. But on my new Mac, it took up 149G. What was going on?
My Movies directory contained hard links, which I wrote about last time.
The Movies directory contained 65G of files that were hard-linked to other files also within the Movies directory. When I copied them the usual way (by drag and drop, or cp), the hard-linked files were copied one time for each hard link. So tons of duplication, tons of wasted space.
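To see the duplication on a small scale, here is a minimal sketch that builds a throwaway tree with one hard-linked pair and copies it with cp -R. The directory and file names are made up for the demo, and inode is a tiny helper I define here, not a standard command.

```shell
# Throwaway tree with one hard-linked pair (all names are hypothetical).
demo=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-cp.XXXXXX")
mkdir -p "$demo/src/events" "$demo/src/library"
printf 'pretend this is video\n' > "$demo/src/events/clip.dv"
ln "$demo/src/events/clip.dv" "$demo/src/library/clip.dv"

# Helper: print the inode number of a path (first field of ls -i).
inode() { ls -i "$1" | awk '{print $1}'; }

# In the source, both names point at the same inode.
if [ "$(inode "$demo/src/events/clip.dv")" = "$(inode "$demo/src/library/clip.dv")" ]; then
    src_shared=yes
else
    src_shared=no
fi

# A plain recursive copy writes the data once per name.
cp -R "$demo/src" "$demo/copy"
if [ "$(inode "$demo/copy/events/clip.dv")" = "$(inode "$demo/copy/library/clip.dv")" ]; then
    copy_shared=yes
else
    copy_shared=no
fi

echo "same inode in source: $src_shared"
echo "same inode in copy:   $copy_shared"
```

In the copy, the two names should no longer share an inode, which is the same effect that grew my Movies directory from 84G to 149G.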
Investigating hard links
The command du, which calculates disk usage of a directory, is useful for understanding what hard links you have and where.
If I look at the disk usage for the Movies directory, I see:
$ du -sh Movies
84G Movies
Here’s what that means:
du – display disk usage statistics
-s – provide the total disk usage for each file/directory specified on the command line
-h – provide it in human readable form (as GB, rather than 84123456789 bytes)
Movies – tell me about the Movies directory
I get consistent information if I ask du about the top-level directories inside Movies:
$ du -sh Movies/*
80G Movies/iMovie Library.imovielibrary/
4.0G Movies/iMovie Events.localized/
301M Movies/iMovie Projects.localized/
8.0K Movies/iMovie Theater.theater/
But if I look at the top-level directories individually, I see something different:
$ du -sh "Movies/iMovie Library.imovielibrary/"
80G Movies/iMovie Library.imovielibrary/
$ du -sh "Movies/iMovie Events.localized/"
69G Movies/iMovie Events.localized/
80G + 69G = 149G. This is much bigger than the 84G that du claims the Movies directory contains.
It turns out there are hard links between “iMovie Library.imovielibrary” and “iMovie Events.localized”. Within a single run, du keeps track of the inode for any hard-linked files it comes across, and only includes those inodes once in its calculations. See this discussion on Stack Exchange.
So “iMovie Events.localized” contains 65G of files that are hard-linked with files in “iMovie Library.imovielibrary”, and only 4G that belong to it independently.
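The same effect can be reproduced on a small scale. This is a rough sketch with made-up directory names. It uses du -sk (sizes in kilobytes) instead of -h so the numbers can be added up, and the exact figures will vary by filesystem.

```shell
# Two directories sharing one hard-linked 1 MiB file (names are hypothetical).
demo=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-du.XXXXXX")
mkdir "$demo/library" "$demo/events"
# Random data, so the file occupies real blocks on disk.
dd if=/dev/urandom of="$demo/library/clip.dv" bs=1024 count=1024 2>/dev/null
ln "$demo/library/clip.dv" "$demo/events/clip.dv"

# One du run over the parent: the shared inode is counted once.
combined=$(du -sk "$demo" | awk '{print $1}')

# Separate du runs: each run counts the shared inode on its own.
library=$(du -sk "$demo/library" | awk '{print $1}')
events=$(du -sk "$demo/events" | awk '{print $1}')
separate=$((library + events))

echo "one run over the parent: ${combined}K"
echo "two separate runs:       ${separate}K"
```

The separate runs should add up to roughly twice the file size, while the single run reports it once, just like 149G versus 84G above.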
I wanted to make sure that when I copied the Movies directory, I did not duplicate any hard linked files. So first, I needed to identify where all the hard links went.
The magical find and hard links
find is the Swiss army knife of Unix.
$ find Movies -type f -links +1 -ls | wc
908 16037 160068
The arguments mean:
find – walk a file hierarchy
Movies – begin in the Movies directory
-type f – only look at files, not directories
-links +1 – tell us about files that have more than 1 link (that is, 2 or more links)
-ls – give the long listing for each matching file
| wc – pipe the answer to word count
The first number that wc prints is the line count, and find prints one line per matching file. So the output of that command tells us that there are 908 files in the Movies directory that are hard-linked to something.
If we then ask how many files have more than 2 links:
$ find Movies -type f -links +2 -ls | wc
0 0 0
We find that no file has more than 2 links.
Alternately, we could have asked how many files have exactly 2 links:
$ find Movies -type f -links 2 -ls | wc
908 16037 160068
There are 908 files with exactly 2 links, and therefore none with more than 2 links.
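Here is a small sketch of how the three -links tests relate, run against a made-up tree where one file has two names, one has three, and one has no extra links. I use wc -l to get just the line count.

```shell
# Hypothetical tree: a.dv has 2 names, b.dv has 3 names, c.dv has 1.
demo=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-find.XXXXXX")
mkdir "$demo/events" "$demo/library"
echo one  > "$demo/events/a.dv"
echo two  > "$demo/events/b.dv"
echo solo > "$demo/events/c.dv"
ln "$demo/events/a.dv" "$demo/library/a.dv"
ln "$demo/events/b.dv" "$demo/library/b.dv"
ln "$demo/events/b.dv" "$demo/library/b-again.dv"

# find prints one line per matching name; tr strips the padding some wc versions add.
more_than_one=$(find "$demo" -type f -links +1 | wc -l | tr -d ' ')
exactly_two=$(find "$demo" -type f -links 2 | wc -l | tr -d ' ')
more_than_two=$(find "$demo" -type f -links +2 | wc -l | tr -d ' ')

echo "more than 1 link: $more_than_one"
echo "exactly 2 links:  $exactly_two"
echo "more than 2:      $more_than_two"
```

The "more than 1" count is always the "exactly 2" count plus the "more than 2" count. In my Movies directory that was 908 = 908 + 0.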
Using find to get related hard links
Now let’s check where those files are linked to. Let’s look at the first hard linked file:
$ find Movies -type f -links 2 -ls | head -1
3373844 172976 -rw-r--r-- 2 sasha staff 88560000 Jun 22 2008 Movies/iMovie Events.localized/raspberries/clip.dv
The first column is the inode number, and the last column is one of the filenames pointing to that inode.
This one has inode 3373844, and filename Movies/iMovie Events.localized/raspberries/clip.dv.
We can check where the other hard linked file is by searching by inode explicitly (-inum):
$ find Movies -type f -inum 3373844 -print
Or searching for other files linked to the same inode as a particular filename (-samefile):
$ find Movies -type f -samefile "Movies/iMovie Events.localized/raspberries/clip.dv" -print
In either case, I get the following result:
Movies/iMovie Events.localized/raspberries/clip.dv
Movies/iMovie Library.imovielibrary/raspberries/Original Media/clip.dv
So I know that some files in iMovie Events.localized are linked to files in iMovie Library.imovielibrary.
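To check that -inum and -samefile really give the same answer, here is a sketch against a made-up tree with spaces in the directory names. Reading the inode number from ls -i is my own shortcut here. The find -ls listing shows the same number in its first column.

```shell
# Hypothetical tree with one hard-linked pair and spaces in the paths.
demo=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-same.XXXXXX")
mkdir -p "$demo/iMovie Events/raspberries" "$demo/iMovie Library/raspberries"
echo clip > "$demo/iMovie Events/raspberries/clip.dv"
ln "$demo/iMovie Events/raspberries/clip.dv" "$demo/iMovie Library/raspberries/clip.dv"

target="$demo/iMovie Events/raspberries/clip.dv"

# Look up the inode number of one name.
inode=$(ls -i "$target" | awk '{print $1}')

# Search by inode number, then by "same file as this path".
by_inum=$(find "$demo" -type f -inum "$inode" | sort)
by_samefile=$(find "$demo" -type f -samefile "$target" | sort)

echo "$by_samefile"
```

Both searches should list the same two paths. I find -samefile more convenient, because it skips the step of looking up the inode number first.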
Note that find only searches the directories it is told to search. In this case, it is searching the Movies directory. If a file is hard linked to a file outside the Movies directory, find will report that it has hard links, because the reference count is greater than one. But if the other hard link is outside the Movies directory, then find will not be able to locate it.
Depending on where the hard links are, you may need to back up to a higher-level directory to find them:
$ find ~ -type f -samefile "Movies/iMovie Events.localized/raspberries/clip.dv" -print
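A small sketch of that situation, using a made-up Movies directory and a sibling directory called Elsewhere that holds the second link:

```shell
# Hypothetical layout: the second link lives outside Movies.
demo=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-outside.XXXXXX")
mkdir "$demo/Movies" "$demo/Elsewhere"
echo clip > "$demo/Movies/clip.dv"
ln "$demo/Movies/clip.dv" "$demo/Elsewhere/clip.dv"

# The link count flags the file even when searching only Movies...
flagged=$(find "$demo/Movies" -type f -links +1 | wc -l | tr -d ' ')

# ...but only one of its two names is inside Movies.
inside=$(find "$demo/Movies" -type f -samefile "$demo/Movies/clip.dv" | wc -l | tr -d ' ')

# Starting the search one level up finds both names.
wider=$(find "$demo" -type f -samefile "$demo/Movies/clip.dv" | wc -l | tr -d ' ')

echo "flagged as hard-linked inside Movies: $flagged"
echo "names found inside Movies:            $inside"
echo "names found from the parent:          $wider"
```

If the number of names found is smaller than the link count, the remaining links are somewhere outside the directory being searched.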
Using awk to get total bytes
You can use awk to calculate the total number of bytes that are hard-linked within various directories. This lets you see if you’ve found all of the corresponding hard-linked files. In the find listing, column 7 contains the size in bytes of each matching file.
$ find "Movies/iMovie Events.localized" -type f -links 2 -ls | awk '{sum+= $7} END {print sum}'
69739632133
$ find "Movies/iMovie Library.imovielibrary" -type f -links 2 -ls | awk '{sum+= $7} END {print sum}'
69740948475
$ find "Movies/iMovie Projects.localized" -type f -links 2 -ls | awk '{sum+= $7} END {print sum}'
1316342
Then, you can use expr to check that the numbers match up:
$ expr 69740948475 - 69739632133 - 1316342
0
For my case, the number of bytes of hard links in “iMovie Library.imovielibrary” is exactly the same as the sum of the bytes in “iMovie Events.localized” and “iMovie Projects.localized”. This is a pretty good indication that all my hard links are contained entirely within the Movies directory.
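Here is the same byte-counting check as a sketch on a made-up tree with known file sizes. It assumes the user and group names in the find -ls listing contain no spaces, since otherwise the size would not land in column 7. The "+ 0" makes awk print 0 instead of an empty line when nothing matches.

```shell
# Hypothetical tree: two hard-linked files (5 and 7 bytes) and one unlinked file.
demo=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-awk.XXXXXX")
mkdir "$demo/events" "$demo/library"
printf '12345'    > "$demo/events/a.dv"   # 5 bytes
printf '1234567'  > "$demo/events/b.dv"   # 7 bytes
printf 'unlinked' > "$demo/events/c.dv"   # 8 bytes, only one link
ln "$demo/events/a.dv" "$demo/library/a.dv"
ln "$demo/events/b.dv" "$demo/library/b.dv"

# Sum column 7 (size in bytes) over files with exactly 2 links.
events_bytes=$(find "$demo/events" -type f -links 2 -ls | awk '{sum += $7} END {print sum + 0}')
library_bytes=$(find "$demo/library" -type f -links 2 -ls | awk '{sum += $7} END {print sum + 0}')

echo "hard-linked bytes in events:  $events_bytes"
echo "hard-linked bytes in library: $library_bytes"
echo "difference: $((library_bytes - events_bytes))"
```

A difference of zero means every hard-linked byte on one side is accounted for on the other. Matching totals are strong evidence, not proof, so I still spot-checked a few files with -samefile.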
Copying hard linked files
After you figure out where all your hard links are, you need to copy the top-level directory recursively, and maintain the hard links over the course of that copy. Both cp and drag and drop will double-copy every hard-linked file. You need rsync.
Here is the command I used:
$ rsync -vaEH --protect-args --progress "/Volumes/Macintosh HD 1/Users/sasha/Movies/" /Users/sasha/Movies
This means:
rsync – remote synchronize
-v – verbose
-a – archive mode – recurse into directories, and preserve symlinks, permissions, timestamps, owners, groups, devices, and special files
-E – preserve extended attributes (necessary for Macs). This is the meaning of -E in the rsync 2.6.9 that ships with OS X; in rsync 3.x, -E means --executability, and extended attributes are preserved with -X instead
-H – preserve hard links
--protect-args – properly handle filenames with spaces
--progress – show progress during transfer
"/Volumes/Macintosh HD 1/Users/sasha/Movies/" – directory to copy from
/Users/sasha/Movies – directory to copy to
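Before trusting a long copy, it is worth checking on a small scale that -H does what it says. This is a rough sketch on a made-up tree. It uses only -a and -H, which I expect to behave the same in rsync 2.6.9 and 3.x, and it skips the copy if rsync is not installed.

```shell
# Hypothetical source tree with one hard-linked pair and spaces in the names.
demo=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-rsync.XXXXXX")
mkdir -p "$demo/src/iMovie Events" "$demo/src/iMovie Library"
echo clip > "$demo/src/iMovie Events/clip.dv"
ln "$demo/src/iMovie Events/clip.dv" "$demo/src/iMovie Library/clip.dv"

if command -v rsync >/dev/null 2>&1; then
    # The trailing slash on the source copies its contents into dst.
    rsync -aH "$demo/src/" "$demo/dst"
    # Count the names in the copy that still have exactly 2 links.
    preserved=$(find "$demo/dst" -type f -links 2 | wc -l | tr -d ' ')
else
    preserved="rsync not installed"
fi

echo "hard-linked files in the copy: $preserved"
```

If the hard links survived, both names in the copy still report 2 links. Running the same find against a cp -R copy is a quick way to see the difference.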
Here is the last tricky/annoying piece in this process. Mac OS X 10.11 ships with rsync 2.6.9, which does not protect filenames with spaces. This means that rsync will copy some files properly, and will fail on others with a pretty useless error message:
rsync: link_stat /Users/sasha/Movies/<blah> failed: No such file or directory (2)
If you search online, you will find suggestions to use complicated sets of backslashes and nested single and double quotes to get around this. This is a fragile solution. It may work, depending on where in the directory hierarchy the spaces are, but it may not.
The better solution is to download the latest version of rsync from homebrew (currently rsync 3.1.2), and then use the --protect-args option, which protects spaces. To get rsync, you need the homebrew/dupes tap, as described on Stack Overflow.
Hi
I notice a lot of duplicated files between iMovie and Photos.app. i.e. two files, with the same md5 checksum in both places. Can I safely replace one instance of the file, with a hard link, without corrupting the library?