At what hard link count is a file effectively deleted?

How do I trace the original file, or every hard link, using the /usr/bin/bash file as a reference?

With GNU find (or any other version of find that has the -samefile option), and assuming that /usr/bin/bash is located on the / filesystem, this is correct:

find / -xdev -samefile /bin/bash

Use -xdev since hard links can't cross filesystem boundaries. Do not redirect errors: if you don't have permission to traverse a directory, a hard link could be present under that directory and you'd miss it.
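
To see what -samefile does and does not match without scanning the whole system, here is a minimal sketch in a scratch directory. The names original, hard and soft are hypothetical, and a find with -samefile (GNU or similar) is assumed:

```shell
# Scratch directory so nothing on the real system is touched.
dir=$(mktemp -d)
echo 'Hello, links' > "$dir/original"
ln "$dir/original" "$dir/hard"     # second name for the same inode
ln -s original "$dir/soft"         # symbolic link: a separate inode

# -samefile compares inodes, so this lists original and hard but not soft.
matches=$(find "$dir" -xdev -samefile "$dir/original" | sort)
printf '%s\n' "$matches"
```

The symbolic link is left out because find does not follow symbolic links by default, which is the same reason the search in the question found only one name.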

The mistake you're making is that you're looking for another hard link that doesn't exist. You actually have the information to know that it doesn't exist:

1310813 -rwxr-xr-x 1 root root 1183448 Jun 18 21:14 /bin/bash*
                   ^

The hard link count of /bin/bash is 1.

There is a single file on / which is the same as /usr/bin/bash: that file itself. The path /bin/bash reaches the same file for a different reason: the directories /bin and /usr/bin are the same file. Since find / -samefile /bin/bash reports /usr/bin/bash, /bin must be a symbolic link to /usr/bin. More precisely, from the information in the question, and assuming that /bin is not a directory hard link (a poorly supported, rarely used feature), we know that /bin is a symbolic link which resolves to /usr/bin. It could be a symbolic link to another symbolic link, and so on, eventually resolving to /usr/bin or to some equivalent path such as ///////usr/bin/, but most likely it's a symbolic link whose target is /usr/bin.

Looking for all symbolic links to a file on the whole system is not particularly productive. For example, on Linux, there's a file /proc/*/exe which is a symbolic link to /usr/bin/bash (or /bin/bash) for every process that's running bash. And if you look for symbolic links to a directory, you'll end up inside infinite recursion, for example with /proc/*/root pointing to / (except for chrooted processes).

If you need to know whether two paths point to the same file, on Linux, you can use either of

[ /bin/bash -ef /usr/bin/bash ]
test /bin/bash -ef /usr/bin/bash

(-ef isn't POSIX but it's in dash, bash, BusyBox and GNU coreutils). If you need to get a canonical path to a file, in the sense that distinct files always have distinct canonical names, you can use

readlink -f /bin/bash

(This can miss files that are equal via mounted directories, for example if the same network location is mounted in two different ways.)
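
The situation described above (two paths, one file, link count 1) can be reproduced in a scratch directory. This is a sketch under assumptions: the layout and the name tool are hypothetical, and stat -c and readlink -f are the GNU coreutils forms:

```shell
dir=$(mktemp -d)
mkdir -p "$dir/usr/bin"
echo '#!/bin/sh' > "$dir/usr/bin/tool"
ln -s usr/bin "$dir/bin"           # mimics /bin being a symlink to usr/bin

# Both paths reach the same file, yet no hard link was ever created.
if [ "$dir/bin/tool" -ef "$dir/usr/bin/tool" ]; then same=yes; else same=no; fi
links=$(stat -c %h "$dir/usr/bin/tool")   # %h is the hard link count
canon=$(readlink -f "$dir/bin/tool")      # resolves the symlinked directory
echo "same=$same links=$links canonical=$canon"
```

The canonical path ends in usr/bin/tool, because readlink -f resolves every symbolic link along the way.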

You may have heard the terms “hard link” and “soft link” used in the context of Unix and Unix-like operating systems. But do you know what they are and how you create them? In this post, we’ll look at the differences between hard links and soft links and understand how to create them.

Commands We’ll Be Using

It helps to know these Unix commands, but if you don’t, we’ll look at how to use them:

  • touch and echo (+ the output redirection operator >) for creating files
  • ls for listing files in a directory (use -a to show all files)
  • cat for printing the contents of a file
  • stat for viewing information about a file
  • ln for creating links (don’t worry if you’re not familiar with this)
  • readlink for printing the value of a link (more on that later)

We’ll use these to explore hard links and symbolic links in this blog post.

When we talk about files in English, we typically picture a folder, binder, or some other container that directly stores documents or information. But files in the computer sense are nothing more than named entries in a directory.

A file does not directly store or point to its data. Instead, a file name points to an intermediate data structure called an inode, which the filesystem stores on disk (the kernel keeps a copy in memory while the file is in use).

Each file is associated with an inode, and inodes are packed full of rich information about the file’s data, including:

  • File attributes:
    • size in bytes
    • user/group to which it belongs
    • date of last status change (some filesystems also record a creation date)
    • date last modified
    • read/write/execute permissions
  • Pointers to blocks on the hard disk containing the raw data.

So you can think of a file system roughly as follows:

Directory --> File (Name) --> Inode --> Raw Data on disk

Now, in the simplest terms, linking is the process of “referencing” or pointing to an inode. Thus, we say that a link is a pointer to an inode. When we create a file for the first time, the name that we assign it becomes the first link to its corresponding inode. Diagrammatically speaking, the link is the arrow between a file and its inode:

File (Name) --> Inode

Let’s create a file with echo Hello, links > file and view information about its inode using the stat command:

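A minimal sketch of this step, assuming GNU coreutils stat and a throwaway directory from mktemp (the directory is an assumption; any writable directory works). The inode number you see will differ from the one quoted in this post:

```shell
cd "$(mktemp -d)"

echo Hello, links > file      # 12 characters plus the newline added by echo
stat file                     # full report: size, inode number, link count

links=$(stat -c %h file)      # GNU stat: %h is the hard link count
size=$(stat -c %s file)       # GNU stat: %s is the size in bytes
echo "links=$links size=$size"   # prints: links=1 size=13
```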

Observe the line that reads Inode: 4785074605327413. This is an inode number. Each inode has a numerical ID that is unique within its filesystem and is assigned when the inode is created.

Here’s the really interesting bit: There can be multiple links to a single inode. The link count of an inode tracks the number of file names that are pointing to it. Above, we see that the inode associated with file has a link count of 1. This makes sense if you think about it: if creating a file in turn creates an inode in the filesystem that’s associated with the file’s data, then surely the inode’s initial link count should be 1 and not 0.

Keep an eye on that 1—it’ll change as we begin experimenting with soft links and hard links.

A symbolic link (also known as a “soft link” or “symlink”) is a file like any other, but its data is special. Whereas regular files can be created at will, initially empty or with some contents, a symbolic link must be given a target path when it is created (the target does not have to exist at that moment). Thus, a symbolic link’s raw data is actually the path (relative or absolute) to its target file.

To create a soft link on a Unix system, you use the ln (link) command and supply the -s flag (for “symbolic”), followed by the original file name and the name of the soft link, in that order:

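A sketch of the command, using the file names file and softLink from this post and a scratch directory from mktemp (an assumption):

```shell
cd "$(mktemp -d)"
echo Hello, links > file

ln -s file softLink           # ln -s TARGET LINK_NAME
ls -l softLink                # the listing ends in: softLink -> file

target=$(readlink softLink)   # the path stored inside the link
echo "softLink points to: $target"
```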

Now, let’s run the stat command again on both files:

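A compact way to compare the two, sketched with a GNU stat format string (the -c option and its % sequences are GNU-specific, and the scratch directory is an assumption):

```shell
cd "$(mktemp -d)"
echo Hello, links > file
ln -s file softLink

# %n name, %i inode number, %h link count, %s size in bytes.
# stat reports on the symbolic link itself unless -L is given.
stat -c '%n inode=%i links=%h size=%s' file softLink

file_inode=$(stat -c %i file)
link_inode=$(stat -c %i softLink)
file_links=$(stat -c %h file)
```

Plain stat file softLink prints the same fields in the longer default layout.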

Observe the following:

  • The original file (file) and the soft link (softLink) have different inode numbers. This means that they’re actually two different files.
  • The original file’s link count didn’t change. Again, this is because the soft link is an entirely new file that points to a different inode.
  • The original file and the soft link have different file sizes. The original file’s contents are Hello, links (12 characters). Including the newline character from when we ran echo, this constitutes 13 bytes (hence Size: 13). As we mentioned above, a soft link’s data is the path of the original file. In this case, it’s just the string file, which has four characters and is therefore four bytes (hence Size: 4).

Let’s also look at their contents using the cat command:

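A sketch of the comparison, again in a scratch directory (an assumption):

```shell
cd "$(mktemp -d)"
echo Hello, links > file
ln -s file softLink

cat file                      # prints: Hello, links
cat softLink                  # follows the link and prints the same text

via_link=$(cat softLink)
```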

Even though the symbolic link’s underlying data is the path of the original file, running the cat command effectively resolves or follows the symbolic link and prints the contents of the original file: Hello, links instead of file. Naturally, this implies that if the original file’s contents change, the result of running cat softLink will also change.

Exercises:

  1. Change the contents of the original file. Note how its size changes, whereas the symbolic link’s file size remains the same. However, both print the same text when cat-ed.
  2. Create a file with a longer name. Then, create a symbolic link to that file. Can you determine what the size of the symbolic link will be in bytes?

Let’s see what happens if you create a soft link to a file that’s not in the same directory:

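A sketch of this case, assuming the target lives in a subdirectory named links (as the sizes discussed in this post suggest) and that GNU stat is available:

```shell
cd "$(mktemp -d)"
mkdir links
echo Hello, links > links/file

ln -s links/file softLink     # the stored path is now "links/file"

size=$(stat -c %s softLink)   # size of the link itself, not of its target
echo "softLink stores '$(readlink softLink)' ($size bytes)"
```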

This time, the symbolic link’s file size is no longer 4 bytes. Rather, it’s 10: the length of the string links/ (which is 6) plus the length of the target file name itself (4).

Because a symbolic link’s data is the path to the original file, there are two natural consequences:

  • If the original file is renamed or moved to a different directory, the soft link will “break.”
  • If the original file is deleted, the soft link will “break.”

Here’s an example showing what happens when we move the original file up one directory:

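A sketch of the move, assuming both the file and the link start out in a directory named links inside a scratch directory (hypothetical layout) and that GNU stat is available:

```shell
dir=$(mktemp -d)
mkdir "$dir/links"
cd "$dir/links"
echo Hello, links > file
ln -s file softLink

mv file ..                    # the original now lives one directory up
stat ../file softLink         # softLink is still reported as pointing to file

target=$(readlink softLink)
size=$(stat -c %s softLink)
echo "softLink -> $target, size $size"
```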

Notice these two lines in particular for the soft link:

File: softLink -> file
Size: 4

The symbolic link is unaware of the fact that we moved the original file! So what happens if we cat the two files?

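A sketch of what happens, using the same hypothetical layout (a links directory inside a scratch directory):

```shell
dir=$(mktemp -d)
mkdir "$dir/links"
cd "$dir/links"
echo Hello, links > file
ln -s file softLink
mv file ..

cat ../file                   # prints: Hello, links
if cat softLink; then         # cat reports that the file does not exist
    status=ok
else
    status=broken
fi
echo "softLink is $status"
```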

While the original file’s contents are printed just fine, the terminal hints that something is wrong. We can see this with the ls command: when ls output is colorized, the soft link’s name typically appears in red to indicate that it’s gone “bad.” A link in this state is usually called a broken or dangling symbolic link, a form of link rot.

Before we move on to discussing hard links, note that there’s an additional command you can use: readlink. According to the man page for readlink, this command prints the value of the symbolic link, which we know to be the path of the target file. Let’s run this on our rotten symlink:

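A sketch of the check, with the same hypothetical layout as before:

```shell
dir=$(mktemp -d)
mkdir "$dir/links"
cd "$dir/links"
echo Hello, links > file
ln -s file softLink
mv file ..

readlink softLink             # prints the stored path: file

# The link itself still exists, but following it no longer finds anything.
if [ -L softLink ]; then is_link=yes; else is_link=no; fi
if [ -e softLink ]; then resolves=yes; else resolves=no; fi
echo "is_link=$is_link resolves=$resolves"
```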

And there’s our problem! The symbolic link is still pointing to the original file name, in the same directory. But it no longer exists because we moved it up one directory level.

On the other hand, a hard link acts as an alias for the target file. It has the same file size and the same inode number but a different name. Creating a hard link for a target file will increment the link count for that file’s inode. For these reasons, hard links are also known as physical links.

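To create a hard link in Linux, we use the ln command without the -s flag. The -P flag (for “physical”) is optional: in GNU ln it only controls what happens when the target is itself a symbolic link (the hard link is made to the symlink rather than to the file it points to), and that is already the default: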

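A sketch of this step. It uses plain ln, which creates a hard link whenever -s is absent; the name hardLink, the scratch directory, and the GNU stat format string are assumptions:

```shell
cd "$(mktemp -d)"
echo Hello, links > file

ln file hardLink              # no -s: a second name for the same inode

# %n name, %i inode number, %h link count, %s size in bytes (GNU stat).
stat -c '%n inode=%i links=%h size=%s' file hardLink

file_inode=$(stat -c %i file)
hard_inode=$(stat -c %i hardLink)
links=$(stat -c %h file)
```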

Notice that both the original file and the hard link are 13 bytes in size, have the same inode number, have the same permissions, and have a link count of 2. There are two links to the original file’s inode: the original file itself, and the hard link we just created manually. In fact, notice that the results of stat-ing both the original file and the hard link are identical.

Unlike a soft link, a hard link will not rot if we change the original file’s name or move it to a different directory because it points to that file’s inode, whereas a soft link references the file’s path. It also will not rot if we delete the original file. Here’s an example of moving the file:

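A sketch of the move, assuming a links subdirectory inside a scratch directory (hypothetical layout) and GNU stat:

```shell
cd "$(mktemp -d)"
mkdir links
echo Hello, links > links/file
ln links/file links/hardLink

mv links/file file            # move the original name up one directory

cat links/hardLink            # still prints: Hello, links
links=$(stat -c %h links/hardLink)
echo "link count after the move: $links"   # still 2: only a name moved
```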

Let’s delete the target file and cat the hard link:

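A sketch that also records the link count before and after the deletion (scratch directory and GNU stat assumed):

```shell
cd "$(mktemp -d)"
echo Hello, links > file
ln file hardLink

before=$(stat -c %h hardLink)   # 2: file and hardLink
rm file                         # removes one name, not the data
after=$(stat -c %h hardLink)    # 1: only hardLink is left

cat hardLink                    # still prints: Hello, links
echo "link count: $before -> $after"
```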

Interesting…

If we think back to what “deleting” a file really means, this should make sense: A file is not truly deleted until its corresponding inode’s link count reaches zero (and, on Unix-like systems, no process still has the file open). In this case, creating a hard link for the file increments its inode’s link count to 2. When we delete the original file, the link count goes down to 1. Only if we now delete the hard link will the file reach a link count of zero and disappear.

It’s worth mentioning that hard links have two limitations that symbolic links do not:

  • You cannot create a hard link to a directory, whereas you can create a symbolic link to a directory.
  • You cannot create a hard link to a file that’s on a different volume/disk partition.

It’s a tradeoff: While symbolic links do not face these limitations, they are prone to rotting if the original file is renamed, moved, or deleted.

So far, we’ve looked at creating hard links and soft links to plaintext files. More often, you’ll be creating links to executables in Unix.

Recall from before that running cat on a soft link or hard link would essentially “follow” that link to the underlying file’s inode and print its contents. This isn’t behavior unique to the cat command, though. If we invoke most other commands on a link, or we try to run it as an executable, it’ll once again resolve itself to the referenced file. (A few commands, such as rm, mv, and readlink, act on the link itself.)

If you take a look at /usr/bin/, you’ll find many soft links to executables:

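One way to list them is sketched below. The -maxdepth option is a GNU/BSD find extension rather than POSIX, and the links you see depend on what is installed on your system:

```shell
# Show up to five symbolic links that sit directly in /usr/bin.
find /usr/bin -maxdepth 1 -type l -exec ls -l {} + | head -n 5

# Count all of them.
count=$(find /usr/bin -maxdepth 1 -type l | wc -l)
echo "$count symbolic links directly under /usr/bin"
```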

You can also create a custom link:

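A sketch with a small script standing in for a real executable. The names greet.sh and hello are hypothetical, and the scratch directory is assumed to allow execution:

```shell
cd "$(mktemp -d)"

# A tiny executable to link to.
printf '#!/bin/sh\necho "Hello from greet"\n' > greet.sh
chmod +x greet.sh

ln -s greet.sh hello          # custom symbolic link to the executable

output=$(./hello)             # running the link runs greet.sh
echo "$output"                # prints: Hello from greet
```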

As expected, invoking the symlink invokes the underlying executable.

Additional Exercises

Try these out on your end:

  1. What do you expect will happen if you change the permissions of a hard link using chmod? What about changing the permissions of a soft link?
  2. What happens if you create a hard link to a soft link?

Soft links and hard links aren’t as mysterious as they may seem at first: they just offer two similar (but notably different) ways to reference files on an operating system.

Attributions

Social media preview: Photo by Sandy Millar (Unsplash).
