Creating a unique system identifier

Introduction

A couple of days ago I was working on a project that required me to generate an identifier that uniquely identifies a (Linux) system. This identifier had to stay the same after rebooting, logging in as a different user, modifying network connections, installing or uninstalling software, and so on. So I started looking … and it turned out to be more challenging than I thought.

Identifier Types

Essentially, a unique system identifier can be generated from (multiple) software identifiers, hardware identifiers, or a combination of both.

Hardware Identifiers

Hardware identifiers are great in the sense that they are unlikely to change (kind of). Drives, installed memory and connected peripherals can change fairly often, so those aren’t a great option for generating a unique identifier. Hardware like the mainboard and the CPU, however, is rarely replaced. If it does get changed, then the system is a different one in the first place and should be identified as such.

Software Identifiers

There are also a thousand different attributes one can use to generate a software-based identifier. Installed software, network interfaces and user accounts aren’t great options, because they can change often. Some properties, however, like the inode value of the root or etc directory, the root partition size, or the creation date of the root partition, are unlikely to change under normal circumstances. And if they do change, then the changes are (in my opinion) severe enough to consider the system “different”.

Existing solutions

I looked around to see what options for unique system identifiers are available. The most commonly recommended option was /etc/machine-id. This led me to the blog post On IDs by Lennart Poettering, the creator of systemd. He found himself with the same problem: a lot of the available identifiers aren’t a good option. This is mostly because they are either not unique, not always available, or the vendor chose to put some generic value in place of the serial number.

So … why not use /etc/machine-id? There are two main reasons:

Therefore, I decided to try generating a unique system identifier myself.

What attributes?

In order to design a unique system identifier, I had to decide which hardware and software attributes to use and establish the circumstances under which it would be acceptable for the unique identifier to change.

I settled on the following:

No alteration of unique identifier:

Alteration of unique identifier:

With this in mind, I decided on a combination of the following attributes.

CPU model name

To obtain the CPU model name, we can parse the /proc/cpuinfo file, which is present on essentially every Linux system. In case the file doesn’t exist, the function returns Unknown. If the CPU is replaced with a different model, this results in a different model name, which is fine.
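As a rough illustration of the parsing step, here is a minimal Python sketch. It is not the implementation from my repository; the function name `cpu_model_name` and its `cpuinfo_path` parameter are made up for this example, and it assumes the file contains a `model name` line, which may not be true on every architecture.

```python
from pathlib import Path


def cpu_model_name(cpuinfo_path: str = "/proc/cpuinfo") -> str:
    """Return the first 'model name' entry, or 'Unknown' if unavailable.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    try:
        text = Path(cpuinfo_path).read_text(encoding="utf-8", errors="replace")
    except OSError:
        # File missing or unreadable: fall back to a fixed placeholder.
        return "Unknown"
    for line in text.splitlines():
        # Lines look like "model name\t: <value>"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    # The file exists but has no 'model name' field.
    return "Unknown"


if __name__ == "__main__":
    print(cpu_model_name())
```

Only the first matching line is used, since all cores of a typical system report the same model name.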

Root partition size

The root partition size can be obtained by calling statfs on / and multiplying the block size by the total block count from struct statfs. Every system should have a root partition, and in case the statfs call fails, the function returns 0.
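The same calculation can be sketched in Python. Note that this uses `os.statvfs` rather than `statfs`, since that is what the Python standard library exposes; the function name `root_partition_size` is an assumption for this example only.

```python
import os


def root_partition_size(path: str = "/") -> int:
    """Return the total size in bytes of the filesystem containing `path`.

    Hypothetical helper for illustration; returns 0 if the call fails.
    """
    try:
        st = os.statvfs(path)
    except OSError:
        return 0
    # f_blocks is counted in units of f_frsize (the fragment size).
    return st.f_frsize * st.f_blocks


if __name__ == "__main__":
    print(root_partition_size())
```

This reports the size of the filesystem, not of the underlying block device, so the two can differ slightly.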

Root partition creation time

The root partition creation time can be obtained by calling statx on / and reading the stx_btime.tv_sec field from struct statx. This assumes we are running Linux kernel 4.11 or later, and that the filesystem records a birth time (the STATX_BTIME bit in stx_mask tells whether the field is valid). The root partition creation time is set when the system is installed and will not change under normal circumstances.

Inode value of /etc and /bin

The inode values of system directories are assigned during the OS installation and will not change under normal circumstances. The /bin and /etc directories are considered standard directories and are present on virtually all Linux systems.
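Reading an inode number is a single stat call. A minimal Python sketch follows; the helper name `inode_of` and the choice to return 0 on failure are assumptions made for this example.

```python
import os


def inode_of(path: str) -> int:
    """Return the inode number of `path`, or 0 if it cannot be read.

    Hypothetical helper for illustration.
    """
    try:
        # os.stat follows symlinks, so a symlinked directory yields
        # the inode of the link target.
        return os.stat(path).st_ino
    except OSError:
        return 0


if __name__ == "__main__":
    print(inode_of("/etc"), inode_of("/bin"))
```

On distributions where /bin is a symlink (for example to /usr/bin), this sketch reports the inode of the target directory.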


In summary, we now have the following identifiers:

| Attribute | Description | Size | Availability | Affected by |
| --- | --- | --- | --- | --- |
| CPU model | The model name of the installed CPU | Varies | Almost always | CPU replacement |
| Root partition size | The total size of the root partition in bytes | 64 bits | Always | Resizing / |
| Root partition creation time | The Unix timestamp of the root partition’s creation time, set during OS installation | 64 bits | Always (kernel >= 4.11) | OS reinstall, manual modification |
| /etc inode | The inode value of the configuration file directory | 64 bits | Always | OS reinstall, manual modification |
| /bin inode | The inode value of the binary directory | 64 bits | Always | OS reinstall, manual modification |

Now that we have collected all the attributes needed, the last step is to map them to a fixed-size alphanumeric value. We achieve this by concatenating all attributes and computing the MD5 hash of the combined string. (MD5 should be acceptable in this scenario, since the hash only serves to produce a fixed-size value and is not used for anything security-related.)
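The concatenate-and-hash step could look roughly like the following Python sketch. The function name `system_identifier`, its parameters, and the `|` separator between fields are assumptions for this example; the separator is my addition here so that neighbouring numeric fields cannot run into each other, and the actual implementation may combine the values differently.

```python
import hashlib


def system_identifier(cpu_model: str, root_size: int, root_btime: int,
                      etc_inode: int, bin_inode: int) -> str:
    """Combine the attributes into a 32-character hexadecimal MD5 digest.

    Hypothetical helper for illustration; names and separator are assumptions.
    """
    parts = [cpu_model, str(root_size), str(root_btime),
             str(etc_inode), str(bin_inode)]
    # Join with a separator so that e.g. (1, 23) and (12, 3) stay distinct.
    combined = "|".join(parts)
    return hashlib.md5(combined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder values only, not taken from a real system.
    print(system_identifier("Example CPU", 1000, 1600000000, 2, 3))
```

The same inputs always yield the same digest, and a change in any single attribute yields a different one, which is the behaviour wanted from the identifier.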

The code of my implementation is available on my GitHub.

Conclusion, Remarks and Limitations

I am fully aware that this implementation is by no means perfect. I also don’t claim it is better than /etc/machine-id or other implementations. However, I needed an identifier that fulfills my criteria:

I purposefully excluded the MAC address from the attributes. A lot of systems nowadays randomize their MAC address on boot to prevent being tracked on networks. I am also aware that my solution does not conform to the POSIX standard, because of the use of struct statx, which is Linux-specific and was introduced in Linux kernel 4.11. Since I don’t aim for any standardization, I think my approach is fine for what it is. I just want a piece of software to be able to figure out whether it has already been executed on that Linux system or not.
