Creating a unique system identifier

Introduction

A couple of days ago, I was working on a project that required me to generate an identifier that uniquely identifies a (Linux) system. This identifier had to stay the same after rebooting, logging in as a different user, modifying network connections, installing/uninstalling software, etc. So I started looking … and it turned out to be more challenging than I thought …

Identifier Types

Essentially, a unique system identifier can be generated from (multiple) software identifiers, hardware identifiers, or a combination of both.

Hardware Identifiers

Hardware identifiers are attractive because hardware tends not to change, although that only holds for some components. Drives, installed memory, and connected peripherals can change fairly often, so they aren’t a great option for generating a unique identifier. The mainboard and the CPU, however, usually stay the same for the lifetime of a system. If they do get replaced, the system is effectively a different one and should be identified as such.

Software Identifiers

There are also countless attributes one can use to generate a software-based identifier. However, installed software, network interfaces, and user accounts aren’t good options, because they can change often. Some properties, like the inode value of the root or /etc directory, the root partition size, or the creation date of the root partition, are unlikely to change under normal circumstances. And if they do change, the changes are (in my opinion) severe enough to consider the system “different”.

Existing solutions

I looked around to see what options for unique system identifiers are available. The most commonly recommended option was /etc/machine-id. This led me to the blog post On IDs by Lennart Poettering, the creator of systemd. He ran into the same problem: many of the available identifiers aren’t a good option, mostly because they are either not unique, not always available, or the vendor filled the serial number fields with a generic value.

So … why not use /etc/machine-id? There are two main reasons:

It is only available on systemd-based systems.
It does not change when hardware is modified.

Therefore, I decided to try generating a unique system identifier myself.

What attributes?

In order to design a unique system identifier, I had to decide which hardware and software attributes to use and establish the circumstances under which it would be acceptable for the unique identifier to change.

I settled on the following:

No alteration of unique identifier:

System reboot
Software installed/removed
Network interfaces modified
Locale changed
MAC address changed
Hardware (excl. CPU) is added/removed/modified.

Alteration of unique identifier:

CPU replaced
OS re-installed
Root partition resized

With this in mind, I decided on a combination of the following attributes.

CPU model name

To obtain the CPU model name, we can parse the /proc/cpuinfo file, which is present on essentially every Linux system. If the file doesn’t exist, the function returns Unknown. If the CPU is replaced with a different model, this naturally results in a different model name, which is fine.
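A minimal sketch of this step in C might look like the following. It is not the author’s implementation; the function name cpu_model_name and the buffer sizes are hypothetical. It assumes the x86-style "model name" field, which some architectures name differently, so the "Unknown" fallback covers that case as well as a missing file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first "model name" value from
 * /proc/cpuinfo into buf. Leaves "Unknown" in buf if the file is
 * missing or has no non-empty "model name" field (field names can
 * differ between architectures). */
void cpu_model_name(char *buf, size_t len)
{
    char line[1024];
    FILE *f = fopen("/proc/cpuinfo", "r");

    snprintf(buf, len, "Unknown");
    if (f == NULL)
        return;

    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "model name", 10) != 0)
            continue;

        char *value = strchr(line, ':');
        if (value != NULL) {
            value++;                              /* skip the ':' */
            while (*value == ' ' || *value == '\t')
                value++;                          /* skip leading whitespace */
            value[strcspn(value, "\n")] = '\0';   /* strip trailing newline */
            if (*value != '\0')
                snprintf(buf, len, "%s", value);
        }
        break;                                    /* first CPU entry is enough */
    }
    fclose(f);
}
```

Only the first entry is read, since all cores of a single-socket system report the same model name.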

Root partition size

The root partition size can be obtained by calling statfs() on / and multiplying the block size (f_bsize) by the total block count (f_blocks) of the returned struct statfs. Every system should have a root partition, and if the statfs() call fails, the function returns 0.
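As a sketch, assuming a hypothetical helper named partition_size that takes the mount point as a parameter (so it can be tested on other paths), the calculation could be written as:

```c
#include <stdint.h>
#include <sys/vfs.h>   /* statfs() on Linux; <sys/statfs.h> also works */

/* Hypothetical helper: returns the total size in bytes of the
 * filesystem containing path (f_bsize * f_blocks), or 0 if
 * statfs() fails. */
uint64_t partition_size(const char *path)
{
    struct statfs s;

    if (statfs(path, &s) != 0)
        return 0;
    /* Widen both operands first so the product is computed in 64 bits. */
    return (uint64_t)s.f_bsize * (uint64_t)s.f_blocks;
}
```

For the identifier, this would be called as partition_size("/"). The total size only changes when the filesystem is resized, not when files are added or removed.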

Root partition creation time

The root partition creation time can be obtained by calling statx() on / and reading the stx_btime.tv_sec field of the returned struct statx. This requires Linux kernel 4.11 or later, as well as a filesystem that records file birth times (ext4 does, for example). The creation time is set when the system is installed and will not change under normal circumstances.
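A minimal sketch, assuming glibc 2.28 or later (which provides the statx() wrapper) and a hypothetical helper named birth_time, is shown below. It checks STATX_BTIME in stx_mask, because the kernel clears that bit when the filesystem cannot report a birth time.

```c
#define _GNU_SOURCE        /* needed for the glibc statx() declaration */
#include <fcntl.h>         /* AT_FDCWD */
#include <stdint.h>
#include <sys/stat.h>      /* statx(), struct statx, STATX_BTIME */

/* Hypothetical helper: returns the birth time of path in seconds
 * since the Unix epoch, or 0 if statx() fails or the filesystem
 * does not provide a birth time. */
int64_t birth_time(const char *path)
{
    struct statx stx;

    /* flags = 0 follows symlinks; we only request the birth time. */
    if (statx(AT_FDCWD, path, 0, STATX_BTIME, &stx) != 0)
        return 0;
    if (!(stx.stx_mask & STATX_BTIME))
        return 0;          /* field not supported on this filesystem */
    return (int64_t)stx.stx_btime.tv_sec;
}
```

Calling birth_time("/") yields the value used for the identifier; returning 0 on failure mirrors the fallback behaviour described for the other attributes.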

Inode value of /etc and /bin

The inode values of system directories are assigned during OS installation and will not change under normal circumstances. The /bin and /etc directories are standard directories and are present on virtually all Linux systems.
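Reading an inode value only takes a stat() call. The sketch below uses a hypothetical helper named inode_of. Note that stat() follows symlinks, so on distributions where /bin is a symlink to usr/bin it returns the inode of /usr/bin (lstat() would return the symlink’s own inode instead).

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the inode number of path, or 0 if
 * stat() fails. Symlinks are followed, so a symlinked /bin yields
 * the inode of its target directory. */
uint64_t inode_of(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;
    return (uint64_t)st.st_ino;
}
```

The identifier would use inode_of("/etc") and inode_of("/bin").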

In summary, we now have the following identifiers:

Attribute	Description	Size	Availability	Affected by
CPU model	The model name of the installed CPU	Varies	Almost always	CPU replacement
Root partition size	The total size of the root partition in bytes	64 bits	Always	Resizing `/`
Root partition creation time	The Unix timestamp of the root partition’s creation, set during OS installation	64 bits	Linux >= 4.11	OS reinstall, manual modification
`/etc` inode	The inode number of the configuration directory	64 bits	Always	OS reinstall, manual modification
`/bin` inode	The inode number of the binaries directory	64 bits	Always	OS reinstall, manual modification

Now that we have collected all the attributes, the last step is to map them to a fixed-size alphanumeric value. We do this by concatenating all attributes and computing the MD5 hash of the combined string. (MD5 should be acceptable in this scenario, since we only need a stable fingerprint, not collision resistance against an attacker.)
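To illustrate the concatenate-then-hash step without external dependencies, here is a self-contained sketch of MD5 (following RFC 1321) plus a hypothetical system_id helper that combines the five attributes. The helper name, the '|' separator, and the decimal formatting are assumptions, not the author’s format; a different concatenation produces a different hash. The separator avoids ambiguous joins such as "12"+"3" versus "1"+"23".

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* MD5 round constants: K[i] = floor(|sin(i + 1)| * 2^32). */
static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be, 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c, 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1, 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 };

/* Per-round left-rotation amounts. */
static const uint8_t S[64] = {
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 };

static uint32_t rotl(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* Processes one 64-byte block and updates the hash state h[0..3]. */
static void md5_block(uint32_t h[4], const uint8_t *p)
{
    uint32_t m[16], a = h[0], b = h[1], c = h[2], d = h[3];

    for (int j = 0; j < 16; j++)             /* load little-endian words */
        m[j] = (uint32_t)p[4*j] | (uint32_t)p[4*j+1] << 8 |
               (uint32_t)p[4*j+2] << 16 | (uint32_t)p[4*j+3] << 24;

    for (int i = 0; i < 64; i++) {
        uint32_t f;
        int g;
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5*i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3*i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7*i) % 16; }
        f += a + K[i] + m[g];
        a = d; d = c; c = b;
        b += rotl(f, S[i]);
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
}

/* Writes the 32-character lowercase hex MD5 of msg[0..len) into out. */
void md5_hex(const void *msg, size_t len, char out[33])
{
    uint32_t h[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };
    const uint8_t *p = msg;
    uint8_t tail[128] = { 0 };
    uint64_t bits = (uint64_t)len * 8;
    size_t full = len / 64, rest = len % 64;
    size_t tail_len = (rest < 56) ? 64 : 128;

    for (size_t i = 0; i < full; i++)
        md5_block(h, p + 64 * i);

    /* Padding: 0x80, zeros, then the bit length as 64-bit little-endian. */
    memcpy(tail, p + 64 * full, rest);
    tail[rest] = 0x80;
    for (int i = 0; i < 8; i++)
        tail[tail_len - 8 + i] = (uint8_t)(bits >> (8 * i));
    md5_block(h, tail);
    if (tail_len == 128)
        md5_block(h, tail + 64);

    for (int i = 0; i < 16; i++)             /* output bytes little-endian */
        sprintf(out + 2 * i, "%02x", (unsigned)((h[i / 4] >> (8 * (i % 4))) & 0xff));
}

/* Hypothetical helper: joins the attributes with '|' and hashes them. */
void system_id(const char *cpu_model, uint64_t root_size, int64_t root_btime,
               uint64_t etc_ino, uint64_t bin_ino, char out[33])
{
    char buf[512];
    int n = snprintf(buf, sizeof buf, "%s|%llu|%lld|%llu|%llu", cpu_model,
                     (unsigned long long)root_size, (long long)root_btime,
                     (unsigned long long)etc_ino, (unsigned long long)bin_ino);
    if (n < 0)
        n = 0;
    if ((size_t)n >= sizeof buf)
        n = (int)(sizeof buf - 1);           /* string was truncated */
    md5_hex(buf, (size_t)n, out);
}
```

In practice, one would link against an existing crypto library rather than carry a hand-written MD5; the version above only keeps the example dependency-free.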

The code of my implementation is available on my GitHub.

Conclusion, Remarks and Limitations

I am fully aware that this implementation is by no means perfect. I also don’t claim it is better than /etc/machine-id or other implementations. However, I needed an identifier that fulfills my criteria:

Works on a wide range of Linux systems
- Not limited to systemd distributions
Persistent across reboots
Depends on hardware and software attributes
Unlikely to change, unless the user explicitly tries to

I purposefully excluded the MAC address from the attributes: many systems nowadays randomize their MAC address on boot to avoid being tracked on networks. I am also aware that my solution does not conform to the POSIX standard, because it uses struct statx, a Linux-specific interface introduced in kernel 4.11. Since I don’t aim for any standardization, I think my approach is fine for what it is. I simply want a piece of software to be able to tell whether it has already been executed on a given Linux system.
