Creating a unique system identifier
Introduction
A couple of days ago, I was working on a project that required me to generate an identifier that uniquely identifies a (Linux) system. This identifier had to stay the same after rebooting, logging in as a different user, modifying network connections, installing/uninstalling software, etc. So I started looking around … and it turned out to be more challenging than I thought …
Identifier Types
Essentially, a unique system identifier can be generated from (multiple) software identifiers, hardware identifiers, or a combination of both.
Hardware Identifiers
Hardware identifiers are great in the sense that they are unlikely to change (kind of). Drives, installed memory, and connected peripherals can change fairly often, so those aren’t a great option for generating a unique identifier. Hardware like the mainboard and the CPU, however, will almost always stay the same. If they do get changed, then the system is effectively a different one and should be identified as such.
Software Identifiers
There are also a thousand different attributes one can use to generate a software-based identifier. However, installed software, network interfaces, and user accounts aren’t great options, because they can change often. Some properties, on the other hand, like the inode value of the root or /etc directory, the root partition size, or the creation date of the root partition, are unlikely to change under normal circumstances. And if they do change, then the changes are (in my opinion) severe enough to consider the system “different”.
Existing solutions
I looked around to see what options for unique system identifiers are available. The most commonly recommended option was /etc/machine-id. This led me to the blog post On IDs by Lennart Poettering, the creator of systemd. He found himself with the same problem: a lot of the available identifiers aren’t a good option, mostly because they are either not unique, not always available, or the vendor chose to put some generic value in the serial number fields.
So … why not use /etc/machine-id? There are two main reasons:
- It is only available on systemd-based systems.
- It does not change when hardware is modified.
Therefore, I decided to try generating a unique system identifier myself.
What attributes?
In order to design a unique system identifier, I had to decide which hardware and software attributes to use and establish the circumstances under which it would be acceptable for the unique identifier to change.
I settled on the following:
No alteration of unique identifier:
- System reboot
- Software installed/removed
- Network interfaces modified
- Locale changed
- MAC address changed
- Hardware (excl. CPU) is added/removed/modified.
Alteration of unique identifier:
- CPU replaced
- OS re-installed
- Root partition resized
With this in mind, I decided on a combination of the following attributes.
CPU model name
To obtain the CPU model name, we can parse the /proc/cpuinfo file, which is present on essentially every Linux system. In case the file doesn’t exist, the function returns Unknown. If the CPU is replaced with a different model, this results in a different model name, which is fine.
Root partition size
The root partition size can be obtained by calling statfs() on / and multiplying the block size (f_bsize) by the total block count (f_blocks) of the returned struct statfs. Every system should have a root partition, and in case the statfs call fails, the function returns 0.
Root partition creation time
The root partition creation time can be obtained by calling statx() on / and reading the stx_btime.tv_sec field of the returned struct statx. This assumes we are running Linux kernel 4.11 or later, on a filesystem that records birth times (ext4, for example). The root partition creation time is set when the system is installed and will never change under normal circumstances.
Inode value of /etc and /bin
The inode values of system directories are assigned during the OS installation and will never change under normal circumstances. The /bin and /etc directories are standard directories and are present on virtually all Linux systems.
In summary, we now have the following identifiers:
Attribute | Description | Size | Availability | Affected by |
---|---|---|---|---|
CPU model | The model name of the installed CPU | Varies | Almost always | CPU replacement |
Root partition size | The total size of the root partition in bytes | 64 bits | Always | Resizing / |
Root partition creation time | The Unix timestamp of the root partition’s creation time, set during OS installation | 64 bits | Always (kernel >= 4.11) | OS reinstall, manual modification |
/etc inode | The inode value of the configuration file directory | 64 bits | Always | OS reinstall, manual modification |
/bin inode | The inode value of the binary directory | 64 bits | Always | OS reinstall, manual modification |
Now that we have collected all the attributes we need, the last step is to map them to a fixed-size alphanumeric value. We achieve this by concatenating all attributes and computing the MD5 hash of the combined string. (MD5 should be acceptable in this scenario, since we only need a compact, deterministic fingerprint rather than cryptographic security.)
The code of my implementation is available on my GitHub.
Conclusion, Remarks and Limitations
I am fully aware that this implementation is by no means perfect. I also don’t claim it is better than /etc/machine-id or other implementations. However, I needed an identifier that fulfills my criteria:
- Works on a wide range of Linux systems
- Not limited to systemd distributions
- Persistent across reboots
- Depends on hardware and software attributes
- Unlikely to change, unless the user explicitly tries to
I purposefully excluded the MAC address from the attributes. A lot of systems nowadays randomize their MAC address on boot to prevent being tracked on networks. I am also aware that my solution does not conform to the POSIX standard, because it relies on Linux-specific interfaces, most notably struct statx, which was introduced in Linux kernel 4.11.
Since I don’t aim for any standardization, I think my approach is fine for what it is. I just want a piece of software to be able to figure out whether it has already been executed on a given Linux system or not.