[LU-13118] change client instance to respect ASLR Created: 08/Jan/20  Updated: 28/Sep/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Upstream

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: James A Simmons
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11803 sanity test 255c fails with 'Ladvise ... Resolved
is related to LU-11809 conf-sanity test 28A hangs on file sy... Resolved
is related to LU-12511 Prepare lustre for adoption into the ... Open
is related to LU-13499 client UUID is truncated Resolved
is related to LU-12521 print_instance() incorrect if fsname ... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

The client mount currently uses the superblock address as the unique configuration instance ("unsigned long cfg_instance" in the code) when processing configuration records. This distinguishes multiple client mountpoints on the same node, which are treated as independent Lustre clients from the server point of view.

This cfg_instance can actually be any unique value as it is mostly just used as an identifier in messages and tunable parameters in procfs and sysfs. One place where it is actually checked is config_log_find(), but it only compares for identical instance numbers, and does not interpret the value itself.
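To illustrate why the value can be opaque, here is a minimal sketch of a lookup that only tests the instance for equality. It is not the actual config_log_find() code; the struct and function names (cfg_log_entry, cfg_log_lookup) are hypothetical and only model the behaviour described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a config log record; not a Lustre struct. */
struct cfg_log_entry {
	const char *logname;	/* e.g. "testfs-client" */
	uint64_t    instance;	/* opaque per-mount identifier */
};

/*
 * Return the entry whose logname and instance both match, or NULL.
 * The instance is only compared for equality and never interpreted,
 * so any value that is unique within the node would work.
 */
static const struct cfg_log_entry *
cfg_log_lookup(const struct cfg_log_entry *table, size_t count,
	       const char *logname, uint64_t instance)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (table[i].instance == instance &&
		    strcmp(table[i].logname, logname) == 0)
			return &table[i];
	}
	return NULL;
}
```

Because nothing in such a lookup depends on the instance being a pointer, replacing the superblock address with another node-unique value should not change its behaviour.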

To comply with ASLR requirements, it would be better to use some other identifier for cfg_instance. It does not need to be unique across all clients, only within a single client. One option would be a 64-bit random number. The chance of two mounts on the same client colliding is then negligible (about 1 in 2^64 per pair of mounts; a 1-in-4B chance would apply to a 32-bit value), and multiple mounts on the same client are generally uncommon anyway, though a collision is not impossible to hit. Another option would be to use the client mount UUID, but this is a much larger value and changes would be needed to many parts of the code. A compromise might be to use the 16-byte UUID folded over itself (i.e. first half XOR second half) to fit into an 8-byte value.
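The folding compromise could look roughly like the sketch below. This is only an illustration under assumptions: fold_uuid_to_instance() is a hypothetical helper, not an existing Lustre function, and the UUID is taken to be available as 16 raw bytes.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fold a 16-byte UUID into a 64-bit value by
 * XOR-ing its first 8 bytes with its last 8 bytes.
 */
static uint64_t fold_uuid_to_instance(const unsigned char uuid[16])
{
	uint64_t first_half;
	uint64_t second_half;

	/* memcpy avoids unaligned or aliasing-unsafe pointer casts. */
	memcpy(&first_half, uuid, sizeof(first_half));
	memcpy(&second_half, uuid + 8, sizeof(second_half));

	return first_half ^ second_half;
}
```

The result depends on host byte order, which should not matter while the value is only compared for equality on the same node. A UUID whose two halves happen to be identical folds to 0, so a real implementation might want to treat 0 specially if that value has another meaning.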

On the server, class_config_llog_handler() and config_log_find_or_add() use cfg_instance to pass an obd device reference, but this is only used to get the obd_name field. We may be able to use config_log_find() on the server to look up the obd device again instead? Alternatively, we might consider just adding the obd_name string into the cfg_instance itself. If we also added the client hostname into the cfg_instance, that would make it convenient to have client-specific tunable parameters (e.g. "llite.*_node27.max_pages_per_rpc"), but that might be overloading this too much.

On the test front, the cfg_instance is used to distinguish client OSC devices from server OSC devices, mostly by matching "*-osc-[^M]*" or variants of this. A few tests use "*osc-[-0-9a-f]*" instead; these should be made consistent.
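As a rough illustration of how the two patterns behave, the sketch below applies them with POSIX fnmatch(). The helper names and the device names in the usage note are assumptions for illustration only: a client OSC name is taken to end in a hex instance, and a server-side OSC name to end in an MDT name. The "[^M]" negation is accepted by common libc implementations; the strictly POSIX spelling is "[!M]".

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: true if the name does not have 'M' after "-osc-". */
static bool matches_not_mdt_pattern(const char *name)
{
	return fnmatch("*-osc-[^M]*", name, 0) == 0;
}

/* Hypothetical helper: true if '-' or a lowercase hex digit follows "osc-". */
static bool matches_hex_pattern(const char *name)
{
	return fnmatch("*osc-[-0-9a-f]*", name, 0) == 0;
}
```

With an assumed client device name such as "testfs-OST0000-osc-ffff8800abcd1234" both helpers return true, and with an assumed server device name such as "testfs-OST0000-osc-MDT0000" both return false. The two patterns would only diverge if the instance stopped starting with '-' or a lowercase hex digit, which is one reason to settle on a single form.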



 Comments   
Comment by Andreas Dilger [ 13/Jan/20 ]

Actually, a good option here would be to change this to use the ASLR remapping of the superblock address. That would be guaranteed unique, and also makes it easier to relate to other cases in the debug logs. The reason that this was a problem originally is because the raw "%p" usage resulted in a string with leading spaces that screwed up parsing. If we can get the remapped address in the form of an integer that would work well.

Comment by Andreas Dilger [ 13/Jan/20 ]

I now recall after reading LU-11809 that during early boot the kernel RNG isn't seeded enough to generate a random value for ASLR. That makes me think that folding the 128-bit client mount UUID onto itself to make a 64-bit instance number is the best path forward. Since the instance number has no other meaning and changes on every mount, we can always use something else in the future.
