[LU-17174] lustre hashes broken now. Created: 09/Oct/23 Updated: 02/Dec/23 Resolved: 29/Nov/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alexey Lyashkov | Assignee: | Alexey Lyashkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Lustre hash functions are broken after the following change landed:

    static __always_inline u32 cfs_hash_64(u64 val, unsigned int bits)
    {
    #if BITS_PER_LONG == 64
        /* 64x64-bit multiply is efficient on all 64-bit processors */
        return val * GOLDEN_RATIO_64 >> (64 - bits);
    #else
        /* Hash 64 bits using only 32x32-bit multiply. */
        return cfs_hash_32(((u32)val ^ ((val >> 32) * GOLDEN_RATIO_32)), bits);
    #endif
    }

    static unsigned ldlm_export_flock_hash(struct cfs_hash *hs, const void *key, unsigned mask)
    {
    -   return cfs_hash_u64_hash(*(__u64 *)key, mask);
    +   return cfs_hash_64(*(__u64 *)key, 0) & mask;
    }

Because bits is passed as 0, the shift count (64 - bits) is 64 for every call. Shifting a 64-bit value by 64 is undefined behaviour: the function returns zero/0xfffff... for any input, and a debug kernel emits the following warning.

    [10939.945272] ================================================================================
    [10939.946792] UBSAN: Undefined behaviour in include/linux/hash.h:81:31
    [10939.948193] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
    [10939.949869] CPU: 2 PID: 384127 Comm: ll_mgs_0002 Tainted: G B W OE ---------r- - 4.18.0-305.25.1.el8_4.x86_64+debug #1
    [10939.952333] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
    [10939.954274] Call Trace:
    [10939.954823] dump_stack+0x8e/0xd0
    [10939.955543] ubsan_epilogue+0x5/0x21
    [10939.956329] __ubsan_handle_shift_out_of_bounds.cold.13+0x14/0x98
    [10939.957581] ? rcu_read_unlock+0x50/0x50
    [10939.958418] ? lock_acquired+0x6c6/0xe60
    [10939.959367] ? lprocfs_stats_lock+0x15d/0x1b0 [obdclass]
    [10939.960699] ldlm_export_lock_hash+0x49/0x4d [ptlrpc]
    [10939.961715] cfs_hash_bd_from_key+0x88/0x2e0 [libcfs]
    [10939.962821] cfs_hash_add+0xef/0xb60 [libcfs]
    [10939.963830] ? class_handle_hash+0x274/0x5f0 [obdclass]
    [10939.964961] ? cfs_hash_rehash+0x7a0/0x7a0 [libcfs]
    [10939.966167] ? ldlm_lock_create+0x734/0x1e20 [ptlrpc]
    [10939.967287] ldlm_handle_enqueue+0x8dc/0x48e0 [ptlrpc]
    [10939.968352] ? do_raw_spin_unlock+0x14b/0x230
    [10939.969377] ? ldlm_setup+0x1af0/0x1af0 [ptlrpc]
    [10939.970509] ? __req_capsule_get+0x7ff/0x11f0 [ptlrpc]
    [10939.971710] ? lustre_swab_ldlm_lock_desc+0x230/0x230 [ptlrpc]
    [10939.972973] tgt_enqueue+0x148/0x5a0 [ptlrpc]
    [10939.974139] tgt_request_handle+0x179c/0x3ff0 [ptlrpc]
    [10939.975403] ? tgt_brw_write+0x5a00/0x5a00 [ptlrpc]
    [10939.976594] ptlrpc_server_handle_request+0xa34/0x1f50 [ptlrpc]
    [10939.977936] ? lu_context_exit+0x15a/0x2c0 [obdclass]
    [10939.979045] ptlrpc_main+0x1ae0/0x2f40 [ptlrpc]
    [10939.980062] ? __kthread_parkme+0xc4/0x190
    [10939.981068] ? ptlrpc_wait_event+0xf40/0xf40 [ptlrpc]
    [10939.982107] kthread+0x344/0x410
    [10939.982811] ? kthread_insert_work_sanity_check+0xd0/0xd0
    [10939.983941] ret_from_fork+0x3a/0x50
    [10939.984776] ================================================================================
|
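To make the shift problem in the description concrete, here is a minimal userspace sketch in C. It is not the patch that landed for this ticket. The names `sketch_hash_64` and `sketch_bucket` are hypothetical, and the choice of 32 hash bits is an assumption made for illustration; only the multiply-and-shift form and the golden-ratio constant follow the kernel's include/linux/hash.h.

```c
#include <assert.h>
#include <stdint.h>

/* Same multiplier the Linux kernel uses in include/linux/hash.h. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Multiplicative hash keeping the top `bits` bits of the product.
 * Valid only for 1 <= bits <= 64: bits == 0 would shift a 64-bit
 * value by 64, which is undefined behaviour in C.
 */
static inline uint64_t sketch_hash_64(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}

/*
 * Hypothetical replacement for the "hash(key, 0) & mask" pattern:
 * keep the 32 high bits of the product, then apply the bucket mask.
 */
static inline unsigned int sketch_bucket(uint64_t key, unsigned int mask)
{
	return (unsigned int)sketch_hash_64(key, 32) & mask;
}
```

A caller would use `sketch_bucket(key, mask)` where the broken code used `cfs_hash_64(key, 0) & mask`. Because the shift count stays in the range 0..63, the result depends on the key instead of being undefined.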
| Comments |
| Comment by Gerrit Updater [ 10/Oct/23 ] |
|
"Alexey Lyashkov <alexey.lyashkov@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52611 |
| Comment by Gerrit Updater [ 29/Nov/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52611/ |
| Comment by Peter Jones [ 29/Nov/23 ] |
|
Landed for 2.16 |
| Comment by Alex Zhuravlev [ 01/Dec/23 ] |
|
I noticed a slowdown running FSTYPE=zfs ONLY=123f bash sanity.sh:

just checked the local runtime for subtests:

    sanity-benchmark@bonnie   77  349
    sanity-quota@38          472  693
    sanity@103e              630  754
    sanity@123e               21  114
    sanity@123ab              58  145
    sanity@55b               268  337
    conf-sanity@135           82  137
    sanity@123aa             103  151
    sanity@51e               134  171
    sanity-lfsck@10           77  107
    conf-sanity@48           324  352
    sanity@60c                19   41
|
| Comment by Alexey Lyashkov [ 01/Dec/23 ] |
|
Alex, can you look into the lu_site stats? |
| Comment by Alex Zhuravlev [ 02/Dec/23 ] |
|
yes, sure, though I guess you can do it as well, nothing special.. with: w/o: |