[LU-18049] ost-pools test_25, sanity-sec test_31: crash in ext4_htree_store_dirent kmalloc - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version/s: Lustre 2.16.0
Affects Version/s: Lustre 2.16.0
Labels:
- JSS

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/3576290d-064a-4573-a087-75b59fff6df7

test_25 failed with the following error:

trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25

Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4542 - 5.14.0-362.24.1.el9_3.x86_64
servers: https://build.whamcloud.com/job/lustre-b_es6_0/666 - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64

Two clients both crashed in ext4_htree_store_dirent() (NOT ldiskfs) in kmalloc, so it looks like some kind of client-side memory corruption?

[27299.419062] Lustre: MGC10.240.44.44@tcp: Connection restored to  (at 10.240.44.44@tcp)
[27299.448580] LustreError: 886364:0:(client.c:3288:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff902e1402e3c0 x1802432488249280/t691489734702(691489734702) o101->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 520/608 e 0 to 0 dl 1718935667 ref 2 fl Interpret:RPQU/204/0 rc 301/301 job:'lfs.0' uid:0 gid:0
[27300.013294] Lustre: lustre-MDT0000-mdc-ffff902e04449800: Connection restored to  (at 10.240.44.44@tcp)
[27305.358931] Lustre: 886365:0:(client.c:2334:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1718935641/real 1718935641]  req@ffff902e547d9380 x1802432536672704/t0(0) o400->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 224/224 e 0 to 1 dl 1718935657 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0
[27306.104959] BUG: unable to handle page fault for address: ffff902ee5338778
[27306.105697] #PF: supervisor read access in kernel mode
[27306.106204] #PF: error_code(0x0000) - not-present page
[27306.107607] CPU: 1 PID: 1109213 Comm: bash Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-362.24.1.el9_3.x86_64 #1
[27306.108653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[27306.109209] RIP: 0010:__kmalloc+0x11b/0x370
[27306.118475] Call Trace:
[27306.118756]  <TASK>
[27306.120625]  ? __die_body.cold+0x8/0xd
[27306.121006]  ? page_fault_oops+0x134/0x170
[27306.121437]  ? kernelmode_fixup_or_oops+0x84/0x110
[27306.121944]  ? exc_page_fault+0xa8/0x150
[27306.122371]  ? asm_exc_page_fault+0x22/0x30
[27306.122806]  ? ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.123359]  ? __kmalloc+0x11b/0x370
[27306.123740]  ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.124269]  htree_dirblock_to_tree+0x1ab/0x310 [ext4]
[27306.124809]  ext4_htree_fill_tree+0x203/0x3b0 [ext4]
[27306.125333]  ext4_dx_readdir+0x10d/0x360 [ext4]
[27306.125817]  ext4_readdir+0x392/0x550 [ext4]
[27306.126275]  iterate_dir+0x17c/0x1c0
[27306.126711]  __x64_sys_getdents64+0x80/0x120
[27306.128187]  do_syscall_64+0x5c/0x90

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
ost-pools test_25 - trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25

Attachments

Issue Links

is related to

LU-18230 sanity-sec: crash in lov_connect/lmv_connect ->__kernfs_new_node->kstrdup->__kmalloc_track_caller

Resolved

LU-16307 sanity-sec: test_31: export for 10.240.26.216@tcp on MGS should not exist

Resolved

is related to

LU-13306 allow clients to accept mgs_nidtbl_entry with IPv6 NIDs

Resolved

mentioned in: Page Loading...

Activity

[LU-18049] ost-pools test_25, sanity-sec test_31: crash in ext4_htree_store_dirent kmalloc

Gerrit Updater added a comment - 24/Sep/24 11:37 PM

"Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56499
Subject: ~~LU-18049~~ obdclass: fix class_add_nids_to_uuid
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f2756870b791e434cd3a80720b6022575305c0bb

Gerrit Updater added a comment - 24/Sep/24 11:37 PM "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56499 Subject: LU-18049 obdclass: fix class_add_nids_to_uuid Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: f2756870b791e434cd3a80720b6022575305c0bb

Gerrit Updater added a comment - 24/Sep/24 11:21 PM - edited

~~"Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch:~~ https://review.whamcloud.com/c/fs/lustre-release/+/56498
~~Subject: ~~LU-18049~~ mgc: fix memory corruption~~
~~Project: fs/lustre-release~~
~~Branch: master~~
~~Current Patch Set: 1~~
~~Commit: 9a673590a9023d7c567e8d5ae74819af4a35f6dc~~

Gerrit Updater added a comment - 24/Sep/24 11:21 PM - edited "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56498 Subject: LU-18049 mgc: fix memory corruption Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 9a673590a9023d7c567e8d5ae74819af4a35f6dc

Sergey Cheremencev added a comment - 24/Sep/24 4:15 PM

adilger , that is what I'm doing right now. There is also a chance that the problem might come from internal functions. I found at least one suspicious place:

serega@Sergeys-MacBook-Pro repro % cat potential_fix 
index f8db9fc..96eb7bd 100644
--- a/lustre/obdclass/lustre_peer.c
+++ b/lustre/obdclass/lustre_peer.c
@@ -203,11 +203,11 @@ int class_add_nids_to_uuid(struct obd_uuid *uuid, struct lnet_nid *nidlist,
                        if (NID_BYTES(&nidlist[i]) > nid_size)
                                continue;
 
-                       entry->un_nid_count++;
                        memset(&entry->un_nids[entry->un_nid_count], 0,
                               sizeof(entry->un_nids[entry->un_nid_count]));
                        memcpy(&entry->un_nids[entry->un_nid_count],
                               &nidlist[i], nid_size);
+                       entry->un_nid_count++;
                }
                break;

But this didn't help. I'll send this patch later.

Sergey Cheremencev added a comment - 24/Sep/24 4:15 PM adilger , that is what I'm doing right now. There is also a chance that the problem might come from internal functions. I found at least one suspicious place: serega@Sergeys-MacBook-Pro repro % cat potential_fix index f8db9fc..96eb7bd 100644 --- a/lustre/obdclass/lustre_peer.c +++ b/lustre/obdclass/lustre_peer.c @@ -203,11 +203,11 @@ int class_add_nids_to_uuid(struct obd_uuid *uuid, struct lnet_nid *nidlist, if (NID_BYTES(&nidlist[i]) > nid_size) continue; - entry->un_nid_count++; memset(&entry->un_nids[entry->un_nid_count], 0, sizeof(entry->un_nids[entry->un_nid_count])); memcpy(&entry->un_nids[entry->un_nid_count], &nidlist[i], nid_size); + entry->un_nid_count++; } break; But this didn't help. I'll send this patch later.

Andreas Dilger added a comment - 24/Sep/24 4:06 PM

scherementsev I would suggest to add "CDEBUG(D_MALLOC, "nodemap =%pk\n"" kind of messages throughout these modified mgc functions to print all of the addresses of structures being accessed so that we can hopefully isolate where the corruption is coming from.

Andreas Dilger added a comment - 24/Sep/24 4:06 PM scherementsev I would suggest to add " CDEBUG(D_MALLOC, "nodemap =%pk\n" " kind of messages throughout these modified mgc functions to print all of the addresses of structures being accessed so that we can hopefully isolate where the corruption is coming from.

Sergey Cheremencev added a comment - 24/Sep/24 3:06 PM - edited

adilger , I haven't looked yet into the exact failure you pointed. I've analyzed at least 10 different crash dumps gathered on my local testing system and didn't find corrupted address in lustre malloc/free logs. Probably there was a problem with setting appropriate debug level, I'm not sure yet. Continue to do that. I will also take a look at the logs from our test patch in gerrit, but don't have big expectations as in my local testing it didn't give results.

Sergey Cheremencev added a comment - 24/Sep/24 3:06 PM - edited adilger , I haven't looked yet into the exact failure you pointed. I've analyzed at least 10 different crash dumps gathered on my local testing system and didn't find corrupted address in lustre malloc/free logs. Probably there was a problem with setting appropriate debug level, I'm not sure yet. Continue to do that. I will also take a look at the logs from our test patch in gerrit, but don't have big expectations as in my local testing it didn't give results.

Sergey Cheremencev added a comment - 24/Sep/24 3:00 PM

=============================================================================
[  449.857657] BUG kmalloc-64 (Tainted: G           OE  ------------  ): Poison overwritten
[  449.857686] -----------------------------------------------------------------------------
               
[  449.857730] Disabling lock debugging due to kernel taint
[  449.857732] INFO: 0xffff947c86f9ac88-0xffff947c86f9ac9b. First byte 0x0 instead of 0x6b
[  449.857778] INFO: Allocated in cl_key_init+0x20/0xd0 [obdclass] age=10 cpu=5 pid=4912
[  449.857798]  __slab_alloc+0x40/0x5c
[  449.857810]  kmem_cache_alloc_trace+0x1a7/0x200
[  449.857842]  cl_key_init+0x20/0xd0 [obdclass]
[  449.857880]  keys_fill+0x96/0x130 [obdclass]
[  449.857910]  lu_context_init+0xd3/0x1f0 [obdclass]
[  449.858039]  lu_env_init+0x1a/0x30 [obdclass]
[  449.858085]  class_process_config+0x2007/0x27e0 [obdclass]
[  449.858130]  class_config_llog_handler+0x807/0x13d0 [obdclass]
[  449.858168]  llog_process_thread+0xc44/0x1c20 [obdclass]
[  449.858203]  llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
[  449.858221]  kthread+0xd1/0xe0
[  449.858231]  ret_from_fork_nospec_end+0x0/0x39
[  449.858270] INFO: Freed in cl_key_fini+0x5b/0xd0 [obdclass] age=11 cpu=5 pid=4912
[  449.858288]  kfree+0x106/0x140
[  449.858313]  cl_key_fini+0x5b/0xd0 [obdclass]
[  449.858378]  key_fini+0x53/0x170 [obdclass]
[  449.858409]  lu_context_fini+0x4d/0x230 [obdclass]
[  449.858438]  lu_env_fini+0x1a/0x30 [obdclass]
[  449.858481]  class_process_config+0x2035/0x27e0 [obdclass]
[  449.858512]  class_config_llog_handler+0x807/0x13d0 [obdclass]
[  449.858541]  llog_process_thread+0xc44/0x1c20 [obdclass]
[  449.858569]  llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
[  449.858585]  kthread+0xd1/0xe0
[  449.858595]  ret_from_fork_nospec_end+0x0/0x39
[  449.858607] INFO: Slab 0xfffffcacc31be680 objects=20 used=19 fp=0xffff947c86f9bdb8 flags=0x1fffff00004080
[  449.859054] INFO: Object 0xffff947c86f9ac88 @offset=3208 fp=0xffff947c86f9b5e8
[  449.859885] Redzone ffff947c86f9ac80: bb bb bb bb bb bb bb bb                          ........
[  449.860363] Object ffff947c86f9ac88: 00 02 03 e7 c0 a8 01 2c 00 00 00 00 00 00 00 00  .......,........
[  449.861551] Object ffff947c86f9ac98: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  ....kkkkkkkkkkkk
[  449.862412] Object ffff947c86f9aca8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  449.863428] Object ffff947c86f9acb8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[  449.864402] Redzone ffff947c86f9acc8: bb bb bb bb bb bb bb bb                          ........
[  449.864821] Padding ffff947c86f9ae08: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
[  449.865310] CPU: 3 PID: 4890 Comm: mount.lustre Kdump: loaded Tainted: G    B      OE  ------------   3.10.0-1160.49.1.el7_lustre.x86_64 #1
[  449.865311] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  449.865313] Call Trace:
[  449.865318]  [<ffffffff87983539>] dump_stack+0x19/0x1b
[  449.865322]  [<ffffffff87424291>] print_trailer+0x161/0x280
[  449.865324]  [<ffffffff8742451f>] check_bytes_and_report+0xcf/0x110
[  449.865326]  [<ffffffff87425127>] check_object+0x257/0x2a0
[  449.865329]  [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70
[  449.865331]  [<ffffffff8797fb2a>] alloc_debug_processing+0x92/0x11d
[  449.865334]  [<ffffffff874280cd>] ___slab_alloc+0x4dd/0x520
[  449.865335]  [<ffffffff87423510>] ? set_track+0x70/0x1d0
[  449.865337]  [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70
[  449.865339]  [<ffffffff87427d62>] ? ___slab_alloc+0x172/0x520
[  449.865342]  [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70
[  449.865344]  [<ffffffff8797fe65>] __slab_alloc+0x40/0x5c
[  449.865346]  [<ffffffff8742b091>] __kmalloc_track_caller+0x1c1/0x240
[  449.865348]  [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70
[  449.865355]  [<ffffffff8759a4e1>] kvasprintf+0x61/0xa0
[  449.865357]  [<ffffffff8759a56e>] kasprintf+0x4e/0x70
[  449.865359]  [<ffffffff87589a03>] ? kobject_get_path+0xa3/0x100
[  449.865383]  [<ffffffffc07cd4c0>] class_modify_config+0x2f0/0x470 [obdclass]
[  449.865404]  [<ffffffffc07e4147>] ? keys_fill+0xe7/0x130 [obdclass]
[  449.865411]  [<ffffffffc0db4a63>] mdc_process_config+0x23/0x30 [mdc]
[  449.865440]  [<ffffffffc07d61f9>] class_process_config+0x2029/0x27e0 [obdclass]
[  449.865447]  [<ffffffffc0a5863f>] mgc_apply_recover_logs+0xbbf/0x16f0 [mgc]
[  449.865452]  [<ffffffffc0a59d6c>] mgc_process_recover_log+0xbfc/0xdc0 [mgc]
[  449.865456]  [<ffffffffc0a5bbe9>] mgc_process_log+0x7f9/0xeb0 [mgc]
[  449.865459]  [<ffffffffc0a5dfc2>] mgc_process_config+0xac2/0xd60 [mgc]
[  449.865484]  [<ffffffffc07c44f8>] ? lprocfs_counter_add+0xf8/0x1c0 [obdclass]
[  449.865517]  [<ffffffffc07dbf64>] lustre_process_log+0x2f4/0xb50 [obdclass]
[  449.865520]  [<ffffffff8758b0ab>] ? kobject_uevent+0xb/0x10
[  449.865522]  [<ffffffff8758a4b6>] ? kset_register+0x56/0x70
[  449.865538]  [<ffffffffc12b49a6>] ll_fill_super+0x8a6/0x10f0 [lustre]
[  449.865557]  [<ffffffffc07dda5c>] ? lustre_start_mgc+0x27c/0x2510 [obdclass]
[  449.865574]  [<ffffffffc07afeda>] ? obd_zombie_barrier+0x3a/0xc0 [obdclass]
[  449.865588]  [<ffffffffc12e21ad>] lustre_fill_super+0x3ad/0x4d0 [lustre]
[  449.865600]  [<ffffffffc12e1e00>] ? ll_alloc_inode+0x140/0x140 [lustre]
[  449.865603]  [<ffffffff8745249f>] mount_nodev+0x4f/0xb0
[  449.865615]  [<ffffffffc12e1aa8>] lustre_mount+0x18/0x20 [lustre]
[  449.865617]  [<ffffffff8745301e>] mount_fs+0x3e/0x1b0
[  449.865620]  [<ffffffff87471a87>] vfs_kern_mount+0x67/0x110
[  449.865622]  [<ffffffff874741bf>] do_mount+0x1ef/0xd00
[  449.865625]  [<ffffffff87429377>] ? kmem_cache_alloc_trace+0x1a7/0x200
[  449.865627]  [<ffffffff87475013>] SyS_mount+0x83/0xd0
[  449.865630]  [<ffffffff87995f92>] system_call_fastpath+0x25/0x2a
[  449.865633]  [<ffffffff87995ed5>] ? system_call_after_swapgs+0xa2/0x13a
[  449.865635] FIX kmalloc-64: Restoring 0xffff947c86f9ac88-0xffff947c86f9ac9b=0x6b

That is what I regularly have in my testing. I do testing with v2_15_58-45-ge4d2d4ff74 on a client side and 2.14 on server.
I've tried to apply https://review.whamcloud.com/c/fs/lustre-release/+/56493 at 2.15.58-45 but it still fails. Probably I missed something when did porting, here is my patch:

--- a/lustre/mgc/mgc_request.c
+++ b/lustre/mgc/mgc_request.c
@@ -1403,8 +1403,10 @@ fail:;
                               libcfs_nidstr(&nidlist[0]), rc);
 
                        /* For old NID format case the nidlist was allocated. */
-                       if (entry->mne_nid_type == 0)
+                       if (entry->mne_nid_type == 0) {
                                OBD_FREE_PTR_ARRAY(nidlist, entry->mne_nid_count);
+                               nidlist = NULL;
+                       }
                        break;
                }
 
@@ -1438,8 +1440,10 @@ fail:;
                /* continue, even one with error */
 free_nids:
                /* For old NID format case the nidlist was allocated. */
-               if (entry->mne_nid_type == 0)
+               if (entry->mne_nid_type == 0) {
                        OBD_FREE_PTR_ARRAY(nidlist, entry->mne_nid_count);
+                       nidlist = NULL;
+               }
        }

There is a small difference with the latest code in master, so I've fixed just 2 places instead of 3 in comparing with origin 56493. Could it be the reason it still fails in my testing?

Continue investigation.

Sergey Cheremencev added a comment - 24/Sep/24 3:00 PM ============================================================================= [ 449.857657] BUG kmalloc-64 (Tainted: G OE ------------ ): Poison overwritten [ 449.857686] ----------------------------------------------------------------------------- [ 449.857730] Disabling lock debugging due to kernel taint [ 449.857732] INFO: 0xffff947c86f9ac88-0xffff947c86f9ac9b. First byte 0x0 instead of 0x6b [ 449.857778] INFO: Allocated in cl_key_init+0x20/0xd0 [obdclass] age=10 cpu=5 pid=4912 [ 449.857798] __slab_alloc+0x40/0x5c [ 449.857810] kmem_cache_alloc_trace+0x1a7/0x200 [ 449.857842] cl_key_init+0x20/0xd0 [obdclass] [ 449.857880] keys_fill+0x96/0x130 [obdclass] [ 449.857910] lu_context_init+0xd3/0x1f0 [obdclass] [ 449.858039] lu_env_init+0x1a/0x30 [obdclass] [ 449.858085] class_process_config+0x2007/0x27e0 [obdclass] [ 449.858130] class_config_llog_handler+0x807/0x13d0 [obdclass] [ 449.858168] llog_process_thread+0xc44/0x1c20 [obdclass] [ 449.858203] llog_process_thread_daemonize+0xa4/0xe0 [obdclass] [ 449.858221] kthread+0xd1/0xe0 [ 449.858231] ret_from_fork_nospec_end+0x0/0x39 [ 449.858270] INFO: Freed in cl_key_fini+0x5b/0xd0 [obdclass] age=11 cpu=5 pid=4912 [ 449.858288] kfree+0x106/0x140 [ 449.858313] cl_key_fini+0x5b/0xd0 [obdclass] [ 449.858378] key_fini+0x53/0x170 [obdclass] [ 449.858409] lu_context_fini+0x4d/0x230 [obdclass] [ 449.858438] lu_env_fini+0x1a/0x30 [obdclass] [ 449.858481] class_process_config+0x2035/0x27e0 [obdclass] [ 449.858512] class_config_llog_handler+0x807/0x13d0 [obdclass] [ 449.858541] llog_process_thread+0xc44/0x1c20 [obdclass] [ 449.858569] llog_process_thread_daemonize+0xa4/0xe0 [obdclass] [ 449.858585] kthread+0xd1/0xe0 [ 449.858595] ret_from_fork_nospec_end+0x0/0x39 [ 449.858607] INFO: Slab 0xfffffcacc31be680 objects=20 used=19 fp=0xffff947c86f9bdb8 flags=0x1fffff00004080 [ 449.859054] INFO: Object 0xffff947c86f9ac88 @offset=3208 fp=0xffff947c86f9b5e8 [ 449.859885] Redzone ffff947c86f9ac80: bb bb bb bb bb bb bb bb ........ [ 449.860363] Object ffff947c86f9ac88: 00 02 03 e7 c0 a8 01 2c 00 00 00 00 00 00 00 00 .......,........ [ 449.861551] Object ffff947c86f9ac98: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ....kkkkkkkkkkkk [ 449.862412] Object ffff947c86f9aca8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 449.863428] Object ffff947c86f9acb8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. [ 449.864402] Redzone ffff947c86f9acc8: bb bb bb bb bb bb bb bb ........ [ 449.864821] Padding ffff947c86f9ae08: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ [ 449.865310] CPU: 3 PID: 4890 Comm: mount.lustre Kdump: loaded Tainted: G B OE ------------ 3.10.0-1160.49.1.el7_lustre.x86_64 #1 [ 449.865311] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 449.865313] Call Trace: [ 449.865318] [<ffffffff87983539>] dump_stack+0x19/0x1b [ 449.865322] [<ffffffff87424291>] print_trailer+0x161/0x280 [ 449.865324] [<ffffffff8742451f>] check_bytes_and_report+0xcf/0x110 [ 449.865326] [<ffffffff87425127>] check_object+0x257/0x2a0 [ 449.865329] [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70 [ 449.865331] [<ffffffff8797fb2a>] alloc_debug_processing+0x92/0x11d [ 449.865334] [<ffffffff874280cd>] ___slab_alloc+0x4dd/0x520 [ 449.865335] [<ffffffff87423510>] ? set_track+0x70/0x1d0 [ 449.865337] [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70 [ 449.865339] [<ffffffff87427d62>] ? ___slab_alloc+0x172/0x520 [ 449.865342] [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70 [ 449.865344] [<ffffffff8797fe65>] __slab_alloc+0x40/0x5c [ 449.865346] [<ffffffff8742b091>] __kmalloc_track_caller+0x1c1/0x240 [ 449.865348] [<ffffffff8759a56e>] ? kasprintf+0x4e/0x70 [ 449.865355] [<ffffffff8759a4e1>] kvasprintf+0x61/0xa0 [ 449.865357] [<ffffffff8759a56e>] kasprintf+0x4e/0x70 [ 449.865359] [<ffffffff87589a03>] ? kobject_get_path+0xa3/0x100 [ 449.865383] [<ffffffffc07cd4c0>] class_modify_config+0x2f0/0x470 [obdclass] [ 449.865404] [<ffffffffc07e4147>] ? keys_fill+0xe7/0x130 [obdclass] [ 449.865411] [<ffffffffc0db4a63>] mdc_process_config+0x23/0x30 [mdc] [ 449.865440] [<ffffffffc07d61f9>] class_process_config+0x2029/0x27e0 [obdclass] [ 449.865447] [<ffffffffc0a5863f>] mgc_apply_recover_logs+0xbbf/0x16f0 [mgc] [ 449.865452] [<ffffffffc0a59d6c>] mgc_process_recover_log+0xbfc/0xdc0 [mgc] [ 449.865456] [<ffffffffc0a5bbe9>] mgc_process_log+0x7f9/0xeb0 [mgc] [ 449.865459] [<ffffffffc0a5dfc2>] mgc_process_config+0xac2/0xd60 [mgc] [ 449.865484] [<ffffffffc07c44f8>] ? lprocfs_counter_add+0xf8/0x1c0 [obdclass] [ 449.865517] [<ffffffffc07dbf64>] lustre_process_log+0x2f4/0xb50 [obdclass] [ 449.865520] [<ffffffff8758b0ab>] ? kobject_uevent+0xb/0x10 [ 449.865522] [<ffffffff8758a4b6>] ? kset_register+0x56/0x70 [ 449.865538] [<ffffffffc12b49a6>] ll_fill_super+0x8a6/0x10f0 [lustre] [ 449.865557] [<ffffffffc07dda5c>] ? lustre_start_mgc+0x27c/0x2510 [obdclass] [ 449.865574] [<ffffffffc07afeda>] ? obd_zombie_barrier+0x3a/0xc0 [obdclass] [ 449.865588] [<ffffffffc12e21ad>] lustre_fill_super+0x3ad/0x4d0 [lustre] [ 449.865600] [<ffffffffc12e1e00>] ? ll_alloc_inode+0x140/0x140 [lustre] [ 449.865603] [<ffffffff8745249f>] mount_nodev+0x4f/0xb0 [ 449.865615] [<ffffffffc12e1aa8>] lustre_mount+0x18/0x20 [lustre] [ 449.865617] [<ffffffff8745301e>] mount_fs+0x3e/0x1b0 [ 449.865620] [<ffffffff87471a87>] vfs_kern_mount+0x67/0x110 [ 449.865622] [<ffffffff874741bf>] do_mount+0x1ef/0xd00 [ 449.865625] [<ffffffff87429377>] ? kmem_cache_alloc_trace+0x1a7/0x200 [ 449.865627] [<ffffffff87475013>] SyS_mount+0x83/0xd0 [ 449.865630] [<ffffffff87995f92>] system_call_fastpath+0x25/0x2a [ 449.865633] [<ffffffff87995ed5>] ? system_call_after_swapgs+0xa2/0x13a [ 449.865635] FIX kmalloc-64: Restoring 0xffff947c86f9ac88-0xffff947c86f9ac9b=0x6b That is what I regularly have in my testing. I do testing with v2_15_58-45-ge4d2d4ff74 on a client side and 2.14 on server. I've tried to apply https://review.whamcloud.com/c/fs/lustre-release/+/56493 at 2.15.58-45 but it still fails. Probably I missed something when did porting, here is my patch: --- a/lustre/mgc/mgc_request.c +++ b/lustre/mgc/mgc_request.c @@ -1403,8 +1403,10 @@ fail:; libcfs_nidstr(&nidlist[0]), rc); /* For old NID format case the nidlist was allocated. */ - if (entry->mne_nid_type == 0) + if (entry->mne_nid_type == 0) { OBD_FREE_PTR_ARRAY(nidlist, entry->mne_nid_count); + nidlist = NULL; + } break; } @@ -1438,8 +1440,10 @@ fail:; /* continue, even one with error */ free_nids: /* For old NID format case the nidlist was allocated. */ - if (entry->mne_nid_type == 0) + if (entry->mne_nid_type == 0) { OBD_FREE_PTR_ARRAY(nidlist, entry->mne_nid_count); + nidlist = NULL; + } } There is a small difference with the latest code in master, so I've fixed just 2 places instead of 3 in comparing with origin 56493. Could it be the reason it still fails in my testing? Continue investigation.

Gerrit Updater added a comment - 24/Sep/24 1:59 PM

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56493
Subject: ~~LU-18049~~ mgc: clear nidlist pointer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc9cf03d2a2d96d862b8ff0686eb4fd5d796b0b9

Gerrit Updater added a comment - 24/Sep/24 1:59 PM "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56493 Subject: LU-18049 mgc: clear nidlist pointer Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: bc9cf03d2a2d96d862b8ff0686eb4fd5d796b0b9

Andreas Dilger added a comment - 24/Sep/24 10:08 AM

The 2.15 clients is receiving mgs_nidtbl_entry with large NIDs (20 bytes) instead of the older 64bit NID values so its over flowing the memory. Old clients expect 64 bit NID values only.

simmonsja this is the opposite case. This is a new client (patch v2_15_58-45-ge4d2d4ff74 and later) running with 2.14 or 2.15 servers, so the older servers should definitely not sending large NIDs (nor could they even understand IPv6 NIDs).

Andreas Dilger added a comment - 24/Sep/24 10:08 AM The 2.15 clients is receiving mgs_nidtbl_entry with large NIDs (20 bytes) instead of the older 64bit NID values so its over flowing the memory. Old clients expect 64 bit NID values only. simmonsja this is the opposite case. This is a new client (patch v2_15_58-45-ge4d2d4ff74 and later) running with 2.14 or 2.15 servers, so the older servers should definitely not sending large NIDs (nor could they even understand IPv6 NIDs).

James A Simmons added a comment - 24/Sep/24 9:39 AM

I believe I know what is happening. The 2.15 clients is receiving mgs_nidtbl_entry with large NIDs (20 bytes) instead of the older 64bit NID values so its over flowing the memory. Old clients expect 64 bit NID values only. The question is why the MGS is sending the large NID to the client mgc. In master branch mgc_process_recovery_log() fills in struct mgs_config_body to send to the MGS and we get back struct mgs_config_res which contains the NID info. When creating struct mgs_config_body we set the field mcb_rec_nid_size to the NID size which tells the MGS to send large NIDs. The idea for older clients was mcb_rec_nid_size was originally mcb_nm_cur_pass which should only be set for nodemap logs, not recovery logs which in that case should be zero which tells newer MGS to send 64 bit NID size instead of large NIDs. Also in master recovery logs and nodemap logs are handled separately. That is not the case for 2.15 clients. I suspect mcb_nm_cur_pass is being set for recovery logs as well on 2.15 clients so MGS running master see mcb_nm_cur_pass for recovery log as signal to use large NIDs. Sergey can you confirm this is what is happening.

If this is happening as a work around we should check mcb_cur_pass == sizeof(struct lnet_nid). If not set to zero. That makes the bug less likely but not 100%. If mcb_nm_cur_pass is being set for recovery log case on client we need patch to set it always to zero. That is proper fix.

James A Simmons added a comment - 24/Sep/24 9:39 AM I believe I know what is happening. The 2.15 clients is receiving mgs_nidtbl_entry with large NIDs (20 bytes) instead of the older 64bit NID values so its over flowing the memory. Old clients expect 64 bit NID values only. The question is why the MGS is sending the large NID to the client mgc. In master branch mgc_process_recovery_log() fills in struct mgs_config_body to send to the MGS and we get back struct mgs_config_res which contains the NID info. When creating struct mgs_config_body we set the field mcb_rec_nid_size to the NID size which tells the MGS to send large NIDs. The idea for older clients was mcb_rec_nid_size was originally mcb_nm_cur_pass which should only be set for nodemap logs, not recovery logs which in that case should be zero which tells newer MGS to send 64 bit NID size instead of large NIDs. Also in master recovery logs and nodemap logs are handled separately. That is not the case for 2.15 clients. I suspect mcb_nm_cur_pass is being set for recovery logs as well on 2.15 clients so MGS running master see mcb_nm_cur_pass for recovery log as signal to use large NIDs. Sergey can you confirm this is what is happening. If this is happening as a work around we should check mcb_cur_pass == sizeof(struct lnet_nid). If not set to zero. That makes the bug less likely but not 100%. If mcb_nm_cur_pass is being set for recovery log case on client we need patch to set it always to zero. That is proper fix.

Andreas Dilger added a comment - 24/Sep/24 12:35 AM

Did you look at the +malloc debug logs to see if the corrupted address matches something in the mgc functions modified by that patch? It may be that some extra CDEBUG(D_MALLOC, ...) lines to print the addresses of the fields in that function.

Andreas Dilger added a comment - 24/Sep/24 12:35 AM Did you look at the +malloc debug logs to see if the corrupted address matches something in the mgc functions modified by that patch? It may be that some extra CDEBUG(D_MALLOC, ...) lines to print the addresses of the fields in that function.

Sergey Cheremencev added a comment - 23/Sep/24 3:37 PM - edited

I could reproduce it on my local testing system. It has the a different back trace, but I believe still the same sympthoms. It failed just after several runs of sanity-sec_31 in a cycle(client v2_15_58-46 + server 2.14).

[51444.997787] Lustre: Unmounted lustre-client
[51445.513711] LNet: Removed LNI 192.168.1.82@tcp999
[51448.012522] general protection fault: 0000 [#1] SMP 
[51448.013216] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey loop jbd2 mbcache sunrpc iosf_mbi crc32_pclmul snd_intel8x0 ghash_clmulni_intel snd_ac97_codec ac97_bus ppdev snd_seq snd_seq_device aesni_intel lrw snd_pcm gf128mul sg glue_helper ablk_helper cryptd pcspkr snd_timer snd parport_pc parport video soundcore i2c_piix4 ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ttm libahci ata_piix drm libata crct10dif_pclmul crct10dif_common e1000 crc32c_intel serio_raw drm_panel_orientation_quirks dm_mirror
[51448.019903]  dm_region_hash dm_log dm_mod [last unloaded: libcfs]
[51448.021331] CPU: 1 PID: 9863 Comm: systemd-udevd Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.49.1.el7_lustre.x86_64 #1
[51448.023377] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[51448.024086] task: ffff9991a021e300 ti: ffff99918462c000 task.ti: ffff99918462c000
[51448.024758] RIP: 0010:[<ffffffffbbe02f88>]  [<ffffffffbbe02f88>] unlink_anon_vmas+0x58/0x1b0
[51448.028746] RSP: 0018:ffff99918462fd18  EFLAGS: 00010202
[51448.029501] RAX: ffff999135dfdaf0 RBX: 2c01a8c0e70301f0 RCX: 0000000000000000
[51448.030109] RDX: 0000000000000000 RSI: ffff9991b4e76300 RDI: ffff99917d1e9540
[51448.030835] RBP: ffff99918462fd58 R08: 00007fec8e4bd000 R09: 0000000000000000
[51448.031478] R10: ffff99917d1e9560 R11: ffff99917d1e9b80 R12: ffff99917d1e9540
[51448.032120] R13: ffff99918f2fbdf8 R14: ffff999135dfdaf0 R15: ffff9991b4e762d0
[51448.032907] FS:  0000000000000000(0000) GS:ffff9991bc040000(0000) knlGS:0000000000000000
[51448.033444] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51448.034145] CR2: 00007fec8cb0e504 CR3: 00000000a389a000 CR4: 00000000000606e0
[51448.034655] Call Trace:
[51448.036556]  [<ffffffffbbdf0ac8>] free_pgtables+0xa8/0x120
[51448.037670]  [<ffffffffbbdfd656>] exit_mmap+0xc6/0x1a0
[51448.038670]  [<ffffffffbbc97b87>] mmput+0x67/0xf0
[51448.039169]  [<ffffffffbbca1908>] do_exit+0x288/0xa30
[51448.041924]  [<ffffffffbc390678>] ? __do_page_fault+0x238/0x500
[51448.042947]  [<ffffffffbc395ed5>] ? system_call_after_swapgs+0xa2/0x13a
[51448.043398]  [<ffffffffbbca212f>] do_group_exit+0x3f/0xa0
[51448.044037]  [<ffffffffbbca21a4>] SyS_exit_group+0x14/0x20
[51448.044449]  [<ffffffffbc395f92>] system_call_fastpath+0x25/0x2a
[51448.044994]  [<ffffffffbc395ed5>] ? system_call_after_swapgs+0xa2/0x13a
[51448.045429] Code: 67 f0 4c 89 7d c8 48 8d 59 f0 75 3a e9 82 00 00 00 66 90 48 8b 7d c8 e8 47 3a 1a 00 48 8b 3d b0 3d fc 00 4c 89 e6 e8 b8 3e 02 00 <48> 8b 43 10 48 8d 4b 10 48 89 4d c8 48 83 e8 10 4c 39 e9 74 53 
[51448.046774] RIP  [<ffffffffbbe02f88>] unlink_anon_vmas+0x58/0x1b0
[51448.047162]  RSP <ffff99918462fd18>

Here is the problem In RBX. There should be a valid address instead of 2c01a8c0e70301f0. It is not hard to notice that 0x2c01a8c0 is a reversed IP 192.168.1.44(address of my server).

Sergey Cheremencev added a comment - 23/Sep/24 3:37 PM - edited I could reproduce it on my local testing system. It has the a different back trace, but I believe still the same sympthoms. It failed just after several runs of sanity-sec_31 in a cycle(client v2_15_58-46 + server 2.14). [51444.997787] Lustre: Unmounted lustre-client [51445.513711] LNet: Removed LNI 192.168.1.82@tcp999 [51448.012522] general protection fault: 0000 [#1] SMP [51448.013216] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey loop jbd2 mbcache sunrpc iosf_mbi crc32_pclmul snd_intel8x0 ghash_clmulni_intel snd_ac97_codec ac97_bus ppdev snd_seq snd_seq_device aesni_intel lrw snd_pcm gf128mul sg glue_helper ablk_helper cryptd pcspkr snd_timer snd parport_pc parport video soundcore i2c_piix4 ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ttm libahci ata_piix drm libata crct10dif_pclmul crct10dif_common e1000 crc32c_intel serio_raw drm_panel_orientation_quirks dm_mirror [51448.019903] dm_region_hash dm_log dm_mod [last unloaded: libcfs] [51448.021331] CPU: 1 PID: 9863 Comm: systemd-udevd Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.49.1.el7_lustre.x86_64 #1 [51448.023377] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [51448.024086] task: ffff9991a021e300 ti: ffff99918462c000 task.ti: ffff99918462c000 [51448.024758] RIP: 0010:[<ffffffffbbe02f88>] [<ffffffffbbe02f88>] unlink_anon_vmas+0x58/0x1b0 [51448.028746] RSP: 0018:ffff99918462fd18 EFLAGS: 00010202 [51448.029501] RAX: ffff999135dfdaf0 RBX: 2c01a8c0e70301f0 RCX: 0000000000000000 [51448.030109] RDX: 0000000000000000 RSI: ffff9991b4e76300 RDI: ffff99917d1e9540 [51448.030835] RBP: ffff99918462fd58 R08: 00007fec8e4bd000 R09: 0000000000000000 [51448.031478] R10: ffff99917d1e9560 R11: ffff99917d1e9b80 R12: ffff99917d1e9540 [51448.032120] R13: ffff99918f2fbdf8 R14: ffff999135dfdaf0 R15: ffff9991b4e762d0 [51448.032907] FS: 0000000000000000(0000) GS:ffff9991bc040000(0000) knlGS:0000000000000000 [51448.033444] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [51448.034145] CR2: 00007fec8cb0e504 CR3: 00000000a389a000 CR4: 00000000000606e0 [51448.034655] Call Trace: [51448.036556] [<ffffffffbbdf0ac8>] free_pgtables+0xa8/0x120 [51448.037670] [<ffffffffbbdfd656>] exit_mmap+0xc6/0x1a0 [51448.038670] [<ffffffffbbc97b87>] mmput+0x67/0xf0 [51448.039169] [<ffffffffbbca1908>] do_exit+0x288/0xa30 [51448.041924] [<ffffffffbc390678>] ? __do_page_fault+0x238/0x500 [51448.042947] [<ffffffffbc395ed5>] ? system_call_after_swapgs+0xa2/0x13a [51448.043398] [<ffffffffbbca212f>] do_group_exit+0x3f/0xa0 [51448.044037] [<ffffffffbbca21a4>] SyS_exit_group+0x14/0x20 [51448.044449] [<ffffffffbc395f92>] system_call_fastpath+0x25/0x2a [51448.044994] [<ffffffffbc395ed5>] ? system_call_after_swapgs+0xa2/0x13a [51448.045429] Code: 67 f0 4c 89 7d c8 48 8d 59 f0 75 3a e9 82 00 00 00 66 90 48 8b 7d c8 e8 47 3a 1a 00 48 8b 3d b0 3d fc 00 4c 89 e6 e8 b8 3e 02 00 <48> 8b 43 10 48 8d 4b 10 48 89 4d c8 48 83 e8 10 4c 39 e9 74 53 [51448.046774] RIP [<ffffffffbbe02f88>] unlink_anon_vmas+0x58/0x1b0 [51448.047162] RSP <ffff99918462fd18> Here is the problem In RBX. There should be a valid address instead of 2c01a8c0e70301f0. It is not hard to notice that 0x2c01a8c0 is a reversed IP 192.168.1.44(address of my server).

People

Assignee:: Sergey Cheremencev

Reporter:: Qian Yingjin

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 19/Jul/24 1:21 AM

Updated:: 02/Oct/24 6:19 PM

Resolved:: 30/Sep/24 4:08 PM