Details
- Type: Bug
- Resolution: Fixed
- Priority: Blocker
- Fix Version: Lustre 2.16.0
- Severity: 3
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/3576290d-064a-4573-a087-75b59fff6df7
test_25 failed with the following error:
trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25
Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4542 - 5.14.0-362.24.1.el9_3.x86_64
servers: https://build.whamcloud.com/job/lustre-b_es6_0/666 - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64
Both clients crashed in ext4_htree_store_dirent() (NOT ldiskfs) in kmalloc, so it looks like some kind of client-side memory corruption?
[27299.419062] Lustre: MGC10.240.44.44@tcp: Connection restored to (at 10.240.44.44@tcp)
[27299.448580] LustreError: 886364:0:(client.c:3288:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff902e1402e3c0 x1802432488249280/t691489734702(691489734702) o101->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 520/608 e 0 to 0 dl 1718935667 ref 2 fl Interpret:RPQU/204/0 rc 301/301 job:'lfs.0' uid:0 gid:0
[27300.013294] Lustre: lustre-MDT0000-mdc-ffff902e04449800: Connection restored to (at 10.240.44.44@tcp)
[27305.358931] Lustre: 886365:0:(client.c:2334:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1718935641/real 1718935641] req@ffff902e547d9380 x1802432536672704/t0(0) o400->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 224/224 e 0 to 1 dl 1718935657 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0
[27306.104959] BUG: unable to handle page fault for address: ffff902ee5338778
[27306.105697] #PF: supervisor read access in kernel mode
[27306.106204] #PF: error_code(0x0000) - not-present page
[27306.107607] CPU: 1 PID: 1109213 Comm: bash Kdump: loaded Tainted: G OE ------- --- 5.14.0-362.24.1.el9_3.x86_64 #1
[27306.108653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[27306.109209] RIP: 0010:__kmalloc+0x11b/0x370
[27306.118475] Call Trace:
[27306.118756]  <TASK>
[27306.120625]  ? __die_body.cold+0x8/0xd
[27306.121006]  ? page_fault_oops+0x134/0x170
[27306.121437]  ? kernelmode_fixup_or_oops+0x84/0x110
[27306.121944]  ? exc_page_fault+0xa8/0x150
[27306.122371]  ? asm_exc_page_fault+0x22/0x30
[27306.122806]  ? ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.123359]  ? __kmalloc+0x11b/0x370
[27306.123740]  ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.124269]  htree_dirblock_to_tree+0x1ab/0x310 [ext4]
[27306.124809]  ext4_htree_fill_tree+0x203/0x3b0 [ext4]
[27306.125333]  ext4_dx_readdir+0x10d/0x360 [ext4]
[27306.125817]  ext4_readdir+0x392/0x550 [ext4]
[27306.126275]  iterate_dir+0x17c/0x1c0
[27306.126711]  __x64_sys_getdents64+0x80/0x120
[27306.128187]  do_syscall_64+0x5c/0x90
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
ost-pools test_25 - trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25
Issue Links
- is related to
  - LU-18230 sanity-sec: crash in lov_connect/lmv_connect ->__kernfs_new_node->kstrdup->__kmalloc_track_caller (Resolved)
  - LU-16307 sanity-sec: test_31: export for 10.240.26.216@tcp on MGS should not exist (Resolved)
  - LU-13306 allow clients to accept mgs_nidtbl_entry with IPv6 NIDs (Resolved)
I believe I know what is happening. The 2.15 client is receiving mgs_nidtbl_entry records with large NIDs (20 bytes) instead of the older 64-bit NID values, so the reply is overflowing the client's buffer. Old clients expect 64-bit NID values only. The question is why the MGS is sending large NIDs to the client MGC.

In the master branch, mgc_process_recovery_log() fills in struct mgs_config_body to send to the MGS, and we get back struct mgs_config_res, which contains the NID info. When creating struct mgs_config_body we set the field mcb_rec_nid_size to the NID size, which tells the MGS to send large NIDs. The catch for older clients is that mcb_rec_nid_size was originally mcb_nm_cur_pass, which should only be set for nodemap logs, not recovery logs; for recovery logs it should be zero, which tells a newer MGS to send 64-bit NIDs instead of large NIDs. Also, in master, recovery logs and nodemap logs are handled separately; that is not the case for 2.15 clients. I suspect mcb_nm_cur_pass is being set for recovery logs as well on 2.15 clients, so an MGS running master sees a non-zero mcb_nm_cur_pass on a recovery log request as the signal to use large NIDs. Sergey, can you confirm this is what is happening?
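To make the field overlap concrete, here is a minimal sketch of the request body as described above. The union is only illustrative of how the two sides interpret the same byte; the authoritative definition is in lustre_idl.h and may differ in detail.

/* Sketch of the MGS config log request body (see lustre_idl.h for the
 * real definition). A 2.15 client and a master MGS disagree about the
 * meaning of one byte in this structure.
 */
struct mgs_config_body {
	char	mcb_name[64];	/* config log name (MTI_NAME_MAXLEN) */
	__u64	mcb_offset;	/* next index of config log to request */
	__u16	mcb_type;	/* CONFIG_T_CONFIG, CONFIG_T_RECOVER, ... */
	union {
		/* 2.15 client: nodemap iteration pass; should be 0
		 * for recovery logs */
		__u8	mcb_nm_cur_pass;
		/* master: 0 = send 64-bit NIDs,
		 * sizeof(struct lnet_nid) (20) = send large NIDs */
		__u8	mcb_rec_nid_size;
	};
	__u8	mcb_bits;	/* bits unit size of config log */
	__u32	mcb_units;	/* # of units for bulk transfer */
};

So if a 2.15 client leaks a non-zero mcb_nm_cur_pass into a recovery log request, a master MGS reads that byte as mcb_rec_nid_size and packs 20-byte struct lnet_nid values into the mgs_nidtbl_entry reply, while the client sized its buffer for 8-byte lnet_nid_t values, which would explain the corruption seen in the crash above.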
If this is what is happening, then as a workaround the MGS should check whether mcb_nm_cur_pass == sizeof(struct lnet_nid) and, if not, treat the value as zero. That makes the bug less likely to hit, but it is not a 100% fix. If mcb_nm_cur_pass is being set for the recovery log case on the client, we need a patch so the client always sets it to zero; that is the proper fix.
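A hedged sketch of both changes, assuming the struct layout above; the helper function names here are made up for illustration and are not the actual Lustre handlers.

/* MGS-side workaround (hypothetical helper): only honour the NID size
 * when it is the value a new client would actually send; anything else
 * is assumed to be a stale mcb_nm_cur_pass from an old client.
 */
static __u8 mgs_sanitize_rec_nid_size(const struct mgs_config_body *mcb)
{
	if (mcb->mcb_rec_nid_size == sizeof(struct lnet_nid))
		return mcb->mcb_rec_nid_size;	/* genuine large NID request */

	return 0;	/* anything else: fall back to 64-bit NIDs */
}

/* Client-side proper fix (hypothetical helper): a recovery log request
 * must never carry nodemap state, so always zero the field.
 */
static void mgc_init_recovery_body(struct mgs_config_body *mcb)
{
	mcb->mcb_type = CONFIG_T_RECOVER;
	mcb->mcb_nm_cur_pass = 0;	/* never reuse nodemap pass here */
}

The workaround still misreads an old client whose nodemap pass happens to equal 20, which is why it is less likely but not 100%; only the client-side zeroing removes the ambiguity entirely.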