Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.16.0
- 3
- 9223372036854775807
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/3576290d-064a-4573-a087-75b59fff6df7
test_25 failed with the following error:
trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25
Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4542 - 5.14.0-362.24.1.el9_3.x86_64
servers: https://build.whamcloud.com/job/lustre-b_es6_0/666 - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64
Both clients crashed in __kmalloc() called from ext4_htree_store_dirent() (ext4, NOT ldiskfs), so this looks like some kind of client-side memory corruption.
[27299.419062] Lustre: MGC10.240.44.44@tcp: Connection restored to (at 10.240.44.44@tcp)
[27299.448580] LustreError: 886364:0:(client.c:3288:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff902e1402e3c0 x1802432488249280/t691489734702(691489734702) o101->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 520/608 e 0 to 0 dl 1718935667 ref 2 fl Interpret:RPQU/204/0 rc 301/301 job:'lfs.0' uid:0 gid:0
[27300.013294] Lustre: lustre-MDT0000-mdc-ffff902e04449800: Connection restored to (at 10.240.44.44@tcp)
[27305.358931] Lustre: 886365:0:(client.c:2334:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1718935641/real 1718935641] req@ffff902e547d9380 x1802432536672704/t0(0) o400->lustre-MDT0000-mdc-ffff902e04449800@10.240.44.44@tcp:12/10 lens 224/224 e 0 to 1 dl 1718935657 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0
[27306.104959] BUG: unable to handle page fault for address: ffff902ee5338778
[27306.105697] #PF: supervisor read access in kernel mode
[27306.106204] #PF: error_code(0x0000) - not-present page
[27306.107607] CPU: 1 PID: 1109213 Comm: bash Kdump: loaded Tainted: G OE ------- --- 5.14.0-362.24.1.el9_3.x86_64 #1
[27306.108653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[27306.109209] RIP: 0010:__kmalloc+0x11b/0x370
[27306.118475] Call Trace:
[27306.118756] <TASK>
[27306.120625] ? __die_body.cold+0x8/0xd
[27306.121006] ? page_fault_oops+0x134/0x170
[27306.121437] ? kernelmode_fixup_or_oops+0x84/0x110
[27306.121944] ? exc_page_fault+0xa8/0x150
[27306.122371] ? asm_exc_page_fault+0x22/0x30
[27306.122806] ? ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.123359] ? __kmalloc+0x11b/0x370
[27306.123740] ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.124269] htree_dirblock_to_tree+0x1ab/0x310 [ext4]
[27306.124809] ext4_htree_fill_tree+0x203/0x3b0 [ext4]
[27306.125333] ext4_dx_readdir+0x10d/0x360 [ext4]
[27306.125817] ext4_readdir+0x392/0x550 [ext4]
[27306.126275] iterate_dir+0x17c/0x1c0
[27306.126711] __x64_sys_getdents64+0x80/0x120
[27306.128187] do_syscall_64+0x5c/0x90
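When comparing crash dumps from several clients, it can help to pull out the call-trace frames mechanically and check which module each one belongs to (for example, to confirm that the crash is in ext4 rather than ldiskfs). The following is only a rough sketch: the helper name `parse_frames` and the `SAMPLE` excerpt are illustrative and not part of Maloo or Lustre tooling, and the regular expression assumes the `[timestamp] symbol+0xoff/0xsize [module]` layout seen in the console log above.

```python
import re

# One call-trace frame, e.g.
#   [27306.123740] ext4_htree_store_dirent+0x36/0x100 [ext4]
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return the call-trace frames found in log_text, in log order.

    Works whether the log has one entry per line or has been
    collapsed into a single line, because it scans for frames
    instead of splitting on newlines.
    """
    frames = []
    for m in FRAME_RE.finditer(log_text):
        frames.append({
            "timestamp": float(m.group("ts")),
            "symbol": m.group("symbol"),
            # Frames without a "[module]" suffix are built into the kernel.
            "module": m.group("module") or "vmlinux",
            "reliable": m.group("unreliable") is None,
        })
    return frames


# Excerpt copied from the console log in this ticket.
SAMPLE = """\
[27306.109209] RIP: 0010:__kmalloc+0x11b/0x370
[27306.118475] Call Trace:
[27306.122806] ? ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.123359] ? __kmalloc+0x11b/0x370
[27306.123740] ext4_htree_store_dirent+0x36/0x100 [ext4]
[27306.124269] htree_dirblock_to_tree+0x1ab/0x310 [ext4]
[27306.126275] iterate_dir+0x17c/0x1c0
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        if frame["reliable"]:
            print(f'{frame["symbol"]} ({frame["module"]})')
```

The `RIP:` line is deliberately not matched as a frame, since the symbol there follows a segment prefix rather than the timestamp; the faulting function would need to be read from that line separately.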
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
ost-pools test_25 - trevis-106vm10, trevis-106vm11 crashed during ost-pools test_25
Issue Links
- is related to
  - LU-18230 sanity-sec: crash in lov_connect/lmv_connect ->__kernfs_new_node->kstrdup->__kmalloc_track_caller (Resolved)
  - LU-16307 sanity-sec: test_31: export for 10.240.26.216@tcp on MGS should not exist (Resolved)
- is related to
  - LU-13306 allow clients to accept mgs_nidtbl_entry with IPv6 NIDs (Resolved)