Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Labels: None
- Environment: Arm64, v8.0, virtual machine, all-in-one node
- Severity: 3
Description
All-in-one Arm64 node, configured as below:
mds_HOST="lustre-build.novalocal"
MDSCOUNT=2
mds1_HOST=$mds_HOST
MDSDEV1=/dev/vdb
mds1_MOUNT=/mnt/mdtb
mds2_HOST=$mds_HOST
MDSDEV2=/dev/vdc
mds2_MOUNT=/mnt/mdtc
OSTCOUNT=2
ost_HOST="lustre-build.novalocal"
ost1_HOST=$mds_HOST
OSTDEV1=/dev/vdd
ost1_MOUNT=/mnt/ost
ost2_HOST=$mds_HOST
OSTDEV2=/dev/vde
ost2_MOUNT=/mnt/ostc
This sets up two MDTs (and two OSTs) on the single node. After mounting the file system with ./llmount.sh, we run the following command:
lfs mkdir -i1 -c2 -H crush /mnt/lustre/test.sanity
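For readability, here is the same reproducer with each flag annotated (flag semantics as documented for lfs mkdir / lfs setdirstripe):

# -i 1     : create the new directory on MDT index 1 (its master stripe)
# -c 2     : stripe the directory across 2 MDTs
# -H crush : hash directory entries to stripes with the "crush" hash type
lfs mkdir -i 1 -c 2 -H crush /mnt/lustre/test.sanity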
We then see a kernel oops; the dmesg output looks like this:
[67451.989655] Lustre: trans no 4294967299 committed transno 4294967299
[67451.994582] Lustre: NRS stop fifo request from 12345-0@lo, seq: 37
[67451.995916] Lustre: lustre-MDT0000-osp-MDT0001: committing for last_committed 4294967299 gen 2
[67451.998986] Lustre: Completed RPC req@000000002a9095b6 pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184768:0@lo:1000:osp_up0-1.0
[67452.002300] Lustre: ou 00000000edcca9a9 version 3 rpc_version 3
[67452.003592] Lustre: Sending RPC req@00000000900684ad pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184832:0@lo:1000:osp_up0-1.0
[67452.006792] Lustre: peer: 12345-0@lo (source: 12345-0@lo)
[67452.007970] Lustre: set 000000003bee5318 going to sleep for 6 seconds
[67452.007986] Lustre: got req x1718654554184832
[67452.010276] Lustre: NRS start fifo request from 12345-0@lo, seq: 38
[67452.011603] Lustre: Handling RPC req@0000000075731868 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+6:10524:x1718654554184832:12345-0@lo:1000:osp_up0-1.0
[67452.021774] Lustre: lustre-MDT0000: transno 4294967300 is committed
[67452.023272] Lustre: Handled RPC req@0000000075731868 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+6:10524:x1718654554184832:12345-0@lo:1000:osp_up0-1.0 Request processed in 11665us (16473us total) trans 4294967300 rc 0/0
[67452.023284] Lustre: trans no 4294967300 committed transno 4294967300
[67452.027833] Lustre: NRS stop fifo request from 12345-0@lo, seq: 38
[67452.029139] Lustre: lustre-MDT0000-osp-MDT0001: committing for last_committed 4294967300 gen 2
[67452.032292] Lustre: Completed RPC req@00000000900684ad pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184832:0@lo:1000:osp_up0-1.0
[67452.035424] Lustre: ou 00000000edcca9a9 version 4 rpc_version 4
[67452.036812] Lustre: Sending RPC req@00000000c14ec545 pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184896:0@lo:1000:osp_up0-1.0
[67452.039895] Lustre: peer: 12345-0@lo (source: 12345-0@lo)
[67452.041057] Lustre: set 0000000077eb37df going to sleep for 6 seconds
[67452.041071] Lustre: got req x1718654554184896
[67452.043452] Lustre: NRS start fifo request from 12345-0@lo, seq: 39
[67452.044766] Lustre: Handling RPC req@000000008399fed6 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+6:10524:x1718654554184896:12345-0@lo:1000:osp_up0-1.0
[67452.049965] Lustre: Handled RPC req@000000008399fed6 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+7:10524:x1718654554184896:12345-0@lo:1000:osp_up0-1.0 Request processed in 5184us (10052us total) trans 4294967301 rc 0/0
[67452.050038] Lustre: Completed RPC req@00000000c14ec545 pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184896:0@lo:1000:osp_up0-1.0
[67452.054980] Lustre: NRS stop fifo request from 12345-0@lo, seq: 39
[67452.065583] Lustre: ### ldlm_lock_addref(PW) ns: mdt-lustre-MDT0001_UUID lock: 000000001c761af5/0xae74a97a4f2331dd lrc: 3/0,1 mode: --/PW res: [0x240000402:0x1:0x0].0x0 bits 0x0/0x0 rrc: 2 type: IBT gid 0 flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
[67452.070936] Lustre: ### About to add lock: ns: mdt-lustre-MDT0001_UUID lock: 000000001c761af5/0xae74a97a4f2331dd lrc: 3/0,1 mode: PW/PW res: [0x240000402:0x1:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x50210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
[67452.076701] Lustre: ### client-side local enqueue handler, new lock created ns: mdt-lustre-MDT0001_UUID lock: 000000001c761af5/0xae74a97a4f2331dd lrc: 3/0,1 mode: PW/PW res: [0x240000402:0x1:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x40210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
[67452.082993] Lustre: ### ldlm_lock_addref(PW) ns: mdt-lustre-MDT0001_UUID lock: 00000000324d3623/0xae74a97a4f2331e4 lrc: 3/0,1 mode: --/PW res: [0x240000400:0x2:0x0].0x0 bits 0x0/0x0 rrc: 2 type: IBT gid 0 flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
[67452.088849] Lustre: ### About to add lock: ns: mdt-lustre-MDT0001_UUID lock: 00000000324d3623/0xae74a97a4f2331e4 lrc: 3/0,1 mode: PW/PW res: [0x240000400:0x2:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x50210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
[67452.094616] Lustre: ### client-side local enqueue handler, new lock created ns: mdt-lustre-MDT0001_UUID lock: 00000000324d3623/0xae74a97a4f2331e4 lrc: 3/0,1 mode: PW/PW res: [0x240000400:0x2:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x40210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
[67452.100790] Unable to handle kernel paging request at virtual address ffffb6d6a5c60804
[67452.102606] Mem abort info:
[67452.103219]   ESR = 0x96000021
[67452.103865]   Exception class = DABT (current EL), IL = 32 bits
[67452.105141]   SET = 0, FnV = 0
[67452.105816]   EA = 0, S1PTW = 0
[67452.106492] Data abort info:
[67452.107096]   ISV = 0, ISS = 0x00000021
[67452.107912]   CM = 0, WnR = 0
[67452.108564] swapper pgtable: 64k pages, 48-bit VAs, pgdp = 000000008ce20289
[67452.110150] [ffffb6d6a5c60804] pgd=000000083ffd0003, pud=000000083ffd0003, pmd=000000083ff30003, pte=00e8000165c60f13
[67452.112534] Internal error: Oops: 96000021 [#1] SMP
[67452.113564] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) mbcache jbd2 lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) crc32_generic libcfs(OE) dm_flakey vfat fat virtio_gpu crct10dif_ce drm_kms_helper ghash_ce sha2_ce drm sha256_arm64 fb_sys_fops syscopyarea sysfillrect sha1_ce sysimgblt virtio_balloon binfmt_misc xfs libcrc32c virtio_net net_failover virtio_blk failover virtio_mmio sunrpc dm_mirror dm_region_hash dm_log dm_mod
[67452.124678] CPU: 3 PID: 10246 Comm: mdt01_002 Kdump: loaded Tainted: G W OE --------- - - 4.18.0-348.2.1.el8_lustre_debug_debug.aarch64 #1
[67452.127604] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[67452.129116] pstate: 10000005 (nzcV daif -PAN -UAO)
[67452.130187] pc : __ll_sc_atomic64_or+0x4/0x18
[67452.131178] lr : lod_object_lock+0x81c/0x15c0 [lod]
[67452.131804] Lustre: Sending RPC req@00000000880f234c pname:cluuid:pid:xid:nid:opc:job ptlrpcd_06_00:lustre-MDT0001-mdtlov_UUID:8379:1718654554184960:0@lo:41:osp-pre-0-1.0
[67452.132273] sp : ffffb6d68a5a7270
[67452.135807] Lustre: peer: 12345-0@lo (source: 12345-0@lo)
[67452.136435] x29: ffffb6d68a5a72b0 x28: ffff20000ae70280
[67452.137672] Lustre: got req x1718654554184960
[67452.138704] x27: ffff20007517e888 x26: 0000000000000001
[67452.139760] Lustre: NRS start fifo request from 12345-0@lo, seq: 67
[67452.140824] x25: 0000000000000001 x24: 0000000000000001
[67452.142263] Lustre: Handling RPC req@00000000ce02a9f7 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out06_002:lustre-MDT0001-mdtlov_UUID+7:8379:x1718654554184960:12345-0@lo:41:osp-pre-0-1.0
[67452.143359] x23: ffffb6d80566a150 x22: ffffb6d6a766e150
[67452.146813] Lustre: blocks cached 0 granted 2146304 pending 0 free 126251008 avail 114745344
[67452.147920] x21: ffffb6d6bc9ca7d0 x20: ffffb6d8056415c8
[67452.149808] Lustre: Handled RPC req@00000000ce02a9f7 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out06_002:lustre-MDT0001-mdtlov_UUID+7:8379:x1718654554184960:12345-0@lo:41:osp-pre-0-1.0 Request processed in 7546us (13999us total) trans 0 rc 0/0
[67452.149851] Lustre: Completed RPC req@00000000880f234c pname:cluuid:pid:xid:nid:opc:job ptlrpcd_06_00:lustre-MDT0001-mdtlov_UUID:8379:1718654554184960:0@lo:41:osp-pre-0-1.0
[67452.150828] x19: 0000000000000008 x18: 0000000000000000
[67452.155565] Lustre: NRS stop fifo request from 12345-0@lo, seq: 67
[67452.158845] x17: 0000000000000000 x16: ffff200072dfc718
[67452.162462] x15: dfff200000000000 x14: 636f6c203a64696e
[67452.163623] x13: 0000000000000000 x12: ffff16dad4ecdc2e
[67452.164801] x11: 1ffff6dad4ecdc2d x10: ffff16dad4ecdc2d
[67452.165982] x9 : 0000000000000000 x8 : 0000000000000000
[67452.167180] x7 : 1ffff6db00acd42a x6 : ffff16dad4ecdc2e
[67452.168351] x5 : ffffb6d6a766e158 x4 : 0000000000000000
[67452.169529] x3 : 0000000000000000 x2 : 0000000000000000
[67452.170696] x1 : ffffb6d6a5c60804 x0 : 0000000000000002
[67452.171910] Process mdt01_002 (pid: 10246, stack limit = 0x000000006659ab27)
[67452.173530] Call trace:
[67452.174090]  __ll_sc_atomic64_or+0x4/0x18
[67452.174994]  mdd_object_lock+0xac/0x170 [mdd]
[67452.175992]  mdt_reint_striped_lock+0x494/0xf10 [mdt]
[67452.177185]  mdt_create+0x23c8/0x4818 [mdt]
[67452.178150]  mdt_reint_create+0x6c4/0xbb8 [mdt]
[67452.179201]  mdt_reint_rec+0x27c/0x708 [mdt]
[67452.180176]  mdt_reint_internal+0xbd4/0x2408 [mdt]
[67452.181292]  mdt_reint+0x190/0x378 [mdt]
[67452.182314]  tgt_handle_request0+0x238/0x1368 [ptlrpc]
[67452.183587]  tgt_request_handle+0x1364/0x3ec0 [ptlrpc]
[67452.184872]  ptlrpc_server_handle_request+0x9ec/0x28d0 [ptlrpc]
[67452.186329]  ptlrpc_main+0x1aa4/0x3f68 [ptlrpc]
[67452.187371]  kthread+0x3b0/0x460
[67452.188119]  ret_from_fork+0x10/0x18
[67452.188955] Code: f84107fe d65f03c0 d503201f f9800031 (c85f7c31)
[67452.190637] SMP: stopping secondary CPUs
[67452.200205] Starting crashdump kernel...
[67452.201386] Bye!
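Decoding the fault: ESR 0x96000021 is a data abort taken from the current EL with DFSC 0x21, i.e. an alignment fault, and the faulting instruction c85f7c31 disassembles to ldxr x17, [x1], the load-exclusive at the start of __ll_sc_atomic64_or. The target address in x1 (ffffb6d6a5c60804) is only 4-byte aligned, while arm64 exclusive accesses require natural 8-byte alignment; x0 = 0x2 is consistent with the OR operand. A minimal userspace sketch of this mechanism in C (a hypothetical illustration, not Lustre code; the buffer, offset, and names are invented for the demo):

/* Hypothetical demo (not Lustre code) of the fault mechanism seen above:
 * arm64 exclusive loads/stores (LDXR/STXR), which back atomic64_or(),
 * require natural 8-byte alignment.  A 64-bit atomic OR on an address
 * that is only 4-byte aligned traps with an alignment fault -- SIGBUS
 * in userspace, the "Oops: 96000021" data abort in the kernel log. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* 8-byte-aligned buffer; buf + 4 is therefore only 4-byte
         * aligned, matching the low bits of the faulting address
         * ...a5c60804 in the register dump. */
        _Alignas(8) unsigned char buf[16] = { 0 };
        uint64_t *p = (uint64_t *)(buf + 4);

        /* GCC typically inlines this on aarch64 as an LDXR/STXR (or
         * LSE LDSET) sequence, the same primitive used by
         * __ll_sc_atomic64_or in the stack trace.  With a misaligned
         * pointer it should raise SIGBUS. */
        __atomic_fetch_or(p, 0x2ULL, __ATOMIC_SEQ_CST);

        printf("no fault (unexpected on aarch64)\n");
        return 0;
}

Built with gcc on an aarch64 host, this program should die with SIGBUS at the atomic OR, mirroring the in-kernel alignment fault in the trace above.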
Issue Links
- is related to: LU-10300 Can the Lustre 2.10.x clients support 64K kernel page? (Resolved)