
LU-15364: Kernel oops when striping across multiple MDTs on an Arm64 server

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.15.0
    • Environment: Arm64, v8.0, virtual machine, all in one node
    • Severity: 3

    Description

       An all-in-one Arm64 node, configured as below:

       

      mds_HOST="lustre-build.novalocal"
      MDSCOUNT=2
      mds1_HOST=$mds_HOST
      MDSDEV1=/dev/vdb
      mds1_MOUNT=/mnt/mdtb
      mds2_HOST=$mds_HOST
      MDSDEV2=/dev/vdc
      mds2_MOUNT=/mnt/mdtc
      
      OSTCOUNT=2
      ost_HOST="lustre-build.novalocal"
      ost1_HOST=$mds_HOST
      OSTDEV1=/dev/vdd
      ost1_MOUNT=/mnt/ost
      ost2_HOST=$mds_HOST
      OSTDEV2=/dev/vde
      ost2_MOUNT=/mnt/ostc 

       

       

      Set up 2 MDTs on this node.

      After running ./llmount.sh, run the command:

      lfs mkdir -i1 -c2 -H crush /mnt/lustre/test.sanity

      This triggers a kernel oops; the dmesg output looks like this:

      [67451.989655] Lustre: trans no 4294967299 committed transno 4294967299
      [67451.994582] Lustre: NRS stop fifo request from 12345-0@lo, seq: 37
      [67451.995916] Lustre: lustre-MDT0000-osp-MDT0001: committing for last_committed 4294967299 gen 2
      [67451.998986] Lustre: Completed RPC req@000000002a9095b6 pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184768:0@lo:1000:osp_up0-1.0
      [67452.002300] Lustre: ou 00000000edcca9a9 version 3 rpc_version 3
      [67452.003592] Lustre: Sending RPC req@00000000900684ad pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184832:0@lo:1000:osp_up0-1.0
      [67452.006792] Lustre: peer: 12345-0@lo (source: 12345-0@lo)
      [67452.007970] Lustre: set 000000003bee5318 going to sleep for 6 seconds
      [67452.007986] Lustre: got req x1718654554184832
      [67452.010276] Lustre: NRS start fifo request from 12345-0@lo, seq: 38
      [67452.011603] Lustre: Handling RPC req@0000000075731868 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+6:10524:x1718654554184832:12345-0@lo:1000:osp_up0-1.0
      [67452.021774] Lustre: lustre-MDT0000: transno 4294967300 is committed
      [67452.023272] Lustre: Handled RPC req@0000000075731868 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+6:10524:x1718654554184832:12345-0@lo:1000:osp_up0-1.0 Request processed in 11665us (16473us total) trans 4294967300 rc 0/0
      [67452.023284] Lustre: trans no 4294967300 committed transno 4294967300
      [67452.027833] Lustre: NRS stop fifo request from 12345-0@lo, seq: 38
      [67452.029139] Lustre: lustre-MDT0000-osp-MDT0001: committing for last_committed 4294967300 gen 2
      [67452.032292] Lustre: Completed RPC req@00000000900684ad pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184832:0@lo:1000:osp_up0-1.0
      [67452.035424] Lustre: ou 00000000edcca9a9 version 4 rpc_version 4
      [67452.036812] Lustre: Sending RPC req@00000000c14ec545 pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184896:0@lo:1000:osp_up0-1.0
      [67452.039895] Lustre: peer: 12345-0@lo (source: 12345-0@lo)
      [67452.041057] Lustre: set 0000000077eb37df going to sleep for 6 seconds
      [67452.041071] Lustre: got req x1718654554184896
      [67452.043452] Lustre: NRS start fifo request from 12345-0@lo, seq: 39
      [67452.044766] Lustre: Handling RPC req@000000008399fed6 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+6:10524:x1718654554184896:12345-0@lo:1000:osp_up0-1.0
      [67452.049965] Lustre: Handled RPC req@000000008399fed6 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out05_002:lustre-MDT0001-mdtlov_UUID+7:10524:x1718654554184896:12345-0@lo:1000:osp_up0-1.0 Request processed in 5184us (10052us total) trans 4294967301 rc 0/0
      [67452.050038] Lustre: Completed RPC req@00000000c14ec545 pname:cluuid:pid:xid:nid:opc:job osp_up0-1:lustre-MDT0001-mdtlov_UUID:10524:1718654554184896:0@lo:1000:osp_up0-1.0
      [67452.054980] Lustre: NRS stop fifo request from 12345-0@lo, seq: 39
      [67452.065583] Lustre: ### ldlm_lock_addref(PW) ns: mdt-lustre-MDT0001_UUID lock: 000000001c761af5/0xae74a97a4f2331dd lrc: 3/0,1 mode: --/PW res: [0x240000402:0x1:0x0].0x0 bits 0x0/0x0 rrc: 2 type: IBT gid 0 flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
      [67452.070936] Lustre: ### About to add lock: ns: mdt-lustre-MDT0001_UUID lock: 000000001c761af5/0xae74a97a4f2331dd lrc: 3/0,1 mode: PW/PW res: [0x240000402:0x1:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x50210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
      [67452.076701] Lustre: ### client-side local enqueue handler, new lock created ns: mdt-lustre-MDT0001_UUID lock: 000000001c761af5/0xae74a97a4f2331dd lrc: 3/0,1 mode: PW/PW res: [0x240000402:0x1:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x40210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
      [67452.082993] Lustre: ### ldlm_lock_addref(PW) ns: mdt-lustre-MDT0001_UUID lock: 00000000324d3623/0xae74a97a4f2331e4 lrc: 3/0,1 mode: --/PW res: [0x240000400:0x2:0x0].0x0 bits 0x0/0x0 rrc: 2 type: IBT gid 0 flags: 0x40000000000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
      [67452.088849] Lustre: ### About to add lock: ns: mdt-lustre-MDT0001_UUID lock: 00000000324d3623/0xae74a97a4f2331e4 lrc: 3/0,1 mode: PW/PW res: [0x240000400:0x2:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x50210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
      [67452.094616] Lustre: ### client-side local enqueue handler, new lock created ns: mdt-lustre-MDT0001_UUID lock: 00000000324d3623/0xae74a97a4f2331e4 lrc: 3/0,1 mode: PW/PW res: [0x240000400:0x2:0x0].0x0 bits 0x2/0x0 rrc: 2 type: IBT gid 0 flags: 0x40210001000000 nid: local remote: 0x0 expref: -99 pid: 10246 timeout: 0 lvb_type: 0
      [67452.100790] Unable to handle kernel paging request at virtual address ffffb6d6a5c60804
      [67452.102606] Mem abort info:
      [67452.103219]   ESR = 0x96000021
      [67452.103865]   Exception class = DABT (current EL), IL = 32 bits
      [67452.105141]   SET = 0, FnV = 0
      [67452.105816]   EA = 0, S1PTW = 0
      [67452.106492] Data abort info:
      [67452.107096]   ISV = 0, ISS = 0x00000021
      [67452.107912]   CM = 0, WnR = 0
      [67452.108564] swapper pgtable: 64k pages, 48-bit VAs, pgdp = 000000008ce20289
      [67452.110150] [ffffb6d6a5c60804] pgd=000000083ffd0003, pud=000000083ffd0003, pmd=000000083ff30003, pte=00e8000165c60f13
      [67452.112534] Internal error: Oops: 96000021 [#1] SMP
      [67452.113564] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) mbcache jbd2 lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) crc32_generic libcfs(OE) dm_flakey vfat fat virtio_gpu crct10dif_ce drm_kms_helper ghash_ce sha2_ce drm sha256_arm64 fb_sys_fops syscopyarea sysfillrect sha1_ce sysimgblt virtio_balloon binfmt_misc xfs libcrc32c virtio_net net_failover virtio_blk failover virtio_mmio sunrpc dm_mirror dm_region_hash dm_log dm_mod
      [67452.124678] CPU: 3 PID: 10246 Comm: mdt01_002 Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-348.2.1.el8_lustre_debug_debug.aarch64 #1
      [67452.127604] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [67452.129116] pstate: 10000005 (nzcV daif -PAN -UAO)
      [67452.130187] pc : __ll_sc_atomic64_or+0x4/0x18
      [67452.131178] lr : lod_object_lock+0x81c/0x15c0 [lod]
      [67452.131804] Lustre: Sending RPC req@00000000880f234c pname:cluuid:pid:xid:nid:opc:job ptlrpcd_06_00:lustre-MDT0001-mdtlov_UUID:8379:1718654554184960:0@lo:41:osp-pre-0-1.0
      [67452.132273] sp : ffffb6d68a5a7270
      [67452.135807] Lustre: peer: 12345-0@lo (source: 12345-0@lo)
      [67452.136435] x29: ffffb6d68a5a72b0 x28: ffff20000ae70280
      [67452.137672] Lustre: got req x1718654554184960
      [67452.138704] x27: ffff20007517e888 x26: 0000000000000001
      [67452.139760] Lustre: NRS start fifo request from 12345-0@lo, seq: 67
      [67452.140824] x25: 0000000000000001 x24: 0000000000000001
      [67452.142263] Lustre: Handling RPC req@00000000ce02a9f7 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out06_002:lustre-MDT0001-mdtlov_UUID+7:8379:x1718654554184960:12345-0@lo:41:osp-pre-0-1.0
      [67452.143359] x23: ffffb6d80566a150 x22: ffffb6d6a766e150
      [67452.146813] Lustre: blocks cached 0 granted 2146304 pending 0 free 126251008 avail 114745344
      [67452.147920] x21: ffffb6d6bc9ca7d0 x20: ffffb6d8056415c8
      [67452.149808] Lustre: Handled RPC req@00000000ce02a9f7 pname:cluuid+ref:pid:xid:nid:opc:job mdt_out06_002:lustre-MDT0001-mdtlov_UUID+7:8379:x1718654554184960:12345-0@lo:41:osp-pre-0-1.0 Request processed in 7546us (13999us total) trans 0 rc 0/0
      [67452.149851] Lustre: Completed RPC req@00000000880f234c pname:cluuid:pid:xid:nid:opc:job ptlrpcd_06_00:lustre-MDT0001-mdtlov_UUID:8379:1718654554184960:0@lo:41:osp-pre-0-1.0
      [67452.150828] x19: 0000000000000008 x18: 0000000000000000
      [67452.155565] Lustre: NRS stop fifo request from 12345-0@lo, seq: 67
      [67452.158845] x17: 0000000000000000 x16: ffff200072dfc718
      [67452.162462] x15: dfff200000000000 x14: 636f6c203a64696e
      [67452.163623] x13: 0000000000000000 x12: ffff16dad4ecdc2e
      [67452.164801] x11: 1ffff6dad4ecdc2d x10: ffff16dad4ecdc2d
      [67452.165982] x9 : 0000000000000000 x8 : 0000000000000000
      [67452.167180] x7 : 1ffff6db00acd42a x6 : ffff16dad4ecdc2e
      [67452.168351] x5 : ffffb6d6a766e158 x4 : 0000000000000000
      [67452.169529] x3 : 0000000000000000 x2 : 0000000000000000
      [67452.170696] x1 : ffffb6d6a5c60804 x0 : 0000000000000002
      [67452.171910] Process mdt01_002 (pid: 10246, stack limit = 0x000000006659ab27)
      [67452.173530] Call trace:
      [67452.174090]  __ll_sc_atomic64_or+0x4/0x18
      [67452.174994]  mdd_object_lock+0xac/0x170 [mdd]
      [67452.175992]  mdt_reint_striped_lock+0x494/0xf10 [mdt]
      [67452.177185]  mdt_create+0x23c8/0x4818 [mdt]
      [67452.178150]  mdt_reint_create+0x6c4/0xbb8 [mdt]
      [67452.179201]  mdt_reint_rec+0x27c/0x708 [mdt]
      [67452.180176]  mdt_reint_internal+0xbd4/0x2408 [mdt]
      [67452.181292]  mdt_reint+0x190/0x378 [mdt]
      [67452.182314]  tgt_handle_request0+0x238/0x1368 [ptlrpc]
      [67452.183587]  tgt_request_handle+0x1364/0x3ec0 [ptlrpc]
      [67452.184872]  ptlrpc_server_handle_request+0x9ec/0x28d0 [ptlrpc]
      [67452.186329]  ptlrpc_main+0x1aa4/0x3f68 [ptlrpc]
      [67452.187371]  kthread+0x3b0/0x460
      [67452.188119]  ret_from_fork+0x10/0x18
      [67452.188955] Code: f84107fe d65f03c0 d503201f f9800031 (c85f7c31)
      [67452.190637] SMP: stopping secondary CPUs
      [67452.200205] Starting crashdump kernel...
      [67452.201386] Bye! 

       


          Activity

             [LU-15364] Kernel oops when striping across multiple MDTs on an Arm64 server
            Peter Jones added a comment -

            Landed for 2.15


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45922/
            Subject: LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e2ac5f28c06a34318c9eb2c741ffbf47eea4690d

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45922/ Subject: LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs Project: fs/lustre-release Branch: master Current Patch Set: Commit: e2ac5f28c06a34318c9eb2c741ffbf47eea4690d

            "Kevin Zhao <kevin.zhao@linaro.org>" uploaded a new patch: https://review.whamcloud.com/45922
            Subject: LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bcd813aeda834c46d063b8bdc7c8ec958705cc35

            gerrit Gerrit Updater added a comment - "Kevin Zhao <kevin.zhao@linaro.org>" uploaded a new patch: https://review.whamcloud.com/45922 Subject: LU-15364 ldlm: Kernel oops when stripe on Arm64 multiple MDTs Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: bcd813aeda834c46d063b8bdc7c8ec958705cc35
            Kevin Zhao added a comment -

            Will work out a patch to fix this.
            Kevin Zhao added a comment - edited

            Thanks @Xinliang Liu for the info. I can add some more detail here:

            When set up with multiple MDTs, an atomic operation is needed for the `set_bit` operation. On the Arm64 platform, atomic operations rely on exclusive access, which requires the address to be aligned [1]. That is why the crash is at __ll_sc_atomic64_or+0x4: that offset is the LDXR instruction, which loads the value from the address exclusively. The atomic64 operation requires a 64-bit aligned address, but the struct member ha_map is only 4-byte aligned, and that is the root cause. The error code of this crash is ESR = 0x96000021, which indicates an alignment fault [2].

                1. https://developer.arm.com/documentation/den0024/a/ch05s01s02
                2. https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/ESR-EL1-Exception-Syndrome-RegisterEL1
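
            To make the alignment point concrete, here is a minimal userspace layout sketch; the struct and field names below are hypothetical, modeled on the ha_map member mentioned above rather than the actual Lustre definitions. A bitmap placed right after a 32-bit counter lands on a 4-byte boundary, so casting it to "unsigned long *" for set_bit() hands the Arm64 LDXR/STXR atomics a misaligned 64-bit word.

              /* Hypothetical layout sketch, not the real Lustre structs. */
              #include <stdio.h>
              #include <stddef.h>

              struct handle_array_bad {
                      unsigned int ha_count;   /* 4 bytes */
                      char         ha_map[8];  /* bitmap storage at offset 4: only 4-byte aligned */
              };

              struct handle_array_good {
                      unsigned int ha_count;
                      /* force the bitmap onto a long-word boundary so 64-bit atomic bitops are safe */
                      char         ha_map[8] __attribute__((aligned(sizeof(long))));
              };

              int main(void)
              {
                      printf("bad : ha_map at offset %zu\n", offsetof(struct handle_array_bad, ha_map));
                      printf("good: ha_map at offset %zu\n", offsetof(struct handle_array_good, ha_map));
                      return 0;
              }

            On a 64-bit build this prints offsets 4 and 8; the kernel's set_bit() works on unsigned long words, so only the second layout gives the Arm64 exclusive-access instructions an address they can operate on.
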
            Xinliang Liu added a comment - edited

            Arm64 atomic64 operations require the address to be 64-bit (i.e. 8-byte) aligned [1], whereas x86 does not require this. ffffb6d6a5c60804 is clearly not 64-bit aligned.

            Yes, as Andreas said, we have to find out which source line causes this oops, i.e. the line just before the return address "lr : lod_object_lock+0x81c". Kevin figured that out recently; maybe he can comment on the root cause here. @Kevin Zhao

            [1] "Unaligned address support" of https://developer.arm.com/documentation/den0024/a/ch05s01s02
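
            The misalignment can also be read straight off the faulting address; a trivial standalone check (not part of any Lustre tool) is:

              #include <stdio.h>

              int main(void)
              {
                      /* faulting address reported in the oops above */
                      unsigned long addr = 0xffffb6d6a5c60804UL;
                      /* a 64-bit exclusive access needs addr % 8 == 0; here the remainder is 4 */
                      printf("addr %% 8 = %lu\n", addr % 8);
                      return 0;
              }
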

            Andreas Dilger added a comment -

            Is this actually 2.14.0, or is it 2.14.55-something from the tip of master?

            I'm not familiar with ARM stack traces. The message "Unable to handle kernel paging request at virtual address ffffb6d6a5c60804" looks like an x86 address, but I'm not sure where that would come from, especially since both the client and server are on the same ARM node. Also, what is "__ll_sc_atomic64_or()"?

            Probably the first thing to do to debug this is to determine what source line lod_object_lock+0x81c corresponds to, with "gdb lustre/lod/lod.ko; list *(lod_object_lock+0x81c)".
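
            For anyone following that suggestion, the session would look roughly like the one below; the path to lod.ko is hypothetical (it depends on the build tree) and the module must have been built with debug info:

              $ gdb lustre/lod/lod.ko
              (gdb) list *(lod_object_lock+0x81c)
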

            People

              Assignee: Kevin Zhao
              Reporter: Kevin Zhao
              Votes: 0
              Watchers: 6