Lustre / LU-16152

PFL YAML file with extent >= 2G leads to overflow when used as template; may trigger MDS kernel panic


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.12.8, Lustre 2.15.1
    • Labels: None
    • Environment: Servers: CentOS 7.9 (3.10.0-1160.49.1.el7.x86_64), Lustre 2.12.8, ZFS 0.8.5
      2.12 Clients: CentOS 7.9 (3.10.0-1160.53.1.el7.x86_64), Lustre 2.12.8
      2.15 Clients: CentOS 7.9 (3.10.0-1160.76.1.el7.x86_64), Lustre 2.15.1
    • Severity: 3

    Description

When applying a YAML layout file with extents of 2147483648 (2 GiB) or larger as a template, those extent boundaries come back as 18446744071562067968, which is 0x80000000 sign-extended to 64 bits (0xFFFFFFFF80000000). Example:

      # lfs setstripe -E 2048M -c 4 -E EOF -c 8 testdir
      # lfs getstripe --yaml -d testdir > 2048M.lyl
      # cat 2048M.lyl
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
        component0:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   2147483648
          sub_layout:
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
        component1:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 2147483648
          lcme_extent.e_end:   EOF
          sub_layout:
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
      
      # mkdir tst_2048M
      # lfs setstripe --yaml 2048M.lyl tst_2048M
      # lfs getstripe -d tst_2048M
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   18446744071562067968
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 18446744071562067968
          lcme_extent.e_end:   EOF
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1

      Using "lfs setstripe --copy testdir" instead of "lfs setstripe --yaml 2048M.lyl" works as intended. Ending the first component at 2047M works with either method.

Unfortunately, I did not catch this in time, and several files were restriped with similarly insane layouts. Attempts to restripe them properly occasionally trigger kernel panics on the metadata server. Here is one of the early ones, which happened immediately after the MDS finished recovery following a reboot.

      [Aug31 20:55] Lustre: DFS-L-MDT0000: Denying connection for new client 65a07c20-0fc9-26df-0102-6dd1be2412e7 (at 10.201.32.11@o2ib1), waiting for 369 known clients (329 recovered, 6 in progress, and 0 evicted) to recover in 4:19
      [ +13.335095] Lustre: DFS-L-MDT0000: Recovery over after 0:54, of 369 clients 369 recovered and 0 were evicted.
      [  +0.001392] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned long)addr & ~(~(((1UL) << 12)-1))) ) failed: 
      [  +0.000079] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) LBUG
      [  +0.000038] Pid: 41361, comm: mdt_io00_002 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021
      [  +0.000001] Call Trace:
      [  +0.000013]  [<ffffffffc0b647cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [  +0.000015]  [<ffffffffc0b6487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [  +0.000008]  [<ffffffffc11d3a7c>] osd_zap_lookup.isra.15.part.16+0x0/0x36 [osd_zfs]
      [  +0.000017]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000011]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000032]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000095]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000058]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000047]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000044]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000008]  [<ffffffffa3595df7>] ret_from_fork_nospec_end+0x0/0x39
      [  +0.000008]  [<ffffffffffffffff>] 0xffffffffffffffff
      [  +0.000026] Kernel panic - not syncing: LBUG
      [  +0.000030] CPU: 22 PID: 41361 Comm: mdt_io00_002 Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000059] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.000043] Call Trace:
      [  +0.000021]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.000042]  [<ffffffffa357d241>] panic+0xe8/0x21f
      [  +0.000037]  [<ffffffffc0b648cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [  +0.000050]  [<ffffffffc11d3a7c>] kmem_to_page.part.16+0x36/0x36 [osd_zfs]
      [  +0.000052]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000056]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000085]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0d1df29>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [  +0.000087]  [<ffffffffc1001cd6>] ? null_alloc_rs+0x186/0x340 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9985>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc]
      [  +0.000084]  [<ffffffffc0fc9b4f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9cd1>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [  +0.000088]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000085]  [<ffffffffc100b601>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [  +0.000052]  [<ffffffffc0b64bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [  +0.000080]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0fd36e5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [  +0.000042]  [<ffffffffa2ed3233>] ? __wake_up+0x13/0x20
      [  +0.000070]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000077]  [<ffffffffc0fd9a00>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [  +0.000045]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000032]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000048]  [<ffffffffa3595df7>] ret_from_fork_nospec_begin+0x21/0x21
      [  +0.000040]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000042] ------------[ cut here ]------------
      [  +0.000032] WARNING: CPU: 26 PID: 6640 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x65/0x70
      [  +0.000042] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crct10dif_generic ksocklnd(OE) ko2iblnd(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace libcfs(OE) fscache iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi zfs(POE) mgag200 i2c_algo_bit ttm kvm drm_kms_helper irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel lrw gf128mul glue_helper zunicode(POE) ablk_helper cryptd drm zlua(POE) pcspkr zcommon(POE) znvpair(POE) zavl(POE) icp(POE) lpc_ich drm_panel_orientation_quirks mei_me mei ib_iser spl(OE) libiscsi scsi_transport_iscsi ses enclosure sg wmi
      [  +0.000434]  acpi_power_meter rpcrdma sunrpc ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm sch_fq ip_tables ipmi_si ipmi_devintf ipmi_msghandler mlx5_ib ib_uverbs ib_core mlx5_core mlxfw devlink mpt3sas raid_class scsi_transport_sas tg3 ptp pps_core ahci libahci libata sd_mod crc_t10dif crct10dif_common
      [  +0.000164] CPU: 26 PID: 6640 Comm: z_wr_iss Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000046] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.001264] Call Trace:
      [  +0.001247]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.001259]  [<ffffffffa2e9b278>] __warn+0xd8/0x100
      [  +0.001253]  [<ffffffffa2e9b3bd>] warn_slowpath_null+0x1d/0x20
      [  +0.001226]  [<ffffffffa2e59495>] native_smp_send_reschedule+0x65/0x70
      [  +0.001200]  [<ffffffffa2edac9e>] try_to_wake_up+0x2fe/0x390
      [  +0.001188]  [<ffffffffa2edadab>] wake_up_q+0x5b/0x80
      [  +0.001177]  [<ffffffffa318f18b>] rwsem_wake+0x8b/0xe0
      [  +0.001156]  [<ffffffffa31981eb>] call_rwsem_wake+0x1b/0x30
      [  +0.001140]  [<ffffffffc08505e5>]
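
      For reference, the mask in the failed assertion simplifies to a 4 KiB page-alignment check: (1UL << 12) - 1 is 0xfff, and ~(~0xfff) is again 0xfff, so the LASSERT fires whenever the low 12 bits of the buffer address are non-zero. A standalone illustration of that arithmetic (my reading of the assertion text, not the osd-zfs source):

        /* The osd_io.c:311 assertion reduces to this alignment check. */
        #include <assert.h>
        #include <stdio.h>

        int main(void)
        {
                unsigned long addr = 0xdeadb000UL; /* hypothetical buffer address */

                /* ~(~(((1UL) << 12) - 1)) == 0xfff: the assertion requires
                 * the low 12 bits to be zero, i.e. 4 KiB page alignment. */
                assert(!(addr & ~(~(((1UL) << 12) - 1))));

                printf("0x%lx is page-aligned\n", addr);
                return 0;
        }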
      
       

Killing off all the client processes that were stuck on I/O and then re-mounting the filesystem allowed the metadata server to stay up. An "lctl lfsck_start -A -c -C -o" completed with no major issues, and scrubbing the zpools also completed successfully, so I don't believe any data was lost.

Searching this Jira and the web turned up no relevant hits for either the PFL YAML issue or the failed assertion.

      Attachments

        1. BadPFL.txt
          3 kB
        2. lod_lov.c.patch
          2 kB
        3. lod_object.c.patch
          1 kB
        4. lov_ea.c.patch
          1 kB


People

              Assignee: Feng Lei (flei)
              Reporter: Nathan Crawford (nathan.crawford@uci.edu)
              Votes: 0
              Watchers: 7
