Lustre / LU-16152

PFL YAML file with extent >= 2G leads to overflow when used as template; may trigger MDS kernel panic

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.16.0
    • Affects Versions: Lustre 2.12.8, Lustre 2.15.1
    • Components: None
    • Environment:
      Servers: CentOS 7.9 (3.10.0-1160.49.1.el7.x86_64), Lustre 2.12.8, ZFS 0.8.5
      2.12 Clients: CentOS 7.9 (3.10.0-1160.53.1.el7.x86_64), Lustre 2.12.8
      2.15 Clients: CentOS 7.9 (3.10.0-1160.76.1.el7.x86_64), Lustre 2.15.1
    • Severity: 3

    Description

      When applying a YAML layout file containing extent boundaries of 2147483648 (2 GiB) or larger, those boundaries appear as 18446744071562067968. Example:

      # lfs setstripe -E 2048M -c 4 -E EOF -c 8 testdir
      # lfs getstripe --yaml -d testdir > 2048M.lyl
      # cat 2048M.lyl
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
        component0:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   2147483648
          sub_layout:
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
        component1:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 2147483648
          lcme_extent.e_end:   EOF
          sub_layout:
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
      
      # mkdir tst_2048M
      # lfs setstripe --yaml 2048M.lyl tst_2048M
      # lfs getstripe -d tst_2048M
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   18446744071562067968
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 18446744071562067968
          lcme_extent.e_end:   EOF
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1

      Using "lfs setstripe --copy testdir" instead of "lfs setstripe --yaml 2048M.lyl" works as intended. Ending the first component at 2047M works with either method.

      Unfortunately, I did not catch this in time and several files were restriped with similar insane layouts. Attempts to re-stripe them properly occasionally trigger kernel panics on the metadata server. Here is one of the early panics, which occurred immediately after the MDS completed recovery following a reboot.

      [Aug31 20:55] Lustre: DFS-L-MDT0000: Denying connection for new client 65a07c20-0fc9-26df-0102-6dd1be2412e7 (at 10.201.32.11@o2ib1), waiting for 369 known clients (329 recovered, 6 in progress, and 0 evicted) to recover in 4:19
      [ +13.335095] Lustre: DFS-L-MDT0000: Recovery over after 0:54, of 369 clients 369 recovered and 0 were evicted.
      [  +0.001392] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned long)addr & ~(~(((1UL) << 12)-1))) ) failed: 
      [  +0.000079] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) LBUG
      [  +0.000038] Pid: 41361, comm: mdt_io00_002 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021
      [  +0.000001] Call Trace:
      [  +0.000013]  [<ffffffffc0b647cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [  +0.000015]  [<ffffffffc0b6487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [  +0.000008]  [<ffffffffc11d3a7c>] osd_zap_lookup.isra.15.part.16+0x0/0x36 [osd_zfs]
      [  +0.000017]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000011]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000032]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000095]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000058]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000047]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000044]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000008]  [<ffffffffa3595df7>] ret_from_fork_nospec_end+0x0/0x39
      [  +0.000008]  [<ffffffffffffffff>] 0xffffffffffffffff
      [  +0.000026] Kernel panic - not syncing: LBUG
      [  +0.000030] CPU: 22 PID: 41361 Comm: mdt_io00_002 Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000059] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.000043] Call Trace:
      [  +0.000021]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.000042]  [<ffffffffa357d241>] panic+0xe8/0x21f
      [  +0.000037]  [<ffffffffc0b648cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [  +0.000050]  [<ffffffffc11d3a7c>] kmem_to_page.part.16+0x36/0x36 [osd_zfs]
      [  +0.000052]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000056]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000085]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0d1df29>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [  +0.000087]  [<ffffffffc1001cd6>] ? null_alloc_rs+0x186/0x340 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9985>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc]
      [  +0.000084]  [<ffffffffc0fc9b4f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9cd1>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [  +0.000088]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000085]  [<ffffffffc100b601>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [  +0.000052]  [<ffffffffc0b64bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [  +0.000080]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0fd36e5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [  +0.000042]  [<ffffffffa2ed3233>] ? __wake_up+0x13/0x20
      [  +0.000070]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000077]  [<ffffffffc0fd9a00>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [  +0.000045]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000032]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000048]  [<ffffffffa3595df7>] ret_from_fork_nospec_begin+0x21/0x21
      [  +0.000040]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000042] ------------[ cut here ]------------
      [  +0.000032] WARNING: CPU: 26 PID: 6640 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x65/0x70
      [  +0.000042] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crct10dif_generic ksocklnd(OE) ko2iblnd(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace libcfs(OE) fscache iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi zfs(POE) mgag200 i2c_algo_bit ttm kvm drm_kms_helper irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel lrw gf128mul glue_helper zunicode(POE) ablk_helper cryptd drm zlua(POE) pcspkr zcommon(POE) znvpair(POE) zavl(POE) icp(POE) lpc_ich drm_panel_orientation_quirks mei_me mei ib_iser spl(OE) libiscsi scsi_transport_iscsi ses enclosure sg wmi
      [  +0.000434]  acpi_power_meter rpcrdma sunrpc ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm sch_fq ip_tables ipmi_si ipmi_devintf ipmi_msghandler mlx5_ib ib_uverbs ib_core mlx5_core mlxfw devlink mpt3sas raid_class scsi_transport_sas tg3 ptp pps_core ahci libahci libata sd_mod crc_t10dif crct10dif_common
      [  +0.000164] CPU: 26 PID: 6640 Comm: z_wr_iss Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000046] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.001264] Call Trace:
      [  +0.001247]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.001259]  [<ffffffffa2e9b278>] __warn+0xd8/0x100
      [  +0.001253]  [<ffffffffa2e9b3bd>] warn_slowpath_null+0x1d/0x20
      [  +0.001226]  [<ffffffffa2e59495>] native_smp_send_reschedule+0x65/0x70
      [  +0.001200]  [<ffffffffa2edac9e>] try_to_wake_up+0x2fe/0x390
      [  +0.001188]  [<ffffffffa2edadab>] wake_up_q+0x5b/0x80
      [  +0.001177]  [<ffffffffa318f18b>] rwsem_wake+0x8b/0xe0
      [  +0.001156]  [<ffffffffa31981eb>] call_rwsem_wake+0x1b/0x30
      [  +0.001140]  [<ffffffffc08505e5>]
      
       

      Killing all client processes stuck on I/O and then re-mounting the filesystem allowed the metadata server to stay up. A "lctl lfsck_start -A -c -C -o" completed with no major issues, and scrubbing the zpools also completed successfully, so I don't think data was lost.

      Searching this Jira and the Internet did not get me any relevant hits for either the PFL YAML issue or failed assertion.

      Attachments

        1. BadPFL.txt
          3 kB
        2. lod_lov.c.patch
          2 kB
        3. lod_object.c.patch
          1 kB
        4. lov_ea.c.patch
          1 kB

        Issue Links

          Activity


            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49065
            Subject: LU-16152 lov: handle negative PFL layout offsets
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: baf015a9582594c9b8c3589d54f18f60fdb04f34

            gerrit Gerrit Updater added a comment

            An example bad 20K file (with reasonable extent values):

            du --apparent-size  badfile
            20

            but the output from "lfs getstripe -v badfile" looks like it belongs to a different file:

            composite_header:
              lcm_magic:         0x0BD60BD0
              lcm_size:          744
              lcm_flags:         0
              lcm_layout_gen:    6
              lcm_mirror_count:  1
              lcm_entry_count:   5
            components:
              - lcme_id:             1
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 0
                lcme_extent.e_end:   131072
                lcme_offset:         272
                lcme_size:           32
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  0
                  lmm_stripe_size:   131072
                  lmm_pattern:       mdt
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 0
            
              - lcme_id:             2
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 131072
                lcme_extent.e_end:   16777216
                lcme_offset:         304
                lcme_size:           56
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  1
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 1
                  lmm_objects:
                  - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x76112e4:0x0] }
            
              - lcme_id:             3
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 16777216
                lcme_extent.e_end:   1073741824
                lcme_offset:         360
                lcme_size:           80
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  2
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 2
                  lmm_objects:
                  - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x74e2831:0x0] }
                  - 1: { l_ost_idx: 3, l_fid: [0x100030000:0xa0cadca:0x0] }
            
              - lcme_id:             4
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 1073741824
                lcme_extent.e_end:   34359738368
                lcme_offset:         440
                lcme_size:           128
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  4
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 3
                  lmm_objects:
                  - 0: { l_ost_idx: 3, l_fid: [0x100030000:0xa0cadcb:0x0] }
                  - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x115f8373:0x0] }
                  - 2: { l_ost_idx: 2, l_fid: [0x100020000:0x74e2832:0x0] }
                  - 3: { l_ost_idx: 1, l_fid: [0x100010000:0x76112e5:0x0] }
            
              - lcme_id:             5
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 34359738368
                lcme_extent.e_end:   EOF
                lcme_offset:         568
                lcme_size:           176
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  6
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 0
                  lmm_objects:
                  - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x6c01b46:0x0] }
                  - 1: { l_ost_idx: 3, l_fid: [0x100030000:0xa0cadcc:0x0] }
                  - 2: { l_ost_idx: 1, l_fid: [0x100010000:0x76112e6:0x0] }
                  - 3: { l_ost_idx: 5, l_fid: [0x100050000:0x115f8374:0x0] }
                  - 4: { l_ost_idx: 4, l_fid: [0x100040000:0xa4c5a4d:0x0] }
                  - 5: { l_ost_idx: 2, l_fid: [0x100020000:0x74e2833:0x0] }
            
            nathan.crawford@uci.edu Nathan Crawford added a comment

            Regarding the kernel panics on the MDS, I have NOT been able to reproduce them on a single-server test system. I can generate files with the bad PFL, read them, copy them, restripe them with lfs_migrate, etc.

            Also confusing: some files on the original problem file system that previously had bad extent values no longer show them, but still panic when accessed.

            I will try a few more permutations of setting bad layouts and restriping on the test system to see if I can re-create the panic.

            nathan.crawford@uci.edu Nathan Crawford added a comment
            nathan.crawford@uci.edu Nathan Crawford added a comment (edited)

            Andreas, the patches were generated against 2.12.8 with commands like "git diff 2.12.8 lustre/lov/lov_ea.c". The actual version of lustre is 6 commits past 2.12.8 on b2_12 (up to 5457c37ec9f76e2fb1656c29848412522dbb81fd, "LU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7]", 2 Dec 2021). The only other modifications were fixes to lustre-dkms_pre-build.sh to cope with an output format change of "dkms status", and to config/lustre-build-zfs.m4 to handle when multiple subdirectories of /usr/src match "*zfs*".

            flei Feng Lei added a comment -

            No e_start == LUSTRE_EOF. I mean: do not convert LUSTRE_EOF from 0xffff ffff ffff ffff to 0x0000 0000 ffff ffff for e_end.


            Feng Lei, it isn't clear whether there is any case where e_start == LUSTRE_EOF is valid?

            adilger Andreas Dilger added a comment

            Nathan, I grabbed your patches yesterday and was working on a patch that I could push to Gerrit instead of as attachments here. I made a few changes relative to your patches: added a LOD_COMP_EXT_BAD constant and moved all of the extent checks into a helper function instead of duplicating the code in multiple places.

            One question I had was which Lustre version your patch was against? One of the patches didn't apply, and the code didn't look very similar.

            adilger Andreas Dilger added a comment
            flei Feng Lei added a comment -

            nathan.crawford@uci.edu 

            I guess LUSTRE_EOF should be excluded, so the condition should be like this:

            if (lsme->lsme_extent.e_start != LUSTRE_EOF && 
                (lsme->lsme_extent.e_start >> 32) == 0xffffffffULL)
            
            

            Attempted to patch as in referenced LU-16194, but the kernel panic remains. I'm attaching my patches against 2.12.8 as lod_lov.c.patch, lod_object.c.patch, and lov_ea.c.patch.

            Notes:
              – I'm not familiar with best practices for CWARN messages and don't know what useful info should be included. Made some guesses.
              – In the few seconds the MDS was up before panicking, I saw some of the CWARN messages come through from both lod_lov.c and lod_object.c. I believe that they were interpreting extent.e_end=eof as "-1", then converting it to 4G. The extent-end-checking part needs to handle this; I'll try to rig something up.
              – The lov_ea.c patch compiles, but was not tested. I don't know how to get the FID into the error message. Is there anything in the scope of lsm_unpackmd_comp_md_v1() that is recommended to use?

            I'm going to spin up a single-server, single-client system to try to reproduce the bug and test fixes. For now, I've set the subdirectory that was migrated to 000 permissions. I believe the problem files are all there.

             

            nathan.crawford@uci.edu Nathan Crawford added a comment

            Thanks! Will attempt and report.

            nathan.crawford@uci.edu Nathan Crawford added a comment

            Nathan, the patch https://review.whamcloud.com/48684 "LU-16194 lod: define negative extent offset as invalid" should stop the MDS from crashing, but it will currently mark the whole file as having an invalid layout, and the files would be inaccessible. It may be possible to change the code temporarily so that e_start/e_end = 0xffffffffnnnnnnnn are interpreted as 0xnnnnnnnn, but this hasn't been implemented yet.

            I left some notes in that patch so you could potentially make a patch to fix this yourself; I don't think it would be too complex. This would both allow you to migrate the affected files without crashing (it would "fix" the layouts in memory only), and ideally be a proper patch that could be landed in case anyone else is affected by this bug before the YAML import fix is widely deployed.

            adilger Andreas Dilger added a comment

            People

              Assignee: flei Feng Lei
              Reporter: nathan.crawford@uci.edu Nathan Crawford
              Votes: 0
              Watchers: 7
