Lustre / LU-16152

PFL YAML file with extent >= 2G leads to overflow when used as template; may trigger MDS kernel panic

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.16.0
    • Affects Versions: Lustre 2.12.8, Lustre 2.15.1
    • Components: None
    • Environment:
      Servers: CentOS 7.9 (3.10.0-1160.49.1.el7.x86_64), Lustre 2.12.8, ZFS 0.8.5
      2.12 Clients: CentOS 7.9 (3.10.0-1160.53.1.el7.x86_64), Lustre 2.12.8
      2.15 Clients: CentOS 7.9 (3.10.0-1160.76.1.el7.x86_64), Lustre 2.15.1
    • Severity: 3

    Description

      When applying a YAML layout file containing extent boundaries of 2147483648 (2 GiB) or larger, those boundaries appear as 18446744071562067968. Example:

      # lfs setstripe -E 2048M -c 4 -E EOF -c 8 testdir
      # lfs getstripe --yaml -d testdir > 2048M.lyl
      # cat 2048M.lyl
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
        component0:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   2147483648
          sub_layout:
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
        component1:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 2147483648
          lcme_extent.e_end:   EOF
          sub_layout:
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
      
      # mkdir tst_2048M
      # lfs setstripe --yaml 2048M.lyl tst_2048M
      # lfs getstripe -d tst_2048M
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   18446744071562067968
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 18446744071562067968
          lcme_extent.e_end:   EOF
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1

      Using "lfs setstripe --copy testdir" instead of "lfs setstripe --yaml 2048M.lyl" works as intended. Ending the first component at 2047M works with either method.

      Unfortunately, I did not catch this in time and several files were restriped with similar insane layouts. Attempts to re-stripe them properly occasionally trigger kernel panics on the metadata server. Here is one of the early panics, which occurred immediately after the MDS completed recovery following a reboot.

      [Aug31 20:55] Lustre: DFS-L-MDT0000: Denying connection for new client 65a07c20-0fc9-26df-0102-6dd1be2412e7 (at 10.201.32.11@o2ib1), waiting for 369 known clients (329 recovered, 6 in progress, and 0 evicted) to recover in 4:19
      [ +13.335095] Lustre: DFS-L-MDT0000: Recovery over after 0:54, of 369 clients 369 recovered and 0 were evicted.
      [  +0.001392] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned long)addr & ~(~(((1UL) << 12)-1))) ) failed: 
      [  +0.000079] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) LBUG
      [  +0.000038] Pid: 41361, comm: mdt_io00_002 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021
      [  +0.000001] Call Trace:
      [  +0.000013]  [<ffffffffc0b647cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [  +0.000015]  [<ffffffffc0b6487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [  +0.000008]  [<ffffffffc11d3a7c>] osd_zap_lookup.isra.15.part.16+0x0/0x36 [osd_zfs]
      [  +0.000017]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000011]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000032]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000095]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000058]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000047]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000044]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000008]  [<ffffffffa3595df7>] ret_from_fork_nospec_end+0x0/0x39
      [  +0.000008]  [<ffffffffffffffff>] 0xffffffffffffffff
      [  +0.000026] Kernel panic - not syncing: LBUG
      [  +0.000030] CPU: 22 PID: 41361 Comm: mdt_io00_002 Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000059] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.000043] Call Trace:
      [  +0.000021]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.000042]  [<ffffffffa357d241>] panic+0xe8/0x21f
      [  +0.000037]  [<ffffffffc0b648cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [  +0.000050]  [<ffffffffc11d3a7c>] kmem_to_page.part.16+0x36/0x36 [osd_zfs]
      [  +0.000052]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000056]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000085]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0d1df29>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [  +0.000087]  [<ffffffffc1001cd6>] ? null_alloc_rs+0x186/0x340 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9985>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc]
      [  +0.000084]  [<ffffffffc0fc9b4f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9cd1>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [  +0.000088]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000085]  [<ffffffffc100b601>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [  +0.000052]  [<ffffffffc0b64bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [  +0.000080]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0fd36e5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [  +0.000042]  [<ffffffffa2ed3233>] ? __wake_up+0x13/0x20
      [  +0.000070]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000077]  [<ffffffffc0fd9a00>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [  +0.000045]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000032]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000048]  [<ffffffffa3595df7>] ret_from_fork_nospec_begin+0x21/0x21
      [  +0.000040]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000042] ------------[ cut here ]------------
      [  +0.000032] WARNING: CPU: 26 PID: 6640 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x65/0x70
      [  +0.000042] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crct10dif_generic ksocklnd(OE) ko2iblnd(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace libcfs(OE) fscache iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi zfs(POE) mgag200 i2c_algo_bit ttm kvm drm_kms_helper irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel lrw gf128mul glue_helper zunicode(POE) ablk_helper cryptd drm zlua(POE) pcspkr zcommon(POE) znvpair(POE) zavl(POE) icp(POE) lpc_ich drm_panel_orientation_quirks mei_me mei ib_iser spl(OE) libiscsi scsi_transport_iscsi ses enclosure sg wmi
      [  +0.000434]  acpi_power_meter rpcrdma sunrpc ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm sch_fq ip_tables ipmi_si ipmi_devintf ipmi_msghandler mlx5_ib ib_uverbs ib_core mlx5_core mlxfw devlink mpt3sas raid_class scsi_transport_sas tg3 ptp pps_core ahci libahci libata sd_mod crc_t10dif crct10dif_common
      [  +0.000164] CPU: 26 PID: 6640 Comm: z_wr_iss Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000046] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.001264] Call Trace:
      [  +0.001247]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.001259]  [<ffffffffa2e9b278>] __warn+0xd8/0x100
      [  +0.001253]  [<ffffffffa2e9b3bd>] warn_slowpath_null+0x1d/0x20
      [  +0.001226]  [<ffffffffa2e59495>] native_smp_send_reschedule+0x65/0x70
      [  +0.001200]  [<ffffffffa2edac9e>] try_to_wake_up+0x2fe/0x390
      [  +0.001188]  [<ffffffffa2edadab>] wake_up_q+0x5b/0x80
      [  +0.001177]  [<ffffffffa318f18b>] rwsem_wake+0x8b/0xe0
      [  +0.001156]  [<ffffffffa31981eb>] call_rwsem_wake+0x1b/0x30
      [  +0.001140]  [<ffffffffc08505e5>]
      
       

      Killing all client processes stuck on I/O and then re-mounting the filesystem allowed the metadata server to stay up. A "lctl lfsck_start -A -c -C -o" completed with no major issues, and scrubbing the zpools also completed successfully, so I don't think data was lost.

      Searching this Jira and the Internet did not get me any relevant hits for either the PFL YAML issue or failed assertion.

      Attachments

        1. BadPFL.txt
          3 kB
        2. lod_lov.c.patch
          2 kB
        3. lod_object.c.patch
          1 kB
        4. lov_ea.c.patch
          1 kB

        Issue Links

          Activity


            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49065
            Subject: LU-16152 lov: handle negative PFL layout offsets
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: baf015a9582594c9b8c3589d54f18f60fdb04f34

            gerrit Gerrit Updater added a comment

            An example bad 20K file (with reasonable extent values):

            du --apparent-size  badfile
            20

            but the output from "lfs getstripe -v badfile" looks like it belongs to a different file:

            composite_header:
              lcm_magic:         0x0BD60BD0
              lcm_size:          744
              lcm_flags:         0
              lcm_layout_gen:    6
              lcm_mirror_count:  1
              lcm_entry_count:   5
            components:
              - lcme_id:             1
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 0
                lcme_extent.e_end:   131072
                lcme_offset:         272
                lcme_size:           32
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  0
                  lmm_stripe_size:   131072
                  lmm_pattern:       mdt
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 0
            
              - lcme_id:             2
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 131072
                lcme_extent.e_end:   16777216
                lcme_offset:         304
                lcme_size:           56
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  1
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 1
                  lmm_objects:
                  - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x76112e4:0x0] }
            
              - lcme_id:             3
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 16777216
                lcme_extent.e_end:   1073741824
                lcme_offset:         360
                lcme_size:           80
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  2
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 2
                  lmm_objects:
                  - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x74e2831:0x0] }
                  - 1: { l_ost_idx: 3, l_fid: [0x100030000:0xa0cadca:0x0] }
            
              - lcme_id:             4
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 1073741824
                lcme_extent.e_end:   34359738368
                lcme_offset:         440
                lcme_size:           128
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  4
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 3
                  lmm_objects:
                  - 0: { l_ost_idx: 3, l_fid: [0x100030000:0xa0cadcb:0x0] }
                  - 1: { l_ost_idx: 5, l_fid: [0x100050000:0x115f8373:0x0] }
                  - 2: { l_ost_idx: 2, l_fid: [0x100020000:0x74e2832:0x0] }
                  - 3: { l_ost_idx: 1, l_fid: [0x100010000:0x76112e5:0x0] }
            
              - lcme_id:             5
                lcme_mirror_id:      0
                lcme_flags:          init
                lcme_extent.e_start: 34359738368
                lcme_extent.e_end:   EOF
                lcme_offset:         568
                lcme_size:           176
                sub_layout:
                  lmm_magic:         0x0BD10BD0
                  lmm_seq:           0x20000ec94
                  lmm_object_id:     0x1defe
                  lmm_fid:           [0x20000ec94:0x1defe:0x0]
                  lmm_stripe_count:  6
                  lmm_stripe_size:   1048576
                  lmm_pattern:       raid0
                  lmm_layout_gen:    0
                  lmm_stripe_offset: 0
                  lmm_objects:
                  - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x6c01b46:0x0] }
                  - 1: { l_ost_idx: 3, l_fid: [0x100030000:0xa0cadcc:0x0] }
                  - 2: { l_ost_idx: 1, l_fid: [0x100010000:0x76112e6:0x0] }
                  - 3: { l_ost_idx: 5, l_fid: [0x100050000:0x115f8374:0x0] }
                  - 4: { l_ost_idx: 4, l_fid: [0x100040000:0xa4c5a4d:0x0] }
                  - 5: { l_ost_idx: 2, l_fid: [0x100020000:0x74e2833:0x0] }
            
            nathan.crawford@uci.edu Nathan Crawford added a comment

            Regarding the kernel panics on the MDS, I have NOT been able to reproduce them on a single-server test system. I can generate files with the bad PFL, read them, copy them, restripe them with lfs_migrate, etc.

            Also confusing: some files on the original problem file system that previously had bad extent values no longer show them, but still panic when accessed.

            I will try a few more permutations of setting bad layouts and restriping on the test system to see if I can re-create the panic.

            nathan.crawford@uci.edu Nathan Crawford added a comment
            nathan.crawford@uci.edu Nathan Crawford added a comment (edited)

            Andreas, the patches were generated against 2.12.8 with commands like "git diff 2.12.8 lustre/lov/lov_ea.c". The actual version of lustre is 6 commits past 2.12.8 on b2_12 (up to 5457c37ec9f76e2fb1656c29848412522dbb81fd, "LU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7]", 2 Dec 2021). The only other modifications were fixes to lustre-dkms_pre-build.sh to cope with an output format change of "dkms status", and to config/lustre-build-zfs.m4 to handle when multiple subdirectories of /usr/src match "*zfs*".

            flei Feng Lei added a comment -

            No e_start == LUSTRE_EOF. I mean: do not convert LUSTRE_EOF from 0xffff ffff ffff ffff to 0x0000 0000 ffff ffff for e_end.


            Feng Lei, it isn't clear whether there is any case where e_start == LUSTRE_EOF is valid?

            adilger Andreas Dilger added a comment

            Nathan, I grabbed your patches yesterday and was working on a patch that I could push to Gerrit instead of as attachments here. I made a few changes relative to your patches: added a LOD_COMP_EXT_BAD constant and moved all of the extent checks into a helper function instead of duplicating the code in multiple places.

            One question I had was which Lustre version your patch was against? One of the patches didn't apply, and the code didn't look very similar.

            adilger Andreas Dilger added a comment
            flei Feng Lei added a comment -

            nathan.crawford@uci.edu 

            I guess LUSTRE_EOF should be excluded, so the condition should be like this:

            if (lsme->lsme_extent.e_start != LUSTRE_EOF && 
                (lsme->lsme_extent.e_start >> 32) == 0xffffffffULL)
            
            

            Attempted to patch as in referenced LU-16194, but the kernel panic remains. I'm attaching my patches against 2.12.8 as lod_lov.c.patch, lod_object.c.patch, and lov_ea.c.patch.

            Notes:
              – I'm not familiar with best practices for CWARN messages and don't know what useful info should be included. Made some guesses.
              – In the few seconds the MDS was up before panicking, I saw some of the CWARN messages come through from both lod_lov.c and lod_object.c. I believe that they were interpreting extent.e_end=eof as "-1", then converting it to 4G. The extent-end-checking part needs to handle this; I'll try to rig something up.
              – The lov_ea.c patch compiles, but was not tested. I don't know how to get the FID into the error message. Is there anything in the scope of lsm_unpackmd_comp_md_v1() that is recommended to use?

            I'm going to spin up a single-server, single-client system to try to reproduce the bug and test fixes. For now, I've set the subdirectory that was migrated to 000 permissions. I believe the problem files are all there.

             

            nathan.crawford@uci.edu Nathan Crawford added a comment

            Thanks! Will attempt and report.

            nathan.crawford@uci.edu Nathan Crawford added a comment

            Nathan, the patch https://review.whamcloud.com/48684 "LU-16194 lod: define negative extent offset as invalid" should stop the MDS from crashing, but it will currently mark the whole file as having an invalid layout, and the files would be inaccessible. It may be possible to change the code temporarily so that e_start/e_end = 0xffffffffnnnnnnnn are interpreted as 0xnnnnnnnn, but this hasn't been implemented yet.

            I left some notes in that patch so you could potentially make a patch to fix this yourself; I don't think it would be too complex. This would both allow you to migrate the affected files without crashing (it would "fix" the layouts in memory only), and ideally be a proper patch that could be landed in case anyone else is affected by this bug before the YAML import fix is widely deployed.

            adilger Andreas Dilger added a comment

            People

              Assignee: flei Feng Lei
              Reporter: nathan.crawford@uci.edu Nathan Crawford
              Votes: 0
              Watchers: 7
