Lustre / LU-16152

PFL YAML file with extent >= 2G leads to overflow when used as template; may trigger MDS kernel panic


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.12.8, Lustre 2.15.1
    • Labels: None
    • Environment:
      Servers: CentOS 7.9 (3.10.0-1160.49.1.el7.x86_64), Lustre 2.12.8, ZFS 0.8.5
      2.12 Clients: CentOS 7.9 (3.10.0-1160.53.1.el7.x86_64), Lustre 2.12.8
      2.15 Clients: CentOS 7.9 (3.10.0-1160.76.1.el7.x86_64), Lustre 2.15.1
    • Severity: 3

    Description

      When applying a YAML layout file with extents of 2147483648 (2 GiB) or larger, those extents come out as 18446744071562067968. Example:

      # lfs setstripe -E 2048M -c 4 -E EOF -c 8 testdir
      # lfs getstripe --yaml -d testdir > 2048M.lyl
      # cat 2048M.lyl
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
        component0:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   2147483648
          sub_layout:
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
        component1:
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 2147483648
          lcme_extent.e_end:   EOF
          sub_layout:
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
      
      # mkdir tst_2048M
      # lfs setstripe --yaml 2048M.lyl tst_2048M
      # lfs getstripe -d tst_2048M
        lcm_layout_gen:    0
        lcm_mirror_count:  1
        lcm_entry_count:   2
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 0
          lcme_extent.e_end:   18446744071562067968
            stripe_count:  4
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1
          lcme_id:             N/A
          lcme_mirror_id:      N/A
          lcme_flags:          0
          lcme_extent.e_start: 18446744071562067968
          lcme_extent.e_end:   EOF
            stripe_count:  8
            stripe_size:   1048576
            pattern:       raid0
            stripe_offset: -1

      Using "lfs setstripe --copy testdir" instead of "lfs setstripe --yaml 2048M.lyl" works as intended. Ending the first component at 2047M works with either method.

      Unfortunately, I did not catch this in time, and several files were restriped with similarly insane layouts. Attempts to restripe them properly occasionally trigger kernel panics on the metadata server. Here is one of the early panics, which occurred immediately after the MDS finished recovery following a reboot.

      [Aug31 20:55] Lustre: DFS-L-MDT0000: Denying connection for new client 65a07c20-0fc9-26df-0102-6dd1be2412e7 (at 10.201.32.11@o2ib1), waiting for 369 known clients (329 recovered, 6 in progress, and 0 evicted) to recover in 4:19
      [ +13.335095] Lustre: DFS-L-MDT0000: Recovery over after 0:54, of 369 clients 369 recovered and 0 were evicted.
      [  +0.001392] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) ASSERTION( !((unsigned long)addr & ~(~(((1UL) << 12)-1))) ) failed: 
      [  +0.000079] LustreError: 41361:0:(osd_io.c:311:kmem_to_page()) LBUG
      [  +0.000038] Pid: 41361, comm: mdt_io00_002 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021
      [  +0.000001] Call Trace:
      [  +0.000013]  [<ffffffffc0b647cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [  +0.000015]  [<ffffffffc0b6487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [  +0.000008]  [<ffffffffc11d3a7c>] osd_zap_lookup.isra.15.part.16+0x0/0x36 [osd_zfs]
      [  +0.000017]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000011]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000032]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000095]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000058]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000047]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000044]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000008]  [<ffffffffa3595df7>] ret_from_fork_nospec_end+0x0/0x39
      [  +0.000008]  [<ffffffffffffffff>] 0xffffffffffffffff
      [  +0.000026] Kernel panic - not syncing: LBUG
      [  +0.000030] CPU: 22 PID: 41361 Comm: mdt_io00_002 Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000059] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.000043] Call Trace:
      [  +0.000021]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.000042]  [<ffffffffa357d241>] panic+0xe8/0x21f
      [  +0.000037]  [<ffffffffc0b648cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [  +0.000050]  [<ffffffffc11d3a7c>] kmem_to_page.part.16+0x36/0x36 [osd_zfs]
      [  +0.000052]  [<ffffffffc11b845f>] osd_bufs_get+0x5ff/0xf80 [osd_zfs]
      [  +0.000056]  [<ffffffffc1361389>] mdt_obd_preprw+0xd09/0x10a0 [mdt]
      [  +0.000085]  [<ffffffffc103365e>] tgt_brw_read+0xa1e/0x1ed0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0d1df29>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [  +0.000087]  [<ffffffffc1001cd6>] ? null_alloc_rs+0x186/0x340 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9985>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc]
      [  +0.000084]  [<ffffffffc0fc9b4f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
      [  +0.000080]  [<ffffffffc0fc9cd1>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
      [  +0.000088]  [<ffffffffc1031eea>] tgt_request_handle+0xada/0x1570 [ptlrpc]
      [  +0.000085]  [<ffffffffc100b601>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [  +0.000052]  [<ffffffffc0b64bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [  +0.000080]  [<ffffffffc0fd6bcb>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [  +0.000082]  [<ffffffffc0fd36e5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [  +0.000042]  [<ffffffffa2ed3233>] ? __wake_up+0x13/0x20
      [  +0.000070]  [<ffffffffc0fda534>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
      [  +0.000077]  [<ffffffffc0fd9a00>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
      [  +0.000045]  [<ffffffffa2ec5e61>] kthread+0xd1/0xe0
      [  +0.000032]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000048]  [<ffffffffa3595df7>] ret_from_fork_nospec_begin+0x21/0x21
      [  +0.000040]  [<ffffffffa2ec5d90>] ? insert_kthread_work+0x40/0x40
      [  +0.000042] ------------[ cut here ]------------
      [  +0.000032] WARNING: CPU: 26 PID: 6640 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x65/0x70
      [  +0.000042] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) crct10dif_generic ksocklnd(OE) ko2iblnd(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace libcfs(OE) fscache iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi zfs(POE) mgag200 i2c_algo_bit ttm kvm drm_kms_helper irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel lrw gf128mul glue_helper zunicode(POE) ablk_helper cryptd drm zlua(POE) pcspkr zcommon(POE) znvpair(POE) zavl(POE) icp(POE) lpc_ich drm_panel_orientation_quirks mei_me mei ib_iser spl(OE) libiscsi scsi_transport_iscsi ses enclosure sg wmi
      [  +0.000434]  acpi_power_meter rpcrdma sunrpc ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm sch_fq ip_tables ipmi_si ipmi_devintf ipmi_msghandler mlx5_ib ib_uverbs ib_core mlx5_core mlxfw devlink mpt3sas raid_class scsi_transport_sas tg3 ptp pps_core ahci libahci libata sd_mod crc_t10dif crct10dif_common
      [  +0.000164] CPU: 26 PID: 6640 Comm: z_wr_iss Tainted: P           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
      [  +0.000046] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.11.0 11/02/2019
      [  +0.001264] Call Trace:
      [  +0.001247]  [<ffffffffa3583539>] dump_stack+0x19/0x1b
      [  +0.001259]  [<ffffffffa2e9b278>] __warn+0xd8/0x100
      [  +0.001253]  [<ffffffffa2e9b3bd>] warn_slowpath_null+0x1d/0x20
      [  +0.001226]  [<ffffffffa2e59495>] native_smp_send_reschedule+0x65/0x70
      [  +0.001200]  [<ffffffffa2edac9e>] try_to_wake_up+0x2fe/0x390
      [  +0.001188]  [<ffffffffa2edadab>] wake_up_q+0x5b/0x80
      [  +0.001177]  [<ffffffffa318f18b>] rwsem_wake+0x8b/0xe0
      [  +0.001156]  [<ffffffffa31981eb>] call_rwsem_wake+0x1b/0x30
      [  +0.001140]  [<ffffffffc08505e5>]
      
       

      Killing all client processes stuck in I/O and then remounting the filesystem allowed the metadata server to stay up. An "lctl lfsck_start -A -c -C -o" completed with no major issues, and scrubbing the zpools also completed successfully, so I don't think any data was lost.

      Searching this Jira and the Internet did not turn up any relevant hits for either the PFL YAML issue or the failed assertion.
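
      In case anyone else needs to hunt down affected files, below is the sort of crude scan I would run (an untested sketch; "/mnt/dfs" is a placeholder for the real client mount point). It relies only on plain per-file "lfs getstripe":

      # find /mnt/dfs -type f -print0 | xargs -0 -n1 sh -c \
      >   'lfs getstripe "$1" 2>/dev/null | grep -q 18446744071562067968 && echo "$1"' sh   # /mnt/dfs: placeholder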

      Attachments

        1. BadPFL.txt (3 kB)
        2. lod_lov.c.patch (2 kB)
        3. lod_object.c.patch (1 kB)
        4. lov_ea.c.patch (1 kB)


      People

        Assignee: Feng Lei (flei)
        Reporter: Nathan Crawford (nathan.crawford@uci.edu)
        Votes: 0
        Watchers: 6
