[LU-12832] soft lockup in ldlm_bl_xx threads at read for a single shared strided file Created: 06/Oct/19  Updated: 01/Aug/20  Resolved: 13/Nov/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Shuichi Ihara Assignee: Wang Shilong (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

master (commit fcf219d) on both server and client


Attachments: Text File vmcore-dmesg.txt     Text File vmcore-dmesg.txt    
Issue Links:
Related
is related to LU-12786 io hard read fails due to data verifi... Resolved
is related to LU-13131 Partial writes on multi-client stride... Resolved
is related to LU-12377 stuck in ldlm thread on client when l... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Many clients hit a soft lockup in the ldlm_bl_xx threads during reads of a single shared strided file (the same workload as io500 ior-hard). In the end, ior failed because the clients were evicted.

# mkdir /scratch0/io500.out/ior_hard -p
# lfs setstripe -C 240 -S 16M /scratch0/io500.out/ior_hard

# salloc --nodes=10 --ntasks-per-node=24 mpirun --allow-run-as-root /scratch0/ior -w -s 132000 -a POSIX -i 1 -C -Q 1 -g -G 27 -k -e -t 47008 -b 47008 -o /scratch0/io500.out/ior_hard/IOR_file -O stoneWallingStatusFile=/scratch0/io500.out/ior_hard/stonewall -O stoneWallingWearOut=1 -D 300 -W

one of server log

Oct  6 21:27:46 es400nv-vm1 kernel: LustreError: 26369:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.0.13.151@o2ib10  ns: filter-ai400-OST0000_UUID lock: ffff995029f90480/0xe6d707f7408ef34e lrc: 3/0,0 mode: PW/PW res: [0x19:0x0:0x0].0x0 rrc: 30741 type: EXT [5842382848->5842853887] (req 5842382848->5842431999) flags: 0x60000400000020 nid: 10.0.13.151@o2ib10 remote: 0xeff507f1ffb50fc0 expref: 158923 pid: 27145 timeout: 11179 lvb_type: 0
Oct  6 21:27:46 es400nv-vm1 kernel: Lustre: ai400-OST0000: Connection restored to 85106299-51f6-4 (at 10.0.13.154@o2ib10)
Oct  6 21:27:46 es400nv-vm1 kernel: Lustre: Skipped 127 previous similar messages
Oct  6 21:27:49 es400nv-vm1 kernel: LustreError: 26369:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.0.13.152@o2ib10  ns: filter-ai400-OST0001_UUID lock: ffff9950aee0d8c0/0xe6d707f740dbdd52 lrc: 3/0,0 mode: PW/PW res: [0x8:0x0:0x0].0x0 rrc: 26996 type: EXT [5140467712->5140746239] (req 5140467712->5140516863) flags: 0x60000400020020 nid: 10.0.13.152@o2ib10 remote: 0xc5564b8fed3e7bd7 expref: 166664 pid: 26904 timeout: 11182 lvb_type: 0
Oct  6 21:27:49 es400nv-vm1 kernel: LustreError: 26369:0:(ldlm_lockd.c:261:expired_lock_main()) Skipped 2 previous similar messages
Oct  6 21:27:53 es400nv-vm1 kernel: LustreError: 26986:0:(ldlm_lockd.c:2437:ldlm_cancel_handler()) ldlm_cancel from 10.0.13.152@o2ib10 arrived at 1570364871 with bad export cookie 16633772507216549573
Oct  6 21:27:59 es400nv-vm1 kernel: LustreError: 26369:0:(ldlm_lockd.c:261:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.0.13.159@o2ib10  ns: filter-ai400-OST0000_UUID lock: ffff9950587b6400/0xe6d707f7402a10dc lrc: 3/0,0 mode: PW/PW res: [0x15:0x0:0x0].0x0 rrc: 18417 type: EXT [6164672512->6164860927] (req 6164672512->6164721663) flags: 0x60000400020020 nid: 10.0.13.159@o2ib10 remote: 0xd7d9cb14933b97f6 expref: 153927 pid: 26985 timeout: 11192 lvb_type: 0

one of client log

Oct  6 21:27:54 c05 kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ldlm_bl_01:151183]
Oct  6 21:27:54 c05 kernel: Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ip6table_nat ip6_tables xt_statistic xt_physdev xt_NFLOG nfnetlink_log veth dummy vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 ip_set_hash_ip xt_set ip_set xt_nat ipt_REJECT nf_reject_ipv4 xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc nfsv3 nfs_acl nfs lockd grace fscache overlay(T) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE)
Oct  6 21:27:54 c05 kernel: mlx5_core(OE) mlxfw(OE) mlx4_en(OE) sunrpc iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev sg i2c_i801 mei_me lpc_ich ioatdma mei ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter binfmt_misc knem(OE) ip_tables ext4 mbcache jbd2 mlx4_ib(OE) ib_core(OE) sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci igb crct10dif_pclmul crct10dif_common libata crc32c_intel mlx4_core(OE) ptp pps_core devlink dca i2c_algo_bit drm_panel_orientation_quirks mlx_compat(OE) [last unloaded: libcfs]
Oct  6 21:27:54 c05 kernel: CPU: 9 PID: 151183 Comm: ldlm_bl_01 Kdump: loaded Tainted: G           OEL ------------ T 3.10.0-957.27.2.el7.x86_64 #1
Oct  6 21:27:54 c05 kernel: Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018
Oct  6 21:27:54 c05 kernel: task: ffff8eb4c1eb4100 ti: ffff8eb64c178000 task.ti: ffff8eb64c178000
Oct  6 21:27:54 c05 kernel: RIP: 0010:[<ffffffff89d13536>]  [<ffffffff89d13536>] native_queued_spin_lock_slowpath+0x126/0x200
Oct  6 21:27:54 c05 kernel: RSP: 0018:ffff8eb64c17bc70  EFLAGS: 00000246
Oct  6 21:27:54 c05 kernel: RAX: 0000000000000000 RBX: ffff8ec614df8280 RCX: 0000000000490000
Oct  6 21:27:54 c05 kernel: RDX: ffff8eb67e79b780 RSI: 0000000000310001 RDI: ffff8ec614de0058
Oct  6 21:27:54 c05 kernel: RBP: ffff8eb64c17bc70 R08: ffff8eb67e85b780 R09: 0000000000000000
Oct  6 21:27:54 c05 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8eb64c17bc38
Oct  6 21:27:54 c05 kernel: R13: ffff8eb1aa758000 R14: ffff8eb2af3e46c8 R15: ffff8eb1aa758098
Oct  6 21:27:54 c05 kernel: FS:  0000000000000000(0000) GS:ffff8eb67e840000(0000) knlGS:0000000000000000
Oct  6 21:27:54 c05 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  6 21:27:54 c05 kernel: CR2: 00000000029cb000 CR3: 0000000f51610000 CR4: 00000000003607e0
Oct  6 21:27:54 c05 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct  6 21:27:54 c05 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct  6 21:27:54 c05 kernel: Call Trace:
Oct  6 21:27:54 c05 kernel: [<ffffffff8a35e2cb>] queued_spin_lock_slowpath+0xb/0xf
Oct  6 21:27:54 c05 kernel: [<ffffffff8a36c7a0>] _raw_spin_lock+0x20/0x30
Oct  6 21:27:54 c05 kernel: [<ffffffffc0c3b32a>] cl_object_attr_lock+0x1a/0x20 [obdclass]
Oct  6 21:27:54 c05 kernel: [<ffffffffc0fea746>] osc_ldlm_blocking_ast+0x2f6/0x3a0 [osc]
Oct  6 21:27:54 c05 kernel: [<ffffffffc0e3e54a>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
Oct  6 21:27:54 c05 kernel: [<ffffffffc0c14979>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
Oct  6 21:27:54 c05 kernel: [<ffffffffc0e49460>] ldlm_cli_cancel_local+0xa0/0x3f0 [ptlrpc]
Oct  6 21:27:54 c05 kernel: [<ffffffffc0e4d97a>] ldlm_cli_cancel_list_local+0xea/0x260 [ptlrpc]
Oct  6 21:27:54 c05 kernel: [<ffffffffc0e54370>] ldlm_bl_thread_main+0x5a0/0xa40 [ptlrpc]
Oct  6 21:27:54 c05 kernel: [<ffffffff89cd7c40>] ? wake_up_state+0x20/0x20
Oct  6 21:27:54 c05 kernel: [<ffffffffc0e53dd0>] ? ldlm_handle_bl_callback+0x4e0/0x4e0 [ptlrpc]
Oct  6 21:27:54 c05 kernel: [<ffffffff89cc2e81>] kthread+0xd1/0xe0
Oct  6 21:27:54 c05 kernel: [<ffffffff89cc2db0>] ? insert_kthread_work+0x40/0x40
Oct  6 21:27:54 c05 kernel: [<ffffffff8a376c1d>] ret_from_fork_nospec_begin+0x7/0x21
Oct  6 21:27:54 c05 kernel: [<ffffffff89cc2db0>] ? insert_kthread_work+0x40/0x40
Oct  6 21:27:54 c05 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 a0 bf 94 8a 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 
Oct  6 21:27:56 c05 kernel: LustreError: 152233:0:(ldlm_resource.c:1147:ldlm_resource_complain()) ai400-OST0006-osc-ffff8eafdda3d800: namespace resource [0x10:0x0:0x0].0x0 (ffff8ebf4ddcc840) refcount nonzero (1404) after lock cleanup; forcing cleanup.
Oct  6 21:27:56 c05 kernel: Lustre: ai400-OST0006-osc-ffff8eafdda3d800: Connection restored to 10.0.10.178@o2ib10 (at 10.0.10.178@o2ib10)
Oct  6 21:27:58 c05 kernel: LustreError: 152229:0:(ldlm_resource.c:1147:ldlm_resource_complain()) ai400-OST0002-osc-ffff8eafdda3d800: namespace resource [0x10:0x0:0x0].0x0 (ffff8ec4263355c0) refcount nonzero (1729) after lock cleanup; forcing cleanup.
Oct  6 21:27:58 c05 kernel: LustreError: 152229:0:(ldlm_resource.c:1147:ldlm_resource_complain()) Skipped 59 previous similar messages
Oct  6 21:27:58 c05 kernel: Lustre: ai400-OST0002-osc-ffff8eafdda3d800: Connection restored to 10.0.10.176@o2ib10 (at 10.0.10.176@o2ib10)


 Comments   
Comment by James A Simmons [ 06/Oct/19 ]

can you try patch https://review.whamcloud.com/#/c/36229/

Comment by Shuichi Ihara [ 07/Oct/19 ]

can you try patch https://review.whamcloud.com/#/c/36229/

Tested this patch. The problem still occurs, but after applying the patch the call trace is different, now in __write_lock_failed.

Oct  7 12:45:10 c10 kernel: NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [ldlm_bl_13:31247]
Oct  7 12:45:10 c10 kernel: Modules linked in: mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) xt_statistic xt_physdev xt_NFLOG nfnetlink_log veth dummy vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 ip_set_hash_ip xt_set ip_set xt_nat ipt_REJECT nf_reject_ipv4 xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc nfsv3 nfs_acl nfs lockd grace fscache overlay(T) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE)
Oct  7 12:45:10 c10 kernel: mlxfw(OE) mlx4_en(OE) sunrpc iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev i2c_i801 sg lpc_ich mei_me ioatdma mei ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter binfmt_misc knem(OE) ip_tables ext4 mbcache jbd2 mlx4_ib(OE) ib_core(OE) sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci igb crct10dif_pclmul crct10dif_common crc32c_intel mlx4_core(OE) libata ptp pps_core devlink dca i2c_algo_bit drm_panel_orientation_quirks mlx_compat(OE) [last unloaded: libcfs]
Oct  7 12:45:10 c10 kernel: CPU: 10 PID: 31247 Comm: ldlm_bl_13 Kdump: loaded Tainted: G           OEL ------------ T 3.10.0-957.27.2.el7.x86_64 #1
Oct  7 12:45:10 c10 kernel: Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018
Oct  7 12:45:10 c10 kernel: task: ffff9fd7a9180000 ti: ffff9fe5e8364000 task.ti: ffff9fe5e8364000
Oct  7 12:45:10 c10 kernel: RIP: 0010:[<ffffffff82d88889>]  [<ffffffff82d88889>] __write_lock_failed+0x9/0x20
Oct  7 12:45:10 c10 kernel: RSP: 0018:ffff9fe5e8367c80  EFLAGS: 00000297
Oct  7 12:45:10 c10 kernel: RAX: ffff9fd56365a1d8 RBX: 0000000000000140 RCX: ffff9fd56365a1b0
Oct  7 12:45:10 c10 kernel: RDX: 0000000000000001 RSI: ffff9fd6a7258150 RDI: ffff9fd56365a1cc
Oct  7 12:45:10 c10 kernel: RBP: ffff9fe5e8367c80 R08: ffff9fd6463d8bb0 R09: 0000000000000001
Oct  7 12:45:10 c10 kernel: R10: 0000000000000000 R11: ffffe16951a6ab00 R12: ffff9fd2833c1180
Oct  7 12:45:10 c10 kernel: R13: ffffffffc12ae36f R14: ffff9fe5e8367c38 R15: ffff9fd34eb66150
Oct  7 12:45:10 c10 kernel: FS:  0000000000000000(0000) GS:ffff9fd6fe880000(0000) knlGS:0000000000000000
Oct  7 12:45:10 c10 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  7 12:45:10 c10 kernel: CR2: 00000000023b6000 CR3: 000000028e610000 CR4: 00000000003607e0
Oct  7 12:45:10 c10 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct  7 12:45:10 c10 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct  7 12:45:10 c10 kernel: Call Trace:
Oct  7 12:45:10 c10 kernel: [<ffffffff8316c577>] _raw_write_lock+0x17/0x20
Oct  7 12:45:10 c10 kernel: [<ffffffffc0ee635a>] cl_object_attr_write_lock+0x1a/0x20 [obdclass]
Oct  7 12:45:10 c10 kernel: [<ffffffffc1295746>] osc_ldlm_blocking_ast+0x2f6/0x3a0 [osc]
Oct  7 12:45:10 c10 kernel: [<ffffffffc10e954a>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
Oct  7 12:45:10 c10 kernel: [<ffffffffc0ebf979>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
Oct  7 12:45:10 c10 kernel: [<ffffffffc10f4460>] ldlm_cli_cancel_local+0xa0/0x3f0 [ptlrpc]
Oct  7 12:45:10 c10 kernel: [<ffffffffc10f897a>] ldlm_cli_cancel_list_local+0xea/0x260 [ptlrpc]
Oct  7 12:45:10 c10 kernel: [<ffffffffc10ff370>] ldlm_bl_thread_main+0x5a0/0xa40 [ptlrpc]
Oct  7 12:45:10 c10 kernel: [<ffffffff82ad7c40>] ? wake_up_state+0x20/0x20
Oct  7 12:45:10 c10 kernel: [<ffffffffc10fedd0>] ? ldlm_handle_bl_callback+0x4e0/0x4e0 [ptlrpc]
Oct  7 12:45:10 c10 kernel: [<ffffffff82ac2e81>] kthread+0xd1/0xe0
Oct  7 12:45:10 c10 kernel: [<ffffffff82ac2db0>] ? insert_kthread_work+0x40/0x40
Oct  7 12:45:10 c10 kernel: [<ffffffff83176c1d>] ret_from_fork_nospec_begin+0x7/0x21
Oct  7 12:45:10 c10 kernel: [<ffffffff82ac2db0>] ? insert_kthread_work+0x40/0x40
Oct  7 12:45:10 c10 kernel: Code: 00 00 0f 1f 44 00 00 41 ff e7 e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 3c 24 c3 90 90 90 90 90 90 90 55 48 89 e5 f0 ff 07 f3 90 <83> 3f 01 75 f9 f0 ff 0f 75 f1 5d c3 90 66 2e 0f 1f 84 00 00 00 
Comment by Peter Jones [ 08/Oct/19 ]

Shilong

Could you please advise

Thanks

Peter

Comment by Shuichi Ihara [ 09/Oct/19 ]

I enabled kernel.softlockup_panic and captured a crash dump when the soft lockup happened on a client.

Comment by Wang Shilong (Inactive) [ 09/Oct/19 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/36413
Subject: LU-12832 osc: invalidate kms upon blocking ast
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 92f87e3a4b25392930e3b6df9b5c6a46eb6191ff

Comment by Wang Shilong (Inactive) [ 10/Oct/19 ]

Re-thinking the problem: IOR first writes the file in a strided pattern, so in this case every client may end up holding many PW extent locks, more than 60K per client.

Before the read phase starts, lock cancellation may kick in and try to cancel a huge amount of cached locks, which could make our spinlock very hot.

Comment by Wang Shilong (Inactive) [ 10/Oct/19 ]

So to work around the problem, using a smaller LRU size should be fine.

But we might need to add some debug information to the extent tree locks to understand the details.
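For reference, a sketch of how that workaround would be applied with the standard lctl tunables on the clients; the concrete values below are illustrative, not recommendations:

```shell
# Pin the per-namespace lock LRU to a fixed, smaller size on the OSC
# namespaces (setting lru_size to a nonzero value disables dynamic
# LRU resizing; 800 is an illustrative value):
lctl set_param ldlm.namespaces.*osc*.lru_size=800

# Alternatively, keep dynamic sizing but age cached locks out sooner
# (lru_max_age is in milliseconds, so 30000 = 30 seconds):
lctl set_param ldlm.namespaces.*osc*.lru_max_age=30000
```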

Comment by Shuichi Ihara [ 10/Oct/19 ]

Yes, a fixed lru_size or a short LRU age could be a workaround, but on the other hand some applications prefer to keep locks cached as long as possible. For an ior benchmark doing strided write/read this is fine, but we need more sophisticated logic so that both kinds of workload work reasonably well.

Comment by Gerrit Updater [ 10/Oct/19 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/36419
Subject: LU-12832 ldlm: debug extent tree
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ec320272aa8d3f3b6353aff2f556b9641f9245a9

Comment by Wang Shilong (Inactive) [ 10/Oct/19 ]

dongyang found the following change, which might fix the problem:

diff --git a/lustre/ldlm/ldlm_extent.c b/lustre/ldlm/ldlm_extent.c
index 9e046c91ff..36998cb5f9 100644
--- a/lustre/ldlm/ldlm_extent.c
+++ b/lustre/ldlm/ldlm_extent.c
@@ -890,7 +890,7 @@ static enum interval_iter ldlm_kms_shift_cb(struct interval_node *n,
    /* If we find a lock with a greater or equal kms, we are not the
     * highest lock (or we share that distinction with another lock), and
     * don't need to update KMS.  Return old_kms and stop looking. */
-   if (lock->l_policy_data.l_extent.end >= arg->old_kms) {
+   if (lock->l_policy_data.l_extent.end + 1 >= arg->old_kms) {
        arg->kms = arg->old_kms;
        arg->complete = true;
        RETURN(INTERVAL_ITER_STOP);

The point is that it is very possible that arg->old_kms = last extent end + 1, and if that last check fails, we have to iterate over all PR extent locks in the tree, which is O(N).

Comment by Andreas Dilger [ 11/Oct/19 ]

It definitely seems possible that the above change could fix the problem, as the comment before ldlm_extent_shift_kms() says:

NB: A lock on [x,y] protects a KMS of up to y + 1 bytes!


That said, the above-referenced code in ldlm_kms_shift_cb() hasn't changed since patch https://review.whamcloud.com/20779 "LU-8272 ldlm: Use interval tree to update kms" landed on 2016-10-17, so it can't by itself be a regression introduced since the last time this code was working; at most it is interacting badly with a patch that landed more recently. I'm not saying the above code is not the source of the problem, just that it would be good to know why this problem is suddenly being hit. It doesn't look like it is directly related to lock conversion, since that patch landed as v2_11_52_0-78-g541902a3f9 in 2018-07-03, before the 2.12.0 release, and we haven't been seeing this problem for very long.

It looks like there are 3 patches to ldlm/ldlm_extent.c in the 2.13 development cycle:
https://review.whamcloud.com/35057 2019-08-27 "LU-12017 ldlm: DoM truncate deadlock"
https://review.whamcloud.com/35486 2019-07-20 "LU-12542 ldlm: simplify lock_mode_to_index()"
https://review.whamcloud.com/19665 2019-03-21 "LU-8047 llite: optimizations for not granted lock processing"

There are also a few other patches in 2.13 that are related to KMS handling:
https://review.whamcloud.com/36200 2019-09-30 "LU-12681 osc: wrong cache of LVB attrs, part2"
https://review.whamcloud.com/36199 2019-09-30 "LU-12681 osc: wrong cache of LVB attrs"
https://review.whamcloud.com/34095 2019-06-25 "LU-12043 llite: improve single-thread read performance"

which would be a good place to start to see if reverting those patches, or testing the commit immediately before each patch avoids this problem. If we know the first time this problem was seen it might make it easy to track down the problematic patch.

If there are problems tracking down the source of this bug then it would be possible to use "git bisect" at the cost of running more tests. There are currently 1008 patches from 2.12.50..fcf219d so if the problem is reliably hit during testing then it would be possible to find the problem in 10 test runs or fewer. Since it is likely to be a client-side problem then it (probably) wouldn't be necessary to update the server version each time if that makes it faster for each test run.

Comment by Wang Shilong (Inactive) [ 12/Oct/19 ]

Yup, it is interesting to know why we did not hit the problem before. This could potentially be related to the LRU size, or the testing file size, or a regression from some lock-handling change.

Comment by Shuichi Ihara [ 12/Oct/19 ]

I've tested the above change with the same workload, but it caused another regression, this time at write (not read). Please see the console messages from when the client crashed.

[295995.970954] LustreError: 92739:0:(osc_cache.c:1153:osc_extent_make_ready()) ASSERTION( last_oap_count > 0 ) failed: 
[295995.971011] LustreError: 92739:0:(osc_cache.c:1153:osc_extent_make_ready()) LBUG
[295995.971047] Pid: 92739, comm: ldlm_bl_02 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019
[295995.971048] Call Trace:
[295995.971058]  [<ffffffffc0cb58ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[295995.971071]  [<ffffffffc0cb595c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[295995.971077]  [<ffffffffc123204b>] osc_extent_make_ready+0x91b/0xe50 [osc]
[295995.971097]  [<ffffffffc123499f>] osc_io_unplug0+0xd9f/0x1910 [osc]
[295995.971105]  [<ffffffffc12391e0>] osc_cache_writeback_range+0x9c0/0xff0 [osc]
[295995.971113]  [<ffffffffc1224275>] osc_lock_flush+0x195/0x290 [osc]
[295995.971120]  [<ffffffffc1224733>] osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
[295995.971126]  [<ffffffffc107854a>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
[295995.971161]  [<ffffffffc1083460>] ldlm_cli_cancel_local+0xa0/0x3f0 [ptlrpc]
[295995.971181]  [<ffffffffc10891c1>] ldlm_cli_cancel+0x161/0x650 [ptlrpc]
[295995.971205]  [<ffffffffc12245ca>] osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
[295995.971213]  [<ffffffffc108d9dd>] ldlm_handle_bl_callback+0xed/0x4e0 [ptlrpc]
[295995.971233]  [<ffffffffc108e5d0>] ldlm_bl_thread_main+0x800/0xa40 [ptlrpc]
[295995.971252]  [<ffffffff932c2e81>] kthread+0xd1/0xe0
[295995.971258]  [<ffffffff93976c1d>] ret_from_fork_nospec_begin+0x7/0x21
[295995.971264]  [<ffffffffffffffff>] 0xffffffffffffffff
[295995.971294] Kernel panic - not syncing: LBUG
Comment by Shuichi Ihara [ 12/Oct/19 ]

I wonder if overstriping could be related. I have been enabling overstriping for this testing, but will test without overstriping on master. If the problem still exists, we can test an older commit. It takes too long to finish this ior job without overstriping, though...

Comment by Shuichi Ihara [ 23/Oct/19 ]

Aggressive background lock cancellation in advance helps avoid this problem,
e.g. setting lru_max_age to 30 seconds: 'lctl set_param ldlm.namespaces.*.lru_max_age=30000'
After this change, I didn't see the problem.
However, now I'm reproducing LU-12786 again... I will re-open LU-12786.

Comment by Peter Jones [ 13/Nov/19 ]

Seems to be a hardware issue.

Generated at Sat Feb 10 02:56:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.