LU-11751

racer deadlocks due to DOM glimpse request

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.12.2
    • Labels: None
    • Severity: 3

    Description

      racer test_1 hangs in locking. This issue looks a lot like LU-10852, but the stack trace is a bit different and this failure is not in a DNE environment. So, opening a new ticket.

      Looking at the logs at https://testing.whamcloud.com/test_sets/1c19f7d0-f9d7-11e8-b216-52540065bddc, we can see that many processes on the clients are blocked. From client 2 (vm9), the client console shows

      [46895.114548] LustreError: 11-0: lustre-MDT0000-mdc-ffff9d96fa9a7800: operation ldlm_enqueue to node 10.2.8.143@tcp failed: rc = -107
      [46895.115945] Lustre: lustre-MDT0000-mdc-ffff9d96fa9a7800: Connection to lustre-MDT0000 (at 10.2.8.143@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [46895.119762] LustreError: 167-0: lustre-MDT0000-mdc-ffff9d96fa9a7800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [46895.133418] LustreError: 2635:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000403:0x942:0x0] error: rc = -5
      [46895.140482] LustreError: 2167:0:(vvp_io.c:1495:vvp_io_init()) lustre: refresh file layout [0x200000405:0x70a:0x0] error -108.
      [46895.146261] LustreError: 3286:0:(lmv_obd.c:1250:lmv_fid_alloc()) Can't alloc new fid, rc -19
      [46895.147487] LustreError: 18765:0:(mdc_locks.c:1257:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -108
      [46895.168736] LustreError: 18884:0:(mdc_request.c:1427:mdc_read_page()) lustre-MDT0000-mdc-ffff9d96fa9a7800: [0x200000402:0x23c:0x0] lock enqueue fails: rc = -108
      [46895.208950] LustreError: 19579:0:(file.c:216:ll_close_inode_openhandle()) lustre-clilmv-ffff9d96fa9a7800: inode [0x200000402:0x23c:0x0] mdc close failed: rc = -108
      [46895.222023] LustreError: 12501:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff9d96fa9a7800: namespace resource [0x200000402:0x23d:0x0].0x0 (ffff9d96f98e9e40) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [46895.633613] LustreError: 13031:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -108
      [46895.634939] LustreError: 13031:0:(file.c:4393:ll_inode_revalidate_fini()) Skipped 609 previous similar messages
      [46895.635482] Lustre: lustre-MDT0000-mdc-ffff9d96fa9a7800: Connection restored to 10.2.8.143@tcp (at 10.2.8.143@tcp)
      [46920.333393] INFO: task setfattr:32731 blocked for more than 120 seconds.
      [46920.334207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.335014] setfattr        D ffff9d9695c36eb0     0 32731    638 0x00000080
      [46920.335773] Call Trace:
      [46920.336068]  [<ffffffffae22ae2e>] ? generic_permission+0x15e/0x1d0
      [46920.336704]  [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
      [46920.337337]  [<ffffffffae717c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.337958]  [<ffffffffae716fff>] mutex_lock+0x1f/0x2f
      [46920.338471]  [<ffffffffae24700f>] vfs_removexattr+0x5f/0x130
      [46920.339043]  [<ffffffffae247135>] removexattr+0x55/0x80
      [46920.339578]  [<ffffffffae230a7d>] ? putname+0x3d/0x60
      [46920.340101]  [<ffffffffae231c92>] ? user_path_at_empty+0x72/0xc0
      [46920.340711]  [<ffffffffae2221c8>] ? __sb_start_write+0x58/0x110
      [46920.341297]  [<ffffffffae72056c>] ? __do_page_fault+0x1bc/0x4f0
      [46920.341900]  [<ffffffffae241d9c>] ? mnt_want_write+0x2c/0x50
      [46920.342455]  [<ffffffffae247f34>] SyS_removexattr+0x94/0xd0
      [46920.343017]  [<ffffffffae72579b>] system_call_fastpath+0x22/0x27
      [46920.343616]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.344266] INFO: task cp:3161 blocked for more than 120 seconds.
      [46920.344878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.345642] cp              D ffff9d96e3c45ee0     0  3161    645 0x00000080
      [46920.346377] Call Trace:
      [46920.346659]  [<ffffffffae2d5152>] ? security_inode_permission+0x22/0x30
      [46920.347301]  [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
      [46920.347946]  [<ffffffffae717c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.348543]  [<ffffffffae716fff>] mutex_lock+0x1f/0x2f
      [46920.349083]  [<ffffffffae710bee>] lookup_slow+0x33/0xa7
      [46920.349608]  [<ffffffffae22e7a8>] path_lookupat+0x838/0x8b0
      [46920.350178]  [<ffffffffae1970cb>] ? unlock_page+0x2b/0x30
      [46920.350732]  [<ffffffffae1faf39>] ? kmem_cache_alloc+0x179/0x1f0
      [46920.351320]  [<ffffffffae230aef>] ? getname_flags+0x4f/0x1a0
      [46920.351894]  [<ffffffffae22e84b>] filename_lookup+0x2b/0xc0
      [46920.352442]  [<ffffffffae231c87>] user_path_at_empty+0x67/0xc0
      [46920.353036]  [<ffffffffae127b72>] ? from_kgid_munged+0x12/0x20
      [46920.353630]  [<ffffffffae2251df>] ? cp_new_stat+0x14f/0x180
      [46920.354180]  [<ffffffffae231cf1>] user_path_at+0x11/0x20
      [46920.354718]  [<ffffffffae224cd3>] vfs_fstatat+0x63/0xc0
      [46920.355234]  [<ffffffffae22523e>] SYSC_newstat+0x2e/0x60
      [46920.355776]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.356427]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.357093]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.357748]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.358394]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.359050]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.359715]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.360355]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.361018]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.361676]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.362314]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.362967]  [<ffffffffae22551e>] SyS_newstat+0xe/0x10
      [46920.363469]  [<ffffffffae72579b>] system_call_fastpath+0x22/0x27
      [46920.364082]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.364739] INFO: task ln:3169 blocked for more than 120 seconds.
      [46920.365340] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.366106] ln              D ffff9d96d8966eb0     0  3169    616 0x00000080
      [46920.366863] Call Trace:
      [46920.367122]  [<ffffffffae22e092>] ? path_lookupat+0x122/0x8b0
      [46920.367698]  [<ffffffffae1f8e19>] ? ___slab_alloc+0x209/0x4f0
      [46920.368264]  [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
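
      (Editorial decoding note, not part of the original report: the negative return codes in the client console messages above are ordinary Linux errno values. The minimal sketch below assumes standard errno numbering and simply prints the mapping.)

      /* Reference only: standard Linux errno values seen in the console log above. */
      #include <errno.h>
      #include <stdio.h>

      int main(void)
      {
              printf("-107 = -ENOTCONN  (%d): transport endpoint not connected\n", -ENOTCONN);
              printf("-108 = -ESHUTDOWN (%d): endpoint shut down, e.g. after the eviction\n", -ESHUTDOWN);
              printf("-5   = -EIO       (%d): I/O error, e.g. the failed revalidate\n", -EIO);
              printf("-19  = -ENODEV    (%d): no such device, e.g. FID allocation with the MDC down\n", -ENODEV);
              return 0;
      }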
      

      On client 1 (vm10), we see dir_create, ls, mkdir, and other processes blocked

      [46775.380960] LustreError: 167-0: lustre-MDT0000-mdc-ffff9b35bbd7d000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [46775.392616] LustreError: 14241:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000401:0x122:0x0] error: rc = -5
      [46775.394009] LustreError: 14241:0:(file.c:4393:ll_inode_revalidate_fini()) Skipped 2 previous similar messages
      [46775.395224] LustreError: 30588:0:(mdc_locks.c:1257:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -108
      [46775.402186] LustreError: 18530:0:(file.c:216:ll_close_inode_openhandle()) lustre-clilmv-ffff9b35bbd7d000: inode [0x200000404:0x148:0x0] mdc close failed: rc = -108
      [46775.403682] LustreError: 18530:0:(file.c:216:ll_close_inode_openhandle()) Skipped 1 previous similar message
      [46775.415258] LustreError: 17871:0:(llite_lib.c:1547:ll_md_setattr()) md_setattr fails: rc = -108
      [46775.421674] LustreError: 18956:0:(lmv_obd.c:1250:lmv_fid_alloc()) Can't alloc new fid, rc -19
      [46775.431989] Lustre: lustre-MDT0000-mdc-ffff9b35bbd7d000: Connection restored to 10.2.8.143@tcp (at 10.2.8.143@tcp)
      [46920.246223] INFO: task dir_create.sh:4489 blocked for more than 120 seconds.
      [46920.247112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.247952] dir_create.sh   D ffff9b35baf1af70     0  4489   4447 0x00000080
      [46920.248755] Call Trace:
      [46920.249111]  [<ffffffffc0ea57e2>] ? ll_dcompare+0x72/0x2e0 [lustre]
      [46920.249780]  [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
      [46920.250439]  [<ffffffff94317c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.251074]  [<ffffffff94316fff>] mutex_lock+0x1f/0x2f
      [46920.251612]  [<ffffffff94310bee>] lookup_slow+0x33/0xa7
      [46920.252169]  [<ffffffff93e2d01f>] link_path_walk+0x80f/0x8b0
      [46920.252746]  [<ffffffff93e30205>] path_openat+0xb5/0x640
      [46920.253285]  [<ffffffff93e31dbd>] do_filp_open+0x4d/0xb0
      [46920.253848]  [<ffffffff93e3f1e4>] ? __alloc_fd+0xc4/0x170
      [46920.254401]  [<ffffffff93e1e0d7>] do_sys_open+0x137/0x240
      [46920.254959]  [<ffffffff943256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.255631]  [<ffffffff93e1e1fe>] SyS_open+0x1e/0x20
      [46920.256150]  [<ffffffff9432579b>] system_call_fastpath+0x22/0x27
      [46920.256760]  [<ffffffff943256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.257485] INFO: task ls:6624 blocked for more than 120 seconds.
      [46920.258108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.258901] ls              D ffff9b359eb6eeb0     0  6624   4495 0x00000080
      [46920.259661] Call Trace:
      [46920.259932]  [<ffffffff93cc6081>] ? in_group_p+0x31/0x40
      [46920.260487]  [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
      [46920.261139]  [<ffffffff94317c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.261777]  [<ffffffff94316fff>] mutex_lock+0x1f/0x2f
      [46920.262300]  [<ffffffff93e2f11f>] do_last+0x28f/0x12c0
      [46920.262826]  [<ffffffff93e30227>] path_openat+0xd7/0x640
      [46920.263362]  [<ffffffff93e30aef>] ? getname_flags+0x4f/0x1a0
      [46920.263957]  [<ffffffffc0eb2159>] ? ll_file_data_put+0x89/0x180 [lustre]
      [46920.264625]  [<ffffffff93e31dbd>] do_filp_open+0x4d/0xb0
      [46920.265173]  [<ffffffff93e3f1e4>] ? __alloc_fd+0xc4/0x170
      [46920.265739]  [<ffffffff93e1e0d7>] do_sys_open+0x137/0x240
      [46920.266281]  [<ffffffff943256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.266939]  [<ffffffff93e1e214>] SyS_openat+0x14/0x20
      [46920.267473]  [<ffffffff9432579b>] system_call_fastpath+0x22/0x27
      [46920.268081]  [<ffffffff943256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.268778] INFO: task ln:8693 blocked for more than 120 seconds.
      [46920.269388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.270180] ln              D ffff9b3595468fd0     0  8693   4475 0x00000080
      [46920.270940] Call Trace:
      [46920.271201]  [<ffffffff93e2e092>] ? path_lookupat+0x122/0x8b0
      [46920.271790]  [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
      

      On the MDS (vm12), we see many stack traces like the following

      [46559.329558] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-41vm10,onyx-41vm9.onyx.whamcloud.com DURATION=900 ============ 09:34:14 (1544088854)
      [46564.017497] LustreError: 5798:0:(mdt_lvb.c:430:mdt_lvbo_fill()) lustre-MDT0000: small buffer size 544 for EA 568 (max_mdsize 568): rc = -34
      [46643.846755] LNet: Service thread pid 5790 was inactive for 62.06s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [46643.848574] Pid: 5790, comm: mdt00_011 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Wed Dec 5 03:41:24 UTC 2018
      [46643.849563] Call Trace:
      [46643.849882]  [<ffffffffc0ef8031>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
      [46643.850848]  [<ffffffffc0ef8dcc>] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
      [46643.851624]  [<ffffffffc11e951b>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      [46643.852657]  [<ffffffffc11e9ba0>] mdt_object_lock_internal+0x70/0x360 [mdt]
      [46643.853470]  [<ffffffffc11e9f47>] mdt_object_lock_try+0x27/0xb0 [mdt]
      [46643.854167]  [<ffffffffc11eb687>] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
      [46643.854970]  [<ffffffffc11f2ba5>] mdt_intent_getattr+0x2b5/0x480 [mdt]
      [46643.855694]  [<ffffffffc11efa08>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [46643.856459]  [<ffffffffc0edeec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [46643.857209]  [<ffffffffc0f078a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [46643.858036]  [<ffffffffc0f8e242>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [46643.858866]  [<ffffffffc0f9529a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [46643.859648]  [<ffffffffc0f3991b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [46643.860560]  [<ffffffffc0f3d24c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
      [46643.861315]  [<ffffffff95cbdf21>] kthread+0xd1/0xe0
      [46643.861876]  [<ffffffff963255f7>] ret_from_fork_nospec_end+0x0/0x39
      [46643.862590]  [<ffffffffffffffff>] 0xffffffffffffffff
      [46643.863234] LustreError: dumping log to /tmp/lustre-log.1544088939.5790
      …
      [47031.433294] Pid: 5782, comm: mdt00_004 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Wed Dec 5 03:41:24 UTC 2018
      [47031.434325] Call Trace:
      [47031.434609]  [<ffffffffc0ef8031>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
      [47031.435489]  [<ffffffffc0ef8dcc>] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
      [47031.436322]  [<ffffffffc11e951b>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      [47031.437360]  [<ffffffffc11e9ba0>] mdt_object_lock_internal+0x70/0x360 [mdt]
      [47031.438194]  [<ffffffffc11ec09a>] mdt_object_find_lock+0x6a/0x1a0 [mdt]
      [47031.438983]  [<ffffffffc120aa5e>] mdt_reint_setxattr+0x1ce/0xfd0 [mdt]
      [47031.439782]  [<ffffffffc12085d3>] mdt_reint_rec+0x83/0x210 [mdt]
      [47031.440507]  [<ffffffffc11e51a3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
      [47031.441242]  [<ffffffffc11f0487>] mdt_reint+0x67/0x140 [mdt]
      [47031.441942]  [<ffffffffc0f9529a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [47031.442833]  [<ffffffffc0f3991b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [47031.443725]  [<ffffffffc0f3d24c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
      [47031.444586]  [<ffffffff95cbdf21>] kthread+0xd1/0xe0
      [47031.445234]  [<ffffffff963255f7>] ret_from_fork_nospec_end+0x0/0x39
      [47031.445925]  [<ffffffffffffffff>] 0xffffffffffffffff
      [47031.446664] LustreError: dumping log to /tmp/lustre-log.1544089327.5782
      

      Additional logs for this issue are at
      https://testing.whamcloud.com/sub_tests/04a3b75e-f729-11e8-bfe1-52540065bddc
      https://testing.whamcloud.com/sub_tests/24b8ffb6-f9ce-11e8-bb6b-52540065bddc

    Attachments

    Issue Links

    Activity

            [LU-11751] racer deadlocks due to DOM glimpse request
            sthiell Stephane Thiell added a comment - edited

            Quick update... our Fir system (running the patches above) has been stable through the past week and last weekend. Note that we're "only" running patch set #11 of LU-11359, and I see that it has evolved quite a bit since then...


            sthiell Stephane Thiell added a comment

            Mike, I just checked, and since that last event (about two days now) I haven't seen a single new ldlm_completion_ast call trace on either MDS, which is surprising (in a good way!). Before that we had many of them per day. But after the problem reported above I added the patch for LU-11285, so... perhaps it did help. We're currently running 2.12.0 plus the following patches:

            5081aa7 LU-10777 dom: disable read-on-open with resend
            455d39b LU-11285 ldlm: reprocess whole waiting queue for IBITS
            2eec4f8 LU-12018 quota: do not start a thread under memory pressure
            1819063 LU-11359 mdt: fix mdt_dom_discard_data() timeouts
            3ed10f4 LU-11964 mdc: prevent glimpse lock count grow
            565011c LU-12037 mdt: add option for cross-MDT rename
            b6be1d9 LU-12065 lnd: increase CQ entries
            6b2c97b LU-12037 mdt: call mdt_dom_discard_data() after rename unlock

            Also, we're now 100% sure that all clients are running with LU-11359, so async_discard is enabled everywhere; I guess that is important too.

            I'll update if things change (which usually occurs shortly after posting a comment here... )


            tappro Mikhail Pershin added a comment

            Stephane, thanks for the logs. I haven't found the cause yet; I'm working on it.


            sthiell Stephane Thiell added a comment

            I commented in LU-11285; I think we have a great dk trace with +dlmtrace. Hope that helps to fix this definitively! For us, the issue has been resolved (for now) after holding this user's jobs and restarting MDT3 with abort_recov (it didn't work otherwise...).


            sthiell Stephane Thiell added a comment

            Attached fir-md1-s1-kernel-20190428.log and fir-md1-s2-kernel-20190428.log, which are the kernel logs from both MDS nodes.

            sthiell Stephane Thiell added a comment - edited

            I believe we are hitting this issue right now, for example on FID 0x2c001ad81:0xe26:0x0, but there are others (bits 0x13 + 0x40):

            Apr 28 10:16:47 fir-md1-s1 kernel: LustreError: 35049:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471717, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5012ae7980/0x378007fdc6c24f96 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 143 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 35049 timeout: 0 lvb_type: 0
            
            Apr 28 10:17:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.7.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b51f73e72c0/0x378007fdc5f51cf6 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 143 type: IBT flags: 0x60200400000020 nid: 10.8.7.17@o2ib6 remote: 0x1b8755abd081ffc4 expref: 10 pid: 34487 timeout: 471701 lvb_type: 0
            
            [root@fir-rbh01 ~]# lfs fid2path /fir 0x2c001ad81:0xe26:0x0
            /fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput
            [root@fir-rbh01 ~]# stat /fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput
              File: ‘/fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput’
              Size: 956       	Blocks: 16         IO Block: 4194304 regular file
            Device: e64e03a8h/3863872424d	Inode: 198160228309536294  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (331789/     jmz)   Gid: (11886/    euan)
            Access: 2019-04-28 01:46:50.000000000 -0700
            Modify: 2019-04-28 10:28:15.000000000 -0700
            Change: 2019-04-28 10:28:15.000000000 -0700
             Birth: -
            

            This is followed after some time by the typical ldlm_completion_ast traces.

            Even restarting the MDTs doesn't seem to fix the issue... what can we do apart from remounting with abort_recov?
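
            (Editorial reference for decoding the "bits" fields in these lock dumps: a sketch of the MDS_INODELOCK_* inodebit values as they appear in 2.12-era Lustre headers, so verify against your own tree.)

            #define MDS_INODELOCK_LOOKUP  0x000001  /* name lookup / dentry validity */
            #define MDS_INODELOCK_UPDATE  0x000002  /* attributes, directory contents */
            #define MDS_INODELOCK_OPEN    0x000004  /* open handle */
            #define MDS_INODELOCK_LAYOUT  0x000008  /* file layout */
            #define MDS_INODELOCK_PERM    0x000010  /* permissions / ownership */
            #define MDS_INODELOCK_XATTR   0x000020  /* extended attributes */
            #define MDS_INODELOCK_DOM     0x000040  /* Data-on-MDT data lock */

            /* So in the dumps above:
             *   bits 0x13 = LOOKUP | UPDATE | PERM  (the PR lock stuck in ldlm_expired_completion_wait)
             *   bits 0x40 = DOM                     (the PW lock held by the client that was evicted)
             */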

            tappro Mikhail Pershin added a comment - edited

            But first I have to check how glimpse works right now; my understanding is that it should not be blocked in the waiting queue.


            tappro Mikhail Pershin added a comment

            Patrick, it already works that way when we CAN return the DOM lock; if not, a separate glimpse is issued to get the size. What I am thinking about is doing the glimpse from the server in advance when we cannot return the DOM bit and a glimpse request is sure to follow.
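
            (A minimal sketch of the flow described here, with invented names and stubbed helpers; it is not llite/mdt code, just an illustration of "DOM bit returned with getattr" versus "separate glimpse".)

            #include <stdio.h>

            #define INODELOCK_DOM 0x40          /* DOM inodebit, as in the dumps above */

            static void use_size_from_getattr_reply(void) { puts("size came with the getattr reply"); }
            static void issue_dom_glimpse(void)           { puts("no DOM bit granted: send a separate DOM glimpse"); }

            static void client_getattr_done(unsigned int granted_ibits)
            {
                    if (granted_ibits & INODELOCK_DOM)
                            use_size_from_getattr_reply();
                    else
                            issue_dom_glimpse();    /* this extra request is the one that can get stuck */
            }

            int main(void)
            {
                    client_getattr_done(0x13 | 0x40);   /* DOM bit returned: no glimpse needed */
                    client_getattr_done(0x13);          /* DOM bit not returned: glimpse follows */
                    return 0;
            }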


            pfarrell Patrick Farrell (Inactive) added a comment

            Mike,

            Is it not possible to make the PR DOM lock part of the getattr request on the MDC? Or would that cause problems (or at least be wasteful) if there were an existing DOM lock?


            tappro Mikhail Pershin added a comment

            This looks like LU-11359 but has a different cause. The problem exists with DOM files and is related to glimpse. When a getattr RPC returns to the MDC without the size, llite immediately asks for the size attribute with a glimpse lock request. But at that moment the original GETATTR lock is still busy, usually with 0x13 inodebits, while the glimpse is a PR lock with the DOM bit, 0x40. The problem is that any other lock between these two may get stuck on the first one and block the glimpse in the waiting queue, yet without the glimpse completing, the first lock will stay busy forever. This situation appeared along with DOM because glimpses on OSTs go to a different namespace and resource.

            I think the patch from LU-11285 will help with this problem, because locks with non-intersecting ibits will no longer be blocked in the waiting queue, but ideally the glimpse should be done atomically. I also have a 'glimpse-ahead' patch for that, which was considered an optimization, but it will take time to rebase it onto current master and pass tests. That will be done under this ticket.
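
            (Editorial illustration, not the actual Lustre implementation; the enum, helper names, and simplified mode table below are invented for the sketch. The idea behind the LU-11285 change is that two inodebits locks on the same resource only conflict when their LDLM modes are incompatible AND their bit masks intersect, so a DOM glimpse with bit 0x40 need not sit in the waiting queue behind a chain rooted at a getattr lock holding bits 0x13.)

            #include <stdbool.h>
            #include <stdint.h>
            #include <stdio.h>

            enum mode_sketch { MODE_PR, MODE_PW };          /* simplified LDLM modes */

            static bool modes_compatible(enum mode_sketch a, enum mode_sketch b)
            {
                    return a == MODE_PR && b == MODE_PR;    /* only PR/PR is compatible here */
            }

            /* Would a request (req_mode, req_bits) be blocked by an existing lock
             * (held_mode, held_bits)?  A conflict needs incompatible modes AND
             * intersecting inodebits. */
            static bool ibits_blocked(enum mode_sketch req_mode, uint64_t req_bits,
                                      enum mode_sketch held_mode, uint64_t held_bits)
            {
                    if (modes_compatible(req_mode, held_mode))
                            return false;
                    return (req_bits & held_bits) != 0;
            }

            int main(void)
            {
                    /* An intermediate PW lock on LOOKUP|UPDATE (0x03) really does
                     * conflict with the busy PR getattr lock holding bits 0x13: */
                    printf("PW 0x03 vs PR 0x13 blocked: %d\n",
                           ibits_blocked(MODE_PW, 0x03, MODE_PR, 0x13));   /* prints 1 */

                    /* ...but the DOM glimpse (PR, 0x40) shares no bits with that
                     * waiter, so reprocessing the whole waiting queue can grant it
                     * instead of leaving it stuck behind the conflicting lock: */
                    printf("PR 0x40 vs PW 0x03 blocked: %d\n",
                           ibits_blocked(MODE_PR, 0x40, MODE_PW, 0x03));   /* prints 0 */
                    return 0;
            }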


            tappro Mikhail Pershin added a comment

            It looks like a duplicate of LU-11359.


            People

              Assignee: tappro Mikhail Pershin
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 6
