LU-11751

racer deadlocks due to DOM glimpse request

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.12.2
    • Labels: None
    • Severity: 3

    Description

      racer test_1 hangs in locking. This issue looks a lot like LU-10852, but the stack trace is a bit different and this failure is not in a DNE environment. So, opening a new ticket.

      Looking at the logs at https://testing.whamcloud.com/test_sets/1c19f7d0-f9d7-11e8-b216-52540065bddc, we can see that many processes on the clients are blocked. From client 2 (vm9), the client console shows

      [46895.114548] LustreError: 11-0: lustre-MDT0000-mdc-ffff9d96fa9a7800: operation ldlm_enqueue to node 10.2.8.143@tcp failed: rc = -107
      [46895.115945] Lustre: lustre-MDT0000-mdc-ffff9d96fa9a7800: Connection to lustre-MDT0000 (at 10.2.8.143@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [46895.119762] LustreError: 167-0: lustre-MDT0000-mdc-ffff9d96fa9a7800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [46895.133418] LustreError: 2635:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000403:0x942:0x0] error: rc = -5
      [46895.140482] LustreError: 2167:0:(vvp_io.c:1495:vvp_io_init()) lustre: refresh file layout [0x200000405:0x70a:0x0] error -108.
      [46895.146261] LustreError: 3286:0:(lmv_obd.c:1250:lmv_fid_alloc()) Can't alloc new fid, rc -19
      [46895.147487] LustreError: 18765:0:(mdc_locks.c:1257:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -108
      [46895.168736] LustreError: 18884:0:(mdc_request.c:1427:mdc_read_page()) lustre-MDT0000-mdc-ffff9d96fa9a7800: [0x200000402:0x23c:0x0] lock enqueue fails: rc = -108
      [46895.208950] LustreError: 19579:0:(file.c:216:ll_close_inode_openhandle()) lustre-clilmv-ffff9d96fa9a7800: inode [0x200000402:0x23c:0x0] mdc close failed: rc = -108
      [46895.222023] LustreError: 12501:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff9d96fa9a7800: namespace resource [0x200000402:0x23d:0x0].0x0 (ffff9d96f98e9e40) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [46895.633613] LustreError: 13031:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -108
      [46895.634939] LustreError: 13031:0:(file.c:4393:ll_inode_revalidate_fini()) Skipped 609 previous similar messages
      [46895.635482] Lustre: lustre-MDT0000-mdc-ffff9d96fa9a7800: Connection restored to 10.2.8.143@tcp (at 10.2.8.143@tcp)
      [46920.333393] INFO: task setfattr:32731 blocked for more than 120 seconds.
      [46920.334207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.335014] setfattr        D ffff9d9695c36eb0     0 32731    638 0x00000080
      [46920.335773] Call Trace:
      [46920.336068]  [<ffffffffae22ae2e>] ? generic_permission+0x15e/0x1d0
      [46920.336704]  [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
      [46920.337337]  [<ffffffffae717c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.337958]  [<ffffffffae716fff>] mutex_lock+0x1f/0x2f
      [46920.338471]  [<ffffffffae24700f>] vfs_removexattr+0x5f/0x130
      [46920.339043]  [<ffffffffae247135>] removexattr+0x55/0x80
      [46920.339578]  [<ffffffffae230a7d>] ? putname+0x3d/0x60
      [46920.340101]  [<ffffffffae231c92>] ? user_path_at_empty+0x72/0xc0
      [46920.340711]  [<ffffffffae2221c8>] ? __sb_start_write+0x58/0x110
      [46920.341297]  [<ffffffffae72056c>] ? __do_page_fault+0x1bc/0x4f0
      [46920.341900]  [<ffffffffae241d9c>] ? mnt_want_write+0x2c/0x50
      [46920.342455]  [<ffffffffae247f34>] SyS_removexattr+0x94/0xd0
      [46920.343017]  [<ffffffffae72579b>] system_call_fastpath+0x22/0x27
      [46920.343616]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.344266] INFO: task cp:3161 blocked for more than 120 seconds.
      [46920.344878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.345642] cp              D ffff9d96e3c45ee0     0  3161    645 0x00000080
      [46920.346377] Call Trace:
      [46920.346659]  [<ffffffffae2d5152>] ? security_inode_permission+0x22/0x30
      [46920.347301]  [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
      [46920.347946]  [<ffffffffae717c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.348543]  [<ffffffffae716fff>] mutex_lock+0x1f/0x2f
      [46920.349083]  [<ffffffffae710bee>] lookup_slow+0x33/0xa7
      [46920.349608]  [<ffffffffae22e7a8>] path_lookupat+0x838/0x8b0
      [46920.350178]  [<ffffffffae1970cb>] ? unlock_page+0x2b/0x30
      [46920.350732]  [<ffffffffae1faf39>] ? kmem_cache_alloc+0x179/0x1f0
      [46920.351320]  [<ffffffffae230aef>] ? getname_flags+0x4f/0x1a0
      [46920.351894]  [<ffffffffae22e84b>] filename_lookup+0x2b/0xc0
      [46920.352442]  [<ffffffffae231c87>] user_path_at_empty+0x67/0xc0
      [46920.353036]  [<ffffffffae127b72>] ? from_kgid_munged+0x12/0x20
      [46920.353630]  [<ffffffffae2251df>] ? cp_new_stat+0x14f/0x180
      [46920.354180]  [<ffffffffae231cf1>] user_path_at+0x11/0x20
      [46920.354718]  [<ffffffffae224cd3>] vfs_fstatat+0x63/0xc0
      [46920.355234]  [<ffffffffae22523e>] SYSC_newstat+0x2e/0x60
      [46920.355776]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.356427]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.357093]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.357748]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.358394]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.359050]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.359715]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.360355]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.361018]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.361676]  [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.362314]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.362967]  [<ffffffffae22551e>] SyS_newstat+0xe/0x10
      [46920.363469]  [<ffffffffae72579b>] system_call_fastpath+0x22/0x27
      [46920.364082]  [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.364739] INFO: task ln:3169 blocked for more than 120 seconds.
      [46920.365340] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.366106] ln              D ffff9d96d8966eb0     0  3169    616 0x00000080
      [46920.366863] Call Trace:
      [46920.367122]  [<ffffffffae22e092>] ? path_lookupat+0x122/0x8b0
      [46920.367698]  [<ffffffffae1f8e19>] ? ___slab_alloc+0x209/0x4f0
      [46920.368264]  [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
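
      (Editorial decoding note, not part of the original report: the negative return codes in the client console messages above are ordinary Linux errno values. The minimal sketch below assumes standard errno numbering and simply prints the mapping.)

      /* Reference only: standard Linux errno values seen in the console log above. */
      #include <errno.h>
      #include <stdio.h>

      int main(void)
      {
              printf("-107 = -ENOTCONN  (%d): transport endpoint not connected\n", -ENOTCONN);
              printf("-108 = -ESHUTDOWN (%d): endpoint shut down, e.g. after the eviction\n", -ESHUTDOWN);
              printf("-5   = -EIO       (%d): I/O error, e.g. the failed revalidate\n", -EIO);
              printf("-19  = -ENODEV    (%d): no such device, e.g. FID allocation with the MDC down\n", -ENODEV);
              return 0;
      }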
      

      On client 1 (vm10), we see dir_create, ls, mkdir, and other processes blocked

      [46775.380960] LustreError: 167-0: lustre-MDT0000-mdc-ffff9b35bbd7d000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [46775.392616] LustreError: 14241:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000401:0x122:0x0] error: rc = -5
      [46775.394009] LustreError: 14241:0:(file.c:4393:ll_inode_revalidate_fini()) Skipped 2 previous similar messages
      [46775.395224] LustreError: 30588:0:(mdc_locks.c:1257:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -108
      [46775.402186] LustreError: 18530:0:(file.c:216:ll_close_inode_openhandle()) lustre-clilmv-ffff9b35bbd7d000: inode [0x200000404:0x148:0x0] mdc close failed: rc = -108
      [46775.403682] LustreError: 18530:0:(file.c:216:ll_close_inode_openhandle()) Skipped 1 previous similar message
      [46775.415258] LustreError: 17871:0:(llite_lib.c:1547:ll_md_setattr()) md_setattr fails: rc = -108
      [46775.421674] LustreError: 18956:0:(lmv_obd.c:1250:lmv_fid_alloc()) Can't alloc new fid, rc -19
      [46775.431989] Lustre: lustre-MDT0000-mdc-ffff9b35bbd7d000: Connection restored to 10.2.8.143@tcp (at 10.2.8.143@tcp)
      [46920.246223] INFO: task dir_create.sh:4489 blocked for more than 120 seconds.
      [46920.247112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.247952] dir_create.sh   D ffff9b35baf1af70     0  4489   4447 0x00000080
      [46920.248755] Call Trace:
      [46920.249111]  [<ffffffffc0ea57e2>] ? ll_dcompare+0x72/0x2e0 [lustre]
      [46920.249780]  [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
      [46920.250439]  [<ffffffff94317c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.251074]  [<ffffffff94316fff>] mutex_lock+0x1f/0x2f
      [46920.251612]  [<ffffffff94310bee>] lookup_slow+0x33/0xa7
      [46920.252169]  [<ffffffff93e2d01f>] link_path_walk+0x80f/0x8b0
      [46920.252746]  [<ffffffff93e30205>] path_openat+0xb5/0x640
      [46920.253285]  [<ffffffff93e31dbd>] do_filp_open+0x4d/0xb0
      [46920.253848]  [<ffffffff93e3f1e4>] ? __alloc_fd+0xc4/0x170
      [46920.254401]  [<ffffffff93e1e0d7>] do_sys_open+0x137/0x240
      [46920.254959]  [<ffffffff943256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.255631]  [<ffffffff93e1e1fe>] SyS_open+0x1e/0x20
      [46920.256150]  [<ffffffff9432579b>] system_call_fastpath+0x22/0x27
      [46920.256760]  [<ffffffff943256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.257485] INFO: task ls:6624 blocked for more than 120 seconds.
      [46920.258108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.258901] ls              D ffff9b359eb6eeb0     0  6624   4495 0x00000080
      [46920.259661] Call Trace:
      [46920.259932]  [<ffffffff93cc6081>] ? in_group_p+0x31/0x40
      [46920.260487]  [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
      [46920.261139]  [<ffffffff94317c17>] __mutex_lock_slowpath+0xc7/0x1d0
      [46920.261777]  [<ffffffff94316fff>] mutex_lock+0x1f/0x2f
      [46920.262300]  [<ffffffff93e2f11f>] do_last+0x28f/0x12c0
      [46920.262826]  [<ffffffff93e30227>] path_openat+0xd7/0x640
      [46920.263362]  [<ffffffff93e30aef>] ? getname_flags+0x4f/0x1a0
      [46920.263957]  [<ffffffffc0eb2159>] ? ll_file_data_put+0x89/0x180 [lustre]
      [46920.264625]  [<ffffffff93e31dbd>] do_filp_open+0x4d/0xb0
      [46920.265173]  [<ffffffff93e3f1e4>] ? __alloc_fd+0xc4/0x170
      [46920.265739]  [<ffffffff93e1e0d7>] do_sys_open+0x137/0x240
      [46920.266281]  [<ffffffff943256d5>] ? system_call_after_swapgs+0xa2/0x146
      [46920.266939]  [<ffffffff93e1e214>] SyS_openat+0x14/0x20
      [46920.267473]  [<ffffffff9432579b>] system_call_fastpath+0x22/0x27
      [46920.268081]  [<ffffffff943256e1>] ? system_call_after_swapgs+0xae/0x146
      [46920.268778] INFO: task ln:8693 blocked for more than 120 seconds.
      [46920.269388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [46920.270180] ln              D ffff9b3595468fd0     0  8693   4475 0x00000080
      [46920.270940] Call Trace:
      [46920.271201]  [<ffffffff93e2e092>] ? path_lookupat+0x122/0x8b0
      [46920.271790]  [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
      

      On the MDS (vm12), we see many stack traces like the following

      [46559.329558] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-41vm10,onyx-41vm9.onyx.whamcloud.com DURATION=900 ============ 09:34:14 (1544088854)
      [46564.017497] LustreError: 5798:0:(mdt_lvb.c:430:mdt_lvbo_fill()) lustre-MDT0000: small buffer size 544 for EA 568 (max_mdsize 568): rc = -34
      [46643.846755] LNet: Service thread pid 5790 was inactive for 62.06s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [46643.848574] Pid: 5790, comm: mdt00_011 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Wed Dec 5 03:41:24 UTC 2018
      [46643.849563] Call Trace:
      [46643.849882]  [<ffffffffc0ef8031>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
      [46643.850848]  [<ffffffffc0ef8dcc>] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
      [46643.851624]  [<ffffffffc11e951b>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      [46643.852657]  [<ffffffffc11e9ba0>] mdt_object_lock_internal+0x70/0x360 [mdt]
      [46643.853470]  [<ffffffffc11e9f47>] mdt_object_lock_try+0x27/0xb0 [mdt]
      [46643.854167]  [<ffffffffc11eb687>] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
      [46643.854970]  [<ffffffffc11f2ba5>] mdt_intent_getattr+0x2b5/0x480 [mdt]
      [46643.855694]  [<ffffffffc11efa08>] mdt_intent_policy+0x2e8/0xd00 [mdt]
      [46643.856459]  [<ffffffffc0edeec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
      [46643.857209]  [<ffffffffc0f078a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
      [46643.858036]  [<ffffffffc0f8e242>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [46643.858866]  [<ffffffffc0f9529a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [46643.859648]  [<ffffffffc0f3991b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [46643.860560]  [<ffffffffc0f3d24c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
      [46643.861315]  [<ffffffff95cbdf21>] kthread+0xd1/0xe0
      [46643.861876]  [<ffffffff963255f7>] ret_from_fork_nospec_end+0x0/0x39
      [46643.862590]  [<ffffffffffffffff>] 0xffffffffffffffff
      [46643.863234] LustreError: dumping log to /tmp/lustre-log.1544088939.5790
      …
      [47031.433294] Pid: 5782, comm: mdt00_004 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Wed Dec 5 03:41:24 UTC 2018
      [47031.434325] Call Trace:
      [47031.434609]  [<ffffffffc0ef8031>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
      [47031.435489]  [<ffffffffc0ef8dcc>] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
      [47031.436322]  [<ffffffffc11e951b>] mdt_object_local_lock+0x50b/0xb20 [mdt]
      [47031.437360]  [<ffffffffc11e9ba0>] mdt_object_lock_internal+0x70/0x360 [mdt]
      [47031.438194]  [<ffffffffc11ec09a>] mdt_object_find_lock+0x6a/0x1a0 [mdt]
      [47031.438983]  [<ffffffffc120aa5e>] mdt_reint_setxattr+0x1ce/0xfd0 [mdt]
      [47031.439782]  [<ffffffffc12085d3>] mdt_reint_rec+0x83/0x210 [mdt]
      [47031.440507]  [<ffffffffc11e51a3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
      [47031.441242]  [<ffffffffc11f0487>] mdt_reint+0x67/0x140 [mdt]
      [47031.441942]  [<ffffffffc0f9529a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [47031.442833]  [<ffffffffc0f3991b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [47031.443725]  [<ffffffffc0f3d24c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
      [47031.444586]  [<ffffffff95cbdf21>] kthread+0xd1/0xe0
      [47031.445234]  [<ffffffff963255f7>] ret_from_fork_nospec_end+0x0/0x39
      [47031.445925]  [<ffffffffffffffff>] 0xffffffffffffffff
      [47031.446664] LustreError: dumping log to /tmp/lustre-log.1544089327.5782
      

      Additional logs for this issue are at
      https://testing.whamcloud.com/sub_tests/04a3b75e-f729-11e8-bfe1-52540065bddc
      https://testing.whamcloud.com/sub_tests/24b8ffb6-f9ce-11e8-bb6b-52540065bddc

    Attachments

    Issue Links

    Activity

            [LU-11751] racer deadlocks due to DOM glimpse request
            sthiell Stephane Thiell added a comment - edited

            Quick update... our Fir system (running the patches above) has been stable through the past week and last weekend. Note that we're "only" running patch set #11 of LU-11359, and I see that it has evolved quite a bit since then...


            sthiell Stephane Thiell added a comment

            Mike, I just checked, and since that last event (about two days now) I haven't seen a single new ldlm_completion_ast call trace on either MDS, which is surprising (in a good way!). Before that we had many of them per day. But after the problem reported above I added the patch for LU-11285, so... perhaps it did help. We're currently running 2.12.0 plus the following patches:

            5081aa7 LU-10777 dom: disable read-on-open with resend
            455d39b LU-11285 ldlm: reprocess whole waiting queue for IBITS
            2eec4f8 LU-12018 quota: do not start a thread under memory pressure
            1819063 LU-11359 mdt: fix mdt_dom_discard_data() timeouts
            3ed10f4 LU-11964 mdc: prevent glimpse lock count grow
            565011c LU-12037 mdt: add option for cross-MDT rename
            b6be1d9 LU-12065 lnd: increase CQ entries
            6b2c97b LU-12037 mdt: call mdt_dom_discard_data() after rename unlock

            Also, we're now 100% sure that all clients are running with LU-11359, so async_discard is enabled everywhere; I guess that is important too.

            I'll update if things change (which usually occurs shortly after posting a comment here... )


            tappro Mikhail Pershin added a comment

            Stephane, thanks for the logs. I haven't found the cause yet; I'm working on it.


            sthiell Stephane Thiell added a comment

            I commented in LU-11285; I think we have a great dk trace with +dlmtrace. Hope that helps to fix this definitively! For us, the issue has been resolved (for now) after holding this user's jobs and restarting MDT3 with abort_recov (it didn't work otherwise...).


            sthiell Stephane Thiell added a comment

            Attached fir-md1-s1-kernel-20190428.log and fir-md1-s2-kernel-20190428.log, which are the kernel logs from both MDS nodes.

            sthiell Stephane Thiell added a comment - edited

            I believe we are hitting this issue right now, for example on FID 0x2c001ad81:0xe26:0x0, but there are others (bits 0x13 + 0x40):

            Apr 28 10:16:47 fir-md1-s1 kernel: LustreError: 35049:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471717, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff8b5012ae7980/0x378007fdc6c24f96 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 143 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 35049 timeout: 0 lvb_type: 0
            
            Apr 28 10:17:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.7.17@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b51f73e72c0/0x378007fdc5f51cf6 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 143 type: IBT flags: 0x60200400000020 nid: 10.8.7.17@o2ib6 remote: 0x1b8755abd081ffc4 expref: 10 pid: 34487 timeout: 471701 lvb_type: 0
            
            [root@fir-rbh01 ~]# lfs fid2path /fir 0x2c001ad81:0xe26:0x0
            /fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput
            [root@fir-rbh01 ~]# stat /fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput
              File: ‘/fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput’
              Size: 956       	Blocks: 16         IO Block: 4194304 regular file
            Device: e64e03a8h/3863872424d	Inode: 198160228309536294  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (331789/     jmz)   Gid: (11886/    euan)
            Access: 2019-04-28 01:46:50.000000000 -0700
            Modify: 2019-04-28 10:28:15.000000000 -0700
            Change: 2019-04-28 10:28:15.000000000 -0700
             Birth: -
            

            This is followed after some time by the typical ldlm_completion_ast traces.

            Even restarting the MDTs doesn't seem to fix the issue... what can we do apart from remounting with abort_recov?
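
            (Editorial reference for decoding the "bits" fields in these lock dumps: a sketch of the MDS_INODELOCK_* inodebit values as they appear in 2.12-era Lustre headers, so verify against your own tree.)

            #define MDS_INODELOCK_LOOKUP  0x000001  /* name lookup / dentry validity */
            #define MDS_INODELOCK_UPDATE  0x000002  /* attributes, directory contents */
            #define MDS_INODELOCK_OPEN    0x000004  /* open handle */
            #define MDS_INODELOCK_LAYOUT  0x000008  /* file layout */
            #define MDS_INODELOCK_PERM    0x000010  /* permissions / ownership */
            #define MDS_INODELOCK_XATTR   0x000020  /* extended attributes */
            #define MDS_INODELOCK_DOM     0x000040  /* Data-on-MDT data lock */

            /* So in the dumps above:
             *   bits 0x13 = LOOKUP | UPDATE | PERM  (the PR lock stuck in ldlm_expired_completion_wait)
             *   bits 0x40 = DOM                     (the PW lock held by the client that was evicted)
             */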

            tappro Mikhail Pershin added a comment - edited

            But first I have to check how glimpse works right now; my understanding is that it should not be blocked in the waiting queue.


            tappro Mikhail Pershin added a comment

            Patrick, it already works that way when we CAN return the DOM lock; if not, a separate glimpse is issued to get the size. What I am thinking about is doing the glimpse from the server in advance when we cannot return the DOM bit and a glimpse request is sure to follow.
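
            (A minimal sketch of the flow described here, with invented names and stubbed helpers; it is not llite/mdt code, just an illustration of "DOM bit returned with getattr" versus "separate glimpse".)

            #include <stdio.h>

            #define INODELOCK_DOM 0x40          /* DOM inodebit, as in the dumps above */

            static void use_size_from_getattr_reply(void) { puts("size came with the getattr reply"); }
            static void issue_dom_glimpse(void)           { puts("no DOM bit granted: send a separate DOM glimpse"); }

            static void client_getattr_done(unsigned int granted_ibits)
            {
                    if (granted_ibits & INODELOCK_DOM)
                            use_size_from_getattr_reply();
                    else
                            issue_dom_glimpse();    /* this extra request is the one that can get stuck */
            }

            int main(void)
            {
                    client_getattr_done(0x13 | 0x40);   /* DOM bit returned: no glimpse needed */
                    client_getattr_done(0x13);          /* DOM bit not returned: glimpse follows */
                    return 0;
            }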


            pfarrell Patrick Farrell (Inactive) added a comment

            Mike,

            Is it not possible to make the PR DOM lock part of the getattr request on the MDC? Or would that cause problems (or at least be wasteful) if there were an existing DOM lock?


            tappro Mikhail Pershin added a comment

            This looks like LU-11359 but has a different cause. The problem exists with DOM files and is related to glimpse. When a getattr RPC returns to the MDC without the size, llite immediately asks for the size attribute with a glimpse lock request. But at that moment the original GETATTR lock is still busy, usually with 0x13 inodebits, while the glimpse is a PR lock with the DOM bit, 0x40. The problem is that any other lock between these two may get stuck on the first one and block the glimpse in the waiting queue, yet without the glimpse completing, the first lock will stay busy forever. This situation appeared along with DOM because glimpses on OSTs go to a different namespace and resource.

            I think the patch from LU-11285 will help with this problem, because locks with non-intersecting ibits will no longer be blocked in the waiting queue, but ideally the glimpse should be done atomically. I also have a 'glimpse-ahead' patch for that, which was considered an optimization, but it will take time to rebase it onto current master and pass tests. That will be done under this ticket.
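
            (Editorial illustration, not the actual Lustre implementation; the enum, helper names, and simplified mode table below are invented for the sketch. The idea behind the LU-11285 change is that two inodebits locks on the same resource only conflict when their LDLM modes are incompatible AND their bit masks intersect, so a DOM glimpse with bit 0x40 need not sit in the waiting queue behind a chain rooted at a getattr lock holding bits 0x13.)

            #include <stdbool.h>
            #include <stdint.h>
            #include <stdio.h>

            enum mode_sketch { MODE_PR, MODE_PW };          /* simplified LDLM modes */

            static bool modes_compatible(enum mode_sketch a, enum mode_sketch b)
            {
                    return a == MODE_PR && b == MODE_PR;    /* only PR/PR is compatible here */
            }

            /* Would a request (req_mode, req_bits) be blocked by an existing lock
             * (held_mode, held_bits)?  A conflict needs incompatible modes AND
             * intersecting inodebits. */
            static bool ibits_blocked(enum mode_sketch req_mode, uint64_t req_bits,
                                      enum mode_sketch held_mode, uint64_t held_bits)
            {
                    if (modes_compatible(req_mode, held_mode))
                            return false;
                    return (req_bits & held_bits) != 0;
            }

            int main(void)
            {
                    /* An intermediate PW lock on LOOKUP|UPDATE (0x03) really does
                     * conflict with the busy PR getattr lock holding bits 0x13: */
                    printf("PW 0x03 vs PR 0x13 blocked: %d\n",
                           ibits_blocked(MODE_PW, 0x03, MODE_PR, 0x13));   /* prints 1 */

                    /* ...but the DOM glimpse (PR, 0x40) shares no bits with that
                     * waiter, so reprocessing the whole waiting queue can grant it
                     * instead of leaving it stuck behind the conflicting lock: */
                    printf("PR 0x40 vs PW 0x03 blocked: %d\n",
                           ibits_blocked(MODE_PR, 0x40, MODE_PW, 0x03));   /* prints 0 */
                    return 0;
            }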


            tappro Mikhail Pershin added a comment

            It looks like a duplicate of LU-11359.


            People

              Assignee: tappro Mikhail Pershin
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 6
