[LU-11751] racer deadlocks due to DOM glimpse request Created: 10/Dec/18 Updated: 16/Jan/22 Resolved: 16/Jan/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.12.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | James Nunez (Inactive) | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
racer test_1 hangs in locking. This issue looks a lot like LU-10852, but the stack trace is a bit different and this failure is not in a DNE environment. So, opening a new ticket.

Looking at the logs at https://testing.whamcloud.com/test_sets/1c19f7d0-f9d7-11e8-b216-52540065bddc, we can see that many processes on the clients are blocked. From client 2 (vm9), in the client console we see

[46895.114548] LustreError: 11-0: lustre-MDT0000-mdc-ffff9d96fa9a7800: operation ldlm_enqueue to node 10.2.8.143@tcp failed: rc = -107
[46895.115945] Lustre: lustre-MDT0000-mdc-ffff9d96fa9a7800: Connection to lustre-MDT0000 (at 10.2.8.143@tcp) was lost; in progress operations using this service will wait for recovery to complete
[46895.119762] LustreError: 167-0: lustre-MDT0000-mdc-ffff9d96fa9a7800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[46895.133418] LustreError: 2635:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000403:0x942:0x0] error: rc = -5
[46895.140482] LustreError: 2167:0:(vvp_io.c:1495:vvp_io_init()) lustre: refresh file layout [0x200000405:0x70a:0x0] error -108.
[46895.146261] LustreError: 3286:0:(lmv_obd.c:1250:lmv_fid_alloc()) Can't alloc new fid, rc -19
[46895.147487] LustreError: 18765:0:(mdc_locks.c:1257:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -108
[46895.168736] LustreError: 18884:0:(mdc_request.c:1427:mdc_read_page()) lustre-MDT0000-mdc-ffff9d96fa9a7800: [0x200000402:0x23c:0x0] lock enqueue fails: rc = -108
[46895.208950] LustreError: 19579:0:(file.c:216:ll_close_inode_openhandle()) lustre-clilmv-ffff9d96fa9a7800: inode [0x200000402:0x23c:0x0] mdc close failed: rc = -108
[46895.222023] LustreError: 12501:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff9d96fa9a7800: namespace resource [0x200000402:0x23d:0x0].0x0 (ffff9d96f98e9e40) refcount nonzero (1) after lock cleanup; forcing cleanup.
[46895.633613] LustreError: 13031:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -108
[46895.634939] LustreError: 13031:0:(file.c:4393:ll_inode_revalidate_fini()) Skipped 609 previous similar messages
[46895.635482] Lustre: lustre-MDT0000-mdc-ffff9d96fa9a7800: Connection restored to 10.2.8.143@tcp (at 10.2.8.143@tcp)
[46920.333393] INFO: task setfattr:32731 blocked for more than 120 seconds.
[46920.334207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46920.335014] setfattr D ffff9d9695c36eb0 0 32731 638 0x00000080
[46920.335773] Call Trace:
[46920.336068] [<ffffffffae22ae2e>] ? generic_permission+0x15e/0x1d0
[46920.336704] [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
[46920.337337] [<ffffffffae717c17>] __mutex_lock_slowpath+0xc7/0x1d0
[46920.337958] [<ffffffffae716fff>] mutex_lock+0x1f/0x2f
[46920.338471] [<ffffffffae24700f>] vfs_removexattr+0x5f/0x130
[46920.339043] [<ffffffffae247135>] removexattr+0x55/0x80
[46920.339578] [<ffffffffae230a7d>] ? putname+0x3d/0x60
[46920.340101] [<ffffffffae231c92>] ? user_path_at_empty+0x72/0xc0
[46920.340711] [<ffffffffae2221c8>] ? __sb_start_write+0x58/0x110
[46920.341297] [<ffffffffae72056c>] ? __do_page_fault+0x1bc/0x4f0
[46920.341900] [<ffffffffae241d9c>] ? mnt_want_write+0x2c/0x50
[46920.342455] [<ffffffffae247f34>] SyS_removexattr+0x94/0xd0
[46920.343017] [<ffffffffae72579b>] system_call_fastpath+0x22/0x27
[46920.343616] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.344266] INFO: task cp:3161 blocked for more than 120 seconds.
[46920.344878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46920.345642] cp D ffff9d96e3c45ee0 0 3161 645 0x00000080
[46920.346377] Call Trace:
[46920.346659] [<ffffffffae2d5152>] ? security_inode_permission+0x22/0x30
[46920.347301] [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70
[46920.347946] [<ffffffffae717c17>] __mutex_lock_slowpath+0xc7/0x1d0
[46920.348543] [<ffffffffae716fff>] mutex_lock+0x1f/0x2f
[46920.349083] [<ffffffffae710bee>] lookup_slow+0x33/0xa7
[46920.349608] [<ffffffffae22e7a8>] path_lookupat+0x838/0x8b0
[46920.350178] [<ffffffffae1970cb>] ? unlock_page+0x2b/0x30
[46920.350732] [<ffffffffae1faf39>] ? kmem_cache_alloc+0x179/0x1f0
[46920.351320] [<ffffffffae230aef>] ? getname_flags+0x4f/0x1a0
[46920.351894] [<ffffffffae22e84b>] filename_lookup+0x2b/0xc0
[46920.352442] [<ffffffffae231c87>] user_path_at_empty+0x67/0xc0
[46920.353036] [<ffffffffae127b72>] ? from_kgid_munged+0x12/0x20
[46920.353630] [<ffffffffae2251df>] ? cp_new_stat+0x14f/0x180
[46920.354180] [<ffffffffae231cf1>] user_path_at+0x11/0x20
[46920.354718] [<ffffffffae224cd3>] vfs_fstatat+0x63/0xc0
[46920.355234] [<ffffffffae22523e>] SYSC_newstat+0x2e/0x60
[46920.355776] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.356427] [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
[46920.357093] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.357748] [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
[46920.358394] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.359050] [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
[46920.359715] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.360355] [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
[46920.361018] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.361676] [<ffffffffae7256d5>] ? system_call_after_swapgs+0xa2/0x146
[46920.362314] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.362967] [<ffffffffae22551e>] SyS_newstat+0xe/0x10
[46920.363469] [<ffffffffae72579b>] system_call_fastpath+0x22/0x27
[46920.364082] [<ffffffffae7256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.364739] INFO: task ln:3169 blocked for more than 120 seconds.
[46920.365340] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46920.366106] ln D ffff9d96d8966eb0 0 3169 616 0x00000080
[46920.366863] Call Trace:
[46920.367122] [<ffffffffae22e092>] ? path_lookupat+0x122/0x8b0
[46920.367698] [<ffffffffae1f8e19>] ? ___slab_alloc+0x209/0x4f0
[46920.368264] [<ffffffffae719e59>] schedule_preempt_disabled+0x29/0x70

On client 1 (vm10), we see dir_create, ls, mkdir and other calls blocked

[46775.380960] LustreError: 167-0: lustre-MDT0000-mdc-ffff9b35bbd7d000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[46775.392616] LustreError: 14241:0:(file.c:4393:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000401:0x122:0x0] error: rc = -5
[46775.394009] LustreError: 14241:0:(file.c:4393:ll_inode_revalidate_fini()) Skipped 2 previous similar messages
[46775.395224] LustreError: 30588:0:(mdc_locks.c:1257:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -108
[46775.402186] LustreError: 18530:0:(file.c:216:ll_close_inode_openhandle()) lustre-clilmv-ffff9b35bbd7d000: inode [0x200000404:0x148:0x0] mdc close failed: rc = -108
[46775.403682] LustreError: 18530:0:(file.c:216:ll_close_inode_openhandle()) Skipped 1 previous similar message
[46775.415258] LustreError: 17871:0:(llite_lib.c:1547:ll_md_setattr()) md_setattr fails: rc = -108
[46775.421674] LustreError: 18956:0:(lmv_obd.c:1250:lmv_fid_alloc()) Can't alloc new fid, rc -19
[46775.431989] Lustre: lustre-MDT0000-mdc-ffff9b35bbd7d000: Connection restored to 10.2.8.143@tcp (at 10.2.8.143@tcp)
[46920.246223] INFO: task dir_create.sh:4489 blocked for more than 120 seconds.
[46920.247112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46920.247952] dir_create.sh D ffff9b35baf1af70 0 4489 4447 0x00000080
[46920.248755] Call Trace:
[46920.249111] [<ffffffffc0ea57e2>] ? ll_dcompare+0x72/0x2e0 [lustre]
[46920.249780] [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
[46920.250439] [<ffffffff94317c17>] __mutex_lock_slowpath+0xc7/0x1d0
[46920.251074] [<ffffffff94316fff>] mutex_lock+0x1f/0x2f
[46920.251612] [<ffffffff94310bee>] lookup_slow+0x33/0xa7
[46920.252169] [<ffffffff93e2d01f>] link_path_walk+0x80f/0x8b0
[46920.252746] [<ffffffff93e30205>] path_openat+0xb5/0x640
[46920.253285] [<ffffffff93e31dbd>] do_filp_open+0x4d/0xb0
[46920.253848] [<ffffffff93e3f1e4>] ? __alloc_fd+0xc4/0x170
[46920.254401] [<ffffffff93e1e0d7>] do_sys_open+0x137/0x240
[46920.254959] [<ffffffff943256d5>] ? system_call_after_swapgs+0xa2/0x146
[46920.255631] [<ffffffff93e1e1fe>] SyS_open+0x1e/0x20
[46920.256150] [<ffffffff9432579b>] system_call_fastpath+0x22/0x27
[46920.256760] [<ffffffff943256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.257485] INFO: task ls:6624 blocked for more than 120 seconds.
[46920.258108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46920.258901] ls D ffff9b359eb6eeb0 0 6624 4495 0x00000080
[46920.259661] Call Trace:
[46920.259932] [<ffffffff93cc6081>] ? in_group_p+0x31/0x40
[46920.260487] [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70
[46920.261139] [<ffffffff94317c17>] __mutex_lock_slowpath+0xc7/0x1d0
[46920.261777] [<ffffffff94316fff>] mutex_lock+0x1f/0x2f
[46920.262300] [<ffffffff93e2f11f>] do_last+0x28f/0x12c0
[46920.262826] [<ffffffff93e30227>] path_openat+0xd7/0x640
[46920.263362] [<ffffffff93e30aef>] ? getname_flags+0x4f/0x1a0
[46920.263957] [<ffffffffc0eb2159>] ? ll_file_data_put+0x89/0x180 [lustre]
[46920.264625] [<ffffffff93e31dbd>] do_filp_open+0x4d/0xb0
[46920.265173] [<ffffffff93e3f1e4>] ? __alloc_fd+0xc4/0x170
[46920.265739] [<ffffffff93e1e0d7>] do_sys_open+0x137/0x240
[46920.266281] [<ffffffff943256d5>] ? system_call_after_swapgs+0xa2/0x146
[46920.266939] [<ffffffff93e1e214>] SyS_openat+0x14/0x20
[46920.267473] [<ffffffff9432579b>] system_call_fastpath+0x22/0x27
[46920.268081] [<ffffffff943256e1>] ? system_call_after_swapgs+0xae/0x146
[46920.268778] INFO: task ln:8693 blocked for more than 120 seconds.
[46920.269388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[46920.270180] ln D ffff9b3595468fd0 0 8693 4475 0x00000080
[46920.270940] Call Trace:
[46920.271201] [<ffffffff93e2e092>] ? path_lookupat+0x122/0x8b0
[46920.271790] [<ffffffff94319e59>] schedule_preempt_disabled+0x29/0x70

On the MDS (vm12), we see many stack traces like the following

[46559.329558] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-41vm10,onyx-41vm9.onyx.whamcloud.com DURATION=900 ============ 09:34:14 (1544088854)
[46564.017497] LustreError: 5798:0:(mdt_lvb.c:430:mdt_lvbo_fill()) lustre-MDT0000: small buffer size 544 for EA 568 (max_mdsize 568): rc = -34
[46643.846755] LNet: Service thread pid 5790 was inactive for 62.06s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[46643.848574] Pid: 5790, comm: mdt00_011 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Wed Dec 5 03:41:24 UTC 2018
[46643.849563] Call Trace:
[46643.849882] [<ffffffffc0ef8031>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[46643.850848] [<ffffffffc0ef8dcc>] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[46643.851624] [<ffffffffc11e951b>] mdt_object_local_lock+0x50b/0xb20 [mdt]
[46643.852657] [<ffffffffc11e9ba0>] mdt_object_lock_internal+0x70/0x360 [mdt]
[46643.853470] [<ffffffffc11e9f47>] mdt_object_lock_try+0x27/0xb0 [mdt]
[46643.854167] [<ffffffffc11eb687>] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
[46643.854970] [<ffffffffc11f2ba5>] mdt_intent_getattr+0x2b5/0x480 [mdt]
[46643.855694] [<ffffffffc11efa08>] mdt_intent_policy+0x2e8/0xd00 [mdt]
[46643.856459] [<ffffffffc0edeec6>] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
[46643.857209] [<ffffffffc0f078a7>] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
[46643.858036] [<ffffffffc0f8e242>] tgt_enqueue+0x62/0x210 [ptlrpc]
[46643.858866] [<ffffffffc0f9529a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[46643.859648] [<ffffffffc0f3991b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[46643.860560] [<ffffffffc0f3d24c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
[46643.861315] [<ffffffff95cbdf21>] kthread+0xd1/0xe0
[46643.861876] [<ffffffff963255f7>] ret_from_fork_nospec_end+0x0/0x39
[46643.862590] [<ffffffffffffffff>] 0xffffffffffffffff
[46643.863234] LustreError: dumping log to /tmp/lustre-log.1544088939.5790
…
[47031.433294] Pid: 5782, comm: mdt00_004 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Wed Dec 5 03:41:24 UTC 2018
[47031.434325] Call Trace:
[47031.434609] [<ffffffffc0ef8031>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
[47031.435489] [<ffffffffc0ef8dcc>] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
[47031.436322] [<ffffffffc11e951b>] mdt_object_local_lock+0x50b/0xb20 [mdt]
[47031.437360] [<ffffffffc11e9ba0>] mdt_object_lock_internal+0x70/0x360 [mdt]
[47031.438194] [<ffffffffc11ec09a>] mdt_object_find_lock+0x6a/0x1a0 [mdt]
[47031.438983] [<ffffffffc120aa5e>] mdt_reint_setxattr+0x1ce/0xfd0 [mdt]
[47031.439782] [<ffffffffc12085d3>] mdt_reint_rec+0x83/0x210 [mdt]
[47031.440507] [<ffffffffc11e51a3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[47031.441242] [<ffffffffc11f0487>] mdt_reint+0x67/0x140 [mdt]
[47031.441942] [<ffffffffc0f9529a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[47031.442833] [<ffffffffc0f3991b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[47031.443725] [<ffffffffc0f3d24c>] ptlrpc_main+0xafc/0x1fb0 [ptlrpc]
[47031.444586] [<ffffffff95cbdf21>] kthread+0xd1/0xe0
[47031.445234] [<ffffffff963255f7>] ret_from_fork_nospec_end+0x0/0x39
[47031.445925] [<ffffffffffffffff>] 0xffffffffffffffff
[47031.446664] LustreError: dumping log to /tmp/lustre-log.1544089327.5782

Additional logs for this issue are at |
| Comments |
| Comment by Alex Zhuravlev [ 12/Dec/18 ] |
|
I was unable to reproduce this with RACER_ENABLE_DOM=false
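For reference, a hedged sketch of this reproduction attempt; the auster invocation below is illustrative and depends on the local test configuration, while RACER_ENABLE_DOM is the switch referred to above:

# Illustrative only: re-run the racer suite with the DoM sub-load disabled,
# using the same DURATION as the failed run in the description.
cd lustre/tests
RACER_ENABLE_DOM=false DURATION=900 ./auster -v racer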
|
| Comment by Peter Jones [ 12/Dec/18 ] |
|
Mike, is this something that you already know about? Peter |
| Comment by Mikhail Pershin [ 12/Dec/18 ] |
|
it looks like |
| Comment by Mikhail Pershin [ 27/Apr/19 ] |
|
This looks like I think that patch from |
| Comment by Patrick Farrell (Inactive) [ 27/Apr/19 ] |
|
Mike, is it not possible to make the PR DOM lock part of the getattr request on the MDC? Or would that cause problems (or at least be wasteful) if there was an existing DOM lock? |
| Comment by Mikhail Pershin [ 28/Apr/19 ] |
|
Patrick, it already works that way when we can return the DOM lock; if we cannot, a separate glimpse is issued to get the size. What I am thinking about is issuing the glimpse from the server in advance when we cannot return the DOM bit, since a glimpse request will follow for sure. |
| Comment by Mikhail Pershin [ 28/Apr/19 ] |
|
But first I have to check how glimpse works right now; my understanding is that it should not be blocked in the waiting queue. |
| Comment by Stephane Thiell [ 28/Apr/19 ] |
|
I believe we are getting this issue right now, example on FID 0x2c001ad81:0xe26:0x0 but there are others (0x13 + 0x40):

Apr 28 10:16:47 fir-md1-s1 kernel: LustreError: 35049:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556471717, 90s ago); not entering recovery in server code, just going back to sleep
ns: mdt-fir-MDT0002_UUID lock: ffff8b5012ae7980/0x378007fdc6c24f96 lrc: 3/1,0 mode: --/PR res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x13/0x8 rrc: 143 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 35049 timeout: 0 lvb_type: 0
Apr 28 10:17:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.8.7.17@o2ib6
ns: mdt-fir-MDT0002_UUID lock: ffff8b51f73e72c0/0x378007fdc5f51cf6 lrc: 3/0,0 mode: PW/PW res: [0x2c001ad81:0xe26:0x0].0x0 bits 0x40/0x0 rrc: 143 type: IBT flags: 0x60200400000020 nid: 10.8.7.17@o2ib6 remote: 0x1b8755abd081ffc4 expref: 10 pid: 34487 timeout: 471701 lvb_type: 0

[root@fir-rbh01 ~]# lfs fid2path /fir 0x2c001ad81:0xe26:0x0
/fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput
[root@fir-rbh01 ~]# stat /fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput
  File: ‘/fir/users/jmz/geo_activity/activitySplit/logs/testRunOutput’
  Size: 956        Blocks: 16         IO Block: 4194304 regular file
Device: e64e03a8h/3863872424d   Inode: 198160228309536294  Links: 1
Access: (0644/-rw-r--r--)  Uid: (331789/     jmz)   Gid: (11886/    euan)
Access: 2019-04-28 01:46:50.000000000 -0700
Modify: 2019-04-28 10:28:15.000000000 -0700
Change: 2019-04-28 10:28:15.000000000 -0700
 Birth: -

followed after some time by the typical ldlm_completion_ast traces. Even restarting the MDTs doesn't seem to fix the issue... what can we do apart from remounting with abort_recov? |
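For reference, a hedged example of narrowing this down without a full abort_recov remount; the target name, namespace and NID below are taken from the log excerpt above and are illustrative only:

# Map the stuck resource back to a path (as done above) and check how many
# locks the MDT namespace currently holds.
lfs fid2path /fir 0x2c001ad81:0xe26:0x0
lctl get_param ldlm.namespaces.mdt-fir-MDT0002_UUID.lock_count
# As a last resort, evict only the client NID holding the blocking lock
# instead of remounting with abort_recov (run on the MDS owning the target).
lctl set_param mdt.fir-MDT0002.evict_client=nid:10.8.7.17@o2ib6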
| Comment by Stephane Thiell [ 28/Apr/19 ] |
|
Attached fir-md1-s1-kernel-20190428.log |
| Comment by Stephane Thiell [ 28/Apr/19 ] |
|
I commented in |
| Comment by Mikhail Pershin [ 29/Apr/19 ] |
|
Stephane, thanks for the logs. I haven't found the reason yet; working on that. |
| Comment by Stephane Thiell [ 01/May/19 ] |
|
Mike, I just checked and since this last event, about two days now, I haven't seen any new ldlm_completion_ast call trace at all on either MDS, which is surprising (in a good way!). Before that, we had many of them per day. But after the problem reported above, I added the patch for 5081aa7 Also we're now 100% sure that all clients are running with I'll update if things change (which usually occurs shortly after posting a comment here... |
| Comment by Stephane Thiell [ 06/May/19 ] |
|
Quick update... our Fir system (running the patches above) was stable all last week and over the weekend. Note that we're "only" running patchset #11 of |
| Comment by Peter Jones [ 11/May/19 ] |
|
This seems encouraging enough to consider this ticket a duplicate (though it is hard to pin down exactly which patch it is a duplicate of). |
| Comment by James Nunez (Inactive) [ 10/Feb/20 ] |
|
Mike - I think we are still seeing this issue. Would you please take a look at the racer hang at https://testing.whamcloud.com/test_sets/611e867e-4b76-11ea-a1c8-52540065bddc ? If this is not the same hang, I'll open a new ticket. I know this ticket was closed as a duplicate, but I don't know what ticket it is duplicating and, thus, can't see if that one is still open. |
| Comment by Mikhail Pershin [ 10/Feb/20 ] |
|
James, I don't see evidence for that so far: neither server nor client has traces of DOM locks, but the server stack traces contain osp_md_object_lock and mdt_rename_lock calls, so I tend to think this is a DNE issue, like |
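For reference, a hedged illustration of the triage described above, grepping the saved server console logs for DoM versus DNE indicators (the log file name is a placeholder):

# Illustrative only: DoM involvement would show dom/glimpse frames in the MDS
# stack dumps, while osp_md_object_lock/mdt_rename_lock point at cross-MDT
# (DNE) locking instead.
grep -E 'mdt_dom|ldlm_glimpse' mds-console.log
grep -E 'osp_md_object_lock|mdt_rename_lock' mds-console.log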