Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
None
-
None
-
3
-
6361
Description
We hit a CL_LOCK_ASSERT in cl_lock_hold_release() on about 20 clients when they were evicted. Note that we were testing 2.1.1-17chaos (which includes patchset 10 of the LU-1299 patch).
Here is a relevant log excerpt from the client:
2012-07-22 18:38:14 LustreError: 11-0: lc2-OST000b-osc-ffff880434f91800: Communicating with 10.1.1.48@o2ib9, operation ost_read failed with -107.
2012-07-22 18:38:14 LustreError: Skipped 6 previous similar messages
2012-07-22 18:38:14 Lustre: lc2-OST000b-osc-ffff880434f91800: Connection to lc2-OST000b (at 10.1.1.48@o2ib9) was lost; in progress operations using this service will wait for recovery to complete
2012-07-22 18:38:14 LustreError: 167-0: This client was evicted by lc2-OST000b; in progress operations using this service will fail.
2012-07-22 18:38:14 LustreError: 84597:0:(osc_lock.c:816:osc_ldlm_completion_ast()) lock@ffff8806c64d5978[4755 3 0 1 1 00000000] R(1):[491520, 665087]@[0x1000b0000:0x71c66f4:0x0] {
2012-07-22 18:38:14 LustreError: 84597:0:(osc_lock.c:816:osc_ldlm_completion_ast()) lovsub@ffff880781ea9ca0: [0 ffff88070951a200 R(1):[983040, 1329919]@[0x20018eee8:0x1f2e2:0x0]]
2012-07-22 18:38:14 LustreError: 84597:0:(osc_lock.c:816:osc_ldlm_completion_ast()) osc@ffff88077320eb50: ffff880822868240 40100001 0x798339c36387a9e3 3 ffff8808336bdad0 size: 4294967296 mtime: 1343005207 atime: 1343002227 ctime: 1343005207 blocks: 8316944
2012-07-22 18:38:14 LustreError: 84597:0:(osc_lock.c:816:osc_ldlm_completion_ast()) } lock@ffff8806c64d5978
2012-07-22 18:38:14 LustreError: 84597:0:(osc_lock.c:816:osc_ldlm_completion_ast()) dlmlock returned -5
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1412:cl_unuse_try()) result = -5, this is unlikely!
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1426:cl_unuse_locked()) lock@ffff8806c64d5438[1 3 0 1 0 00000000] R(1):[983040, 1329919]@[0x20018eee8:0x1f2e2:0x0] {
2012-07-22 18:38:14 LustreError: 84597:0:(ldlm_resource.c:755:ldlm_resource_complain()) Namespace lc2-OST000b-osc-ffff880434f91800 resource refcount nonzero (7) after lock cleanup; forcing cleanup.
2012-07-22 18:38:14 LustreError: 84597:0:(ldlm_resource.c:761:ldlm_resource_complain()) Resource: ffff8807c9fb56c0 (119301876/0/0/0) (rc: 7)
2012-07-22 18:38:14 Lustre: lc2-OST000b-osc-ffff880434f91800: Connection restored to lc2-OST000b (at 10.1.1.48@o2ib9)
2012-07-22 18:38:14 Lustre: Skipped 1 previous similar message
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1426:cl_unuse_locked()) vvp@ffff880774bb94e8:
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1426:cl_unuse_locked()) lov@ffff88070951a200: 1
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1426:cl_unuse_locked()) 0 0: ---
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1426:cl_unuse_locked())
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1426:cl_unuse_locked()) } lock@ffff8806c64d5438
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:1426:cl_unuse_locked()) unuse return -5
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release()) lock@ffff8806c64d5438[1 3 0 0 0 00000000] R(1):[983040, 1329919]@[0x20018eee8:0x1f2e2:0x0] {
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release()) vvp@ffff880774bb94e8:
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release()) lov@ffff88070951a200: 1
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release()) 0 0: ---
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release())
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release()) } lock@ffff8806c64d5438
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release()) failed at lock->cll_state != CLS_HELD.
2012-07-22 18:38:14 LustreError: 84123:0:(cl_lock.c:906:cl_lock_hold_release()) LBUG