[LU-1527] Assertion at: (cl_lock.c:2211:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING ) Created: 15/Jun/12  Updated: 27/Jan/14  Resolved: 13/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.6
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Jinshan Xiong (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: mn1

Issue Links:
Related
is related to LU-2128 cl_lock_hold_add()) ASSERTION( lock->... Closed
Severity: 3
Rank (Obsolete): 5266

 Description   

The assertion fails in the client's ll_close thread with this backtrace:

LustreError: 16393:0:(cl_lock.c:2211:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING ) failed:
LustreError: 16393:0:(cl_lock.c:2211:cl_lock_hold_add()) LBUG
Pid: 16393, comm: ll_close

Call Trace:
[<ffffffffa0ce3905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0ce3f17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0e64f71>] cl_lock_hold_add+0x141/0x150 [obdclass]
[<ffffffffa0e67c16>] cl_lock_peek+0x96/0x150 [obdclass]
[<ffffffffa0a0ac66>] cl_local_size+0x266/0x2f0 [lustre]
[<ffffffffa09ce846>] ll_done_writing_attr+0x46/0x190 [lustre]
[<ffffffffa09cec89>] ll_ioepoch_close+0x2f9/0x600 [lustre]
[<ffffffffa09d05d0>] ll_close_thread+0x2a0/0x1060 [lustre]
[<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
[<ffffffffa09d0330>] ? ll_close_thread+0x0/0x1060 [lustre]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa09d0330>] ? ll_close_thread+0x0/0x1060 [lustre]
[<ffffffffa09d0330>] ? ll_close_thread+0x0/0x1060 [lustre]
[<ffffffff8100c140>] ? child_rip+0x0/0x20

LustreError: dumping log to /tmp/lustre-log.1339749052.16393



 Comments   
Comment by Jinshan Xiong (Inactive) [ 15/Jun/12 ]

We should check in cl_lock_peek() whether the lock is being freed (cll_state == CLS_FREEING) before adding a hold to it.
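The check described above can be illustrated with a minimal, single-threaded toy model. It is only a sketch: `toy_lock`, `toy_hold_add()`, `toy_lock_peek()` and the `TOY_CLS_*` states are hypothetical stand-ins, not the real Lustre `cl_lock` API and not the landed patch. The idea is that a peek-style lookup treats a lock that is already being freed as "not found" instead of trying to add a hold and tripping the assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified lock states; NOT the real Lustre enum. */
enum toy_lock_state {
    TOY_CLS_NEW,
    TOY_CLS_HELD,
    TOY_CLS_CACHED,
    TOY_CLS_FREEING
};

/* Hypothetical, simplified lock: just a state and a hold count. */
struct toy_lock {
    enum toy_lock_state state;
    int holds;
};

/* Mirrors the invariant that fired in the LBUG: a hold must never be
 * added to a lock that is being freed. */
static void toy_hold_add(struct toy_lock *lock)
{
    assert(lock->state != TOY_CLS_FREEING);
    lock->holds++;
}

/* Peek-style lookup: return the cached lock with a hold taken, or NULL.
 * Checking the state before toy_hold_add() is the suggested fix: a lock
 * in TOY_CLS_FREEING is skipped rather than asserted on. */
static struct toy_lock *toy_lock_peek(struct toy_lock *cached)
{
    if (cached == NULL || cached->state == TOY_CLS_FREEING)
        return NULL;
    toy_hold_add(cached);
    return cached;
}
```

This toy model ignores concurrency entirely. In the real client the state test and the hold would presumably have to be done atomically with respect to the thread that is freeing the lock (for example, while holding the lock's own mutex); otherwise the lock could move to the freeing state between the check and the hold.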

Comment by Jinshan Xiong (Inactive) [ 15/Jun/12 ]

Patch is at: http://review.whamcloud.com/3117

Comment by Andreas Dilger [ 15/Oct/12 ]

I hit this on sanity.sh test_132() in my local single-node test setup (dual-core x86_64):

Lustre: 2247:0:(ofd_obd.c:1067:ofd_orphans_destroy()) testfs-OST0002: deleting orphan objects from 47607 to 47694
Lustre: 2247:0:(ofd_obd.c:1067:ofd_orphans_destroy()) Skipped 6 previous similar messages
LustreError: 2233:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 8000: rc -2
LustreError: 2233:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 20 previous similar messages
LustreError: 2242:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 7832: rc -2
LustreError: 2242:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 2 previous similar messages
Lustre: Mounted testfs-client
LustreError: 2225:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 7848: rc -2
LustreError: 2225:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 33 previous similar messages
Lustre: DEBUG MARKER: Using TIMEOUT=20
LustreError: 2224:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 28838: rc -2
LustreError: 2224:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 262 previous similar messages
LustreError: 2232:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 45068: rc -2
LustreError: 2232:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 5436 previous similar messages
Lustre: 2094:0:(mdt_handler.c:5649:mdt_connect_internal()) testfs-MDT0000: MDS has SOM enabled, but client does not support it
Lustre: DEBUG MARKER: cancel_lru_locks osc start
LustreError: 2526:0:(cl_lock.c:2176:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING ) failed: 
LustreError: 2526:0:(cl_lock.c:2176:cl_lock_hold_add()) LBUG
Pid: 2526, comm: ll_close

Call Trace:
[<ffffffffa078c905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa078cf17>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0980ac1>] cl_lock_hold_add+0x141/0x150 [obdclass]
[<ffffffffa0983736>] cl_lock_peek+0x96/0x150 [obdclass]
[<ffffffffa117b1b6>] cl_local_size+0x266/0x2f0 [lustre]
[<ffffffffa113df36>] ll_done_writing_attr+0x46/0x190 [lustre]
[<ffffffffa113e379>] ll_ioepoch_close+0x2f9/0x600 [lustre]
[<ffffffffa113fcc0>] ll_close_thread+0x2a0/0x1060 [lustre]
[<ffffffff81060250>] ? default_wake_function+0x0/0x20

The lvbo_init errors are unexpected in this test, so I'm not sure whether they triggered the failure or are unrelated to it.

Comment by Jinshan Xiong (Inactive) [ 13/Nov/12 ]

Patch has landed.

Comment by Stephen Champion [ 28/May/13 ]

FYI, I ran into this while running acceptance tests on a SLES11SP1 node with a Lustre 2.1.5 client.
Andriy's patch looks applicable to b2_1 as well.
I just passed an acceptance run using a 2.1.5 base plus this patch.
Some whitespace changes were required to apply it.

Comment by Jian Yu [ 27/Jan/14 ]

While testing patch http://review.whamcloud.com/8835 on Lustre b2_1 branch, the same failure occurred:

17:17:19:LustreError: 15141:0:(cl_lock.c:2188:cl_lock_hold_add()) ASSERTION( lock->cll_state != CLS_FREEING ) failed: 
17:17:19:LustreError: 15141:0:(cl_lock.c:2188:cl_lock_hold_add()) LBUG

Maloo report: https://maloo.whamcloud.com/test_sets/ec280652-8726-11e3-8928-52540035b04c

Generated at Sat Feb 10 01:17:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.