running truncated executable causes spewing of lock debug messages (LU-1299)

[LU-1356] Assertion triggered in osc_lock_wait() Created: 30/Apr/12  Updated: 04/May/12  Resolved: 04/May/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1
Fix Version/s: None

Type: Technical task Priority: Minor
Reporter: Prakash Surya (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Environment:

https://github.com/chaos/lustre


Rank (Obsolete): 10245

 Description   

We managed to trigger the following assertion:

2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel()) lock@ffff88021e21caf8[1 1 -5 0 0 00000005] P(0):[0, 18446744073709551615]@[0x25b1608289:0x3:0x0] {
2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel())     vvp@ffff880178f60498: 
2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel())     lov@ffff880178d92268: 2
2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel())     0 0: ---
2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel())     1 0: ---
2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel()) 
2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel()) } lock@ffff88021e21caf8
2012-04-29 17:22:30 LustreError: 29592:0:(lov_lock.c:781:lov_lock_cancel()) lov_lock_cancel fails with -5.
...
2012-04-29 17:22:30 LustreError: 29592:0:(osc_lock.c:1225:osc_lock_wait()) ASSERTION(equi(olck->ols_state >= OLS_UPCALL_RECEIVED && lock->cll_error == 0, olck->ols_lock != NULL)) failed
2012-04-29 17:22:30 LustreError: 29592:0:(osc_lock.c:1225:osc_lock_wait()) LBUG
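
For readers unfamiliar with the failed check, the following is a minimal, self-contained sketch of what the assertion expresses. It assumes that equi(a, b) means logical equivalence (both operands true or both false), as the name suggests. The struct, the enum and every sketch_/SK_ name are hypothetical stand-ins for illustration, not the real osc definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Logical equivalence: true when both operands have the same truth value.
 * Assumed to match the equi() helper named in the assertion text. */
#define equi(a, b) (!!(a) == !!(b))

/* Hypothetical, simplified lock states; only their ordering matters here. */
enum sketch_ols_state {
    SK_OLS_NEW,
    SK_OLS_ENQUEUED,
    SK_OLS_UPCALL_RECEIVED,
    SK_OLS_GRANTED,
};

/* Hypothetical stand-in for the fields the assertion inspects. */
struct sketch_lock {
    enum sketch_ols_state state; /* stands in for olck->ols_state */
    int error;                   /* stands in for lock->cll_error */
    const void *dlm_lock;        /* stands in for olck->ols_lock  */
};

/* True when the invariant holds: a lock handle is attached if and only if
 * the upcall has been received and no error is recorded on the lock. */
static bool sketch_invariant_holds(const struct sketch_lock *l)
{
    return equi(l->state >= SK_OLS_UPCALL_RECEIVED && l->error == 0,
                l->dlm_lock != NULL);
}

/* Mirrors the failure mode: aborts when the invariant is violated. */
static void sketch_check(const struct sketch_lock *l)
{
    assert(sketch_invariant_holds(l));
}
```

Under these assumptions, the check fails in two cases: a lock that has reached the upcall-received state without error but has no handle attached, or a lock that still carries a handle while either an error (such as -5, which is -EIO on Linux) is recorded or the upcall has not yet been received. The sketch does not establish which of these occurred in the crash above.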


 Comments   
Comment by Ned Bass [ 30/Apr/12 ]

Backtrace.

PID: 29592  TASK: ffff88043541d500  CPU: 5   COMMAND: "rm"
 #0 [ffff880155ea9920] machine_kexec at ffffffff8103216b
 #1 [ffff880155ea9980] crash_kexec at ffffffff810b8c12
 #2 [ffff880155ea9a50] panic at ffffffff814ee6cf
 #3 [ffff880155ea9ad0] lbug_with_loc at ffffffffa03e0e1b [libcfs]
 #4 [ffff880155ea9af0] libcfs_assertion_failed at ffffffffa03ea42d [libcfs]
 #5 [ffff880155ea9b10] osc_lock_wait at ffffffffa085f419 [osc]
 #6 [ffff880155ea9b30] cl_wait_try at ffffffffa0558a65 [obdclass]
 #7 [ffff880155ea9b70] lov_lock_enqueue at ffffffffa08b81e4 [lov]
 #8 [ffff880155ea9c00] cl_enqueue_try at ffffffffa055a91c [obdclass]
 #9 [ffff880155ea9c50] cl_enqueue_locked at ffffffffa055bf4d [obdclass]
#10 [ffff880155ea9c90] cl_lock_request at ffffffffa055c20e [obdclass]
#11 [ffff880155ea9cf0] cl_io_lock at ffffffffa0560080 [obdclass]
#12 [ffff880155ea9d50] cl_io_loop at ffffffffa05602e2 [obdclass]
#13 [ffff880155ea9da0] ll_file_io_generic at ffffffffa093423b [lustre]
#14 [ffff880155ea9e20] ll_file_aio_write at ffffffffa093938c [lustre]
#15 [ffff880155ea9e80] ll_file_write at ffffffffa093963c [lustre]
#16 [ffff880155ea9ef0] vfs_write at ffffffff81177888
#17 [ffff880155ea9f30] sys_write at ffffffff81178291
#18 [ffff880155ea9f80] system_call_fastpath at ffffffff8100b0f2
    RIP: 00002aaaaada5a10  RSP: 00007fffffffa918  RFLAGS: 00010246
    RAX: 0000000000000001  RBX: ffffffff8100b0f2  RCX: 0000000000000002
    RDX: 0000000000000004  RSI: 00007fffffffa980  RDI: 0000000000000002
    RBP: 00007fffffffa980   R8: 00002aaaab05db20   R9: 0000000000000000
    R10: 00000000ffffffff  R11: 0000000000000246  R12: 0000000000000004
    R13: 00002aaaab057860  R14: 0000000000000004  R15: 00002aaaab057860
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
Comment by Ned Bass [ 30/Apr/12 ]

Here is the patch stack that was running at the time.

https://github.com/chaos/lustre/commits/2.1.1-10chaos

Comment by Jinshan Xiong (Inactive) [ 01/May/12 ]

Can you please upload the core dump? Thanks.

Comment by Ned Bass [ 02/May/12 ]

See LU-1356-core-dump.tar.gz on ftp.whamcloud.com.

Comment by Jinshan Xiong (Inactive) [ 02/May/12 ]

Hi Ned, will you please upload the Lustre module files as well? Thanks.

Comment by Ned Bass [ 02/May/12 ]

See LU-1356-lustre-modules-2.1.1-10chaos_2.6.32_220.13.1.2chaos.tar.gz on ftp.whamcloud.com

Comment by Jinshan Xiong (Inactive) [ 03/May/12 ]

I can confirm that this issue was introduced by LU-1299; I'm working on it.

Comment by Jinshan Xiong (Inactive) [ 04/May/12 ]

Fixed in LU-1299.
