[LU-4900] cl_use_try()) ASSERTION( result != -38 ) failed
Created: 12/Apr/14  Updated: 21/Jan/15  Resolved: 21/Jan/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: John Hammond Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: HB, clio, llite

Severity: 3
Rank (Obsolete): 13536

 Description   

Running racer with OSTCOUNT=6 on 2.5.57-77-g586a609, with the following change applied to file_create.sh:

--- a/lustre/tests/racer/file_create.sh
+++ b/lustre/tests/racer/file_create.sh
@@ -8,8 +8,8 @@ OSTCOUNT=${OSTCOUNT:-$(lfs df $DIR 2> /dev/null | grep -c OST)}
 
 while /bin/true ; do 
        file=$((RANDOM % MAX))
-       SIZE=$((RANDOM * MAX_MB / 32))
-       echo "file_create: FILE=$DIR/$file SIZE=$SIZE"
+       SIZE=$((RANDOM % 4))
+       # echo "file_create: FILE=$DIR/$file SIZE=$SIZE"
        [ $OSTCOUNT -gt 0 ] &&
                lfs setstripe -c $((RANDOM % OSTCOUNT)) $DIR/$file 2> /dev/null
        dd if=/dev/zero of=$DIR/$file bs=1k count=$SIZE 2> /dev/null

I see the following failed assertion in cl_use_try():

LustreError: 16026:0:(cl_lock.c:1115:cl_use_try()) ASSERTION( result != -38 ) failed: 
LustreError: 16026:0:(cl_lock.c:1115:cl_use_try()) LBUG
Pid: 16026, comm: cat

PID: 16026  TASK: ffff8801ef603540  CPU: 0   COMMAND: "cat"
 #0 [ffff8801e6c899f0] machine_kexec at ffffffff81035d6b
 #1 [ffff8801e6c89a50] crash_kexec at ffffffff810c0e22
 #2 [ffff8801e6c89b20] panic at ffffffff8150f01f
 #3 [ffff8801e6c89ba0] lbug_with_loc at ffffffffa02a9eeb [libcfs]
 #4 [ffff8801e6c89bc0] cl_use_try at ffffffffa045f576 [obdclass]
 #5 [ffff8801e6c89c10] cl_enqueue_try at ffffffffa045f70d [obdclass]
 #6 [ffff8801e6c89c60] cl_enqueue_locked at ffffffffa04608ff [obdclass]
 #7 [ffff8801e6c89ca0] cl_lock_request at ffffffffa046154e [obdclass]
 #8 [ffff8801e6c89d00] cl_io_lock at ffffffffa04664f4 [obdclass]
 #9 [ffff8801e6c89d60] cl_io_loop at ffffffffa0466732 [obdclass]
#10 [ffff8801e6c89d90] ll_file_io_generic at ffffffffa0dee34f [lustre]
#11 [ffff8801e6c89e20] ll_file_aio_read at ffffffffa0dee8cf [lustre]
#12 [ffff8801e6c89e80] ll_file_read at ffffffffa0deed8c [lustre]
#13 [ffff8801e6c89ef0] vfs_read at ffffffff81183095
#14 [ffff8801e6c89f30] sys_read at ffffffff811831d1
#15 [ffff8801e6c89f80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003d046db5f0  RSP: 00007fffa2567830  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: 0000003d046e545a
    RDX: 0000000000400000  RSI: 00007f74960c7000  RDI: 0000000000000003
    RBP: 00007f74960c7000   R8: 00000000ffffffff   R9: 0000000000000000
    R10: 0000000000400fff  R11: 0000000000000246  R12: ffffffffffc00000
    R13: 0000000000000003  R14: 0000000000400000  R15: 0000000000000003
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
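
For readers unfamiliar with the numeric code in the assertion: kernel code conventionally returns a negated errno value, so `-38` can be mapped back to a symbolic name. The snippet below is only a minimal sketch for doing that lookup; `decode_errno` is a hypothetical helper name, not part of Lustre, and it uses the errno table of the host where it runs, which is assumed here to be Linux like the crashed client.

```python
import errno
import os


def decode_errno(result: int) -> str:
    """Return the symbolic errno name for a kernel-style negative return code."""
    # Kernel functions return -errno on failure, so negate before the lookup.
    # The table comes from the local platform (assumption: a Linux host).
    return errno.errorcode.get(-result, "unknown")


if __name__ == "__main__":
    code = -38  # value taken from the failed assertion in cl_use_try()
    print(code, decode_errno(code), os.strerror(-code))
```

On a Linux host this should report ENOSYS ("Function not implemented"), so the assertion reads as `result != -ENOSYS`.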


 Comments   
Comment by Jodi Levi (Inactive) [ 10/Nov/14 ]

BobiJam,
Would you be able to look into this one?
Thank you!

Comment by Zhenyu Xu [ 20/Jan/15 ]

b2_7 does not contain this object's clo_use interface, so the issue does not affect b2_7 and should not be a b2_7 stopper.

Comment by Andreas Dilger [ 21/Jan/15 ]

John, are you still able to reproduce this on master? If not, we can close it, because we aren't patching b2_6 or b2_5 for bugs hit only by tests like racer, just for bugs hit by end users.

Comment by John Hammond [ 21/Jan/15 ]

No longer seen on master.
