[LU-4797] ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed Created: 21/Mar/14  Updated: 28/Apr/14  Resolved: 04/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Sebastien Buisson (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Attachments: File eviction     Text File lascaux2890_dump_lustre_pb_eviction_1.gz     Text File lascaux2890_dump_lustre_pb_eviction_2.gz     Text File lascaux2891_dump_lustre_pb_eviction_1.gz     Text File lascaux2891_dump_lustre_pb_eviction_2.gz    
Issue Links:
Duplicate
is duplicated by LU-4693 (lovsub_lock.c:103:lovsub_lock_state(... Resolved
Related
is related to LU-4558 Crash in cl_lock_put on racer Resolved
is related to LU-4591 Related cl_lock failures on master/2.5 Resolved
Severity: 3
Rank (Obsolete): 13202

 Description   

Hi,

After 3 days in production with Lustre 2.4.2, CEA has been hitting the following assertion failure about 5 times a day:

LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) ASSERTION( cl_lock_is_mutexed(slice->cls_lock) ) failed:
LustreError: 4089:0:(lovsub_lock.c:103:lovsub_lock_state()) LBUG
Pid: 4089, comm: %%AQC.P.I.O

Call Trace:
 [<ffffffffa0af4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0af4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa1065d51>] lovsub_lock_state+0x1a1/0x1b0 [lov]
 [<ffffffffa0bd7a88>] cl_lock_state_signal+0x68/0x160 [obdclass]
 [<ffffffffa0bd7bd5>] cl_lock_state_set+0x55/0x190 [obdclass]
 [<ffffffffa0bdb8d9>] cl_enqueue_try+0x149/0x300 [obdclass]
 [<ffffffffa105e0da>] lov_lock_enqueue+0x22a/0x850 [lov]
 [<ffffffffa0bdb88c>] cl_enqueue_try+0xfc/0x300 [obdclass]
 [<ffffffffa0bdcc7f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
 [<ffffffffa0bdd8ee>] cl_lock_request+0x7e/0x270 [obdclass]
 [<ffffffffa0be2b8c>] cl_io_lock+0x3cc/0x560 [obdclass]
 [<ffffffffa0be2dc2>] cl_io_loop+0xa2/0x1b0 [obdclass]
 [<ffffffffa10dba90>] ll_file_io_generic+0x450/0x600 [lustre]
 [<ffffffffa10dc9d2>] ll_file_aio_write+0x142/0x2c0 [lustre]
 [<ffffffffa10dccbc>] ll_file_write+0x16c/0x2a0 [lustre]
 [<ffffffff811895d8>] vfs_write+0xb8/0x1a0
 [<ffffffff81189ed1>] sys_write+0x51/0x90
 [<ffffffff81091039>] ? sys_times+0x29/0x70
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

This issue is very similar to LU-4693, which is itself a duplicate of LU-4692, for which there is unfortunately no fix yet.

Please ask if you need additional information that could help the diagnostic and resolution of the problem.

Sebastien.



 Comments   
Comment by Bruno Faccini (Inactive) [ 21/Mar/14 ]

Hello Sebastien, are there any crash dumps available? If yes, could you extract the debug-log content with the crash-tool extension described in CFS BZ #13155 (the sources to be re-compiled are available, and I know you may need to install and use them on-site)? BTW, what is the default debug mask you run with?

Comment by Sebastien Buisson (Inactive) [ 21/Mar/14 ]

Hi Bruno,

I have forwarded your request to the on-site Support team. Do you want us to attach the requested debug-log content to this ticket? Or could we have a look ourselves and search for something specific?

Cheers,
Sebastien.

Comment by Peter Jones [ 21/Mar/14 ]

Bobijam

Does this appear to be a duplicate of LU-4692? Is there anything additional that would assist with debugging this issue?

Peter

Comment by Zhenyu Xu [ 25/Mar/14 ]

Besides a crash dump, is it possible to find a procedure to reproduce the issue?

Comment by Sebastien Buisson (Inactive) [ 25/Mar/14 ]

Hi Bobijam,

All I know is that the impacted file is a log file to which several processes write.

I have forwarded your request to our on-site Support team.

Cheers,
Sebastien.

Comment by Sebastien Buisson (Inactive) [ 27/Mar/14 ]

The workload should be reproduced by launching the script run_reproducer_2.sh with 4 processes on 2 nodes.

::::::::::::::
run_reproducer_2.sh
::::::::::::::
#!/bin/bash
# Sleep time in microseconds: 1 s base plus 10 ms per SLURM rank, to stagger the tasks.
sleeptime=$(( ( ${SLURM_PROCID} * 10000 ) + 1000000 ))
# Args to reproducer2.sh: <iterations> <log file> <sleep in us> <tag>
reproducer2.sh 10 /<path>/mylog ${sleeptime} ${SLURM_JOBID}_${SLURM_PROCID}
::::::::::::::
reproducer2.sh
::::::::::::::
#!/bin/bash
# $1 = number of iterations, $2 = shared log file, $3 = sleep in microseconds, $4 = tag
for i in $(seq 1 $1)
do
  usleep $3
  echo $(date) $(date '+%N') $4 $3 testing write in append mode >> $2
done
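
For context, a minimal sketch of one way to launch this under SLURM is below; the srun options are an assumption on my part, not taken from the site configuration (srun exports the SLURM_JOBID and SLURM_PROCID variables that the scripts use):

$ srun -N 2 -n 4 ./run_reproducer_2.sh    # 4 tasks spread over 2 nodes, all appending to the same log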
Comment by Christopher Morrone [ 28/Mar/14 ]

LLNL also hit this in testing Lustre version 2.4.2-6chaos on a Lustre client.

Comment by Zhenyu Xu [ 28/Mar/14 ]

I've been trying the reproducer script since yesterday but haven't gotten a hit yet. I made a small change for my VM test environment.

$ cat ~/tmp/reproducer.sh
#!/bin/bash

# Use the PID of a short-lived background job as a per-instance pseudo-random value.
ls > /dev/null &
PID=$!

for i in $(seq 1 10000)
do
echo $PID - $i
# Sleep roughly 1.5 s, staggered per instance by the PID.
usleep $((($PID * 100) + 1500000))
echo "$(date) $(date '+%N') $PID-$i ***** testing write in append mode" >> /mnt/lustre/file
done

On each of the 2 nodes I run "~/tmp/reproducer.sh &" five times; I think the basic idea is the same.
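
A minimal sketch of that launch, as I understand it (the path is mine):

$ for n in $(seq 1 5); do ~/tmp/reproducer.sh & done    # five concurrent instances per node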

Sebastien, how long does it take to hit the issue in your case?

Comment by Jinshan Xiong (Inactive) [ 28/Mar/14 ]

Is it possible to collect a crash dump for this issue?

The only difference between 2.4.1 and 2.4.2 is LU-3027; two patches landed for it. Can anyone please revert them and try again? That way we may get some clues about this issue.
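
For what it's worth, a hedged sketch of how those two patches could be located and reverted on a b2_4 checkout (the commit hashes below are placeholders, not the actual ones):

$ git log --oneline --grep='LU-3027'    # identify the two commits that landed for LU-3027
$ git revert <commit1> <commit2>        # revert them, rebuild the client, re-run the workload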

Comment by Aurelien Degremont (Inactive) [ 28/Mar/14 ]

Please note that Sebastien's script is not a reproducer of this crash, but rather something similar to the workload that leads to it. The script mainly just triggers a lot of evictions very easily.

Comment by Bruno Faccini (Inactive) [ 28/Mar/14 ]

Seb, sorry to only now answer your comment/reply of "21/Mar/14 3:53 PM". Yes, as a first piece of debugging information it would be useful to get the debug-log content extracted from the crash dump. BTW, I hope you run with a default debug-level mask that is sufficient to gather accurate traces for this problem?

Comment by Bruno Faccini (Inactive) [ 28/Mar/14 ]

Aurelien, concerning the evictions that can likely be reproduced on site with this script, is it also possible to get a Lustre debug log, at least from the evicted client side and with the full debug mask/traces enabled?
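
For reference, one common way to capture such a log on the evicted client is sketched below; the buffer size and output path are my own choices, not site settings:

$ lctl set_param debug=-1 debug_mb=512    # enable all debug flags and enlarge the trace buffer
$ lctl clear                              # start from an empty debug buffer
  ... run the reproducer until an eviction is seen ...
$ lctl dk /tmp/lustre-debug-client.log    # dump the kernel debug buffer to a file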

Comment by Sebastien Buisson (Inactive) [ 28/Mar/14 ]

Bobijam,

To make it clear, 2.4.2 is the first b2_4 release used at CEA, so we are not comparing with 2.4.1, but with 2.1.6.

Sebastien.

Comment by Antoine Percher [ 28/Mar/14 ]

About the testcase:

  • the waiting time is about 15x too large compared to the initial reproducer
  • pin each run to a processor/core with taskset, for example when you run
    /tmp/reproducer.sh on the 2 nodes:
    for cpu in $(seq 0 5)   # number of CPUs per node
    do
      taskset -c $cpu /tmp/reproducer.sh &
    done
  • my first testcase was without usleep, with 32 tasks/processes per node and
    1000 writes (echo) per task (see the sketch after this list)
  • no Lustre traces enabled
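
A hedged sketch of what that first testcase might look like (Antoine's original script is not attached; the file path and structure below are my assumptions):

# Run the same loop on both nodes: 32 writer tasks per node, 1000 appends each, no usleep.
for t in $(seq 1 32)
do
  ( for i in $(seq 1 1000)
    do
      echo "$(hostname) $t $i testing write in append mode" >> /mnt/lustre/file
    done ) &
done
wait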
Comment by Zhenyu Xu [ 31/Mar/14 ]

My wait time is 1.5x that of the initial reproducer. I'll try taskset.

Comment by Zhenyu Xu [ 31/Mar/14 ]

Update: tried taskset with usleep 1000000; each CPU ran 250 echo tasks on 2 VM nodes, each node with 2 CPUs. Haven't got a hit yet.

Comment by Bruno Faccini (Inactive) [ 31/Mar/14 ]

Bobi, I am attaching the client-side debug logs that were taken during an on-site run of the reproducer on 2 nodes (lascaux[2890-2891]) which hit evictions:

_ the file "eviction" is a log showing the reproducer run, the errors, and the actions taken.
_ the files lascaux289[0,1]_dump_lustre_pb_eviction_[1,2].gz are the Lustre debug logs taken on both nodes, as referenced in the "eviction" file/trace.

Hope this will give you enough information to qualify the problem.

Comment by Zhenyu Xu [ 02/Apr/14 ]

Still checking the eviction logs. There are lock enqueue and blocking AST call traces intertwined, but that is normal; I have not yet found where the race that causes this mutex assertion violation happens.

Comment by Jinshan Xiong (Inactive) [ 03/Apr/14 ]

Please try patch: http://review.whamcloud.com/9881

I believe this is the same issue in LU-4591.

Comment by Jinshan Xiong (Inactive) [ 04/Apr/14 ]

duplicate of LU-4558

Comment by Antoine Percher [ 10/Apr/14 ]

Hi Zhenyu Xu,
Could you explain why you think "there are lock enqueue and blocking ast call traces intertwined" and the evictions are normal? I don't understand why there is so much contention here, since we just append a few bytes to the end of one file with only 4 processes on 2 nodes. On a 2.1.x Lustre distribution we do not see this contention, and that could explain why we sometimes hit the race fixed by LU-4558.
Could LU-3027 and LU-4495 explain this contention?
Thanks

Comment by Zhenyu Xu [ 11/Apr/14 ]

Every write needs an exclusive lock, and a write from another node forces the current lock holder to relinquish that lock, so multiple writes to the same file from different nodes cause lock enqueues and lock blocking ASTs to intertwine; that is what I meant by normal.
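
One way to watch this lock ping-pong on a client while two nodes append to the same file is sketched below; the parameter glob is an assumption and the namespace names vary per filesystem:

$ watch -n 1 "lctl get_param ldlm.namespaces.*osc*.lock_count"
  # the counts rise and fall as the write (PW) extent lock bounces between the two writers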
