Details
- Bug
- Resolution: Cannot Reproduce
- Minor
- None
- Lustre 2.1.0
- None
- RHEL 6.2, kernel 2.6.32-207.1chaos.ch5.x86_64.debug
- 3
- 6531
Description
Running a Lustre 2.1 client on a debug kernel, we got the following warning from the lock validator. I suspect this may be a false alarm, since we didn't deadlock and the cl_lockset comments suggest that holding multiple locks is unavoidable in some cases. Reporting here just in case this is a real bug.
=============================================
[ INFO: possible recursive locking detected ]
2.6.32-207.1chaos.ch5.x86_64.debug #1
---------------------------------------------
sh/5936 is trying to acquire lock:
(EXT){+.+.+.}, at: [<ffffffffa05f18c0>] cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
but task is already holding lock:
(EXT){+.+.+.}, at: [<ffffffffa05f18c0>] cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
other info that might help us debug this:
2 locks held by sh/5936:
#0: (&lli->lli_trunc_sem){.+.+.+}, at: [<ffffffffa08ca118>] ll_file_io_generic+0x2c8/0x580 [lustre]
#1: (EXT){+.+.+.}, at: [<ffffffffa05f18c0>] cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
stack backtrace:
Pid: 5936, comm: sh Not tainted 2.6.32-207.1chaos.ch5.x86_64.debug #1
Call Trace:
[<ffffffff810af570>] ? __lock_acquire+0x11c0/0x1570
[<ffffffff81013753>] ? native_sched_clock+0x13/0x60
[<ffffffff81012c29>] ? sched_clock+0x9/0x10
[<ffffffff8109d37d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff810af9c4>] ? lock_acquire+0xa4/0x120
[<ffffffffa05f18c0>] ? cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
[<ffffffffa05f18fd>] ? cl_lock_lockdep_acquire+0x3d/0x50 [obdclass]
[<ffffffffa05f18c0>] ? cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
[<ffffffffa05f68e9>] ? cl_lock_request+0x1e9/0x200 [obdclass]
[<ffffffff810adc9d>] ? trace_hardirqs_on_caller+0x14d/0x190
[<ffffffffa0918d40>] ? cl_glimpse_lock+0x180/0x390 [lustre]
[<ffffffffa08df942>] ? ll_inode_size_unlock+0x52/0xf0 [lustre]
[<ffffffff8151fabb>] ? _spin_unlock+0x2b/0x40
[<ffffffffa091cda6>] ? ccc_prep_size+0x1c6/0x280 [lustre]
[<ffffffff810adc9d>] ? trace_hardirqs_on_caller+0x14d/0x190
[<ffffffffa091b691>] ? cl2ccc_io+0x21/0x80 [lustre]
[<ffffffffa0921faf>] ? vvp_io_read_start+0xbf/0x3d0 [lustre]
[<ffffffffa05f3525>] ? cl_wait+0xb5/0x290 [obdclass]
[<ffffffffa05f6ec8>] ? cl_io_start+0x68/0x170 [obdclass]
[<ffffffffa05fb930>] ? cl_io_loop+0x110/0x1c0 [obdclass]
[<ffffffffa08ca217>] ? ll_file_io_generic+0x3c7/0x580 [lustre]
[<ffffffffa04c9c22>] ? cfs_hash_rw_unlock+0x12/0x30 [libcfs]
[<ffffffffa04c8754>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
[<ffffffffa05ea8f9>] ? cl_env_get+0x29/0x350 [obdclass]
[<ffffffffa08cf37c>] ? ll_file_aio_read+0x13c/0x310 [lustre]
[<ffffffffa05eaa6d>] ? cl_env_get+0x19d/0x350 [obdclass]
[<ffffffff81042c54>] ? __do_page_fault+0x244/0x4e0
[<ffffffffa08cf6c1>] ? ll_file_read+0x171/0x310 [lustre]
[<ffffffff8109d37d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff810aa27d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109d4af>] ? cpu_clock+0x6f/0x80
[<ffffffff81192875>] ? vfs_read+0xb5/0x1a0
[<ffffffff811929b1>] ? sys_read+0x51/0x90
[<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b
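For what it's worth, lockdep keys these reports on lock classes rather than on individual lock instances, so two different cl_locks that share the "EXT" class look to the validator like the same lock being taken twice. Below is a minimal, self-contained kernel-module sketch (the demo_* names are invented for illustration; this is not Lustre code) showing the same pattern with plain mutexes, plus the mutex_lock_nested()/SINGLE_DEPTH_NESTING annotation the kernel provides for cases where same-class nesting is known to be safe:

/* Illustration only; demo_* names are made up and have no Lustre counterpart. */
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/lockdep.h>

static struct mutex demo_lock[2];

static int __init demo_init(void)
{
	int i;

	/*
	 * Both mutexes are initialised from the same mutex_init() call
	 * site, so lockdep places them in a single class -- the same way
	 * every cl_lock ends up in the shared "EXT" class.
	 */
	for (i = 0; i < 2; i++)
		mutex_init(&demo_lock[i]);

	/*
	 * Plain nesting of two locks from one class: lockdep prints
	 * "possible recursive locking detected" even though the locks are
	 * distinct objects and nothing actually deadlocks.  (After the
	 * first such splat lockdep disables itself.)
	 */
	mutex_lock(&demo_lock[0]);
	mutex_lock(&demo_lock[1]);
	mutex_unlock(&demo_lock[1]);
	mutex_unlock(&demo_lock[0]);

	/*
	 * Annotated nesting: declaring the inner acquisition one level
	 * deeper tells lockdep the same-class nesting is intentional and
	 * suppresses the false positive.
	 */
	mutex_lock(&demo_lock[0]);
	mutex_lock_nested(&demo_lock[1], SINGLE_DEPTH_NESTING);
	mutex_unlock(&demo_lock[1]);
	mutex_unlock(&demo_lock[0]);

	return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

If the nesting done via cl_lock_lockdep_acquire() is always ordered, as the cl_lockset comments seem to imply, an annotation along these lines would be the usual way to quiet the validator without losing coverage for real inversions.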
This is the test that was running at the time (while debugging a cgroup-related problem):
# Mount point of the memory cgroup controller.
CGROUP_DIR=$(lssubsys -m memory | cut -d' ' -f2)
P=$CGROUP_DIR/test

# Move the current shell into the given cgroup by writing its PID to tasks.
move_current_to_cgroup() {
    echo $$ > $1/tasks
}

clean_up_all() {
    move_current_to_cgroup $CGROUP_DIR
    rm ./tmpfile
    rmdir $CGROUP_DIR/test/A
    rmdir $CGROUP_DIR/test
    exit 1
}

trap clean_up_all INT

mkdir $P
echo $$ > $P/tasks

while sleep 1; do
    date
    T=$P/A
    mkdir $T
    move_current_to_cgroup $T
    # Cap the child cgroup at 300M so the dd below runs under memory pressure.
    echo 300M > $T/memory.limit_in_bytes
    cat /proc/self/cgroup
    dd if=/dev/zero of=./tmpfile bs=4096 count=100000
    move_current_to_cgroup $P
    cat /proc/self/cgroup
    # Drop pages still charged to the child cgroup, then remove it.
    echo 0 > $T/memory.force_empty
    rmdir $T
    rm ./tmpfile
done
Attachments
Issue Links
- is related to LU-619 Recursive locking in ldlm_lock_change_resource (Resolved)