Details
- Bug
- Resolution: Cannot Reproduce
- Minor
- None
- Lustre 2.1.0
- None
- RHEL 6.2, kernel 2.6.32-207.1chaos.ch5.x86_64.debug
- 3
- 6531
Description
Running a Lustre 2.1 client on a debug kernel, we got the following warning from the lock validator. I suspect this may be a false alarm, since we didn't deadlock and the cl_lockset comments suggest that holding multiple locks is unavoidable in some cases. Reporting here just in case this is a real bug.
=============================================
[ INFO: possible recursive locking detected ]
2.6.32-207.1chaos.ch5.x86_64.debug #1
---------------------------------------------
sh/5936 is trying to acquire lock:
(EXT){+.+.+.}, at: [<ffffffffa05f18c0>] cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
but task is already holding lock:
(EXT){+.+.+.}, at: [<ffffffffa05f18c0>] cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
other info that might help us debug this:
2 locks held by sh/5936:
#0: (&lli->lli_trunc_sem){.+.+.+}, at: [<ffffffffa08ca118>] ll_file_io_generic+0x2c8/0x580 [lustre]
#1: (EXT){+.+.+.}, at: [<ffffffffa05f18c0>] cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
stack backtrace:
Pid: 5936, comm: sh Not tainted 2.6.32-207.1chaos.ch5.x86_64.debug #1
Call Trace:
[<ffffffff810af570>] ? __lock_acquire+0x11c0/0x1570
[<ffffffff81013753>] ? native_sched_clock+0x13/0x60
[<ffffffff81012c29>] ? sched_clock+0x9/0x10
[<ffffffff8109d37d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff810af9c4>] ? lock_acquire+0xa4/0x120
[<ffffffffa05f18c0>] ? cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
[<ffffffffa05f18fd>] ? cl_lock_lockdep_acquire+0x3d/0x50 [obdclass]
[<ffffffffa05f18c0>] ? cl_lock_lockdep_acquire+0x0/0x50 [obdclass]
[<ffffffffa05f68e9>] ? cl_lock_request+0x1e9/0x200 [obdclass]
[<ffffffff810adc9d>] ? trace_hardirqs_on_caller+0x14d/0x190
[<ffffffffa0918d40>] ? cl_glimpse_lock+0x180/0x390 [lustre]
[<ffffffffa08df942>] ? ll_inode_size_unlock+0x52/0xf0 [lustre]
[<ffffffff8151fabb>] ? _spin_unlock+0x2b/0x40
[<ffffffffa091cda6>] ? ccc_prep_size+0x1c6/0x280 [lustre]
[<ffffffff810adc9d>] ? trace_hardirqs_on_caller+0x14d/0x190
[<ffffffffa091b691>] ? cl2ccc_io+0x21/0x80 [lustre]
[<ffffffffa0921faf>] ? vvp_io_read_start+0xbf/0x3d0 [lustre]
[<ffffffffa05f3525>] ? cl_wait+0xb5/0x290 [obdclass]
[<ffffffffa05f6ec8>] ? cl_io_start+0x68/0x170 [obdclass]
[<ffffffffa05fb930>] ? cl_io_loop+0x110/0x1c0 [obdclass]
[<ffffffffa08ca217>] ? ll_file_io_generic+0x3c7/0x580 [lustre]
[<ffffffffa04c9c22>] ? cfs_hash_rw_unlock+0x12/0x30 [libcfs]
[<ffffffffa04c8754>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
[<ffffffffa05ea8f9>] ? cl_env_get+0x29/0x350 [obdclass]
[<ffffffffa08cf37c>] ? ll_file_aio_read+0x13c/0x310 [lustre]
[<ffffffffa05eaa6d>] ? cl_env_get+0x19d/0x350 [obdclass]
[<ffffffff81042c54>] ? __do_page_fault+0x244/0x4e0
[<ffffffffa08cf6c1>] ? ll_file_read+0x171/0x310 [lustre]
[<ffffffff8109d37d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffff810aa27d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109d4af>] ? cpu_clock+0x6f/0x80
[<ffffffff81192875>] ? vfs_read+0xb5/0x1a0
[<ffffffff811929b1>] ? sys_read+0x51/0x90
[<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b
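For what it's worth, lockdep keys these reports on lock classes rather than on individual lock instances, so two different cl_locks that share the "EXT" class look to the validator like the same lock being taken twice. Below is a minimal, self-contained kernel-module sketch (the demo_* names are invented for illustration; this is not Lustre code) showing the same pattern with plain mutexes, plus the mutex_lock_nested()/SINGLE_DEPTH_NESTING annotation the kernel provides for cases where same-class nesting is known to be safe:

/* Illustration only; demo_* names are made up and have no Lustre counterpart. */
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/lockdep.h>

static struct mutex demo_lock[2];

static int __init demo_init(void)
{
	int i;

	/*
	 * Both mutexes are initialised from the same mutex_init() call
	 * site, so lockdep places them in a single class -- the same way
	 * every cl_lock ends up in the shared "EXT" class.
	 */
	for (i = 0; i < 2; i++)
		mutex_init(&demo_lock[i]);

	/*
	 * Plain nesting of two locks from one class: lockdep prints
	 * "possible recursive locking detected" even though the locks are
	 * distinct objects and nothing actually deadlocks.  (After the
	 * first such splat lockdep disables itself.)
	 */
	mutex_lock(&demo_lock[0]);
	mutex_lock(&demo_lock[1]);
	mutex_unlock(&demo_lock[1]);
	mutex_unlock(&demo_lock[0]);

	/*
	 * Annotated nesting: declaring the inner acquisition one level
	 * deeper tells lockdep the same-class nesting is intentional and
	 * suppresses the false positive.
	 */
	mutex_lock(&demo_lock[0]);
	mutex_lock_nested(&demo_lock[1], SINGLE_DEPTH_NESTING);
	mutex_unlock(&demo_lock[1]);
	mutex_unlock(&demo_lock[0]);

	return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

If the nesting done via cl_lock_lockdep_acquire() is always ordered, as the cl_lockset comments seem to imply, an annotation along these lines would be the usual way to quiet the validator without losing coverage for real inversions.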
This is the test that was running at the time (while debugging a cgroup-related problem):
# Mount point of the memory cgroup controller.
CGROUP_DIR=$(lssubsys -m memory | cut -d' ' -f2)
P=$CGROUP_DIR/test

# Move the current shell into the given cgroup by writing its PID to tasks.
move_current_to_cgroup() {
    echo $$ > $1/tasks
}

clean_up_all() {
    move_current_to_cgroup $CGROUP_DIR
    rm ./tmpfile
    rmdir $CGROUP_DIR/test/A
    rmdir $CGROUP_DIR/test
    exit 1
}

trap clean_up_all INT

mkdir $P
echo $$ > $P/tasks

while sleep 1; do
    date
    T=$P/A
    mkdir $T
    move_current_to_cgroup $T
    # Cap the child cgroup at 300M so the dd below runs under memory pressure.
    echo 300M > $T/memory.limit_in_bytes
    cat /proc/self/cgroup
    dd if=/dev/zero of=./tmpfile bs=4096 count=100000
    move_current_to_cgroup $P
    cat /proc/self/cgroup
    # Drop pages still charged to the child cgroup, then remove it.
    echo 0 > $T/memory.force_empty
    rmdir $T
    rm ./tmpfile
done
Attachments
Issue Links
- is related to LU-619 Recursive locking in ldlm_lock_change_resource (Resolved)