Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.4.1
- Environment:
  Lustre build: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)
  Distro/Arch: RHEL6.4/x86_64
  FSTYPE=zfs
  MDSCOUNT=4
- Severity: 3
- Rank: 10186
Description
The racer test hung as follows:
== racer test complete, duration 1006 sec == 00:42:55 (1378194175)
CMD: wtm-29vm3 /usr/sbin/lctl get_param -n version
CMD: wtm-29vm3 /usr/sbin/lctl conf_param lustre.quota.mdt=ug
CMD: wtm-29vm3 /usr/sbin/lctl conf_param lustre.quota.ost=ug
Stopping clients: wtm-29vm5,wtm-29vm6.rosso.whamcloud.com /mnt/lustre2 (opts:)
CMD: wtm-29vm5,wtm-29vm6.rosso.whamcloud.com running=\$(grep -c /mnt/lustre2' ' /proc/mounts); if [ \$running -ne 0 ] ; then echo Stopping client \$(hostname) /mnt/lustre2 opts:; lsof /mnt/lustre2 || need_kill=no; if [ x != x -a x\$need_kill != xno ]; then pids=\$(lsof -t /mnt/lustre2 | sort -u); if [ -n \"\$pids\" ]; then kill -9 \$pids; fi fi; while umount /mnt/lustre2 2>&1 | grep -q busy; do echo /mnt/lustre2 is still busy, wait one second && sleep 1; done; fi
Stopping client wtm-29vm6.rosso.whamcloud.com /mnt/lustre2 opts:
Stopping client wtm-29vm5.rosso.whamcloud.com /mnt/lustre2 opts:
COMMAND PID   USER FD TYPE DEVICE       SIZE/OFF NODE               NAME
dd      3713  root 1w REG  1273,181606   7769088 450359979933698183 /mnt/lustre/racer1/6 (deleted)
dd      10024 root 1w REG  1273,181606  99767296 450359979950473219 /mnt/lustre2/racer1/2 (deleted)
dd      10031 root 1w REG  1273,181606   9411584 450359979967250450 /mnt/lustre2/racer1/12
dd      10059 root 1w REG  1273,181606  85909504 432345581440991292 /mnt/lustre2/racer/6
dd      10062 root 1w REG  1273,181606  56402944 432345581440991267 /mnt/lustre/racer/8
dd      10074 root 1w REG  1273,181606  47879168 432345581457768455 /mnt/lustre/racer/9
dd      10076 root 1w REG  1273,181606  45958144 432345581407436811 /mnt/lustre2/racer/13
dd      10084 root 1w REG  1273,181606  47879168 432345581457768455 /mnt/lustre/racer/9
dd      10130 root 1w REG  1273,181606  18804736 468374378443178004 /mnt/lustre/racer2/13
dd      10173 root 1w REG  1273,181606  55358464 486388776969437193 /mnt/lustre/racer3/6
dd      10421 root 1w REG  1273,181606   2441216 486388776952660004 /mnt/lustre/racer3/10
dd      10450 root 1w REG  1273,181606  50086912 468374378476732457 /mnt/lustre/racer2/0
dd      10456 root 1w REG  1273,181606  31876096 468374378443178023 /mnt/lustre/racer2/13 (deleted)
dd      10457 root 1w REG  1273,181606  64463872 486388776952660005 /mnt/lustre/racer3/19
dd      10527 root 1w REG  1273,181606   2441216 486388776952660004 /mnt/lustre2/racer3/10
dd      10581 root 1w REG  1273,181606   7365632 468374378459955230 /mnt/lustre2/racer2/11
dd      11360 root 1w REG  1273,181606  57447424 450359979967250475 /mnt/lustre2/racer1/10
cat     11906 root 1w REG  1273,181606  10503168 486388776935882790 /mnt/lustre2/racer3/2 (deleted)
cat     11906 root 3r REG  1273,181606  39769088 486388776986214418 /mnt/lustre2/racer3/5
dd      11947 root 1w REG  1273,181606  10503168 486388776935882790 /mnt/lustre2/racer3/2 (deleted)
cat     12675 root 1w REG  1273,181606   8477696 450359979933696110 /mnt/lustre/racer1/1 (deleted)
cat     12675 root 3r REG  1273,181606   8478720 450359979933696048 /mnt/lustre/racer1/5
dd      15141 root 1w REG  1273,181606  58491904 468374378476732558 /mnt/lustre2/racer2/3 (deleted)
dd      21547 root 1w REG  1273,181606  32379904 468374378476734932 /mnt/lustre2/racer2/5
dd      29365 root 1w REG  1273,181606  61625344 450359979916919106 /mnt/lustre/racer1/0
/mnt/lustre2 is still busy, wait one second
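Decoded for readability, the escaped stop-client one-liner in the CMD line above is equivalent to the following sketch (whitespace and quoting added by hand; logic unchanged). Note that the first operand of the inner test is empty in this run, so "[ x != x ... ]" is always false: the kill -9 fallback can never trigger, and the script retries umount once per second indefinitely while the dd and cat processes listed by lsof hold the mount point busy.

  running=$(grep -c '/mnt/lustre2 ' /proc/mounts)
  if [ $running -ne 0 ]; then
      echo "Stopping client $(hostname) /mnt/lustre2 opts:"
      lsof /mnt/lustre2 || need_kill=no
      # "x != x" is always false (an optional force/kill argument was
      # apparently empty), so this branch is unreachable in this run.
      if [ x != x -a x$need_kill != xno ]; then
          pids=$(lsof -t /mnt/lustre2 | sort -u)
          if [ -n "$pids" ]; then kill -9 $pids; fi
      fi
      # Retry umount once per second for as long as it reports "busy".
      while umount /mnt/lustre2 2>&1 | grep -q busy; do
          echo "/mnt/lustre2 is still busy, wait one second" && sleep 1
      done
  fi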
The console log on client wtm-29vm6 showed:
00:42:04:cat           D 0000000000000000     0  4282   2714 0x00000080
00:42:04: ffff880062aefc90 0000000000000082 0000000000000246 000000100000000d
00:42:04: ffffffff00000050 ffffffffa06b3390 ffff88006c91e1e8 ffff88006c91ca28
00:42:04: ffff880065a2f058 ffff880062aeffd8 000000000000fb88 ffff880065a2f058
00:42:04:Call Trace:
00:42:04: [<ffffffffa06b3390>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
00:42:04: [<ffffffffa09a9fa0>] ? lov_io_init_raid0+0x3e0/0x800 [lov]
00:42:04: [<ffffffff81510695>] rwsem_down_failed_common+0x95/0x1d0
00:42:04: [<ffffffff81510826>] rwsem_down_read_failed+0x26/0x30
00:42:04: [<ffffffff81283844>] call_rwsem_down_read_failed+0x14/0x30
00:42:04: [<ffffffff8150fd24>] ? down_read+0x24/0x30
00:42:04: [<ffffffffa0a7182d>] ll_file_io_generic+0x29d/0x600 [lustre]
00:42:04: [<ffffffffa0a71ccf>] ll_file_aio_read+0x13f/0x2c0 [lustre]
00:42:04: [<ffffffffa0a7256c>] ll_file_read+0x16c/0x2a0 [lustre]
00:42:04: [<ffffffff81181a95>] vfs_read+0xb5/0x1a0
00:42:04: [<ffffffff81181bd1>] sys_read+0x51/0x90
00:42:04: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
00:42:04: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
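Per this trace, the read path is parked in down_read() (via rwsem_down_read_failed) inside ll_file_io_generic, i.e. a reader in uninterruptible sleep waiting on a read/write semaphore held by another task. As a generic aid when triaging such hangs on a live node (not part of the original report), D-state tasks and their kernel wait channels can be listed with:

  # List tasks in uninterruptible sleep (D state) with their wait channel
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'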
The stack trace on client wtm-29vm5 showed:
dd            S 0000000000000000     0  3713      1 0x00000080
 ffff880042549bd8 0000000000000086 ffffffffa0581e5c ffff8800679bf5b8
 0000000000000002 ffff880067ac4040 ffff880042549be8 ffffffffa055d35c
 ffff880067ac45f8 ffff880042549fd8 000000000000fb88 ffff880067ac45f8
Call Trace:
 [<ffffffffa055d35c>] ? cl_lock_trace0+0x11c/0x130 [obdclass]
 [<ffffffffa0554825>] ? cl_env_info+0x15/0x20 [obdclass]
 [<ffffffffa03d86fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
 [<ffffffffa056171a>] cl_lock_state_wait+0x1aa/0x320 [obdclass]
 [<ffffffff81063410>] ? default_wake_function+0x0/0x20
 [<ffffffffa0561f0b>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
 [<ffffffffa0562a8e>] cl_lock_request+0x7e/0x270 [obdclass]
 [<ffffffffa0567d2c>] cl_io_lock+0x3cc/0x560 [obdclass]
 [<ffffffffa0567f62>] cl_io_loop+0xa2/0x1b0 [obdclass]
 [<ffffffffa09df9e0>] ll_file_io_generic+0x450/0x600 [lustre]
 [<ffffffffa09e0922>] ll_file_aio_write+0x142/0x2c0 [lustre]
 [<ffffffffa09e0c0c>] ll_file_write+0x16c/0x2a0 [lustre]
 [<ffffffff81181368>] vfs_write+0xb8/0x1a0
 [<ffffffff81181c61>] sys_write+0x51/0x90
 [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
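Here the dd writer is sleeping in cl_lock_state_wait(), waiting for a CLIO lock state transition that apparently never completes. For reference (a generic debugging sketch, not taken from this report), stack dumps like the two above can be produced on demand through sysrq:

  # Enable sysrq, then dump task backtraces to the kernel log/console
  echo 1 > /proc/sys/kernel/sysrq
  echo t > /proc/sysrq-trigger    # backtraces of all tasks
  echo w > /proc/sysrq-trigger    # only uninterruptible (blocked) tasks
  dmesg | tail -n 200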
Maloo report: https://maloo.whamcloud.com/test_sets/66e0f40c-1649-11e3-aa2a-52540035b04c