Details
Bug
Resolution: Cannot Reproduce
Major
None
Lustre 2.6.0
None
RHEL6.5 clients running Lustre 2.6.0
3
17122
Description
Recently, users have run into a condition where their file transfers never complete: the transferring process (tar in this case) blocks and is reported as a hung task. The backtrace observed is as follows:
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: INFO: task tar:12650 blocked for more than 120 seconds.
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: Tainted: G W --------------- 2.6.32-504.el6.x86_64 #1
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: tar D 0000000000000002 0 12650 11138 0x00000080
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: ffff8805f6919aa8 0000000000000086 ffff8805f6919a48 ffff8805f6919a28
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: ffff8805f6919a28 0000000000000082 ffff880122eebaa8 ffff8800282919a0
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: ffffffff81aac480 0000000000000282 ffff8808398b7058 ffff8805f6919fd8
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: Call Trace:
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a41e25>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152b346>] __mutex_lock_slowpath+0x96/0x210
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a696ce>] ? req_capsule_get_size+0x4e/0x90 [ptlrpc]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152ae6b>] mutex_lock+0x2b/0x50
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1a96c>] mdc_reint+0x3c/0x3b0 [mdc]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1c6f0>] mdc_setattr+0x2a0/0xa00 [mdc]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0bc6962>] lmv_setattr+0x232/0x5c0 [lmv]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfcfe6>] ll_md_setattr+0x116/0x940 [lustre]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a1fc60>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152c166>] ? down_read+0x16/0x30
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfde6f>] ll_setattr_raw+0x24f/0x10d0 [lustre]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfed55>] ll_setattr+0x65/0xd0 [lustre]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811ad0a8>] notify_change+0x168/0x340
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c15ac>] utimes_common+0xdc/0x1b0
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff810f036e>] ? call_rcu+0xe/0x10
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c1750>] do_utimes+0xd0/0x170
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c18f2>] sys_utimensat+0x32/0x90
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: Call Trace:
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a41e25>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152b346>] __mutex_lock_slowpath+0x96/0x210
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a696ce>] ? req_capsule_get_size+0x4e/0x90 [ptlrpc]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152ae6b>] mutex_lock+0x2b/0x50
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1a96c>] mdc_reint+0x3c/0x3b0 [mdc]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1c6f0>] mdc_setattr+0x2a0/0xa00 [mdc]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0bc6962>] lmv_setattr+0x232/0x5c0 [lmv]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfcfe6>] ll_md_setattr+0x116/0x940 [lustre]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a1fc60>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152c166>] ? down_read+0x16/0x30
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfde6f>] ll_setattr_raw+0x24f/0x10d0 [lustre]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0cfed55>] ll_setattr+0x65/0xd0 [lustre]
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811ad0a8>] notify_change+0x168/0x340
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c15ac>] utimes_common+0xdc/0x1b0
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff810f036e>] ? call_rcu+0xe/0x10
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c1750>] do_utimes+0xd0/0x170
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff811c18f2>] sys_utimensat+0x32/0x90
Jan 2 23:12:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
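For anyone triaging the trace: entries marked with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain, so the blocked path is easier to read with them filtered out. The following is a minimal sketch, not part of the original report, of one way to do that in Python. The names FRAME_RE, reliable_frames and SAMPLE are hypothetical, and the pattern assumes the RHEL6-style "[<address>] symbol+0xoff/0xsize [module]" layout shown above. Because whitespace in the pattern also matches line breaks, it should tolerate entries that have been wrapped across two lines.

```python
import re

# One call-trace entry, e.g.
#   "... kernel: [<ffffffffa0c1a96c>] mdc_reint+0x3c/0x3b0 [mdc]"
# An optional "?" after the address marks a frame the kernel could not verify.
# Assumption: RHEL6-era trace layout, as in the log quoted in this ticket.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) for every frame not marked with '?'.

    Frames without a module suffix belong to the core kernel and are
    reported with the module name "kernel".
    """
    frames = []
    for match in FRAME_RE.finditer(log_text):
        if match.group("unreliable"):
            continue  # skip unverified stack residue
        frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# A few entries copied from the trace above, innermost frame first.
SAMPLE = """\
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0a41e25>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152b346>] __mutex_lock_slowpath+0x96/0x210
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffff8152ae6b>] mutex_lock+0x2b/0x50
Jan 2 23:10:28 dtn01.ccs.ornl.gov kernel: [<ffffffffa0c1a96c>] mdc_reint+0x3c/0x3b0 [mdc]
"""

if __name__ == "__main__":
    for symbol, module in reliable_frames(SAMPLE):
        print(f"{symbol} [{module}]")
```

Applied to the full trace, the remaining frames read, outermost to innermost: sys_utimensat, do_utimes, utimes_common, notify_change, ll_setattr, ll_setattr_raw, ll_md_setattr, lmv_setattr, mdc_setattr, mdc_reint, mutex_lock, __mutex_lock_slowpath. In other words, tar is waiting on a mutex taken in mdc_reint while setting file timestamps.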