Details
-
Bug
-
Resolution: Won't Fix
-
Major
-
None
-
Lustre 2.8.0, Lustre 2.10.0, Lustre 2.11.0, Lustre 2.12.0
-
None
-
server: lustre-master build # 3071 EL7 ldiskfs
client: lustre-master build # 3071 EL6 ldiskfs
-
3
-
9223372036854775807
Description
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/8303c8ec-13d8-11e5-b4b0-5254006e85c2.
The sub-test test_1 failed with the following error:
test failed to respond and timed out
There is no MDS logs
client console
16:40:41:Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-40vm5,onyx-40vm6.onyx.hpdd.intel.com DURATION=900 == 09:30:44 (1434299444) 16:40:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 16:40:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 16:40:41:LustreError: 21155:0:(llite_lib.c:1497:ll_md_setattr()) md_setattr fails: rc = -30 16:40:42:Lustre: 21248:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299452/real 1434299452] req@ffff88006d4753c0 x1503970942065616/t0(0) o36->lustre-MDT0000-mdc-ffff88007c74cc00@10.2.4.185@tcp:12/10 lens 496/568 e 0 to 1 dl 1434299459 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 16:40:42:Lustre: lustre-MDT0000-mdc-ffff88007c74cc00: Connection to lustre-MDT0000 (at 10.2.4.185@tcp) was lost; in progress operations using this service will wait for recovery to complete 16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299452/real 1434299452] req@ffff880070c73c80 x1503970942069540/t0(0) o400->MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 224/224 e 0 to 1 dl 1434299459 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 11 previous similar messages 16:40:43:LustreError: 166-1: MGC10.2.4.185@tcp: Connection to MGS (at 10.2.4.185@tcp) was lost; in progress operations using this service will fail 16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299457/real 1434299457] req@ffff880070cd1080 x1503970942081192/t0(0) o400->lustre-MDT0000-mdc-ffff880079777c00@10.2.4.185@tcp:12/10 lens 224/224 e 0 to 1 dl 1434299464 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 1 previous similar message 16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299470/real 1434299470] req@ffff880070167380 x1503970942081360/t0(0) o250->MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 520/544 e 0 to 1 dl 1434299481 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 3 previous similar messages 16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1434299485/real 0] req@ffff880070a2c3c0 x1503970942081548/t0(0) o250->MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 520/544 e 0 to 1 dl 1434299501 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages 16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299515/real 1434299515] req@ffff88007010b080 x1503970942081892/t0(0) o38->lustre-MDT0000-mdc-ffff880079777c00@10.2.4.185@tcp:12/10 lens 520/544 e 0 to 1 dl 1434299531 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299535/real 1434299536] req@ffff88006d7f20c0 x1503970942082156/t0(0) o38->lustre-MDT0000-mdc-ffff88007c74cc00@10.2.4.185@tcp:12/10 lens 520/544 e 0 to 1 dl 1434299556 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages 16:40:44:INFO: task dir_create.sh:5103 blocked for more than 120 seconds. 16:40:44: Not tainted 2.6.32-504.16.2.el6.x86_64 #1 16:40:44:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 16:40:45:dir_create.sh D 0000000000000000 0 5103 5096 0x00000080 16:40:45: ffff88007c2e1bc8 0000000000000082 0000000000000000 ffff880000000000 16:40:45: ffff880000000065 ffff88007a19e104 ffffffffa1b012c0 0000000000000000 16:40:46: 000200000a0204b9 ffffffffa1dba813 ffff880079059068 ffff88007c2e1fd8 16:40:46:Call Trace: 16:40:46: [<ffffffffa1d5f5f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] 16:40:46: [<ffffffffa1d86a22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc] 16:40:46: [<ffffffff8152b7a6>] __mutex_lock_slowpath+0x96/0x210 16:40:46: [<ffffffff8152b2cb>] mutex_lock+0x2b/0x50 16:40:46: [<ffffffffa1f40c7a>] mdc_close+0x19a/0xa60 [mdc] 16:40:46: [<ffffffffa2017c09>] ? ll_i2suppgid+0x19/0x30 [lustre] 16:40:46: [<ffffffffa1efefe7>] lmv_close+0x2c7/0x580 [lmv] 16:40:46: [<ffffffffa1feac4f>] ll_close_inode_openhandle+0x2ef/0xe90 [lustre] 16:40:47: [<ffffffffa1febb97>] ll_md_real_close+0x197/0x210 [lustre] 16:40:47: [<ffffffffa1fed331>] ll_file_release+0x641/0xad0 [lustre] 16:40:47: [<ffffffff8118fb85>] __fput+0xf5/0x210 16:40:47: [<ffffffff8118fcc5>] fput+0x25/0x30 16:40:47: [<ffffffff8118af1d>] filp_close+0x5d/0x90 16:40:47: [<ffffffff811a3760>] sys_dup3+0x130/0x190 16:40:47: [<ffffffff811a37d4>] sys_dup2+0x14/0x50 16:40:47: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b 16:40:47:INFO: task cat:5274 blocked for more than 120 seconds. 16:40:47: Not tainted 2.6.32-504.16.2.el6.x86_64 #1 16:40:47:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 16:40:47:cat D 0000000000000000 0 5274 5133 0x00000080 16:40:47: ffff880071bfbbf8 0000000000000086 0000000000000000 ffff880000000000 16:40:47: ffff880000000065 ffff88007a19e104 ffffffffa1b01f80 0000000000000000 16:40:47: 000200000a0204b9 ffffffffa1dba813 ffff880071bf9ad8 ffff880071bfbfd8 16:40:48:Call Trace: 16:40:48: [<ffffffffa1d5f5f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] 16:40:48: [<ffffffffa1d86a22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc] 16:40:48: [<ffffffff8152b7a6>] __mutex_lock_slowpath+0x96/0x210 16:40:48: [<ffffffff8152b2cb>] mutex_lock+0x2b/0x50 16:40:48: [<ffffffffa1f40c7a>] mdc_close+0x19a/0xa60 [mdc] 16:40:49: [<ffffffffa2017c09>] ? ll_i2suppgid+0x19/0x30 [lustre] 16:40:49: [<ffffffffa1efefe7>] lmv_close+0x2c7/0x580 [lmv] 16:40:49: [<ffffffffa1feac4f>] ll_close_inode_openhandle+0x2ef/0xe90 [lustre] 16:40:49: [<ffffffffa1febb97>] ll_md_real_close+0x197/0x210 [lustre] 16:40:49: [<ffffffffa1fed331>] ll_file_release+0x641/0xad0 [lustre] 16:40:49: [<ffffffff8118fb85>] __fput+0xf5/0x210 16:40:49: [<ffffffff8118fcc5>] fput+0x25/0x30 16:40:50: [<ffffffff8118af1d>] filp_close+0x5d/0x90 16:40:50: [<ffffffff8118aff5>] sys_close+0xa5/0x100 16:40:50: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b 16:40:50:INFO: task dd:6983 blocked for more than 120 seconds. 16:40:50: Not tainted 2.6.32-504.16.2.el6.x86_64 #1 16:40:50:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 16:40:50:dd D 0000000000000000 0 6983 5154 0x00000080 16:40:50: ffff880071917bf8 0000000000000086 0000000000000000 ffff880000000000 16:40:50: ffff880000000065 ffff88007a56c244 ffffffffa1b03a00 0000000000000000 16:40:51: 000200000a0204b9 ffffffffa1dba813 ffff880070fafad8 ffff880071917fd8 16:40:51:Call Trace: 16:40:51: [<ffffffffa1d5f5f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] 16:40:51: [<ffffffffa1d86a22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc] 16:40:51: [<ffffffff8152b7a6>] __mutex_lock_slowpath+0x96/0x210 16:40:51: [<ffffffff8152b2cb>] mutex_lock+0x2b/0x50 16:40:51: [<ffffffffa1f40c7a>] mdc_close+0x19a/0xa60 [mdc] 16:40:51: [<ffffffffa2017c09>] ? ll_i2suppgid+0x19/0x30 [lustre] 16:40:52: [<ffffffffa1efefe7>] lmv_close+0x2c7/0x580 [lmv] 16:40:52: [<ffffffffa1feac4f>] ll_close_inode_openhandle+0x2ef/0xe90 [lustre] 16:40:52: [<ffffffffa1febb97>] ll_md_real_close+0x197/0x210 [lustre] 16:40:52: [<ffffffffa1fed331>] ll_file_release+0x641/0xad0 [lustre] 16:40:53: [<ffffffff8118fb85>] __fput+0xf5/0x210 16:40:53: [<ffffffff8118fcc5>] fput+0x25/0x30 16:40:53: [<ffffffff8118af1d>] filp_close+0x5d/0x90 16:40:53: [<ffffffff8118aff5>] sys_close+0xa5/0x100 16:40:53: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Attachments
Issue Links
- is related to
-
LU-10525 racer test_1: Failure to initialize cl object
- Open
- is related to
-
LU-11359 racer test 1 times out with client hung in dir_create.sh, ls, … and MDS in ldlm_completion_ast()
- Resolved
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...