Lustre / LU-6758

racer test_1: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • None
    • Affects Versions: Lustre 2.8.0, Lustre 2.10.0, Lustre 2.11.0, Lustre 2.12.0
    • None
    • Environment:
      server: lustre-master build # 3071 EL7 ldiskfs
      client: lustre-master build # 3071 EL6 ldiskfs
    • Severity: 3

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/8303c8ec-13d8-11e5-b4b0-5254006e85c2.

      The sub-test test_1 failed with the following error:

      test failed to respond and timed out
      

      There are no MDS logs.
      Client console log:

      16:40:41:Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-40vm5,onyx-40vm6.onyx.hpdd.intel.com DURATION=900 == 09:30:44 (1434299444)
      16:40:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
      16:40:41:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      16:40:41:LustreError: 21155:0:(llite_lib.c:1497:ll_md_setattr()) md_setattr fails: rc = -30
      16:40:42:Lustre: 21248:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299452/real 1434299452]  req@ffff88006d4753c0 x1503970942065616/t0(0) o36->lustre-MDT0000-mdc-ffff88007c74cc00@10.2.4.185@tcp:12/10 lens 496/568 e 0 to 1 dl 1434299459 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      16:40:42:Lustre: lustre-MDT0000-mdc-ffff88007c74cc00: Connection to lustre-MDT0000 (at 10.2.4.185@tcp) was lost; in progress operations using this service will wait for recovery to complete
      16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299452/real 1434299452]  req@ffff880070c73c80 x1503970942069540/t0(0) o400->MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 224/224 e 0 to 1 dl 1434299459 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
      16:40:43:LustreError: 166-1: MGC10.2.4.185@tcp: Connection to MGS (at 10.2.4.185@tcp) was lost; in progress operations using this service will fail
      16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299457/real 1434299457]  req@ffff880070cd1080 x1503970942081192/t0(0) o400->lustre-MDT0000-mdc-ffff880079777c00@10.2.4.185@tcp:12/10 lens 224/224 e 0 to 1 dl 1434299464 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      16:40:43:Lustre: 31077:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299470/real 1434299470]  req@ffff880070167380 x1503970942081360/t0(0) o250->MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 520/544 e 0 to 1 dl 1434299481 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
      16:40:43:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1434299485/real 0]  req@ffff880070a2c3c0 x1503970942081548/t0(0) o250->MGC10.2.4.185@tcp@10.2.4.185@tcp:26/25 lens 520/544 e 0 to 1 dl 1434299501 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299515/real 1434299515]  req@ffff88007010b080 x1503970942081892/t0(0) o38->lustre-MDT0000-mdc-ffff880079777c00@10.2.4.185@tcp:12/10 lens 520/544 e 0 to 1 dl 1434299531 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434299535/real 1434299536]  req@ffff88006d7f20c0 x1503970942082156/t0(0) o38->lustre-MDT0000-mdc-ffff88007c74cc00@10.2.4.185@tcp:12/10 lens 520/544 e 0 to 1 dl 1434299556 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      16:40:44:Lustre: 31075:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      16:40:44:INFO: task dir_create.sh:5103 blocked for more than 120 seconds.
      16:40:44:      Not tainted 2.6.32-504.16.2.el6.x86_64 #1
      16:40:44:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      16:40:45:dir_create.sh D 0000000000000000     0  5103   5096 0x00000080
      16:40:45: ffff88007c2e1bc8 0000000000000082 0000000000000000 ffff880000000000
      16:40:45: ffff880000000065 ffff88007a19e104 ffffffffa1b012c0 0000000000000000
      16:40:46: 000200000a0204b9 ffffffffa1dba813 ffff880079059068 ffff88007c2e1fd8
      16:40:46:Call Trace:
      16:40:46: [<ffffffffa1d5f5f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      16:40:46: [<ffffffffa1d86a22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
      16:40:46: [<ffffffff8152b7a6>] __mutex_lock_slowpath+0x96/0x210
      16:40:46: [<ffffffff8152b2cb>] mutex_lock+0x2b/0x50
      16:40:46: [<ffffffffa1f40c7a>] mdc_close+0x19a/0xa60 [mdc]
      16:40:46: [<ffffffffa2017c09>] ? ll_i2suppgid+0x19/0x30 [lustre]
      16:40:46: [<ffffffffa1efefe7>] lmv_close+0x2c7/0x580 [lmv]
      16:40:46: [<ffffffffa1feac4f>] ll_close_inode_openhandle+0x2ef/0xe90 [lustre]
      16:40:47: [<ffffffffa1febb97>] ll_md_real_close+0x197/0x210 [lustre]
      16:40:47: [<ffffffffa1fed331>] ll_file_release+0x641/0xad0 [lustre]
      16:40:47: [<ffffffff8118fb85>] __fput+0xf5/0x210
      16:40:47: [<ffffffff8118fcc5>] fput+0x25/0x30
      16:40:47: [<ffffffff8118af1d>] filp_close+0x5d/0x90
      16:40:47: [<ffffffff811a3760>] sys_dup3+0x130/0x190
      16:40:47: [<ffffffff811a37d4>] sys_dup2+0x14/0x50
      16:40:47: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      16:40:47:INFO: task cat:5274 blocked for more than 120 seconds.
      16:40:47:      Not tainted 2.6.32-504.16.2.el6.x86_64 #1
      16:40:47:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      16:40:47:cat           D 0000000000000000     0  5274   5133 0x00000080
      16:40:47: ffff880071bfbbf8 0000000000000086 0000000000000000 ffff880000000000
      16:40:47: ffff880000000065 ffff88007a19e104 ffffffffa1b01f80 0000000000000000
      16:40:47: 000200000a0204b9 ffffffffa1dba813 ffff880071bf9ad8 ffff880071bfbfd8
      16:40:48:Call Trace:
      16:40:48: [<ffffffffa1d5f5f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      16:40:48: [<ffffffffa1d86a22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
      16:40:48: [<ffffffff8152b7a6>] __mutex_lock_slowpath+0x96/0x210
      16:40:48: [<ffffffff8152b2cb>] mutex_lock+0x2b/0x50
      16:40:48: [<ffffffffa1f40c7a>] mdc_close+0x19a/0xa60 [mdc]
      16:40:49: [<ffffffffa2017c09>] ? ll_i2suppgid+0x19/0x30 [lustre]
      16:40:49: [<ffffffffa1efefe7>] lmv_close+0x2c7/0x580 [lmv]
      16:40:49: [<ffffffffa1feac4f>] ll_close_inode_openhandle+0x2ef/0xe90 [lustre]
      16:40:49: [<ffffffffa1febb97>] ll_md_real_close+0x197/0x210 [lustre]
      16:40:49: [<ffffffffa1fed331>] ll_file_release+0x641/0xad0 [lustre]
      16:40:49: [<ffffffff8118fb85>] __fput+0xf5/0x210
      16:40:49: [<ffffffff8118fcc5>] fput+0x25/0x30
      16:40:50: [<ffffffff8118af1d>] filp_close+0x5d/0x90
      16:40:50: [<ffffffff8118aff5>] sys_close+0xa5/0x100
      16:40:50: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      16:40:50:INFO: task dd:6983 blocked for more than 120 seconds.
      16:40:50:      Not tainted 2.6.32-504.16.2.el6.x86_64 #1
      16:40:50:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      16:40:50:dd            D 0000000000000000     0  6983   5154 0x00000080
      16:40:50: ffff880071917bf8 0000000000000086 0000000000000000 ffff880000000000
      16:40:50: ffff880000000065 ffff88007a56c244 ffffffffa1b03a00 0000000000000000
      16:40:51: 000200000a0204b9 ffffffffa1dba813 ffff880070fafad8 ffff880071917fd8
      16:40:51:Call Trace:
      16:40:51: [<ffffffffa1d5f5f5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
      16:40:51: [<ffffffffa1d86a22>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
      16:40:51: [<ffffffff8152b7a6>] __mutex_lock_slowpath+0x96/0x210
      16:40:51: [<ffffffff8152b2cb>] mutex_lock+0x2b/0x50
      16:40:51: [<ffffffffa1f40c7a>] mdc_close+0x19a/0xa60 [mdc]
      16:40:51: [<ffffffffa2017c09>] ? ll_i2suppgid+0x19/0x30 [lustre]
      16:40:52: [<ffffffffa1efefe7>] lmv_close+0x2c7/0x580 [lmv]
      16:40:52: [<ffffffffa1feac4f>] ll_close_inode_openhandle+0x2ef/0xe90 [lustre]
      16:40:52: [<ffffffffa1febb97>] ll_md_real_close+0x197/0x210 [lustre]
      16:40:52: [<ffffffffa1fed331>] ll_file_release+0x641/0xad0 [lustre]
      16:40:53: [<ffffffff8118fb85>] __fput+0xf5/0x210
      16:40:53: [<ffffffff8118fcc5>] fput+0x25/0x30
      16:40:53: [<ffffffff8118af1d>] filp_close+0x5d/0x90
      16:40:53: [<ffffffff8118aff5>] sys_close+0xa5/0x100
      16:40:53: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
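      The blocked-task reports above come from the kernel's hung-task watchdog, which flags any task in uninterruptible sleep for more than 120 seconds. For triage it can help to pull every such report out of a saved console log; a minimal sketch, assuming the log has been saved as console.log (an illustrative name, not a file from this test run):

```shell
# Sketch: list every task flagged by the hung-task watchdog in a saved
# console log. The sample lines below mirror the ticket's log format.
cat > console.log <<'EOF'
16:40:44:INFO: task dir_create.sh:5103 blocked for more than 120 seconds.
16:40:47:INFO: task cat:5274 blocked for more than 120 seconds.
16:40:50:INFO: task dd:6983 blocked for more than 120 seconds.
EOF

# Prints one "task <comm>:<pid> blocked" line per watchdog report.
grep -o 'task [^ ]* blocked' console.log
```

      Adding `-A 25` to a plain `grep 'blocked for more than' console.log` would also print the stack trace that follows each report.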
      

      Activity


            Created LU-11359 to track this issue, because this ticket probably covers several distinct issues.

            James Nunez (Inactive) added a comment

            We're seeing this issue, or something very close to it, at:
            https://testing.whamcloud.com/test_sets/df96b5ee-aae1-11e8-80f7-52540065bddc
            https://testing.whamcloud.com/test_sets/e801c64e-a918-11e8-80f7-52540065bddc
            https://testing.whamcloud.com/test_sets/7fcd4432-ab38-11e8-80f7-52540065bddc

            In these runs, we see ls or dir_create hang in the client console logs. In the MDS console log, we see:

            [ 8094.197467] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-41vm10,onyx-41vm9.onyx.whamcloud.com DURATION=300 ============ 01:57:42 (1535507862)
            [ 8106.463255] LNetError: 14167:0:(lib-msg.c:779:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
            [ 8113.461582] Lustre: lustre-MDT0000: Client 27389196-d811-d02a-8653-acd954b84381 (at 10.2.8.140@tcp) reconnecting
            [ 8260.743287] LNet: Service thread pid 27085 was inactive for 62.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
            [ 8260.745290] Pid: 27085, comm: mdt00_003 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Fri Aug 17 20:37:05 UTC 2018
            [ 8260.746424] Call Trace:
            [ 8260.746727]  [<ffffffffc0d55f61>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
            [ 8260.747759]  [<ffffffffc0d57e93>] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
            [ 8260.748642]  [<ffffffffc11b1bee>] mdt_dom_discard_data+0xfe/0x130 [mdt]
            [ 8260.749734]  [<ffffffffc118b603>] mdt_reint_rename_internal.isra.40+0x1923/0x28a0 [mdt]
            [ 8260.750712]  [<ffffffffc118dfcb>] mdt_reint_rename_or_migrate.isra.43+0x19b/0x860 [mdt]
            [ 8260.751671]  [<ffffffffc118e6c3>] mdt_reint_rename+0x13/0x20 [mdt]
            [ 8260.752421]  [<ffffffffc1193033>] mdt_reint_rec+0x83/0x210 [mdt]
            [ 8260.753455]  [<ffffffffc11721d2>] mdt_reint_internal+0x6b2/0xa80 [mdt]
            [ 8260.754249]  [<ffffffffc117d1e7>] mdt_reint+0x67/0x140 [mdt]
            [ 8260.754927]  [<ffffffffc0df431a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
            [ 8260.755931]  [<ffffffffc0d974ab>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
            [ 8260.756855]  [<ffffffffc0d9ace4>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
            [ 8260.757643]  [<ffffffffa3cbb621>] kthread+0xd1/0xe0
            [ 8260.758294]  [<ffffffffa43205f7>] ret_from_fork_nospec_end+0x0/0x39
            [ 8260.759073]  [<ffffffffffffffff>] 0xffffffffffffffff
            [ 8260.759863] LustreError: dumping log to /tmp/lustre-log.1535508029.27085
            [ 8261.761641] Pid: 3455, comm: mdt00_028 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Fri Aug 17 20:37:05 UTC 2018
            [ 8261.762905] Call Trace:
            [ 8261.763216]  [<ffffffffc0d55f61>] ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
            [ 8261.764129]  [<ffffffffc0d57e93>] ldlm_cli_enqueue_local+0x233/0x860 [ptlrpc]
            [ 8261.764980]  [<ffffffffc11763c7>] mdt_object_local_lock+0x4e7/0xb20 [mdt]
            [ 8261.765776]  [<ffffffffc1176a70>] mdt_object_lock_internal+0x70/0x330 [mdt]
            [ 8261.766546]  [<ffffffffc1177ada>] mdt_getattr_name_lock+0x83a/0x1c00 [mdt]
            [ 8261.767348]  [<ffffffffc117f885>] mdt_intent_getattr+0x2b5/0x480 [mdt]
            [ 8261.768089]  [<ffffffffc117c768>] mdt_intent_policy+0x2f8/0xd10 [mdt]
            [ 8261.768912]  [<ffffffffc0d3ce9e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
            [ 8261.769714]  [<ffffffffc0d65523>] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
            [ 8261.770557]  [<ffffffffc0deb9d2>] tgt_enqueue+0x62/0x210 [ptlrpc]
            [ 8261.771307]  [<ffffffffc0df431a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
            [ 8261.772124]  [<ffffffffc0d974ab>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
            [ 8261.773076]  [<ffffffffc0d9ace4>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
            [ 8261.773824]  [<ffffffffa3cbb621>] kthread+0xd1/0xe0
            [ 8261.774388]  [<ffffffffa43205f7>] ret_from_fork_nospec_end+0x0/0x39
            [ 8261.775109]  [<ffffffffffffffff>] 0xffffffffffffffff
            
            James Nunez (Inactive) added a comment

            We still see what looks like this hang with parallel_scale_nfsv3 test_racer_on_nfs. Logs for the most recent hang are at:
            https://testing.hpdd.intel.com/test_sets/59c82126-7286-11e7-bb95-5254006e85c2

            The logs are interesting. On client 1, in the console log, we see a segfault and then hangs in several different processes, including dd, truncate, file_concat.sh, setfattr, ln, and more:

            03:08:04:[11566.277896] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 03:04:22 (1501124662)
            03:08:04:[11566.741676] 8[31347]: segfault at 8 ip 00007fc99889b3b8 sp 00007fff9753dd40 error 4 in ld-2.17.so[7fc998890000+20000]
            03:08:04:[11760.098045] INFO: task dd:3382 blocked for more than 120 seconds.
            03:08:04:[11760.100574] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            03:08:04:[11760.103192] dd              D ffff88007ae64278     0  3382  30972 0x00000080
            03:08:04:[11760.105770]  ffff880036c9fd10 0000000000000082 ffff880078760fb0 ffff880036c9ffd8
            03:08:04:[11760.108465]  ffff880036c9ffd8 ffff880036c9ffd8 ffff880078760fb0 ffff88007ae64270
            03:08:04:[11760.111045]  ffff88007ae64274 ffff880078760fb0 00000000ffffffff ffff88007ae64278
            03:08:04:[11760.113570] Call Trace:
            03:08:04:[11760.115753]  [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70
            03:08:04:[11760.118168]  [<ffffffff8168b315>] __mutex_lock_slowpath+0xc5/0x1d0
            03:08:04:[11760.120533]  [<ffffffff8168a76f>] mutex_lock+0x1f/0x2f
            03:08:04:[11760.122898]  [<ffffffff811833f6>] generic_file_aio_write+0x46/0xa0
            03:08:04:[11760.125271]  [<ffffffffa057911b>] nfs_file_write+0xbb/0x1e0 [nfs]
            03:08:04:[11760.127568]  [<ffffffff811fdfbd>] do_sync_write+0x8d/0xd0
            03:08:04:[11760.129784]  [<ffffffff811fe82d>] vfs_write+0xbd/0x1e0
            03:08:04:[11760.131928]  [<ffffffff811fe6f7>] ? vfs_read+0xf7/0x170
            03:08:04:[11760.134049]  [<ffffffff811ff34f>] SyS_write+0x7f/0xe0
            03:08:04:[11760.136168]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
            03:08:04:[11760.138325] INFO: task truncate:3454 blocked for more than 120 seconds.
            03:08:04:[11760.140499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            03:08:04:[11760.142775] truncate        D ffffffff8168a630     0  3454  30984 0x00000080
            03:08:04:[11760.145011]  ffff88006e3c3af0 0000000000000082 ffff880079121f60 ffff88006e3c3fd8
            03:08:04:[11760.147321]  ffff88006e3c3fd8 ffff88006e3c3fd8 ffff880079121f60 ffff88007fc16c40
            03:08:04:[11760.149568]  0000000000000000 7fffffffffffffff ffff88007ff616e8 ffffffff8168a630
            03:08:04:[11760.151888] Call Trace:
            03:08:04:[11760.153683]  [<ffffffff8168a630>] ? bit_wait+0x50/0x50
            03:08:04:[11760.155721]  [<ffffffff8168c5d9>] schedule+0x29/0x70
            03:08:04:[11760.157686]  [<ffffffff8168a019>] schedule_timeout+0x239/0x2c0
            03:08:04:[11760.159733]  [<ffffffff8113d22e>] ? delayacct_end+0x9e/0xb0
            03:08:04:[11760.161691]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
            03:08:04:[11760.163663]  [<ffffffff8168a630>] ? bit_wait+0x50/0x50
            03:08:04:[11760.165551]  [<ffffffff8168bb7e>] io_schedule_timeout+0xae/0x130
            03:08:04:[11760.167469]  [<ffffffff8168bc18>] io_schedule+0x18/0x20
            03:08:04:[11760.169321]  [<ffffffff8168a641>] bit_wait_io+0x11/0x50
            03:08:04:[11760.171825]  [<ffffffff8168a165>] __wait_on_bit+0x65/0x90
            03:08:04:[11760.174073]  [<ffffffff81180211>] wait_on_page_bit+0x81/0xa0
            03:08:04:[11760.175867]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
            03:08:04:[11760.177632]  [<ffffffff81180341>] __filemap_fdatawait_range+0x111/0x190
            03:08:04:[11760.179428]  [<ffffffff811803d4>] filemap_fdatawait_range+0x14/0x30
            03:08:04:[11760.181160]  [<ffffffff81180417>] filemap_fdatawait+0x27/0x30
            03:08:04:[11760.182841]  [<ffffffff8118258c>] filemap_write_and_wait+0x4c/0x80
            03:08:04:[11760.184525]  [<ffffffffa058a3b0>] nfs_wb_all+0x20/0x100 [nfs]
            03:08:04:[11760.186135]  [<ffffffffa057ca08>] nfs_setattr+0x1d8/0x200 [nfs]
            03:08:04:[11760.187756]  [<ffffffff8121be19>] notify_change+0x279/0x3d0
            03:08:04:[11760.189312]  [<ffffffff811fc995>] do_truncate+0x75/0xc0
            03:08:04:[11760.190830]  [<ffffffffa0578a29>] ? nfs_permission+0x199/0x1e0 [nfs]
            03:08:04:[11760.192390]  [<ffffffff811fcb44>] vfs_truncate+0x164/0x190
            03:08:04:[11760.193849]  [<ffffffff811fcbfc>] do_sys_truncate+0x8c/0xb0
            03:08:04:[11760.195276]  [<ffffffff811fcdae>] SyS_truncate+0xe/0x10
            03:08:04:[11760.196650]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
            03:08:04:[11760.198062] INFO: task file_concat.sh:3471 blocked for more than 120 seconds.
            03:08:04:[11760.199552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            03:08:04:[11760.201110] file_concat.sh  D ffff88007ae64278     0  3471  31015 0x00000080
            

            On the same client, in the stack trace log, we see dd and mkdir hung and an NFS server problem:

            04:06:47:[11760.356684]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
            04:06:47:[11760.358129] INFO: task mkdir:3524 blocked for more than 120 seconds.
            04:06:47:[11760.359580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            04:06:47:[11760.361164] mkdir           D ffff880006f53bc8     0  3524  31009 0x00000080
            04:06:47:[11760.362709]  ffff88006a7d7d90 0000000000000082 ffff88005b1dce70 ffff88006a7d7fd8
            04:06:47:[11760.364352]  ffff88006a7d7fd8 ffff88006a7d7fd8 ffff88005b1dce70 ffff880006f53bc0
            04:06:47:[11760.365957]  ffff880006f53bc4 ffff88005b1dce70 00000000ffffffff ffff880006f53bc8
            04:06:47:[11760.367612] Call Trace:
            04:06:47:[11760.368814]  [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70
            04:06:47:[11760.370391]  [<ffffffff8168b315>] __mutex_lock_slowpath+0xc5/0x1d0
            04:06:47:[11760.371915]  [<ffffffff8168a76f>] mutex_lock+0x1f/0x2f
            04:06:47:[11760.373363]  [<ffffffff8120cae5>] filename_create+0x85/0x180
            04:06:47:[11760.374860]  [<ffffffffa057870e>] ? nfs_do_access+0x23e/0x390 [nfs]
            04:06:47:[11760.376396]  [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
            04:06:47:[11760.377893]  [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
            04:06:47:[11760.379357]  [<ffffffff8120f334>] ? getname_flags+0xc4/0x1a0
            04:06:47:[11760.380793]  [<ffffffff8121eede>] ? mntput_no_expire+0x3e/0x120
            04:06:47:[11760.382236]  [<ffffffff8120f5e1>] user_path_create+0x41/0x60
            04:06:47:[11760.383667]  [<ffffffff812108f6>] SyS_mkdirat+0x46/0xe0
            04:06:47:[11760.385066]  [<ffffffff812109a9>] SyS_mkdir+0x19/0x20
            04:06:47:[11760.386419]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
            04:06:47:[11760.387839] INFO: task dd:3529 blocked for more than 120 seconds.
            04:06:47:[11760.389275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            04:06:47:[11760.390867] dd              D ffffffff8168a630     0  3529  31008 0x00000080
            04:06:47:[11760.392426]  ffff880077fffbc0 0000000000000086 ffff880074fb0000 ffff880077ffffd8
            04:06:48:[11760.394065]  ffff880077ffffd8 ffff880077ffffd8 ffff880074fb0000 ffff88007fd16c40
            04:06:48:[11760.395686]  0000000000000000 7fffffffffffffff ffff88007ff5cee8 ffffffff8168a630
            04:06:48:[11760.397353] Call Trace:
            04:06:48:[11760.398564]  [<ffffffff8168a630>] ? bit_wait+0x50/0x50
            04:06:48:[11760.400060]  [<ffffffff8168c5d9>] schedule+0x29/0x70
            04:06:48:[11760.401498]  [<ffffffff8168a019>] schedule_timeout+0x239/0x2c0
            04:06:48:[11760.403027]  [<ffffffffa0267e10>] ? rpc_put_task+0x10/0x20 [sunrpc]
            04:06:48:[11760.404538]  [<ffffffffa0584174>] ? nfs_initiate_pgio+0xd4/0x160 [nfs]
            04:06:48:[11760.406022]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
            04:06:48:[11760.407526]  [<ffffffff8168a630>] ? bit_wait+0x50/0x50
            04:06:48:[11760.408925]  [<ffffffff8168bb7e>] io_schedule_timeout+0xae/0x130
            04:06:48:[11760.410387]  [<ffffffff8168bc18>] io_schedule+0x18/0x20
            04:06:48:[11760.411777]  [<ffffffff8168a641>] bit_wait_io+0x11/0x50
            04:06:48:[11760.413167]  [<ffffffff8168a165>] __wait_on_bit+0x65/0x90
            04:06:48:[11760.414551]  [<ffffffff81180211>] wait_on_page_bit+0x81/0xa0
            04:06:48:[11760.415953]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
            04:06:48:[11760.417375]  [<ffffffff81180341>] __filemap_fdatawait_range+0x111/0x190
            04:06:48:[11760.418862]  [<ffffffff811803d4>] filemap_fdatawait_range+0x14/0x30
            04:06:48:[11760.420312]  [<ffffffff81182656>] filemap_write_and_wait_range+0x56/0x90
            04:06:48:[11760.421798]  [<ffffffffa0578e46>] nfs_file_fsync+0x86/0x110 [nfs]
            04:06:48:[11760.423232]  [<ffffffff812302cb>] vfs_fsync+0x2b/0x40
            04:06:48:[11760.424589]  [<ffffffffa0579286>] nfs_file_flush+0x46/0x60 [nfs]
            04:06:48:[11760.426028]  [<ffffffff811fbfd4>] filp_close+0x34/0x80
            04:06:48:[11760.427400]  [<ffffffff8121d338>] __close_fd+0x78/0xa0
            04:06:48:[11760.428755]  [<ffffffff811fdb73>] SyS_close+0x23/0x50
            04:06:48:[11760.430084]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
            04:06:48:[11826.143103] nfs: server trevis-6vm4 not responding, still trying
            04:06:48:[12018.655111] nfs: server trevis-6vm4 not responding, still trying
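            The hung-task reports in these logs all share the same shape, so a short script can summarize which commands were flagged across a run. A minimal sketch (the regex, sample lines, and helper name are illustrative, not part of the ticket's tooling):

```python
import re
from collections import Counter

# Matches lines like "INFO: task dd:3382 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked")

def blocked_commands(log_text: str) -> Counter:
    """Count hung-task watchdog reports per command name."""
    return Counter(m.group("comm") for m in HUNG_TASK.finditer(log_text))

# Sample lines mirroring the console-log excerpts in this ticket.
sample = """\
[11760.098045] INFO: task dd:3382 blocked for more than 120 seconds.
[11760.138325] INFO: task truncate:3454 blocked for more than 120 seconds.
[11760.387839] INFO: task dd:3529 blocked for more than 120 seconds.
"""
print(blocked_commands(sample))  # Counter({'dd': 2, 'truncate': 1})
```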
            
            jamesanunez James Nunez (Inactive) added a comment - We still see what looks like this hang with parallel_scale_nfsv3 test_racer_on_nfs. Logs for the most recent hang is at https://testing.hpdd.intel.com/test_sets/59c82126-7286-11e7-bb95-5254006e85c2 The logs are interesting. On the client 1, in the console log, we see a segfault and then hangs on several different processes including dd, truncate, file_concat.sh, setfattr, ln, and more: 03:08:04:[11566.277896] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 03:04:22 (1501124662) 03:08:04:[11566.741676] 8[31347]: segfault at 8 ip 00007fc99889b3b8 sp 00007fff9753dd40 error 4 in ld-2.17.so[7fc998890000+20000] 03:08:04:[11760.098045] INFO: task dd:3382 blocked for more than 120 seconds. 03:08:04:[11760.100574] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 03:08:04:[11760.103192] dd D ffff88007ae64278 0 3382 30972 0x00000080 03:08:04:[11760.105770] ffff880036c9fd10 0000000000000082 ffff880078760fb0 ffff880036c9ffd8 03:08:04:[11760.108465] ffff880036c9ffd8 ffff880036c9ffd8 ffff880078760fb0 ffff88007ae64270 03:08:04:[11760.111045] ffff88007ae64274 ffff880078760fb0 00000000ffffffff ffff88007ae64278 03:08:04:[11760.113570] Call Trace: 03:08:04:[11760.115753] [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70 03:08:04:[11760.118168] [<ffffffff8168b315>] __mutex_lock_slowpath+0xc5/0x1d0 03:08:04:[11760.120533] [<ffffffff8168a76f>] mutex_lock+0x1f/0x2f 03:08:04:[11760.122898] [<ffffffff811833f6>] generic_file_aio_write+0x46/0xa0 03:08:04:[11760.125271] [<ffffffffa057911b>] nfs_file_write+0xbb/0x1e0 [nfs] 03:08:04:[11760.127568] [<ffffffff811fdfbd>] do_sync_write+0x8d/0xd0 03:08:04:[11760.129784] [<ffffffff811fe82d>] vfs_write+0xbd/0x1e0 03:08:04:[11760.131928] [<ffffffff811fe6f7>] ? 
vfs_read+0xf7/0x170
03:08:04:[11760.134049] [<ffffffff811ff34f>] SyS_write+0x7f/0xe0
03:08:04:[11760.136168] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
03:08:04:[11760.138325] INFO: task truncate:3454 blocked for more than 120 seconds.
03:08:04:[11760.140499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
03:08:04:[11760.142775] truncate        D ffffffff8168a630     0  3454  30984 0x00000080
03:08:04:[11760.145011]  ffff88006e3c3af0 0000000000000082 ffff880079121f60 ffff88006e3c3fd8
03:08:04:[11760.147321]  ffff88006e3c3fd8 ffff88006e3c3fd8 ffff880079121f60 ffff88007fc16c40
03:08:04:[11760.149568]  0000000000000000 7fffffffffffffff ffff88007ff616e8 ffffffff8168a630
03:08:04:[11760.151888] Call Trace:
03:08:04:[11760.153683] [<ffffffff8168a630>] ? bit_wait+0x50/0x50
03:08:04:[11760.155721] [<ffffffff8168c5d9>] schedule+0x29/0x70
03:08:04:[11760.157686] [<ffffffff8168a019>] schedule_timeout+0x239/0x2c0
03:08:04:[11760.159733] [<ffffffff8113d22e>] ? delayacct_end+0x9e/0xb0
03:08:04:[11760.161691] [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
03:08:04:[11760.163663] [<ffffffff8168a630>] ? bit_wait+0x50/0x50
03:08:04:[11760.165551] [<ffffffff8168bb7e>] io_schedule_timeout+0xae/0x130
03:08:04:[11760.167469] [<ffffffff8168bc18>] io_schedule+0x18/0x20
03:08:04:[11760.169321] [<ffffffff8168a641>] bit_wait_io+0x11/0x50
03:08:04:[11760.171825] [<ffffffff8168a165>] __wait_on_bit+0x65/0x90
03:08:04:[11760.174073] [<ffffffff81180211>] wait_on_page_bit+0x81/0xa0
03:08:04:[11760.175867] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
03:08:04:[11760.177632] [<ffffffff81180341>] __filemap_fdatawait_range+0x111/0x190
03:08:04:[11760.179428] [<ffffffff811803d4>] filemap_fdatawait_range+0x14/0x30
03:08:04:[11760.181160] [<ffffffff81180417>] filemap_fdatawait+0x27/0x30
03:08:04:[11760.182841] [<ffffffff8118258c>] filemap_write_and_wait+0x4c/0x80
03:08:04:[11760.184525] [<ffffffffa058a3b0>] nfs_wb_all+0x20/0x100 [nfs]
03:08:04:[11760.186135] [<ffffffffa057ca08>] nfs_setattr+0x1d8/0x200 [nfs]
03:08:04:[11760.187756] [<ffffffff8121be19>] notify_change+0x279/0x3d0
03:08:04:[11760.189312] [<ffffffff811fc995>] do_truncate+0x75/0xc0
03:08:04:[11760.190830] [<ffffffffa0578a29>] ? nfs_permission+0x199/0x1e0 [nfs]
03:08:04:[11760.192390] [<ffffffff811fcb44>] vfs_truncate+0x164/0x190
03:08:04:[11760.193849] [<ffffffff811fcbfc>] do_sys_truncate+0x8c/0xb0
03:08:04:[11760.195276] [<ffffffff811fcdae>] SyS_truncate+0xe/0x10
03:08:04:[11760.196650] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
03:08:04:[11760.198062] INFO: task file_concat.sh:3471 blocked for more than 120 seconds.
03:08:04:[11760.199552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
03:08:04:[11760.201110] file_concat.sh  D ffff88007ae64278     0  3471  31015 0x00000080

On the same client, the stack trace log shows dd and mkdir hung and an NFS server problem:

04:06:47:[11760.356684] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
04:06:47:[11760.358129] INFO: task mkdir:3524 blocked for more than 120 seconds.
04:06:47:[11760.359580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
04:06:47:[11760.361164] mkdir           D ffff880006f53bc8     0  3524  31009 0x00000080
04:06:47:[11760.362709]  ffff88006a7d7d90 0000000000000082 ffff88005b1dce70 ffff88006a7d7fd8
04:06:47:[11760.364352]  ffff88006a7d7fd8 ffff88006a7d7fd8 ffff88005b1dce70 ffff880006f53bc0
04:06:47:[11760.365957]  ffff880006f53bc4 ffff88005b1dce70 00000000ffffffff ffff880006f53bc8
04:06:47:[11760.367612] Call Trace:
04:06:47:[11760.368814] [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70
04:06:47:[11760.370391] [<ffffffff8168b315>] __mutex_lock_slowpath+0xc5/0x1d0
04:06:47:[11760.371915] [<ffffffff8168a76f>] mutex_lock+0x1f/0x2f
04:06:47:[11760.373363] [<ffffffff8120cae5>] filename_create+0x85/0x180
04:06:47:[11760.374860] [<ffffffffa057870e>] ? nfs_do_access+0x23e/0x390 [nfs]
04:06:47:[11760.376396] [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
04:06:47:[11760.377893] [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
04:06:47:[11760.379357] [<ffffffff8120f334>] ? getname_flags+0xc4/0x1a0
04:06:47:[11760.380793] [<ffffffff8121eede>] ? mntput_no_expire+0x3e/0x120
04:06:47:[11760.382236] [<ffffffff8120f5e1>] user_path_create+0x41/0x60
04:06:47:[11760.383667] [<ffffffff812108f6>] SyS_mkdirat+0x46/0xe0
04:06:47:[11760.385066] [<ffffffff812109a9>] SyS_mkdir+0x19/0x20
04:06:47:[11760.386419] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
04:06:47:[11760.387839] INFO: task dd:3529 blocked for more than 120 seconds.
04:06:47:[11760.389275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
04:06:47:[11760.390867] dd              D ffffffff8168a630     0  3529  31008 0x00000080
04:06:47:[11760.392426]  ffff880077fffbc0 0000000000000086 ffff880074fb0000 ffff880077ffffd8
04:06:48:[11760.394065]  ffff880077ffffd8 ffff880077ffffd8 ffff880074fb0000 ffff88007fd16c40
04:06:48:[11760.395686]  0000000000000000 7fffffffffffffff ffff88007ff5cee8 ffffffff8168a630
04:06:48:[11760.397353] Call Trace:
04:06:48:[11760.398564] [<ffffffff8168a630>] ? bit_wait+0x50/0x50
04:06:48:[11760.400060] [<ffffffff8168c5d9>] schedule+0x29/0x70
04:06:48:[11760.401498] [<ffffffff8168a019>] schedule_timeout+0x239/0x2c0
04:06:48:[11760.403027] [<ffffffffa0267e10>] ? rpc_put_task+0x10/0x20 [sunrpc]
04:06:48:[11760.404538] [<ffffffffa0584174>] ? nfs_initiate_pgio+0xd4/0x160 [nfs]
04:06:48:[11760.406022] [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
04:06:48:[11760.407526] [<ffffffff8168a630>] ? bit_wait+0x50/0x50
04:06:48:[11760.408925] [<ffffffff8168bb7e>] io_schedule_timeout+0xae/0x130
04:06:48:[11760.410387] [<ffffffff8168bc18>] io_schedule+0x18/0x20
04:06:48:[11760.411777] [<ffffffff8168a641>] bit_wait_io+0x11/0x50
04:06:48:[11760.413167] [<ffffffff8168a165>] __wait_on_bit+0x65/0x90
04:06:48:[11760.414551] [<ffffffff81180211>] wait_on_page_bit+0x81/0xa0
04:06:48:[11760.415953] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
04:06:48:[11760.417375] [<ffffffff81180341>] __filemap_fdatawait_range+0x111/0x190
04:06:48:[11760.418862] [<ffffffff811803d4>] filemap_fdatawait_range+0x14/0x30
04:06:48:[11760.420312] [<ffffffff81182656>] filemap_write_and_wait_range+0x56/0x90
04:06:48:[11760.421798] [<ffffffffa0578e46>] nfs_file_fsync+0x86/0x110 [nfs]
04:06:48:[11760.423232] [<ffffffff812302cb>] vfs_fsync+0x2b/0x40
04:06:48:[11760.424589] [<ffffffffa0579286>] nfs_file_flush+0x46/0x60 [nfs]
04:06:48:[11760.426028] [<ffffffff811fbfd4>] filp_close+0x34/0x80
04:06:48:[11760.427400] [<ffffffff8121d338>] __close_fd+0x78/0xa0
04:06:48:[11760.428755] [<ffffffff811fdb73>] SyS_close+0x23/0x50
04:06:48:[11760.430084] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
04:06:48:[11826.143103] nfs: server trevis-6vm4 not responding, still trying
04:06:48:[12018.655111] nfs: server trevis-6vm4 not responding, still trying
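When several tasks hang at once, as in the console excerpts above, it helps to triage mechanically which processes tripped the hung-task detector. A minimal sketch (a hypothetical helper, not part of the Lustre test suite) that pulls the blocked-task names and PIDs out of a console log in this format:

```python
import re

# Matches hung-task reports such as:
#   INFO: task truncate:3454 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def blocked_tasks(console_log: str):
    """Return (task name, pid) pairs for every hung-task report in a console log."""
    return [(m.group("name"), int(m.group("pid")))
            for m in HUNG_TASK_RE.finditer(console_log)]

log = """
03:08:04:[11760.138325] INFO: task truncate:3454 blocked for more than 120 seconds.
03:08:04:[11760.198062] INFO: task file_concat.sh:3471 blocked for more than 120 seconds.
04:06:47:[11760.358129] INFO: task mkdir:3524 blocked for more than 120 seconds.
04:06:47:[11760.387839] INFO: task dd:3529 blocked for more than 120 seconds.
"""
print(blocked_tasks(log))
# -> [('truncate', 3454), ('file_concat.sh', 3471), ('mkdir', 3524), ('dd', 3529)]
```

Here all four blocked tasks (truncate, file_concat.sh, mkdir, dd) are sitting in NFS writeback or on a directory mutex, consistent with the "nfs: server trevis-6vm4 not responding" messages that follow.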
            sarah Sarah Liu added a comment -

            Hi Jim,
            I think the failure you saw is LU-8584, caused by the "unable to handle kernel" error.
            jcasper James Casper (Inactive) added a comment -

            Similar "process D" messages seen with parallel-scale-nfsv3 test_racer_on_nfs.
            Link for parallel-scale-nfsv3 test_racer_on_nfs: https://testing.hpdd.intel.com/test_sessions/d5c0f53e-f881-11e6-887f-5254006e85c2
            sarah Sarah Liu added a comment -

            I created TEI-3677 to track the missing log issue.
            sarah Sarah Liu added a comment -

            There is TEI-3392 for a similar issue; I will open another one for the partially missing logs.

            adilger Andreas Dilger added a comment -

            Looks like the client got a -30 error (-EROFS) and the MDS crashed. No way to know what is happening without the MDS logs.

            Sarah, is there a TEI ticket open to fix the console logs for RHEL7? If not, can you please open one.
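As a quick sanity check on the comment above, errno 30 on Linux is indeed EROFS, so the client's "md_setattr fails: rc = -30" decodes to "Read-only file system" — consistent with the MDT having been remounted read-only. Standard Python is enough to verify the mapping:

```python
import errno
import os

# errno 30 on Linux is EROFS; a negative rc of -30 in Lustre client
# logs therefore means the server-side filesystem went read-only.
assert errno.EROFS == 30
print(os.strerror(errno.EROFS))
# -> Read-only file system
```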
            green Oleg Drokin added a comment -

            No MDS logs in the report, and we need those to understand why something was remounted read-only.

            People

              wc-triage WC Triage
              maloo Maloo
              Votes: 0
              Watchers: 7