[LU-6811] Interop 2.5.3<->master racer test_1: short description of the failure Created: 08/Jul/15  Updated: 11/Sep/18  Resolved: 11/Sep/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

server: 2.5.3
client: lustre-master build # 3092 RHEL6.6


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ba481c76-2513-11e5-9d0f-5254006e85c2.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

test log

01:00:17:Lustre: DEBUG MARKER: == racer test 1: racer on clients: shadow-13vm10.shadow.whamcloud.com,shadow-13vm9 DURATION=900 == 00:55:29 (1436230529)
01:00:17:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
01:00:17:Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
01:00:17:LustreError: 1701:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000402:0x2:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 30151:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000403:0x7a:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 1409:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000400:0x42:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 11-0: lustre-OST0003-osc-ffff88002f179000: operation ldlm_enqueue to node 10.1.4.156@tcp failed: rc = -12
01:00:17:LustreError: 11381:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000400:0x93:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 28706:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000400:0x11c:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 3238:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff88002f179000: inode [0x200000401:0x17b:0x0] mdc close failed: rc = -13
01:00:17:LustreError: 8305:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000400:0x155:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 8305:0:(llite_lib.c:1647:ll_setattr_raw()) Skipped 1 previous similar message
01:00:17:LustreError: 11-0: lustre-OST0000-osc-ffff880064962c00: operation ldlm_enqueue to node 10.1.4.156@tcp failed: rc = -12
01:00:17:LustreError: 8995:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000400:0x2b9:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 8995:0:(llite_lib.c:1647:ll_setattr_raw()) Skipped 8 previous similar messages
01:00:17:LustreError: 11-0: lustre-OST0000-osc-ffff88002f179000: operation ldlm_enqueue to node 10.1.4.156@tcp failed: rc = -12
01:00:17:LustreError: Skipped 1 previous similar message
01:00:17:LustreError: 11815:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000402:0x384:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 11815:0:(llite_lib.c:1647:ll_setattr_raw()) Skipped 14 previous similar messages
01:00:17:LustreError: 13286:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000400:0x49a:0x0]: -16
01:00:17:LustreError: 4491:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x564:0x0]: -16
01:00:17:6[24703]: segfault at 8 ip 000000385220b453 sp 00007fff93158fa0 error 4 in ld-2.12.so[3852200000+20000]
01:00:17:6[26914]: segfault at 8 ip 000000385220b453 sp 00007fff31d4c470 error 4 in ld-2.12.so[3852200000+20000]
01:00:17:LustreError: 11-0: lustre-MDT0000-mdc-ffff88002f179000: operation ldlm_enqueue to node 10.1.4.157@tcp failed: rc = -71
01:00:17:LustreError: Skipped 1 previous similar message
01:00:17:LustreError: 3152:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000400:0x7b0:0x0]HSM set dirty failed: rc2 = -22
01:00:17:LustreError: 3152:0:(llite_lib.c:1647:ll_setattr_raw()) Skipped 21 previous similar messages
01:00:17:LustreError: 11-0: lustre-OST0004-osc-ffff88002f179000: operation ldlm_enqueue to node 10.1.4.156@tcp failed: rc = -12
01:00:17:LustreError: 15304:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff88002f179000: inode [0x200000403:0x941:0x0] mdc close failed: rc = -13
01:00:17:LustreError: 20600:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff88002f179000: inode [0x200000403:0xa2c:0x0] mdc close failed: rc = -13
01:00:17:LustreError: 11-0: lustre-MDT0000-mdc-ffff880064962c00: operation ldlm_enqueue to node 10.1.4.157@tcp failed: rc = -71
01:00:17:LustreError: Skipped 1 previous similar message
01:00:17:LustreError: 3212:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff88002f179000: inode [0x200000403:0xc20:0x0] mdc close failed: rc = -13
01:00:17:LustreError: 29399:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff880064962c00: inode [0x200000400:0xe62:0x0] mdc close failed: rc = -13
01:00:17:LustreError: 11-0: lustre-MDT0000-mdc-ffff880064962c00: operation ldlm_enqueue to node 10.1.4.157@tcp failed: rc = -71
01:09:53:LustreError: 15097:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000401:0xec9:0x0]HSM set dirty failed: rc2 = -22
01:09:53:LustreError: 15097:0:(llite_lib.c:1647:ll_setattr_raw()) Skipped 48 previous similar messages
01:09:53:LustreError: 16369:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000400:0x12ad:0x0]: -16
01:09:53:LustreError: 10510:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff880064962c00: inode [0x200000402:0x10c6:0x0] mdc close failed: rc = -13
01:09:53:LustreError: 11-0: lustre-MDT0000-mdc-ffff880064962c00: operation ldlm_enqueue to node 10.1.4.157@tcp failed: rc = -71
01:09:53:LustreError: Skipped 10 previous similar messages
01:09:53:LustreError: 28692:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff880064962c00: inode [0x200000400:0x16f6:0x0] mdc close failed: rc = -13
01:09:53:3[31441]: segfault at 8 ip 000000385220b453 sp 00007fff7e9345d0 error 4 in ld-2.12.so[3852200000+20000]
01:09:53:LustreError: 12673:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000400:0x1ab7:0x0]: -16
01:09:54:LustreError: 20907:0:(llite_lib.c:1647:ll_setattr_raw()) [0x200000402:0x1752:0x0]HSM set dirty failed: rc2 = -22
01:09:54:LustreError: 20907:0:(llite_lib.c:1647:ll_setattr_raw()) Skipped 105 previous similar messages
01:09:54:LustreError: 5767:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff880064962c00: inode [0x200000400:0x1cc1:0x0] mdc close failed: rc = -13
01:09:54:Lustre: 1525:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1436231046/real 1436231046]  req@ffff88005ecd36c0 x1505996565685080/t0(0) o36->lustre-MDT0000-mdc-ffff88002f179000@10.1.4.157@tcp:12/10 lens 608/1016 e 0 to 1 dl 1436231090 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
01:09:54:Lustre: 1525:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
01:09:54:Lustre: lustre-MDT0000-mdc-ffff88002f179000: Connection to lustre-MDT0000 (at 10.1.4.157@tcp) was lost; in progress operations using this service will wait for recovery to complete
01:09:54:Lustre: Skipped 1 previous similar message
01:09:54:Lustre: lustre-MDT0000-mdc-ffff88002f179000: Connection restored to lustre-MDT0000 (at 10.1.4.157@tcp)
01:09:54:Lustre: Skipped 2 previous similar messages
01:09:54:16[3172]: segfault at 8 ip 000000385220b453 sp 00007fff9358d480 error 4 in ld-2.12.so[3852200000+20000]
01:09:54:LustreError: 17858:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff88002f179000: inode [0x200000401:0x2162:0x0] mdc close failed: rc = -13
01:09:54:LustreError: 2957:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000400:0x26ad:0x0]: -16
01:09:54:LustreError: 11-0: lustre-MDT0000-mdc-ffff880064962c00: operation ldlm_enqueue to node 10.1.4.157@tcp failed: rc = -71
01:09:54:LustreError: Skipped 17 previous similar messages
01:09:54:19[28476]: segfault at 8 ip 000000385220b453 sp 00007fffe6786060 error 4 in ld-2.12.so[3852200000+20000]
01:09:54:INFO: task chmod:1525 blocked for more than 120 seconds.
01:09:54:      Tainted: G        W  ---------------    2.6.32-504.23.4.el6.x86_64 #1
01:09:54:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
01:09:54:chmod         D 0000000000000001     0  1525   1393 0x00000080
01:09:54: ffff88005a2ffcb8 0000000000000082 ffff88007904d440 0000000000000001
01:09:54: 00000000000158c0 ffff88007904d440 0000000000000001 ffff880037d9ccc0
01:09:54: ffff88005a2ffcf8 ffffffff81529a1e ffff88006a33bad8 ffff88005a2fffd8
01:09:54:Call Trace:
01:09:54: [<ffffffff81529a1e>] ? thread_return+0x4e/0x7d0
01:09:54: [<ffffffff8152b3a6>] __mutex_lock_slowpath+0x96/0x210
01:09:54: [<ffffffff8152aecb>] mutex_lock+0x2b/0x50
01:09:54: [<ffffffffa0a88755>] ll_setattr_raw+0x735/0x1430 [lustre]
01:09:54: [<ffffffffa0a894b5>] ll_setattr+0x65/0xd0 [lustre]
01:09:54: [<ffffffff811ad2d8>] notify_change+0x168/0x340
01:09:54: [<ffffffff8118ba97>] sys_fchmodat+0x117/0x170
01:09:54: [<ffffffff81193ee6>] ? sys_newstat+0x36/0x50
01:09:54: [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60
01:09:54: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
01:09:54:LustreError: 29000:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000400:0x2d2f:0x0]: -16
01:09:54:18[19161]: segfault at 8 ip 000000385220b453 sp 00007fff040c3900 error 4 in ld-2.12.so[3852200000+20000]
01:16:48:LustreError: 31170:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff880064962c00: inode [0x200000401:0x2b84:0x0] mdc close failed: rc = -13
01:16:48:LustreError: 31170:0:(file.c:184:ll_close_inode_openhandle()) Skipped 4 previous similar messages
01:16:48:INFO: task chmod:1525 blocked for more than 120 seconds.
01:16:48:      Tainted: G        W  ---------------    2.6.32-504.23.4.el6.x86_64 #1
01:16:48:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
01:16:48:chmod         D 0000000000000001     0  1525      1 0x00000080
01:16:48: ffff88005a2ffcb8 0000000000000082 ffff88007904d440 0000000000000001
01:16:48: 00000000000158c0 ffff88007904d440 0000000000000001 ffff880037d9ccc0
01:16:48: ffff88005a2ffcf8 ffffffff81529a1e ffff88006a33bad8 ffff88005a2fffd8
01:16:48:Call Trace:
01:16:48: [<ffffffff81529a1e>] ? thread_return+0x4e/0x7d0
01:16:48: [<ffffffff8152b3a6>] __mutex_lock_slowpath+0x96/0x210
01:16:48: [<ffffffff8152aecb>] mutex_lock+0x2b/0x50
01:16:48: [<ffffffffa0a88755>] ll_setattr_raw+0x735/0x1430 [lustre]
01:16:48: [<ffffffffa0a894b5>] ll_setattr+0x65/0xd0 [lustre]
01:16:48: [<ffffffff811ad2d8>] notify_change+0x168/0x340
01:16:48: [<ffffffff8118ba97>] sys_fchmodat+0x117/0x170
01:16:48: [<ffffffff81193ee6>] ? sys_newstat+0x36/0x50
01:16:48: [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60
01:16:48: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
01:16:48:INFO: task chmod:1525 blocked for more than 120 seconds.
01:16:48:      Tainted: G        W  ---------------    2.6.32-504.23.4.el6.x86_64 #1
01:16:48:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
01:16:48:chmod         D 0000000000000001     0  1525      1 0x00000080
01:16:48: ffff88005a2ffcb8 0000000000000082 ffff88007904d440 0000000000000001
01:16:48: 00000000000158c0 ffff88007904d440 0000000000000001 ffff880037d9ccc0
01:16:48: ffff88005a2ffcf8 ffffffff81529a1e ffff88006a33bad8 ffff88005a2fffd8
01:16:48:Call Trace:
01:16:48: [<ffffffff81529a1e>] ? thread_return+0x4e/0x7d0
01:16:48: [<ffffffff8152b3a6>] __mutex_lock_slowpath+0x96/0x210
01:16:48: [<ffffffff8152aecb>] mutex_lock+0x2b/0x50
01:16:48: [<ffffffffa0a88755>] ll_setattr_raw+0x735/0x1430 [lustre]
01:16:48: [<ffffffffa0a894b5>] ll_setattr+0x65/0xd0 [lustre]
01:16:48: [<ffffffff811ad2d8>] notify_change+0x168/0x340
01:16:48: [<ffffffff8118ba97>] sys_fchmodat+0x117/0x170
01:16:48: [<ffffffff81193ee6>] ? sys_newstat+0x36/0x50
01:16:48: [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60
01:16:48: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
01:16:48:INFO: task chmod:1525 blocked for more than 120 seconds.
01:16:48:      Tainted: G        W  ---------------    2.6.32-504.23.4.el6.x86_64 #1
01:16:48:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
01:16:48:chmod         D 0000000000000001     0  1525      1 0x00000080
01:16:48: ffff88005a2ffcb8 0000000000000082 ffff88007904d440 0000000000000001
01:16:48: 00000000000158c0 ffff88007904d440 0000000000000001 ffff880037d9ccc0
01:16:48: ffff88005a2ffcf8 ffffffff81529a1e ffff88006a33bad8 ffff88005a2fffd8
01:16:48:Call Trace:
01:16:48: [<ffffffff81529a1e>] ? thread_return+0x4e/0x7d0
01:16:48: [<ffffffff8152b3a6>] __mutex_lock_slowpath+0x96/0x210
01:16:48: [<ffffffff8152aecb>] mutex_lock+0x2b/0x50
01:16:48: [<ffffffffa0a88755>] ll_setattr_raw+0x735/0x1430 [lustre]
01:16:48: [<ffffffffa0a894b5>] ll_setattr+0x65/0xd0 [lustre]
01:16:48: [<ffffffff811ad2d8>] notify_change+0x168/0x340
01:16:48: [<ffffffff8118ba97>] sys_fchmodat+0x117/0x170
01:16:48: [<ffffffff81193ee6>] ? sys_newstat+0x36/0x50
01:16:48: [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60
01:16:48: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
01:16:48:INFO: task chmod:1525 blocked for more than 120 seconds.
01:16:48:      Tainted: G        W  ---------------    2.6.32-504.23.4.el6.x86_64 #1
01:16:48:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
01:16:48:chmod         D 0000000000000001     0  1525      1 0x00000080
01:16:48: ffff88005a2ffcb8 0000000000000082 ffff88007904d440 0000000000000001
01:16:48: 00000000000158c0 ffff88007904d440 0000000000000001 ffff880037d9ccc0
01:16:48: ffff88005a2ffcf8 ffffffff81529a1e ffff88006a33bad8 ffff88005a2fffd8
01:16:48:Call Trace:
01:16:48: [<ffffffff81529a1e>] ? thread_return+0x4e/0x7d0
01:16:48: [<ffffffff8152b3a6>] __mutex_lock_slowpath+0x96/0x210
01:16:48: [<ffffffff8152aecb>] mutex_lock+0x2b/0x50
01:16:48: [<ffffffffa0a88755>] ll_setattr_raw+0x735/0x1430 [lustre]
01:16:48: [<ffffffffa0a894b5>] ll_setattr+0x65/0xd0 [lustre]
01:16:48: [<ffffffff811ad2d8>] notify_change+0x168/0x340
01:16:48: [<ffffffff8118ba97>] sys_fchmodat+0x117/0x170
01:24:43: [<ffffffff81193ee6>] ? sys_newstat+0x36/0x50
01:24:43: [<ffffffff8100c6f5>] ? math_state_restore+0x45/0x60
01:24:43: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b


 Comments   
Comment by James Nunez (Inactive) [ 11/Sep/18 ]

Closing this ticket since this issue was seen on an older version of Lustre and has not been seen since. If we have more racer hangs or failures in interop testing, we should open a new ticket.

Generated at Sat Feb 10 02:03:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.