[LU-7166] Failover: recovery-small test_61: test failed to respond and timed out Created: 15/Sep/15  Updated: 11/Dec/15  Resolved: 16/Sep/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Client: lustre-master build #3166, SLES11 SP3
Server: lustre-master build #3166, RHEL 7


Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/45ebfbec-52de-11e5-b61c-5254006e85c2.

The sub-test test_61 failed with the following error:

test failed to respond and timed out

mds dmesg:

[  324.583610] Lustre: DEBUG MARKER: == recovery-small test 61: Verify to not reuse orphan objects - bug 17025 ============================ 01:01:28 (1441267288)
[  324.768561] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version
[  325.119814] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version
[  325.444742] Lustre: DEBUG MARKER: lctl list_param osc.lustre-OST*-osc             > /dev/null 2>&1
[  325.761165] Lustre: DEBUG MARKER: lctl get_param -n osc.lustre-OST0000-osc-MDT*.connect_flags
[  325.930762] LustreError: 1704:0:(ldlm_lockd.c:677:ldlm_handle_ast_error()) ### client (nid 10.1.4.193@tcp) returned error from blocking AST (req status -107 rc -107), evict it ns: mdt-lustre-MDT0000_UUID lock: ffff880069a87a00/0x280320e8c568af9f lrc: 4/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0 bits 0x13 rrc: 4 type: IBT flags: 0x60200000000020 nid: 10.1.4.193@tcp remote: 0x7314c03cb6a7c1ad expref: 7 pid: 1705 timeout: 4295093226 lvb_type: 0
[  325.934334] LustreError: 138-a: lustre-MDT0000: A client on nid 10.1.4.193@tcp was evicted due to a lock blocking callback time out: rc -107
[  326.110147] Lustre: DEBUG MARKER: sync; sync; sync
[  326.931245] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
[  327.239483] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
[  327.387196] LustreError: 3188:0:(osd_handler.c:1380:osd_ro()) *** setting lustre-MDT0000 read-only ***
[  327.388887] Turning device dm-0 (0xfc00000) read-only
[  327.548071] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
[  327.703668] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[  328.291521] Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
[  328.578880] Lustre: DEBUG MARKER: umount -d /mnt/mds1
[  328.750566] Lustre: Failing over lustre-MDT0000
[  331.905101] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.1.4.191@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[  331.908769] LustreError: Skipped 5 previous similar messages
[  333.253260] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.1.4.187@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[  334.827156] Lustre: 3384:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1441267293/real 1441267293]  req@ffff8800001ded00 x1511278203896684/t0(0) o251->MGC10.1.4.188@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1441267299 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[  334.836234] Removing read-only on unknown block (0xfc00000)
[  334.870635] Lustre: server umount lustre-MDT0000 complete
[  335.031375] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '


Comments
Comment by Sarah Liu [ 16/Sep/15 ]

ost dmesg:

[ 9203.055012] kmmpd-dm-1      D ffff88007fd13680     0  7570      2 0x00000080
[ 9203.055012]  ffff88007477bcf8 0000000000000046 ffff88007477bfd8 0000000000013680
[ 9203.055012]  ffff88007477bfd8 0000000000013680 ffff880071664fa0 ffff88007fd13f48
[ 9203.055012]  ffff88007ff5f1e8 0000000000000002 ffffffff811f8bf0 ffff88007477bd70
[ 9203.055012] Call Trace:
[ 9203.055012]  [<ffffffff811f8bf0>] ? generic_block_bmap+0x70/0x70
[ 9203.055012]  [<ffffffff8160a7dd>] io_schedule+0x9d/0x140
[ 9203.055012]  [<ffffffff811f8bfe>] sleep_on_buffer+0xe/0x20
[ 9203.055012]  [<ffffffff816085a0>] __wait_on_bit+0x60/0x90
[ 9203.055012]  [<ffffffff811f8bf0>] ? generic_block_bmap+0x70/0x70
[ 9203.055012]  [<ffffffff81608657>] out_of_line_wait_on_bit+0x87/0xb0
[ 9203.055012]  [<ffffffff81098390>] ? autoremove_wake_function+0x40/0x40
[ 9203.055012]  [<ffffffff811fa0c0>] ? _submit_bh+0x160/0x220
[ 9203.055012]  [<ffffffff811fa1ca>] __wait_on_buffer+0x2a/0x30
[ 9203.055012]  [<ffffffffa0b87825>] write_mmp_block+0x125/0x170 [ldiskfs]
[ 9203.055012]  [<ffffffffa0b87a88>] kmmpd+0x1a8/0x430 [ldiskfs]
[ 9203.055012]  [<ffffffff81609fc5>] ? __schedule+0x2c5/0x7b0
[ 9203.055012]  [<ffffffffa0b878e0>] ? __dump_mmp_msg+0x70/0x70 [ldiskfs]
[ 9203.055012]  [<ffffffff8109739f>] kthread+0xcf/0xe0
[ 9203.055012]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[ 9203.055012]  [<ffffffff81615018>] ret_from_fork+0x58/0x90
[ 9203.055012]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
Comment by Saurabh Tandan (Inactive) [ 16/Sep/15 ]

Duplicate of LU-6725

Comment by Saurabh Tandan (Inactive) [ 11/Dec/15 ]

Hit again on master, build #3264, tag 2.7.64.
Regression: EL7.1 Server / EL7.1 Client
https://testing.hpdd.intel.com/test_sets/755c5a8e-9f37-11e5-ba94-5254006e85c2

Generated at Sat Feb 10 02:06:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.