[LU-7166] Failover: recovery-small test_61: test failed to respond and timed out Created: 15/Sep/15 Updated: 11/Dec/15 Resolved: 16/Sep/15 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Client: lustre-master# 3166, SLES11 SP3 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/45ebfbec-52de-11e5-b61c-5254006e85c2.

The sub-test test_61 failed with the following error:

test failed to respond and timed out

mds dmesg:

[ 324.583610] Lustre: DEBUG MARKER: == recovery-small test 61: Verify to not reuse orphan objects - bug 17025 ============================ 01:01:28 (1441267288)
[ 324.768561] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version
[ 325.119814] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version
[ 325.444742] Lustre: DEBUG MARKER: lctl list_param osc.lustre-OST*-osc > /dev/null 2>&1
[ 325.761165] Lustre: DEBUG MARKER: lctl get_param -n osc.lustre-OST0000-osc-MDT*.connect_flags
[ 325.930762] LustreError: 1704:0:(ldlm_lockd.c:677:ldlm_handle_ast_error()) ### client (nid 10.1.4.193@tcp) returned error from blocking AST (req status -107 rc -107), evict it ns: mdt-lustre-MDT0000_UUID lock: ffff880069a87a00/0x280320e8c568af9f lrc: 4/0,0 mode: PR/PR res: [0x200000007:0x1:0x0].0 bits 0x13 rrc: 4 type: IBT flags: 0x60200000000020 nid: 10.1.4.193@tcp remote: 0x7314c03cb6a7c1ad expref: 7 pid: 1705 timeout: 4295093226 lvb_type: 0
[ 325.934334] LustreError: 138-a: lustre-MDT0000: A client on nid 10.1.4.193@tcp was evicted due to a lock blocking callback time out: rc -107
[ 326.110147] Lustre: DEBUG MARKER: sync; sync; sync
[ 326.931245] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
[ 327.239483] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
[ 327.387196] LustreError: 3188:0:(osd_handler.c:1380:osd_ro()) *** setting lustre-MDT0000 read-only ***
[ 327.388887] Turning device dm-0 (0xfc00000) read-only
[ 327.548071] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
[ 327.703668] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
[ 328.291521] Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
[ 328.578880] Lustre: DEBUG MARKER: umount -d /mnt/mds1
[ 328.750566] Lustre: Failing over lustre-MDT0000
[ 331.905101] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.1.4.191@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 331.908769] LustreError: Skipped 5 previous similar messages
[ 333.253260] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.1.4.187@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 334.827156] Lustre: 3384:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1441267293/real 1441267293] req@ffff8800001ded00 x1511278203896684/t0(0) o251->MGC10.1.4.188@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1441267299 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[ 334.836234] Removing read-only on unknown block (0xfc00000)
[ 334.870635] Lustre: server umount lustre-MDT0000 complete
[ 335.031375] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' |
| Comments |
| Comment by Sarah Liu [ 16/Sep/15 ] |
|
ost dmesg:

[ 9203.055012] kmmpd-dm-1 D ffff88007fd13680 0 7570 2 0x00000080
[ 9203.055012] ffff88007477bcf8 0000000000000046 ffff88007477bfd8 0000000000013680
[ 9203.055012] ffff88007477bfd8 0000000000013680 ffff880071664fa0 ffff88007fd13f48
[ 9203.055012] ffff88007ff5f1e8 0000000000000002 ffffffff811f8bf0 ffff88007477bd70
[ 9203.055012] Call Trace:
[ 9203.055012] [<ffffffff811f8bf0>] ? generic_block_bmap+0x70/0x70
[ 9203.055012] [<ffffffff8160a7dd>] io_schedule+0x9d/0x140
[ 9203.055012] [<ffffffff811f8bfe>] sleep_on_buffer+0xe/0x20
[ 9203.055012] [<ffffffff816085a0>] __wait_on_bit+0x60/0x90
[ 9203.055012] [<ffffffff811f8bf0>] ? generic_block_bmap+0x70/0x70
[ 9203.055012] [<ffffffff81608657>] out_of_line_wait_on_bit+0x87/0xb0
[ 9203.055012] [<ffffffff81098390>] ? autoremove_wake_function+0x40/0x40
[ 9203.055012] [<ffffffff811fa0c0>] ? _submit_bh+0x160/0x220
[ 9203.055012] [<ffffffff811fa1ca>] __wait_on_buffer+0x2a/0x30
[ 9203.055012] [<ffffffffa0b87825>] write_mmp_block+0x125/0x170 [ldiskfs]
[ 9203.055012] [<ffffffffa0b87a88>] kmmpd+0x1a8/0x430 [ldiskfs]
[ 9203.055012] [<ffffffff81609fc5>] ? __schedule+0x2c5/0x7b0
[ 9203.055012] [<ffffffffa0b878e0>] ? __dump_mmp_msg+0x70/0x70 [ldiskfs]
[ 9203.055012] [<ffffffff8109739f>] kthread+0xcf/0xe0
[ 9203.055012] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
[ 9203.055012] [<ffffffff81615018>] ret_from_fork+0x58/0x90
[ 9203.055012] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140 |
| Comment by Saurabh Tandan (Inactive) [ 16/Sep/15 ] |
|
Duplicate of |
| Comment by Saurabh Tandan (Inactive) [ 11/Dec/15 ] |
|
master, build# 3264, 2.7.64 tag |