[LU-1764] Test failure on test suite replay-single, subtest test_0a Created: 17/Aug/12  Updated: 21/Aug/12  Resolved: 21/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: Doug Oucharek (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4168

Description

This issue was created by maloo for Di Wang <di.wang@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/4745cdbe-e861-11e1-82fb-52540035b04c.

The sub-test test_0a failed with the following error:

test failed to respond and timed out

Info required for matching: replay-single 0a

This happened while testing another, completely unrelated patch. It seems the MDS cannot be shut down for some reason.

21:02:16:Lustre: DEBUG MARKER: == replay-single test 0a: empty replay =============================================================== 21:02:15 (1345176135)
21:02:17:Lustre: DEBUG MARKER: sync; sync; sync
21:02:19:Lustre: DEBUG MARKER: /usr/sbin/lctl --device %lustre-MDT0000 notransno
21:02:19:Lustre: DEBUG MARKER: /usr/sbin/lctl --device %lustre-MDT0000 readonly
21:02:19:LustreError: 12781:0:(osd_handler.c:1114:osd_ro()) *** setting device osd-ldiskfs read-only ***
21:02:19:Turning device dm-0 (0xfd00000) read-only
21:02:19:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
21:02:19:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
21:02:19:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
21:02:19:Lustre: DEBUG MARKER: umount -d /mnt/mds1
21:02:19:Lustre: Failing over lustre-MDT0000
21:02:20:Lustre: Failing over mdd_obd-lustre-MDT0000
21:02:20:Lustre: mdd_obd-lustre-MDT0000: shutting down for failover; client state will be preserved.
21:02:20:Lustre: MGS has stopped.
21:02:20:LustreError: 12922:0:(ldlm_request.c:1166:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
21:02:20:LustreError: 12922:0:(ldlm_request.c:1166:ldlm_cli_cancel_req()) Skipped 1 previous similar message
21:02:20:LustreError: 12922:0:(ldlm_request.c:1792:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
21:02:20:LustreError: 12922:0:(ldlm_request.c:1792:ldlm_cli_cancel_list()) Skipped 1 previous similar message
21:02:24:Lustre: 12922:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1345176138/real 1345176138] req@ffff88006fc1d400 x1410511265640323/t0(0) o251->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176144 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
21:02:25:Lustre: 2749:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1345176138/real 1345176138] req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -5/-1
21:02:25:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:02:25:LustreError: 12922:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -110/-1
21:02:26:LustreError: 12922:0:(import.c:366:ptlrpc_invalidate_import()) MGS: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
21:02:46:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:02:46:LustreError: 12922:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -110/-1
21:02:46:LustreError: 12922:0:(import.c:366:ptlrpc_invalidate_import()) MGS: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
21:03:06:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:03:06:LustreError: 12922:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -110/-1
21:03:06:LustreError: 12922:0:(import.c:366:ptlrpc_invalidate_import()) MGS: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
21:03:26:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:03:26:LustreError: 12922:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -110/-1
21:03:26:LustreError: 12922:0:(import.c:366:ptlrpc_invalidate_import()) MGS: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
21:03:45:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:03:45:LustreError: 12922:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -110/-1
21:03:45:LustreError: 12922:0:(import.c:366:ptlrpc_invalidate_import()) MGS: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
21:04:06:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:04:06:LustreError: 12922:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -110/-1
21:04:06:LustreError: 12922:0:(import.c:366:ptlrpc_invalidate_import()) MGS: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
21:04:26:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:04:26:LustreError: 12922:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff880057e64000 x1410511265640320/t0(0) o400->MGC10.10.4.142@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1345176145 ref 1 fl Rpc:REXN/0/ffffffff rc -110/-1
21:04:26:LustreError: 12922:0:(import.c:366:ptlrpc_invalidate_import()) MGS: RPCs in "Unregistering" phase found (0). Network is sluggish? Waiting them to error out.
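The return codes and the repeating ptlrpc_invalidate_import() messages in the log above can be summarized with a small script. The following is only a rough sketch, not part of the Lustre test framework: the names LINUX_ERRNO, SAMPLE and summarize() are hypothetical, the errno table is hard-coded with the standard Linux values (5 = EIO, 108 = ESHUTDOWN, 110 = ETIMEDOUT), and the regular expressions assume the "HH:MM:SS:LustreError: pid:0:(file.c:line:func()) message" layout seen in this ticket. SAMPLE holds three lines copied from the log.

```python
import re
from collections import Counter

# Standard Linux errno values, hard-coded so the result does not depend on
# the platform this script runs on (assumption: the MDS is a Linux node).
LINUX_ERRNO = {5: "EIO", 108: "ESHUTDOWN", 110: "ETIMEDOUT"}

# Assumed layout: "HH:MM:SS:Lustre[Error]: pid:0:(file.c:line:func()) message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}):Lustre(?:Error)?: \d+:\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)
# Negative return codes written as "rc -108" or "rc = -110".
RC_RE = re.compile(r"\brc(?: =)? (-\d+)")

# Three lines copied from the console log in this ticket.
SAMPLE = """\
21:02:20:LustreError: 12922:0:(ldlm_request.c:1166:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
21:02:25:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
21:04:26:LustreError: 12922:0:(import.c:324:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
"""


def to_seconds(ts):
    """Convert an HH:MM:SS timestamp to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(log_text):
    """Count error codes and measure how long import invalidation kept retrying."""
    codes = Counter()
    first = last = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # DEBUG MARKER lines and other formats are skipped
        for rc in RC_RE.findall(match["msg"]):
            codes[LINUX_ERRNO.get(-int(rc), rc)] += 1
        if match["func"] == "ptlrpc_invalidate_import":
            stamp = to_seconds(match["ts"])
            first = stamp if first is None else first
            last = stamp
    stall = 0 if first is None else last - first
    return {"codes": dict(codes), "stall_seconds": stall}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against the full log, the same function would show how long the umount thread (pid 12922) kept waiting for the MGS import to be invalidated. The sketch does not handle a log that crosses midnight.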



Comments
Comment by Jodi Levi (Inactive) [ 21/Aug/12 ]

Doug,
Can you take a look at this one?

Comment by Peter Jones [ 21/Aug/12 ]

Seems to be a duplicate of TT-834
