[LU-5572] replay-single test_73b: import is not in FULL state Created: 02/Sep/14  Updated: 05/Sep/14  Resolved: 05/Sep/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-5287 (ldlm_lib.c:2253:target_queue_recover... Resolved
Severity: 3
Rank (Obsolete): 15542

 Description   

This issue was created by maloo for Amir Shehata <amir.shehata@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/b5ba7f1c-2fcf-11e4-9f89-5254006e85c2.

shadow-13vm6: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 0 sec
shadow-13vm5:  rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0000-mdc-*.mds_server_uuid into FULL state after 662 sec, have CONNECTING 
 replay-single test_73b: @@@@@@ FAIL: import is not in FULL state 

The following test runs had the exact same problem as well:
https://testing.hpdd.intel.com/test_sets/56c66138-2af7-11e4-ba37-5254006e85c2
https://testing.hpdd.intel.com/test_sets/7949beac-24ef-11e4-8458-5254006e85c2



 Comments   
Comment by Oleg Drokin [ 05/Sep/14 ]

MDS1 crashed with this assertion (visible in console log):

07:18:13:Lustre: lustre-MDT0000: Client lustre-MDT0001-mdtlov_UUID (at 10.1.4.149@tcp) reconnecting, waiting for 5 clients in recovery for 0:22
07:18:13:LustreError: 13004:0:(ldlm_lib.c:2253:target_queue_recovery_request()) ASSERTION( req->rq_export->exp_lock_replay_needed ) failed: 
07:18:13:LustreError: 13004:0:(ldlm_lib.c:2253:target_queue_recovery_request()) LBUG
07:18:13:Pid: 13004, comm: mdt00_001
07:18:13:
07:18:13:Call Trace:
07:18:13: [<ffffffffa0483895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
07:18:13: [<ffffffffa0483e97>] lbug_with_loc+0x47/0xb0 [libcfs]
07:18:13: [<ffffffffa07edcd5>] target_queue_recovery_request+0xb35/0xc40 [ptlrpc]
07:18:13: [<ffffffffa0886d7f>] tgt_handle_recovery+0x38f/0x520 [ptlrpc]
07:18:14: [<ffffffffa088cd05>] tgt_request_handle+0x1a5/0xb10 [ptlrpc]
07:18:14: [<ffffffffa083c294>] ptlrpc_main+0xe64/0x1990 [ptlrpc]
07:18:14: [<ffffffffa083b430>] ? ptlrpc_main+0x0/0x1990 [ptlrpc]
07:18:14: [<ffffffff8109abf6>] kthread+0x96/0xa0
07:18:14: [<ffffffff8100c20a>] child_rip+0xa/0x20
07:18:14: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
07:18:14: [<ffffffff8100c200>] ? child_rip+0x0/0x20
07:18:14:
07:18:14:Kernel panic - not syncing: LBUG
Comment by Oleg Drokin [ 05/Sep/14 ]

As such, I think this is a duplicate of LU-5287.
