[LU-5449] Test failure on test suite sanity-scrub, subtest test_8
Created: 04/Aug/14
Updated: 23/Nov/17
Resolved: 23/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Cannot Reproduce
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 15168

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5705a464-1bfd-11e4-8763-5254006e85c2.

The sub-test test_8 failed with the following error:

test failed to respond and timed out

Info required for matching: sanity-scrub 8



 Comments   
Comment by Oleg Drokin [ 12/Aug/14 ]

Note there's a crash in the MDT1 logs:

05:26:14:Lustre: 28508:0:(service.c:1509:ptlrpc_at_check_timed()) Skipped 1 previous similar message
05:26:14:Lustre: lustre-MDT0000: Client lustre-MDT0000-lwp-MDT0001_UUID (at 10.2.5.88@tcp) reconnecting, waiting for 4 clients in recovery for 0:42
05:26:14:Lustre: Skipped 2 previous similar messages
05:26:14:LustreError: 28536:0:(ldlm_lib.c:1689:check_for_clients()) ASSERTION( clnts <= obd->obd_max_recoverable_clients ) failed: 
05:26:14:LustreError: 28536:0:(ldlm_lib.c:1689:check_for_clients()) LBUG
05:26:14:Pid: 28536, comm: tgt_recov
05:26:14:
05:26:14:Call Trace:
05:26:14: [<ffffffffa07f2920>] ? check_for_clients+0x0/0x70 [ptlrpc]
05:26:14: [<ffffffffa048e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
05:26:14: [<ffffffffa048ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
05:26:14: [<ffffffffa07f298c>] check_for_clients+0x6c/0x70 [ptlrpc]
05:26:14: [<ffffffffa07f3ee3>] target_recovery_overseer+0xb3/0x230 [ptlrpc]
05:26:14: [<ffffffffa07f2550>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
05:26:14: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
05:26:14: [<ffffffffa07fa810>] ? target_recovery_thread+0x0/0x19c0 [ptlrpc]
05:26:14: [<ffffffffa07fadf4>] target_recovery_thread+0x5e4/0x19c0 [ptlrpc]
05:26:14: [<ffffffff81061d12>] ? default_wake_function+0x12/0x20
05:26:14: [<ffffffffa07fa810>] ? target_recovery_thread+0x0/0x19c0 [ptlrpc]
05:26:14: [<ffffffff8109abf6>] kthread+0x96/0xa0
05:26:14: [<ffffffff8100c20a>] child_rip+0xa/0x20
05:26:14: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
05:26:14: [<ffffffff8100c200>] ? child_rip+0x0/0x20
05:26:14:
05:26:14:Kernel panic - not syncing: LBUG
Comment by Nathaniel Clark [ 15/Sep/14 ]

This also happens on sanity-scrub/test_7 (review-dne-part-2 on master):
https://testing.hpdd.intel.com/test_sets/ee4b9602-3b93-11e4-a52e-5254006e85c2

Comment by Andreas Dilger [ 23/Nov/17 ]

Haven't seen this issue in years.
