[LU-6296] insanity test_1: check_for_recovery_ready() ASSERTION( clnts <= obd->obd_max_recoverable_clients ) Created: 26/Feb/15 Updated: 09/Sep/16 Resolved: 27/Feb/15
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne2 |
| Severity: | 3 |
| Rank (Obsolete): | 17630 |
| Description |
This issue was created by maloo for wangdi <di.wang@intel.com>.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/4636a65e-bd3a-11e4-8d85-5254006e85c2.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

```
19:56:07:Lustre: Evicted from MGS (at 10.1.6.12@tcp) after server handle changed from 0xbe3c45f237bd6bdf to 0xbe3c45f237bd760b
19:56:07:LustreError: 19415:0:(ldlm_lib.c:1963:check_for_recovery_ready()) ASSERTION( clnts <= obd->obd_max_recoverable_clients ) failed:
19:56:07:LustreError: 19415:0:(ldlm_lib.c:1963:check_for_recovery_ready()) LBUG
19:56:07:Pid: 19415, comm: tgt_recov
19:56:07:
19:56:07:Call Trace:
19:56:07: [<ffffffffa0491895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
19:56:07: [<ffffffffa0491e97>] lbug_with_loc+0x47/0xb0 [libcfs]
19:56:07: [<ffffffffa07cf9de>] check_for_recovery_ready+0x17e/0x180 [ptlrpc]
19:56:07: [<ffffffffa07cf860>] ? check_for_recovery_ready+0x0/0x180 [ptlrpc]
19:56:07: [<ffffffffa07d0f16>] target_recovery_overseer+0xd6/0x320 [ptlrpc]
19:56:07: [<ffffffffa07cf4b0>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
19:56:07: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
19:56:07: [<ffffffffa07d8274>] target_recovery_thread+0x5b4/0x1ad0 [ptlrpc]
19:56:07: [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
19:56:07: [<ffffffffa07d7cc0>] ? target_recovery_thread+0x0/0x1ad0 [ptlrpc]
19:56:07: [<ffffffff8109e66e>] kthread+0x9e/0xc0
19:56:07: [<ffffffff8100c20a>] child_rip+0xa/0x20
19:56:07: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
19:56:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20
19:56:07:
19:56:07:Kernel panic - not syncing: LBUG
```

Please provide additional information about the failure here.

Info required for matching: insanity 1
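For context, here is a simplified sketch of the readiness check that fires, reconstructed from the ASSERTION text in the log above; the function and field names are as logged, but the body is not the verbatim ldlm_lib.c source:

```c
/* Simplified sketch (not verbatim Lustre code). At recovery start,
 * obd_max_recoverable_clients is sized from the clients recorded in
 * the on-disk client table, while obd_connected_clients is bumped for
 * every export admitted during recovery. An export that was never
 * recorded on disk (e.g. a brand-new MDS-MDS connection) can therefore
 * push the connected count past the maximum and trip the LASSERT. */
static int check_for_recovery_ready(struct obd_device *obd)
{
	unsigned int clnts = atomic_read(&obd->obd_connected_clients);

	LASSERT(clnts <= obd->obd_max_recoverable_clients);

	/* Recovery is not ready to proceed until every expected
	 * client has reconnected. */
	if (clnts < obd->obd_max_recoverable_clients)
		return 0;

	return 1;
}
```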
| Comments |
| Comment by Di Wang [ 27/Feb/15 ] |
It seems obd_max_recoverable_clients should be updated when new MDS-MDS exports are created. This patch should fix the problem:

```diff
diff --git a/lustre/ldlm/ldlm_lib.c b/lustre/ldlm/ldlm_lib.c
index f1b8faf..7388e6c 100644
--- a/lustre/ldlm/ldlm_lib.c
+++ b/lustre/ldlm/ldlm_lib.c
@@ -1172,8 +1172,7 @@ dont_check_exports:
 				&export->exp_nid_hash);
 	}
 
-	if (target->obd_recovering && !export->exp_in_recovery && !lw_client &&
-	    !new_mds_mds_conn) {
+	if (target->obd_recovering && !export->exp_in_recovery && !lw_client) {
 		int has_transno;
 		__u64 transno = data->ocd_transno;
 
@@ -1206,6 +1205,14 @@ dont_check_exports:
 		atomic_inc(&target->obd_req_replay_clients);
 		atomic_inc(&target->obd_lock_replay_clients);
 
+		/* Note: an MDS-MDS connection is allowed to connect during
+		 * recovery, whether or not the export needs to be recovered,
+		 * because we need to retrieve update logs from all other
+		 * MDTs. So if the MDS-MDS export is new,
+		 * obd_max_recoverable_clients also needs to be increased to
+		 * match the other recovery checks. */
+		if (new_mds_mds_conn)
+			target->obd_max_recoverable_clients++;
 		if (atomic_inc_return(&target->obd_connected_clients) ==
 		    target->obd_max_recoverable_clients)
 			wake_up(&target->obd_next_transno_waitq);
```
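For illustration, a minimal sketch of the counting invariant the patch restores; the helper name target_admit_export_during_recovery() is hypothetical, while the field names come from the patch above:

```c
/* Hypothetical helper (not Lustre source) showing the invariant:
 * every export counted into obd_connected_clients during recovery
 * must also be covered by obd_max_recoverable_clients, otherwise the
 * LASSERT in check_for_recovery_ready() fires. A brand-new MDS-MDS
 * export was not in the on-disk client table when recovery started,
 * so the maximum has to grow along with it. */
static void target_admit_export_during_recovery(struct obd_device *target,
						bool new_mds_mds_conn)
{
	if (new_mds_mds_conn)
		/* This export was not counted at recovery start. */
		target->obd_max_recoverable_clients++;

	/* Wake the recovery thread once the last expected client,
	 * new maximum included, has connected. */
	if (atomic_inc_return(&target->obd_connected_clients) ==
	    target->obd_max_recoverable_clients)
		wake_up(&target->obd_next_transno_waitq);
}
```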
| Comment by Di Wang [ 27/Feb/15 ] |
I will add this patch to http://review.whamcloud.com/#/c/11737/