[LU-6296] insanity test_1: check_for_recovery_ready()) ASSERTION( clnts <= obd->obd_max_recoverable_clients ) Created: 26/Feb/15  Updated: 09/Sep/16  Resolved: 27/Feb/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: dne2

Issue Links:
Related
is related to LU-3534 async update cross-MDTs Resolved
Severity: 3
Rank (Obsolete): 17630

 Description   

This issue was created by maloo for wangdi <di.wang@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/4636a65e-bd3a-11e4-8d85-5254006e85c2.

The sub-test test_1 failed with the following error:

test failed to respond and timed out
19:56:07:Lustre: Evicted from MGS (at 10.1.6.12@tcp) after server handle changed from 0xbe3c45f237bd6bdf to 0xbe3c45f237bd760b
19:56:07:LustreError: 19415:0:(ldlm_lib.c:1963:check_for_recovery_ready()) ASSERTION( clnts <= obd->obd_max_recoverable_clients ) failed: 
19:56:07:LustreError: 19415:0:(ldlm_lib.c:1963:check_for_recovery_ready()) LBUG
19:56:07:Pid: 19415, comm: tgt_recov
19:56:07:
19:56:07:Call Trace:
19:56:07: [<ffffffffa0491895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
19:56:07: [<ffffffffa0491e97>] lbug_with_loc+0x47/0xb0 [libcfs]
19:56:07: [<ffffffffa07cf9de>] check_for_recovery_ready+0x17e/0x180 [ptlrpc]
19:56:07: [<ffffffffa07cf860>] ? check_for_recovery_ready+0x0/0x180 [ptlrpc]
19:56:07: [<ffffffffa07d0f16>] target_recovery_overseer+0xd6/0x320 [ptlrpc]
19:56:07: [<ffffffffa07cf4b0>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
19:56:07: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
19:56:07: [<ffffffffa07d8274>] target_recovery_thread+0x5b4/0x1ad0 [ptlrpc]
19:56:07: [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
19:56:07: [<ffffffffa07d7cc0>] ? target_recovery_thread+0x0/0x1ad0 [ptlrpc]
19:56:07: [<ffffffff8109e66e>] kthread+0x9e/0xc0
19:56:07: [<ffffffff8100c20a>] child_rip+0xa/0x20
19:56:07: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
19:56:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20
19:56:07:
19:56:07:Kernel panic - not syncing: LBUG

Please provide additional information about the failure here.

Info required for matching: insanity 1
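
For context, the assertion comes from check_for_recovery_ready() in ldlm_lib.c (the frame at +0x17e in the call trace above). The simplified sketch below is a paraphrase, not the verbatim upstream code, and omits the abort/expiry handling the real function also performs; it only shows the counting invariant that the LBUG reports as violated:

/* Simplified sketch of the assertion site (paraphrased from ldlm_lib.c).
 * The recovery overseer thread polls this check until every client the
 * target expects to recover has reconnected or been declared stale. */
static int check_for_recovery_ready(struct obd_device *obd)
{
        unsigned int clnts = atomic_read(&obd->obd_connected_clients);

        /* Invariant: the target should never see more connected clients
         * than it considers recoverable.  A new MDS-MDS export accepted
         * during recovery bumps obd_connected_clients without bumping
         * obd_max_recoverable_clients, so this LASSERT trips and LBUGs. */
        LASSERT(clnts <= obd->obd_max_recoverable_clients);

        /* Keep waiting while some expected clients have neither
         * reconnected nor been marked stale. */
        if (clnts + obd->obd_stale_clients < obd->obd_max_recoverable_clients)
                return 0;

        return 1;
}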



 Comments   
Comment by Di Wang [ 27/Feb/15 ]

It seems obd_max_recoverable_clients should also be updated when new MDS-MDS exports are created during recovery. This patch should fix the problem:

diff --git a/lustre/ldlm/ldlm_lib.c b/lustre/ldlm/ldlm_lib.c
index f1b8faf..7388e6c 100644
--- a/lustre/ldlm/ldlm_lib.c
+++ b/lustre/ldlm/ldlm_lib.c
@@ -1172,8 +1172,7 @@ dont_check_exports:
                              &export->exp_nid_hash);
         }

-       if (target->obd_recovering && !export->exp_in_recovery && !lw_client &&
-           !new_mds_mds_conn) {
+       if (target->obd_recovering && !export->exp_in_recovery && !lw_client) {
                 int has_transno;
                 __u64 transno = data->ocd_transno;

@@ -1206,6 +1205,14 @@ dont_check_exports:

                atomic_inc(&target->obd_req_replay_clients);
                atomic_inc(&target->obd_lock_replay_clients);
+               /* Note: an MDS-MDS connection is allowed during recovery
+                * whether or not the export needs to be recovered, because
+                * we need to retrieve the update logs from all other MDTs.
+                * So if the MDS-MDS export is new, obd_max_recoverable_clients
+                * also needs to be increased to match the other recovery
+                * checking conditions. */
+               if (new_mds_mds_conn)
+                       target->obd_max_recoverable_clients++;
                if (atomic_inc_return(&target->obd_connected_clients) ==
                    target->obd_max_recoverable_clients)
                        wake_up(&target->obd_next_transno_waitq);
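
With this change, a new MDS-MDS export accepted during recovery increments obd_max_recoverable_clients alongside obd_connected_clients, so the invariant checked in check_for_recovery_ready() (connected clients never exceeding recoverable clients) continues to hold, and the wake-up condition for "all expected clients connected" counts such exports consistently.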
Comment by Di Wang [ 27/Feb/15 ]

I will add this patch to http://review.whamcloud.com/#/c/11737/
