
insanity test_1: check_for_recovery_ready()) ASSERTION( clnts <= obd->obd_max_recoverable_clients )

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • Lustre 2.8.0
    • 3
    • 17630

    Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/4636a65e-bd3a-11e4-8d85-5254006e85c2.

      The sub-test test_1 failed with the following error:

      test failed to respond and timed out
      19:56:07:Lustre: Evicted from MGS (at 10.1.6.12@tcp) after server handle changed from 0xbe3c45f237bd6bdf to 0xbe3c45f237bd760b
      19:56:07:LustreError: 19415:0:(ldlm_lib.c:1963:check_for_recovery_ready()) ASSERTION( clnts <= obd->obd_max_recoverable_clients ) failed: 
      19:56:07:LustreError: 19415:0:(ldlm_lib.c:1963:check_for_recovery_ready()) LBUG
      19:56:07:Pid: 19415, comm: tgt_recov
      19:56:07:
      19:56:07:Call Trace:
      19:56:07: [<ffffffffa0491895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      19:56:07: [<ffffffffa0491e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      19:56:07: [<ffffffffa07cf9de>] check_for_recovery_ready+0x17e/0x180 [ptlrpc]
      19:56:07: [<ffffffffa07cf860>] ? check_for_recovery_ready+0x0/0x180 [ptlrpc]
      19:56:07: [<ffffffffa07d0f16>] target_recovery_overseer+0xd6/0x320 [ptlrpc]
      19:56:07: [<ffffffffa07cf4b0>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
      19:56:07: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
      19:56:07: [<ffffffffa07d8274>] target_recovery_thread+0x5b4/0x1ad0 [ptlrpc]
      19:56:07: [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
      19:56:07: [<ffffffffa07d7cc0>] ? target_recovery_thread+0x0/0x1ad0 [ptlrpc]
      19:56:07: [<ffffffff8109e66e>] kthread+0x9e/0xc0
      19:56:07: [<ffffffff8100c20a>] child_rip+0xa/0x20
      19:56:07: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
      19:56:07: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      19:56:07:
      19:56:07:Kernel panic - not syncing: LBUG
      


      Info required for matching: insanity 1

          Activity

            [LU-6296] insanity test_1: check_for_recovery_ready()) ASSERTION( clnts <= obd->obd_max_recoverable_clients )
            di.wang Di Wang added a comment -

            I will add this patch to http://review.whamcloud.com/#/c/11737/

            di.wang Di Wang added a comment -

            It seems obd_max_recoverable_clients should be updated when new MDS-MDS exports are created. This patch should fix the problem:

            diff --git a/lustre/ldlm/ldlm_lib.c b/lustre/ldlm/ldlm_lib.c
            index f1b8faf..7388e6c 100644
            --- a/lustre/ldlm/ldlm_lib.c
            +++ b/lustre/ldlm/ldlm_lib.c
            @@ -1172,8 +1172,7 @@ dont_check_exports:
                                          &export->exp_nid_hash);
                     }
            
            -       if (target->obd_recovering && !export->exp_in_recovery && !lw_client &&
            -           !new_mds_mds_conn) {
            +       if (target->obd_recovering && !export->exp_in_recovery && !lw_client) {
                             int has_transno;
                             __u64 transno = data->ocd_transno;
            
            @@ -1206,6 +1205,14 @@ dont_check_exports:
            
                            atomic_inc(&target->obd_req_replay_clients);
                            atomic_inc(&target->obd_lock_replay_clients);
            +               /* Note: an MDS-MDS connection is allowed to connect during
            +                * recovery, whether or not the export needs to be recovered,
            +                * because we need to retrieve update logs from all other MDTs.
            +                * So if the MDS-MDS export is new, obd_max_recoverable_clients
            +                * also needs to be increased to match the other recovery
            +                * checks. */
            +               if (new_mds_mds_conn)
            +                       target->obd_max_recoverable_clients++;
                            if (atomic_inc_return(&target->obd_connected_clients) ==
                                target->obd_max_recoverable_clients)
                                    wake_up(&target->obd_next_transno_waitq);
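
As a reading aid (not part of the submitted patch), the counter relationship the patch is meant to preserve can be sketched in a small standalone C model. Everything here is an assumption made for illustration: `struct target_model`, `model_connect()`, `recovery_invariant_holds()` and `simulate()` are hypothetical names, the fields only echo `obd_connected_clients` and `obd_max_recoverable_clients`, and the model does not try to reproduce the exact code path that hit the LBUG. It only shows that counting a new MDS-MDS export as connected without raising the ceiling breaks the `clnts <= obd->obd_max_recoverable_clients` condition, while raising the ceiling first keeps it true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of two counters kept on the recovering
 * target. The field names only echo the Lustre ones. */
struct target_model {
	int connected_clients;       /* stands in for obd_connected_clients */
	int max_recoverable_clients; /* stands in for obd_max_recoverable_clients */
};

/* Model one connect that is counted during recovery. When raise_ceiling is
 * set, a new MDS-MDS export also bumps the ceiling, as the patch does. */
void model_connect(struct target_model *t, bool new_mds_mds_conn,
		   bool raise_ceiling)
{
	if (new_mds_mds_conn && raise_ceiling)
		t->max_recoverable_clients++;
	t->connected_clients++;
}

/* The condition asserted in check_for_recovery_ready(). */
bool recovery_invariant_holds(const struct target_model *t)
{
	return t->connected_clients <= t->max_recoverable_clients;
}

/* Reconnect `recoverable` known clients, then connect `new_mds` new MDS-MDS
 * exports, and report whether the invariant still holds afterwards. */
bool simulate(int recoverable, int new_mds, bool raise_ceiling)
{
	struct target_model t = { 0, recoverable };
	int i;

	assert(recoverable >= 0 && new_mds >= 0);
	for (i = 0; i < recoverable; i++)
		model_connect(&t, false, raise_ceiling);
	for (i = 0; i < new_mds; i++)
		model_connect(&t, true, raise_ceiling);
	return recovery_invariant_holds(&t);
}
```

In this model, `simulate(2, 1, false)` returns false (three counted connections against a ceiling of two), which corresponds to the failed assertion, while `simulate(2, 1, true)` returns true because the ceiling grows together with the connection count.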
            

            People

              di.wang Di Wang
              maloo Maloo