changelog index reset on MDT restart

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • mds: lustre-2.4.0-RC2_2chaos_2.6.32_358.6.1.3chaos.ch5.1.ch5.1.x86_64
    • 3
    • 8596

    Description

      After a maintenance downtime in which we updated our vesta Lustre servers from 2.4.0-RC1_3chaos to 2.4.0-RC2_2chaos, the changelog current index reverted to zero. The registered user cl1 retained its highest index value:

      # vesta-mds1 /root > lctl get_param mdd.fsv-MDT0000.changelog_users
      mdd.fsv-MDT0000.changelog_users=
      current index: 9158128
      ID    index
      cl1   445461827
      

      Here is a debug log entry from when the MDS started:

      00000004:00000080:1.0:1370534969.929148:0:17901:0:(mdd_device.c:308:mdd_changelog_llog_init()) changelog starting index=0
      

      LLNL-bug-id: TOSS-2103

        Activity

          [LU-3446] changelog index reset on MDT restart
          pjones Peter Jones added a comment -

          Landed for 2.4.1 and 2.5

          nedbass Ned Bass (Inactive) added a comment - edited

          Patch for b2_4: http://review.whamcloud.com/#/c/7012/

          jlevi Jodi Levi (Inactive) added a comment -

          When should we expect to see the patch for 2.4.1?

          morrone Christopher Morrone (Inactive) added a comment -

          This needs to be fixed for 2.4.1 as well, so you should really add that Fix Version as well.

          nedbass Ned Bass (Inactive) added a comment -

          Patch for master: http://review.whamcloud.com/#change,6642

          nedbass Ned Bass (Inactive) added a comment -

          It seems that if the registered changelog user has cleared all changelog records when the MDT is stopped, then when the MDT restarts changelog_init_cb() is never called from llog_reverse_process(), so mdd->mdd_cl.mc_index is left with 0.

          This reproduces the problem in a VM for me:

          FSTYPE=zfs llmount.sh
          lctl --device lustre-MDT0000 changelog_register
          touch /mnt/lustre/{1,2,3,4}
          lfs changelog_clear lustre-MDT0000 cl1 4
          umount /mnt/mds1
          mount.lustre lustre-mdt1/mdt1 /mnt/mds1
          lctl get_param mdd.lustre-MDT0000.changelog_users
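
The failure mode described above can be illustrated with a small standalone model. This is only a sketch: the `toy_*` names and structures are hypothetical stand-ins invented for illustration, not the real Lustre llog API, and the callback is assumed to take the index of the newest record and stop. The point it shows is that a newest-first walk over an empty log never invokes the callback, so the starting index keeps its default of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; not the real Lustre structures. */
struct toy_rec {
	unsigned long long index;
};

struct toy_log {
	const struct toy_rec *recs;
	size_t count;	/* 0 when every record has been cleared */
};

typedef int (*toy_cb_t)(const struct toy_rec *rec, void *data);

/*
 * Walk the records newest-first and call cb on each one until cb returns
 * non-zero. For an empty log the loop body never runs, so cb is never called.
 */
int toy_reverse_process(const struct toy_log *log, toy_cb_t cb, void *data)
{
	size_t i;
	int rc = 0;

	assert(log != NULL && cb != NULL);
	for (i = log->count; i > 0; i--) {
		rc = cb(&log->recs[i - 1], data);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Assumed callback behavior: record the newest index, then stop the walk. */
int toy_init_cb(const struct toy_rec *rec, void *data)
{
	*(unsigned long long *)data = rec->index;
	return 1;
}

/* Starting index that this toy initialization would pick for a log. */
unsigned long long toy_starting_index(const struct toy_log *log)
{
	unsigned long long index = 0;	/* default if the callback never runs */

	toy_reverse_process(log, toy_init_cb, &index);
	return index;
}
```

With four records present, `toy_starting_index()` returns the newest record's index; with the same log emptied (count set to 0), it returns 0, which mirrors the "changelog starting index=0" message in the description.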

          nedbass Ned Bass (Inactive) added a comment -

          Bobijam, yes, the cl1 changelog_users output in the description was captured after the MDT restart.

          Before this update, the current index was preserved between MDT restarts (as far as I know).

          We had a reboot yesterday, and, now that I check, it appears that the index was again reset:

          # vesta-mds1 /root > lctl get_param mdd.fsv-MDT0000.changelog_users
          mdd.fsv-MDT0000.changelog_users=
          current index: 444830
          ID    index
          cl1   445461827
          cl2   18248038

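The symptom in the output above (a registered user whose index is larger than the current index) can be checked mechanically. The following is a minimal sketch, assuming the text layout matches the `changelog_users` samples quoted in this ticket; the function name `count_users_ahead_of_current` is hypothetical and not part of any Lustre tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count registered changelog users whose recorded index is greater than the
 * "current index" value in changelog_users text. Returns -1 if no
 * "current index:" value can be found.
 * Assumption: user lines look like "cl<N>   <index>", as in this ticket.
 */
int count_users_ahead_of_current(const char *text)
{
	static const char tag[] = "current index:";
	unsigned long long current, user_index;
	const char *p;
	char id[32];
	char line[128];
	int consumed = 0;
	int ahead = 0;

	assert(text != NULL);

	p = strstr(text, tag);
	if (p == NULL)
		return -1;
	p += sizeof(tag) - 1;
	if (sscanf(p, "%llu%n", &current, &consumed) != 1)
		return -1;
	p += consumed;

	/* Examine the remaining text one line at a time. */
	while (*p != '\0') {
		size_t len = strcspn(p, "\n");

		if (len < sizeof(line)) {
			memcpy(line, p, len);
			line[len] = '\0';
			/* The "ID    index" header fails the numeric match. */
			if (sscanf(line, " %31s %llu", id, &user_index) == 2 &&
			    strncmp(id, "cl", 2) == 0 && user_index > current)
				ahead++;
		}
		p += len;
		if (*p == '\n')
			p++;
	}
	return ahead;
}
```

Fed the output quoted just above, the function counts both cl1 and cl2, since each index exceeds 444830. A result greater than zero after a restart would be one way for a regression test to flag this problem.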
          bobijam Zhenyu Xu added a comment - edited

          Ned,

          When was the cl1 changelog_users output read? After the MDT restart?

          Here's my local VM situation.

          [10:38 AM]root@test1:~/work/lustre/lustre/tests/
          # lctl get_param mdd.lustre-MDT0000.changelog_users
          mdd.lustre-MDT0000.changelog_users=current index: 58
          ID    index
          cl1   0
          cl2   37
          
          [10:42 AM]root@test1:~/work/lustre/lustre/tests/
          # umount /mnt/mds1
          
          [10:43 AM]root@test1:~/work/lustre/lustre/tests/
          # mount -t lustre /dev/sdb /mnt/mds1
          
          [10:43 AM]root@test1:~/work/lustre/lustre/tests/
          # lctl get_param mdd.lustre-MDT0000.changelog_users
          mdd.lustre-MDT0000.changelog_users=current index: 58
          ID    index
          cl1   0
          cl2   37
          

          adilger Andreas Dilger added a comment -

          Bobijam, does your comment indicate that this should not be an actual problem? Do we have a ChangeLog test that verifies the ChangeLog index after a restart?

          bobijam Zhenyu Xu added a comment -

          The mdd_device.c:308 message comes from the mdd initialization of LLOG_CHANGELOG_ORIG_CTXT. After that, we come to the LLOG_CHANGELOG_USER_ORIG_CTXT initialization, where changelog_user_init_cb() sets mdd->mdd_cl.mc_lastuser. Then we go back to mdd_changelog_llog_init():

                  /* If we have registered users, assume we want changelogs on */
                  if (mdd->mdd_cl.mc_lastuser > 0) {
                          rc = mdd_changelog_on(env, mdd, 1);
                          if (rc < 0)
                                  GOTO(out_uclose, rc);
                  }
          

          And mdd_changelog_on() -> mdd_changelog_write_head() adds back mdd->mdd_cl.mc_index.


          People

            bobijam Zhenyu Xu
            nedbass Ned Bass (Inactive)