[LU-3446] changelog index reset on MDT restart

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • Environment: mds: lustre-2.4.0-RC2_2chaos_2.6.32_358.6.1.3chaos.ch5.1.ch5.1.x86_64
    • Severity: 3
    • Rank (Obsolete): 8596

    Description

      After a maintenance downtime in which we updated our vesta Lustre servers from 2.4.0-RC1_3chaos to 2.4.0-RC2_2chaos, the changelog current index reverted to zero. The registered user cl1 retained its highest index value:

      # vesta-mds1 /root > lctl get_param mdd.fsv-MDT0000.changelog_users
      mdd.fsv-MDT0000.changelog_users=
      current index: 9158128
      ID    index
      cl1   445461827
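
A registered user whose index is higher than the log's current index is the visible symptom here. As a rough sketch of how that check could be automated, the following parses output in the format shown above and reports any such user. The function names are hypothetical, and the parser assumes the output layout matches this ticket exactly (a `current index:` line followed by `ID index` rows with IDs of the form `clN`).

```python
import re


def parse_changelog_users(text):
    """Parse changelog_users output into (current_index, {user_id: index}).

    Assumes the layout shown in this ticket; other Lustre versions may differ.
    """
    current = None
    users = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"current index:\s*(\d+)$", line)
        if m:
            current = int(m.group(1))
            continue
        m = re.match(r"(cl\d+)\s+(\d+)$", line)
        if m:
            users[m.group(1)] = int(m.group(2))
    return current, users


def users_ahead_of_log(text):
    """Return users whose recorded index exceeds the log's current index."""
    current, users = parse_changelog_users(text)
    if current is None:
        raise ValueError("no 'current index' line found")
    return {uid: idx for uid, idx in users.items() if idx > current}


# Sample taken from the output quoted in this ticket.
SAMPLE = """\
mdd.fsv-MDT0000.changelog_users=
current index: 9158128
ID    index
cl1   445461827
"""

if __name__ == "__main__":
    print(users_ahead_of_log(SAMPLE))
```

On a healthy MDT the returned dictionary should be empty, since no user can have consumed records beyond the current index.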
      

      Here is a debug log entry from when the MDS started:

      00000004:00000080:1.0:1370534969.929148:0:17901:0:(mdd_device.c:308:mdd_changelog_llog_init()) changelog starting index=0
      

      LLNL-bug-id: TOSS-2103

        Activity

          pjones Peter Jones added a comment - Landed for 2.4.1 and 2.5
          nedbass Ned Bass (Inactive) added a comment (edited) - Patch for b2_4: http://review.whamcloud.com/#/c/7012/

          jlevi Jodi Levi (Inactive) added a comment - When should we expect to see the patch for 2.4.1?

          morrone Christopher Morrone (Inactive) added a comment - This needs to be fixed for 2.4.1 as well, so you should really add that Fix Version as well.
          nedbass Ned Bass (Inactive) added a comment - Patch for master: http://review.whamcloud.com/#change,6642

          nedbass Ned Bass (Inactive) added a comment - It seems that if the registered changelog user has cleared all changelog records when the MDT is stopped, then when the MDT restarts changelog_init_cb() is never called from llog_reverse_process(), so mdd->mdd_cl.mc_index is left with 0.

          This reproduces the problem in a VM for me:

          FSTYPE=zfs llmount.sh
          lctl --device lustre-MDT0000 changelog_register
          touch /mnt/lustre/{1,2,3,4}
          lfs changelog_clear lustre-MDT0000 cl1 4
          umount /mnt/mds1
          mount.lustre lustre-mdt1/mdt1  /mnt/mds1
          lctl get_param mdd.lustre-MDT0000.changelog_users
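
The failure mode described in the comment above can be illustrated with a toy model. This is not Lustre code: the function names are hypothetical, the changelog is reduced to a plain list of record indices, and the fallback shown is only one plausible safeguard, not necessarily what the landed patches do.

```python
def starting_index_from_records(records):
    """Mimic the reported behaviour: the index is taken from the last record.

    The loop body stands in for a callback invoked during a reverse scan.
    If the log holds no records, the body never runs and the index stays 0.
    """
    index = 0  # default when nothing is found
    for rec_index in reversed(records):
        index = rec_index  # the newest record determines the starting index
        break
    return index


def starting_index_with_fallback(records, user_indices):
    """Hypothetical safeguard: never start below what a user already consumed."""
    index = starting_index_from_records(records)
    return max([index, *user_indices.values()])


if __name__ == "__main__":
    # Four records exist and user cl1 has cleared none of them.
    print(starting_index_from_records([1, 2, 3, 4]))
    # cl1 cleared everything up to index 4, so the log is empty at restart.
    print(starting_index_from_records([]))
    print(starting_index_with_fallback([], {"cl1": 4}))
```

In the model, an empty record list after a full clear yields a starting index of 0 even though cl1 has already consumed index 4, which mirrors the state produced by the reproducer.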

          People

            Assignee: bobijam Zhenyu Xu
            Reporter: nedbass Ned Bass (Inactive)