[LU-5895] lfs: data version changed during migration

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.7.0
    • head of tree + LU-4840
    • 3
    • 16474

    Description

      Archiving, releasing, and then migrating a file leads to a "data version changed during migration" error:

      # cd /mnt/lustre
      # cp /usr/bin/zip .
      # lfs getstripe zip
      zip
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  1
      	obdidx		 objid		 objid		 group
      	     1	             2	          0x2	             0
      # lfs hsm_archive zip
      # lfs hsm_release zip
      # lfs hsm_state zip
      zip: (0x0000000d) released exists archived, archive_id:1
      # lfs getstripe zip
      zip
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        80000001
      lmm_layout_gen:     1
      lmm_stripe_offset:  0
      # lfs migrate -o 0 zip
      /root/lustre-cleanup/lustre/utils/lfs: zip: data version changed during migration
      error: migrate: migrate stripe file 'zip' failed
      

      I think the file is restored first, then migrated, but its data version is not updated. This leads to the following questions:

      • Is it correct to force a restore of an archived file when asking for a migrate operation?
      • Couldn't the file be restored directly to the proper OST/stripe size, ...?
      • Although an error is reported, the file is present and complete, so the operation actually completed properly. What if it had been another kind of error? Would we get data corruption?
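
For readers decoding the transcript above: the hsm_state mask 0x0000000d and the lmm_pattern value 80000001 are bit fields. The following is a minimal shell sketch, assuming the bit values from lustre_user.h (HS_EXISTS 0x1, HS_DIRTY 0x2, HS_RELEASED 0x4, HS_ARCHIVED 0x8, and LOV_PATTERN_F_RELEASED 0x80000000). The function names are hypothetical, and only the four lowest HSM bits are handled.

```shell
#!/bin/sh
# Bit values are assumed from lustre_user.h; function names are illustrative.

# Print the names of the HSM state bits set in a mask such as 0x0000000d.
decode_hsm_state() {
    mask=$(( $1 ))
    names=""
    if [ $(( mask & 0x01 )) -ne 0 ]; then names="$names exists"; fi
    if [ $(( mask & 0x02 )) -ne 0 ]; then names="$names dirty"; fi
    if [ $(( mask & 0x04 )) -ne 0 ]; then names="$names released"; fi
    if [ $(( mask & 0x08 )) -ne 0 ]; then names="$names archived"; fi
    # Drop the leading space left by the concatenation above.
    echo "${names# }"
}

# Succeed if an lmm_pattern value (hex without 0x, as printed by
# lfs getstripe) has the "released" bit set.
pattern_is_released() {
    pattern=$(( 0x$1 ))
    [ $(( pattern & 0x80000000 )) -ne 0 ]
}
```

With the values from the transcript, `decode_hsm_state 0x0000000d` prints "exists released archived" and `pattern_is_released 80000001` succeeds, while the pre-release pattern 1 does not have the bit set.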

          Activity

            gerrit Gerrit Updater added a comment -

            frank zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/13356
            Subject: LU-5895 lfs: prevent migration of a released file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 28d3e178ee1a9f98f6c7926b5dafca778e98c4e5

            fzago Frank Zago (Inactive) added a comment -

            I missed that message. I'll send a patch.

            adilger Andreas Dilger added a comment -

            Frank, would you be able to make a patch to just return 0 (do nothing) if trying to migrate a file that is already released?

            rread Robert Read added a comment -

            Yes, it does make sense to skip migration for a released file since there is no data to migrate as far as Lustre is concerned.
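
Until such a change exists in lfs itself, the "skip released files" behaviour discussed in these comments could be approximated by a wrapper script. This is only a sketch: it assumes `lfs hsm_state` prints the flag mask in parentheses as shown in the description, and it assumes HS_RELEASED is bit 0x4. The function names are hypothetical and are not part of lfs.

```shell
#!/bin/sh
# Succeed if an `lfs hsm_state` output line carries the released bit.
# Assumes the "(0x........)" mask format shown in the description.
state_line_is_released() {
    mask=$(printf '%s\n' "$1" | sed -n 's/.*(\(0x[0-9a-fA-F]*\)).*/\1/p')
    # No mask found: treat the file as not released.
    if [ -z "$mask" ]; then return 1; fi
    [ $(( $mask & 0x04 )) -ne 0 ]
}

# Hypothetical wrapper: return 0 without migrating when the file is released,
# otherwise pass the remaining arguments through to `lfs migrate`.
migrate_unless_released() {
    file=$1
    shift
    if state_line_is_released "$(lfs hsm_state "$file")"; then
        echo "$file: released, skipping migration"
        return 0
    fi
    lfs migrate "$@" "$file"
}
```

For example, `migrate_unless_released zip -o 0` would skip the file from the description, since its state mask is 0x0000000d.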

            adilger Andreas Dilger added a comment -

            I have to question whether it even makes sense to migrate a released file? Maybe this should just become a no-op?

            jay Jinshan Xiong (Inactive) added a comment -

            I think the 3rd question is a BUG. I haven't looked into the code, but I guess the root cause of this problem is that a zero data version was returned for the released file, and later, after the file was restored, a different data version was seen.

            The 2nd question is a good one. It can be refined into supporting a setstripe-style restore operation; in other words, the command `lfs hsm_restore' should be able to override the original stripe pattern.
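
The suspected failure mode can be illustrated with a small shell sketch. It does not reproduce the lfs implementation; it only models the guess above, namely that a version sampled before the copy (possibly zero for a released file) is compared with one sampled afterwards. The helper name is hypothetical, and the commented usage assumes the `lfs data_version` command is available on the client.

```shell
#!/bin/sh
# Compare data versions sampled before and after an operation.
# Prints the message seen in this ticket when they differ.
check_data_version() {
    dv_before=$1
    dv_after=$2
    if [ "$dv_before" != "$dv_after" ]; then
        echo "data version changed during migration"
        return 1
    fi
    return 0
}

# Usage sketch on a Lustre client (assumption: `lfs data_version` exists):
#   before=$(lfs data_version zip)
#   ... operation that may trigger an implicit restore ...
#   after=$(lfs data_version zip)
#   check_data_version "$before" "$after"
```

If the released file reports one version before the copy and the implicit restore leaves it with another, the comparison fails even though the data itself is intact, which would match the behaviour reported in the description.
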
            fzago Frank Zago (Inactive) added a comment - edited

            When I tried migrating to 2 stripes, the file was restored to only one stripe. So that part looks ok actually. That third question is now moot. The first 2 remain.

            fzago Frank Zago (Inactive) added a comment -

            When I archive/restore a file, the objid stays the same. It's not the case here.

            I'll try with 2 stripes.

            jay Jinshan Xiong (Inactive) added a comment -

            HSM may not allocate the original OST to restore the file, thus your example can't verify that the migration has completed. Please try to migrate the released file to have 2 stripes and see how it goes.
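
One way to run the suggested check is to migrate with a stripe count of 2 and then read the resulting count back from `lfs getstripe`. The helper below is a hypothetical sketch that assumes the `lmm_stripe_count:` line format shown elsewhere in this ticket; the `-c 2` option in the usage comment is the usual setstripe-style stripe count option.

```shell
#!/bin/sh
# Extract the lmm_stripe_count value from `lfs getstripe` output on stdin.
stripe_count_from_getstripe() {
    awk '$1 == "lmm_stripe_count:" { print $2; exit }'
}

# Usage sketch on a Lustre client:
#   lfs migrate -c 2 zip
#   count=$(lfs getstripe zip | stripe_count_from_getstripe)
#   [ "$count" = "2" ] || echo "zip: stripe count is $count, expected 2"
```

A count of 1 after the attempt would indicate that the file was only restored with its original layout rather than migrated to the requested one.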

            fzago Frank Zago (Inactive) added a comment -

            After the "failed" migration, lfs getstripe returns this:

            # lfs getstripe zip
            zip
            lmm_stripe_count:   1
            lmm_stripe_size:    1048576
            lmm_pattern:        1
            lmm_layout_gen:     2
            lmm_stripe_offset:  0
            	obdidx		 objid		 objid		 group
            	     0	             3	          0x3	             0

            So the file has indeed migrated, and is not the original one simply restored.

            People

              bevans Ben Evans (Inactive)
              fzago Frank Zago (Inactive)
              Votes: 0
              Watchers: 10