Lustre / LU-17334

Client should handle dir/file/object created on newly added MDT/OST

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.16.0
    • Affects Version: Lustre 2.15.0

    Description

      When a new MDT or OST is added to a filesystem without the no_create flag, a new subdirectory or file can be created on the new MDT, or a new object created on the new OST, relatively quickly after it is added to the filesystem, in particular because the new MDT/OST is preferred by QOS space balancing due to its large amount of free space. However, it may take a few seconds for the addition of the new MDT/OST to be propagated to all of the clients, so there is a risk that the MDS creates file objects on OSTs that a client is not yet aware of. There is a much smaller risk that an MDT the client is not yet aware of is used for a subdirectory or file (depending on workload, if multiple clients are working in the same directory tree in parallel).

      This ticket is tracking the case where a new MDT or OST that is not yet in the client's configuration is used for a subdirectory or file. In that case the client should wait and retry for some short time, possibly actively pulling the config from the MGS to see whether the target was newly added, instead of immediately returning an error to the application. LU-17300 is tracking the complementary issue of not creating new subdirs/files/objects on newly-added targets in the first place.

      It is still possible that the file layout itself is corrupted for whatever reason and references an OST or MDT index that will never exist in the filesystem, so the client should not retry this operation indefinitely. But an application delay of up to ~30 seconds while the configuration is distributed across the cluster is far preferable to the application getting an error.

    Attachments

    Issue Links

    Activity

            pjones Peter Jones added a comment -

            Landed for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53860/
            Subject: LU-17334 lmv: exclude newly added MDT in mkdir
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a2b08583a1dc8ab18c4ea4a4b900870761a5c252


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53363/
            Subject: LU-17334 lmv: handle object created on newly added MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 94a4663db95656ade6b6e695b849cd7763f0bd49


            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53860
            Subject: LU-17334 lmv: exclude newly added MDT in mkdir
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4de7657c74729530485ffa31d7ca179e9dadfe5d


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53335/
            Subject: LU-17334 lov: handle object created on newly added OST
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f35f897ec8ec0752ea4d4830e72f5193375a474b


            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53363
            Subject: LU-17334 lmv: handle object created on newly added MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b4405f028cb03c3780f30f9efa88ccb33f3ee621


            adilger Andreas Dilger added a comment -

            Lai, could you please make a separate patch for the LMV changes, so that it does not hold up the testing/landing of the LOV patch.

            Please do a "checkout" of Jian's patch "git fetch ssh://adilger@review.whamcloud.com:29418/fs/lustre-release refs/changes/35/53335/7 && git checkout FETCH_HEAD" and base your patch on top of it and then rebase the test patch https://review.whamcloud.com/53300 onto yours, so that it will test both patches together. It looks like it is sometimes failing in Gerrit Janitor with ENOSPC, but I'm not sure if that is a problem with the test script or some issue with Lustre file/object allocation, because the test should only be copying 1/2 of the initial OST size. I've reduced this to 1/4 of the OST size to see if that makes a difference.


            adilger Andreas Dilger added a comment -

            It looks like the 53335 patch is working reasonably well. I see evidence in the testing logs that this patch is working:

             Lustre: 79053:0:(lov_ea.c:299:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
             Lustre: lustre-clilov_UUID: wait 30s while client connects to new OST
             Lustre: 35469:0:(lov_ea.c:299:lsme_unpack()) lustre-clilov_UUID: OST index 2 more than OST count 2
             Lustre: lustre-clilov_UUID: wait 30s while client connects to new OST 
            

            There are still some failures of the subtest, but it looks like these are MDT issues and not OST issues, since it looks like they relate to filename lookup (though the "No such file or directory (2) = -ENOENT" error is ambiguous as to whether an OST or MDT object could not be found).

            Using the OSS console log timestamps, it looks like this happened just after MDT0003 was mounted, and while the new OST (lustre-OST0001) was finishing mounting:

            MDS2 [ 2488.656376] Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
            MDS2 [ 2488.990425] Lustre: DEBUG MARKER: sync; sleep 1; sync
            MDS2 [ 2490.646167] Lustre: DEBUG MARKER: e2label /dev/mapper/mds4_flakey 2>/dev/null
            
            OSS [ 2493.319919] LDISKFS-fs (dm-11): mounted filesystem with ordered data mode. Opts: errors=remount-ro
            OSS [ 2494.373579] LDISKFS-fs (dm-11): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
            OSS [ 2494.931873] Lustre: DEBUG MARKER: e2label /dev/mapper/ost2_flakey 2>/dev/null
            OSS [ 2495.391620] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param 				seq.cli-lustre-OST0001-super.width=18724
            OSS [ 2495.728824] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
            
            LOG rsync: stat "/mnt/lustre/d46b.conf-sanity/python3.6/site-packages" failed: No such file or directory (2)
            
            OSS [ 2497.208663] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-33vm3.trevis.whamcloud.com: executing set_default_debug -1 all 4
            OSS [ 2497.426045] Lustre: DEBUG MARKER: trevis-33vm3.trevis.whamcloud.com: executing set_default_debug -1 all 4
            OSS [ 2497.448788] Lustre: DEBUG MARKER: trevis-33vm3.trevis.whamcloud.com: executing set_default_debug -1 all 4
            OSS [ 2497.628831] Lustre: DEBUG MARKER: e2label /dev/mapper/ost2_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
            OSS [ 2498.028865] Lustre: DEBUG MARKER: sync; sleep 1; sync
            OSS [ 2500.674301] Lustre: DEBUG MARKER: e2label /dev/mapper/ost2_flakey 2>/dev/null
            LOG Started lustre-OST0001
            
            laisiyao Lai Siyao added a comment -

            Mmm, Fujian, I'll update your patch with the LMV change.


            adilger Andreas Dilger added a comment -

            laisiyao, it is getting more likely that the test failures after Jian's LOV patch are now related to MDT addition:

            1701844910: rsync: stat "/mnt/lustre/d46b.conf-sanity/etc/NetworkManager" failed: No such file or directory (2)
            1701844910: rsync: recv_generator: mkdir "/mnt/lustre/d46b.conf-sanity/etc/NetworkManager/conf.d" failed: No such file or directory (2)
            :
            :
            (Default) /mnt/lustre/d46b.conf-sanity/etc/NetworkManager
            lmm_fid:           [0x280000401:0x1:0x0]
            stripe_count:  1 stripe_size:   4194304 pattern:       0 stripe_offset: -1
            

            This is the first file to report an error, and it looks like the first file to be allocated on the newly-added MDT0001 (based on FID):

            lustre: cli-ctl-lustre-MDT0001: Allocated super-sequence [0x0000000240000400-0x0000000280000400):1:mdt]
            

            People

              Assignee: laisiyao Lai Siyao
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 9
