
Avoid creating new dir/file/object on newly added MDT/OST

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.15.0
    • None
    • 3

    Description

      When a new MDT is added to a filesystem without no_create and DNE MDT auto-balance is enabled, a new subdirectory could be created on the new MDT relatively quickly after it is added to the filesystem. However, it might take a few seconds for the addition of the new MDT to propagate to all of the clients, so there is a small risk that the new MDT is holding a subdirectory while another client is not yet aware of that MDT.

      I think there are two relatively simple solutions to this:

      • don't consider a new MDT for subdirectory creation for, say, 30s after it appears in the config, to give other clients a chance to discover it
      • if a subdirectory is located on an MDT that is not in the client's config, then the client should pull the config from the MGS to see if the MDT was newly added.
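
As a rough illustration of the first option, here is a minimal sketch of a grace-period filter. It is not Lustre code: `Target`, `added_at`, `eligible_for_create()` and `GRACE_SECONDS` are hypothetical names, and the 30s value is only the "say, 30s" from above.

```python
import time
from dataclasses import dataclass

GRACE_SECONDS = 30.0  # illustrative value from the description, not a real tunable


@dataclass
class Target:
    index: int
    added_at: float  # hypothetical: time the target appeared in the config


def eligible_for_create(targets, now=None, grace=GRACE_SECONDS):
    """Return only the targets that have been in the config for at least `grace` seconds."""
    if now is None:
        now = time.monotonic()
    return [t for t in targets if now - t.added_at >= grace]


# Target 1 was added at t=100s, so it is skipped until t=130s.
targets = [Target(index=0, added_at=0.0), Target(index=1, added_at=100.0)]
early = eligible_for_create(targets, now=110.0)  # only index 0
later = eligible_for_create(targets, now=130.0)  # index 0 and index 1
```

The same filter would apply equally to MDTs (subdirectory creation) and OSTs (object allocation), since only the age of the target in the config is considered.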

      A similar situation exists with new OSTs added to an existing filesystem, where the MDS is notified about an OST addition and allocates an object on it to a file, but the client has not yet been notified about the new OST and will return an error when accessing the file. The solution in this case is the same: avoid using a new OST for object allocation for ~30s after the OST is added, and similarly have clients pull the config from the MGS to see if the OST was newly added.
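
The client-side half of this (pull the config again when an unknown target index shows up) could look roughly like the toy model below. This is only a sketch under assumptions: `ClientConfig`, `fetch_count` and `resolve()` are hypothetical, and `fetch_count` stands in for whatever mechanism would fetch the current target count from the MGS.

```python
class ClientConfig:
    """Toy model of a client's view of how many targets exist."""

    def __init__(self, fetch_count):
        # fetch_count: hypothetical callable returning the current target
        # count as known to the MGS.
        self._fetch_count = fetch_count
        self.target_count = fetch_count()

    def resolve(self, index):
        """Return `index` if it is known, refreshing the config once if it is not."""
        if index >= self.target_count:
            # Unknown index: the target may be newly added, so re-read the config.
            self.target_count = self._fetch_count()
        if index >= self.target_count:
            raise LookupError(f"target index {index} not in config")
        return index


# The client starts with one target; a second one is visible after a refresh.
counts = iter([1, 2])
cfg = ClientConfig(lambda: next(counts))
cfg.resolve(1)  # triggers one refresh, then succeeds
```

Refreshing only once per lookup keeps a genuinely invalid index from causing repeated config fetches.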

      This ticket is tracking the first issue - don't create new subdirs/files/objects on newly-added targets. LU-17344 is separately tracking the client more gracefully handling the presence of subdirs/files/objects on an MDT or OST that it doesn't know about.

          Activity

            [LU-17300] Avoid creating new dir/file/object on newly added MDT/OST
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DOE-70 [ DOE-70 ]
            adilger Andreas Dilger added a comment - - edited

            A very simple solution here would be for newly-formatted MDTs and OSTs to just set no_create=1 automatically on detecting this is the first mount? If "-o no_create" was passed as a mount option, then this would stay set indefinitely (on the assumption that the admin knows to deactivate it manually when they are ready). If no mount option was used, then a delayed work item would set no_create=0 automatically 60s after mount was completed.

            For most "real" filesystems this initial 60s delay before new files can be created on the OSTs would be a non-issue. This might slow down some testing, e.g. conf-sanity, which formats and mounts a lot of new filesystems, but no_create could also be cleared by test-framework.sh immediately after mount to avoid this delay.
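
A minimal sketch of that first-mount behaviour, modelled in user space with a timer rather than a kernel delayed work item. `NewTarget`, `first_mount`, `no_create_option` and `clear_no_create()` are hypothetical names used only for illustration.

```python
import threading


class NewTarget:
    """Toy model of a target that blocks creates for a while after first mount."""

    def __init__(self, first_mount, no_create_option=False, delay=60.0):
        # no_create starts set on first mount, or whenever the admin asked for it.
        self.no_create = first_mount or no_create_option
        self._timer = None
        if first_mount and not no_create_option:
            # No explicit mount option: clear no_create automatically after `delay`.
            self._timer = threading.Timer(delay, self.clear_no_create)
            self._timer.daemon = True
            self._timer.start()

    def clear_no_create(self):
        """Allow creates; may also be called directly, e.g. by a test framework."""
        if self._timer is not None:
            self._timer.cancel()
        self.no_create = False


auto = NewTarget(first_mount=True)                            # cleared after 60s
manual = NewTarget(first_mount=True, no_create_option=True)   # stays set until cleared by the admin
auto.clear_no_create()                                        # what a test framework could do right after mount
```

With an explicit mount option no timer is started at all, so the flag can only be cleared by hand.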

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LR-11 [ LR-11 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17327 [ LU-17327 ]

            adilger Andreas Dilger added a comment -

            The new conf-sanity test_46b added in patch: https://review.whamcloud.com/53300 "LU-17327 tests: add test case for online MDT and OST addition" shows this case nicely:

            [  183.117948] Lustre: DEBUG MARKER: == conf-sanity test 46b: online OST and MDT addition ===== 16:48:36 (1701380916)
            [  206.830361] Lustre: Mounted lustre-client
            [  214.067224] LustreError: 14230:0:(lov_ea.c:279:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
            [  214.070240] Lustre: 14230:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x2ab:1025, magic 0x0bd10bd0, pattern 0x1
            [  214.072822] Lustre: 14230:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
            [  214.075459] Lustre: 14230:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x2c0000401:2
            [  214.078104] LustreError: 14230:0:(lcommon_cl.c:196:cl_file_inode_init()) lustre: failed to initialize cl_object [0x200000401:0x2ab:0x0]: rc = -22
            [  214.081653] LustreError: 14230:0:(llite_lib.c:3613:ll_prep_inode()) new_inode -fatal: rc -22
            [  460.594212] rsync (14223) used greatest stack depth: 10688 bytes left
            [  460.900709] Lustre: DEBUG MARKER: conf-sanity test_46b: @@@@@@ FAIL: rsync failed
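
Reading the log, the client appears to reject the layout because the stripe's OST index (1) is not below the number of OSTs it currently knows about (1), and the -22 is -EINVAL. The sketch below only models that check; it is not the actual lsme_unpack() code, and `check_stripe_index()` is a hypothetical name.

```python
import errno


def check_stripe_index(ost_idx, ost_count):
    """Return 0 if the stripe's OST index is known to the client, else -EINVAL."""
    if ost_idx >= ost_count:
        return -errno.EINVAL
    return 0


# Values from the log: stripe 0 is on OST index 1, but the client knows 1 OST.
rc_before = check_stripe_index(ost_idx=1, ost_count=1)  # rejected with -EINVAL
rc_after = check_stripe_index(ost_idx=1, ost_count=2)   # accepted once the new OST is known
```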
            
            adilger Andreas Dilger made changes -
            Summary Original: Avoid new dir/file/object on newly added MDT/OST New: Avoid creating new dir/file/object on newly added MDT/OST
            adilger Andreas Dilger made changes -
            Summary Original: Avoid and/or handle new dir/file/object on newly added MDT/OST New: Avoid new dir/file/object on newly added MDT/OST
            adilger Andreas Dilger made changes -
            Link New: This issue is related to EX-8750 [ EX-8750 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is cloned by LU-17334 [ LU-17334 ]

            People

              wc-triage WC Triage
              adilger Andreas Dilger