Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.15.0

    Description

      There needs to be a mechanism to repair the project IDs on OST objects when they become inconsistent with the project ID on the MDT inode. In rare cases the two can differ (in particular, projid=0 on OST objects that nevertheless hold data). This leads to discrepancies between "du -sk <dir>" and "lfs quota -p <projid> <dir>" that are not corrected by running "e2fsck", "tune2fs", or "lfs project -cr <dir>".
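
      As a rough illustration of why the two numbers can disagree, here is a toy model in Python. It assumes that a du-style walk counts the space of every object under the directory, while project quota only charges the objects that carry the project's ID. The class and function names, the projid 1000, and the sizes are all hypothetical and do not correspond to Lustre data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OstObject:
    projid: int   # project ID stored on the OST object
    kbytes: int   # space allocated to this object on its OST


@dataclass
class LustreFile:
    mdt_projid: int                                  # project ID on the MDT inode
    objects: List[OstObject] = field(default_factory=list)


def du_kbytes(files):
    """Space a du-style walk would report: every object counts."""
    return sum(o.kbytes for f in files for o in f.objects)


def quota_kbytes(files, projid):
    """Space charged to projid: only OST objects tagged with it count."""
    return sum(o.kbytes for f in files for o in f.objects if o.projid == projid)


# Hypothetical tree: both files have projid 1000 on the MDT inode, but one
# OST object was left with projid=0 even though it holds data.
tree = [
    LustreFile(1000, [OstObject(1000, 4096), OstObject(1000, 4096)]),
    LustreFile(1000, [OstObject(0, 8192)]),
]

print(du_kbytes(tree), quota_kbytes(tree, 1000))
```

      In this model the MDT-side projids are all consistent, so a check that only looks at MDT inodes finds nothing to fix, yet the quota figure is short by the size of the stale object.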

      The "lfs project -cr <dir>" command only verifies that the projid on the MDT inode matches the parent directory; it does not verify that the projid on the OST objects also matches. In most cases that is the correct behavior, since the MDT and OST projids should be in sync, and checking every OST object could increase the RPC count by an order of magnitude.

      Running "lfs project -r -p 0 <dir>; lfs project -r -p <projid> <dir>" resets the projid of every file, but it has two drawbacks: the project quota of the affected directory tree is totally inaccurate for the whole time the process is running (possibly hours), and it also rewrites objects that may already have the correct projid.
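
      For reference, the two-pass workaround can be scripted along these lines. This is only a sketch: it builds the two command lines exactly as quoted above and leaves execution to the caller. The function names and the example path and projid are hypothetical.

```python
import shlex


def two_pass_reset_commands(projid, directory):
    """Build the two 'lfs project' invocations of the two-pass workaround."""
    if projid <= 0:
        raise ValueError("the second pass needs a non-zero project ID")
    return [
        # Pass 1: clear the project ID on the whole tree.
        ["lfs", "project", "-r", "-p", "0", directory],
        # Pass 2: set the intended project ID again.
        ["lfs", "project", "-r", "-p", str(projid), directory],
    ]


def render(commands):
    """Join the commands into one shell line (shlex.join needs Python 3.8+)."""
    return "; ".join(shlex.join(cmd) for cmd in commands)


# Hypothetical mount point and project ID.
print(render(two_pass_reset_commands(1000, "/mnt/lustre/proj")))
```

      Each list could be handed to subprocess.run(cmd, check=True) in order. Between the start of the first pass and the end of the second, usage for the project is under-reported, which is the drawback described above.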

      One approach is to add an option such as "lfs project -r -p <projid> --force-osts <dir>" that also sends an RPC to all of the OSTs to update their project ID, even if the MDT projid already matches. While this would still send an RPC for every object in the directory tree, the OST would not have to modify objects whose projid already matches. That would avoid a potentially large number of OST disk IOPS (double if the "-p 0; -p <projid>" method is used).
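
      The intended saving can be sketched with a small model that counts RPCs and object modifications separately. This is not the Lustre implementation: "--force-osts" is only a proposed option, and the function name and the dict-based object representation below are hypothetical.

```python
def force_ost_projid(objects, projid):
    """Model of the proposed behavior: one RPC per object, write only on mismatch.

    objects is a list of dicts with a 'projid' key (a stand-in for OST objects).
    Returns (rpcs_sent, objects_modified).
    """
    rpcs = 0
    modified = 0
    for obj in objects:
        rpcs += 1                        # every object still costs one RPC
        if obj["projid"] != projid:      # the OST only modifies the object on mismatch
            obj["projid"] = projid
            modified += 1
    return rpcs, modified


# Hypothetical tree with one stale object out of three.
objs = [{"projid": 1000}, {"projid": 0}, {"projid": 1000}]
print(force_ost_projid(objs, 1000))
```

      In this model the RPC count equals the object count, but the number of modifications equals only the number of stale objects, and a repeated run modifies nothing.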

      It would also seem relatively easy to have OST write RPCs that carry a non-zero obdo.o_projid update any OST object that has projid==0. This would be the same one-shot mechanism that sets the UID/GID of OST objects on first write. It depends on clients consistently setting o_projid in bulk write and setattr RPCs, and on the OST checking it, but it can be done without interoperability concerns, especially since o_projid always relates to the inode and not to the process that is accessing the file.
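
      The one-shot rule described here is simple enough to state as code. The following is a minimal sketch of the condition only, assuming the OST sees the o_projid value from the incoming RPC. OstInode and apply_write_projid are hypothetical names, not Lustre functions.

```python
from dataclasses import dataclass


@dataclass
class OstInode:
    projid: int = 0   # 0 means no project ID has been assigned yet


def apply_write_projid(obj, o_projid):
    """Set the object's projid from the RPC only if it is still unset.

    Returns True if the object was updated. An object that already has a
    non-zero projid is never overwritten by this path.
    """
    if obj.projid == 0 and o_projid != 0:
        obj.projid = o_projid
        return True
    return False


obj = OstInode()
print(apply_write_projid(obj, 1000), obj.projid)
```

      Because the rule only fills in a missing value, a later write carrying a different o_projid leaves the object untouched, so changing an existing project ID would still need an explicit setattr.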

      It would also be useful for LFSCK to verify that the MDT inode UID/GID/PROJID match those on the OST objects of a file, and to update them if they differ.
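
      A consistency pass of this kind could look roughly like the sketch below. It assumes the MDT inode is treated as authoritative and the OST objects are repaired to match, and it does not reflect how LFSCK is actually implemented. Ids and verify_and_repair are hypothetical names.

```python
from typing import List, NamedTuple


class Ids(NamedTuple):
    uid: int
    gid: int
    projid: int


def verify_and_repair(mdt_ids: Ids, ost_ids: List[Ids]) -> List[int]:
    """Make every OST entry match the MDT inode's IDs.

    Returns the indexes of the OST objects that had to be repaired.
    """
    repaired = []
    for index, ids in enumerate(ost_ids):
        if ids != mdt_ids:           # any of UID, GID, or PROJID differs
            ost_ids[index] = mdt_ids
            repaired.append(index)
    return repaired


# Hypothetical file with three OST objects, two of them inconsistent.
mdt = Ids(uid=500, gid=500, projid=1000)
osts = [Ids(500, 500, 1000), Ids(500, 500, 0), Ids(0, 0, 0)]
print(verify_and_repair(mdt, osts))
```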

          Activity

            [LU-15520] OST object projid and quota reset
            adilger Andreas Dilger made changes -
            Link New: This issue is related to EX-11729 [ EX-11729 ]
            adilger Andreas Dilger made changes -
            Labels Original: medium quota New: LFSCK medium quota
            adilger Andreas Dilger made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Core Lustre Triage [ core-lustre-triage ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-18757 [ LU-18757 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-18756 [ LU-18756 ]
            adilger Andreas Dilger made changes -
            Description Original: There needs to be a mechanism to reset the project IDs on the OST objects if they become inconsistent with the project ID on the MDT inodes. In some rare cases they can be different (in particular projid=0 on the OST objects, even though they have data). This can lead to inconsistencies between "{{du -sk <dir>}}" and "{{lfs quota -p <projid> <dir>}}" that are not corrected by running "{{e2fsck}}" or "{{tune2fs}}" or "{{lfs project -cr <dir>}}".
            New: There needs to be a mechanism to repair the project IDs on the OST objects if they become inconsistent with the project ID on the MDT inodes. In some rare cases they can be different (in particular projid=0 on the OST objects, even though they have data). This can lead to inconsistencies between "{{du -sk <dir>}}" and "{{lfs quota -p <projid> <dir>}}" that are not corrected by running "{{e2fsck}}" or "{{tune2fs}}" or "{{lfs project -cr <dir>}}".
            pjones Peter Jones made changes -
            Link New: This issue is related to DDN-5649 [ DDN-5649 ]
            adilger Andreas Dilger made changes -
            Labels Original: medium New: medium quota
            adilger Andreas Dilger made changes -
            Link New: This issue is related to DDN-5390 [ DDN-5390 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16988 [ LU-16988 ]

            People

              core-lustre-triage Core Lustre Triage
              adilger Andreas Dilger
              Votes: 1
              Watchers: 9
