Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.12.2
    • None
    • EL 7.6.1810 x86_64
    • 3

    Description

      The MDT filesystem filled up; the MDS crashed and would crash again shortly after mounting the filesystem. I disabled mount on boot and mounted the MDT as ldiskfs to free some space. After deleting some files, I unmounted it and ran fsck.ext4. The fsck has crashed twice after running for 6-7 days: the first time the MDS became unresponsive and had to be rebooted, the second time I was able to capture a backtrace. I remounted the MDT and it still shows 100% full (which could well be because the fsck didn't finish). I'm not sure what to do next.

          Activity

            [LU-14056] MDT filesystem full, fsck crashing

            cmcl Campbell Mcleay (Inactive) added a comment -

            Thanks for all the help. It might have been the addition of a filer that is being backed up on this cluster that has 284 million files...

            pjones Peter Jones added a comment -

            Looks like we're ok to close out this ticket now.

            adilger Andreas Dilger added a comment -

            Looking at the xattr blocks in bmds1-sample-files.tar.gz, it appears that the external xattr block is being used by the "link" xattr, which tracks the hard links for each file. Those files had between 10 and 30 links each, with an xattr size between 900 and 3000 bytes because of the relatively long filenames (60-75 bytes each).

            If this is a common workload for you, then there are a couple of options (for future filesystems at least) to change the formatting options on the MDT from the defaults. The default is to format with 2560 bytes/inode (1024 bytes for the inode itself, plus an average of 1536 bytes/inode for xattrs, directory entries, logs, etc.). Formatting the MDT with "mkfs.lustre --mdt ... --mkfsoptions='-i 5120'" would allow a 4KB xattr block for each inode. While each inode wouldn't necessarily have an xattr block, there are also directory blocks and other needs for that space. Unfortunately, it isn't possible to change this ratio for an existing MDT filesystem without a full backup/restore.
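
The arithmetic behind those two ratios can be sketched as follows. This is only a rough illustration: the 1 TiB MDT size is a hypothetical value, the function names are made up for this example, and the inode count is assumed to be simply the device size divided by the bytes-per-inode ratio (a real format rounds per block group and reserves space for the journal and other metadata).

```python
# Rough MDT sizing arithmetic based on the figures quoted in the comment above.
# Assumptions: a hypothetical 1 TiB MDT, and inode count = size // ratio.

INODE_SIZE = 1024    # bytes used by each inode itself (from the comment)
XATTR_BLOCK = 4096   # size of one external xattr block, 4KB (from the comment)


def inode_count(mdt_bytes: int, bytes_per_inode: int) -> int:
    """Approximate number of inodes created for a given '-i' ratio."""
    return mdt_bytes // bytes_per_inode


def spare_bytes_per_inode(bytes_per_inode: int) -> int:
    """Average space left per inode for xattr blocks, directories, logs, etc."""
    return bytes_per_inode - INODE_SIZE


def max_xattr_block_fraction(bytes_per_inode: int) -> float:
    """Upper bound on the fraction of inodes that could each own one 4KB
    xattr block, if the spare space were used for nothing else."""
    return min(1.0, spare_bytes_per_inode(bytes_per_inode) / XATTR_BLOCK)


if __name__ == "__main__":
    mdt_bytes = 2**40  # hypothetical 1 TiB MDT
    for ratio in (2560, 5120):
        print(
            ratio,
            inode_count(mdt_bytes, ratio),
            spare_bytes_per_inode(ratio),
            max_xattr_block_fraction(ratio),
        )
```

Under these assumptions, the default ratio leaves 1536 spare bytes per inode, enough for an external xattr block on at most three eighths of the inodes, while '-i 5120' leaves exactly 4096 bytes per inode at the cost of roughly half as many inodes on the same device.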

            cmcl Campbell Mcleay (Inactive) added a comment -

            Hi Andreas,

            Yes, it is up and operational. Thanks for all your help!

            Kind regards,

            Campbell

            adilger Andreas Dilger added a comment -

            cmcl, just to confirm: apart from the "osp_sync_declare_add()) logging isn't available" message, is my understanding correct that this filesystem is now up and operational?

            cmcl Campbell Mcleay (Inactive) added a comment -

            Hi Andreas,

            Sadly, I didn't capture the summary information at the end of the fsck; we had a power cut, so things have been hectic the last two days. I have captured some info on 4 files on the MDS (let me know if you would like more) and will attach it to the ticket. I will also create a new ticket for the log issue.

            Kind regards,

            Campbell

            bmds1-sample-files.tar.gz

            adilger Andreas Dilger added a comment -

            As for the llog message:

            osp_sync_declare_add()) logging isn't available, run LFSCK

            it implies that the MDS isn't able to create a recovery log (typically) for OST object deletes. This may eventually become an issue, depending on how this error is handled. Could you please file that in a separate LU ticket so that it can be tracked and fixed properly? The "run LFSCK" part means (AFAIK) that there is a chance of OST objects not being deleted, in which case deleting files from the filesystem will not reduce space usage on the OSTs.

            I'd need someone else to look into whether this error means "the OST object is deleted immediately, and space will be orphaned only in case of an MDS crash" (i.e. very low severity) or "no OST object is deleted and space may run out quickly" (more serious), and whether the issue is specific to a single OST (less important, and this seems to be the case from what I can see) or affects many/all OSTs (more serious). This can be assigned and resolved (along with a better error message) in the context of the new ticket.

            People

              pjones Peter Jones
              cmcl Campbell Mcleay (Inactive)