Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.12.2
    • None
    • EL 7.6.1810 x86_64
    • 3

    Description

      The MDT filesystem filled up and the MDS crashed; it would crash again shortly after mounting the filesystem. I disabled mount on boot and mounted the MDT as ldiskfs to free some space. After deleting some files, I unmounted it and ran fsck.ext4. The fsck has crashed twice after running for 6-7 days: the first time the MDS became unresponsive and had to be rebooted, and the second time I was able to capture a backtrace. I remounted the MDT and it still shows 100% usage (possibly because the fsck didn't finish). I'm not sure what to do next.

          Activity

            [LU-14056] MDT filesystem full, fsck crashing

            cmcl Campbell Mcleay (Inactive) added a comment -

            Thanks for all the help. It might have been caused by adding a filer with 284 million files, which is being backed up on this cluster...
            pjones Peter Jones made changes -
            Resolution: Fixed
            Status: changed from Open to Resolved
            pjones Peter Jones added a comment -

            Looks like we're OK to close out this ticket now.
            pjones Peter Jones made changes -
            Link removed: This issue is related to JFC-21

            adilger Andreas Dilger added a comment -

            Looking at the xattr blocks in bmds1-sample-files.tar.gz, it appears that the external xattr block is being used by the "link" xattr, which tracks the hard links for each file. I saw between 10 and 30 links on each of those files, with an xattr size of 900-3000 bytes because of the relatively long filenames (60-75 bytes each).

            If this is a common workload for you, then the MDT formatting options can be changed from the defaults (for future filesystems at least). The default is to format with 2560 bytes/inode (1024 bytes for the inode itself, plus an average of 1536 bytes/inode for xattrs, directory entries, logs, etc.). Formatting the MDT with "mkfs.lustre --mdt ... --mkfsoptions='-i 5120'" would leave 5120 - 1024 = 4096 bytes per inode, enough for a 4KB xattr block for each inode. While not every inode would necessarily have an xattr block, directory blocks and other metadata also need that space. Unfortunately, it isn't possible to change this ratio for an existing MDT filesystem without a full backup/restore.
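            The sizing figures in the comment above can be checked with a little arithmetic. The sketch below is a rough estimator, not Lustre code. The per-record layout of the "link" xattr it uses (a 24-byte header, then for each link a 2-byte record length, a 16-byte parent FID and the unterminated filename) is an assumption based on the link_ea_header/link_ea_entry structures and should be verified against your Lustre release. The 4 TiB MDT size is purely hypothetical.

```python
"""Rough sizing estimates for an MDT with many long-named hard links.

ASSUMPTIONS (not from the ticket): the link xattr is a 24-byte header
followed by one record per hard link of 2 (record length) + 16 (parent
FID) + len(name) bytes. Verify against the Lustre source you run.
"""

LINK_EA_HEADER_BYTES = 24      # assumed sizeof(struct link_ea_header)
LINK_EA_ENTRY_FIXED = 2 + 16   # assumed record length field + parent FID
INODE_SIZE = 1024              # MDT inode size quoted in the comment


def link_xattr_size(num_links: int, name_len: int) -> int:
    """Approximate link xattr size for num_links names of name_len bytes each."""
    return LINK_EA_HEADER_BYTES + num_links * (LINK_EA_ENTRY_FIXED + name_len)


def spare_bytes_per_inode(bytes_per_inode: int, inode_size: int = INODE_SIZE) -> int:
    """Space reserved per inode beyond the inode itself (xattrs, dirs, logs)."""
    return bytes_per_inode - inode_size


def inode_count(device_bytes: int, bytes_per_inode: int) -> int:
    """mke2fs -i creates roughly one inode per bytes_per_inode bytes of device."""
    return device_bytes // bytes_per_inode


if __name__ == "__main__":
    # Link counts and name lengths taken from the sample files described above.
    for links, name_len in [(10, 60), (30, 75)]:
        print(f"{links} links x {name_len}-byte names: "
              f"~{link_xattr_size(links, name_len)} bytes")
    # Compare the default ratio with the suggested one on a hypothetical 4 TiB MDT.
    for ratio in (2560, 5120):
        print(f"-i {ratio}: {spare_bytes_per_inode(ratio)} spare bytes/inode, "
              f"{inode_count(4 * 2**40, ratio):,} inodes")
```

            With 10 links of 60-byte names the estimate is about 804 bytes, and with 30 links of 75-byte names about 2814 bytes, the same ballpark as the 900-3000 bytes observed. Going from "-i 2560" to "-i 5120" raises the spare space per inode from 1536 to 4096 bytes, at the cost of halving the number of inodes the same MDT device can hold.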

            cmcl Campbell Mcleay (Inactive) added a comment -

            Hi Andreas,

            Yes, it is up and operational. Thanks for all your help!

            Kind regards,

            Campbell

            adilger Andreas Dilger added a comment -

            cmcl Just to confirm: besides the "osp_sync_declare_add()) logging isn't available" message, my understanding is that this filesystem is now up and operational?
            adilger Andreas Dilger made changes -
            Link added: This issue is related to LU-14098

            cmcl Campbell Mcleay (Inactive) added a comment -

            Hi Andreas,

            Sadly, I didn't capture the summary information at the end of the fsck; we had a power cut, so things have been hectic for the last two days. I have captured some info on 4 files on the MDS (let me know if you would like more) and will attach it to the ticket. I will also create a new ticket for the log issue.

            Kind regards,

            Campbell

            Attachment: bmds1-sample-files.tar.gz
            cmcl Campbell Mcleay (Inactive) made changes -
            Attachment added: bmds1-sample-files.tar.gz

            People

              pjones Peter Jones
              cmcl Campbell Mcleay (Inactive)
              Votes: 0
              Watchers: 4