[LU-5204] 2.6 DNE stress testing: EINVAL when attempting to delete file

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • Lustre 2.6.0
    • Severity: 3
    • 14529

    Description

      After our stress testing this weekend, we are unable to delete some (perhaps all?) of the files on a particular OST (OST 38). All of them give EINVAL.

      For example:
      [root@galaxy-esf-mds008 tmp]# rm -f posix_shm_open
      rm: cannot remove `posix_shm_open': Invalid argument
      [root@galaxy-esf-mds008 tmp]# lfs getstripe posix_shm_open
      posix_shm_open
      lmm_stripe_count: 1
      lmm_stripe_size: 1048576
      lmm_pattern: 1
      lmm_layout_gen: 0
      lmm_stripe_offset: 38
      obdidx objid objid group
      38 907263 0xdd7ff 0

      However, OST 38 (OST0026) is showing up in lctl dl, and as far as I know, there are no issues with it. (The dk logs on the OSS don't show any issues.)
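      For reference, OST target names use the hex form of the index, so stripe index 38 corresponds to OST0026. A minimal sketch of double-checking the mapping and the device state on the MDS (the grep pattern is only illustrative):

      printf 'OST%04x\n' 38     # prints OST0026
      lctl dl | grep OST0026    # the device should be listed and in the UP state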

      Here's the relevant part of the log from MDT0000:
      00000004:00020000:2.0:1402947131.685511:0:25039:0:(lod_lov.c:695:validate_lod_and_idx()) esfprod-MDT0000-mdtlov: bad idx: 38 of 64
      00000004:00000001:2.0:1402947131.685513:0:25039:0:(lod_lov.c:757:lod_initialize_objects()) Process leaving via out (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
      00000004:00000010:2.0:1402947131.685515:0:25039:0:(lod_lov.c:782:lod_initialize_objects()) kfreed 'stripe': 8 at ffff8807fc208a00.
      00000004:00000001:2.0:1402947131.685516:0:25039:0:(lod_lov.c:788:lod_initialize_objects()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685519:0:25039:0:(lod_lov.c:839:lod_parse_striping()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685520:0:25039:0:(lod_lov.c:885:lod_load_striping_locked()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685522:0:25039:0:(lod_object.c:2754:lod_declare_object_destroy()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
      00000004:00000001:2.0:1402947131.685524:0:25039:0:(mdd_dir.c:1586:mdd_unlink()) Process leaving via stop (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
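      In case it helps anyone trying to reproduce this, a minimal sketch of how a trace like the one above can be captured on the MDS (the debug mask and output path are only illustrative):

      lctl set_param debug=+trace      # enable function entry/exit tracing
      lctl clear                       # drop the existing debug buffer contents
      # ... attempt the failing rm ...
      lctl dk /tmp/mdt0000_unlink.dk   # dump the kernel debug log to a file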

      I don't know for certain if this is related to DNE2 or not, but this is not an error I've seen before. The file system and objects are still around, so I can provide further data if needed.

      Any thoughts?

      Attachments

        1. invalid_object_client_mdt0007
          506 kB
        2. invalid_object_mds_mdt0000
          133 kB
        3. lctl_dl_from_client
          4 kB
        4. lctl_dl_from_mds001_mdt0000
          4 kB
        5. LU-5204_mds0_start_log.tar.gz
          0.2 kB
        6. LU-5204_start_log_with_oss.tar.gz
          0.3 kB
        7. mdt0.config.log
          57 kB


          Activity


            adilger Andreas Dilger added a comment -

            The one obvious problem that I see is that it should ALWAYS be possible to delete a file, even if the OST is unavailable or configured out of the system. Regardless of what the root cause of the problem is, there needs to be a patch to allow the file to be deleted.

            paf Patrick Farrell (Inactive) added a comment -

            Opened LU-5233 for the MDS1 LBUG I mentioned above.

            paf Patrick Farrell (Inactive) added a comment -

            Andreas,

            It's really unlikely. No one should have been mucking with the system. I can't say it's impossible, but...

            Now that we've tracked it down to such a strange error, I'm planning to go ahead and fix it, and not worry about it unless it occurs again in further stress testing. In fact, I'm going to do exactly that unless someone has further information they'd like from the system. (Speak up soon - I'm going to fix it for our stress testing slot tonight.)

            I've also hit, in further testing, an MDS0 crash bug that could possibly be related to this one; I'm going to open a ticket for it shortly and will reference that LU here once I've got it open.

            adilger Andreas Dilger added a comment -

            Is it possible that OST0026 was ever deactivated during testing (e.g. lctl conf_param esfprod-OST0026.osc.active=0 or similar)? That would permanently disable the OST in the config log, and seems to me to be the most likely cause of this problem.
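            If that is what happened, a minimal sketch of how one might check for and revert such a record, assuming the esfprod fsname from the logs above (run on the MGS node; llog output format varies by version, and the grep pattern is only illustrative):

            lctl --device MGS llog_print esfprod-client | grep -i active   # look for an "active=0" record for OST0026
            lctl conf_param esfprod-OST0026.osc.active=1                   # re-enable the OSC if it was permanently disabled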

            paf Patrick Farrell (Inactive) added a comment -

            Emoly,

            Unfortunately, I don't really know how many are enough. We have 8 MDSes with 8 MDTs, and 4 OSSes with 40 OSTs. It's a test bed system for DNE, which is why it's such a weird configuration.

            We do have a separate MGT and MDT.

            As for other details: all I know about what we did is that we ran a bunch of different IO tests, like IOR and a large number of tests from the Linux Test Project, in various configurations, all with mkdir replaced by a script that would randomly create striped or remote directories. It would also sometimes create normal directories.

            We did that last weekend, and had this problem on Monday. No idea what was running when it started.

            Sorry for not having many specifics on the testing; it's a large test suite.

            We're probably going to fix the system soon by doing a writeconf, so we can continue stress testing DNE2. Let me know if there's anything else I can give you first.
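            For context, a minimal sketch of the kind of mkdir wrapper described above, assuming lfs mkdir with -i (remote directory on a given MDT) and -c (striped directory across several MDTs, DNE2); the MDT count and the probabilities are made up for illustration:

            #!/bin/bash
            # Hypothetical mkdir replacement: randomly create a normal, remote, or striped directory.
            dir="$1"
            case $((RANDOM % 3)) in
                0) mkdir "$dir" ;;                            # normal directory on the parent's MDT
                1) lfs mkdir -i $((RANDOM % 8)) "$dir" ;;     # remote directory on a random one of 8 MDTs
                2) lfs mkdir -c $((RANDOM % 8 + 1)) "$dir" ;; # striped directory across 1-8 MDTs
            esac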
            emoly.liu Emoly Liu added a comment -

            Patrick,

            I will try to upgrade a Lustre file system from 2.5.1 to 2.6 to reproduce this problem. Could you please suggest how many OSTs and MDTs are enough for this test? Also, I understand the MGS and MDS should be separate in this test; is there anything else I should pay attention to?

            Thanks.

            pjones Peter Jones added a comment -

            Emoly

            Could you please try reproducing this issue?

            Thanks

            Peter


            paf Patrick Farrell (Inactive) added a comment -

            Di,

            Yes, now that we know it's a config log issue, I figured we could fix it with a writeconf operation. But like you said, we'd like to understand the issue.

            It was not reformatted before the test. It WAS upgraded from 2.5, which required a writeconf operation at that time to get it to start.
            So it was originally formatted with Lustre 2.5.1, then upgraded to master.

            For the mkfs.lustre command for the MDT (I don't have the device name, but these are the options that were used):
            mkfs.lustre --reformat --mdt --fsname=esfprod --mgsnode=galaxy-esf-mds001 --index=0 --quiet --backfstype=ldiskfs --param sys.timeout=300 --param lov.stripesize=1048576 --param lov.stripecount=1 --mkfsoptions="-J size=400" [MDT device name]

            For the MGT:
            mkfs.lustre --reformat --mgs --quiet --backfstype=ldiskfs --param sys.timeout=300 [MGT device name]

            For one of the OSTs:
            mkfs.lustre --reformat --ost --fsname=esfprod --mgsnode=galaxy-esf-mds001 --index=1 --quiet --backfstype=ldiskfs --param sys.timeout=300 --mkfsoptions="-J size=400" --mountfsoptions="errors=remount-ro,extents,mballoc" [OST device name]
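            As an aside, one way to confirm what parameters and index a formatted target actually carries, without modifying anything, is a dry run of tunefs.lustre (a sketch; --dryrun only reads and prints the on-disk configuration):

            tunefs.lustre --dryrun [OST device name]   # prints the target name, index, flags and parameters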
            di.wang Di Wang added a comment -

            Patrick, please also provide the mkfs.lustre command line you used to create the filesystem. I checked the master code and did not find any issue there.

            di.wang Di Wang added a comment -

            Patrick, was this FS reformatted before this test? By the way, you can always erase the config log with tunefs.lustre --writeconf and remount the FS to fix this config log issue. But we still need to understand the issue here.

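            For completeness, a rough sketch of the writeconf procedure being suggested here (device names are placeholders; see the Lustre Operations Manual for the exact steps and ordering):

            # 1. Unmount all clients, then all MDTs and OSTs, then the MGS
            # 2. Regenerate the configuration logs on every target
            tunefs.lustre --writeconf [MGT device name]
            tunefs.lustre --writeconf [MDT device name]   # repeat for each MDT
            tunefs.lustre --writeconf [OST device name]   # repeat for each OST
            # 3. Remount the MGS first, then the MDTs, then the OSTs, then the clients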

            paf Patrick Farrell (Inactive) added a comment -

            No, definitely not. We did a stress run of 2.6 with DNE2 (2.6 clients as well), and when it was over and the system had been rebooted, we were in this state where some of the files created during that stress run could not be deleted. We didn't deliberately touch the config at any point in there.

            People

              Assignee: emoly.liu Emoly Liu
              Reporter: paf Patrick Farrell (Inactive)
              Votes: 0
              Watchers: 7