LU-5778

MDS not creating files on OSTs properly

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.7.0
    • Affects Version: Lustre 2.5.2
    • Environment: CentOS 6.5, kernel 2.6.32-431.17.1.el6_lustre.x86_64
    • Severity: 3
    • Rank: 16216

    Description

      One of our Stampede filesystems running Lustre 2.5.2 has an OST offline because of a separate problem described in another ticket. While that OST was offline, the MDS crashed with an LBUG and was restarted last Friday. Since the restart, the MDS no longer automatically creates files on any OST numbered above the offline one. In our case OST0010 is offline, so the MDS now creates files only on the first 16 OSTs unless we manually specify the stripeoffset in lfs setstripe. This is overloading the servers that host those OSTs while the others sit idle. If we deactivate the first 16 OSTs on the MDS, every file is created with its first stripe on the lowest-numbered active OST.
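
      The "first 16 OSTs" figure follows from the target naming: the four characters after "OST" are the index in hexadecimal, so OST0010 is index 16 and the OSTs below it are OST0000 through OST000f. A minimal sketch of that arithmetic, where the helper names ost_index and osts_below are hypothetical and not part of any Lustre tool:

```python
def ost_index(target: str) -> int:
    """Return the numeric index encoded in a target name such as 'OST0010'."""
    if not target.upper().startswith("OST"):
        raise ValueError(f"not an OST target name: {target!r}")
    # The four characters after 'OST' hold the index in hexadecimal.
    return int(target[3:7], 16)


def osts_below(target: str) -> list:
    """List the OST names whose index is lower than the given target's."""
    return [f"OST{i:04x}" for i in range(ost_index(target))]


offline = "OST0010"
lower = osts_below(offline)
print(ost_index(offline))                # 16
print(len(lower), lower[0], lower[-1])   # 16 OST0000 OST000f
```

      This only decodes names; it does not model how the MDS chooses OSTs.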

      Can you suggest a way to force the MDS to use all the other OSTs through lctl set_param options? Bringing the offline OST back online is not currently an option: it is corrupted, e2fsck is still running, and it cannot be mounted. Manually setting the stripe is not an option either; we need allocation to work automatically, as it should. Could we set some QOS options to make it balance file creation across the OSTs?

      Attachments

        1. lctl_state.out
          44 kB
        2. lctl_target_obd.out
          11 kB
        3. LU-5778_file_create_getstripe.out.gz
          12 kB
        4. LU-5778.debug_filtered.bz2
          30 kB
        5. mds5_prealloc.out
          128 kB

            People

              Assignee: Niu Yawei (niu, Inactive)
              Reporter: Tommy Minyard (minyard)