  Lustre / LU-10238

adding new OSTs causes quota reporting error


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: Lustre 2.10.0
    • Severity: 3

    Description

      We have a Lustre 2.10.0 filesystem that was built with two OSSes containing 5 OSTs each. Last week I added a third OSS (the exact same hardware, with slightly newer OS software except for the kernel and Lustre). When I created the OSTs with mkfs.lustre, the filesystem seemed to grow correctly. We currently only set and enforce group quotas.
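
      For reference, the new OSTs were created roughly along these lines (the example assumes ldiskfs-backed targets; the fsname, OST index, device path, mount point, and MGS NID are placeholders rather than our exact values):

      # run on the new OSS for each new OST; the index continues from the existing OSTs (0-9)
      mkfs.lustre --ost --fsname=center1 --index=10 --mgsnode=<mgs-nid> /dev/<ost-device>
      mount -t lustre /dev/<ost-device> /mnt/lustre/ost10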

      Later that day, we noticed that `lfs quota -g $GROUP /center1` was showing bad values and an error message. Here's an example.

      chinook02:PENGUIN$ sudo lfs quota -g penguin /center1
      Disk quotas for grp penguin (gid 12738):
      Filesystem      kbytes       quota       limit   grace   files   quota   limit   grace
        /center1       [214]  1073741824  1181116006       -      13       0       0       -
      Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
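
      If it's useful for reproducing this, the per-target breakdown (which should show which OSTs are or are not responding) can be requested with `lfs quota -v`; for example:

      sudo lfs quota -v -g penguin /center1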

      We found a workaround. As soon as the group has data written to the new OSTs, `lfs quota` seems to work fine.

      chinook02:PENGUIN$ lfs setstripe -i -1 -c -1 loforbes
      chinook02:PENGUIN$ dd of=loforbes/testfile if=/dev/urandom bs=1M count=15
      15+0 records in
      15+0 records out
      15728640 bytes (16 MB) copied, 1.80694 s, 8.7 MB/s
      chinook02:PENGUIN$ sudo lfs quota -g penguin /center1
      Disk quotas for grp penguin (gid 12738):
      Filesystem      kbytes       quota       limit   grace   files   quota   limit   grace
        /center1   671997883  1073741824  1181116006       -      13       0       0       -
      chinook02:PENGUIN$ lfs getstripe loforbes/testfile
      loforbes/testfile
      lmm_stripe_count: 15
      lmm_stripe_size: 1048576
      lmm_pattern: 1
      lmm_layout_gen: 0
      lmm_stripe_offset: 12
      obdidx objid objid group
      12 31981 0x7ced 0
      7 62233208 0x3b59a78 0
      14 32068 0x7d44 0
      8 72183233 0x44d6dc1 0
      10 31854 0x7c6e 0
      11 31849 0x7c69 0
      2 68917015 0x41b9717 0
      5 71171215 0x43dfc8f 0
      1 69395583 0x422e47f 0
      13 32088 0x7d58 0
      9 68211489 0x410d321 0
      6 70389457 0x4320ed1 0
      4 70225352 0x42f8dc8 0
      3 66783438 0x3fb08ce 0
      0 65674625 0x3ea1d81 0

      We figured out that it isn't actually necessary for a group to have data on the 10 original OSTs, just on the 5 new ones, for `lfs quota` to work again. I've implemented this workaround for all projects using our Lustre filesystem (roughly as sketched below).
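
      The per-project version of the workaround amounts to something like the following; the project name, scratch directory name, and file size are placeholders, not an exact transcript:

      # seed every OST with at least one object owned by the project's group
      mkdir -p /center1/PROJECT/.quota-seed
      lfs setstripe -c -1 /center1/PROJECT/.quota-seed
      dd if=/dev/urandom of=/center1/PROJECT/.quota-seed/seedfile bs=1M count=15
      chgrp -R PROJECT /center1/PROJECT/.quota-seed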

      Before implementing the workaround, we tried "deleting" a group's quota and recreating it (see the setquota example below). That didn't seem to affect the issue. We also tried unmounting and remounting the filesystem on a client; again, no change. Removing all of a group's files that have data on the new OSTs results in `lfs quota` showing the error again.
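
      The quota "delete"/recreate attempt was essentially clearing the limits and then setting them back with `lfs setquota`, along these lines (the limit values are the penguin group's from above, shown only as an illustration):

      lfs setquota -g penguin -b 0 -B 0 -i 0 -I 0 /center1
      lfs setquota -g penguin -b 1073741824 -B 1181116006 /center1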

      We are considering a Lustre 2.10.1 update sometime soon.

      Regards,
      -liam


          People

            Assignee: Hongchao Zhang (hongchao.zhang)
            Reporter: Liam Forbes (loforbes)
            Votes: 0
            Watchers: 3
