Lustre / LU-1947

OST ZFS grant shortage on precreate

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0
    • Affects Version/s: Lustre 2.4.0
    • Environment: Single node client+MDS+OSS, with 3x 1GB OSTs, 256MB MDT, x86_64, 4GB RAM, OSTFSTYPE=zfs, USE_OFD=yes
    • Severity: 3
    • Rank (Obsolete): 5675

    Description

      There appears to be a grant leak or shortage when precreating objects, or a problem with the ZFS estimate of how many objects can be created. It was seen during a run of sanity.sh with small OSTs, but I suspect the same problem will occur when larger OSTs are nearly full. The first error appears early on:

      Sep 15 03:21:13 sookie-gig kernel: Lustre: DEBUG MARKER: == sanity test 27b: create two stripe file =========================================================== 03:21:13 (1347700873)
      Sep 15 03:21:23 sookie-gig kernel: LustreError: 11997:0:(lov_request.c:593:lov_update_create_set()) error creating fid 0x62 sub-object on OST idx 2/2: rc = -28
      Sep 15 03:21:23 sookie-gig kernel: LustreError: 11997:0:(lov_request.c:593:lov_update_create_set()) error creating fid 0x62 sub-object on OST idx 2/2: rc = -5
      Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0001: failed to acquire grant space to precreate 0 objects
      Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 2293017 previous similar messages
      Sep 15 03:25:25 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0002: unable to precreate [0x0:0xe1:0x0]: rc = -28
      Sep 15 03:25:25 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) Skipped 2293021 previous similar messages
      Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0001: failed to acquire grant space to precreate 0 objects
      Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 4886890 previous similar messages
      Sep 15 03:33:57 sookie-gig kernel: Lustre: 4428:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0001: unable to precreate [0x0:0x101:0x0]: rc = -28
      Sep 15 03:33:57 sookie-gig kernel: Lustre: 4428:0:(ofd_obd.c:1192:ofd_create()) Skipped 4886885 previous similar messages
      

      It appears that the precreate code is busy-looping, since these messages are logged millions of times within a few minutes. The ofd_create() messages were changed from CDEBUG() to CWARN() for debugging.
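To put a rough number on the busy loop, the "Skipped N previous similar messages" counters in the log above can be divided by the time between two consecutive reports. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, the year 2012 is an assumption inferred from the epoch value in the test 27b marker line, and it assumes each counter covers exactly the interval since the previous report from the same call site.

```python
import re
from datetime import datetime

# Two consecutive reports from the same call site (ofd_obd.c:1168), copied from the log above.
LOG = """\
Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 2293017 previous similar messages
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 4886890 previous similar messages
"""

PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*Skipped (\d+) previous similar messages"
)

ASSUMED_YEAR = "2012"  # assumption: syslog timestamps carry no year


def skipped_events(text):
    """Return (timestamp, skipped_count) pairs for each 'Skipped N' line."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            ts = datetime.strptime(
                ASSUMED_YEAR + " " + m.group(1), "%Y %b %d %H:%M:%S"
            )
            events.append((ts, int(m.group(2))))
    return events


def suppressed_rate(events):
    """Suppressed messages per second between the last two reports."""
    (t0, _), (t1, n1) = events[-2], events[-1]
    return n1 / (t1 - t0).total_seconds()


events = skipped_events(LOG)
rate = suppressed_rate(events)
print(f"{rate:.0f} suppressed messages per second")
```

Under these assumptions, 4886890 messages suppressed over the 512 seconds between 03:25:25 and 03:33:57 works out to roughly 9,500 per second from a single thread, which is consistent with a tight retry loop rather than periodic retries.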

      [root@sookie-gig lustre-head]# df
      Filesystem           1K-blocks      Used Available Use% Mounted on
      testfs-ost1/ost1        999424     71296    926080   8% /mnt/ost1
      testfs-ost2/ost2        999424     39296    958080   4% /mnt/ost2
      testfs-ost3/ost3        999424     35200    962176   4% /mnt/ost3
      [root@sookie-gig lustre-head]# df -i
      Filesystem            Inodes   IUsed   IFree IUse% Mounted on
      testfs-ost1/ost1        8011     760    7251   10% /mnt/ost1
      testfs-ost2/ost2        7962     461    7501    6% /mnt/ost2
      testfs-ost3/ost3        7931     398    7533    6% /mnt/ost3
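The utilization figures in the two df listings above can be recomputed from the raw columns as a cross-check. This is a minimal sketch with a hypothetical helper name; it assumes df reports Use% as used / (used + available), rounded up to a whole percent.

```python
import math

# Rows copied from the `df` (1K-blocks) and `df -i` (inodes) output above.
DF_BLOCKS = """\
testfs-ost1/ost1        999424     71296    926080   8% /mnt/ost1
testfs-ost2/ost2        999424     39296    958080   4% /mnt/ost2
testfs-ost3/ost3        999424     35200    962176   4% /mnt/ost3
"""

DF_INODES = """\
testfs-ost1/ost1        8011     760    7251   10% /mnt/ost1
testfs-ost2/ost2        7962     461    7501    6% /mnt/ost2
testfs-ost3/ost3        7931     398    7533    6% /mnt/ost3
"""


def utilization(table):
    """Map each filesystem name to its use percentage, rounded up."""
    result = {}
    for line in table.splitlines():
        fields = line.split()
        name, used, avail = fields[0], int(fields[2]), int(fields[3])
        result[name] = math.ceil(100 * used / (used + avail))
    return result


block_use = utilization(DF_BLOCKS)
inode_use = utilization(DF_INODES)
worst = max(list(block_use.values()) + list(inode_use.values()))
print(block_use, inode_use, worst)
```

The recomputed values match the Use% and IUse% columns, and the highest figure on any OST is the 10% inode usage on ost1.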
      

      At most about 10% of each OST is in use, and ZFS itself reports free inodes, so more objects could still be created. The grant statistics also don't appear to show a shortage of grant:

      # lctl get_param obdfilter.*.grant_* 
      obdfilter.testfs-OST0000.grant_compat_disable=0
      obdfilter.testfs-OST0000.grant_precreate=0
      obdfilter.testfs-OST0000.grant_ratio=19%
      obdfilter.testfs-OST0001.grant_compat_disable=0
      obdfilter.testfs-OST0001.grant_precreate=0
      obdfilter.testfs-OST0001.grant_ratio=19%
      obdfilter.testfs-OST0002.grant_compat_disable=0
      obdfilter.testfs-OST0002.grant_precreate=0
      obdfilter.testfs-OST0002.grant_ratio=19%
      # lctl get_param obdfilter.*.tot*
      obdfilter.testfs-OST0000.tot_dirty=0
      obdfilter.testfs-OST0000.tot_granted=0
      obdfilter.testfs-OST0000.tot_pending=0
      obdfilter.testfs-OST0001.tot_dirty=0
      obdfilter.testfs-OST0001.tot_granted=0
      obdfilter.testfs-OST0001.tot_pending=0
      obdfilter.testfs-OST0002.tot_dirty=0
      obdfilter.testfs-OST0002.tot_granted=0
      obdfilter.testfs-OST0002.tot_pending=0
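When repeating this check across many OSTs, the `name=value` lines printed by `lctl get_param` can be parsed and scanned for non-zero counters. This is a minimal sketch operating on the captured text above; the function names are hypothetical and it does not call `lctl` itself.

```python
# Output of `lctl get_param obdfilter.*.tot*`, copied from the listing above.
LCTL_OUTPUT = """\
obdfilter.testfs-OST0000.tot_dirty=0
obdfilter.testfs-OST0000.tot_granted=0
obdfilter.testfs-OST0000.tot_pending=0
obdfilter.testfs-OST0001.tot_dirty=0
obdfilter.testfs-OST0001.tot_granted=0
obdfilter.testfs-OST0001.tot_pending=0
obdfilter.testfs-OST0002.tot_dirty=0
obdfilter.testfs-OST0002.tot_granted=0
obdfilter.testfs-OST0002.tot_pending=0
"""


def parse_params(text):
    """Parse 'name=value' lines into a dict, skipping blanks and prompts."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key] = value
    return params


def nonzero_counters(params, prefix="tot_"):
    """Return the counters whose last name component starts with prefix and is non-zero."""
    return {
        key: value
        for key, value in params.items()
        if key.rsplit(".", 1)[-1].startswith(prefix) and int(value) != 0
    }


params = parse_params(LCTL_OUTPUT)
print(nonzero_counters(params))
```

For the output captured here the result is empty: every tot_dirty, tot_granted, and tot_pending counter is zero on all three OSTs.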
      
      Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0002: failed to acquire grant space to precreate 0 objects
      Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1168:ofd_create()) Skipped 5756702 previous similar messages
      Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0001: unable to precreate [0x0:0x101:0x0]: rc = -28
      Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) Skipped 5756716 previous similar messages
      

      The loop is still running hours later. I had started with 256MB OSTs, but hit this problem immediately. Increasing to 1GB OSTs allowed some testing to pass, but it failed on a second test (after a reboot and so on).

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Andreas Dilger (adilger)