Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.4.0
- Single node client+MDS+OSS, with 3x 1GB OSTs, 256MB MDT, x86_64, 4GB RAM, OSTFSTYPE=zfs, USE_OFD=yes
- 3
- 5675
Description
There seems to be some kind of grant leak or shortage when precreating objects, or a problem with the ZFS estimate of how many objects can be created. This is happening with small OSTs, but I suspect the same problem will happen when larger OSTs are nearly full. It was seen during a run of sanity.sh. The first error appears early on:
Sep 15 03:21:13 sookie-gig kernel: Lustre: DEBUG MARKER: == sanity test 27b: create two stripe file =========================================================== 03:21:13 (1347700873)
Sep 15 03:21:23 sookie-gig kernel: LustreError: 11997:0:(lov_request.c:593:lov_update_create_set()) error creating fid 0x62 sub-object on OST idx 2/2: rc = -28
Sep 15 03:21:23 sookie-gig kernel: LustreError: 11997:0:(lov_request.c:593:lov_update_create_set()) error creating fid 0x62 sub-object on OST idx 2/2: rc = -5
Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0001: failed to acquire grant space to precreate 0 objects
Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 2293017 previous similar messages
Sep 15 03:25:25 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0002: unable to precreate [0x0:0xe1:0x0]: rc = -28
Sep 15 03:25:25 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) Skipped 2293021 previous similar messages
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0001: failed to acquire grant space to precreate 0 objects
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 4886890 previous similar messages
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4428:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0001: unable to precreate [0x0:0x101:0x0]: rc = -28
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4428:0:(ofd_obd.c:1192:ofd_create()) Skipped 4886885 previous similar messages
It appears that the precreate code is busy-looping, since these ofd_create() messages are being logged millions of times within a few minutes. Note that the ofd_create() messages were changed from CDEBUG() to CWARN() for debugging.
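As a rough check on that rate, here is a minimal Python sketch that is not part of the original debugging session. It assumes that each "Skipped N previous similar messages" count covers the interval since the previous print of the same message, and that the year is 2012 (taken from the epoch timestamp in the DEBUG MARKER line). The function name `skipped_rate` is made up for illustration.

```python
import re
from datetime import datetime

# Two "Skipped ..." lines for ofd_obd.c:1168, copied from the log above.
LOG = """\
Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 2293017 previous similar messages
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 4886890 previous similar messages
"""

PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*Skipped (\d+) previous similar messages"
)


def skipped_rate(log_text, year=2012):
    """Return (messages per second, elapsed seconds) between the first and
    last 'Skipped' lines. Assumption: each count covers the time since the
    previous print, so the first sample's count is not included."""
    samples = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            samples.append((stamp, int(match.group(2))))
    if len(samples) < 2:
        raise ValueError("need at least two 'Skipped' lines")
    elapsed = (samples[-1][0] - samples[0][0]).total_seconds()
    skipped = sum(count for _, count in samples[1:])
    return skipped / elapsed, elapsed


rate, elapsed = skipped_rate(LOG)
print(f"{rate:.0f} messages/s over {elapsed:.0f} s")
```

Under those assumptions the two samples work out to roughly 9,500 suppressed messages per second over 512 seconds, which fits a tight retry loop rather than normal precreate traffic.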
[root@sookie-gig lustre-head]# df
Filesystem        1K-blocks   Used Available Use% Mounted on
testfs-ost1/ost1     999424  71296    926080   8% /mnt/ost1
testfs-ost2/ost2     999424  39296    958080   4% /mnt/ost2
testfs-ost3/ost3     999424  35200    962176   4% /mnt/ost3
[root@sookie-gig lustre-head]# df -i
Filesystem        Inodes IUsed IFree IUse% Mounted on
testfs-ost1/ost1    8011   760  7251   10% /mnt/ost1
testfs-ost2/ost2    7962   461  7501    6% /mnt/ost2
testfs-ost3/ost3    7931   398  7533    6% /mnt/ost3
Only about 10% of the filesystem is full, and ZFS itself thinks that there are free inodes that could be created. The grant statistics also don't appear to show a shortage of grant:
# lctl get_param obdfilter.*.grant_*
obdfilter.testfs-OST0000.grant_compat_disable=0
obdfilter.testfs-OST0000.grant_precreate=0
obdfilter.testfs-OST0000.grant_ratio=19%
obdfilter.testfs-OST0001.grant_compat_disable=0
obdfilter.testfs-OST0001.grant_precreate=0
obdfilter.testfs-OST0001.grant_ratio=19%
obdfilter.testfs-OST0002.grant_compat_disable=0
obdfilter.testfs-OST0002.grant_precreate=0
obdfilter.testfs-OST0002.grant_ratio=19%
# lctl get_param obdfilter.*.tot*
obdfilter.testfs-OST0000.tot_dirty=0
obdfilter.testfs-OST0000.tot_granted=0
obdfilter.testfs-OST0000.tot_pending=0
obdfilter.testfs-OST0001.tot_dirty=0
obdfilter.testfs-OST0001.tot_granted=0
obdfilter.testfs-OST0001.tot_pending=0
obdfilter.testfs-OST0002.tot_dirty=0
obdfilter.testfs-OST0002.tot_granted=0
obdfilter.testfs-OST0002.tot_pending=0
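For repeated runs, the same check can be scripted. The following is a minimal Python sketch, not something used in the original report: it parses `name=value` lines as printed by `lctl get_param` and reports any `tot_*` counter that is nonzero. The helper names `parse_params` and `nonzero_totals` are hypothetical, and the sample text is a subset of the values shown above.

```python
# Subset of the lctl get_param output shown above.
SAMPLE = """\
# lctl get_param obdfilter.*.tot*
obdfilter.testfs-OST0000.tot_dirty=0
obdfilter.testfs-OST0000.tot_granted=0
obdfilter.testfs-OST0000.tot_pending=0
obdfilter.testfs-OST0001.tot_dirty=0
obdfilter.testfs-OST0001.tot_granted=0
obdfilter.testfs-OST0001.tot_pending=0
obdfilter.testfs-OST0002.tot_dirty=0
obdfilter.testfs-OST0002.tot_granted=0
obdfilter.testfs-OST0002.tot_pending=0
"""


def parse_params(text):
    """Turn 'name=value' lines into a dict, skipping prompts and blanks."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        params[name] = value
    return params


def nonzero_totals(params):
    """Return the tot_* counters whose integer value is not zero."""
    return {
        name: int(value)
        for name, value in params.items()
        if ".tot_" in name and int(value) != 0
    }


params = parse_params(SAMPLE)
print(nonzero_totals(params) or "all tot_* counters are zero")
```

With the values from this report, every `tot_dirty`, `tot_granted`, and `tot_pending` counter is zero on all three OSTs.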
Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0002: failed to acquire grant space to precreate 0 objects
Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1168:ofd_create()) Skipped 5756702 previous similar messages
Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0001: unable to precreate [0x0:0x101:0x0]: rc = -28
Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) Skipped 5756716 previous similar messages
It's still going hours later. I had started with 256MB OSTs, but hit this problem immediately. Increasing to 1GB OSTs allowed some testing to pass, but it failed on a second test (after a reboot and such).
Issue Links
- is related to: LU-3421 (ost_handler.c:1762:ost_blocking_ast()) Error -2 syncing data on lock cancel causes first ENOSPC client issues then MDS server locks up [Resolved]