[LU-1947] OST ZFS grant shortage on precreate Created: 15/Sep/12 Updated: 22/Apr/14 Resolved: 22/Apr/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs | ||
| Environment: |
Single node client+MDS+OSS, with 3x 1GB OSTs, 256MB MDT, x86_64, 4GB RAM, OSTFSTYPE=zfs, USE_OFD=yes |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 5675 | ||||||||
| Description |
|
It seems there is some kind of grant leak or shortage when precreating objects, or a problem with the ZFS estimate of how many objects can be created. This is happening with small OSTs during a run of sanity.sh, but I suspect the same problem will happen when larger OSTs are full. The first error is seen early on:

Sep 15 03:21:13 sookie-gig kernel: Lustre: DEBUG MARKER: == sanity test 27b: create two stripe file =========================================================== 03:21:13 (1347700873)
Sep 15 03:21:23 sookie-gig kernel: LustreError: 11997:0:(lov_request.c:593:lov_update_create_set()) error creating fid 0x62 sub-object on OST idx 2/2: rc = -28
Sep 15 03:21:23 sookie-gig kernel: LustreError: 11997:0:(lov_request.c:593:lov_update_create_set()) error creating fid 0x62 sub-object on OST idx 2/2: rc = -5
Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0001: failed to acquire grant space to precreate 0 objects
Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 2293017 previous similar messages
Sep 15 03:25:25 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0002: unable to precreate [0x0:0xe1:0x0]: rc = -28
Sep 15 03:25:25 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) Skipped 2293021 previous similar messages
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0001: failed to acquire grant space to precreate 0 objects
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 4886890 previous similar messages
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4428:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0001: unable to precreate [0x0:0x101:0x0]: rc = -28
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4428:0:(ofd_obd.c:1192:ofd_create()) Skipped 4886885 previous similar messages

It appears that the precreate code is busy-looping, since these messages are being hit millions of times within a few minutes. The ofd_create() messages were changed from CDEBUG() to CWARN() for debugging.

[root@sookie-gig lustre-head]# df
Filesystem        1K-blocks   Used  Available  Use%  Mounted on
testfs-ost1/ost1     999424  71296     926080    8%  /mnt/ost1
testfs-ost2/ost2     999424  39296     958080    4%  /mnt/ost2
testfs-ost3/ost3     999424  35200     962176    4%  /mnt/ost3
[root@sookie-gig lustre-head]# df -i
Filesystem        Inodes  IUsed  IFree  IUse%  Mounted on
testfs-ost1/ost1    8011    760   7251    10%  /mnt/ost1
testfs-ost2/ost2    7962    461   7501     6%  /mnt/ost2
testfs-ost3/ost3    7931    398   7533     6%  /mnt/ost3

At most about 10% of each OST is in use, and ZFS itself thinks that there are free inodes that could be created. The grant statistics also don't appear to show a shortage of grant:

# lctl get_param obdfilter.*.grant_*
obdfilter.testfs-OST0000.grant_compat_disable=0
obdfilter.testfs-OST0000.grant_precreate=0
obdfilter.testfs-OST0000.grant_ratio=19%
obdfilter.testfs-OST0001.grant_compat_disable=0
obdfilter.testfs-OST0001.grant_precreate=0
obdfilter.testfs-OST0001.grant_ratio=19%
obdfilter.testfs-OST0002.grant_compat_disable=0
obdfilter.testfs-OST0002.grant_precreate=0
obdfilter.testfs-OST0002.grant_ratio=19%
# lctl get_param obdfilter.*.tot*
obdfilter.testfs-OST0000.tot_dirty=0
obdfilter.testfs-OST0000.tot_granted=0
obdfilter.testfs-OST0000.tot_pending=0
obdfilter.testfs-OST0001.tot_dirty=0
obdfilter.testfs-OST0001.tot_granted=0
obdfilter.testfs-OST0001.tot_pending=0
obdfilter.testfs-OST0002.tot_dirty=0
obdfilter.testfs-OST0002.tot_granted=0
obdfilter.testfs-OST0002.tot_pending=0

Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1168:ofd_create()) testfs-OST0002: failed to acquire grant space to precreate 0 objects
Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1168:ofd_create()) Skipped 5756702 previous similar messages
Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) testfs-OST0001: unable to precreate [0x0:0x101:0x0]: rc = -28
Sep 15 11:23:57 sookie-gig kernel: Lustre: 15270:0:(ofd_obd.c:1192:ofd_create()) Skipped 5756716 previous similar messages

It's still going hours later. I had started with 256MB OSTs, but hit this problem immediately. Increasing to 1GB OSTs allowed some testing to pass, but it failed on a second test run (after a reboot and such). |
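The busy-loop suspicion in the description can be checked with a little arithmetic on the "Skipped N previous similar messages" lines. The following is only a rough sketch: it assumes each "Skipped" count covers the messages suppressed since the previous report from the same call site, and the helper name `skipped_rates`, the regular expression, and the `year` parameter are illustrative, not part of Lustre.

```python
import re
from datetime import datetime

# Two "Skipped" reports for the same call site, copied from the description.
LOG = """\
Sep 15 03:25:25 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 2293017 previous similar messages
Sep 15 03:33:57 sookie-gig kernel: Lustre: 4427:0:(ofd_obd.c:1168:ofd_create()) Skipped 4886890 previous similar messages
"""

# Hypothetical pattern: syslog timestamp, "file:line" call site, skipped count.
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) .*?"
    r"\((?P<site>[\w.]+:\d+):\w+\(\)\) "
    r"Skipped (?P<n>\d+) previous similar messages$"
)


def skipped_rates(log_text, year=2012):
    """Return (site, seconds, skipped, per_second) tuples for consecutive
    'Skipped' reports from the same call site.

    Assumption: the count in a report covers the interval since the
    previous report from that call site.
    """
    last_seen = {}
    rates = []
    for line in log_text.splitlines():
        m = PATTERN.match(line.strip())
        if not m:
            continue
        # Syslog timestamps carry no year, so one is supplied by the caller.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        site = m["site"]
        skipped = int(m["n"])
        if site in last_seen:
            seconds = (ts - last_seen[site]).total_seconds()
            if seconds > 0:
                rates.append((site, seconds, skipped, skipped / seconds))
        last_seen[site] = ts
    return rates


if __name__ == "__main__":
    for site, seconds, skipped, per_second in skipped_rates(LOG):
        print(f"{site}: {skipped} skipped in {seconds:.0f}s, about {per_second:.0f}/s")
```

Under that assumption, the two reports are 512 seconds apart and the rate works out to roughly 9,500 suppressed messages per second at ofd_obd.c:1168, which is consistent with a tight retry loop rather than ordinary precreate traffic.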
| Comments |
| Comment by Andreas Dilger [ 15/Sep/12 ] |
|
Part of the problem may be that the ZFS pool does not get recreated, even after a reboot of the filesystem. |
| Comment by Johann Lombardi (Inactive) [ 15/Oct/12 ] |
Grant for precreate is allocated in ->ldo_recovery_complete, which wasn't called for OFD in master. I have fixed this in http://review.whamcloud.com/4182, so I think it is worth trying again now; this bug might be fixed already (on master, at least). |
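To illustrate why a skipped recovery-complete callback would produce exactly the symptoms in the description (grant_precreate=0 and every precreate failing with rc = -28 despite free space), here is a toy model. It is not Lustre code: the class `ToyOST`, its methods, and the byte sizes are all hypothetical, and the only thing taken from the comment above is the idea that precreate grant is reserved in a recovery-complete step.

```python
import errno


class ToyOST:
    """Toy model of an OST whose precreate grant is reserved only when a
    recovery-complete callback runs (hypothetical, for illustration)."""

    def __init__(self, free_bytes, per_object_bytes):
        self.free_bytes = free_bytes
        self.per_object_bytes = per_object_bytes
        # Nothing is reserved for precreate until recovery completes.
        self.grant_precreate = 0

    def recovery_complete(self, reserve_objects):
        """Reserve grant for up to reserve_objects precreated objects."""
        wanted = reserve_objects * self.per_object_bytes
        self.grant_precreate = min(wanted, self.free_bytes)

    def precreate(self, count):
        """Return the number of objects created, or -ENOSPC if the
        reserved precreate grant cannot cover the request."""
        needed = count * self.per_object_bytes
        if needed > self.grant_precreate:
            return -errno.ENOSPC
        self.grant_precreate -= needed
        self.free_bytes -= needed
        return count


if __name__ == "__main__":
    ost = ToyOST(free_bytes=1 << 30, per_object_bytes=4096)
    # Callback never ran: plenty of free space, but no precreate grant.
    print(ost.precreate(32))
    ost.recovery_complete(reserve_objects=64)
    # After the callback, the same request succeeds.
    print(ost.precreate(32))
```

In this model a caller that simply retries on -ENOSPC would spin forever, because nothing other than the missing callback ever refills the precreate grant.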
| Comment by Andreas Dilger [ 10/Jul/13 ] |
|
This might have also been fixed by the recent landing of http://review.whamcloud.com/6546 from |
| Comment by Jodi Levi (Inactive) [ 03/Mar/14 ] |
|
Can this ticket be closed? |