[LU-5309] The `zfs userspace` command is broken on Lustre-ZFS-OST datasets Created: 09/Jul/14 Updated: 12/Feb/15 Resolved: 12/Feb/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Nathaniel Clark |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | llnl | ||
| Severity: | 3 |
| Rank (Obsolete): | 14835 |
| Description |
|
It looks as though the zfs userspace command is completely broken on at least one of our ZFS datasets backing a Lustre OST running a Lustre 2.4.2 based release. Here's what zfs list shows:

# pilsner1 /root > zfs list -t all
NAME                         USED   AVAIL  REFER  MOUNTPOINT
pilsner1                     17.3T  46.9T  30K    /pilsner1
pilsner1/lsd-ost0            17.3T  46.9T  15.9T  /pilsner1/lsd-ost0
pilsner1/lsd-ost0@17jun2014  1.39T  -      10.6T  -

And here's what zfs userspace shows:

# pilsner1 /root > zfs userspace pilsner1/lsd-ost0@17jun2014
TYPE        NAME       USED   QUOTA
POSIX User  <omitted>  21.5K  none
POSIX User  <omitted>  118M   none
POSIX User  <omitted>  6.49G  none
POSIX User  <omitted>  72.1M  none
POSIX User  <omitted>  725M   none

This is clearly broken, and according to Matt Ahrens:

<+mahrens> prakash: yeah, that pbpaste output looks wrong. All the "REFER" space should be accounted for by "zfs userspace"

Additionally, running that command on the dataset itself (as opposed to a snapshot) fails with an error:

# pilsner1 /root > zfs userspace pilsner1/lsd-ost0
cannot get used/quota for pilsner1/lsd-ost0: dataset is busy

This is not the case for filesystems mounted through the ZPL. |
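One quick way to quantify the mismatch Matt describes is to compare exact byte counts rather than the human-readable sizes; a minimal sketch, using the same dataset and snapshot names from the output above:

# Referenced space of the snapshot, in exact bytes
zfs list -Hpo refer pilsner1/lsd-ost0@17jun2014
# Sum of the per-user USED column for the same snapshot, in exact bytes;
# this should account for roughly all of the referenced space
zfs userspace -Hp -o used pilsner1/lsd-ost0@17jun2014 | awk '{sum += $1} END {print sum}'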
| Comments |
| Comment by John Fuchs-Chesney (Inactive) [ 09/Jul/14 ] |
|
Nathaniel – please review. |
| Comment by Nathaniel Clark [ 09/Jul/14 ] |
|
Referenced (REFER) space does not equal space used (USED) as reported by "zfs userspace". The latter reports the total space of all the blocks allocated, whereas REFER is (according to the zfs man page) "The amount of data that is accessible by this dataset, which may or may not be shared with other datasets in the pool". Even in a small scale test on a fresh plain ZFS filesystem these numbers don't agree:

$ zpool create test /dev/volgrp/logvol
$ zfs create test/foo
$ mkdir /test/foo/user
$ chown user:user /test/foo/user
$ su user
$ echo blah > /test/foo/user/bar
$ exit
$ zfs list -pt all
NAME      USED    AVAIL       REFER  MOUNTPOINT
test      188416  1031610368  31744  /test
test/foo  50688   1031610368  32256  /test/foo
$ zfs userspace -p test/foo
TYPE        NAME  USED  QUOTA
POSIX User  user  2560  none
POSIX User  root  1536  none

So if they are supposed to agree, that isn't a Lustre issue. |
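For reference, the individual space properties can be read directly in exact bytes; a minimal sketch against the same test/foo filesystem from the transcript above:

# Exact byte values of the relevant space properties for the filesystem
zfs get -Hp used,referenced,usedbydataset,usedbysnapshots,usedbychildren test/foo
# Per-user totals in exact bytes; the small remainder versus REFER is
# presumably dataset metadata that is not charged to any user
zfs userspace -Hp -o type,name,used test/foo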
| Comment by Prakash Surya (Inactive) [ 09/Jul/14 ] |
|
Well, first off, the Lustre zfs-osd and the ZFS on Linux ZPL are two completely separate interfaces to the underlying internals of ZFS. So even if the accounting were broken through the ZPL, that does not necessarily mean it's broken for the same reason as through the zfs-osd. So you can't simply write this off because you think the accounting is incorrect through the POSIX layer. And second, I don't think they're supposed to be exactly the same. I'd have to study the code to better understand, though, since I'm not very familiar with how the accounting fits together. Being off by a number of megabytes doesn't seem strange to me; being off by a number of terabytes does. Through the ZPL, I see this on a new pool and filesystem:

[root@surya1-guest-1 ~]# zpool create -f tank /dev/vd{b,c,d}
[root@surya1-guest-1 ~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 93K 23.4G 30K /tank
[root@surya1-guest-1 ~]# dd if=/dev/urandom of=/tank/file1 bs=1M count=100 1>/dev/null 2>&1
[root@surya1-guest-1 ~]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
tank 100M 23.3G 100M /tank
[root@surya1-guest-1 ~]# zfs userspace tank
TYPE NAME USED QUOTA
POSIX User root 100M none
[root@surya1-guest-1 ~]# zfs list -p
NAME USED AVAIL REFER MOUNTPOINT
tank 105055232 25063914496 104970752 /tank
[root@surya1-guest-1 ~]# zfs userspace tank -p
TYPE NAME USED QUOTA
POSIX User root 104942080 none
[root@surya1-guest-1 ~]# echo 104970752-104942080 | bc
28672

That's a difference of exactly 28 KB. This is in contrast to the example in the description, where the difference is over 10 TB. I believe zfs_space_delta_cb performs this space accounting for the ZPL, and I would not be surprised if we never implemented equivalent functionality for the zfs-osd. |
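Per-user values can also be read as properties, which gives a quick spot-check of whether the accounting is being maintained at all; a minimal sketch, assuming the tank pool from the transcript above:

# userused@<user> exposes the same per-user accounting as "zfs userspace",
# one user at a time, here in exact bytes
zfs get -Hp userused@root tank
# compare against the exact referenced bytes of the filesystem
zfs get -Hp referenced tank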
| Comment by Nathaniel Clark [ 10/Jul/14 ] |
|
Running a similar test on a fresh osd-zfs backed Lustre filesystem, I get accounting results equivalent to yours (this is with Lustre master build 2553 and zfs/spl 0.6.3). I will double check Lustre 2.4.3 and zfs/spl 0.6.1. |
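A rough sketch of that kind of test, with placeholder pool, device, fsname and MGS NID values rather than the actual build 2553 setup:

# Format and start a fresh ZFS-backed OST (all names and devices are placeholders)
mkfs.lustre --ost --backfstype=zfs --fsname=testfs --index=0 \
    --mgsnode=mgs@tcp testpool/testfs-ost0 /dev/sdb
mount -t lustre testpool/testfs-ost0 /mnt/testfs-ost0
# ...write large files from a client as a few different users, then on the OSS
# compare REFER against the summed per-user accounting, both in exact bytes,
# for a snapshot and for the live dataset (the report hit "dataset is busy" on the latter)
zfs snapshot testpool/testfs-ost0@check
zfs list -Hpo refer testpool/testfs-ost0@check
zfs userspace -Hp -o used testpool/testfs-ost0@check | awk '{sum += $1} END {print sum}'
zfs userspace testpool/testfs-ost0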
| Comment by Prakash Surya (Inactive) [ 10/Jul/14 ] |
|
Hmm... that's interesting, because I get "more expected" results when running the zfs userspace command against one of our osd-zfs filesystems backing our test filesystem.

Yeah, perhaps there were issues with the earlier ZFS releases. The production system I checked (which seemed "broken") was originally formatted with a 2.4.0 based Lustre release and ZFS 0.6.2, IIRC, while the test filesystem was formatted with a Lustre 2.4.2 based release and ZFS 0.6.3. |
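One way to confirm how and when a pool was originally formatted is its command history; a minimal sketch, assuming the pilsner1 pool from the description:

# zpool history records the commands (with timestamps) run against the pool since
# creation, so the first entries show how the pool was originally created
zpool history pilsner1 | head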
| Comment by Nathaniel Clark [ 23/Jul/14 ] |
|
I have not been able to reproduce this on Lustre 2.4.3 with zfs/spl 0.6.1, but it was a freshly formatted system, and I was just moving large files around to see if the numbers matched. |
| Comment by John Fuchs-Chesney (Inactive) [ 12/Feb/15 ] |
|
This is an old and not very critical bug and we have not been able to reproduce it. ~ jfc. |