[LU-5309] The `zfs userspace` command is broken on Lustre-ZFS-OST datasets Created: 09/Jul/14  Updated: 12/Feb/15  Resolved: 12/Feb/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Prakash Surya (Inactive) Assignee: Nathaniel Clark
Resolution: Cannot Reproduce Votes: 0
Labels: llnl

Severity: 3
Rank (Obsolete): 14835

Description

It looks as though the zfs userspace command is completely broken on at least one of our ZFS datasets backing a Lustre OST running a Lustre 2.4.2-based release.

Here's what zfs list shows:

# pilsner1 /root > zfs list -t all
NAME                          USED  AVAIL  REFER  MOUNTPOINT
pilsner1                     17.3T  46.9T    30K  /pilsner1
pilsner1/lsd-ost0            17.3T  46.9T  15.9T  /pilsner1/lsd-ost0
pilsner1/lsd-ost0@17jun2014  1.39T      -  10.6T  -

And here's what zfs userspace shows:

# pilsner1 /root > zfs userspace pilsner1/lsd-ost0@17jun2014
TYPE        NAME         USED  QUOTA                                            
POSIX User  <omitted>   21.5K   none                                               
POSIX User  <omitted>    118M   none                                               
POSIX User  <omitted>   6.49G   none                                               
POSIX User  <omitted>   72.1M   none                                               
POSIX User  <omitted>    725M   none

This is clearly broken, and according to Matt Ahrens:

<+mahrens> prakash: yeah, that pbpaste output looks wrong.  All the "REFER" space should be accounted for by "zfs userspace"

Additionally, running that command on the dataset itself (as opposed to a snapshot) fails with an error:

# pilsner1 /root > zfs userspace pilsner1/lsd-ost0          
cannot get used/quota for pilsner1/lsd-ost0: dataset is busy

This is not the case for filesystems mounted through the ZPL.
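
For reference, a quick sketch of how the mismatch can be quantified: sum the parsable per-user USED figures and compare the total against the snapshot's referenced bytes. The dataset name is the one above; the pipeline itself is just illustrative.

# Sum per-user usage in exact bytes (-H drops headers, -p prints raw numbers):
zfs userspace -Hp -o used pilsner1/lsd-ost0@17jun2014 | awk '{ sum += $1 } END { print sum }'

# Referenced bytes for the same snapshot, for comparison:
zfs list -Hpo referenced pilsner1/lsd-ost0@17jun2014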



Comments
Comment by John Fuchs-Chesney (Inactive) [ 09/Jul/14 ]

Nathaniel – please review.
Thank you,
~ jfc.

Comment by Nathaniel Clark [ 09/Jul/14 ]

Referenced (REFER) space does not equal the space used (USED) as reported by "zfs userspace". The latter reports the total space of all allocated blocks, whereas REFER is, according to the zfs man page, "the amount of data that is accessible by this dataset, which may or may not be shared with other datasets in the pool". Even in a small-scale test on a fresh, plain ZFS filesystem these numbers don't agree:

$ zpool create test /dev/volgrp/logvol
$ zfs create test/foo
$ mkdir /test/foo/user
$ chown user:user /test/foo/user
$ su user
$ echo blah > /test/foo/user/bar
$ exit
$ zfs list -pt all
NAME                       USED       AVAIL    REFER  MOUNTPOINT
test                     188416  1031610368    31744  /test
test/foo                  50688  1031610368    32256  /test/foo
$ zfs userspace -p test/foo
TYPE        NAME  USED  QUOTA
POSIX User  user  2560   none
POSIX User  root  1536   none

So if these numbers are supposed to agree exactly, the discrepancy isn't a Lustre issue.
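
The same accounting is also exposed as per-user dataset properties, which makes the gap easy to inspect directly. A sketch against the test filesystem above (property names are from zfs(8); -p requests exact byte values):

# Dataset-level accounting in exact bytes:
zfs get -p used,referenced test/foo

# Per-user accounting as properties; the shortfall relative to REFER is
# filesystem metadata that is not charged to any individual user:
zfs get -p userused@user,userused@root test/foo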

Comment by Prakash Surya (Inactive) [ 09/Jul/14 ]

Well, first off, the Lustre zfs-osd and the ZFS on Linux ZPL are two completely separate interfaces to the underlying internals of ZFS. So even if the accounting were broken through the ZPL, that would not necessarily mean it's broken for the same reason through the zfs-osd. You can't simply write this off because you think the accounting is incorrect through the POSIX layer.

And second, I don't think they're supposed to be exactly the same. I'd have to study the code to better understand, though, since I'm not very familiar with how the accounting fits together.

Being off by a number of megabytes doesn't seem strange to me; being off by a number of terabytes does. Through the ZPL, I see this on a new pool and filesystem:

[root@surya1-guest-1 ~]# zpool create -f tank /dev/vd{b,c,d}
[root@surya1-guest-1 ~]# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    93K  23.4G    30K  /tank
[root@surya1-guest-1 ~]# dd if=/dev/urandom of=/tank/file1 bs=1M count=100 1>/dev/null 2>&1
[root@surya1-guest-1 ~]# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   100M  23.3G   100M  /tank
[root@surya1-guest-1 ~]# zfs userspace tank
TYPE        NAME  USED  QUOTA
POSIX User  root  100M   none
[root@surya1-guest-1 ~]# zfs list -p
NAME       USED        AVAIL      REFER  MOUNTPOINT
tank  105055232  25063914496  104970752  /tank
[root@surya1-guest-1 ~]# zfs userspace tank -p
TYPE        NAME       USED  QUOTA
POSIX User  root  104942080   none
[root@surya1-guest-1 ~]# echo 104970752-104942080 | bc
28672

That's a difference of exactly 28 KiB. This is in contrast to the example in the description, where the difference is over 10 TB.

I believe zfs_space_delta_cb performs this space accounting in the ZPL, and I would not be surprised if equivalent functionality was never implemented for the zfs-osd.
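
If it helps localize the failure, the per-user numbers can also be read one at a time through the property interface, which takes a slightly different path into the kernel than the bulk zfs userspace command; whether this also returns "dataset is busy" on the Lustre-held dataset would be informative. A hypothetical check (UID 0 chosen purely as an example):

# Read a single user's accounting as a property on the busy dataset:
zfs get -p userused@0 pilsner1/lsd-ost0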

Comment by Nathaniel Clark [ 10/Jul/14 ]

Running a similar test on a fresh osd-zfs-backed Lustre filesystem, I also get accounting equivalent to yours (this is with Lustre master build 2553 and zfs/spl 0.6.3).

I will double check Lustre 2.4.3 and zfs/spl 0.6.1.

Comment by Prakash Surya (Inactive) [ 10/Jul/14 ]

Running a similar test on a fresh osd-zfs-backed Lustre filesystem, I also get accounting equivalent to yours (this is with Lustre master build 2553 and zfs/spl 0.6.3).

Hmm... that's interesting, because I get "more expected" results when running the zfs userspace command against one of the osd-zfs datasets backing our test filesystem.

I will double check Lustre 2.4.3 and zfs/spl 0.6.1.

Yeah, perhaps there were issues with the earlier ZFS releases. The production system I checked (which seemed "broken") was originally formatted with a 2.4.0-based Lustre release and ZFS 0.6.2, IIRC, while the test filesystem was formatted with a 2.4.2-based Lustre release and ZFS 0.6.3.
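
For the record, a quick way to check what on-disk version a pool and dataset are currently at (pool and dataset names assumed from the description; note this reflects any upgrades applied since formatting, not necessarily the original format version):

zpool get version pilsner1
zfs get version pilsner1/lsd-ost0

# With no arguments, lists any pools not running the latest on-disk format:
zpool upgrade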

Comment by Nathaniel Clark [ 23/Jul/14 ]

I have not been able to reproduce this on Lustre 2.4.3 with zfs/spl 0.6.1, but that was a freshly formatted system, and I was just moving large files around to see whether the numbers matched.

Comment by John Fuchs-Chesney (Inactive) [ 12/Feb/15 ]

This is an old and not very critical bug, and we have not been able to reproduce it.

~ jfc.
