[LU-9176] ZFS MDT sizing. 7TB of MDT shows LFS support for 215M files? Created: 01/Mar/17  Updated: 18/Mar/17  Resolved: 16/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: Jeff Johnson (Inactive) Assignee: Joseph Gmitter (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.3, EE 3.1.0.3, ZFS 0.6.5.7


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

LFS configuration with two DNE2 MDTs, one MDT per MDS pair. The filesystem reports support for only 215M files.

MDT0000: 3.3TB zpool
MDT0001: 3.7TB zpool

    # lfs df
    UUID                 1K-blocks   Used  Available Use% Mounted on
    data-0-MDT0000_UUID 3374354432   9472 3374342912   0% /mnt/data-0[MDT:0]
    data-0-MDT0001_UUID 3749283200   9344 3749271808   0% /mnt/data-0[MDT:1]

    # lfs df -i
    UUID                   Inodes IUsed     IFree IUse% Mounted on
    data-0-MDT0000_UUID 102359233   259 102358974    0% /mnt/data-0[MDT:0]
    data-0-MDT0001_UUID 113503655   248 113503407    0% /mnt/data-0[MDT:1]

A single 3.3TB MDT should result in ~800M files. With a 3.3TB MDT and a 3.7TB MDT, I should be seeing a lot more than 215M files.
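For reference, a back-of-the-envelope check of those expectations, using the figures from the lfs df output above and the manual's ~4KB-per-inode rule of thumb for ZFS MDTs:

    echo $(( 3374354432 / 4 ))         # MDT0000 1K-blocks / 4KB per inode = 843588608 (~843M files)
    echo $(( 102359233 + 113503655 ))  # IFree total reported across both MDTs = 215862888 (~215M)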

What am I missing?



 Comments   
Comment by Jeff Johnson (Inactive) [ 01/Mar/17 ]

I seem to recall this being a ZFS "feature" (peculiarity).

MDT0000: 3455338938368 bytes
MDT0001: 3839265996800 bytes
Total: 7294604935168 bytes / 215862888 inodes = approx 33793 bytes per inode

Seems a bit high.

Comment by Andreas Dilger [ 02/Mar/17 ]

Jeff, the number of available inodes in a ZFS filesystem is an estimate based on the average space used per file so far, since there is no fixed inode table. If the MDTs are nearly empty (only ~250 inodes allocated) and those inodes are mostly directories (e.g. the 128 OI.n ZAPs and the 32 directories per O/{0,1,10,200000003} created at mount time), then the numbers will be skewed toward a very conservative inode count estimate.
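To illustrate the shape of such an estimate (a rough sketch of the behavior described, not the actual osd-zfs code), MDT0000's numbers from the lfs df output above can be recomputed by hand:

    used_bytes=$(( 9472 * 1024 ))        # MDT0000 "Used" from lfs df, in bytes
    used_inodes=259                      # MDT0000 "IUsed" from lfs df -i
    avail_bytes=$(( 3374342912 * 1024 )) # MDT0000 "Available", in bytes
    avg=$(( used_bytes / used_inodes ))  # ~37KB/object: everything allocated so far is a large ZAP
    echo $(( used_inodes + avail_bytes / avg ))  # ~92M, the same ballpark as the ~102M reported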

If you create a few thousand regular files this should get closer to the expected 4KB/inode for ZFS, as described in https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#settinguplustresystem.tab2 . I'm updating the "Determining MDT Space Requirements" section of the manual to contain more information about ZFS.

Comment by Andreas Dilger [ 02/Mar/17 ]

Presumably you haven't set a recordsize=N option on the MDT in this case?

Comment by Jeff Johnson (Inactive) [ 02/Mar/17 ]

No, I left it at the default after learning the hard way in the past what happens when the recordsize is set too low.
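For anyone checking the same thing later, the dataset's current setting can be read with zfs get (the pool/dataset name below is a placeholder):

    zfs get recordsize data0/mdt0   # "data0/mdt0" is a placeholder name; 128K is the ZFS default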

Comment by Andreas Dilger [ 02/Mar/17 ]

I updated the MDT sizing manual in https://review.whamcloud.com/25713 , and a formatted version of this section is available at https://build.hpdd.intel.com/job/lustre-manual-reviews/754/artifact/lustre_manual.xhtml#dbdoclet.space_requirements if you are interested.

Comment by Gerrit Updater [ 03/Mar/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/25743
Subject: LU-9176 osd-zfs: improve statfs estimate for ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4df3840ae497820904caee074d9e30e649da8e61

Comment by Andreas Dilger [ 03/Mar/17 ]

Jeff, with the attached patch to fix up the inode estimation (increasing the number of "synthetic average sized inodes" added to the estimate when the filesystem is nearly empty, and fixing the calculation a bit), a newly formatted MDT0000 would report about 568M inodes (a synthetic average inode size of 6KB rather than the previous 33KB), which is much closer to the expected 843M 4KB inodes for the 3.3TB filesystem. With 40k real inodes in the filesystem the synthetic average would contribute only 10% to the average, and at 1M inodes only 0.4% of the average size.
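As a rough illustration of the blended estimate described above (the constants below are chosen to reproduce the quoted numbers and are not necessarily those used in the patch itself):

    synth_count=4096                  # illustrative count of synthetic inodes, not the patch's constant
    synth_bytes=4096                  # illustrative assumed size per synthetic inode
    real_count=259                    # real inodes on the freshly formatted MDT0000
    real_bytes=$(( 9472 * 1024 ))     # bytes consumed by those real inodes
    avg=$(( (synth_count * synth_bytes + real_bytes) / (synth_count + real_count) ))  # ~6KB blended average
    echo $(( 3455338938368 / avg ))   # ~568M inodes for the 3.3TB MDT0000

The same illustrative constants also reproduce the fade-out quoted above: the synthetic share of the average is 4096/(4096+40000) ≈ 10% at 40k real inodes and 4096/(4096+1000000) ≈ 0.4% at 1M.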

Comment by Gerrit Updater [ 16/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25743/
Subject: LU-9176 osd-zfs: improve statfs estimate for ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5e2b17964fd880a9763ba66151aab6091fb4813c

Comment by Peter Jones [ 16/Mar/17 ]

Landed for 2.10
