[LU-9176] ZFS MDT sizing. 7TB of MDT shows LFS support for 215M files? Created: 01/Mar/17 Updated: 18/Mar/17 Resolved: 16/Mar/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jeff Johnson (Inactive) | Assignee: | Joseph Gmitter (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.3, EE 3.1.0.3, ZFS 0.6.5.7 |
||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
LFS configuration with two DNE2 MDTs, one MDT per MDS pair. LFS shows support for 215M files. MDT0000: 3.3TB zpool
A single 3.3TB MDT should result in ~800M files. With a 3.3TB and a 3.7TB MDT I should be seeing a lot more than 215M files. What am I missing? |
| Comments |
| Comment by Jeff Johnson (Inactive) [ 01/Mar/17 ] |
|
I seem to recall this being a ZFS "feature" (peculiarity). MDT0000: 3455338938368 bytes Seems a bit high. |
| Comment by Andreas Dilger [ 02/Mar/17 ] |
|
Jeff, the number of available inodes in the ZFS filesystem is an estimate based on the average space used per file so far, since there isn't a fixed inode table. If the MDTs are mostly empty (only ~250 inodes allocated) and those inodes are mostly directories (e.g. 128 OI.n ZAPs and 32 directories per O/{0,1,10,200000003} created at mount time), then the estimate will be skewed toward a very conservative inode count. If you create a few thousand regular files, the estimate should get closer to the expected 4KB/inode for ZFS, as described in https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#settinguplustresystem.tab2 . I'm updating the "Determining MDT Space Requirements" section of the manual to contain more information about ZFS. |
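To make the estimate described above concrete, here is a minimal sketch of an "average space per inode so far" calculation. It is not the osd-zfs code: the function names are hypothetical, the 3455338938368-byte capacity is the MDT0000 size quoted earlier in this ticket, and the 33KB-per-inode figure for a nearly empty MDT is only an illustrative assumption.

```python
# Hypothetical sketch of an average-based inode estimate; not the Lustre code.

def average_bytes_per_inode(used_bytes, used_inodes):
    """Average space consumed per inode allocated so far."""
    if used_inodes <= 0 or used_bytes <= 0:
        raise ValueError("need at least one allocated inode with nonzero usage")
    return used_bytes / used_inodes


def estimate_total_inodes(used_bytes, free_bytes, used_inodes):
    """Inodes already in use plus the number that would fit in the free
    space if future inodes cost the same as the current average."""
    avg = average_bytes_per_inode(used_bytes, used_inodes)
    return used_inodes + int(free_bytes // avg)


if __name__ == "__main__":
    capacity = 3455338938368  # MDT0000 size in bytes, from this ticket

    # Nearly empty MDT: ~250 inodes, assumed to average 33KB each.
    used = 250 * 33 * 1024
    print(estimate_total_inodes(used, capacity - used, 250))

    # After creating regular files that average 4KB per inode.
    used = 5000 * 4096
    print(estimate_total_inodes(used, capacity - used, 5000))
```

With a handful of large directory objects the average is high and the estimate lands near 100M inodes; once the average approaches 4KB, the same capacity yields roughly 843M.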
| Comment by Andreas Dilger [ 02/Mar/17 ] |
|
Presumably you haven't set a recordsize=N option on the MDT in this case? |
| Comment by Jeff Johnson (Inactive) [ 02/Mar/17 ] |
|
No, I left it at default after learning the hard way in the past about setting the recordsize to too low of a number. |
| Comment by Andreas Dilger [ 02/Mar/17 ] |
|
I updated the MDT sizing manual in https://review.whamcloud.com/25713 , and a formatted version of this section is available at https://build.hpdd.intel.com/job/lustre-manual-reviews/754/artifact/lustre_manual.xhtml#dbdoclet.space_requirements if you are interested. |
| Comment by Gerrit Updater [ 03/Mar/17 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/25743 |
| Comment by Andreas Dilger [ 03/Mar/17 ] |
|
Jeff, the attached patch fixes up the inode estimation by increasing the number of "synthetic average sized inodes" added to the estimate when the filesystem is nearly empty, and by fixing the calculation a bit. With it, MDT0000 would report about 568M inodes when newly formatted (synthetic average inode size = 6KB rather than 33KB previously), which is much closer to the expected 843M 4KB inodes for the 3.3TB filesystem. With 40k real inodes in the filesystem, the synthetic average would contribute only 10% to the average, and at 1M inodes it would contribute only 0.4% to the average size. |
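The arithmetic in the comment above can be sketched as a weighted average of real and synthetic inodes. This is an illustration only, not the code from https://review.whamcloud.com/25743 : the function names are hypothetical, and the synthetic inode count of 4096 is an assumption chosen because it roughly reproduces the ~10% and ~0.4% figures quoted; the patch itself may use a different count or formula.

```python
# Hypothetical sketch of blending synthetic inodes into the average size.

MDT_BYTES = 3455338938368  # MDT0000 size in bytes, from this ticket
SYNTH_INODES = 4096        # ASSUMPTION: not taken from the patch


def synthetic_share(real_inodes, synth_inodes=SYNTH_INODES):
    """Fraction of the blended average contributed by synthetic inodes."""
    return synth_inodes / (real_inodes + synth_inodes)


def blended_inode_size(real_inodes, real_avg_size, synth_size,
                       synth_inodes=SYNTH_INODES):
    """Average inode size over real inodes plus synthetic ones."""
    total_bytes = real_inodes * real_avg_size + synth_inodes * synth_size
    return total_bytes / (real_inodes + synth_inodes)


def reported_inodes(capacity_bytes, avg_inode_size):
    """Inode count implied by a capacity and an average inode size."""
    return int(capacity_bytes // avg_inode_size)


if __name__ == "__main__":
    for n in (0, 40_000, 1_000_000):
        print(n, f"{synthetic_share(n):.1%}")
    # Newly formatted MDT: only the synthetic 6KB average applies.
    size = blended_inode_size(0, 4096, 6 * 1024)
    print(reported_inodes(MDT_BYTES, size))
    # Expected count at 4KB per inode.
    print(reported_inodes(MDT_BYTES, 4096))
```

Under these assumptions an exact 6144-byte average gives about 562M inodes, in the same range as the 568M quoted (the "6KB" there is rounded), and the 4KB case gives the 843M figure.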
| Comment by Gerrit Updater [ 16/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25743/ |
| Comment by Peter Jones [ 16/Mar/17 ] |
|
Landed for 2.10 |