[LU-1365] Implement ldiskfs LARGEDIR support for e2fsprogs Created: 03/May/12  Updated: 04/Jun/21  Resolved: 11/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.12.4

Type: New Feature Priority: Minor
Reporter: Andreas Dilger Assignee: Artem Blagodarenko (Inactive)
Resolution: Fixed Votes: 0
Labels: LTS12, e2fsprogs, ldiskfs, patch

Attachments: File conf-san-125-4.txt.tar.gzip     File conf-sanity-124-125.tar.bz2     PNG File image-2.png     PNG File image.png    
Issue Links:
Duplicate
is duplicated by LU-896 change e2fsprogs to make it allow dir... Resolved
is duplicated by LU-7932 Patch e2fsprogs to allow set special ... Resolved
Related
is related to LU-50 pdirops patch for ldiskfs Resolved
is related to LU-11915 conf-sanity test 115 is skipped or hangs Open
is related to LU-14345 e2fsck of very large directories is b... Resolved
is related to LU-14734 enable large_dir on existing MDTs Resolved
is related to LU-11546 enable large_dir support for MDTs Resolved
is related to LU-8974 Сhange force_over_256tb lustre mount ... Resolved
Story Points: 3
Rank (Obsolete): 10210

 Description   

This INCOMPAT_LARGEDIR feature allows larger directories to be created in ldiskfs, both with directory sizes over 2GB and a maximum htree depth of 3 instead of the current limit of 2. These features are needed in order to exceed the current limit of approximately 10M entries in a single directory. The INCOMPAT_LARGEDIR feature was added to ldiskfs as part of the pdirops LU-50 coding, but was not part of that project. As there is currently no mke2fs, e2fsck, or tune2fs support for INCOMPAT_LARGEDIR, this feature is disabled by default when creating a new ldiskfs filesystem, as it would otherwise make the filesystem unrecoverable in the case that e2fsck needs to be run on it.

Tasks that need to be completed before INCOMPAT_LARGEDIR can be used:

  • add support for the INCOMPAT_LARGEDIR and "large_dir" features to mke2fs/tune2fs
  • add conf-sanity.sh test LARGEDIR and 3-level htree for local ldiskfs with 1kB blocksize up to 100k entries with 255-byte names (3-level exceeded at 48k entries). This might be done using a smaller number of hard-linked inodes (nlink_max = 65000), to avoid overhead of accessing and caching a large number of different inodes.
  • add parallel-scale.sh test LARGEDIR and >2GB directories with Lustre using 255-byte names and 10M entries (2GB exceeded at 4M entries, 4GB exceeded at 8M entries). This might be done using a smaller number of hard-linked inodes (nlink_max = 65000), to avoid overhead of accessing and caching a large number of different inodes.
  • e2fsck support for 3-level htree
  • e2fsck support for directories larger than 2GB (using i_size_hi consistently for S_IFDIR() inodes)
  • e2fsprogs regression test for 3-level/2GB+ htree e2fsck, corruptions
  • e2fsprogs add LARGEDIR feature to "tests/f_random_corruption"
  • port the ext4-large-dir.patch with the INCOMPAT_LARGEDIR features (>2GB, 3-level htree) to the upstream kernel and submit to linux-ext4 list for review
  • submit e2fsprogs patches to linux-ext4 list for review
  • solicit testing of feature from community
  • some time after e2fsprogs is released and available for download, a patch is needed to enable large_dir on new filesystems with mkfs.lustre
  • updates to the user manual and release notes to describe how to enable the large_dir feature with tune2fs
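
As a rough cross-check of the entry counts quoted in the test tasks above, the sketch below estimates how many 255-byte names it takes to push a directory past 2GB and 4GB. It assumes the standard ext2/ext4 directory entry layout (an 8-byte header plus the name, rounded up to a multiple of 4 bytes) and, as a guess, leaf blocks that are on average half full; it ignores htree index blocks. The function names and the fill factor are illustrative and are not taken from ldiskfs or e2fsprogs.

```python
GIB = 1 << 30


def dirent_len(name_len):
    """Bytes used by one directory entry: 8-byte header + name, 4-byte aligned."""
    return (8 + name_len + 3) & ~3


def entries_to_exceed(dir_bytes, name_len=255, fill=0.5):
    """Approximate entry count at which a directory reaches dir_bytes.

    fill is an ASSUMED average leaf-block utilisation (0.5 = half full);
    htree index blocks and per-block slack are ignored.
    """
    return int(dir_bytes * fill / dirent_len(name_len))


print("entry size for a 255-byte name:", dirent_len(255))
print("entries to reach 2GB:", entries_to_exceed(2 * GIB))
print("entries to reach 4GB:", entries_to_exceed(4 * GIB))
```

Under these assumptions the estimates come out slightly above 4M and 8M entries, in line with the figures given in the task list.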


 Comments   
Comment by Gerrit Updater [ 18/Aug/16 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: http://review.whamcloud.com/22008
Subject: LU-1365 e2fsprogs: enable large directroy support in tools.
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: bae7b74bf24ce86ec28af322a3d9c75a21c12a7a

Comment by Gerrit Updater [ 18/Aug/16 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: http://review.whamcloud.com/22009
Subject: LU-1365 tests: LARGEDIR and 3-level htree for local ldiskfs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7c4b67f0a09903efa9656d161475fe30c70a80c2

Comment by Artem Blagodarenko (Inactive) [ 18/Aug/16 ]

>add support for the INCOMPAT_LARGEDIR and "large_dir" features to mke2fs/tune2fs

Patch http://review.whamcloud.com/22008 is enough to enable large_dir. I patched the tools with it before starting the tests.

>add conf-sanity.sh test LARGEDIR and 3-level htree for local ldiskfs with 1kB blocksize up to 100k entries with 255-byte names (3-level exceeded at 48k entries). This might be done using a smaller number of hard-linked inodes (nlink_max = 65000), to avoid overhead of accessing and caching a large number of different inodes.

Test 100 in http://review.whamcloud.com/22009

The test fails if large_dir is not enabled:

ln: creating hard link `/mnt/lustre/d100.conf-sanity/Kv74Cdj3FnDyJrJ9gbgqM0GIrlWtGKKTxTO4N5usmjXjRkDk3DKDhdPqjTq5Fw8JgKh5rADZMb5omdc2ySMqUURJUfIcjE5O2FSTqs2WNtOQKCNqK8vnLM5wawDd26Txd27GwLEagRRA6KipNNUj4NLb711dvwt46hBGuvJfeiN6iir9NMjqiJfcLfXQPOYwheMKBVAtjwauj5sdi3zSdSSzTeyCUIka7p3MHYAiPduo90fQWVA2GPtbvMVJzp0': No space left on device

and passes with the "large_dir" option.

>add parallel-scale.sh test LARGEDIR and >2GB directories with Lustre using 255-byte names and 10M entries (2GB exceeded at 4M entries, 4GB exceeded at 8M entries). This might be done using a smaller number of hard-linked inodes (nlink_max = 65000), to avoid overhead of accessing and caching a large number of different inodes.

Andreas, I can't see why we need to add such a test to parallel-scale.sh. Running mdtest can help to estimate performance. I created a functional test that shows it is possible to create ">2GB directories with Lustre using 255-byte names and 10M entries", but I believe parallel-scale.sh is not the best place for it, so I placed it in conf-sanity.sh (test_101).

conf-sanity.sh test_101 creates 12M hard links. Here are the creation rates on my local test system:

A test system on a virtual machine is not ideal, so we also tested the creation of 120M hard links on a cluster. Only ldiskfs was used there (this excludes all code other than ldiskfs and can expose its potential problems). Here is a graph of the creation rates:

Comment by Gerrit Updater [ 08/Sep/16 ]

Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: http://review.whamcloud.com/22384
Subject: LU-1365 e2fsprogs: enable large directroy support in e2fsck.
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: d1e6a1f72923d59c42bf9d74c850b9f32c01f462

Comment by Gerrit Updater [ 17/Nov/16 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/23831
Subject: LU-1365 resize2fs: clear uninit if allocating from new group
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: a12ae87de615f78bd663378d8978a93a8d35445a

Comment by Gerrit Updater [ 02/Apr/17 ]

Anonymous Coward (jjkky@yahoo.com) uploaded a new patch: https://review.whamcloud.com/26311
Subject: LU-1365 resize2fs: clear uninit if allocating from new group
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 2bb22e1bc0af6a2dbb213f3af79325d306cc498a

Comment by Gerrit Updater [ 02/Apr/17 ]

Anonymous Coward (jjkky@yahoo.com) uploaded a new patch: https://review.whamcloud.com/26312
Subject: LU-1365 e2fsprogs: enable large directroy support in tools
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: c4c581ccd003af1e40032e6352f45c6b546d1334

Comment by Gerrit Updater [ 02/Apr/17 ]

Anonymous Coward (jjkky@yahoo.com) uploaded a new patch: https://review.whamcloud.com/26313
Subject: LU-1365 e2fsprogs: enable large directroy support in e2fsck
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 53effd14b09e128f2f4450e6bee206d1c7341b0d

Comment by Artem Blagodarenko (Inactive) [ 03/Apr/17 ]

Andreas, do I need to resend the ext4 (e2fsprogs) patches to the mailing list? I see that new patches have been uploaded there, but I don't understand why. Thanks.

Comment by Gerrit Updater [ 05/May/17 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch https://review.whamcloud.com/23831/
Subject: LU-1365 resize2fs: clear uninit if allocating from new group
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: 6ba8ad101d4a8331b8131abb56ea821b77b7d2b0

Comment by Andreas Dilger [ 16/Sep/17 ]

This landed in upstream e2fsprogs-1.44 and kernel 4.14.

Comment by Gerrit Updater [ 18/Jan/18 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/30912
Subject: LU-1365 e2fsprogs: add support for 3-level htree
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 1140574de77dad5aeca7b3688aa866d6921c97a3

Comment by Gerrit Updater [ 18/Jan/18 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/30913
Subject: LU-1365 tests: 3 level hash tree test
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: ad3eabbbbb5305cc1006db9ec11a66cba795a228

Comment by Artem Blagodarenko (Inactive) [ 19/Nov/18 ]

adilger, I have just attached logs for conf_sanity_124 and conf_sanity_125 from a test session in my local environment. Both tests passed. I also installed the packages that Maloo built, and the tests started successfully (I didn't wait for them to finish, but they did not fail at startup as they do in Maloo).

I have no idea how to fix the Maloo test session. Do you have any suggestions?

Comment by Andreas Dilger [ 27/Nov/18 ]

Reopening this issue while the patch is still unlanded.

There also appears to be an issue with e2fsck and directories over 2GB:

Pass 1: Checking inodes, blocks, and sizes
Inode 252 is too big.  Truncate? no

Block #524289 (552395) causes directory to be too big.  IGNORED.
Block #524290 (552396) causes directory to be too big.  IGNORED.
Block #524291 (552397) causes directory to be too big.  IGNORED.
Block #524292 (552398) causes directory to be too big.  IGNORED.
Block #524293 (552399) causes directory to be too big.  IGNORED.

Comment by Dongyang Li [ 28/Nov/18 ]

Artem, it looks like the directory size limit in process_block() in pass1 is still 2GB.

With large_dir we can end up with a directory larger than 2GB, like the one created in conf_sanity test_125.

I also noticed that stat and ls in debugfs show the size of that same directory as a negative value. The reason is that we are just using inode->i_size rather than EXT2_I_SIZE(inode).
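
To illustrate the size problem described above, here is a minimal sketch in Python (not e2fsprogs code) of why reading only the low 32-bit size field goes wrong for a directory over 2GB, and how combining it with the high 32 bits, as EXT2_I_SIZE(inode) is meant to do, recovers the full size. The helper names and the example sizes are made up for illustration, and it is an assumption that the negative value comes from the low field being treated as a signed 32-bit number.

```python
GIB = 1 << 30


def full_size(size_lo, size_high):
    """Combine the low and high 32-bit size fields into one 64-bit size."""
    return ((size_high & 0xFFFFFFFF) << 32) | (size_lo & 0xFFFFFFFF)


def as_signed_32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Hypothetical directory just over 2GB (five 4kB blocks past the limit).
dir_size = 2 * GIB + 5 * 4096
size_lo = dir_size & 0xFFFFFFFF   # what the low size field would hold
size_high = dir_size >> 32        # what the high size field would hold

print("low field read as signed 32-bit:", as_signed_32(size_lo))
print("low and high fields combined:   ", full_size(size_lo, size_high))

# Hypothetical directory just over 4GB: the low field alone is tiny.
print("4GB + 4kB directory, combined:  ", full_size(4096, 1))
```

For the first example the signed reading is negative while the combined value equals the real size; for the second, the low field on its own would report only 4096 bytes.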

Can you please fix them in e2fsprogs upstream? Also please push a patch to gerrit for the master-lustre branch so we can land it from our side.

Thanks

DY

Comment by Gerrit Updater [ 29/Nov/18 ]

Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/33756
Subject: LU-1365 tests: createmany outputs stat after 2 seconds
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b66dd7a12ca6ef3290e09d2baabe2361c758f2e3

Comment by Gerrit Updater [ 29/Nov/18 ]

Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/33757
Subject: LU-1365 utils: allow set block size for ldiskfs backend
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1fc5bde1a96d249febd15e5c96e33d1b4c9b64ce

Comment by Gerrit Updater [ 10/Dec/18 ]

Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/33813
Subject: LU-1365 e2fsck: allow to check >2GB sized directory
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: 7e77372d97aad810b651e680220d8168cd5bf54c

Comment by Gerrit Updater [ 10/Dec/18 ]

Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/33814
Subject: LU-1365 debugfs: output large directory size
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: d598fae184f1abdfd47b8396f21db217238ce220

Comment by Gerrit Updater [ 12/Dec/18 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/33814/
Subject: LU-1365 debugfs: output large directory size
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: b3267b9f4f9037b84fa83e1296e7dad508a592ba

Comment by Gerrit Updater [ 13/Dec/18 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/33813/
Subject: LU-1365 e2fsck: allow to check >2GB sized directory
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: c5de396607efa2032bc32f7aa607b38d9e82dc6b

Comment by Artem Blagodarenko (Inactive) [ 21/Dec/18 ]

Test conf_sanity 125 passed successfully in my local environment. It took 1.5 hours. Full logs are attached to this issue.
The name creation rate did not drop dramatically. It went from

 total: 60000 link in 119.82 seconds: 500.76 ops/second

in the first iteration to

total: 60000 link in 123.89 seconds: 484.31 ops/second

I believe the previous performance drop could have been because we passed the full directory name to the createmany utility, and lookup requires more time for a large directory. This time createmany acts on the current directory.
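
For reference, the relative slowdown between the two rates quoted above can be worked out as follows. This is only arithmetic on the reported ops/second figures; the variable names are illustrative.

```python
# Creation rates reported by createmany, in ops/second.
first_rate = 500.76   # first iteration
later_rate = 484.31   # later iteration quoted in the comment

# Relative drop, as a percentage of the first-iteration rate.
drop_percent = (first_rate - later_rate) / first_rate * 100
print(f"creation rate dropped by {drop_percent:.1f}%")
```

That is a drop of about 3.3%.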

Comment by Gerrit Updater [ 30/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33757/
Subject: LU-1365 utils: allow set block size for ldiskfs backend
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5f674667bfd1ab9a0e47d9f03f3e7eab37eb8e17

Comment by Gerrit Updater [ 30/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/22009/
Subject: LU-1365 tests: LARGEDIR and 3-level htree for local ldiskfs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8048e216c16fa403da6fa2a755df8f718ab3105d

Comment by Gerrit Updater [ 11/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33756/
Subject: LU-1365 tests: createmany outputs stat after 2 seconds
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 46125b6d9627d006087dfcc727e2cff0954e78c8

Comment by Peter Jones [ 11/Feb/19 ]

Landed for 2.13

Comment by Colin Faber [X] (Inactive) [ 29/May/19 ]

should this be closed?

Comment by Peter Jones [ 29/May/19 ]

We usually leave tickets as RESOLVED rather than CLOSED because then the ticket can be updated when needed (if landed to maintenance branches, say) without the extra email generated by having to go through the states REOPEN then RESOLVED then CLOSED again.

Comment by Colin Faber [X] (Inactive) [ 29/May/19 ]

Got it. How do you keep track of where it is, and of when this ticket (or tickets like it) is ready to close?

Comment by Peter Jones [ 29/May/19 ]

We just consider RESOLVED to mean that the primary task is complete.

Comment by Gerrit Updater [ 18/Nov/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36778
Subject: LU-1365 utils: allow set block size for ldiskfs backend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: d412d6c6e41446303a2244c46f9bfd3330926b45

Comment by Gerrit Updater [ 18/Nov/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36779
Subject: LU-1365 tests: LARGEDIR and 3-level htree for local ldiskfs
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 76bec4f7fdd0ec1f85863f13b5cff248d5676fc3

Comment by Gerrit Updater [ 05/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36778/
Subject: LU-1365 utils: allow set block size for ldiskfs backend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: c89aa6edccab2d20d44e52af1e5a16e8d9d39fe9
