[LU-7592] Change force_over_128tb Lustre mount option to force_over_256tb for ldiskfs Created: 22/Dec/15 Updated: 08/Dec/17 Resolved: 18/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Artem Blagodarenko (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Currently, attempts to create an ldiskfs file system larger than 128TB fail with the message: "LDISKFS-fs does not support file systems greater than 128TB and can cause data corruption. Use "force_over_128tb" mount option to override." Before the "force_over_128tb" option is used on production systems, the Lustre software should be analyzed to identify possible issues with large-disk support. This issue covers research into those aspects of the Lustre software. Finally, a patch that changes "force_over_128tb" to "force_over_256tb" should be landed. This makes it possible to use ldiskfs partitions smaller than 256TB without any special mount option. |
| Comments |
| Comment by Gerrit Updater [ 22/Dec/15 ] |
|
Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: http://review.whamcloud.com/17702 |
| Comment by Artem Blagodarenko (Inactive) [ 24/Dec/15 ] |
|
Verification steps. Issues verified and tested:
1. Format options: -J size=400 -I 256 -i 1048576 -q -O extents,uninit_bg,dir_nlink,huge_file,64bit,flex_bg -G 256 -E lazy_journal_init,lazy_itable_init=0 -F
2. Inode count limitation. The number of inodes required on the MDS can be calculated: it has to be greater than the OST inode count multiplied by the number of OSTs. This calculation is for the worst case, with 1 stripe per file. With the current option -i 1048576, a 256TB OST has 256M inodes. The maximum inode count is 2^32-1 = 4294967295, so this limit is exceeded with 2^32/256M = 16 OSTs.
3. Directory format. 32 directories with 64kb files.

    static void ext4_inc_count(handle_t *handle, struct inode *inode)
    {
        inc_nlink(inode);
        if (is_dx(inode) && inode->i_nlink > 1) {
            /* limit is 16-bit i_links_count */
            if (inode->i_nlink >= EXT4_LINK_MAX || inode->i_nlink == 2) {
                inode->i_nlink = 1;
                EXT4_SET_RO_COMPAT_FEATURE(inode->i_sb,
                                           EXT4_FEATURE_RO_COMPAT_DIR_NLINK);
            }
        }
    }

    /*
     * If a directory had nlink == 1, then we should let it be 1. This indicates
     * directory has >LDISKFS_LINK_MAX subdirs.
     */
    static void ldiskfs_dec_count(handle_t *handle, struct inode *inode)
    {
        if (!S_ISDIR(inode->i_mode) || inode->i_nlink > 2)
            drop_nlink(inode);
    }

There are some doubts about how this code works when i_nlink becomes less than EXT4_LINK_MAX. There is a sanity test, run_test 51b "exceed 64k subdirectory nlink limit", but it has some issues.
4. Performance near the first and last blocks of the disk.
5. ldiskfs data structure limitations. However, ext4_bmap() returns a sector_t value:

    static sector_t ext4_bmap(struct address_space *mapping, sector_t block)

Depending on macros, sector_t can be 32 or 64 bits long, so we need to use sector_t for this array of blocks.
6. Obdfilter. Block addressing, etc.
7. Probable extended attribute inode overflow.
8. Quota limits: sizes and inodes.
9. llog: llog ID limitations.
10. Tools. fsck: 64-bit block numbers.
    1) Bad blocks are accessed in the wrong way. There is a patch that changes bad block numbers to 64-bit, http://patchwork.ozlabs.org/patch/279297/, which we could port, or we could write one from scratch. (LU-XXXX)
11. e2fsprogs update.
12. fsck time.
13. lfsck.
Patches for the marked points will be uploaded in the near future. |
| Comment by Andreas Dilger [ 17/Apr/16 ] |
|
Thank you for this detailed analysis. For some reason I don't recall reading it, maybe because it was posted on Christmas and I was on holidays for a couple of weeks and missed it on my return. In any case it looks very thorough. Some issues I think are important in this area to discuss in advance if you plan to keep enhancing ext4 for even larger OSTs:
If you are planning to do further enhancements to ldiskfs, I'd strongly recommend discussing them on the linux-ext4 mailing list first, so they have a chance to be improved and hopefully landed, instead of being for Lustre only. |
| Comment by Andreas Dilger [ 17/Apr/16 ] |
|
More on the MDT side, a couple of interesting possibilities exist:
PS: if you do plan on working on any new features, we should move the discussion to new tickets, if they don't already have one. |
| Comment by Gerrit Updater [ 22/Apr/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17702/ |
| Comment by Gerrit Updater [ 26/Apr/16 ] |
|
Artem Blagodarenko (artem.blagodarenko@seagate.com) uploaded a new patch: http://review.whamcloud.com/19788 |
| Comment by Artem Blagodarenko (Inactive) [ 10/Feb/17 ] |
|
https://review.whamcloud.com/#/c/19788 is abandoned because its change is landed as part of https://review.whamcloud.com/#/c/24524 |
| Comment by Andreas Dilger [ 18/Apr/17 ] |
|
The two patches here were landed for 2.9.0. |