[LU-11882] OST recreated objects gets badness mark from e2fsck Created: 22/Jan/19  Updated: 30/May/19  Resolved: 30/May/19

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Artem Blagodarenko (Inactive) Assignee: Artem Blagodarenko (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Gantt End to Start
has to be done after LU-11915 conf-sanity test 115 is skipped or hangs Open
Related
is related to LU-8465 parallel e2fsck performance at scale Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

e2fsck takes 72 hours to check a ~113T OST. Most of that time is spent in phase 2.

The profiler shows that the most expensive path is:

e2fsck_run -> e2fsck_pass2 -> check_dir_block -> e2fsck_process_bad_inode -> e2fsck_read_inode

After adding the -d option we get millions of messages like this:

e2fsck_pass1:1543: increase inode 12485987 badness 0 to 2

These messages correspond to this piece of code:

else if (EXT4_XTIME_ANCIENT(ctx, sb, inode->i_ctime,
                            ctx->time_fudge))
        e2fsck_mark_inode_bad(ctx, ino, BADNESS_HIGH);

The code checks whether ctime is too old. But Lustre creates precreated objects with a zeroed time, so e2fsck treats every such object as bad. For some reason (that is a topic for another issue) we have millions of precreated files, and e2fsck spends a lot of time processing them during phase 2.
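To illustrate why a zeroed ctime trips the check, here is a minimal sketch. The definition of EXT4_XTIME_ANCIENT is not quoted in this ticket, so xtime_is_ancient() below is a hypothetical stand-in that assumes "ancient" means "older than the filesystem creation time minus the fudge". The parameter names are also assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the EXT4_XTIME_ANCIENT() test (the real macro
 * is not shown in this ticket). Assumption: a timestamp is "ancient" when
 * it predates filesystem creation by more than the allowed fudge.
 */
static bool xtime_is_ancient(uint32_t mkfs_time, uint32_t xtime,
                             uint32_t fudge)
{
        /* Guard against underflow when mkfs_time is itself tiny or unset. */
        return mkfs_time > fudge && xtime < mkfs_time - fudge;
}
```

Under that assumption, any object with ctime == 0 on a filesystem with a sane creation time is reported as ancient, which matches the flood of "badness 0 to 2" messages.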

After removing the check above, e2fsck completes in 20 minutes.

Two possible solutions are suggested: 1) remove this check, because a zeroed ctime is a legitimate state on a Lustre FS; 2) remove the inode badness patches altogether. Do they provide enough benefit to justify this overhead?



 Comments   
Comment by Andreas Dilger [ 22/Jan/19 ]

The other question is why you have millions of precreated objects? I guess they are stripes for files that were never used?

In any case, rather than disable this case completely, it would be trivial to make a special case for ctime == 0 if SUID/SGID are also set. That would catch the specific case of Lustre precreated objects without disabling the inode badness completely.

Comment by Andreas Dilger [ 22/Jan/19 ]

Note that "badness = 2" is not enough for e2fsck to consider the inode corrupt, just one part of the possibility that there is something wrong with it. It needs "badness > 7" to be considered corrupt.
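The threshold logic described above can be sketched as follows. The names are hypothetical. The value 2 for a high-badness mark is inferred from the "badness 0 to 2" log line in the description, and 7 is the threshold quoted in this comment.

```c
#include <stdbool.h>

enum {
        DEMO_BADNESS_HIGH      = 2,  /* inferred from "badness 0 to 2" */
        DEMO_BADNESS_THRESHOLD = 7   /* "badness > 7" means corrupt */
};

/* Each suspicious property adds to the inode's running badness score. */
static int add_badness(int current, int delta)
{
        return current + delta;
}

/* The inode is treated as corrupt only once the score exceeds the threshold. */
static bool inode_considered_corrupt(int badness)
{
        return badness > DEMO_BADNESS_THRESHOLD;
}
```

A single ancient-ctime mark leaves the score at 2, well below the threshold, so the precreated objects are not treated as corrupt. The cost reported in this ticket comes from processing millions of marked inodes in phase 2.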

Comment by Gerrit Updater [ 25/Jan/19 ]

Artem Blagodarenko (c17828@cray.com) uploaded a new patch: https://review.whamcloud.com/34113
Subject: LU-11882 e2fsck: zero date is not inode badness
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set: 1
Commit: b15477ef0bd382c9c349a94e089c696fa59c5acf

Comment by Gerrit Updater [ 28/May/19 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/34113/
Subject: LU-11882 e2fsck: zero date is not inode badness
Project: tools/e2fsprogs
Branch: master-lustre
Current Patch Set:
Commit: c478288dddf485459115e14c57470c4466740e8d

Comment by Artem Blagodarenko (Inactive) [ 30/May/19 ]

Change has been successfully cherry-picked as c478288dddf485459115e14c57470c4466740e8d
