[LU-12912] fsck looping Created: 28/Oct/19  Updated: 01/Nov/19  Resolved: 01/Nov/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Joe Mervini Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: None
Environment:

DDN sfa12k IB attached to Dell R730 servers. TOSS 3.5.1 (rhel 7.5)/Lustre 2.10.5/e2fsprogs-1.42.13.wc6-7.el7.x86_64


Issue Links:
Related
is related to LU-12913 fsck found > 1M multiply-claimed blocks Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

An OST on one of our systems went read-only last week. Today I unmounted the affected OST and ran a read-only fsck against the target. During the fsck, the check got stuck on one inode and kept repeating the message:

Block <foo> (<bar>) causes symlink to be too big. IGNORE.

(repeats 12 times)

Too many illegal blocks in inode 33023185.

Clear inode? no

Suppress message? no

 

This repeats indefinitely. Since it was a read-only fsck, we terminated it and remounted the file system rather than attempting any repair before consulting with you first.

This is on a classified system, so providing explicit detail will be difficult. Note: this problem is similar to LU-12386, which was posted last year.



 Comments   
Comment by Andreas Dilger [ 29/Oct/19 ]

The e2fsck run is looping because it is stuck being told not to fix the inode and also not to stop printing the message. If you run "e2fsck -fy" it will fix the problem by zeroing out the broken inode.

Note that it is recommended to run the latest version of e2fsck, which is 1.45.3-wc1 at this time. It is also a good idea to log the e2fsck output so it is available in the future if needed.
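As a minimal sketch of the logging advice above, assuming Python 3 and that `e2fsck` is on the PATH, the hypothetical wrapper below streams e2fsck output to the terminal, keeps a copy in a timestamped log file, and decodes the exit status using the bit values documented in the e2fsck(8) man page. The function names and the log-file naming scheme are illustrative choices, not part of e2fsprogs or Lustre.

```python
import datetime
import os
import subprocess

# Exit-status bits from the e2fsck(8) man page; a status is the bitwise OR of these.
EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared-library error",
}


def describe_exit_status(code: int) -> list:
    """Translate an e2fsck exit status into the conditions it encodes."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in EXIT_BITS.items() if code & bit]


def run_e2fsck_logged(device: str, options: list, log_dir: str = ".") -> int:
    """Run e2fsck on `device`, echo its output, and keep a copy in a log file.

    Hypothetical helper: the log-file naming scheme is an illustrative choice.
    Because output is piped, pass a non-interactive mode such as -n, -p or -y.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    log_path = os.path.join(log_dir, f"e2fsck-{stamp}.log")
    cmd = ["e2fsck", *options, device]
    with open(log_path, "w") as log:
        log.write("# " + " ".join(cmd) + "\n")
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for line in proc.stdout:  # stream line by line so progress stays visible
            print(line, end="")
            log.write(line)
        code = proc.wait()
        status = ", ".join(describe_exit_status(code))
        log.write(f"# exit status {code}: {status}\n")
    return code
```

For example, `run_e2fsck_logged("/dev/mapper/ost0", ["-fn"])` would repeat a read-only check while keeping a log; the device path is a placeholder. The trailing `# exit status` line makes it easy to tell later whether a run corrected errors (1) or left some uncorrected (4).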

Comment by Peter Jones [ 29/Oct/19 ]

Joe

Is the system in question one of the supported on-prem deployments or our AWS Lustre service without support?

Peter

Comment by Joe Mervini [ 29/Oct/19 ]

Thanks Andreas. That's kind of what we thought, but we wanted to make sure. Thanks for the e2fsprogs update recommendation.

 

Peter - This is covered by our maintenance support through DDN. The contract was placed on 3/27/19.

Comment by Ruth Klundt (Inactive) [ 29/Oct/19 ]

The latest e2fsprogs I see is 1.45.2 here:

https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/RPMS/x86_64/

 

Is there a 1.45.3-wc1 somewhere else?

Comment by Andreas Dilger [ 30/Oct/19 ]

Hi Ruth, sorry but I got the version number wrong. The latest version tagged in our git repo is 1.45.2-wc1.

Comment by Ruth Klundt (Inactive) [ 30/Oct/19 ]

Thanks for the confirmation; I figured as much.

Comment by Peter Jones [ 01/Nov/19 ]

Am I correct in thinking that we can close this ticket too?

Comment by Joe Mervini [ 01/Nov/19 ]

I ran fsck against the file system with the new e2fsprogs, first with the preen option, which aborted, and then again with the '-y' option. After the fsck completed, I repeated 'fsck -fy' to ensure it came back clean, and then remounted the file system.

Everything is good. We can close this issue as resolved.
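The repair sequence described in this comment (preen, then a forced '-y' pass, then a second forced pass to confirm the result is clean) can be sketched as the hypothetical Python helper below. It is not an e2fsprogs or Lustre tool; the decision rules are an assumption built on the exit-status bits from the e2fsck(8) man page, and the `run` callable is injected so the control flow can be exercised without touching a real OST.

```python
from typing import Callable, Sequence

# A runner takes e2fsck arguments and returns its exit status. In real use it
# might be: lambda args: subprocess.run(["e2fsck", *args]).returncode
Runner = Callable[[Sequence[str]], int]

# e2fsck(8) exit bits meaning the run itself failed rather than found errors.
FATAL = 8 | 16 | 32 | 128
UNCORRECTED = 4


def repair_then_verify(device: str, run: Runner) -> bool:
    """Preen, then force a full repair, then force a second pass to confirm.

    Returns True only when the final pass exits with 0 (no errors found).
    """
    # 1. Preen mode fixes only safe problems and exits with bit 4 set when it
    #    meets something that needs a manual run; that outcome is tolerated here.
    if run(["-p", device]) & FATAL:
        return False
    # 2. Forced full check, answering "yes" to every repair prompt.
    if run(["-fy", device]) & (FATAL | UNCORRECTED):
        return False
    # 3. Forced re-check; a repaired file system should now come back clean.
    return run(["-fy", device]) == 0
```

Passing a fake runner, such as `repair_then_verify("/dev/x", lambda args: 0)`, exercises the logic without a device. Requiring an exit status of exactly 0 on the last pass mirrors the intent of rerunning 'fsck -fy': a second pass that still reports corrections (status 1) means the file system was not yet clean.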

Comment by Joe Mervini [ 01/Nov/19 ]

We must have been posting at the same time. I planned to run the fsck this morning, which I did, and all is good.

Comment by Peter Jones [ 01/Nov/19 ]

Great

Generated at Sat Feb 10 02:56:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.