[LU-12501] lfs df can never return Created: 01/Jul/19  Updated: 12/Aug/19  Resolved: 21/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Major
Reporter: James A Simmons Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: patch
Environment:

Any lustre client


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In recent testing I have encountered cases where running lfs df never returns. Today I sat down to track down why. The reason is that this loop in showdf() in lfs.c never exits:

for (i = 0, state = stat->os_state; state != 0; i++) { ... }

 Comments   
Comment by Peter Jones [ 01/Jul/19 ]

James

Any idea of when this problem first appeared?

Peter

Comment by Patrick Farrell (Inactive) [ 01/Jul/19 ]

James,

When do you see this problem?  It's clearly not all the time, as we don't see it in our testing.

Comment by James A Simmons [ 01/Jul/19 ]

I started to see it in the last couple of weeks. For some reason I see this all the time on our test bed.

Comment by Peter Jones [ 02/Jul/19 ]

Does that mean that you are able to run git bisect to identify the commit that introduced the problem?

Comment by Gerrit Updater [ 02/Jul/19 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/35403
Subject: LU-12501 utils: stop showdf() endless loop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: eba159fd6d2b45689072cd59dff3c77af83c00aa

Comment by James A Simmons [ 02/Jul/19 ]

Yes, I tracked down the issue, Peter. My test setup is currently all SSDs, which is why I see this problem.

Comment by Peter Jones [ 02/Jul/19 ]

Aha! Yes, I can see why that would slip through. Thanks James!

Comment by James A Simmons [ 03/Jul/19 ]

So Andreas suggested a few ideas.

I'd prefer it even more if the "NONROT" state was added to the obd_statfs_state_names[] array, and was printed when the "MNTDF_VERBOSE" flag is passed, and masked out in mntdf() otherwise. That avoids special-casing the code here, and pushes the "presentation decision" up toward where options are handled.

As for what letter to use for "NONROT", one option would be to use a lower-case letter to indicate that it is not a "problem" with the target, like 'f' for "flash" or "fast", but I'm open to other options. I'd prefer to avoid overloading 'n' so early.

We might even consider changing the OS_STATE_NONROT flag to count from 0x80000000 downward to make it possible to programmatically separate error states from informational states, even though this would slow down showdf() by a few cycles for flash OSTs. The original NONROT patch was landed as v2_12_53-108-g68635c3, so it isn't in a 2.13 release yet, and the backport hasn't been landed to b2_12 yet, so I don't think there is a release that includes this flag yet.

I'm going to do the simplest fix but if someone wants something more we can do another patch to enhance this feature.
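As a rough illustration of the two ideas quoted above (numbering informational flags from the top bit downward, and masking them out unless verbose output is requested), here is a minimal sketch. The DEMO_* values, the split point in DEMO_INFO_MASK, and the demo_* helpers are assumptions made for this example. They are not the definitions in the Lustre headers, and this is not the patch that was landed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: error states count up from bit 0,
 * informational states count down from bit 31. */
#define DEMO_ERR_DEGRADED 0x00000001u
#define DEMO_ERR_READONLY 0x00000002u
#define DEMO_INFO_NONROT  0x80000000u
#define DEMO_INFO_MASK    0xffff0000u	/* assumed split point */

static_assert((DEMO_INFO_NONROT & DEMO_INFO_MASK) != 0,
	      "informational flags must fall inside the info mask");

/* True if any error-state bit is set, ignoring informational bits. */
static bool demo_has_error(uint32_t state)
{
	return (state & ~DEMO_INFO_MASK) != 0;
}

/* State to display: informational bits are kept only in verbose mode. */
static uint32_t demo_display_state(uint32_t state, bool verbose)
{
	return verbose ? state : (state & ~DEMO_INFO_MASK);
}
```

With a split like this, the caller that parses options decides what is shown, and the printing code only sees the bits it is expected to print.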

Comment by Gerrit Updater [ 10/Jul/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35456
Subject: LU-12501 utils: fix 'lfs df' printing loop
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f9bd4be4e79684099fa0a3d09f7991be991ec180

Comment by Gerrit Updater [ 20/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35456/
Subject: LU-12501 utils: fix 'lfs df' printing loop
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e4d92a8a08acbdca6634decd4deb9fe5678ad7ba

Comment by Gerrit Updater [ 01/Aug/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35662
Subject: LU-12501 utils: fix 'lfs df' printing loop
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: aa801af165b903b4ae15f0ef975e8b7bf0413878

Comment by Gerrit Updater [ 11/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35662/
Subject: LU-12501 utils: fix 'lfs df' printing loop
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 2734b902c296f3f6cbc1522a579f23bd86d1be45
