[LU-12501] lfs df can never return Created: 01/Jul/19 Updated: 12/Aug/19 Resolved: 21/Jul/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | James A Simmons | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
Any lustre client |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In recent testing I have encountered cases where running lfs df never returns. Today I sat down to track down why. The reason is that this loop in showdf() in lfs.c never exits: for (i = 0, state = stat->os_state; state != 0; i++) { ... } |
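The loop body is elided in the report, so the following is only a minimal sketch of one way a loop with this header can fail to terminate, not the actual lfs.c code. It assumes the body clears a bit only when it has a display letter for it; every `demo_*` and `DEMO_*` name, the flag values, and the iteration cap are hypothetical. The cap stands in for "never returns" so the sketch itself terminates.

```c
#include <stdint.h>

/* Hypothetical flag layout for illustration; not the Lustre definitions. */
#define DEMO_STATE_DEGRADED 0x1u
#define DEMO_STATE_READONLY 0x2u
#define DEMO_STATE_NONROT   0x200u	/* informational bit with no letter */

/* Letters exist only for bit 0 and bit 1 in this sketch. */
static const char demo_names[] = { 'D', 'R' };
#define DEMO_NAMES_COUNT (sizeof(demo_names) / sizeof(demo_names[0]))
#define DEMO_MAX_ITER 64u	/* cap so the demonstration itself ends */

/*
 * Buggy pattern: a set bit is cleared only if it has a letter.
 * Returns the iteration count on exit, or -1 if the cap is reached,
 * which models the loop spinning forever.
 */
static int demo_scan_buggy(uint32_t state)
{
	unsigned int i;

	for (i = 0; state != 0; i++) {
		if (i >= DEMO_MAX_ITER)
			return -1;
		if (i >= 32 || !(state & (UINT32_C(1) << i)))
			continue;
		if (i >= DEMO_NAMES_COUNT)
			continue;	/* bug: bit skipped but never cleared */
		state ^= UINT32_C(1) << i;
	}
	return (int)i;
}

/*
 * Safer pattern: bound the index by the word width and clear every
 * set bit that is visited, whether or not it has a letter.
 */
static int demo_scan_fixed(uint32_t state)
{
	unsigned int i;

	for (i = 0; state != 0 && i < 32; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if (!(state & bit))
			continue;
		state &= ~bit;
	}
	return (int)i;
}
```

With only `DEMO_STATE_DEGRADED` set, both functions exit after one pass. With `DEMO_STATE_DEGRADED | DEMO_STATE_NONROT`, the buggy variant hits the cap while the fixed variant stops once bit 9 is cleared.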
| Comments |
| Comment by Peter Jones [ 01/Jul/19 ] |
|
James, any idea when this problem first appeared? Peter |
| Comment by Patrick Farrell (Inactive) [ 01/Jul/19 ] |
|
James, when do you see this problem? It's clearly not all the time, as we don't see it in our testing. |
| Comment by James A Simmons [ 01/Jul/19 ] |
|
I started seeing it in the last couple of weeks. For some reason I see it all the time on our test bed. |
| Comment by Peter Jones [ 02/Jul/19 ] |
|
Does that mean that you are able to run git bisect to identify the commit that introduced the problem? |
| Comment by Gerrit Updater [ 02/Jul/19 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/35403 |
| Comment by James A Simmons [ 02/Jul/19 ] |
|
Yes, I tracked down the issue, Peter. My test setup is currently all SSDs, which is why I see this problem. |
| Comment by Peter Jones [ 02/Jul/19 ] |
|
Aha! Yes, I can see why that would slip through. Thanks James! |
| Comment by James A Simmons [ 03/Jul/19 ] |
|
So Andreas suggested a few ideas: I'd prefer it even more if the "NONROT" state was added to the obd_statfs_state_names[] array, and was printed when the "MNTDF_VERBOSE" flag is passed, and masked out in mntdf() otherwise. That avoids special-casing the code here, and pushes the "presentation decision" up toward where options are handled. As for what letter to use for "NONROT", one option would be to use a lower-case letter to indicate that it is not a "problem" with the target, like 'f' for "flash" or "fast", but I'm open to other options. I'd prefer to avoid overloading 'n' so early. We might even consider changing the OS_STATE_NONROT flag to count from 0x80000000 downward to make it possible to programmatically separate error states from informational states, even though this would slow down showdf() by a few cycles for flash OSTs. The original NONROT patch was landed as v2_12_53-108-g68635c3, so it isn't in a 2.13 release yet, and the backport hasn't been landed to b2_12 yet, so I don't think there is a release that includes this flag yet. I'm going to do the simplest fix, but if someone wants something more we can do another patch to enhance this feature. |
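The suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch that landed: all `demo_*` and `DEMO_*` names, the flag values other than 0x80000000, and the choice of where to split error bits from informational bits are hypothetical. For brevity the masking is folded into the formatter here, whereas the comment proposes doing it in mntdf().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: error states count up from bit 0 and
 * informational states count down from bit 31 (0x80000000).
 */
#define DEMO_ERR_DEGRADED UINT32_C(0x00000001)
#define DEMO_ERR_READONLY UINT32_C(0x00000002)
#define DEMO_INFO_NONROT  UINT32_C(0x80000000)
#define DEMO_INFO_MASK    UINT32_C(0xffff0000)	/* assumed split point */

struct demo_state_name {
	uint32_t flag;
	char letter;	/* upper case: problem; lower case: informational */
};

static const struct demo_state_name demo_state_names[] = {
	{ DEMO_ERR_DEGRADED, 'D' },
	{ DEMO_ERR_READONLY, 'R' },
	{ DEMO_INFO_NONROT,  'f' },	/* 'f' for "flash" or "fast" */
};

/*
 * Write one letter per known flag into buf and NUL-terminate it.
 * Informational flags are masked out unless verbose is nonzero.
 * The loop walks the table rather than the bits, so a flag with no
 * table entry cannot keep it running. Returns the letter count.
 */
static size_t demo_format_state(uint32_t state, int verbose,
				char *buf, size_t buflen)
{
	size_t count = sizeof(demo_state_names) / sizeof(demo_state_names[0]);
	size_t n = 0;
	size_t i;

	if (buflen == 0)
		return 0;
	if (!verbose)
		state &= ~DEMO_INFO_MASK;
	for (i = 0; i < count; i++) {
		if ((state & demo_state_names[i].flag) && n + 1 < buflen)
			buf[n++] = demo_state_names[i].letter;
	}
	buf[n] = '\0';
	return n;
}
```

Keeping the informational bits in a separate range means a single mask decides what is shown, so the display code needs no per-flag special case.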
| Comment by Gerrit Updater [ 10/Jul/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35456 |
| Comment by Gerrit Updater [ 20/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35456/ |
| Comment by Gerrit Updater [ 01/Aug/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35662 |
| Comment by Gerrit Updater [ 11/Aug/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35662/ |