Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Duplicate
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.12.4
Labels:
None
Environment:
CentOS7.7.1908

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

Hey all,
Ive been struggling with a problem with our newly updated lustre 2.12 cluster, and I don't really know if its a bug, or configuration problem, or what.
So here's the setup: I've recently set up a small 2-OST single MDT 2.10 cluster to emulate our production cluster, and test the process of upgrading to 2.12.4. The upgrade went fine, however there is a problem with how df reports space on the lustre filesystem that is causing problems with our processing software. The software includes a df check to make sure the filesystem isn't too full before beginning a job. The problem is, that when multiple df commands are run against the lustre filesystem from the same client, occasionally the command will return a 0 in the available field, which in turn makes the software think the filesystem is full, then drop jobs. I can test this by running 'while [ true ];do /bin/df -TP /performance;done' on two sessions on the same client. As soon as I start the second while loop, the outputs go from:
Filesystem Type 1024-blocks Used Available Capacity Mounted on
192.168.0.181@tcp:/perform lustre 71467728 100416 67664944 1% /performance

to:
Filesystem Type 1024-blocks Used Available Capacity Mounted on
192.168.0.181@tcp:/perform lustre 0 -0 -0 50% /performance
I am using lustre 2.12.4 on the client as well, so Ive ruled out version mismatch issues at least.

I've checked all the mount settings between the prod 2.10 cluster and the dev 2.12 cluster, and everything I can find looks the same. The 2.10 prod cluster does not have this problem, and the dev cluster did not have the problem before upgrading from 2.10.

I have posted this in the lustre-discuss mailing list and Nathan Dauchy suggested I open a Jira issue so I could upload an strace of the failure.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

df.out
5 kB
26/Feb/20 2:11 PM
df2.out
5 kB
26/Feb/20 2:12 PM
dftest.txt
5.12 MB
21/Feb/20 4:02 PM

Issue Links

is related to

LU-13296 statfs isn't work properly with MDT statfs proxy

Resolved

Activity

People

Assignee:: WC Triage

Reporter:: Kevin Konzem

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 21/Feb/20 4:02 PM

Updated:: 06/Apr/20 11:36 PM

Resolved:: 06/Apr/20 11:36 PM