[LU-8152] show OST/MDT read-only status in "lctl dl" and/or "lfs df" Created: 16/May/16  Updated: 20/May/22  Resolved: 06/Apr/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-8920 don't print permanently deactivated O... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

To aid in debugging server problems when the filesystem has gone read-only, it would be useful to print "RO" for the device status in "lctl dl" instead of "UP", if that is possible without significant layering violations.

It would also be useful to print out the OS_STATE_DEGRADED and OS_STATE_READONLY flags with "lfs df", possibly one letter per flag [DR] after the device names, like:

$ lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
myth-MDT0000_UUID        9174328      584036     8066004   7% /myth[MDT:0]
myth-OST0000_UUID     3880285544  3451122460   235124108  94% /myth[OST:0] DR
myth-OST0001_UUID     3886052008  3362805556   328933836  91% /myth[OST:1]
myth-OST0002_UUID     3880285544  3547714164   216149060  94% /myth[OST:2] D
myth-OST0003_UUID     5840741952  3831815716  1716851948  69% /myth[OST:3]
OST0004             : inactive device

We might consider requiring a -v flag if there is a concern that this would break the output format, but I don't think that it will.
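
The one-letter-per-flag idea above can be sketched as follows. This is only an illustration, not the landed patch: the bit values assigned to OS_STATE_DEGRADED and OS_STATE_READONLY are assumptions, and state_suffix() and append_state() are hypothetical helper names.

```python
# Sketch of the "[DR]" suffix proposed above. The numeric bit values are
# assumptions made for illustration; the helper names are hypothetical.
OS_STATE_DEGRADED = 0x1  # assumed bit value
OS_STATE_READONLY = 0x2  # assumed bit value

# (bit, letter) pairs, in the order the letters should be printed.
STATE_LETTERS = (
    (OS_STATE_DEGRADED, "D"),
    (OS_STATE_READONLY, "R"),
)


def state_suffix(os_state: int) -> str:
    """Return one letter per flag set in os_state, or "" if none is set."""
    return "".join(letter for bit, letter in STATE_LETTERS if os_state & bit)


def append_state(row: str, os_state: int) -> str:
    """Append the flag letters to an existing "lfs df" row, if there are any."""
    suffix = state_suffix(os_state)
    return f"{row} {suffix}" if suffix else row


if __name__ == "__main__":
    row = "myth-OST0000_UUID     3880285544  3451122460   235124108  94% /myth[OST:0]"
    print(append_state(row, OS_STATE_DEGRADED | OS_STATE_READONLY))
    print(append_state(row, 0))
```

Because the letters are appended after the last existing column, rows for healthy targets are unchanged, which is why the output format should not break for most parsers.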



 Comments   
Comment by Gary Hagensen (Inactive) [ 15/Jun/16 ]

An observation on why this should get fixed soon: I am running a training course in Brazil, and two of the students had 2 OSTs go read-only on them in their virtual machines. I assume this happened because they started using the OSTs while they were still in recovery, since there can't be bad hardware in VMs. IML and lctl dl both say everything is happy, but cat /proc/fs/lustre/lod/Lustre01-MDT0000-mdtlov/target_obd showed two OSTs as INACTIVE.

They discovered this when they set the stripe count on a directory to 8 and the files under the directory used only 6 OSTs. There were no complaints from set_stripe, etc. that that many OSTs were not available. I only knew where to look because of this JIRA.

I was also a bit surprised that using an OST while in recovery (just doing file operations) would cause the OSTs to go INACTIVE. I could see nothing in the logs indicating the transition to inactive. This was also the case when this JIRA was generated.
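
For anyone who needs to spot this before "lctl dl" or "lfs df" report it, the target_obd check described above can be scripted. The following is a minimal sketch: it assumes each line of target_obd has the shape "<index>: <UUID> ACTIVE" or "<index>: <UUID> INACTIVE", the SAMPLE text is invented for illustration rather than captured from a real system, and inactive_targets() is a hypothetical helper name.

```python
import re

# Assumed line shape: "<index>: <uuid> ACTIVE" or "<index>: <uuid> INACTIVE".
LINE_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+(ACTIVE|INACTIVE)\s*$")


def inactive_targets(text: str) -> list:
    """Return (index, uuid) for every target listed as INACTIVE."""
    found = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3) == "INACTIVE":
            found.append((int(match.group(1)), match.group(2)))
    return found


# Invented sample in the assumed format, for illustration only.
SAMPLE = """\
0: Lustre01-OST0000_UUID ACTIVE
1: Lustre01-OST0001_UUID INACTIVE
2: Lustre01-OST0002_UUID ACTIVE
"""

if __name__ == "__main__":
    for index, uuid in inactive_targets(SAMPLE):
        print(f"target {index} ({uuid}) is INACTIVE")
```

On a live MDS the text would come from reading the target_obd file named above instead of SAMPLE.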

Comment by Gary Hagensen (Inactive) [ 15/Jun/16 ]

Oops, I did find an entry from LDISKFS in dmesg and /var/log/messages on the MDS about the read-only transition. I also noted the time, which was in the evening when I was re-creating the cluster. There was no IO.

Comment by Gary Hagensen (Inactive) [ 15/Jun/16 ]

However, the messages did not make it to the IML aggregated log. It seems that IML should look for LDISKFS entries as well as LUSTRE entries, or a LUSTRE error should be generated as well.

Comment by Gabriele Paciucci (Inactive) [ 28/Jun/16 ]

/proc/fs/lustre/health_status should report LDISKFS errors and LBUGs. I am not sure about ZFS errors.
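
A monitoring script could poll that file along these lines. This is a rough sketch that assumes the file contains the single word "healthy" when nothing is wrong and some other text otherwise; is_healthy() and read_health() are hypothetical helper names.

```python
HEALTH_PATH = "/proc/fs/lustre/health_status"  # path quoted in the comment above


def is_healthy(contents: str) -> bool:
    """Assumption: the file holds exactly "healthy" when no error is flagged."""
    return contents.strip() == "healthy"


def read_health(path: str = HEALTH_PATH) -> bool:
    """Read the status file and report whether the node looks healthy."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return is_healthy(handle.read())


if __name__ == "__main__":
    try:
        print("healthy" if read_health() else "NOT healthy")
    except OSError as err:
        # Not a Lustre node, or the module is not loaded.
        print(f"cannot read {HEALTH_PATH}: {err}")
```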

Comment by Peter Jones [ 29/Jul/16 ]

Jian

Could you please look into this issue?

Thanks

Peter

Comment by Andreas Dilger [ 21/Oct/16 ]

Jian, could you please make a patch for "lfs df" to print the OS_STATE_* flags? This should be fairly fast as it only involves a small change to showdf(), and testing is straightforward since the "obdfilter.*.degraded" tunable can be set directly from the test script.
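
The test flow suggested here could look roughly like the sketch below. It only builds the "lctl set_param" command line and checks captured "lfs df" text; it does not run anything. It assumes the flags end up as a trailing field after the mount column, as in the example in the description, and degraded_cmd() and row_has_flag() are hypothetical helper names.

```python
def degraded_cmd(target: str, on: bool) -> list:
    """Build the argv that sets or clears the degraded tunable for one target."""
    value = 1 if on else 0
    return ["lctl", "set_param", f"obdfilter.{target}.degraded={value}"]


def row_has_flag(lfs_df_output: str, uuid: str, flag: str) -> bool:
    """Check whether the "lfs df" row for uuid carries the given flag letter.

    Assumption: a row has six fields, plus a seventh holding the flag letters.
    """
    for line in lfs_df_output.splitlines():
        fields = line.split()
        if fields and fields[0] == uuid:
            return len(fields) > 6 and flag in fields[-1]
    return False


# Rows copied from the example in the description.
EXAMPLE = """\
myth-OST0000_UUID     3880285544  3451122460   235124108  94% /myth[OST:0] DR
myth-OST0001_UUID     3886052008  3362805556   328933836  91% /myth[OST:1]
myth-OST0002_UUID     3880285544  3547714164   216149060  94% /myth[OST:2] D
"""

if __name__ == "__main__":
    print(" ".join(degraded_cmd("myth-OST0001", True)))
    print(row_has_flag(EXAMPLE, "myth-OST0002_UUID", "D"))
```

A real test would run the built command on the OSS, capture "lfs df" on a client, assert that the flag appears, and then clear the tunable again.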

Comment by Jian Yu [ 21/Oct/16 ]

Sure, Andreas. I'll create the patch.

Comment by Gerrit Updater [ 24/Oct/16 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/23330
Subject: LU-8152 utils: improve “lfs df” to show device status
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f892e9c4b0216b7d48fae0ed06ebab5e84abab1b

Comment by Gerrit Updater [ 06/Apr/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23330/
Subject: LU-8152 utils: improve “lfs df” to show device status
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 95d7592a9f33f62accade96d81e0cc3ca0fb94e2

Comment by Peter Jones [ 06/Apr/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:15:06 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.