[LU-8152] show OST/MDT read-only status in "lctl dl" and/or "lfs df" Created: 16/May/16 Updated: 20/May/22 Resolved: 06/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
To aid in debugging server problems when the filesystem has gone read-only, it would be useful to print "RO" for the device status in "lctl dl" instead of "UP", if that is possible without significant layering violations. It would also be useful to print out the OS_STATE_DEGRADED and OS_STATE_READONLY flags with "lfs df", possibly one letter per flag [DR] after the device names, like:

$ lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
myth-MDT0000_UUID        9174328      584036     8066004   7% /myth[MDT:0]
myth-OST0000_UUID     3880285544  3451122460   235124108  94% /myth[OST:0] DR
myth-OST0001_UUID     3886052008  3362805556   328933836  91% /myth[OST:1]
myth-OST0002_UUID     3880285544  3547714164   216149060  94% /myth[OST:2] D
myth-OST0003_UUID     5840741952  3831815716  1716851948  69% /myth[OST:3]
OST0004           : inactive device

We might consider requiring a -v flag if there is a concern that this would break the output format, but I don't think that it will. |
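A minimal sketch of the letter-per-flag formatting proposed above, assuming the per-target state reaches lfs as bit flags (OS_STATE_DEGRADED, OS_STATE_READONLY) in the os_state field of struct obd_statfs as declared in lustre/lustre_user.h. The helper name and the fallback #define values below are illustrative only, not the actual lfs/showdf() code:

#include <stdio.h>
#include <stddef.h>

/* Illustrative fallback values; the real flags come from lustre_user.h. */
#ifndef OS_STATE_DEGRADED
#define OS_STATE_DEGRADED       0x1
#endif
#ifndef OS_STATE_READONLY
#define OS_STATE_READONLY       0x2
#endif

/* Hypothetical helper: build the "D"/"R" suffix, one letter per set flag.
 * buf must hold at least 3 bytes. */
static const char *state_suffix(unsigned int os_state, char *buf, size_t len)
{
        size_t n = 0;

        if (n + 1 < len && (os_state & OS_STATE_DEGRADED))
                buf[n++] = 'D';
        if (n + 1 < len && (os_state & OS_STATE_READONLY))
                buf[n++] = 'R';
        buf[n] = '\0';

        return buf;
}

int main(void)
{
        char suffix[8];

        /* A degraded, read-only OST would be printed with a trailing "DR". */
        printf("%-18s ... %s %s\n", "myth-OST0000_UUID", "/myth[OST:0]",
               state_suffix(OS_STATE_DEGRADED | OS_STATE_READONLY,
                            suffix, sizeof(suffix)));
        return 0;
}

Appending the letters after the mount point leaves the existing columns untouched, which is why the description suggests the change should not need to be hidden behind a -v flag.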
| Comments |
| Comment by Gary Hagensen (Inactive) [ 15/Jun/16 ] |
|
An observation on why this should get fixed soon: I am doing a training in Brazil and two of the students had two OSTs go read-only in their virtual machines. I assume this happened because they started using the OSTs while they were still in recovery, since there can't be bad hardware in VMs. IML and "lctl dl" said everything was happy, but cat /proc/fs/lustre/lod/Lustre01-MDT0000-mdtlov/target_obd showed two OSTs as INACTIVE. They discovered this when they set the stripe count on a directory to 8 and the files under the directory only used 6 OSTs. There were no complaints from set_stripe, etc. that that many OSTs were not available. I only knew where to look because of this JIRA.

I was also a bit surprised that using an OST while it is in recovery (just doing file operations) would cause the OSTs to go INACTIVE. Nothing in the logs that I could see indicated the transition to inactive. This was also the case when this JIRA was created. |
| Comment by Gary Hagensen (Inactive) [ 15/Jun/16 ] |
|
Oops, I did find an entry in dmesg and /var/log/messages on the MDS from LDISKFS about the read-only transition. I also noted the time, which was in the evening while I was re-creating the cluster, with no I/O going on. |
| Comment by Gary Hagensen (Inactive) [ 15/Jun/16 ] |
|
However, the messages did not make it to the IML aggregated log. It seems IML should look for LDISKFS entries as well as LUSTRE ones, or a LUSTRE error should be generated as well. |
| Comment by Gabriele Paciucci (Inactive) [ 28/Jun/16 ] |
|
/proc/fs/lustre/health_status should report LDISKFS errors and LBUGs... I'm not sure about ZFS errors. |
| Comment by Peter Jones [ 29/Jul/16 ] |
|
Jian, could you please look into this issue? Thanks, Peter |
| Comment by Andreas Dilger [ 21/Oct/16 ] |
|
Jian, could you please make a patch for "lfs df" to print the OS_STATE_* flags? This should be fairly fast, as it only involves a small change to showdf(), and testing is straightforward since the "obdfilter.*.degraded" tunable can be set directly from the test script. |
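For reference, a rough standalone sketch of how such a test could read the per-OST state bits from userspace, assuming liblustreapi's llapi_obd_statfs() interface (the same call "lfs df" is built on) and the OS_STATE_* flags from lustre/lustre_user.h; error handling is simplified and the real lfs code also distinguishes inactive devices rather than stopping at the first error:

#include <stdio.h>
#include <string.h>
#include <lustre/lustreapi.h>   /* llapi_obd_statfs(), struct obd_statfs */

/*
 * Sketch: after "lctl set_param obdfilter.*.degraded=1" on the OSS, the
 * corresponding OST should report OS_STATE_DEGRADED in os_state.
 */
int main(int argc, char *argv[])
{
        unsigned int index;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <lustre-mount-point>\n", argv[0]);
                return 1;
        }

        for (index = 0; ; index++) {
                struct obd_statfs stat_buf;
                struct obd_uuid uuid_buf;
                int rc;

                memset(&stat_buf, 0, sizeof(stat_buf));
                memset(&uuid_buf, 0, sizeof(uuid_buf));

                rc = llapi_obd_statfs(argv[1], LL_STATFS_LOV, index,
                                      &stat_buf, &uuid_buf);
                if (rc < 0)
                        break;  /* simplification: stop at the first error */

                printf("OST%04x state:%s%s\n", index,
                       stat_buf.os_state & OS_STATE_DEGRADED ? " D" : "",
                       stat_buf.os_state & OS_STATE_READONLY ? " R" : "");
        }

        return 0;
}

The tunable can be cleared again afterwards with "lctl set_param obdfilter.*.degraded=0" so the test leaves the OST in its normal state.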
| Comment by Jian Yu [ 21/Oct/16 ] |
|
Sure, Andreas. I'll create the patch. |
| Comment by Gerrit Updater [ 24/Oct/16 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/23330 |
| Comment by Gerrit Updater [ 06/Apr/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23330/ |
| Comment by Peter Jones [ 06/Apr/17 ] |
|
Landed for 2.10 |