Details
- Question/Request
- Resolution: Done
- Minor
- None
- None
- Lustre: Build Version: 2.8.0_5.chaos
Description
On a DNE file system, MDT0000 ran out of space while one or more other MDTs were in recovery.
2016-10-31 18:26:53 [20537.964631] Lustre: Skipped 1 previous similar message
2016-10-31 18:26:58 [20542.793836] LustreError: 31561:0:(osd_handler.c:223:osd_trans_start()) lsh-MDT0000: failed to start transaction due to ENOSPC. Metadata overhead is underestimated or grant_ratio is too low.
2016-10-31 18:26:58 [20542.815473] LustreError: 31561:0:(osd_handler.c:223:osd_trans_start()) Skipped 39 previous similar messages
2016-10-31 18:26:58 [20542.827434] LustreError: 31561:0:(llog_cat.c:744:llog_cat_cancel_records()) lsh-OST0009-osc-MDT0000: fail to cancel 1 of 1 llog-records: rc = -28
2016-10-31 18:26:58 [20542.843771] LustreError: 31561:0:(osp_sync.c:1031:osp_sync_process_committed()) lsh-OST0009-osc-MDT0000: can't cancel record: -28
Obviously the first step is to increase the capacity of the pool. However, after that is done, is further action required? Should I run lfsck, or do anything else?
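For reference, the `rc = -28` and `-28` values in the messages above are negated errno codes, and 28 is ENOSPC, which matches the "failed to start transaction due to ENOSPC" message. A minimal sketch for decoding such codes when scanning console logs follows; the helper name `decode_rc` and the regular expression are illustrative assumptions, not part of any Lustre tooling, and the pattern only covers the two message shapes quoted in this ticket.

```python
import errno
import os
import re

# Matches return codes in the two shapes seen in the log excerpt above:
# "rc = -28" and "can't cancel record: -28". Other message formats are
# not covered by this (hypothetical) pattern.
RC_PATTERN = re.compile(r"(?:rc = |record: )(-\d+)")


def decode_rc(line):
    """Return a list of (errno name, description) for each negative rc in a log line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = -int(match.group(1))  # kernel-style rc is the negated errno
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((name, os.strerror(code)))
    return results


sample = "lsh-OST0009-osc-MDT0000: fail to cancel 1 of 1 llog-records: rc = -28"
for name, description in decode_rc(sample):
    print(name, "-", description)
```

The lookup uses the errno table of the machine running the script, so it should be run on the same kind of platform that produced the log (Linux here).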
Issue Links
- is related to LU-8753 Recovery already passed deadline with DNE (Resolved)
Generally, if the llog grew after the reboot, it means there was something to be recovered. But as long as recovery completed successfully after the reboot, your namespace should be in a consistent state even if you removed the llogs, unless there was some inconsistency before the reboot (which should be a very rare case with a ZFS backend).