Details
-
Question/Request
-
Resolution: Done
-
Minor
-
Lustre: Build Version: 2.8.0_5.chaos
Description
On a DNE file system, MDT0000 ran out of space while one or more other MDTs were in recovery.
2016-10-31 18:26:53 [20537.964631] Lustre: Skipped 1 previous similar message
2016-10-31 18:26:58 [20542.793836] LustreError: 31561:0:(osd_handler.c:223:osd_trans_start()) lsh-MDT0000: failed to start transaction due to ENOSPC. Metadata overhead is underestimated or grant_ratio is too low.
2016-10-31 18:26:58 [20542.815473] LustreError: 31561:0:(osd_handler.c:223:osd_trans_start()) Skipped 39 previous similar messages
2016-10-31 18:26:58 [20542.827434] LustreError: 31561:0:(llog_cat.c:744:llog_cat_cancel_records()) lsh-OST0009-osc-MDT0000: fail to cancel 1 of 1 llog-records: rc = -28
2016-10-31 18:26:58 [20542.843771] LustreError: 31561:0:(osp_sync.c:1031:osp_sync_process_committed()) lsh-OST0009-osc-MDT0000: can't cancel record: -28
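For readers unfamiliar with the numeric codes, the "rc = -28" and ": -28" values in the llog messages are the negated Linux errno for ENOSPC, the same condition named in the osd_trans_start() message. A minimal Python sketch for decoding such codes from console output follows; the helper name decode_return_codes and the regular expression are illustrative assumptions, not part of any Lustre tooling, and the symbolic names come from the errno table of the platform running the script.

```python
import errno
import os
import re

# Matches the negative return codes as they appear in the log above,
# either as "rc = -28" or as a trailing ": -28".
RC_PATTERN = re.compile(r"(?:rc = |: )-(\d+)\b")


def decode_return_codes(log_text):
    """Return a sorted list of (code, symbolic name) pairs found in log_text.

    Hypothetical helper for illustration only.
    """
    found = set()
    for match in RC_PATTERN.finditer(log_text):
        code = int(match.group(1))
        found.add((code, errno.errorcode.get(code, "UNKNOWN")))
    return sorted(found)


# Two message tails copied from the log in this ticket.
sample = (
    "lsh-OST0009-osc-MDT0000: fail to cancel 1 of 1 llog-records: rc = -28\n"
    "lsh-OST0009-osc-MDT0000: can't cancel record: -28\n"
)

for code, name in decode_return_codes(sample):
    # os.strerror gives the platform's human-readable description.
    print(f"-{code} -> {name}: {os.strerror(code)}")
```

On Linux both records decode to ENOSPC, so the llog cancel failures are a consequence of the full MDT rather than a separate fault.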
Obviously the first step is to increase the capacity of the pool. However, after that is done, is further action required? Should I run lfsck, or do anything else?
Issue Links
- is related to LU-8753 Recovery already passed deadline with DNE (Resolved)
The basic advice, to delete the update logs and then run lfsck, is a sufficient answer.
This occurred during DNE2 testing with Lustre 2.8, which we have decided not to pursue any further. Instead, we will test DNE2 when we start testing Lustre 2.10.x. We will therefore try the advice only if we encounter the problem again, and in that case we will file a new ticket.