[LU-8953] ZFS-MDT 100% full. Request for verification of plan to fix Created: 19/Dec/16 Updated: 20/Dec/16 Resolved: 20/Dec/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Peter Bortas | Assignee: | Nathaniel Clark |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Centos 6, Lustre from llnl chaos branch |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The MDT for one of our filesystems is full, and it's not possible to delete any files, which renders the filesystem unusable from the users' point of view. It's possible to manually track files that could be deleted via FID to ZFS objects on the disk, but we haven't found a way to delete objects via zdb. A recovery procedure using something like that would probably be good to have if more people run into this.

It's almost Christmas vacation, so let's keep this simple and low risk. I've thrown some more disks into the MDS. Given that the filesystem with problems looks like this:

lustre-mdt0 ONLINE 0 0 0

would it be safe (and fix the problem) to expand it by adding another mirror?

zpool add lustre-mdt0 mirror /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt

(This is probably the same issue as |
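A cautious way to approach the expansion asked about above can be sketched in shell: read the pool's capacity, then ask `zpool add -n` to print the layout that would result without modifying the pool. The pool name and device paths come from the ticket. The helper function names and the 95% threshold are assumptions made for illustration, and this is a sketch rather than a tested recovery procedure.

```shell
#!/bin/sh
# Sketch only. POOL and the device paths are taken from the ticket;
# the helper names and the threshold passed by the caller are hypothetical.

POOL="lustre-mdt0"

# Succeed (exit status 0) when a capacity string such as "100%" is at or
# above the given integer threshold.
cap_at_or_above() {
    cap=${1%\%}              # strip the trailing percent sign, if any
    [ "$cap" -ge "$2" ]
}

# Print the pool's capacity column, e.g. "100%". Requires the ZFS tools.
pool_capacity() {
    zpool list -H -o capacity "$1"
}

# Dry run: display the configuration that adding the mirror would produce.
# The -n flag means no vdevs are actually added.
preview_add_mirror() {
    zpool add -n "$1" mirror "$2" "$3"
}
```

For example, `cap_at_or_above "$(pool_capacity "$POOL")" 95 && preview_add_mirror "$POOL" /dev/exp_sdq/mdt_fouo6exp_sdq /dev/exp_sdt/mdt_fouo6exp_sdt` would print the proposed layout only when the pool is nearly full. Nothing in the sketch changes the pool.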
| Comments |
| Comment by Peter Bortas [ 19/Dec/16 ] |
|
Addendum: Our intention is to expand the pool without shutting down Lustre. Either way should be fine, but expanding it live, and giving Lustre at least the chance to complete any outstanding operations, feels like the sounder approach. Please let us know if you disagree. |
| Comment by Peter Jones [ 19/Dec/16 ] |
|
Nathaniel, could you please advise? Thanks, Peter |
| Comment by Nathaniel Clark [ 19/Dec/16 ] |
|
zino, I know taking the FS down and growing the MDT will alleviate your issue. I think growing the MDT live will be okay, but I would want to double check (run a test locally) before I could bless that course of action. |
| Comment by Peter Bortas [ 20/Dec/16 ] |
|
Nathaniel, When do you think you could run that test? |
| Comment by Peter Bortas [ 20/Dec/16 ] |
|
After talking through the failure scenarios of shutting down the FS in this state, we decided to expand the pool after unmounting, since that is the procedure you know works. It seems to have worked without any failures I can detect so far. For the record, this is what we did: umount lustre-mdt0/fouo6 |
| Comment by Nathaniel Clark [ 20/Dec/16 ] |
|
zino, I'm glad that worked for you. I'll close this bug, seeing as you've completed your expansion of the MDT. If I'm mistaken and you need something else from this bug, please feel free to re-open it. |