[LU-5662] mkfs.lustre --replace can cause index numbers to be locked out Created: 17/Sep/14 Updated: 06/Jan/16 Resolved: 06/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alexander Mitin | Assignee: | John Fuchs-Chesney (Inactive) |
| Resolution: | Low Priority | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 6.5, Lustre 2.5.3 |
||
| Severity: | 3 |
| Rank (Obsolete): | 15764 |
| Description |
|
Running mkfs.lustre with the --replace flag causes a difficult-to-untangle lockout if the Lustre MDS has never seen the index before.

I stumbled across this while scripting the reformatting operations of a test system. In the process, I formatted a completely new OST for the first time using the --replace flag. The subsequent attempt to mount the OST on the OSS failed with the following message:

mount.lustre: mount /dev/sdae at /lustre/ost6 failed: No such file or directory

Once this has happened, the MDS seems to be stuck in a halfway state, and there is no easy way to recover use of the index number. If you reformat again with --replace specified, the error persists. If you reformat again without --replace, you get the following error on mount:

mount.lustre: mount /dev/sdac at /lustre/ost4 failed: Address already in use

The workaround is to unmount all Lustre disks, shut down Lustre with lustre_rmmod, and then remount all disks.

This is similar to the -U versus -i flags of the rpm command. Technically, -U should not be used if the RPM has never been installed before, but rpm is intelligent enough to interpret -U as -i when there is no previously installed version of the RPM. What rpm should certainly NOT do is treat a technical misuse of -U on an uninstalled package as a persistent error that prevents subsequent use of either -U or -i on that package without a full reboot of Linux.

There are numerous ways to fix this. The simplest from a user perspective is this: if a formatted OST is flagged with --replace, then when it is mounted, the MDS should recognize that it has never seen the index before and simply ignore the --replace flag. This would be the analogue of rpm seeing the -U flag, recognizing that it has never seen the RPM before, and treating -U as -i. |
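The proposed handling can be sketched as a small decision function. This is only a toy model of the behavior requested above, not Lustre's actual registration code: the names `Action`, `decide_registration`, and `known_indices` are hypothetical, and the "reject" branch is assumed from the "Address already in use" error reported in the description.

```python
from enum import Enum


class Action(Enum):
    """Possible outcomes when a target tries to register an index (hypothetical)."""
    REGISTER_NEW = "register as a new target"
    REPLACE_EXISTING = "replace the existing target"
    REJECT_IN_USE = "reject: address already in use"


def decide_registration(known_indices, index, replace_requested):
    """Toy model of the proposed handling; not Lustre's actual logic."""
    if index not in known_indices:
        # Proposed behavior: an index the server has never seen is always a
        # first-time registration, so a stray --replace is ignored
        # (the analogue of rpm treating -U as -i).
        return Action.REGISTER_NEW
    if replace_requested:
        # The index is known and the caller asked to replace it.
        return Action.REPLACE_EXISTING
    # The index is known and no replacement was requested (assumed to
    # correspond to the "Address already in use" mount error).
    return Action.REJECT_IN_USE


known = {0, 1, 2, 3}
print(decide_registration(known, 6, True).name)
print(decide_registration(known, 2, True).name)
print(decide_registration(known, 2, False).name)
```

The point of the sketch is that the "unknown index" check comes first, so no combination of flags on a never-seen index can leave that index unusable.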
| Comments |
| Comment by John Fuchs-Chesney (Inactive) [ 17/Sep/14 ] |
|
Joseph, Can you let us know if you are using IML? (Generally we'd expect IML to replace the need for this kind of scripting). Thanks, |
| Comment by Joseph Nemeth (Inactive) [ 17/Sep/14 ] |
|
Not using IML. |
| Comment by Andreas Dilger [ 17/Sep/14 ] |
|
Joseph, |
| Comment by Joseph Nemeth (Inactive) [ 17/Sep/14 ] |
|
Understood. I'm in a test environment where I will be breaking down and building up Lustre frequently, and after testing, anticipate needing to restart from zero: the simplest way to do this is to reformat the disks, which would normally include the MDTs as well. However, in stumbling through my first attempts to make this easier for myself in the future, I tripped over this bug, where using --replace in the case of a disk which has never before been formatted effectively burns the index number: you can't recover from the mistake by reformatting, because, somehow, Lustre "remembers" that you screwed up. I thought it should be documented. |
| Comment by Andreas Dilger [ 18/Sep/14 ] |
|
Thanks for the update. I wasn't sure if there was some use that we hadn't anticipated, so it is good to know that this is an unlikely scenario. I think this should be fixed at some point since it may be possible to hit this under normal usage also, but it doesn't sound urgent. |
| Comment by Joseph Nemeth (Inactive) [ 19/Sep/14 ] |
|
No problem. And I agree, it isn't urgent. |
| Comment by John Fuchs-Chesney (Inactive) [ 19/Nov/14 ] |
|
Is this ticket ready to be marked as resolved? Thanks, |
| Comment by Joseph Nemeth (Inactive) [ 19/Nov/14 ] |
|
I've not seen any resolution. It's still a (low priority) bug. |
| Comment by John Fuchs-Chesney (Inactive) [ 19/Nov/14 ] |
|
Assigned back to me. |
| Comment by John Fuchs-Chesney (Inactive) [ 06/Jan/16 ] |
|
I'm marking this as resolved/low priority, since it is a rare/unanticipated case, and an unintended use of the mkfs.lustre --replace command. If anyone disagrees, please shout. ~ jfc. |