[LU-2718] Unable to re-mount OST (-17) Created: 30/Jan/13 Updated: 11/Oct/17 Resolved: 11/Oct/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Robert Read (Inactive) | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 6611 |
| Description |
|
Seen in a 2.3.58 build. The error messages here are similar to

2013-01-30 22:11:47,707 [1494] shell.run: ['mount', '-t', 'lustre', '/dev/sdf', '/mnt/targets/scratch-OST0009']
2013-01-30 22:11:47,948 [1494] ERR> mount.lustre: mount /dev/xvdj at /mnt/targets/scratch-OST0009 failed: File exists

Messages:

LustreError: 1634:0:(genops.c:318:class_newdev()) Device scratch-MDT0000-osp-OST0009 already exists at 4, won't add
LustreError: 1634:0:(obd_config.c:374:class_attach()) Cannot create device scratch-MDT0000-osp-OST0009 of type osp : -17
LustreError: 1634:0:(obd_mount.c:373:lustre_start_simple()) scratch-MDT0000-osp-OST0009 attach error -17
LustreError: 1634:0:(obd_mount.c:1137:lustre_osp_setup()) scratch-MDT0000-osp-OST0009: setup up failed: rc -17
LustreError: 15c-8: MGC10.197.11.79@tcp: The configuration from log 'scratch-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 1598:0:(obd_mount.c:1865:server_start_targets()) scratch-OST0009: failed to start OSP: -17
LustreError: 1598:0:(obd_mount.c:2400:server_fill_super()) Unable to start targets: -17
LustreError: 1598:0:(obd_mount.c:1365:lustre_disconnect_osp()) Can't find osp-on-ost scratch-MDT0000-osp-OST0009
LustreError: 1598:0:(obd_mount.c:2114:server_put_super()) scratch-OST0009: failed to disconnect osp-on-ost (rc=-2)!
Lustre: Failing over scratch-OST0009
LustreError: 1598:0:(obd_mount.c:1420:lustre_stop_osp()) Can not find osp-on-ost scratch-MDT0000-osp-OST0009
LustreError: 1598:0:(obd_mount.c:2159:server_put_super()) scratch-OST0009: Fail to stop osp-on-ost!
Lustre: server umount scratch-OST0009 complete
LustreError: 1598:0:(obd_mount.c:2988:lustre_fill_super()) Unable to mount (-17)

From reading the debug log (attached), the osp device is not cleaned up until a few seconds after the initial umount completes. In the meantime, there were two unsuccessful mount attempts.
After the osp device was cleaned up, I was able to mount. Would it be possible to block umount on this device cleanup? |
| Comments |
| Comment by Brian Murrell (Inactive) [ 30/Jan/13 ] |
|
I've seen this in 2.1.x also. I ended up just putting mount in a loop. I thought I had filed a bug about it but can't seem to find it any more. |
| Comment by Alex Zhuravlev [ 31/Jan/13 ] |
|
We hit this in some conf-sanity tests; a workaround was to unload the modules, but of course that isn't really a solution. |
| Comment by Robert Read (Inactive) [ 31/Jan/13 ] |
|
Obviously unloading modules isn't particularly convenient when there is more than one target on the node, but luckily it doesn't appear to be necessary as eventually the device does get cleaned up. It looks like it's kept alive waiting on the export, btw. For now I added a sleep and retry, like Brian it seems. |
| Comment by Andreas Dilger [ 05/Feb/13 ] |
|
There is also a "--retry" option to mount.lustre that can be used instead of having an external loop. It will sleep an increasing interval from 0s up to 5s between retries. |
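For reference, the external sleep-and-retry workaround mentioned in the comments above could look roughly like the sketch below. It is only an illustration: `mount_with_retry` and `retry_delays` are hypothetical names, the linear 0s-to-5s delay ramp is an assumption modelled on the description of the retry interval rather than the actual mount.lustre schedule, and the loop retries on any non-zero exit status, not only on "File exists" (-17).

```python
import subprocess
import time


def retry_delays(attempts, max_delay=5.0):
    """Return the pause before each retry: 0s first, rising linearly to max_delay.

    The linear ramp is an assumption for illustration only.
    """
    retries = attempts - 1
    if retries <= 0:
        return []
    if retries == 1:
        return [0.0]
    return [max_delay * i / (retries - 1) for i in range(retries)]


def mount_with_retry(device, mountpoint, attempts=5,
                     run=subprocess.call, sleep=time.sleep):
    """Try to mount a Lustre target, retrying while the old osp device lingers.

    Returns the number of the attempt that succeeded and raises RuntimeError
    if every attempt fails. `run` and `sleep` are injectable so the loop can
    be exercised without a real mount.
    """
    cmd = ["mount", "-t", "lustre", device, mountpoint]
    delays = retry_delays(attempts)
    for attempt in range(1, attempts + 1):
        rc = run(cmd)  # exit status of the mount command; 0 means success
        if rc == 0:
            return attempt
        if attempt < attempts:
            # Give the previous umount time to finish cleaning up the device.
            sleep(delays[attempt - 1])
    raise RuntimeError(
        "mount of %s at %s failed after %d attempts" % (device, mountpoint, attempts)
    )
```

A call such as `mount_with_retry("/dev/sdf", "/mnt/targets/scratch-OST0009")` would then stand in for the single mount invocation shown in the description. This only papers over the delayed osp cleanup; it does not make umount wait for it.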