[LU-2718] Unable to re-mount OST (-17) Created: 30/Jan/13  Updated: 11/Oct/17  Resolved: 11/Oct/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Robert Read (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6611

 Description   

Seen in a 2.3.58 build.

The error messages here are similar to LU-2110, but these occur when mounting a target right after a umount. This is easily reproducible in my test environment.

2013-01-30 22:11:47,707 [1494] shell.run: ['mount', '-t', 'lustre', '/dev/sdf', '/mnt/targets/scratch-OST0009']
2013-01-30 22:11:47,948 [1494] ERR> mount.lustre: mount /dev/xvdj at /mnt/targets/scratch-OST0009 failed: File exists

Messages:

LustreError: 1634:0:(genops.c:318:class_newdev()) Device scratch-MDT0000-osp-OST0009 already exists at 4, won't add
LustreError: 1634:0:(obd_config.c:374:class_attach()) Cannot create device scratch-MDT0000-osp-OST0009 of type osp : -17
LustreError: 1634:0:(obd_mount.c:373:lustre_start_simple()) scratch-MDT0000-osp-OST0009 attach error -17
LustreError: 1634:0:(obd_mount.c:1137:lustre_osp_setup()) scratch-MDT0000-osp-OST0009: setup up failed: rc -17
LustreError: 15c-8: MGC10.197.11.79@tcp: The configuration from log 'scratch-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 1598:0:(obd_mount.c:1865:server_start_targets()) scratch-OST0009: failed to start OSP: -17
LustreError: 1598:0:(obd_mount.c:2400:server_fill_super()) Unable to start targets: -17
LustreError: 1598:0:(obd_mount.c:1365:lustre_disconnect_osp()) Can't find osp-on-ost scratch-MDT0000-osp-OST0009
LustreError: 1598:0:(obd_mount.c:2114:server_put_super()) scratch-OST0009: failed to disconnect osp-on-ost (rc=-2)!
Lustre: Failing over scratch-OST0009
LustreError: 1598:0:(obd_mount.c:1420:lustre_stop_osp()) Can not find osp-on-ost scratch-MDT0000-osp-OST0009
LustreError: 1598:0:(obd_mount.c:2159:server_put_super()) scratch-OST0009: Fail to stop osp-on-ost!
Lustre: server umount scratch-OST0009 complete
LustreError: 1598:0:(obd_mount.c:2988:lustre_fill_super()) Unable to mount  (-17)

From reading the debug log (attached), the osp device is not cleaned up until a few seconds after the initial umount completes. In the meantime, there were two unsuccessful mount attempts. Once the osp device had been cleaned up, I was able to mount.

Would it be possible for umount to block until this device has been cleaned up?
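Until umount itself waits for the cleanup, one userspace option is to poll the device list after umount returns and remount only once the stale osp device is gone. The following is a minimal Python sketch. It assumes that `lctl dl` is on PATH and that the device name (for example `scratch-MDT0000-osp-OST0009`) appears as one whitespace-separated field in its output. `lustre_device_present`, `wait_for_device_cleanup` and their parameters are hypothetical helpers, not part of Lustre.

```python
import subprocess
import time


def lustre_device_present(name, list_devices=None):
    """Return True if `name` is a field on any line of the device listing.

    `list_devices` is an injectable callable returning the listing text.
    By default it runs `lctl dl` (assumption: lctl is installed and on PATH).
    """
    if list_devices is None:
        def list_devices():
            return subprocess.run(["lctl", "dl"], capture_output=True,
                                  text=True, check=True).stdout
    for line in list_devices().splitlines():
        # Match the exact token so "<name>_UUID" does not count as the device.
        if name in line.split():
            return True
    return False


def wait_for_device_cleanup(name, timeout=30.0, interval=0.5,
                            list_devices=None, sleep=time.sleep,
                            clock=time.monotonic):
    """Poll until `name` disappears; return False if `timeout` seconds pass first."""
    deadline = clock() + timeout
    while lustre_device_present(name, list_devices):
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

A caller would run umount, then `wait_for_device_cleanup("scratch-MDT0000-osp-OST0009")`, and only then issue the mount. The injectable `list_devices`, `sleep` and `clock` parameters let the polling logic be exercised without a Lustre node.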



 Comments   
Comment by Brian Murrell (Inactive) [ 30/Jan/13 ]

I've seen this in 2.1.x also. I ended up just putting mount in a loop. I thought I had filed a bug about it but can't seem to find it any more.

Comment by Alex Zhuravlev [ 31/Jan/13 ]

We did hit this in some conf-sanity tests; a workaround was to unload the modules, but that isn't really a solution, of course.

Comment by Robert Read (Inactive) [ 31/Jan/13 ]

Obviously, unloading modules isn't particularly convenient when there is more than one target on the node, but luckily it doesn't appear to be necessary, since the device does eventually get cleaned up. It looks like it is kept alive waiting on the export, btw. For now I added a sleep and retry, as Brian did, it seems.
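The sleep-and-retry workaround described in these comments can be sketched as a small wrapper around the mount command from the description. This is a minimal Python sketch. It assumes the stale-device race can be recognized by the "File exists" text that mount.lustre printed in the log. `mount_with_retry`, its parameters and the 0/1/2/4/5 s delay schedule are hypothetical choices and are not mount.lustre's own retry logic.

```python
import subprocess
import time


def mount_with_retry(cmd, max_attempts=5, max_delay=5.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run `cmd` until it succeeds, retrying only on "File exists" (-17, EEXIST).

    Returns the number of attempts used; raises RuntimeError otherwise.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = 0.0
    for attempt in range(1, max_attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        # Only the stale osp device race is worth retrying; any other error,
        # or running out of attempts, is reported immediately.
        if "File exists" not in result.stderr or attempt == max_attempts:
            raise RuntimeError(
                f"mount failed after {attempt} attempt(s): {result.stderr.strip()}")
        sleep(delay)
        # Hypothetical schedule: 0s, 1s, 2s, 4s, then capped at max_delay.
        delay = min(max_delay, max(1.0, delay * 2))


if __name__ == "__main__":
    # Same command as in the description (requires root and a Lustre target).
    mount_with_retry(["mount", "-t", "lustre", "/dev/sdf",
                      "/mnt/targets/scratch-OST0009"])
```

Retrying only on "File exists" keeps genuine configuration errors from being hidden by the loop. The `run` and `sleep` parameters exist so the loop can be tested without a Lustre target.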

Comment by Andreas Dilger [ 05/Feb/13 ]

There is also a "--retry" option to mount.lustre that can be used instead of an external loop. It sleeps for an increasing interval, from 0s up to 5s, between retries.
