[LU-2110] Unable to mount (-17) MDT Created: 08/Oct/12 Updated: 19/Apr/13 Resolved: 19/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | topsequoia | ||
| Environment: |
Lustre: 2.3.53-2chaos |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4642 |
| Description |
|
After updating to 2.3.53-2chaos, the MDS is no longer able to mount its MDT. The relevant console messages:
Lustre: Found index 0 for lstest-MDT0000, updating log
LustreError: 33410:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
LustreError: 33836:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
LustreError: 33836:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
LustreError: 33836:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 33405:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
Lustre: lstest-MDT0000: Unable to start target: -17
Lustre: Failing over lstest-MDT0000
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880faa7f0800 x1415277549978464/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32690:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0182-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) Skipped 253 previous similar messages
Lustre: server umount lstest-MDT0000 complete
LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount (-17)
I'm just about to start looking into the root cause. |
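The negative numbers in the console messages above are kernel-style return codes, that is, negated Linux errno values: -17 is EEXIST, which matches the "already exists" message from class_newdev(). A minimal sketch for decoding them from a pasted log line follows. The helper name `decode_return_codes` and the regular expression are illustrative assumptions, not part of Lustre, and the table covers only the four codes that appear in this ticket.

```python
import re

# Linux errno values for the codes seen in this ticket (illustrative subset).
LINUX_ERRNO = {
    2: "ENOENT",   # No such file or directory
    5: "EIO",      # Input/output error
    11: "EAGAIN",  # Resource temporarily unavailable
    17: "EEXIST",  # File exists
}

# A negative integer that is not glued to a preceding word character or '/',
# so "2.3.53-2chaos", "11-0:" and "rc 0/-1" are deliberately not matched.
NEGATIVE_RC = re.compile(r"(?<![\w/])-(\d+)\b")


def decode_return_codes(line):
    """Return (code, errno_name) pairs for negative return codes in one log line."""
    decoded = []
    for match in NEGATIVE_RC.finditer(line):
        value = int(match.group(1))
        decoded.append((-value, LINUX_ERRNO.get(value, "UNKNOWN")))
    return decoded


if __name__ == "__main__":
    line = ("LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create "
            "device lstest-MDT0000-osp-MDT0000 of type osp : -17")
    print(decode_return_codes(line))
```

Codes outside the small table are reported as UNKNOWN rather than guessed.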
| Comments |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
|
Is this on a clean node (after reboot)? Can you unload all Lustre modules (may take some time) and try again? |
| Comment by Prakash Surya (Inactive) [ 08/Oct/12 ] |
|
Originally, this was on a clean reboot. But the messages I pasted in the description were from a manually retried mount, after the first failed. Here are all the messages from the console:
Lustre: Lustre: Build Version: 2.3.53-2chaos-2chaos--PRISTINE-2.6.32-220.23.1.1chaos.ch5.x86_64
Lustre: Found index 0 for lstest-MDT0000, updating log
LustreError: 32758:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
LustreError: 32761:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
LustreError: 33225:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
LustreError: 33225:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
LustreError: 33225:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
LustreError: 33225:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 32758:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
Lustre: lstest-MDT0000: Unable to start target: -17
Lustre: Failing over lstest-MDT0000
LustreError: 32680:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880f977bb800 x1415277549977961/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32680:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0181-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32681:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff882016d06c00 x1415277549977962/t0(0) o13->lstest-OST0182-osc-MDT0000@172.20.2.186@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32682:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0183-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32682:0:(osp_precreate.c:116:osp_statfs_interpret()) Skipped 1 previous similar message
LustreError: 32680:0:(osp_precreate.c:116:osp_statfs_interpret()) Skipped 125 previous similar messages
Lustre: server umount lstest-MDT0000 complete
LustreError: 32758:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount (-17)
Lustre: Found index 0 for lstest-MDT0000, updating log
LustreError: 33410:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
LustreError: 33836:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
LustreError: 33836:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
LustreError: 33836:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 33405:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
Lustre: lstest-MDT0000: Unable to start target: -17
Lustre: Failing over lstest-MDT0000
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880faa7f0800 x1415277549978464/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32690:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0182-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) Skipped 253 previous similar messages
Lustre: server umount lstest-MDT0000 complete
LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount (-17)
Is it still worth rebooting, and trying again? |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
|
Well, sorry you're seeing this... Could you please try again and attach the Lustre log to the ticket? |
| Comment by Prakash Surya (Inactive) [ 08/Oct/12 ] |
|
Rebooted and collected the lustre log file. |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
|
Thanks. I see the root cause and am working on the fix. |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
|
Prakash, please try with http://review.whamcloud.com/#change,4227. If I understand right, the failover NID for the MDS was specified at mkfs.lustre time, not added later? |
| Comment by Prakash Surya (Inactive) [ 08/Oct/12 ] |
|
Actually, I'm not certain of that. At one point a failover NID was added using a writeconf, but the filesystem was reformatted since then. During the reformat, I'm unsure if both the failover NIDs were specified at mkfs time, or the writeconf method was used after mkfs. I can try to track down that information if it is useful..? |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
|
One way is to fetch the /CONFIGS/lstest-client file from the MDS and parse it with the llog_reader utility. |
| Comment by Prakash Surya (Inactive) [ 08/Oct/12 ] |
|
Here you go. The dump of:
# grove-mds2 /mnt/grove-mds2/mgs > llog_reader CONFIGS/lstest-client > lstest-client.llogreader |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
|
#12 (128)attach 0:lstest-MDT0000-mdc 1:mdc 2:lstest-clilmv_UUID #21 resulted in a second instance of OSP device. I think the patch above should help with the issue. |
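When reading a long llog_reader dump by eye, it can help to list device names that are attached by more than one record. The sketch below does only that. The record shape is assumed from the single attach record quoted above, the sample input is hypothetical (the `demo-` names and record numbers are invented, not taken from the attached dump), and whether a repeated attach is the exact trigger in this ticket is an assumption. It does not reproduce the logic of the patch.

```python
import re
from collections import defaultdict

# Record shape assumed from the one record quoted in this ticket:
#   #12 (128)attach 0:lstest-MDT0000-mdc 1:mdc 2:lstest-clilmv_UUID
ATTACH_RE = re.compile(r"#(\d+)\s+\(\d+\)\s*attach\s+0:(\S+)\s+1:(\S+)")


def find_duplicate_attaches(dump_text):
    """Map each device name attached more than once to the record indices attaching it."""
    records_by_device = defaultdict(list)
    for match in ATTACH_RE.finditer(dump_text):
        index, device, _obd_type = match.groups()
        records_by_device[device].append(int(index))
    return {dev: idxs for dev, idxs in records_by_device.items() if len(idxs) > 1}


if __name__ == "__main__":
    # Hypothetical sample; names and record numbers are invented for illustration.
    sample = (
        "#05 (128)attach 0:demo-MDT0000-mdc 1:mdc 2:demo-clilmv_UUID\n"
        "#06 (136)setup 0:demo-MDT0000-mdc 1:demo-MDT0000_UUID 2:10.0.0.1@tcp\n"
        "#09 (128)attach 0:demo-MDT0000-mdc 1:mdc 2:demo-clilmv_UUID\n"
    )
    print(find_duplicate_attaches(sample))
```

An empty result only means no device name was attached twice in the text given; it says nothing about other configuration problems.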
| Comment by Christopher Morrone [ 08/Oct/12 ] |
|
Ok, we've pulled in patch http://review.whamcloud.com/#change,4227 and will give it a try. |
| Comment by Ian Colle (Inactive) [ 09/Oct/12 ] |
|
Patch landed to master. |
| Comment by Alex Zhuravlev [ 12/Oct/12 ] |
|
can we close the ticket? |
| Comment by Prakash Surya (Inactive) [ 12/Oct/12 ] |
|
Sure. |