[LU-2110] Unable to mount (-17) MDT
Created: 08/Oct/12  Updated: 19/Apr/13  Resolved: 19/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Critical
Reporter: Prakash Surya (Inactive)
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: topsequoia
Environment:

Lustre: 2.3.53-2chaos


Attachments: File LU-2110.llog.bz2     File lstest-client.llogreader.bz2    
Severity: 3
Rank (Obsolete): 4642

 Description   

After updating to 2.3.53-2chaos, the MDS is no longer able to mount its MDT. The relevant console messages:

Lustre: Found index 0 for lstest-MDT0000, updating log
LustreError: 33410:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
LustreError: 33836:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
LustreError: 33836:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
LustreError: 33836:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 33405:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
Lustre: lstest-MDT0000: Unable to start target: -17
Lustre: Failing over lstest-MDT0000
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880faa7f0800 x1415277549978464/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32690:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0182-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) Skipped 253 previous similar messages
Lustre: server umount lstest-MDT0000 complete
LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount  (-17)

I'm just about to start looking into the root cause.
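
The negative numbers in the console messages above are kernel-style return codes, that is, negated errno values. As a reading aid, here is a minimal Python sketch that decodes them. The table holds the standard Linux values for the four codes that appear in this ticket. The names `LINUX_ERRNO`, `RC_PATTERN`, and `decode_return_codes` are illustrative assumptions and not part of Lustre.

```python
import re

# Standard Linux errno values for the codes seen in this ticket.
# Hardcoded (rather than taken from the errno module) so the sketch
# gives the same answer on any platform.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    5: ("EIO", "Input/output error"),
    11: ("EAGAIN", "Resource temporarily unavailable"),
    17: ("EEXIST", "File exists"),
}

# Return codes show up as "(-17)", ": -17", "rc -17", "rc = -5" or "with -11".
RC_PATTERN = re.compile(r"(?:\(|: |rc |rc = |with )-(\d+)\b")


def decode_return_codes(line):
    """Return (code, name, description) for each known errno in a log line."""
    found = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        if code in LINUX_ERRNO:
            name, description = LINUX_ERRNO[code]
            found.append((-code, name, description))
    return found


if __name__ == "__main__":
    sample = ("LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) "
              "Unable to mount  (-17)")
    for code, name, description in decode_return_codes(sample):
        print(code, name, description)
```

Read this way, -17 (EEXIST) agrees with the "already exists at 136, won't add" message from class_newdev().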



 Comments   
Comment by Alex Zhuravlev [ 08/Oct/12 ]

Is this on a clean node (after reboot)? Can you unload all Lustre modules (this may take some time) and try again?

Comment by Prakash Surya (Inactive) [ 08/Oct/12 ]

Originally, this was on a clean reboot. But the messages I pasted in the description were from a manually retried mount, after the first one failed.

Here are all the messages from the console:

Lustre: Lustre: Build Version: 2.3.53-2chaos-2chaos--PRISTINE-2.6.32-220.23.1.1chaos.ch5.x86_64
Lustre: Found index 0 for lstest-MDT0000, updating log
LustreError: 32758:0:(mgc_request.c:248:do_config_log_add()) failed processing sptlrpc log: -2
LustreError: 32761:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
LustreError: 33225:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
LustreError: 33225:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
LustreError: 33225:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
LustreError: 33225:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 32758:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
Lustre: lstest-MDT0000: Unable to start target: -17
Lustre: Failing over lstest-MDT0000
LustreError: 32680:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880f977bb800 x1415277549977961/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32680:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0181-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32681:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff882016d06c00 x1415277549977962/t0(0) o13->lstest-OST0182-osc-MDT0000@172.20.2.186@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32682:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0183-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32682:0:(osp_precreate.c:116:osp_statfs_interpret()) Skipped 1 previous similar message
LustreError: 32680:0:(osp_precreate.c:116:osp_statfs_interpret()) Skipped 125 previous similar messages
Lustre: server umount lstest-MDT0000 complete
LustreError: 32758:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount  (-17)
Lustre: Found index 0 for lstest-MDT0000, updating log
LustreError: 33410:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
LustreError: 33836:0:(genops.c:316:class_newdev()) Device lstest-MDT0000-osp-MDT0000 already exists at 136, won't add
LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create device lstest-MDT0000-osp-MDT0000 of type osp : -17
Lustre: lstest-MDT0000: Temporarily refusing client connection from 0@lo
LustreError: 11-0: lstest-MDT0000-osp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11
LustreError: 33836:0:(obd_mount.c:373:lustre_start_simple()) lstest-MDT0000-osp-MDT0000 attach error -17
LustreError: 33836:0:(obd_mount.c:1135:lustre_osp_setup()) lstest-MDT0000-osp-MDT0000: setup up failed: rc -17
LustreError: 15c-8: MGC172.20.5.2@o2ib500: The configuration from log 'lstest-client' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 33405:0:(obd_mount.c:1865:server_start_targets()) lstest-MDT0000: failed to start OSP: -17
Lustre: lstest-MDT0000: Unable to start target: -17
Lustre: Failing over lstest-MDT0000
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880faa7f0800 x1415277549978464/t0(0) o13->lstest-OST0181-osc-MDT0000@172.20.2.185@o2ib500:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 32690:0:(osp_precreate.c:116:osp_statfs_interpret()) lstest-OST0182-osc-MDT0000: couldn't update statfs: rc = -5
LustreError: 32689:0:(client.c:1116:ptlrpc_import_delay_req()) Skipped 253 previous similar messages
Lustre: server umount lstest-MDT0000 complete
LustreError: 33405:0:(obd_mount.c:2985:lustre_fill_super()) Unable to mount  (-17)

Is it still worth rebooting and trying again?
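
For reading a console dump like the one above, it can help to reduce the LustreError lines to their distinct call sites in first-seen order. The following is a minimal Python sketch. It assumes the `LustreError: pid:n:(file:line:function()) message` layout seen in this ticket, and the names `LINE_PATTERN` and `failure_chain` are illustrative, not part of any Lustre tool.

```python
import re

# Matches lines such as:
#   LustreError: 33836:0:(genops.c:316:class_newdev()) Device ... won't add
# Lines tagged like "LustreError: 11-0:" or "LustreError: 15c-8:" carry no
# source location and are deliberately not matched.
LINE_PATTERN = re.compile(
    r"^LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)


def failure_chain(console_lines):
    """Return distinct (file, line, function, first message) in first-seen order."""
    seen = {}
    for text in console_lines:
        match = LINE_PATTERN.match(text.strip())
        if match is None:
            continue
        key = (match["file"], int(match["line"]), match["func"])
        # Keep only the first message per call site; repeats and
        # "Skipped N previous similar messages" lines are dropped.
        seen.setdefault(key, match["msg"])
    return [(f, n, fn, msg) for (f, n, fn), msg in seen.items()]


if __name__ == "__main__":
    sample = [
        "Lustre: Found index 0 for lstest-MDT0000, updating log",
        "LustreError: 33836:0:(genops.c:316:class_newdev()) Device "
        "lstest-MDT0000-osp-MDT0000 already exists at 136, won't add",
        "LustreError: 33836:0:(obd_config.c:374:class_attach()) Cannot create "
        "device lstest-MDT0000-osp-MDT0000 of type osp : -17",
    ]
    for entry in failure_chain(sample):
        print(entry)
```

Applied to either mount attempt, this puts class_newdev() ahead of the class_attach(), lustre_start_simple(), and lustre_osp_setup() failures that follow it.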

Comment by Alex Zhuravlev [ 08/Oct/12 ]

Well, sorry you're seeing this. Could you please try again and attach the Lustre log to the ticket?

Comment by Prakash Surya (Inactive) [ 08/Oct/12 ]

Rebooted and collected the lustre log file.

Comment by Alex Zhuravlev [ 08/Oct/12 ]

Thanks. I see the root cause and am working on the fix.

Comment by Alex Zhuravlev [ 08/Oct/12 ]

Prakash, please try with http://review.whamcloud.com/#change,4227

If I understand correctly, the failover NID for the MDS was specified at mkfs.lustre time, not added later?

Comment by Prakash Surya (Inactive) [ 08/Oct/12 ]

Actually, I'm not certain of that. At one point a failover NID was added using a writeconf, but the filesystem was reformatted since then. During the reformat, I'm unsure if both the failover NIDs were specified at mkfs time, or the writeconf method was used after mkfs. I can try to track down that information if it is useful..?

Comment by Alex Zhuravlev [ 08/Oct/12 ]

One way is to fetch the /CONFIGS/lstest-client file from the MDS and parse it with the llog_reader utility.
It would help us if you attached it to the ticket as well. Thanks.

Comment by Prakash Surya (Inactive) [ 08/Oct/12 ]

Here you go. The attached dump was generated with:

# grove-mds2 /mnt/grove-mds2/mgs > llog_reader CONFIGS/lstest-client > lstest-client.llogreader

Comment by Alex Zhuravlev [ 08/Oct/12 ]

#12 (128)attach 0:lstest-MDT0000-mdc 1:mdc 2:lstest-clilmv_UUID
...
#20 (088)add_uuid nid=172.20.2.185@o2ib500(0x501f4ac1402b9) 0: 1:172.20.2.185@o2ib500
#21 (088)add_uuid nid=172.20.2.185@tcp(0x20000ac1402b9) 0: 1:172.20.2.185@o2ib500

Record #21 resulted in a second instance of the OSP device.

I think the patch above should help with the issue.
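
Records like #20 and #21 above can be picked out of a larger llog_reader dump mechanically. The Python sketch below groups `add_uuid` records by UUID and reports any UUID that has more than one NID. It assumes the dump uses exactly the line layout quoted in this comment. The names `ADD_UUID` and `nids_per_uuid` are illustrative. It only helps locate such records and does not reproduce what the patch changes.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   #21 (088)add_uuid nid=172.20.2.185@tcp(0x20000ac1402b9) 0: 1:172.20.2.185@o2ib500
ADD_UUID = re.compile(
    r"^#(?P<rec>\d+)\s+\(\d+\)add_uuid\s+"
    r"nid=(?P<nid>[^()\s]+)\(0x[0-9a-fA-F]+\)\s+0:\s+1:(?P<uuid>\S+)"
)


def nids_per_uuid(llog_lines):
    """Map each UUID with more than one add_uuid record to [(record, nid), ...]."""
    mapping = defaultdict(list)
    for text in llog_lines:
        match = ADD_UUID.match(text.strip())
        if match:
            mapping[match["uuid"]].append((int(match["rec"]), match["nid"]))
    return {uuid: recs for uuid, recs in mapping.items() if len(recs) > 1}


if __name__ == "__main__":
    sample = [
        "#12 (128)attach 0:lstest-MDT0000-mdc 1:mdc 2:lstest-clilmv_UUID",
        "#20 (088)add_uuid nid=172.20.2.185@o2ib500(0x501f4ac1402b9) 0: 1:172.20.2.185@o2ib500",
        "#21 (088)add_uuid nid=172.20.2.185@tcp(0x20000ac1402b9) 0: 1:172.20.2.185@o2ib500",
    ]
    for uuid, records in nids_per_uuid(sample).items():
        print(uuid, records)
```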

Comment by Christopher Morrone [ 08/Oct/12 ]

Ok, we've pulled in patch http://review.whamcloud.com/#change,4227 and will give it a try.

Comment by Ian Colle (Inactive) [ 09/Oct/12 ]

Patch landed to master.

Comment by Alex Zhuravlev [ 12/Oct/12 ]

Can we close the ticket?

Comment by Prakash Surya (Inactive) [ 12/Oct/12 ]

Sure.
