[LU-730] MDS Won't Mount Created: 03/Oct/11  Updated: 03/Oct/11  Resolved: 03/Oct/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Stephen Simms (Inactive) Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 6547

 Description   

[root@dc-mds01 log]# mount -t lustre /dev/vg_dc/mdt /lustre/dc/mdt
mount.lustre: mount /dev/vg_dc/mdt at /lustre/dc/mdt failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.

[root@dc-mds01 log]# tail /var/log/messages
Oct 3 16:00:07 dc-mds01 multipathd: dm-7: add map (uevent)
Oct 3 16:00:21 dc-mds01 crm_shadow: [11747]: info: Invoked: crm_shadow
Oct 3 16:00:21 dc-mds01 crm_resource: [11748]: info: Invoked: crm_resource --meta -r mgs -p target-role -v Started
Oct 3 16:00:22 dc-mds01 multipathd: dm-6: umount map (uevent)
Oct 3 16:00:32 dc-mds01 crm_shadow: [11970]: info: Invoked: crm_shadow
Oct 3 16:00:32 dc-mds01 crm_resource: [11971]: info: Invoked: crm_resource --meta -r mdt_dc -p target-role -v Started
Oct 3 16:23:10 dc-mds01 crm_shadow: [12965]: info: Invoked: crm_shadow
Oct 3 16:23:10 dc-mds01 crm_resource: [12966]: info: Invoked: crm_resource --meta -r mdt_dc -p target-role -v Started
Oct 3 16:39:36 dc-mds01 multipathd: dm-7: umount map (uevent)
Oct 3 16:39:39 dc-mds01 multipathd: dm-7: umount map (uevent)

[root@dc-mds01 log]# tail /var/log/kern
May 13 04:41:13 dc-mds01 kernel: [132780.597275] Lustre: 12660:0:(client.c:1482:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
May 13 04:41:16 dc-mds01 kernel: [132783.375612] Lustre: 12660:0:(quota_master.c:1716:mds_quota_recovery()) Only 28/30 OSTs are active, abort quota recovery
May 13 04:41:16 dc-mds01 kernel: [132783.375619] Lustre: 12660:0:(quota_master.c:1716:mds_quota_recovery()) Skipped 2 previous similar messages
May 13 04:41:16 dc-mds01 kernel: [132783.375626] Lustre: dc-OST001b-osc: Connection restored to service dc-OST001b using nid 149.165.235.234@tcp9.
May 13 04:41:16 dc-mds01 kernel: [132783.375629] Lustre: Skipped 2 previous similar messages
May 13 04:41:16 dc-mds01 kernel: [132783.376540] Lustre: MDS dc-MDT0000: dc-OST001b_UUID now active, resetting orphans
May 13 04:41:16 dc-mds01 kernel: [132783.376542] Lustre: Skipped 2 previous similar messages
May 13 04:41:40 dc-mds01 kernel: [132808.132985] Lustre: 12660:0:(quota_master.c:1716:mds_quota_recovery()) Only 28/30 OSTs are active, abort quota recovery
May 13 04:41:40 dc-mds01 kernel: [132808.133028] Lustre: dc-OST001d-osc: Connection restored to service dc-OST001d using nid 149.165.235.234@tcp9.
May 13 04:41:40 dc-mds01 kernel: [132808.133921] Lustre: MDS dc-MDT0000: dc-OST001d_UUID now active, resetting orphans



 Comments   
Comment by Stephen Simms (Inactive) [ 03/Oct/11 ]

dmesg output:
[13466.908788] LDISKFS-fs (dm-7): warning: maximal mount count reached, running e2fsck is recommended
[13466.913958] LDISKFS-fs (dm-7): mounted filesystem with ordered data mode
[13466.965528] LDISKFS-fs (dm-7): warning: maximal mount count reached, running e2fsck is recommended
[13466.970602] LDISKFS-fs (dm-7): mounted filesystem with ordered data mode
[13466.985515] Lustre: Enabling user_xattr
[13466.991612] Lustre: 13691:0:(mds_fs.c:677:mds_init_server_data()) RECOVERY: service dc-MDT0000, 4 recoverable clients, 0 delayed clients, last_transno 85899346187
[13466.995427] Lustre: dc-MDT0000: Now serving dc-MDT0000 on /dev/vg_dc/mdt with recovery enabled
[13466.995430] Lustre: dc-MDT0000: Will be in recovery for at least 5:00, or until 4 clients reconnect
[13466.995928] Lustre: 13691:0:(lproc_quota.c:448:lprocfs_quota_wr_type()) dc-MDT0000: quotaon failed because quota files don't exist, please run quotacheck firstly
[13466.995936] Lustre: dc-MDT0000.mdt: set parameter quota_type=ug
[13466.996197] Lustre: 13691:0:(mds_lov.c:1155:mds_notify()) MDS dc-MDT0000: add target dc-OST0000_UUID
[13466.996201] Lustre: 13691:0:(mds_lov.c:1155:mds_notify()) Skipped 28 previous similar messages
[13467.031748] LustreError: 13691:0:(obd_config.c:979:class_process_proc_param()) Can't parse param nosquash_nids
[13467.031753] LustreError: 13691:0:(obd_config.c:1199:class_config_llog_handler()) Err -22 on cfg command:
[13467.031758] Lustre: cmd=cf00f 0:dc-MDT0000 1:mdt.nosquash_nids
[13467.031903] LustreError: 15b-f: MGC149.165.235.235@tcp: The configuration from log 'dc-MDT0000' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
[13467.031908] LustreError: 15c-8: MGC149.165.235.235@tcp: The configuration from log 'dc-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[13467.031914] LustreError: 13571:0:(obd_mount.c:1126:server_start_targets()) failed to start server dc-MDT0000: -22
[13467.031921] LustreError: 13571:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -22
[13467.031939] Lustre: Failing over dc-MDT0000
[13467.031941] Lustre: Skipped 30 previous similar messages
[13467.048812] Lustre: dc-MDT0000: shutting down for failover; client state will be preserved.
[13467.049987] Lustre: MDT dc-MDT0000 has stopped.
[13468.686492] LustreError: 137-5: UUID 'dc-MDT0000_UUID' is not available for connect (no target)
[13468.686501] LustreError: 13622:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (19) req@ffff811809545800 x1379230428949522/t0 o38><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1317674478 ref 1 fl Interpret:/0/0 rc -19/0
[13470.046025] Lustre: server umount dc-MDT0000 complete
[13470.046033] LustreError: 13571:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-22)

Comment by Peter Jones [ 03/Oct/11 ]

Oleg will look into this one

Comment by Oleg Drokin [ 03/Oct/11 ]

I think this is a case of bugzilla bug 14693 - an invalid config param preventing a service from starting.
It's a pretty old bug and I don't know what version you are running.
There was another reopening in 2010 with some more patches in bug 17471.

The new functionality should now allow you to delete the invalid parameters with the -d option too.
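
For reference, a sketch of what that would look like for the parameter flagged in the dmesg output above (run on a node with a working connection to the MGS):

lctl conf_param -d dc-MDT0000.mdt.nosquash_nids

Here -d removes the offending entry from the MGS configuration log for dc-MDT0000 rather than setting a new value.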

Comment by Oleg Drokin [ 03/Oct/11 ]

btw the offending parameter is mdt.nosquash_nids

Comment by Oleg Drokin [ 03/Oct/11 ]

OK, to comment further on this: the parameter is actually a valid parameter, but it's a Lustre 2.1 thing.
Perhaps somebody was playing with 2.1 stuff and typed something into the wrong terminal window?
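
For context, a nosquash_nids setting would normally get written into the config log by a 2.1-style conf_param command along these lines (the NID pattern here is purely illustrative, not taken from this ticket):

lctl conf_param dc-MDT0000.mdt.nosquash_nids="192.168.0.*@tcp"

When an older MDS then processes that log entry, it can't parse the parameter and the mount fails with -22, which is what the dmesg output above shows.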

Comment by Oleg Drokin [ 03/Oct/11 ]

OK, to further track down how and when the parameter was set, I just checked the code; there is this message in mgs_wlp_lcfg:
LCONSOLE_INFO("%sing parameter %s.%s in log %s\n",
              del ? "Disabl" : rc ? "Sett" : "Modify",
              tgtname, comment, logname);

This is output to the kernel logs, so if you have functional kernel logs on the MGS node, search them for a line of this form containing "squash".

(I noticed that the kern log you pasted was quite stale; hopefully you have another, current source of kernel messages.)
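
For example (hypothetical paths; adjust for wherever syslog writes kernel messages on your MGS node):

grep -i squash /var/log/kern* /var/log/messages*
dmesg | grep -i squash

The first command searches the on-disk logs; the second searches the in-memory kernel ring buffer, which only goes back as far as the last reboot.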

Comment by Stephen Simms (Inactive) [ 03/Oct/11 ]

Ashley Pittman at DDN just got back to us with the following:

lctl conf_param -d dc-MDT0000.mdt.nosquash_nids

and we can now mount metadata. You guys nailed the culprit and now we're in the know.

Thanks as always for your help!
Simms

Comment by Peter Jones [ 03/Oct/11 ]

Glad to hear it, Steve - marking the ticket as resolved
