|
dmesg output:
[13466.908788] LDISKFS-fs (dm-7): warning: maximal mount count reached, running e2fsck is recommended
[13466.913958] LDISKFS-fs (dm-7): mounted filesystem with ordered data mode
[13466.965528] LDISKFS-fs (dm-7): warning: maximal mount count reached, running e2fsck is recommended
[13466.970602] LDISKFS-fs (dm-7): mounted filesystem with ordered data mode
[13466.985515] Lustre: Enabling user_xattr
[13466.991612] Lustre: 13691:0:(mds_fs.c:677:mds_init_server_data()) RECOVERY: service dc-MDT0000, 4 recoverable clients, 0 delayed clients, last_transno 85899346187
[13466.995427] Lustre: dc-MDT0000: Now serving dc-MDT0000 on /dev/vg_dc/mdt with recovery enabled
[13466.995430] Lustre: dc-MDT0000: Will be in recovery for at least 5:00, or until 4 clients reconnect
[13466.995928] Lustre: 13691:0:(lproc_quota.c:448:lprocfs_quota_wr_type()) dc-MDT0000: quotaon failed because quota files don't exist, please run quotacheck firstly
[13466.995936] Lustre: dc-MDT0000.mdt: set parameter quota_type=ug
[13466.996197] Lustre: 13691:0:(mds_lov.c:1155:mds_notify()) MDS dc-MDT0000: add target dc-OST0000_UUID
[13466.996201] Lustre: 13691:0:(mds_lov.c:1155:mds_notify()) Skipped 28 previous similar messages
[13467.031748] LustreError: 13691:0:(obd_config.c:979:class_process_proc_param()) Can't parse param nosquash_nids
[13467.031753] LustreError: 13691:0:(obd_config.c:1199:class_config_llog_handler()) Err -22 on cfg command:
[13467.031758] Lustre: cmd=cf00f 0:dc-MDT0000 1:mdt.nosquash_nids
[13467.031903] LustreError: 15b-f: MGC149.165.235.235@tcp: The configuration from log 'dc-MDT0000' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
[13467.031908] LustreError: 15c-8: MGC149.165.235.235@tcp: The configuration from log 'dc-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[13467.031914] LustreError: 13571:0:(obd_mount.c:1126:server_start_targets()) failed to start server dc-MDT0000: -22
[13467.031921] LustreError: 13571:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -22
[13467.031939] Lustre: Failing over dc-MDT0000
[13467.031941] Lustre: Skipped 30 previous similar messages
[13467.048812] Lustre: dc-MDT0000: shutting down for failover; client state will be preserved.
[13467.049987] Lustre: MDT dc-MDT0000 has stopped.
[13468.686492] LustreError: 137-5: UUID 'dc-MDT0000_UUID' is not available for connect (no target)
[13468.686501] LustreError: 13622:0:(ldlm_lib.c:1914:target_send_reply_msg()) @@@ processing error (19) req@ffff811809545800 x1379230428949522/t0 o38><?>@<?>:0/0 lens 368/0 e 0 to 0 dl 1317674478 ref 1 fl Interpret:/0/0 rc -19/0
[13470.046025] Lustre: server umount dc-MDT0000 complete
[13470.046033] LustreError: 13571:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-22)
|
|
Oleg will look into this one
|
|
I think this is a case of Bugzilla bug 14693: an invalid config parameter preventing a service from starting.
It's a pretty old bug, and I don't know what version you are running.
There was another reopening in 2010, with some more patches in bug 17471.
The new functionality should now also allow you to delete the invalid parameters with the -d option.
|
|
btw the offending parameter is mdt.nosquash_nids
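
For anyone reading along, here is a minimal Python sketch of pulling the failing target and parameter out of dmesg lines like the ones pasted at the top of this ticket. It assumes the "cmd=... 0:<target> 1:<param>" line directly follows the "Err ... on cfg command:" line, as it does in that log. The function name and the regex are illustrative only and are not part of Lustre.

```python
import re

# Matches the config command echoed after a failed cfg command, e.g.
# "Lustre: cmd=cf00f 0:dc-MDT0000 1:mdt.nosquash_nids"
CFG_CMD = re.compile(r"cmd=(?P<cmd>\w+)\s+0:(?P<target>\S+)\s+1:(?P<param>\S+)")


def failed_cfg_params(lines):
    """Return (target, param) pairs for each failed cfg command.

    Assumption: the "cmd=" line is the line immediately after the
    "on cfg command" error line.
    """
    results = []
    expect_cmd = False
    for line in lines:
        if "on cfg command" in line:
            # The next line should carry the command that failed.
            expect_cmd = True
            continue
        if expect_cmd:
            match = CFG_CMD.search(line)
            if match:
                results.append((match.group("target"), match.group("param")))
            expect_cmd = False
    return results
```

Fed the dmesg output from this ticket, it should report target dc-MDT0000 and parameter mdt.nosquash_nids.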
|
|
OK, to comment further on this: the parameter is actually a valid parameter, but it's a Lustre 2.1 thing.
Perhaps somebody was playing with 2.1 stuff and typed something in the wrong terminal window?
|
|
OK, to track down how and when the parameter was set: I just checked the code, and mgs_wlp_lcfg prints this message:
    LCONSOLE_INFO("%sing parameter %s.%s in log %s\n",
                  del ? "Disabl" : rc ? "Sett" : "Modify",
                  tgtname, comment, logname);
This goes to the kernel log, so if you have functional kernel logs on the MGS node, search them for a line of this form that contains "squash".
(I noticed that the kernlog file you pasted was quite stale; hopefully you have another, valid source of kernel messages.)
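
If grepping by hand is awkward, the search could be sketched in Python roughly as follows. The regex is derived from the format string quoted in this comment; the function name, the default search term, and the idea of feeding it lines from a log file are assumptions for illustration, and the exact prefix on each kernel log line may differ on your system.

```python
import re

# Built from the format string "%sing parameter %s.%s in log %s",
# where the prefix is one of "Disabl", "Sett" or "Modify".
PARAM_MSG = re.compile(
    r"(?P<action>Disabl|Sett|Modify)ing parameter "
    r"(?P<param>\S+) in log (?P<log>\S+)"
)


def find_param_changes(lines, needle="squash"):
    """Return (action, param, log, line) for matching messages.

    Only messages whose parameter name contains `needle` are kept.
    """
    hits = []
    for line in lines:
        match = PARAM_MSG.search(line)
        if match and needle in match.group("param"):
            hits.append(
                (
                    match.group("action") + "ing",
                    match.group("param"),
                    match.group("log"),
                    line.rstrip("\n"),
                )
            )
    return hits
```

To use it, open whichever kernel log file is valid on the MGS node and pass the file object (or a list of its lines) to find_param_changes; the timestamp on any matching line shows when the parameter was set.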
|
|
Ashley Pittman at DDN just got back to us with the following:
lctl conf_param -d dc-MDT0000.mdt.nosquash_nids
and we can now mount metadata. You guys nailed the culprit and now we're in the know.
Thanks as always for your help!
Simms
|
|
Glad to hear it, Steve. Marking the ticket as resolved.
|