[LU-5796] MGS: non-config logname received: params Created: 23/Oct/14  Updated: 11/Nov/14  Resolved: 11/Nov/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: Lustre 2.5.4

Type: Bug Priority: Minor
Reporter: Christopher Morrone Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: llnl

Issue Links:
Related
is related to LU-2059 mgc to backup configuration on osd-ba... Resolved
Severity: 3
Rank (Obsolete): 16260

 Description   

We are in the process of testing 2.5.3 plus our local patch stack. The current development tag is 2.5.3-0.13morrone (see github.com/chaos/lustre).

Our MGS is printing the following new message to the console:

Oct 21 15:23:43 zwicky-lcy-mds1 kernel: Lustre: lcy-OST0009-osc-MDT0000: Connection to lcy-OST0009 (at 10.1.1.180@o2ib9) was lost; in progress operations using this service will wait for recovery to complete
Oct 21 15:24:09 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 21 15:24:09 zwicky-lcy-mds1 kernel: Lustre: Skipped 11 previous similar messages

The "non-config logname received" message is the one that needs to be addressed.

It appears to be correlated with OST startup.



 Comments   
Comment by Peter Jones [ 23/Oct/14 ]

Yu, Jian

Could you please help with this one?

Thanks

Peter

Comment by Jian Yu [ 23/Oct/14 ]

Hi Chris,

Could you please mount the OST with the "-v" option, e.g. "mount -v -t lustre /dev/xxx /mnt/xxx", and post the output here?
I'll debug the issue by looking at the "options=" line.

Thank you!

Comment by Christopher Morrone [ 23/Oct/14 ]
mount -t lustre -v zwicky-lcy-oss16/lcy-ost0 /mnt/lustre/local/lcy-OST000f 
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = zwicky-lcy-oss16/lcy-ost0
arg[5] = /mnt/lustre/local/lcy-OST000f
source = zwicky-lcy-oss16/lcy-ost0 (zwicky-lcy-oss16/lcy-ost0), target = /mnt/lustre/local/lcy-OST000f
options = rw
checking for existing Lustre data: found
mounting device zwicky-lcy-oss16/lcy-ost0 at /mnt/lustre/local/lcy-OST000f, flags=0x1000000 options=osd=osd-zfs,,mgsnode=10.1.1.169@o2ib9,param=failover.node=10.1.1.185@o2ib9,param=mgsnode=10.1.1.169@o2ib9,svname=lcy-OST000f,device=zwicky-lcy-oss16/lcy-ost0
Comment by Christopher Morrone [ 24/Oct/14 ]

The MGS message is not unique to OST connections. Any connection at all seems to trigger the message. For instance, I just rebooted a bunch of client nodes to make them 2.5.3-1chaos based, and saw the messages. Here is a snippet from the console:

Oct 23 18:56:20 zwicky-lcy-oss7 kernel: Lustre: lcy-OST0006: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8808206e6c00, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss5 kernel: Lustre: lcy-OST0004: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8810304e6c00, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss15 kernel: Lustre: lcy-OST000e: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8807fb866000, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss1 kernel: Lustre: lcy-OST0000: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff881013c25c00, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss13 kernel: Lustre: lcy-OST000c: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff880819aa7400, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss10 kernel: Lustre: lcy-OST0009: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8810322dc000, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss12 kernel: Lustre: lcy-OST000b: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff88102b5b1800, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:21 zwicky-lcy-oss3 kernel: Lustre: lcy-OST0002: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 233 seconds. I think it's dead, and I am evicting it. exp ffff88080aea4000, cur 1414115781 expire 1414115631 last 1414115548
Oct 23 18:56:25 zwicky-lcy-mds1 kernel: Lustre: lcy-MDT0000: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 237 seconds. I think it's dead, and I am evicting it. exp ffff88100d2fe400, cur 1414115785 expire 1414115635 last 1414115548
Oct 23 19:00:23 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:23 zwicky-lcy-mds1 kernel: Lustre: Skipped 3 previous similar messages
Oct 23 19:00:25 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:25 zwicky-lcy-mds1 kernel: Lustre: Skipped 32 previous similar messages
Oct 23 19:00:27 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:27 zwicky-lcy-mds1 kernel: Lustre: Skipped 25 previous similar messages
Oct 23 19:00:32 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:32 zwicky-lcy-mds1 kernel: Lustre: Skipped 42 previous similar messages
Comment by Jian Yu [ 24/Oct/14 ]

The "non-config logname received" warning message was printed from mgs_llog_open() in lustre/mgs/mgs_handler.c:

        logname = req_capsule_client_get(tsi->tsi_pill, &RMF_NAME);
        if (logname) {
                char *ptr = strchr(logname, '-');
                int   len = (int)(ptr - logname);

                if (ptr == NULL || len >= sizeof(mgi->mgi_fsname)) {
                        LCONSOLE_WARN("%s: non-config logname received: %s\n",
                                      tgt_name(tsi->tsi_tgt), logname);
                        /* not error, this can be llog test name */
                } else {
                        //......
                }
        }
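
As a rough illustration of why the log name "params" trips this check: it contains no '-', so strchr() returns NULL and the warning branch is taken. The sketch below restates the same test as a standalone helper. The name looks_like_config_logname() and the FSNAME_BUF_LEN value are assumptions made for illustration only (the real size of mgi->mgi_fsname is not shown in this ticket), and, unlike the excerpt above, the sketch tests for NULL before doing the pointer subtraction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed size of the fsname buffer; stands in for
 * sizeof(mgi->mgi_fsname), whose real value is not shown here. */
#define FSNAME_BUF_LEN 64

/* Hypothetical helper, not part of Lustre: returns true when logname
 * has the "<fsname>-<suffix>" shape that the quoted check accepts, and
 * false when the quoted check would print the console warning. */
static bool looks_like_config_logname(const char *logname)
{
        const char *dash = strchr(logname, '-');

        /* No '-' at all, e.g. "params": treated as non-config. */
        if (dash == NULL)
                return false;

        /* The part before the first '-' must fit the fsname buffer. */
        return (size_t)(dash - logname) < FSNAME_BUF_LEN;
}
```

Under these assumptions, a name such as "lcy-OST0009" passes the test, while "params" does not.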

This check in mgs_llog_open() was introduced by the following commit on the Lustre b2_5 branch:

commit 93a6346f8b73f68cb5bc02a3c826ac0e5b4c236e
Author: Mikhail Pershin <tappro@whamcloud.com>
Date:   Thu Dec 13 22:07:52 2012 +0400

    LU-2145 server: use unified request handler for MGS

I'll look into the code.

Comment by Andreas Dilger [ 24/Oct/14 ]

Yu Jian, I think this bug was also fixed in master. Please check the git commit logs, "git blame", and/or Jira for a duplicate, and backport the fix to b2_5.

Comment by Jian Yu [ 24/Oct/14 ]

Thank you very much, Andreas!
This was fixed in LU-2059. I'll back-port http://review.whamcloud.com/10311 and http://review.whamcloud.com/10589 to the Lustre b2_5 branch.

Comment by Christopher Morrone [ 25/Oct/14 ]

Yes, the combination of http://review.whamcloud.com/10311 and http://review.whamcloud.com/10589 on b2_5 seems to have eliminated the "non-config logname" messages.

Comment by Jian Yu [ 25/Oct/14 ]

Here are the back-ported patches for the Lustre b2_5 branch:
http://review.whamcloud.com/12427
http://review.whamcloud.com/12428

Comment by Jian Yu [ 11/Nov/14 ]

The patches were merged into the Lustre b2_5 branch for the 2.5.4 release.
