[LU-5796] MGS: non-config logname received: params Created: 23/Oct/14 Updated: 11/Nov/14 Resolved: 11/Nov/14 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.5.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Christopher Morrone | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 16260 | ||||||||
| Description |
|
We are in the process of testing 2.5.3 plus our local patch stack. The current development tag is 2.5.3-0.13morrone (see github.com/chaos/lustre). Our MGS is printing the following new message to the console:
Oct 21 15:23:43 zwicky-lcy-mds1 kernel: Lustre: lcy-OST0009-osc-MDT0000: Connection to lcy-OST0009 (at 10.1.1.180@o2ib9) was lost; in progress operations using this service will wait for recovery to complete
Oct 21 15:24:09 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 21 15:24:09 zwicky-lcy-mds1 kernel: Lustre: Skipped 11 previous similar messages
The "non-config logname received" message is the one that needs to be addressed. It appears to be correlated with OST startup. |
| Comments |
| Comment by Peter Jones [ 23/Oct/14 ] |
|
Yu, Jian Could you please help with this one? Thanks Peter |
| Comment by Jian Yu [ 23/Oct/14 ] |
|
Hi Chris, Could you please mount the OST with the "-v" option, e.g. "mount -v -t lustre /dev/xxx /mnt/xxx", and post the output here? Thank you! |
| Comment by Christopher Morrone [ 23/Oct/14 ] |
mount -t lustre -v zwicky-lcy-oss16/lcy-ost0 /mnt/lustre/local/lcy-OST000f
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = zwicky-lcy-oss16/lcy-ost0
arg[5] = /mnt/lustre/local/lcy-OST000f
source = zwicky-lcy-oss16/lcy-ost0 (zwicky-lcy-oss16/lcy-ost0), target = /mnt/lustre/local/lcy-OST000f
options = rw
checking for existing Lustre data: found
mounting device zwicky-lcy-oss16/lcy-ost0 at /mnt/lustre/local/lcy-OST000f, flags=0x1000000 options=osd=osd-zfs,,mgsnode=10.1.1.169@o2ib9,param=failover.node=10.1.1.185@o2ib9,param=mgsnode=10.1.1.169@o2ib9,svname=lcy-OST000f,device=zwicky-lcy-oss16/lcy-ost0 |
| Comment by Christopher Morrone [ 24/Oct/14 ] |
|
The MGS message is not unique to OST connections. Any connection at all seems to trigger the message. For instance, I just rebooted a bunch of client nodes to make them 2.5.3-1chaos based, and saw the messages. Here is a snippet from the console:
Oct 23 18:56:20 zwicky-lcy-oss7 kernel: Lustre: lcy-OST0006: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8808206e6c00, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss5 kernel: Lustre: lcy-OST0004: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8810304e6c00, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss15 kernel: Lustre: lcy-OST000e: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8807fb866000, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss1 kernel: Lustre: lcy-OST0000: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff881013c25c00, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss13 kernel: Lustre: lcy-OST000c: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff880819aa7400, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss10 kernel: Lustre: lcy-OST0009: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff8810322dc000, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:20 zwicky-lcy-oss12 kernel: Lustre: lcy-OST000b: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 232 seconds. I think it's dead, and I am evicting it. exp ffff88102b5b1800, cur 1414115780 expire 1414115630 last 1414115548
Oct 23 18:56:21 zwicky-lcy-oss3 kernel: Lustre: lcy-OST0002: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 233 seconds. I think it's dead, and I am evicting it. exp ffff88080aea4000, cur 1414115781 expire 1414115631 last 1414115548
Oct 23 18:56:25 zwicky-lcy-mds1 kernel: Lustre: lcy-MDT0000: haven't heard from client b25f461c-463d-a2e2-24f2-54c135569e7c (at 192.168.121.132@o2ib2) in 237 seconds. I think it's dead, and I am evicting it. exp ffff88100d2fe400, cur 1414115785 expire 1414115635 last 1414115548
Oct 23 19:00:23 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:23 zwicky-lcy-mds1 kernel: Lustre: Skipped 3 previous similar messages
Oct 23 19:00:25 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:25 zwicky-lcy-mds1 kernel: Lustre: Skipped 32 previous similar messages
Oct 23 19:00:27 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:27 zwicky-lcy-mds1 kernel: Lustre: Skipped 25 previous similar messages
Oct 23 19:00:32 zwicky-lcy-mds1 kernel: Lustre: MGS: non-config logname received: params
Oct 23 19:00:32 zwicky-lcy-mds1 kernel: Lustre: Skipped 42 previous similar messages |
| Comment by Jian Yu [ 24/Oct/14 ] |
|
The "non-config logname received" warning message was printed from mgs_llog_open() in lustre/mgs/mgs_handler.c:
    logname = req_capsule_client_get(tsi->tsi_pill, &RMF_NAME);
    if (logname) {
        char *ptr = strchr(logname, '-');
        int len = (int)(ptr - logname);
        if (ptr == NULL || len >= sizeof(mgi->mgi_fsname)) {
            LCONSOLE_WARN("%s: non-config logname received: %s\n",
                          tgt_name(tsi->tsi_tgt), logname);
            /* not error, this can be llog test name */
        } else {
            //......
        }
    }
This code was introduced by the following commit on the Lustre b2_5 branch:
commit 93a6346f8b73f68cb5bc02a3c826ac0e5b4c236e
Author: Mikhail Pershin <tappro@whamcloud.com>
Date: Thu Dec 13 22:07:52 2012 +0400
    LU-2145 server: use unified request handler for MGS
I'll look into the code. |
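To illustrate why the bare name "params" trips this check, here is a minimal standalone sketch of the same test in plain C. It is not the Lustre code: the function name looks_like_config_logname and the buffer size FSNAME_BUF_SIZE are hypothetical stand-ins (the real limit is sizeof(mgi->mgi_fsname), whose value is not shown in this ticket). The point is only that a logname without a '-' separator has no "<fsname>-" prefix, so it takes the warning branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(mgi->mgi_fsname); the real value
 * is not given in this ticket. */
#define FSNAME_BUF_SIZE 9

/*
 * Returns 1 if logname has the form "<fsname>-<suffix>" with an fsname
 * short enough to fit the buffer, and 0 otherwise. A return of 0
 * corresponds to the branch that prints "non-config logname received".
 */
int looks_like_config_logname(const char *logname)
{
    const char *dash;

    assert(logname != NULL);

    dash = strchr(logname, '-');
    if (dash == NULL)
        return 0;   /* no separator at all, e.g. "params" */

    /* fsname length is the distance from the start to the first '-' */
    return (size_t)(dash - logname) < FSNAME_BUF_SIZE;
}
```

With this sketch, looks_like_config_logname("params") returns 0, while a name such as "lcy-client" returns 1. Unlike the quoted snippet, the sketch tests for a missing '-' before computing the prefix length.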
| Comment by Andreas Dilger [ 24/Oct/14 ] |
|
Yu Jian, I think this bug was also fixed in master. Please check the git commit logs and/or "git blame" and/or Jira for a duplicate, and backport the fix to b2_5. |
| Comment by Jian Yu [ 24/Oct/14 ] |
|
Thank you very much, Andreas! |
| Comment by Christopher Morrone [ 25/Oct/14 ] |
|
Yes, the combination of http://review.whamcloud.com/10311 and http://review.whamcloud.com/10589 on b2_5 seems to have eliminated the "non-config logname" messages. |
| Comment by Jian Yu [ 25/Oct/14 ] |
|
Here are the back-ported patches for Lustre b2_5 branch: |
| Comment by Jian Yu [ 11/Nov/14 ] |
|
The patches were merged into the Lustre b2_5 branch for the 2.5.4 release. |