[LU-4243] multiple servicenodes or failnids: wrong client llog registration Created: 12/Nov/13 Updated: 13/Mar/14 Resolved: 20/Dec/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Erich Focht | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
failover MDS/MGS, failover OSTs |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 11555 | ||||||||
| Description |
|
Since Lustre 2.4.0 we have had problems with clients that could not connect after, e.g., the MGS failed over. In most of our experiments, which used a client on the standby MGS server, the symptom was that the client only worked from the active MGS/MDS node, not from the passive one! The reason for the problem seems to be commit d9d27cad and the following hunk of the patch:
+ name_destroy(&failnodeuuid);
This leads to a wrong Lustre client llog when the Lustre block devices are formatted with multiple --servicenode options! Here is an example:
#09 (224)marker 6 (flags=0x01, v2.5.0.0) lnec-MDT0000 'add mdc' Mon Nov 11 17:03:21 2013-
The last add_uuid should have 1:10.3.0.35@o2ib instead of 1:10.3.0.34@o2ib. The reason is that only the first NID of the first --servicenode (AKA --failnode) entry is considered. Please revert that little patch of mgs_llog.c. Regards, |
| Comments |
| Comment by Erich Focht [ 12/Nov/13 ] |
|
The patch again, with hopefully proper formatting:
@@ -1447,13 +1481,11 @@ static int mgs_write_log_failnids(const struct lu_env *env,
failnodeuuid, cliname);
rc = record_add_uuid(env, llh, nid, failnodeuuid);
}
- if (failnodeuuid) {
+ if (failnodeuuid)
rc = record_add_conn(env, llh, cliname, failnodeuuid);
- name_destroy(&failnodeuuid);
- failnodeuuid = NULL;
- }
}
+ name_destroy(&failnodeuuid);
return rc;
}
|
| Comment by Peter Jones [ 12/Nov/13 ] |
|
Hongchao, could you please help with this one? Thanks, Peter |
| Comment by Andreas Dilger [ 15/Nov/13 ] |
|
Erich, to clarify, this problem only happens when you are trying to mount a client from the backup MGS node? |
| Comment by Oleg Drokin [ 15/Nov/13 ] |
|
I wonder if it's also related somehow to |
| Comment by Erich Focht [ 16/Nov/13 ] |
|
Andreas, the problem also occurs for normal clients, IIRC. The peculiarity of the setup is that we formatted the Lustre devices with two --servicenode arguments (which are transformed into failover.node options). The first mount of the devices was on the first of the service nodes. This used to work fine under 2.1.X. The other (maybe most widely used) way of formatting is with just one --failnode option, say --failnode B, with the first mount of the device on node A. The patch mentioned in my first comment leads to the problem that failnodeuuid is set only once in the entire registration process, i.e. it is set to the first failover.node argument (or --servicenode) that was used for formatting. Instead, failnodeuuid should be set again for each failover.node option that appears. We verified that reverting that patch fixes the problem in 2.5.0. Oleg, we've seen the issue with two --mgsnode options, too, but that's different. It seems to be fixed in 2.5.0. |
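To make the difference concrete, here is a minimal toy model of the registration loop. It is not the Lustre code: toy_llog, toy_record and toy_write_failnids are hypothetical names, the records are plain strings, and the reset_per_param flag is only an assumption used to switch between the behaviour described for the code before the patch (UUID dropped after each failover.node parameter) and the behaviour described for the code after it (UUID kept for the whole loop).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 16
#define MAX_LEN 64

/* Hypothetical stand-in for the client llog: a list of text records. */
struct toy_llog {
	char rec[MAX_RECORDS][MAX_LEN];
	int count;
};

/* Append one "<op> <a> <b>" record to the toy log. */
static void toy_record(struct toy_llog *log, const char *op,
		       const char *a, const char *b)
{
	assert(log->count < MAX_RECORDS);
	snprintf(log->rec[log->count++], MAX_LEN, "%s %s %s", op, a, b);
}

/*
 * params: one entry per failover.node parameter, each a comma-separated
 * NID list.
 * reset_per_param != 0: the UUID is dropped after each parameter, so the
 *   next parameter derives a new UUID from its own first NID.
 * reset_per_param == 0: the UUID survives across parameters, so every
 *   later NID is registered under the first parameter's UUID.
 */
static void toy_write_failnids(struct toy_llog *log,
			       const char *const *params, int nparams,
			       int reset_per_param)
{
	char uuid[MAX_LEN] = "";
	int i;

	for (i = 0; i < nparams; i++) {
		char buf[MAX_LEN];
		char *nid;

		/* strtok() modifies its input, so work on a copy. */
		snprintf(buf, sizeof(buf), "%s", params[i]);
		for (nid = strtok(buf, ","); nid != NULL;
		     nid = strtok(NULL, ",")) {
			/* No UUID yet: use the first NID as the UUID. */
			if (uuid[0] == '\0')
				snprintf(uuid, sizeof(uuid), "%s", nid);
			toy_record(log, "add_uuid", nid, uuid);
		}
		if (uuid[0] != '\0') {
			toy_record(log, "add_conn", "mdc", uuid);
			if (reset_per_param)
				uuid[0] = '\0';
		}
	}
}
```

Called with the two parameters "10.3.0.34@o2ib" and "10.3.0.35@o2ib", the model registers the second NID under its own UUID when reset_per_param is set, and under 10.3.0.34@o2ib when it is not, which mirrors the wrong add_uuid entry reported in the description.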
| Comment by Hongchao Zhang [ 22/Nov/13 ] |
|
The problem here is that LNet doesn't use the other NIDs with the same "distance" and "order" contained in the same UUID, which is "10.3.0.34@o2ib". This issue will still exist even if the patch mentioned is reverted, if the MDT is formatted with "--servicenode 10.3.0.34,10.3.0.35", for these two NIDs will use |
| Comment by Hongchao Zhang [ 22/Nov/13 ] |
|
the patch is tracked at http://review.whamcloud.com/#/c/8372/ |
| Comment by Zhenyu Xu [ 27/Nov/13 ] |
|
Hi Hongchao, why do you say that "if MDT is formatted with "--servicenode 10.3.0.34,10.3.0.35" for these two NIDs will use the same UUID "10.3.0.34@o2ib""? I think these two service nodes will be parsed into different failnodeuuid strings. |
| Comment by Hongchao Zhang [ 29/Nov/13 ] |
|
in mkfs_lustre.c:
int parse_opts(int argc, char *const argv[], struct mkfs_opts *mop,
               char **mountopts)
{
        ...
        case 's': {
                ...
                nids = convert_hostnames(optarg);
                if (!nids)
                        return 1;
                rc = add_param(mop->mo_ldd.ldd_params, PARAM_FAILNODE, nids);
                free(nids);
                ...
        }
        ...
}
"convert_hostnames" does little to "10.3.0.34,10.3.0.35", and "add_param" will add a single "failover.node" param. |
| Comment by Zhenyu Xu [ 29/Nov/13 ] |
|
add_param() would separate the string into two parameters, as shown on my VM:
# mkfs.lustre --mgs --mdt --fsname=lustre --index=0 --servicenode 10.3.0.34,10.3.0.35 --reformat /dev/sdb
Permanent disk data:
Target: lustre:MDT0000
Index: 0
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x1065
(MDT MGS first_time update no_primnode )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=10.3.0.34@tcp failover.node=10.3.0.35@tcp
|
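The splitting seen in the Parameters line above can be illustrated with a small sketch. This is not the real add_param() from mkfs_lustre.c: toy_add_failnode_params is a hypothetical helper that only assumes the observed outcome, namely one failover.node entry per comma-separated NID. Hostname-to-NID conversion is left out, so the input is expected to carry the network suffix already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one "failover.node=<nid>" entry per
 * comma-separated NID in nids into out, separated by single spaces.
 * Returns the number of entries written, or -1 if out is too small.
 */
static int toy_add_failnode_params(char *out, size_t outlen,
				   const char *nids)
{
	const char *start = nids;
	size_t used = 0;
	int count = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	while (*start != '\0') {
		/* Length of the current NID, up to the next comma. */
		size_t len = strcspn(start, ",");

		if (len > 0) {
			int n = snprintf(out + used, outlen - used,
					 "%sfailover.node=%.*s",
					 count > 0 ? " " : "",
					 (int)len, start);

			if (n < 0 || (size_t)n >= outlen - used)
				return -1;
			used += (size_t)n;
			count++;
		}
		start += len;
		if (*start == ',')
			start++;	/* skip the separator */
	}
	return count;
}
```

Passing "10.3.0.34@tcp,10.3.0.35@tcp" yields two entries in the same shape as the mkfs.lustre output, which is why each service node later reaches the llog registration as its own failover.node parameter.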
| Comment by Hongchao Zhang [ 20/Dec/13 ] |
|
oh, Yes, it has been fixed in |
| Comment by Peter Jones [ 20/Dec/13 ] |
|
Landed for 2.4.2 and 2.6. Will be landed for 2.5.1 shortly. |
| Comment by John Fuchs-Chesney (Inactive) [ 20/Feb/14 ] |
|
Hello Erich – Do you have what you need on this issue? If so, can I go ahead and mark it as resolved? Thanks, ~ jfc. |