[LU-10040] nodemap and quota issues (ineffective GID mapping) Created: 28/Sep/17 Updated: 15/Mar/19 Resolved: 22/Nov/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0, Lustre 2.10.1 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | Stephane Thiell | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch |
| Environment: |
client: lustre-client-2.10.0-1.el7.x86_64, lustre-2.10.1_RC1_srcc01-1.el7.centos.x86_64 (2.10.1-RC1 + patch from |
||
| Attachments: |
|
| Issue Links: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We're using the nodemap feature with map_mode=gid_only in production and we are seeing more and more issues with GID mapping, which seems to fall back to squash_gid instead of being properly mapped. The nodemap hasn't changed for these groups; we just add new groups from time to time. Example: configuration of the 'sherlock' nodemap on the MGS: [root@oak-md1-s1 sherlock]# pwd
/proc/fs/lustre/nodemap/sherlock
[root@oak-md1-s1 sherlock]# cat ranges
[
{ id: 6, start_nid: 0.0.0.0@o2ib4, end_nid: 255.255.255.255@o2ib4 },
{ id: 5, start_nid: 0.0.0.0@o2ib3, end_nid: 255.255.255.255@o2ib3 }
]
[root@oak-md1-s1 sherlock]# cat idmap
[
{ idtype: gid, client_id: 3525, fs_id: 3741 } { idtype: gid, client_id: 6401, fs_id: 3752 } { idtype: gid, client_id: 99001, fs_id: 3159 } { idtype: gid, client_id: 10525, fs_id: 3351 } { idtype: gid, client_id: 11886, fs_id: 3593 } { idtype: gid, client_id: 12193, fs_id: 3636 } { idtype: gid, client_id: 13103, fs_id: 3208 } { idtype: gid, client_id: 17079, fs_id: 3700 } { idtype: gid, client_id: 19437, fs_id: 3618 } { idtype: gid, client_id: 22959, fs_id: 3745 } { idtype: gid, client_id: 24369, fs_id: 3526 } { idtype: gid, client_id: 26426, fs_id: 3352 } { idtype: gid, client_id: 29361, fs_id: 3746 } { idtype: gid, client_id: 29433, fs_id: 3479 } { idtype: gid, client_id: 30289, fs_id: 3262 } { idtype: gid, client_id: 32264, fs_id: 3199 } { idtype: gid, client_id: 32774, fs_id: 3623 } { idtype: gid, client_id: 38517, fs_id: 3702 } { idtype: gid, client_id: 40387, fs_id: 3708 } { idtype: gid, client_id: 47235, fs_id: 3674 } { idtype: gid, client_id: 48931, fs_id: 3325 } { idtype: gid, client_id: 50590, fs_id: 3360 } { idtype: gid, client_id: 52892, fs_id: 3377 } { idtype: gid, client_id: 56316, fs_id: 3353 } { idtype: gid, client_id: 56628, fs_id: 3411 } { idtype: gid, client_id: 59943, fs_id: 3372 } { idtype: gid, client_id: 63938, fs_id: 3756 } { idtype: gid, client_id: 100533, fs_id: 3281 } { idtype: gid, client_id: 244300, fs_id: 3617 } { idtype: gid, client_id: 254778, fs_id: 3362 } { idtype: gid, client_id: 267829, fs_id: 3748 } { idtype: gid, client_id: 270331, fs_id: 3690 } { idtype: gid, client_id: 305454, fs_id: 3371 } { idtype: gid, client_id: 308753, fs_id: 3367 }
[root@oak-md1-s1 sherlock]# cat squash_gid
99
[root@oak-md1-s1 sherlock]# cat map_mode
gid_only
[root@oak-md1-s1 sherlock]# cat admin_nodemap
0
[root@oak-md1-s1 sherlock]# cat deny_unknown
1
[root@oak-md1-s1 sherlock]# cat trusted_nodemap
0
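For reference, a nodemap configured like the listing above would normally be built on the MGS with lctl commands along these lines. This is only a sketch: the NID range expression is a placeholder, and the property names/values follow the /proc output above, so check them against your lctl version:
lctl nodemap_add sherlock
lctl nodemap_add_range --name sherlock --range '<o2ib3/o2ib4 NID range>'   # placeholder range expression
lctl nodemap_modify --name sherlock --property map_mode --value gid_only
lctl nodemap_modify --name sherlock --property squash_gid --value 99
lctl nodemap_modify --name sherlock --property deny_unknown --value 1
lctl nodemap_add_idmap --name sherlock --idtype gid --idmap 11886:3593     # one entry per mapped group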
Issue with group GID 3593 (mapped to GID 11886 on sherlock). lfs quota from an unmapped node, using the canonical GID 3593:
[root@oak-rbh01 ~]# lfs quota -g oak_euan /oak
Disk quotas for group oak_euan (gid 3593):
Filesystem kbytes quota limit grace files quota limit grace
/oak 33255114444 50000000000 50000000000 - 526016 7500000 7500000 -
Broken lfs quota with the mapped GID on sherlock (o2ib4):
[root@sh-113-01 ~]# lfs quota -g euan /oak
Disk quotas for grp euan (gid 11886):
Filesystem kbytes quota limit grace files quota limit grace
/oak 2875412844* 1 1 - 26* 1 1 -
[root@sh-113-01 ~]# lctl list_nids
10.9.113.1@o2ib4
It matches the quota usage for squash_gid:
[root@oak-rbh01 ~]# lfs quota -g 99 /oak
Disk quotas for group 99 (gid 99):
Filesystem kbytes quota limit grace files quota limit grace
/oak 2875412844* 1 1 - 26* 1 1 -
Please note that GID mapping works OK for most of the groups though: 3199 -> 32264 (sherlock)
canonical:
[root@oak-rbh01 ~]# lfs quota -g oak_ruthm /oak
Disk quotas for group oak_ruthm (gid 3199):
Filesystem kbytes quota limit grace files quota limit grace
/oak 10460005688 20000000000 20000000000 - 1683058 3000000 3000000 -
mapped (sherlock):
[root@sh-113-01 ~]# lfs quota -g ruthm /oak
Disk quotas for grp ruthm (gid 32264):
Filesystem kbytes quota limit grace files quota limit grace
/oak 10460005688 20000000000 20000000000 - 1683058 3000000 3000000 -
Failing over the MDT resolved a few groups, but not all. Failing the MDT back showed the issue on exactly the same original groups (currently 4-5). While I haven't seen it myself yet, the issue seems to affect users, as a few of them reported erroneous EDQUOT errors. This is why it is quite urgent to figure out what's wrong. Please note that the issue was already there before using the patch from
I'm willing to attach some debug logs, but what debug flags should I enable to troubleshoot such a quota+nodemap issue on client and server? Thanks! |
| Comments |
| Comment by Emoly Liu [ 28/Sep/17 ] |
|
Stephane, |
| Comment by Stephane Thiell [ 28/Sep/17 ] |
|
Hi Emoly, Yes it might be related to
[root@sh-113-01 ~]# lfs quota -hg euan /oak
Disk quotas for grp euan (gid 11886):
Filesystem used quota limit grace files quota limit grace
/oak 2.678T* 1k 1k - 26* 1 1 -
[root@sh-113-01 ~]# lfs quota -hg ruthm /oak
Disk quotas for grp ruthm (gid 32264):
Filesystem used quota limit grace files quota limit grace
/oak 9.886T 18.63T 18.63T - 1683246 3000000 3000000 -
I think the first lfs quota ends up like this on the MDT:
1691 00040000:04000000:0.0:1506621079.799639:0:200838:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:oak-QMT0000 pool:0-md id:99 enforced:1 hard:1 soft:1 granted:27 time:1506754418 qunit:1024 edquot:1 may_rel:0 revoke:135062
1692 00040000:04000000:0.0:1506621079.799642:0:200838:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:oak-QMT0000 pool:0-dt id:99 enforced:1 hard:1 soft:1 granted:2875370220 time:1506717740 qunit:1024 edquot:1 may_rel:0 revoke:135032
Notice the id:99, which is the squash_gid. The good one seems to be:
3417 00040000:04000000:7.0:1506621082.680316:0:200864:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:oak-QMT0000 pool:0-md id:3199 enforced:1 hard:3000000 soft:3000000 granted:1769473 time:0 qunit:262144 edquot:0 may_rel:0 revoke:0
3418 00040000:04000000:7.0:1506621082.680324:0:200864:0:(qmt_handler.c:65:qmt_get()) $$$ fetch settings qmt:oak-QMT0000 pool:0-dt id:3199 enforced:1 hard:20000000000 soft:20000000000 granted:12762252828 time:0 qunit:67108864 edquot:0 may_rel:0 revoke:0
Any idea of what could be wrong there? |
| Comment by Emoly Liu [ 29/Sep/17 ] |
|
Stephane,
Can you show me the gid of "oak_euan" on oak-rbh01 and "euan" on sh-113-01? I will analyze your logs and see if I can reproduce this issue. Thanks,
|
| Comment by Emoly Liu [ 29/Sep/17 ] |
|
And, can you run "getfacl /oak" and post the output here? Thanks. |
| Comment by Stephane Thiell [ 29/Sep/17 ] |
|
oak_euan GID 3593 maps to euan GID 11886:
[root@oak-rbh01 ~]# getent group oak_euan
oak_euan:*:3593:(...)
[root@sh-113-01 ~]# getent group euan
euan:*:11886:(...)
Similarly, oak_ruthm GID 3199 maps to ruthm GID 32264.
getfacl /oak results below:
[root@oak-rbh01 ~]# getfacl /oak
getfacl: Removing leading '/' from absolute path names
# file: oak
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
[root@sh-113-01 ~]# getfacl /oak
getfacl: Removing leading '/' from absolute path names
# file: oak
# owner: root
# group: nobody
user::rwx
group::r-x
other::r-x
Thanks! |
| Comment by Emoly Liu [ 29/Sep/17 ] |
|
Your nodemap policy { idtype: gid, client_id: 11886, fs_id: 3593 } means a user with GID=11886 on a client at any IP address will be mapped to the canonical GID=3593 on the server. But oak_euan (3593) is not defined as a client_id in any idmap, so there is no mapping for it. That's why you see the following output, which is correct.
|
| Comment by Stephane Thiell [ 29/Sep/17 ] |
|
No, my nodemap policy is for o2ib3 and o2ib4 (sherlock). oak-rbh01 is on o2ib5 and uses the canonical GIDs; sorry for the confusion, I only used it to get the canonical GID. sh-113-01 is the client with the mapped GID in my example.
[root@oak-md1-s2 ~]# cat /proc/fs/lustre/nodemap/sherlock/ranges
[
{ id: 6, start_nid: 0.0.0.0@o2ib4, end_nid: 255.255.255.255@o2ib4 },
{ id: 5, start_nid: 0.0.0.0@o2ib3, end_nid: 255.255.255.255@o2ib3 }
]
The 'sherlock' policy has { idtype: gid, client_id: 11886, fs_id: 3593 } with ranges=o2ib3 and o2ib4, so GID=11886 on sherlock (eg. sh-113-01, o2ib4) should be mapped to the canonical GID 3593, but it is not...
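One server-side way to verify exactly that expectation is the nodemap_test_id command that shows up later in this ticket; a sketch, with a placeholder client NID:
lctl nodemap_test_id --nid <o2ib4 client NID> --idtype gid --id 11886   # expected to print 3593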
|
| Comment by Stephane Thiell [ 29/Sep/17 ] |
|
Actually, I wasn't able to reproduce the 'disk quota exceeded' errors that a few users from the 'euan' group have reported. That means the GID mapping seems to be done correctly when using the filesystem from 'sherlock'. But what is clearly broken right now is "lfs quota -g euan /oak" from sherlock (with the sherlock nodemap policy), even though it works for almost all other groups, which is weird. Thanks for your help. |
| Comment by Emoly Liu [ 29/Sep/17 ] |
|
When you run "lfs quota -g euan /oak" on sh-113-01, as you said, since sh-113-01 is on o2ib4 and euan GID=11886, it will be mapped to the canonical GID=3593 on the server. But this is only half of the process: there is also an unmapping step when the server returns the GID to the client, so 3593 will be unmapped back to 11886. |
| Comment by Stephane Thiell [ 29/Sep/17 ] |
|
To illustrate what I am trying to say, using policy { idtype: gid, client_id: 11886, fs_id: 3593 }:
On a client with mapped GID 11886:
[euan@sherlock-ln01 login_node ~]$ id
uid=11886(euan) gid=11886(euan) groups=11886(euan),…
[euan@sherlock-ln01 login_node ~]$ cd $OAK
[euan@sherlock-ln01 login_node /oak/stanford/groups/euan]$ touch test
[euan@sherlock-ln01 login_node /oak/stanford/groups/euan]$ ls -l test
-rw-rw----+ 1 euan euan 0 Sep 28 22:14 test
[euan@sherlock-ln01 login_node /oak/stanford/groups/euan]$ ls -ln test
-rw-rw----+ 1 11886 11886 0 Sep 28 22:14 test
[euan@sherlock-ln01 login_node /oak/stanford/groups/euan]$ lfs quota -g euan /oak
Disk quotas for grp euan (gid 11886):
Filesystem kbytes quota limit grace files quota limit grace
/oak 2875169252* 1 1 - 25* 1 1 -
On a client with canonical GID 3593:
[root@oak-rbh01 ~]# ls -l /oak/stanford/groups/euan/test
-rw-rw----+ 1 euan oak_euan 0 Sep 28 22:17 /oak/stanford/groups/euan/test
[root@oak-rbh01 ~]# ls -ln /oak/stanford/groups/euan/test
-rw-rw----+ 1 11886 3593 0 Sep 28 22:17 /oak/stanford/groups/euan/test
[root@oak-rbh01 ~]# lfs quota -g oak_euan /oak
Disk quotas for group oak_euan (gid 3593):
Filesystem kbytes quota limit grace files quota limit grace
/oak 33246312992 50000000000 50000000000 - 525598 7500000 7500000 -
|
| Comment by Emoly Liu [ 29/Sep/17 ] |
|
Stephane, never mind. If you hit this issue next time, can you collect the logs for the whole lfs quota command? This time I didn't see any nodemap-related information in your MDS log file. I need to check whether there is something wrong with the nodemap mapping or unmapping. Thanks. |
| Comment by Stephane Thiell [ 29/Sep/17 ] |
|
Ok, will do that first thing tomorrow morning, thanks! If you have the debug flags and subs I should use, please let me know. Otherwise, I will enable full debug, but the MDS is in production so that might be a bit verbose. Thanks! |
| Comment by Emoly Liu [ 29/Sep/17 ] |
|
I see your question now: you mean both of these two queries should return the same quota information. I have checked our code and we do have nodemap handling in the quota path. I will keep investigating. |
| Comment by Emoly Liu [ 29/Sep/17 ] |
|
Stephane, could you please run the following command on all the MDS server(s)? If the output is too big to paste here, you can save it to a file and upload it here.
lctl get_param qmt.*.*.glb-grp
BTW, since you said you failed over the MDT, could you please check whether this MDT runs the same Lustre version as the others? Thanks, |
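One way to collect this across the servers, mirroring the clush usage elsewhere in this ticket (the output file name is just an example):
clush -w@mds 'lctl get_param qmt.*.*.glb-grp' > glb-grp.txt
clush -w@mds,@oss 'lctl get_param version'   # confirm all servers run the same build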
| Comment by Stephane Thiell [ 29/Sep/17 ] |
|
Hi Emoly, Attached oak-md1-s2.glb-grp.txt (output of glb-grp of the primary MDS), then I failed over the MDT to oak-md1-s1 (MDS failover partner) and did the same command, output in attached file oak-md1-s1.glb-grp.txt. Thanks, |
| Comment by Emoly Liu [ 30/Sep/17 ] |
|
Since GID mapping works OK for most of the groups and this issue can't be reproduced now, I suspect it is related to the process of assigning the nodemap to, or getting it from, the given export. I will take a look. |
| Comment by Stephane Thiell [ 02/Oct/17 ] |
|
Hi Emoly, The issue with 'lfs quota' is easily reproducible (for me). The one with the real quota error isn't, but a few users provided job logs with disk quota exceeded errors. So far, the users that have reported this error are from the groups corresponding to the broken groups in "lfs quota", so I think there is a link. I took new, cleaner logs from the MDT and one mapped client (sh-101-59), attached as oak-md1-s2.mdt.dk.full.log, with two lfs quota commands:
[root@sh-101-59 ~]# lfs quota -gh euan /oak
Disk quotas for grp euan (gid 11886):
Filesystem used quota limit grace files quota limit grace
/oak 2.687T* 1k 1k - 40* 1 1 -
[root@sh-101-59 ~]# lfs quota -gh ruthm /oak
Disk quotas for grp ruthm (gid 32264):
Filesystem used quota limit grace files quota limit grace
/oak 14.86T 18.63T 18.63T - 2958342 3000000 3000000 -
[root@sh-101-59 ~]# lctl dk /tmp/dk.full.log
As a reminder, gid 11886 is mapped to 3593 on the MDT, and gid 32264 to 3199. For the working GID 32264->3199, I can find two log entries in the MDT log:
115253 00000001:00000001:1.0:1506921877.591507:0:124387:0:(nodemap_handler.c:672:nodemap_map_id()) Process leaving (rc=3199 : 3199 : c7f)
116197 00000001:00000001:1.0:1506921877.595021:0:124387:0:(nodemap_handler.c:672:nodemap_map_id()) Process leaving (rc=3199 : 3199 : c7f)
Unfortunately, for the first one, if I look for "rc=3593", I cannot find anything :/ Please note that the client was doing nothing when the logs were taken, but the MDT is in production. Please let me know if I can further help debug this issue. Thanks! |
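For future captures, the debug log on a production MDS can be scoped with the quota and security/nodemap debug masks instead of full debug; this is only a suggestion, not a required set of flags:
lctl set_param debug="+quota +sec +trace"   # on the MDS and the client
lctl clear                                  # empty the debug buffer right before the test
lfs quota -gh euan /oak                     # run the failing command from the mapped client
lctl dk /tmp/dk.quota.log                   # dump the log on both the client and the MDS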
| Comment by Stephane Thiell [ 05/Oct/17 ] |
|
I tried to reproduce the issue on two VMs today, one client and one server (llmount.sh) after importing the exact same 'sherlock' idmap config, but I wasn't able to reproduce the issue, all groups are properly mapped including 'euan' and the two other broken groups. This is super weird. |
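For what it's worth, the import step described above can be scripted; a sketch, assuming the production idmap was saved as one "client:fs" pair per line in a file named sherlock.idmap.txt (the file name and format are hypothetical):
sh llmount.sh                      # single-node test filesystem from the lustre/tests directory
lctl nodemap_add sherlock
while read pair; do
        lctl nodemap_add_idmap --name sherlock --idtype gid --idmap "$pair"
done < sherlock.idmap.txt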
| Comment by Stephane Thiell [ 07/Oct/17 ] |
|
I just found this useful server-side 'lctl nodemap_test_id' command. At least, I know I'm not crazy now. From the MGS (2.10.1 RC1):
[root@oak-md1-s1 ~]# cat /proc/fs/lustre/nodemap/sherlock/ranges
[
{ id: 6, start_nid: 0.0.0.0@o2ib4, end_nid: 255.255.255.255@o2ib4 },
{ id: 5, start_nid: 0.0.0.0@o2ib3, end_nid: 255.255.255.255@o2ib3 }
]
[root@oak-md1-s1 ~]# cat /proc/fs/lustre/nodemap/sherlock/idmap
[
{ idtype: gid, client_id: 3525, fs_id: 3741 } { idtype: gid, client_id: 6401, fs_id: 3752 } { idtype: gid, client_id: 99001, fs_id: 3159 } { idtype: gid, client_id: 10525, fs_id: 3351 } { idtype: gid, client_id: 11886, fs_id: 3593 } { idtype: gid, client_id: 12193, fs_id: 3636 } { idtype: gid, client_id: 13103, fs_id: 3208 } { idtype: gid, client_id: 17079, fs_id: 3700 } { idtype: gid, client_id: 19437, fs_id: 3618 } { idtype: gid, client_id: 22959, fs_id: 3745 } { idtype: gid, client_id: 24369, fs_id: 3526 } { idtype: gid, client_id: 26426, fs_id: 3352 } { idtype: gid, client_id: 29361, fs_id: 3746 } { idtype: gid, client_id: 29433, fs_id: 3479 } { idtype: gid, client_id: 30289, fs_id: 3262 } { idtype: gid, client_id: 32264, fs_id: 3199 } { idtype: gid, client_id: 32774, fs_id: 3623 } { idtype: gid, client_id: 38517, fs_id: 3702 } { idtype: gid, client_id: 40387, fs_id: 3708 } { idtype: gid, client_id: 40837, fs_id: 3768 } { idtype: gid, client_id: 47235, fs_id: 3674 } { idtype: gid, client_id: 48931, fs_id: 3325 } { idtype: gid, client_id: 50590, fs_id: 3360 } { idtype: gid, client_id: 52892, fs_id: 3377 } { idtype: gid, client_id: 56316, fs_id: 3353 } { idtype: gid, client_id: 56628, fs_id: 3411 } { idtype: gid, client_id: 59943, fs_id: 3372 } { idtype: gid, client_id: 63938, fs_id: 3756 } { idtype: gid, client_id: 100533, fs_id: 3281 } { idtype: gid, client_id: 244300, fs_id: 3617 } { idtype: gid, client_id: 254778, fs_id: 3362 } { idtype: gid, client_id: 267829, fs_id: 3748 } { idtype: gid, client_id: 270331, fs_id: 3690 } { idtype: gid, client_id: 305454, fs_id: 3371 } { idtype: gid, client_id: 308753, fs_id: 3367 }
]
The pertinent entries are below:
{ idtype: gid, client_id: 32264, fs_id: 3199 }
{ idtype: gid, client_id: 11886, fs_id: 3593 }
[root@oak-md1-s1 ~]# lctl nodemap_test_id --nid 10.210.47.253@o2ib3 --idtype gid --id 32264
3199
[root@oak-md1-s1 ~]# lctl nodemap_test_id --nid 10.210.47.253@o2ib3 --idtype gid --id 11886
99
[root@oak-md1-s1 ~]# cat /proc/fs/lustre/nodemap/sherlock/squash_gid
99
|
| Comment by Stephane Thiell [ 07/Oct/17 ] |
|
Ok, so I found the origin of the problem. It looks like the problem appeared when we changed a client GID, which broke the nodemap idmap rbtree somehow (only the client_to_fs_gidmap, I think; the reverse direction seems to work). I would bet the 'replace' part of the idmap_insert() code is broken. I was able to reproduce the problem on a test VM; it is easy. Please try llmount.sh followed by ./break_nodemap_rbtree.sh (reproducer logs and script attached). You'll see that client GID 99001 is no longer mapped and cannot be removed anymore, but is still listed in idmap. On the impacted production system, I'm not sure this can be fixed without stopping production and recreating the whole nodemap. Please advise. Thanks, Stephane |
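This is not the attached break_nodemap_rbtree.sh itself, just a minimal sketch of the kind of sequence that exercises the 'replace' path (the 99001:3159 mapping comes from the idmap above; the old client GID 9001 and the NID range are hypothetical):
lctl nodemap_add testmap
lctl nodemap_add_range --name testmap --range '192.168.1.[1-254]@tcp'
lctl nodemap_add_idmap --name testmap --idtype gid --idmap 9001:3159
# re-mapping the same fs GID to a new client GID goes through the replace path in idmap_insert()
lctl nodemap_add_idmap --name testmap --idtype gid --idmap 99001:3159
lctl get_param nodemap.testmap.idmap                                    # the entry is still listed
lctl nodemap_test_id --nid 192.168.1.10@tcp --idtype gid --id 99001     # returns squash_gid instead of 3159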
| Comment by Gerrit Updater [ 07/Oct/17 ] |
|
Stephan Thiell (sthiell@stanford.edu) uploaded a new patch: https://review.whamcloud.com/29364 |
| Comment by Stephane Thiell [ 07/Oct/17 ] |
|
With a better understanding of the situation, we were able to fix the issue on the production system without downtime. We duplicated the corrupted 'sherlock' idmap config into a new 'sherlock2' nodemap (without the NID ranges at first). After double-checking the config, we then moved the ranges from sherlock to sherlock2; that might have generated some bad GIDs during this short period of time (a few seconds), but now all the groups are properly mapped. I submitted a patch for review to avoid corrupting the idmap red-black trees in the future. I'll delete the corrupted nodemap config on Monday (it worked on the test system). The priority of this ticket may be lowered. Stephane |
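Roughly the workaround described above, as a sketch (the idmap entries come from the existing 'sherlock' listing; the range expressions are placeholders):
lctl nodemap_add sherlock2
lctl nodemap_add_idmap --name sherlock2 --idtype gid --idmap 11886:3593   # repeat for every gid idmap
...
# once the new nodemap is verified, move the NID ranges over
lctl nodemap_del_range --name sherlock --range '<o2ib4 range>'
lctl nodemap_add_range --name sherlock2 --range '<o2ib4 range>'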
| Comment by Emoly Liu [ 09/Oct/17 ] |
|
Thanks for your investigation of this issue and for your patch! I'm just back to work from my vacation. I will review your patch now. Thanks, |
| Comment by Emoly Liu [ 09/Oct/17 ] |
|
The root cause does come from idmap_insert():
        if (!replace) {
                ...
        } else {
                rb_replace_node(&cur->id_client_to_fs,
                                &idmap->id_client_to_fs,
                                fwd_root);
                rb_replace_node(&cur->id_fs_to_client,
                                &idmap->id_fs_to_client,
                                bck_root);
                idmap_destroy(cur);
        }
When idmap_insert() calls rb_replace_node() to replace the "cur" idmap, only one of id_client_to_fs and id_fs_to_client ends up correct; the other tree is left inconsistent, so idmap_search() fails to find the entry:
00000001:00000040:1.0:1507545424.048143:0:18591:0:(nodemap_idmap.c:254:idmap_search()) tree_type=1, id=99001, idmap->id_client=10525
00000001:00000040:1.0:1507545424.048144:0:18591:0:(nodemap_idmap.c:254:idmap_search()) tree_type=1, id=99001, idmap->id_client=270331
00000001:00000001:1.0:1507545424.048144:0:18591:0:(nodemap_idmap.c:264:idmap_search()) Process leaving (rc=0 : 0 : 0)
But when you call "lctl get_param nodemap.sherlock.idmap", it still outputs every entry, as follows, because it calls rb_first() to walk the idmap from the leftmost node:
nodemap.sherlock.idmap=
[
{ idtype: gid, client_id: 3525, fs_id: 3741 } { idtype: gid, client_id: 99001, fs_id: 3159 } { idtype: gid, client_id: 10525, fs_id: 3351 } { idtype: gid, client_id: 270331, fs_id: 3690 }
]
I will fix this issue. |
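Operationally, the corruption described above is one-sided, which matches the earlier reproducer notes: the entry is still listed, but client-to-fs lookups fall back to squash_gid and the entry can no longer be deleted. A sketch of the checks, with a placeholder NID:
lctl get_param nodemap.sherlock.idmap                                    # entry still listed
lctl nodemap_test_id --nid <o2ib3 client NID> --idtype gid --id 99001    # returns 99 (squash_gid) instead of 3159
lctl nodemap_del_idmap --name sherlock --idtype gid --idmap 99001:3159   # fails to remove the broken entry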
| Comment by Stephane Thiell [ 10/Oct/17 ] |
|
Hi Emoly, Thanks! That was my understanding too. Glad to have you back! When you say that "lctl get_param nodemap.sherlock.idmap" outputs everything correctly: I noticed that it is usually sorted by client_id, but when the tree is "corrupted", the output is not sorted (in the case of the reproducer, we get 3525, 99001, 10525, 270331 instead of 3525, 10525, 99001, 270331). The full idmap output of my current (good) GID mapping looks like this:
[root@oak-md1-s1 ~]# cat /proc/fs/lustre/nodemap/sherlock2/idmap
[
{ idtype: gid, client_id: 3525, fs_id: 3741 } { idtype: gid, client_id: 6401, fs_id: 3752 } { idtype: gid, client_id: 10525, fs_id: 3351 } { idtype: gid, client_id: 11886, fs_id: 3593 } { idtype: gid, client_id: 12193, fs_id: 3636 } { idtype: gid, client_id: 13103, fs_id: 3208 } { idtype: gid, client_id: 17079, fs_id: 3700 } { idtype: gid, client_id: 19437, fs_id: 3618 } { idtype: gid, client_id: 22959, fs_id: 3745 } { idtype: gid, client_id: 24369, fs_id: 3526 } { idtype: gid, client_id: 26426, fs_id: 3352 } { idtype: gid, client_id: 29361, fs_id: 3746 } { idtype: gid, client_id: 29433, fs_id: 3479 } { idtype: gid, client_id: 30289, fs_id: 3262 } { idtype: gid, client_id: 32264, fs_id: 3199 } { idtype: gid, client_id: 32774, fs_id: 3623 } { idtype: gid, client_id: 38517, fs_id: 3702 } { idtype: gid, client_id: 40387, fs_id: 3708 } { idtype: gid, client_id: 40837, fs_id: 3768 } { idtype: gid, client_id: 47235, fs_id: 3674 } { idtype: gid, client_id: 48931, fs_id: 3325 } { idtype: gid, client_id: 50590, fs_id: 3360 } { idtype: gid, client_id: 52892, fs_id: 3377 } { idtype: gid, client_id: 56316, fs_id: 3353 } { idtype: gid, client_id: 56628, fs_id: 3411 } { idtype: gid, client_id: 59943, fs_id: 3372 } { idtype: gid, client_id: 63938, fs_id: 3756 } { idtype: gid, client_id: 99001, fs_id: 3159 } { idtype: gid, client_id: 100533, fs_id: 3281 } { idtype: gid, client_id: 244300, fs_id: 3617 } { idtype: gid, client_id: 254778, fs_id: 3362 } { idtype: gid, client_id: 267829, fs_id: 3748 } { idtype: gid, client_id: 270331, fs_id: 3690 } { idtype: gid, client_id: 305454, fs_id: 3371 } { idtype: gid, client_id: 308753, fs_id: 3367 }
]
This is very clean now, and users don't seem to have any issue anymore. Unfortunately, when I tried to remove the corrupted nodemap 'sherlock' using "lctl nodemap_del sherlock", it was properly removed on the MGS, but I got the following message on all MDS and OSS nodes:
[1197072.957459] Lustre: 123959:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
Thanks, |
| Comment by Stephane Thiell [ 16/Oct/17 ] |
|
Hi Emoly, Since we tried to remove the bad nodemap 'sherlock' using "lctl nodemap_del sherlock", we cannot add or remove any idmap on the servers anymore. It only works on the MGS, and then the following errors occur on the servers:
# clush -w@mds,@oss 'dmesg | tail -1'
oak-md1-s2: [1784564.587104] Lustre: 123959:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
oak-io2-s1: [1025664.072965] Lustre: 317684:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
oak-io2-s2: [1025667.518836] Lustre: 309962:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
oak-io1-s2: [1465169.609494] Lustre: 149882:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
oak-io1-s1: [1462808.944808] Lustre: 226436:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
I think that because of the idmap_search() bug, the servers cannot properly remove the nodemap. I should have kept it as is... Do you think your patch could handle this case, fix the trees, and allow processing the next nodemap logs? I'll start working on a new build with your patch just in case. Thanks, |
| Comment by Stephane Thiell [ 16/Oct/17 ] |
|
Also, do you know a way to force purge the nodemap log on the MGS? I would like to get rid of that 'nodemap_del' entry, I can easily replay all next ones. Many thanks, |
| Comment by Stephane Thiell [ 16/Oct/17 ] |
|
Hi,
oak-MDT0000:
[ 127.492117] Lustre: Lustre: Build Version: 2.10.1_srcc02
[ 127.527461] LNet: Using FMR for registration
[ 127.553475] LNet: Added LNI 10.0.2.52@o2ib5 [8/256/0/180]
[ 190.367048] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[ 191.433340] LustreError: 137-5: oak-MDT0000_UUID: not available for connect from 10.210.45.60@o2ib3 (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 191.452861] LustreError: Skipped 3 previous similar messages
[ 191.471790] Lustre: 13119:0:(mgc_request.c:1797:mgc_process_recover_nodemap_log()) MGC10.0.2.51@o2ib5: error processing nodemap log nodemap: rc = -2
[ 191.523256] Lustre: oak-MDT0000: Not available for connect from 10.210.47.38@o2ib3 (not set up)
[ 191.532970] Lustre: Skipped 3 previous similar messages
[ 192.015060] Lustre: oak-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 192.501895] Lustre: oak-MDD0000: changelog on
[ 192.549977] Lustre: oak-MDT0000: Will be in recovery for at least 2:30, or until 1212 clients reconnect
[ 192.560492] Lustre: oak-MDT0000: Connection restored to cd0e08e0-aa22-f4da-21ed-94f218f886a1 (at 10.210.45.100@o2ib3)
[ 192.595309] Lustre: oak-MDT0000: root_squash is set to 99:99
[ 192.603004] Lustre: oak-MDT0000: nosquash_nids set to 10.0.2.[1-3]@o2ib5 10.0.2.[51-58]@o2ib5 10.0.2.[101-120]@o2ib5 10.0.2.[221-223]@o2ib5 10.0.2.[226-229]@o2ib5 10.0.2.[232-235]@o2ib5 10.0.2.[240-241]@o2ib5 10.210.47.253@o2ib3 10.9.0.[1-2]@o2ib4
...
Thanks, |
| Comment by Stephane Thiell [ 16/Oct/17 ] |
|
Also, I cannot remount the MGS anymore (2.10.1 + patch gerrit 29364):
[ 1174.919438] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[ 1174.932548] Lustre: 17247:0:(osd_handler.c:7008:osd_mount()) MGS-osd: device /dev/mapper/md1-rbod1-mgt was upgraded from Lustre-1.x without enabling the dirdata feature. If you do not want to downgrade to Lustre-1.x again, you can enable it via 'tune2fs -O dirdata device'
[ 1175.062057] Lustre: 17247:0:(nodemap_storage.c:914:nodemap_load_entries()) MGS-osd: failed to load nodemap configuration: rc = -2
[ 1175.075067] LustreError: 17247:0:(mgs_fs.c:187:mgs_fs_setup()) MGS: error loading nodemap config file, file must be removed via ldiskfs: rc = -2
[ 1175.089557] LustreError: 17247:0:(mgs_handler.c:1297:mgs_init0()) MGS: MGS filesystem method init failed: rc = -2
[ 1175.145812] LustreError: 17247:0:(obd_config.c:608:class_setup()) setup MGS failed (-2)
[ 1175.154748] LustreError: 17247:0:(obd_mount.c:203:lustre_start_simple()) MGS setup error -2
[ 1175.164081] LustreError: 17247:0:(obd_mount_server.c:135:server_deregister_mount()) MGS not registered
[ 1175.174463] LustreError: 15e-a: Failed to start MGS 'MGS' (-2). Is the 'mgs' module loaded?
[ 1175.282230] Lustre: server umount MGS complete |
| Comment by Emoly Liu [ 17/Oct/17 ] |
|
Stephane, I saw the same "-2" logs from my server on Oct. 9 when I tried to reproduce this issue. Let me see how to purge the nodemap log. |
| Comment by Emoly Liu [ 17/Oct/17 ] |
|
Here are some steps to remove the nodemap config log from the MGS. This will remove all nodemap information from the MGS, so before doing that, you should save all of the nodemap information with "cp -r /proc/fs/lustre/nodemap $nodemap_dir" or "lctl get_param nodemap.*.* > $nodemap_file".
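The steps themselves did not survive this export; based on the follow-up comments and the "file must be removed via ldiskfs" hint in the MGS mount error, they were roughly the following (a reconstruction, assuming an ldiskfs-backed MGT; the device path and mount points are placeholders):
umount /mnt/mgs                                        # stop the MGS
mount -t ldiskfs /dev/<mgt_device> /mnt/mgt_ldiskfs
mv /mnt/mgt_ldiskfs/CONFIGS/nodemap /mnt/mgt_ldiskfs/CONFIGS/nodemap.corrupted
umount /mnt/mgt_ldiskfs
mount -t lustre /dev/<mgt_device> /mnt/mgs             # restart the MGS
# then recreate the nodemaps by hand from the saved configuration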
Please let me know if this works for you. |
| Comment by Stephane Thiell [ 17/Oct/17 ] |
|
Hi Emoly, Good news. I renamed ./CONFIGS/nodemap to ./CONFIGS/nodemap.corrupted instead of removing it, and it worked! I was then able to mount the MGS and recreate all the nodemaps by hand from there. Now I can add new idmaps again and they are properly propagated to the targets. The corrupted 'sherlock' nodemap can no longer be seen from the MGS. After some time, a few minutes maybe (not immediately), the corrupted 'sherlock' nodemap was also automatically removed from all targets (MDT, OST). This is great. Thanks again! By the way, I am now running 2.10.1 with the patch on the MGS/MDS. Stephane |
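A quick way to confirm the regenerated configuration reached every target, in the same clush style used earlier in this ticket (a suggestion only):
clush -w@mds,@oss 'lctl get_param nodemap.sherlock2.idmap | md5sum'   # identical checksums everywhere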
| Comment by Emoly Liu [ 18/Oct/17 ] |
|
Stephane, that's great. After the MGS restarts/remounts, the other targets detect that the config log lock has changed and then fetch the config log from the MGS to update their local copy. Thanks, Emoly |
| Comment by Gerrit Updater [ 22/Nov/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29364/ |
| Comment by Peter Jones [ 22/Nov/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 22/Nov/17 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/30206 |
| Comment by Gerrit Updater [ 27/Nov/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30206/ |