IU UID/GID Mapping Feature
(LU-3291)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Kit Westneat | Assignee: | Kit Westneat |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch |
| Attachments: |
|
| Issue Links: |
|
| Rank (Obsolete): | 14037 |
| Description |
|
This ticket relates to the transfer of idmap information between the server nodes. The goal is that the MGS stores the UID/GID maps in an index (similar to how quota stores the master quota files) and these are transferred to the other servers to keep the mappings consistent. |
| Comments |
| Comment by Andreas Dilger [ 20/May/14 ] |
|
Kit had a question about API usage for creating the mapping indices as named objects on the MGS and transferring them to the servers. There may be an arbitrary number of maps for different remote systems, so a single reserved FID is not going to be sufficient. It probably makes sense to reserve a FID sequence for nodemap to hold all of the idmaps, with the OIDs in this sequence used as needed for different idmaps. Should these idmaps be created under CONFIGS, or in their own top-level directory? It probably makes sense to put them under CONFIGS (maybe as a subdirectory), since AFAIK the MGS OSD's namespace is only under CONFIGS, and we don't want to mess with files under / and confuse a shared MDS OSD. Is there any documentation on how to create these objects via the OSD API, or examples for Kit to follow? |
| Comment by Kit Westneat [ 20/May/14 ] |
|
Thanks for creating this ticket for me, Andreas. Here's a quick recap of what I've tried and the results I've gotten. I was planning on just using one index file and holding all the maps within it, at least to start with, so a sequence might not be necessary; I'm not sure. I added an OID to enum local_oid and called:

```c
lu_local_obj_fid(&fid, NODEMAP_OID);
```

and added this to osd_lf_maps:

```c
static const struct osd_lf_map osd_lf_maps[] = {
	/* nodemap */
	{ "nodemap", { FID_SEQ_LOCAL_FILE, NODEMAP_OID, 0 },
	  OLF_SHOW_NAME, NULL, NULL },
};
```

I then tried to create the file with local_index_find_or_create_with_fid, but the dt_devices did not like that:

```c
nodemap_idx = local_index_find_or_create_with_fid(env, dev, &fid,
						  parent, nodemap_idx_filename,
						  mode, &dt_nodemap_features);
```

That got me a couple of different LBUGs. Using mgs_bottom:

```
<0>LustreError: 1332:0:(mdd_object.c:107:mdd_env_info()) ASSERTION( info != ((void *)0) ) failed:
<0>LustreError: 1332:0:(mdd_object.c:107:mdd_env_info()) LBUG
<4>Pid: 1332, comm: lctl
<4>Call Trace:
<4> [<ffffffffa0312895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0312e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0d2f5e1>] mdd_env_info+0x61/0x70 [mdd]
<4> [<ffffffffa0d2ffbd>] mdd_object_start+0x4d/0x100 [mdd]
<4> [<ffffffffa047e42a>] lu_object_alloc+0x12a/0x320 [obdclass]
<4> [<ffffffffa047ef84>] lu_object_find_at+0x204/0x350 [obdclass]
<4> [<ffffffffa047e94d>] ? lu_object_put+0xad/0x330 [obdclass]
<4> [<ffffffffa048158c>] dt_locate_at+0x1c/0xa0 [obdclass]
<4> [<ffffffffa0461a36>] local_index_find_or_create_with_fid+0x196/0x220 [obdclass]
```

Using mgs_dt_dev:

```
<0>LustreError: 4598:0:(mgs_handler.c:1311:mgs_object_alloc()) ASSERTION( hdr == ((void *)0) ) failed:
<0>LustreError: 4598:0:(mgs_handler.c:1311:mgs_object_alloc()) LBUG
<4>Pid: 4598, comm: lctl
<4>Call Trace:
<4> [<ffffffffa0312895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0312e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0b91d98>] mgs_object_alloc+0x148/0x220 [mgs]
<4> [<ffffffffa045f5e8>] ls_object_init+0x38/0x140 [obdclass]
<4> [<ffffffffa047e3d8>] lu_object_alloc+0xd8/0x320 [obdclass]
<4> [<ffffffffa047eea9>] lu_object_find_at+0x129/0x350 [obdclass]
<4> [<ffffffffa048158c>] dt_locate_at+0x1c/0xa0 [obdclass]
<4> [<ffffffffa0460d12>] __local_file_create+0x72/0x8c0 [obdclass]
<4> [<ffffffffa04619ed>] local_index_find_or_create_with_fid+0x14d/0x220 [obdclass]
```
|
| Comment by Kit Westneat [ 20/May/14 ] |
|
Adding dt_nodemap_features in case that's useful to see:

```c
/* nodemap files */
const struct dt_index_features dt_nodemap_features = {
	.dif_flags	 = DT_IND_UPDATE,
	.dif_keysize_min = sizeof(__u64), /* 64-bit nodemap id/record id */
	.dif_keysize_max = sizeof(__u64), /* 64-bit nodemap id/record id */
	.dif_recsize_min = sizeof(struct nodemap_rec), /* 32 bytes */
	.dif_recsize_max = sizeof(struct nodemap_rec), /* 32 bytes */
	.dif_ptrsize	 = 4
};
EXPORT_SYMBOL(dt_nodemap_features);
```
|
| Comment by Peter Jones [ 20/May/14 ] |
|
Good to see you still around, Kit! I have updated the reporter field to show you as the originator of this ticket. |
| Comment by Kit Westneat [ 27/May/14 ] |
|
Thanks Peter! I was wondering when someone might be able to take a look at this. I'm a bit reticent to invest a ton more time on this work until I know that I am headed in the right direction. |
| Comment by Peter Jones [ 27/May/14 ] |
|
Kit, most likely not until 2.6 goes GA and work on 2.7 starts. Peter |
| Comment by Andreas Dilger [ 27/May/14 ] |
|
Peter, Kit is mostly just looking for some advice from Alex, Niu, and Johann about the right way to implement the nodemap transfer between the MDT and OST. He is planning on using the quota index transfer that Johann implemented in 2.4 as the model for the UID/GID mapping transfer. It makes sense to provide feedback on whether this is the right approach to begin with, and guidance on the right implementation before it is done rather than afterward. |
| Comment by Peter Jones [ 27/May/14 ] |
|
Thanks Andreas. Yes, I understood that Kit wanted guidance up front before starting in earnest on this approach. I was just trying to give him a rough estimate as to when I expected people to be free to do so. |
| Comment by Kit Westneat [ 10/Jun/14 ] |
|
I've been chipping away at this some. Specifically, I've been looking at using the mgs_config_read RPC and returning all the index entries in the response body. The issue I'm running into is that the mgs_config_res response only contains an offset, while the indexes use both an offset and a version. Because llog entries are immutable with monotonically increasing IDs, the offset and version are essentially the same, so the client can just ask for records newer than its last record. The index offsets, however, are based on the hashes of the indices, so the index file has a separate version variable. By examining the version variable, the client can see whether the index file changed while it was being read (between RPCs). The lack of a version in the mgs_config_read response makes it hard to send the index file. Here are some of my thoughts:
What do you all think is the best way to go about this? |
| Comment by Niu Yawei (Inactive) [ 22/Jul/14 ] |
|
Hi Kit, the OBD_IDX_READ RPC was designed for transferring index files; I think you can use it to transfer the UID/GID mapping files (this needs adding such an RPC handler to the MGS). |
| Comment by Kit Westneat [ 08/Sep/14 ] |
|
patch to create and save nodemaps: |
| Comment by Kit Westneat [ 09/Sep/14 ] |
|
patch to transfer nodemaps: |
| Comment by Gerrit Updater [ 29/Mar/15 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/14254 |
| Comment by James A Simmons [ 05/May/15 ] |
|
Hi Kit. I updated all your patches related to this ticket, and also added nodemap data being stored for ZFS in addition to the ldiskfs support. The only reservation I have about 14254 is that nm_id is too small: I think an unsigned int is not enough. I could easily see our center-wide file system exhausting the ID namespace quickly. |
| Comment by Andreas Dilger [ 12/May/15 ] |
|
James, I'm not sure why you think a 32-bit nm_id would run out? The nm_id is for the number of different remote clusters with different UID spaces. It doesn't relate to the number of remote nodes in a given mapping or the range of UID or GID values that can be mapped. |
| Comment by James A Simmons [ 12/May/15 ] |
|
Ah, I see. I was under the impression it was related to the number of nodes plus the UID space in the cluster. I was thinking of the different use cases: if you had 25K clients and thousands of users in total, like we do, it seemed possible to exhaust the space. Since this is not the case, that size should be large enough (famous last words). |
| Comment by Gerrit Updater [ 20/May/15 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/14885 |
| Comment by James A Simmons [ 11/Jun/15 ] |
|
Sorry Kit, but my cfs_hash cleanups broke your patches. I updated the first two in the series but am having trouble updating the third one so far. Hope that helps you out. |
| Comment by Kit Westneat [ 11/Jun/15 ] |
|
Thanks James, I put a comment on the first one. I'll update the third. Just to confirm, you mean change 14254? |
| Comment by James A Simmons [ 11/Jun/15 ] |
|
Yes, change 14254. I saw your comment; I had missed that change. Thanks for looking over the update. |
| Comment by Gerrit Updater [ 10/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14885/ |
| Comment by Gerrit Updater [ 31/Jul/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14254/ |
| Comment by Gerrit Updater [ 26/Oct/15 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/16941 |
| Comment by Gerrit Updater [ 07/Dec/15 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/17503 |
| Comment by Gerrit Updater [ 20/Feb/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11813/ |
| Comment by Gerrit Updater [ 22/Feb/16 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/18554 |
| Comment by Gerrit Updater [ 20/Apr/16 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/19674 |
| Comment by Gerrit Updater [ 31/May/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19674/ |
| Comment by Gerrit Updater [ 02/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11830/ |
| Comment by James A Simmons [ 06/Jun/16 ] |
|
Kit, what is the dependency chain now? Currently it is hard to know what landing order matters. |
| Comment by Kit Westneat [ 06/Jun/16 ] |
|
Change 16941 allows for config files larger than the size of a single RPC. Change 18554 sets up 17503, which caches the config on targets so they don't need the MGS to start up with nodemap. So change 16941 stands alone, but 17503 requires that 18554 be landed first. |
| Comment by Gerrit Updater [ 14/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16941/ |
| Comment by Gerrit Updater [ 22/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18554/ |
| Comment by Peter Jones [ 25/Jul/16 ] |
|
Kit, it looks like there is still one patch - http://review.whamcloud.com/#/c/17503/ - being tracked under this ticket. Does that need to land for 2.9? Peter |
| Comment by Kit Westneat [ 26/Jul/16 ] |
|
Hi Peter, it would be best if it did, but it's not mandatory. Without that patch, if the MGS goes away, the OSSes and MDSes will not be able to load the nodemap configuration if they get restarted. Thanks, |
| Comment by Gerrit Updater [ 11/Aug/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17503/ |
| Comment by Peter Jones [ 11/Aug/16 ] |
|
Landed for 2.9 |