[LU-13602] Skip unknown FLR component types Created: 27/May/20 Updated: 02/Aug/21 Resolved: 29/Jul/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Li Xi | Assignee: | Qian Yingjin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Currently, lov_init_composite() quits with an error when it reads an unknown LOV pattern:
    CERROR("%s: unknown composite layout entry type %i\n",
           lov2obd(dev->ld_lov)->obd_name,
           lsm->lsm_entries[i]->lsme_pattern);
Since FLR will be used for upcoming new features, like PCC-RO, an old client should still be able to read the old format mirrors and ignore the new types of mirrors. On the MDS, it should skip mirrors with unknown/foreign component types and mark them stale when selecting which component to use for writes. |
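The client-side behaviour requested above can be sketched as follows. This is a minimal illustration, not the Lustre implementation: the `sketch_*` names, the pattern values, and the `skipped` field are all hypothetical and exist only for this example. It shows an init pass that marks unknown entries as unusable instead of returning an error, and a read selection that ignores skipped and stale entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern IDs for this sketch; NOT Lustre's real LOV_PATTERN_* values. */
enum sketch_pattern {
	SKETCH_PATTERN_RAID0   = 1,
	SKETCH_PATTERN_MDT     = 2,
	SKETCH_PATTERN_FOREIGN = 3,	/* treated as "not understood" by this client */
};

/* Hypothetical, simplified stand-in for one composite layout entry. */
struct sketch_entry {
	uint32_t pattern;	/* layout type of this component */
	bool     stale;		/* component data is out of date */
	bool     skipped;	/* set at init: this client cannot use the entry */
};

/* Assumption: this client only understands the RAID0 and MDT patterns. */
static bool sketch_pattern_known(uint32_t pattern)
{
	return pattern == SKETCH_PATTERN_RAID0 ||
	       pattern == SKETCH_PATTERN_MDT;
}

/*
 * Walk all entries and flag the unknown ones as skipped instead of
 * failing the whole layout. Returns the number of usable entries.
 */
static size_t sketch_init_composite(struct sketch_entry *ents, size_t n)
{
	size_t usable = 0;

	for (size_t i = 0; i < n; i++) {
		ents[i].skipped = !sketch_pattern_known(ents[i].pattern);
		if (!ents[i].skipped)
			usable++;
	}
	return usable;
}

/* Return the index of the first entry that is neither skipped nor stale, or -1. */
static int sketch_pick_read_entry(const struct sketch_entry *ents, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!ents[i].skipped && !ents[i].stale)
			return (int)i;
	}
	return -1;
}
```

With a foreign entry, a stale RAID0 entry, and an up-to-date RAID0 entry, the init pass reports two usable entries and the read selection lands on the third one.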
| Comments |
| Comment by Gerrit Updater [ 27/Jul/20 ] |
|
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/39513 |
| Comment by Andreas Dilger [ 26/Nov/20 ] |
|
Yingjin, could you please add a short description of what you think is needed to implement this. To my thinking, the main places that need to be changed are mirror selection for writes (on the MDS) and reads (on the client) to skip unknown component types (e.g. unknown layout types, foreign components, PCC/HSM components on remote nodes, etc) in addition to the normal skip of STALE components. For PCC-RO the client can select the local cache copy if it is not STALE. Is there some other change that is needed here which I'm missing? |
| Comment by Qian Yingjin [ 27/Nov/20 ] |
|
Hi Andreas, The problem is not PCC-RW, as it is actually an HSM solution and the file data can be restored when a conflicting access happens. The main problem is PCC-RO. When a file is PCC-RO cached on a client, the components on Lustre OSTs and the PCC copies are both in the up-to-date state (not the STALE state). All clients can perform read operations on the components with data on Lustre OSTs. To solve this problem, the solutions may be:
Regards, |
| Comment by Andreas Dilger [ 27/Nov/20 ] |
|
When you write "for a old client without PCC-RO support", shouldn't the normal FLR handling cover this condition? We already prevent clients that do not understand FLR from accessing files with mirror components, and PCC-RO should be handled in the same way as any other FLR mirror. The main difference should be that a PCC-RO client has the added option of reading the file from the local cache copy (if the file exists there). Any non-PCC client can only read from the non-PCC mirror(s) of the file, but there should always be a non-PCC mirror for PCC-RO. I can't see any reason why a non-PCC client reading from a file should cause the PCC-RO cache copies to be invalidated. It should only be writing to an FLR file that causes the layout to be invalidated, and all but one of the mirrors to be marked stale. This should already be handled by normal FLR mechanisms to revoke the layout lock from the clients (including PCC-RO clients), and the PCC client should only be able to access the local cache copy again when it has a layout lock, and the data version of the cache copy is still valid (i.e. the data was not modified). It may be that the PCC code needs to be changed so that it computes and stores the "data version" of only the non-PCC mirror copy instead of the whole layout, so that the PCC copy does not get invalidated just because another mirror was added, or similar. |
| Comment by Qian Yingjin [ 28/Nov/20 ] |
|
To combine this with FLR, we need to extend the data structure `enum lov_comp_md_flags`:
    /**
     * on-disk data for lcm_flags. Valid if lcm_magic is LOV_MAGIC_COMP_V1.
     */
    enum lov_comp_md_flags {
            /* the least 2 bits are used by FLR to record file state */
            LCM_FL_NONE          = 0x0,
            LCM_FL_RDONLY        = 0x1,
            LCM_FL_WRITE_PENDING = 0x2,
            LCM_FL_SYNC_PENDING  = 0x4,
            LCM_FL_PCC_RDONLY    = 0x8, /* The file is RO-PCC cached */
            LCM_FL_FLR_MASK      = 0xF,
    };
Please note that the flag LCM_FL_PCC_RDONLY is different from LCM_FL_RDONLY here.
In my opinion, attaching a file in LCM_FL_WRITE_PENDING state to PCC-RO should not need to sync all mirrors and transition the file into LCM_FL_RDONLY state, which may be time-consuming. In our implementation, we only need to add a new PCC-RO foreign layout component and add LCM_FL_PCC_RDONLY to the FLR flags. Because we add a new LCM_FL_PCC_RDONLY state, an old client may not understand this new FLR component flag, which may result in inconsistent access. Regards, |
| Comment by Qian Yingjin [ 28/Nov/20 ] |
|
The old on-disk data for lcm_flags are:
    /**
     * on-disk data for lcm_flags. Valid if lcm_magic is LOV_MAGIC_COMP_V1.
     */
    enum lov_comp_md_flags {
            /* the least 2 bits are used by FLR to record file state */
            LCM_FL_NONE          = 0,
            LCM_FL_RDONLY        = 1,
            LCM_FL_WRITE_PENDING = 2,
            LCM_FL_SYNC_PENDING  = 3,
            LCM_FL_FLR_MASK      = 0x3,
    };
We have also changed the value of LCM_FL_SYNC_PENDING... Maybe the new lov_comp_md_flags should be:
    /**
     * on-disk data for lcm_flags. Valid if lcm_magic is LOV_MAGIC_COMP_V1.
     */
    enum lov_comp_md_flags {
            /* the least 2 bits are used by FLR to record file state */
            LCM_FL_NONE          = 0x00,
            LCM_FL_RDONLY        = 0x01,
            LCM_FL_WRITE_PENDING = 0x02,
            LCM_FL_SYNC_PENDING  = 0x03,
            LCM_FL_PCC_RDONLY    = 0x10,
            LCM_FL_FLR_MASK      = 0x13,
    };
This way we don't change the old FLR component flags, and we add a new LCM_FL_PCC_RDONLY flag at 0x10. |
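To make the compatibility concern concrete, the sketch below compares how the two proposed encodings look to an old client. It assumes (this is not confirmed in the ticket) that an old client extracts the FLR state by masking lcm_flags with the old LCM_FL_FLR_MASK value of 0x3, and that LCM_FL_PCC_RDONLY may be OR-ed with a state value. The macro and function names are hypothetical; the numeric values are the ones quoted in the comments above.

```c
#include <stdint.h>

/* Old on-disk values, as quoted above. */
#define OLD_LCM_FL_RDONLY        0x1u
#define OLD_LCM_FL_SYNC_PENDING  0x3u
#define OLD_LCM_FL_FLR_MASK      0x3u

/* First proposal: SYNC_PENDING moves to 0x4, PCC_RDONLY is 0x8. */
#define V1_LCM_FL_SYNC_PENDING   0x4u
#define V1_LCM_FL_PCC_RDONLY     0x8u

/* Second proposal: old values unchanged, PCC_RDONLY is 0x10. */
#define V2_LCM_FL_PCC_RDONLY     0x10u

/*
 * Hypothetical helper: the FLR state as an old client would decode it,
 * assuming it only looks at the low two bits of lcm_flags.
 */
static uint32_t old_client_flr_state(uint32_t lcm_flags)
{
	return lcm_flags & OLD_LCM_FL_FLR_MASK;
}
```

Under these assumptions, `old_client_flr_state(V1_LCM_FL_SYNC_PENDING)` is 0, so an old client would see a sync-pending file as LCM_FL_NONE, whereas `old_client_flr_state(OLD_LCM_FL_RDONLY | V2_LCM_FL_PCC_RDONLY)` is still 0x1, so the second encoding leaves the old state values readable and simply hides the new bit.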
| Comment by Qian Yingjin [ 01/Dec/20 ] |
|
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/40791 |
| Comment by Gerrit Updater [ 01/Dec/20 ] |
|
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/40813 |
| Comment by Andreas Dilger [ 14/Dec/20 ] |
|
Yingjin, neither of these patches addresses the main issue of this ticket, which is to allow the MDS and clients to skip layout types that they do not understand. In particular, the new PCC code uses a foreign layout as part of a file, and if that file is being modified by a client then the PCC component needs to be marked STALE. The same should be done for any other layout type that the MDS does not understand. On the client, it should not try to read from a component type that it doesn't understand. |
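The MDS-side half of this request might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from either patch: the `wr_*` names, the type values, and the simplified component structure are hypothetical. When a write is prepared, every component whose type is not understood is marked stale, and the write target is chosen only among understood, non-stale components. The normal FLR step of staling all other mirrors is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component types for this sketch; not real Lustre constants. */
enum wr_type {
	WR_TYPE_PLAIN   = 1,	/* a layout type this MDS understands */
	WR_TYPE_FOREIGN = 2,	/* e.g. a PCC foreign component */
};

/* Hypothetical, simplified stand-in for one mirror component. */
struct wr_component {
	uint32_t type;
	bool     stale;
};

/* Assumption: only WR_TYPE_PLAIN is understood by this MDS. */
static bool wr_type_known(uint32_t type)
{
	return type == WR_TYPE_PLAIN;
}

/*
 * Prepare a write: mark every component of an unknown or foreign type
 * as stale, and return the index of the first understood, non-stale
 * component to write to, or -1 if there is none.
 */
static int wr_prepare_write(struct wr_component *comps, size_t n)
{
	int target = -1;

	for (size_t i = 0; i < n; i++) {
		if (!wr_type_known(comps[i].type)) {
			comps[i].stale = true;
			continue;
		}
		if (target < 0 && !comps[i].stale)
			target = (int)i;
	}
	return target;
}
```

For a layout with a foreign component, a plain component, and a component of an unrecognised type, the plain component is selected and the other two end up stale.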
| Comment by Qian Yingjin [ 07/Jan/21 ] |
|
Updated the patch https://review.whamcloud.com/#/c/39513/. It will skip the unknown FOREIGN layout component that the client does not support or understand. |
| Comment by Gerrit Updater [ 22/Jan/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39513/ |
| Comment by Gerrit Updater [ 28/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40813/ |
| Comment by Peter Jones [ 29/Jul/21 ] |
|
Landed for 2.15 |
| Comment by James A Simmons [ 02/Aug/21 ] |
|
For some reason the change in the patch LCM_FL_PCC_RDONLY exposes a bug that I have seen with the port to the native Linux client. With the patch I get:
Lustre: DEBUG MARKER: == sanity test 63b: async write errors should be returned to fsync ============================ ======= 10:58:33 (1627916313)
Lustre: *** cfs_fail_loc=406, val=0***
LustreError: 20036:0:(osc_request.c:2586:osc_build_rpc()) prep_req failed: -12
LustreError: 20036:0:(osc_cache.c:2259:osc_check_rpcs()) Write request failed with -12
LustreError: 20042:0:(lu_object.c:215:lu_object_put()) ASSERTION( list_empty(&top->loh_lru) ) failed:
LustreError: 20042:0:(lu_object.c:215:lu_object_put()) LBUG
kernel: Pid: 20042, comm: lctl 5.7.0-rc7+ #1 SMP PREEMPT Sat Jul 31 13:04:46 EDT 2021
kernel: Call Trace:
kernel: libcfs_call_trace+0x62/0x80 [libcfs]
kernel: lbug_with_loc+0x41/0xa0 [libcfs]
kernel: lu_object_put+0x31b/0x340 [obdclass]
kernel: vvp_pgcache_current+0x7e/0x150 [lustre]
kernel: seq_read+0x200/0x3c0
kernel: full_proxy_read+0x4d/0x70
kernel: vfs_read+0xb3/0x17
kernel: ksys_read+0x5f/0xd0
kernel: do_syscall_64+0x6d/0x4f0
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xb3
kernel: Kernel panic - not syncing: LBUG |