[LU-11459] llsom_sync updates LSOM data for some files when called with non-existent user Created: 02/Oct/18 Updated: 21/Jan/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Qian Yingjin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | LSOM, patch | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Looks like llsom_sync is not working correctly when a non-existent changelog user is entered. For example, I registered two changelog users on the MDS, set llite.*.xattr_cache = 0 on the client (vm4), then created a new file and overwrote an existing file:

[vm3 ~]# dd if=/dev/urandom of=/lustre/scratch/ddfile3 bs=37k count=200
200+0 records in
200+0 records out
7577600 bytes (7.6 MB) copied, 0.0464932 s, 163 MB/s
[vm4 ~]# echo "aa" > /lustre/scratch/ddfile2

Looking at the LSOM data, we can see the size is correct, but the blocks are not updated.

[vm4 ~]# lfs getsom /lustre/scratch/ddfile2
file: /lustre/scratch/ddfile2 size: 3 blocks: 0 flags: 4
[vm4 ~]# lfs getsom /lustre/scratch/ddfile3
file: /lustre/scratch/ddfile3 size: 7577600 blocks: 0 flags: 4

Now, sync the LSOM data, but give it a bad changelog user ID.

[vm4 ~]# llsom_sync --mdt scratch-MDT0000 --user cli4 -v /lustre/scratch/
Start receiving records
Processed changelog record index:5 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
Processed changelog record index:6 type:CREAT(0x1) FID:[0x200000402:0x2:0x0]
Processed changelog record index:7 type:XATTR(0xf) FID:[0x200000402:0x2:0x0]
Processed changelog record index:8 type:CLOSE(0xb) FID:[0x200000402:0x2:0x0]
Processed changelog record index:9 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
Processed changelog record index:10 type:TRUNC(0xd) FID:[0x200000402:0x1:0x0]
Processed changelog record index:11 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
Processed changelog record index:12 type:CLOSE(0xb) FID:[0x200000402:0x1:0x0]
finished reading [scratch-MDT0000]
Start to sync 2 records.
record 1651949901960989620:8, updated LSOM for fid [0x200000402:0x2:0x0] size:7577600 blocks:14800
llsom_sync: cannot purge records for 'cli4': Invalid argument (22)
llsom_sync: failed to clear changelog record: cli4:8: Invalid argument (22)
[vm4 ~]# lfs getsom /lustre/scratch/ddfile3
file: /lustre/scratch/ddfile3 size: 7577600 blocks: 14800 flags: 4
[vm4 ~]# lfs getsom /lustre/scratch/ddfile2
file: /lustre/scratch/ddfile2 size: 3 blocks: 0 flags: 4

The changelog record purge error is to be expected, since there is no user cli4. The problem is that one file's LSOM data (blocks) is updated while the other file's data is not. From the output of llsom_sync, it looks like the LSOM update is interrupted once the tool figures out that the user is not valid.

It seems like there are two issues here: Looking at the llsom_sync code, we don't check the changelog user in llsom_sync until we call llapi_changelog_clear() to purge changelog records, and this routine produces an error.

With a valid changelog user, all files' LSOM data is updated.

[vm4 ~]# llsom_sync --mdt scratch-MDT0000 --user cl2 -v /lustre/scratch/
Start receiving records
Processed changelog record index:5 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
Processed changelog record index:6 type:CREAT(0x1) FID:[0x200000402:0x2:0x0]
Processed changelog record index:7 type:XATTR(0xf) FID:[0x200000402:0x2:0x0]
Processed changelog record index:8 type:CLOSE(0xb) FID:[0x200000402:0x2:0x0]
Processed changelog record index:9 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
Processed changelog record index:10 type:TRUNC(0xd) FID:[0x200000402:0x1:0x0]
Processed changelog record index:11 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
Processed changelog record index:12 type:CLOSE(0xb) FID:[0x200000402:0x1:0x0]
Processed changelog record index:13 type:XATTR(0xf) FID:[0x200000402:0x2:0x0]
finished reading [scratch-MDT0000]
Start to sync 2 records.
record 1651949901960989620:8, updated LSOM for fid [0x200000402:0x2:0x0] size:7577600 blocks:14800
record 1651949963251084516:12, updated LSOM for fid [0x200000402:0x1:0x0] size:3 blocks:8
[vm4 ~]# lfs getsom /lustre/scratch/ddfile3
file: /lustre/scratch/ddfile3 size: 7577600 blocks: 14800 flags: 4
[vm4 ~]# lfs getsom /lustre/scratch/ddfile2
file: /lustre/scratch/ddfile2 size: 3 blocks: 8 flags: 4 |
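The partial update described above can be illustrated with a toy model. This is only a sketch of the control flow inferred from the tool's output, not the actual llsom_sync source. Every name in it (`sync_records`, `clear_changelog`, `VALID_USERS`) is hypothetical, and the set of registered users is an assumption based on the two users mentioned in the report.

```python
# Toy model of the reported behaviour; NOT the real llsom_sync code.
# sync_records, clear_changelog and VALID_USERS are hypothetical names.

VALID_USERS = {"cl1", "cl2"}  # assumed: the two users registered on the MDT

EINVAL = 22


def clear_changelog(user, index):
    """Stand-in for the purge step: fails with -EINVAL for unknown users."""
    return 0 if user in VALID_USERS else -EINVAL


def sync_records(records, user, validate_first=False):
    """Update LSOM per record, purging the changelog after each update.

    Returns the list of FIDs whose LSOM data was updated.
    """
    if validate_first and user not in VALID_USERS:
        return []  # fail fast: no file is touched
    updated = []
    for index, fid in records:
        updated.append(fid)  # the LSOM update happens before the purge
        if clear_changelog(user, index) < 0:
            break  # purge failed, so the remaining records are skipped
    return updated


records = [(8, "[0x200000402:0x2:0x0]"), (12, "[0x200000402:0x1:0x0]")]
print(sync_records(records, "cli4"))                       # one FID updated
print(sync_records(records, "cl2"))                        # both FIDs updated
print(sync_records(records, "cli4", validate_first=True))  # nothing updated
```

In this model the invalid user is only detected after the first record has already been synced, which matches the symptom that ddfile3 gets its blocks updated while ddfile2 does not.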
| Comments |
| Comment by Andreas Dilger [ 02/Oct/18 ] |
|
I don't think we need an "all-or-none" semantic for LSOM data. Since LSOM is, by definition, lazy, there may be any number of reasons why the LSOM attrs are updated or not (e.g. some other client accessed the file, it was stored only on the MDT, whatever). I agree it would be useful to have the LSOM tool check for a valid Changelog user when it is first run, so that it can complain to the user appropriately. |
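As a rough illustration of such an up-front check, the sketch below tests whether a user ID appears in a changelog user listing. It assumes the text comes from `lctl get_param mdd.<MDT>.changelog_users` run on the MDS; the sample layout and index values are illustrative assumptions, and the helper names `registered_users` and `is_registered` are hypothetical. It does not show how the tool itself would query the MDS from a client.

```python
import re

# Illustrative sample only: the layout is an assumption about what
# `lctl get_param mdd.<MDT>.changelog_users` prints, and the numbers are made up.
SAMPLE = """\
mdd.scratch-MDT0000.changelog_users=
current index: 13
ID    index (idle seconds)
cl1   4 (120)
cl2   4 (120)
"""


def registered_users(text):
    """Return the set of changelog user IDs (e.g. 'cl1') listed in text."""
    # A user ID is 'cl' followed by digits at the start of a line.
    return set(re.findall(r"^(cl\d+)\b", text, flags=re.MULTILINE))


def is_registered(user, text):
    """True if the given changelog user ID appears in the listing."""
    return user in registered_users(text)


print(is_registered("cl2", SAMPLE))
print(is_registered("cli4", SAMPLE))
```

Running a check like this before any record is processed would let the tool reject `cli4` immediately instead of failing at the first purge.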
| Comment by Peter Jones [ 03/Oct/18 ] |
|
Qian, can you please investigate? Thanks, Peter |
| Comment by Qian Yingjin [ 08/Oct/18 ] |
|
I dug into the code. To have the LSOM sync tool check for a valid Changelog user, we need to add an extra RPC to the MDS to check it. I will make a patch soon. |
| Comment by Gerrit Updater [ 08/Oct/18 ] |
|
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/33315 |