
llsom_sync updates LSOM data for some files when called with non-existent user

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.0
    • 3

    Description

      Looks like llsom_sync is not working correctly when a non-existent changelog user is entered.

      For example, I registered two changelog users on the MDS, set llite.*.xattr_cache = 0 on the client (vm4), then created a new file and overwrote an existing file:

      [vm3 ~]# dd if=/dev/urandom of=/lustre/scratch/ddfile3 bs=37k count=200
      200+0 records in
      200+0 records out
      7577600 bytes (7.6 MB) copied, 0.0464932 s, 163 MB/s
      
      [vm4 ~]# echo "aa" > /lustre/scratch/ddfile2
      

      Looking at the LSOM data, we can see the size is correct, but the blocks are not updated.

      [vm4 ~]# lfs getsom /lustre/scratch/ddfile2
      file: /lustre/scratch/ddfile2 size: 3 blocks: 0 flags: 4
      [vm4 ~]# lfs getsom /lustre/scratch/ddfile3
      file: /lustre/scratch/ddfile3 size: 7577600 blocks: 0 flags: 4
      

      Now, sync the LSOM data, but give it a bad changelog user ID.

      [vm4 ~]# llsom_sync --mdt scratch-MDT0000 --user cli4 -v /lustre/scratch/
      Start receiving records
      Processed changelog record index:5 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:6 type:CREAT(0x1) FID:[0x200000402:0x2:0x0]
      Processed changelog record index:7 type:XATTR(0xf) FID:[0x200000402:0x2:0x0]
      Processed changelog record index:8 type:CLOSE(0xb) FID:[0x200000402:0x2:0x0]
      Processed changelog record index:9 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:10 type:TRUNC(0xd) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:11 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:12 type:CLOSE(0xb) FID:[0x200000402:0x1:0x0]
      finished reading [scratch-MDT0000]
      Start to sync 2 records.
      record 1651949901960989620:8, updated LSOM for fid [0x200000402:0x2:0x0] size:7577600 blocks:14800
      llsom_sync: cannot purge records for 'cli4': Invalid argument (22)
      llsom_sync: failed to clear changelog record: cli4:8: Invalid argument (22)
      [vm4 ~]# lfs getsom /lustre/scratch/ddfile3
      file: /lustre/scratch/ddfile3 size: 7577600 blocks: 14800 flags: 4
      [vm4 ~]# lfs getsom /lustre/scratch/ddfile2
      file: /lustre/scratch/ddfile2 size: 3 blocks: 0 flags: 4
      

      The changelog record purge error is expected, since there is no user cli4. The problem is that one file's LSOM data (blocks) is updated while the other file's is not.
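To spot this kind of partial update without reading each line by eye, a small filter over the `lfs getsom` output can help. This is only a sketch: the helper name `stale_lsom_files` is made up for illustration, the line format is assumed to be exactly the one shown in the transcripts above, and "non-zero size with zero blocks" is a heuristic (a file may legitimately have 0 LSOM blocks for other reasons).

```shell
#!/bin/sh
# stale_lsom_files (hypothetical helper, not part of Lustre)
# Reads `lfs getsom` lines on stdin and prints the paths whose LSOM size is
# non-zero while the LSOM block count is still 0.
# Assumed line format, copied from the transcripts in this ticket:
#   file: PATH size: N blocks: N flags: N
# Paths containing whitespace are not handled in this sketch.
stale_lsom_files() {
    awk '$1 == "file:" && $3 == "size:" && $5 == "blocks:" {
             if ($4 + 0 > 0 && $6 + 0 == 0) print $2
         }'
}

# Possible usage on a client:
#   for f in /lustre/scratch/ddfile2 /lustre/scratch/ddfile3; do
#       lfs getsom "$f"
#   done | stale_lsom_files
```

Fed the two `lfs getsom` lines captured after the run with the bad user, this would print only /lustre/scratch/ddfile2.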

      From the output of llsom_sync, it looks like updating of the LSOM file data is interrupted when it finds that the user is not valid. There seem to be two issues here:
      1. We should update the LSOM data of all files or none of the files when a bad user ID is input.
      2. We are not checking the validity of the user at an appropriate time.

      Looking at the llsom_sync code, we don't check the changelog user in llsom_sync until we call llapi_changelog_clear() to purge changelog records and this routine produces an error.
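As a stopgap until the tool validates the user itself, the check could be done by hand before starting llsom_sync. The sketch below is not the fix for this ticket. It assumes it is run on the MDS, where the `mdd.*.changelog_users` parameter is readable, and it assumes that parameter lists one registered user per line with the user ID (e.g. cl2) as the first field. The helper name `changelog_user_registered` is hypothetical.

```shell
#!/bin/sh
# changelog_user_registered USER (hypothetical helper, not part of Lustre)
# Reads the text of an MDT's changelog_users parameter on stdin.
# Returns 0 if USER is the first field of some line, 1 otherwise.
# Assumption: each registered user is listed on its own line, ID first;
# header lines are expected not to start with a user ID.
changelog_user_registered() {
    awk -v user="$1" '$1 == user { found = 1 } END { exit !found }'
}

# Possible usage on the MDS before running llsom_sync:
#   lctl get_param -n mdd.scratch-MDT0000.changelog_users |
#       changelog_user_registered cli4 ||
#       echo "changelog user cli4 is not registered" >&2
```

Doing the same check from the client, inside llsom_sync, would need a different mechanism, since this parameter lives on the MDS.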

      With a valid changelog user, the LSOM data of all files is updated.

      [vm4 ~]# llsom_sync --mdt scratch-MDT0000 --user cl2 -v /lustre/scratch/
      Start receiving records
      Processed changelog record index:5 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:6 type:CREAT(0x1) FID:[0x200000402:0x2:0x0]
      Processed changelog record index:7 type:XATTR(0xf) FID:[0x200000402:0x2:0x0]
      Processed changelog record index:8 type:CLOSE(0xb) FID:[0x200000402:0x2:0x0]
      Processed changelog record index:9 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:10 type:TRUNC(0xd) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:11 type:XATTR(0xf) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:12 type:CLOSE(0xb) FID:[0x200000402:0x1:0x0]
      Processed changelog record index:13 type:XATTR(0xf) FID:[0x200000402:0x2:0x0]
      finished reading [scratch-MDT0000]
      Start to sync 2 records.
      record 1651949901960989620:8, updated LSOM for fid [0x200000402:0x2:0x0] size:7577600 blocks:14800
      record 1651949963251084516:12, updated LSOM for fid [0x200000402:0x1:0x0] size:3 blocks:8
      [vm4 ~]# lfs getsom /lustre/scratch/ddfile3
      file: /lustre/scratch/ddfile3 size: 7577600 blocks: 14800 flags: 4
      [vm4 ~]# lfs getsom /lustre/scratch/ddfile2
      file: /lustre/scratch/ddfile2 size: 3 blocks: 8 flags: 4
      

          Activity

            [LU-11459] llsom_sync updates LSOM data for some files when called with non-existent user

            gerrit Gerrit Updater added a comment -

            Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/33315
            Subject: LU-11459 changelog: valid check for a given changelog user
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2c5c1b7751b0873e2e0b1f905d2926a9d64501e8

            qian_wc Qian Yingjin added a comment -

            I dug into the code. To have the LSOM sync tool check for a valid Changelog user, we need to add an extra RPC to the MDS to check it. I will make a patch soon.

            pjones Peter Jones added a comment -

            Qian

            Can you please investigate?

            Thanks

            Peter


            adilger Andreas Dilger added a comment -

            I don't think we need "all-or-none" semantics for LSOM data. Since LSOM is, by definition, lazy, there may be any number of reasons why the LSOM attrs are updated or not (e.g. some other client accessed the file, it was stored only on the MDT, whatever).

            I agree it would be useful to have the LSOM tool check for a valid Changelog user when it is first run, so that it can complain to the user appropriately.

            People

              qian_wc Qian Yingjin
              jamesanunez James Nunez (Inactive)