Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12134

llog_reader (incorrectly?) reports a corrupted changelog

Details

    • Bug
    • Resolution: Done
    • Minor
    • None
    • Upstream
    • 3
    • 9223372036854775807

    Description

      I have been following LU-11205 and am trying to reproduce it. I experimented with the llog_reader:

      [root]# lustre/tests/llmount.sh
      [root]# mkdir /mnt/ldiskfs-mdt0
      [root]# mount -t ldiskfs /dev/mapper/mds1_flakey /mnt/ldiskfs-mdt0
      [root]# lctl --device lustre-MDT0000 changelog_register
      lustre-MDT0000: Registered changelog userid 'cl1'
      [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_users
      llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12)
      llog_reader: Could not pack buffer.: Cannot allocate memory (12)
      [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
      llog_reader: File too small for llog header: need 8192, size 0
      : Success (0)
      llog_reader: Could not pack buffer.: Input/output error (5)
      

      ... Not ideal, but this is probably because there are not any changelog records yet. So I generate one:

      [root]# touch /mnt/lustre/file
      [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_users
      llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12)
      llog_reader: Could not pack buffer.: Cannot allocate memory (12)
      [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
      rec #1 type=1064553b len=64 offset 8192
      Header size : 8192
      Time : Fri Mar 29 14:12:30 2019
      Number of records: 1
      Target uuid : 
      -----------------------
      #01 (064)id=[0xa:0x1:0x0]:0 path=O/1/d10/10
      

      Everything is fine now (well, the users changelog is still broken, but I do not really need it). But what happens if I try and generate a few more?

      [root]# for _ in {0..15}; do touch /mnt/lustre/file-{0..1023}; rm /mnt/lustre/file-{0..1023}; done
      [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
      rec #1 type=1064553b len=64 offset 8192
      off 8256 skip 8128 to next chunk.
      The log is corrupt (too big at 0)
      llog_reader: Could not pack buffer.: Invalid argument (22)
      

      Back to a broken changelog...
      lfs changelog seems to be doing fine though:

      [root]# lfs changelog lustre-MDT0000
      1 01CREAT 14:14:51.291849812 2019.03.29 0x0 t=[0x200000401:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file
      2 11CLOSE 14:14:51.293009818 2019.03.29 0x42 t=[0x200000401:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo
      3 01CREAT 14:23:23.124863776 2019.03.29 0x0 t=[0x200000401:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-0
      4 11CLOSE 14:23:23.139933635 2019.03.29 0x42 t=[0x200000401:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo
      5 01CREAT 14:23:23.140437103 2019.03.29 0x0 t=[0x200000401:0x14:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-1
      (...)
      49153 06UNLNK 14:23:57.498329392 2019.03.29 0x1 t=[0x200000401:0x4011:0x0] j=rm.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-1022
      49154 06UNLNK 14:23:57.499430628 2019.03.29 0x1 t=[0x200000401:0x4012:0x0] j=rm.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-1023
      

      Which makes me hope the issue is only with the llog_reader.

      Attachments

        Issue Links

          Activity

            [LU-12134] llog_reader (incorrectly?) reports a corrupted changelog

            OK, thank you Aurélien!

            bougetq Quentin Bouget (Inactive) added a comment - OK, thank you Aurélien!

            Quentin, even if you are mounting the same device, you are using 2 different filesystems and 2 different mount points. Both of them have their own caching mechanisms. You cannot expect data written by Lustre being visible right away in the ldiskfs mount. If ldiskfs has already cached several blocks, data structure, etc. it does not expect somebody else changing them.

            Mounting the lustre device with ldiskfs works if you want to read data already present or if Lustre is unmount.

            As this is mostly a caching issue, dropping cache should definitely help.

            degremoa Aurelien Degremont (Inactive) added a comment - Quentin, even if you are mounting the same device, you are using 2 different filesystems and 2 different mount points. Both of them have their own caching mechanisms. You cannot expect data written by Lustre being visible right away in the ldiskfs mount. If ldiskfs has already cached several blocks, data structure, etc. it does not expect somebody else changing them. Mounting the lustre device with ldiskfs works if you want to read data already present or if Lustre is unmount. As this is mostly a caching issue, dropping cache should definitely help.
            bougetq Quentin Bouget (Inactive) added a comment - - edited

            > try to remount local fs and check again?

            Yes, thank you, this helped:

            [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
            llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12)
            llog_reader: Could not pack buffer.: Cannot allocate memory (12)
            [root]# umount /mnt/ldiskfs-mdt0
            [root]# umount /mnt/lustre-mds1
            [root]# mount -t lustre /dev/mapper/mds1_flakey /mnt/lustre-mds1/
            [root]# mount -t ldiskfs /dev/mapper/mds1_flakey /mnt/ldiskfs-mdt0
            [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
            rec #1 type=1064553b len=64 offset 8192
            Header size : 8192
            Time : Tue Apr  2 00:10:37 2019
            Number of records: 1
            Target uuid : 
            -----------------------
            #01 (064)id=[0x12:0x1:0x0]:0 path=O/1/d18/18
            

            It would seem unmounting both the ldiskfs and the lustre fs before remounting them is mandatory, otherwise it doesn't work.

            Dropping caches seems to work too.

            bougetq Quentin Bouget (Inactive) added a comment - - edited > try to remount local fs and check again? Yes, thank you, this helped: [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12) llog_reader: Could not pack buffer.: Cannot allocate memory (12) [root]# umount /mnt/ldiskfs-mdt0 [root]# umount /mnt/lustre-mds1 [root]# mount -t lustre /dev/mapper/mds1_flakey /mnt/lustre-mds1/ [root]# mount -t ldiskfs /dev/mapper/mds1_flakey /mnt/ldiskfs-mdt0 [root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog rec #1 type=1064553b len=64 offset 8192 Header size : 8192 Time : Tue Apr 2 00:10:37 2019 Number of records: 1 Target uuid : ----------------------- #01 (064)id=[0x12:0x1:0x0]:0 path=O/1/d18/18 It would seem unmounting both the ldiskfs and the lustre fs before remounting them is mandatory, otherwise it doesn't work. Dropping caches seems to work too.

            try to remount local fs and check again?

            tappro Mikhail Pershin added a comment - try to remount local fs and check again?
            pjones Peter Jones added a comment -

            Mike

            Can you advise

            Thanks

            Peter

            pjones Peter Jones added a comment - Mike Can you advise Thanks Peter

            People

              tappro Mikhail Pershin
              cealustre CEA
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: