[LU-12134] llog_reader (incorrectly?) reports a corrupted changelog Created: 29/Mar/19  Updated: 16/Jan/22  Resolved: 16/Jan/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Upstream
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: CEA Assignee: Mikhail Pershin
Resolution: Done Votes: 0
Labels: ORNL

Issue Links:
Related
is related to LU-11205 Failure to clear the changelog for us... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I have been following LU-11205 and am trying to reproduce it. I experimented with the llog_reader:

[root]# lustre/tests/llmount.sh
[root]# mkdir /mnt/ldiskfs-mdt0
[root]# mount -t ldiskfs /dev/mapper/mds1_flakey /mnt/ldiskfs-mdt0
[root]# lctl --device lustre-MDT0000 changelog_register
lustre-MDT0000: Registered changelog userid 'cl1'
[root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_users
llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12)
llog_reader: Could not pack buffer.: Cannot allocate memory (12)
[root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
llog_reader: File too small for llog header: need 8192, size 0
: Success (0)
llog_reader: Could not pack buffer.: Input/output error (5)

... Not ideal, but this is probably because there are not any changelog records yet. So I generate one:

[root]# touch /mnt/lustre/file
[root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_users
llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12)
llog_reader: Could not pack buffer.: Cannot allocate memory (12)
[root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
rec #1 type=1064553b len=64 offset 8192
Header size : 8192
Time : Fri Mar 29 14:12:30 2019
Number of records: 1
Target uuid : 
-----------------------
#01 (064)id=[0xa:0x1:0x0]:0 path=O/1/d10/10

Everything is fine now (well, the users changelog is still broken, but I do not really need it). But what happens if I try and generate a few more?

[root]# for _ in {0..15}; do touch /mnt/lustre/file-{0..1023}; rm /mnt/lustre/file-{0..1023}; done
[root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
rec #1 type=1064553b len=64 offset 8192
off 8256 skip 8128 to next chunk.
The log is corrupt (too big at 0)
llog_reader: Could not pack buffer.: Invalid argument (22)

Back to a broken changelog...
lfs changelog seems to be doing fine though:

[root]# lfs changelog lustre-MDT0000
1 01CREAT 14:14:51.291849812 2019.03.29 0x0 t=[0x200000401:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file
2 11CLOSE 14:14:51.293009818 2019.03.29 0x42 t=[0x200000401:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo
3 01CREAT 14:23:23.124863776 2019.03.29 0x0 t=[0x200000401:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-0
4 11CLOSE 14:23:23.139933635 2019.03.29 0x42 t=[0x200000401:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo
5 01CREAT 14:23:23.140437103 2019.03.29 0x0 t=[0x200000401:0x14:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-1
(...)
49153 06UNLNK 14:23:57.498329392 2019.03.29 0x1 t=[0x200000401:0x4011:0x0] j=rm.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-1022
49154 06UNLNK 14:23:57.499430628 2019.03.29 0x1 t=[0x200000401:0x4012:0x0] j=rm.0 ef=0xf u=0:0 nid=0@lo p=[0x200000007:0x1:0x0] file-1023

Which makes me hope the issue is only with the llog_reader.



 Comments   
Comment by Peter Jones [ 29/Mar/19 ]

Mike

Can you advise

Thanks

Peter

Comment by Mikhail Pershin [ 30/Mar/19 ]

try to remount local fs and check again?

Comment by Quentin Bouget [ 01/Apr/19 ]

> try to remount local fs and check again?

Yes, thank you, this helped:

[root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
llog_reader: Error allocating -16 bytes for recs_buf: Cannot allocate memory (12)
llog_reader: Could not pack buffer.: Cannot allocate memory (12)
[root]# umount /mnt/ldiskfs-mdt0
[root]# umount /mnt/lustre-mds1
[root]# mount -t lustre /dev/mapper/mds1_flakey /mnt/lustre-mds1/
[root]# mount -t ldiskfs /dev/mapper/mds1_flakey /mnt/ldiskfs-mdt0
[root]# lustre/utils/llog_reader /mnt/ldiskfs-mdt0/changelog_catalog
rec #1 type=1064553b len=64 offset 8192
Header size : 8192
Time : Tue Apr  2 00:10:37 2019
Number of records: 1
Target uuid : 
-----------------------
#01 (064)id=[0x12:0x1:0x0]:0 path=O/1/d18/18

It would seem unmounting both the ldiskfs and the lustre fs before remounting them is mandatory, otherwise it doesn't work.

Dropping caches seems to work too.

Comment by Aurelien Degremont (Inactive) [ 02/Apr/19 ]

Quentin, even if you are mounting the same device, you are using 2 different filesystems and 2 different mount points. Both of them have their own caching mechanisms. You cannot expect data written by Lustre being visible right away in the ldiskfs mount. If ldiskfs has already cached several blocks, data structure, etc. it does not expect somebody else changing them.

Mounting the lustre device with ldiskfs works if you want to read data already present or if Lustre is unmount.

As this is mostly a caching issue, dropping cache should definitely help.

Comment by Quentin Bouget [ 02/Apr/19 ]

OK, thank you Aurélien!

Generated at Sat Feb 10 02:49:56 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.