[LU-1556] MDS does not register changelog readers Created: 22/Jun/12  Updated: 16/Mar/15  Resolved: 06/Jan/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Diego Moreno (Inactive) Assignee: Yang Sheng
Resolution: Cannot Reproduce Votes: 0
Labels: None

Attachments: File changelog_files.tgz    
Issue Links:
Related
Severity: 3
Rank (Obsolete): 4047

 Description   

The customer uses changelogs to monitor the filesystem. This worked properly for several weeks, but we now observe several problems that prevent us from using changelogs:

[root@xxx14 ~]# lctl --device project-MDT0000 changelog_register
project-MDT0000: Registered changelog userid 'cl6'
[root@xxx14 ~]# cat /proc/fs/lustre/mdd/project-MDT0000/changelog_users
current index: 5077892
ID index
[root@helios14 ~]# lctl --device project-MDT0000 changelog_deregister cl6
error: changelog_deregister: Invalid argument
[root@helios14 ~]# cat /proc/fs/lustre/mdd/project-MDT0000/changelog_mask
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RNMFM RNMTO OPEN CLOSE IOCTL TRUNC SATTR XATTR HSM MTIME CTIME
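
The transcript above shows changelog_register reporting success while changelog_users lists no readers. A minimal sketch of a check for that mismatch follows. The function name user_listed is hypothetical, and the sketch assumes the proc file lists one "<id> <index>" pair per line after the "ID index" header, as the output above suggests.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if changelog user $1 appears in the
# changelog_users listing read from stdin.
# Assumption: entries follow the "ID index" header, one "<id> <index>" per line.
user_listed() {
    awk -v id="$1" '
        seen_header && $1 == id { found = 1 }
        $1 == "ID"              { seen_header = 1 }
        END                     { exit(found ? 0 : 1) }
    '
}

# Replaying the listing from the report: no reader follows the header,
# so the check fails and the message is printed.
printf 'current index: 5077892\nID index\n' | user_listed cl6 ||
    echo "cl6 is not listed"
```

On the MDS itself, the function would read the proc file directly, for example: user_listed cl6 < /proc/fs/lustre/mdd/project-MDT0000/changelog_users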

Reading the changelog starts at sequence 3200000, as if one registered reader had not yet cleared the entries from that point on.

Trying to clear the records for every reader does not change anything:
for i in 1 2 3 4 5 6 ; do lfs changelog_clear project-MDT0000 cl$i 5000000 ; done

And in MDS' syslog:
1340367641 2012 Jun 22 21:20:41 helios14 kern warning kernel Lustre: 9270:0:(mdd_device.c:1478:mdd_changelog_user_purge()) Could not determine changelog records to purge; rc=-22
1340367641 2012 Jun 22 21:20:41 helios14 kern warning kernel Lustre: 9270:0:(mdd_device.c:1478:mdd_changelog_user_purge()) Skipped 3 previous similar messages

Changelogs can be read but not cleared, thus affecting the tool reading changelogs.

This issue arises with lustre-2.1.1.

Does this sound like a known problem to you? How can we collect more debugging information while the system is in production?



 Comments   
Comment by Peter Jones [ 22/Jun/12 ]

Yangsheng

Could you please look into this one?

Thanks

Peter

Comment by Yang Sheng [ 24/Jun/12 ]

Hi Diego, could you upload more logs here? Especially the context around this message:

1340367641 2012 Jun 22 21:20:41 helios14 kern warning kernel Lustre: 9270:0:(mdd_device.c:1478:mdd_changelog_user_purge()) Could not determine changelog records to purge; rc=-22

Comment by Diego Moreno (Inactive) [ 25/Jun/12 ]

As I couldn't upload the dmesg file via Jira, I uploaded it to Whamcloud's FTP server. There you will find the dmesg file, which contains some llog errors I couldn't understand.

We often see this message:
LustreError: 9178:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384

Comment by Yang Sheng [ 28/Jun/12 ]

Thanks, Diego, this is very useful.

Comment by Diego Moreno (Inactive) [ 02/Jul/12 ]

Here are some more logs (Lustre debug file). Debug logging was enabled just before running "lctl --device MDT0000 changelog_register".

It seems there's an invalid entry in the changelog, but how could this entry have gotten there? And how can we remove it?

If you need the entire file, please ask.

 
00000040:00020000:1.0:1340966614.416509:0:24262:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384
00000040:00020000:8.0:1340966614.427773:0:24264:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384
00000004:02000000:1.0:1340966671.265493:0:24267:0:(mdd_device.c:270:mdd_changelog_on()) mdd_obd-project-MDT0000: changelog on
00000004:00020000:1.0:1340966671.269665:0:24267:0:(mdd_device.c:274:mdd_changelog_on()) Changelogs cannot be enabled due to error condition (see mdd_obd-project-MDT0000 log).
00000040:00020000:1.0:1340966683.106474:0:24270:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384
00000040:00020000:1.0:1340966683.126397:0:24272:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384
00000004:02000000:1.0:1340966711.276400:0:24275:0:(mdd_device.c:270:mdd_changelog_on()) mdd_obd-project-MDT0000: changelog on
00000004:00020000:1.0:1340966711.280559:0:24275:0:(mdd_device.c:274:mdd_changelog_on()) Changelogs cannot be enabled due to error condition (see mdd_obd-project-MDT0000 log).
00000040:00020000:1.0:1340966721.286568:0:24279:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384
00000004:00000400:6.0:1340966721.306381:0:24277:0:(mdd_device.c:1478:mdd_changelog_user_purge()) Could not determine changelog records to purge; rc=-22
00000004:02000000:7.0:1340966758.786405:0:24281:0:(mdd_device.c:270:mdd_changelog_on()) mdd_obd-project-MDT0000: changelog on
00000004:00020000:7.0:1340966758.790585:0:24281:0:(mdd_device.c:274:mdd_changelog_on()) Changelogs cannot be enabled due to error condition (see mdd_obd-project-MDT0000 log).
00000040:00020000:1.0:1340967087.617075:0:24326:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384
00000040:00020000:8.0:1340967087.628338:0:24328:0:(llog_lvfs.c:473:llog_lvfs_next_block()) Invalid llog tail at log id 670040609/1607149776 offset 16384
00000004:02000000:1.0:1340967087.629959:0:24329:0:(mdd_device.c:270:mdd_changelog_on()) mdd_obd-project-MDT0000: changelog on
00000004:00020000:1.0:1340967087.635292:0:24329:0:(mdd_device.c:274:mdd_changelog_on()) Changelogs cannot be enabled due to error condition (see mdd_obd-project-MDT0000 log).
00000004:02000000:7.0:1340967117.856469:0:24346:0:(mdd_device.c:270:mdd_changelog_on()) mdd_obd-project-MDT0000: changelog on
00000004:00020000:7.0:1340967117.860651:0:24346:0:(mdd_device.c:274:mdd_changelog_on()) Changelogs cannot be enabled due to error condition (see mdd_obd-project-MDT0000 log).
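
The excerpt above repeats the same few messages many times. A minimal sketch for tallying the lines per reporting function follows. The name summarize_debug is hypothetical, and the sketch assumes every line carries its source location in the "(file.c:LINE:function())" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: count debug-log lines per reporting function.
# Assumption: each line contains "(file.c:LINE:function())" as in the excerpt.
# Reads the log on stdin and prints "<function> <count>", one per line.
summarize_debug() {
    sed -n 's/.*(\([a-z_]*\.c\):[0-9]*:\([a-z_]*\)()).*/\2/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Saving the excerpt to a file and running "summarize_debug < file" gives one line per function, which makes it easier to see that llog_lvfs_next_block() and mdd_changelog_on() account for almost all of the entries.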
Comment by Yang Sheng [ 20/Jul/12 ]

Hi Diego, could you please upload the MDT file CONFIGS/changelog_users for me? Just use debugfs to open your MDT device, and then run 'dump CONFIGS/changelog_users <choose a filename>'. TIA.
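
The request above can be run non-interactively. The sketch below only prints the debugfs command so it can be reviewed before it is run on a production MDS. The function name print_dump_cmd, the device /dev/sdX, and the output path are hypothetical placeholders. It assumes an ldiskfs-backed MDT and uses the debugfs options -c (catastrophic mode, which opens the device read-only) and -R (run a single request).

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a debugfs command that copies
# CONFIGS/changelog_users out of the MDT device.
# $1 = MDT block device (site-specific), $2 = output file.
print_dump_cmd() {
    printf "debugfs -c -R 'dump CONFIGS/changelog_users %s' %s\n" "$2" "$1"
}

# Placeholder device and output path, for illustration only.
print_dump_cmd /dev/sdX /tmp/changelog_users
```

Printing rather than executing is a deliberate choice here: the real device name has to be filled in by the administrator.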

Comment by Diego Moreno (Inactive) [ 03/Aug/12 ]

I attach the changelog_users and changelog_catalog files from the MDT not registering new readers.

Comment by Yang Sheng [ 03/Aug/12 ]

Thanks, Diego, it is very helpful.

Comment by Yang Sheng [ 24/Dec/14 ]

Hi Diego, is this issue still relevant? I cannot reproduce it on my local machine.

Comment by Diego Moreno (Inactive) [ 06/Jan/15 ]

Hi,

We've not seen this bug for a while.

Generated at Sat Feb 10 01:17:41 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.