Details
Bug
Resolution: Duplicate
Major
None
Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.6
CentOS 7.4 (3.10.0-693.2.2.el7_lustre.pl1.x86_64)
3
9223372036854775807
Description
Hello,
We're seeing the following messages on Oak's MDT in 2.10.4:
Aug 03 09:21:39 oak-md1-s2 kernel: Lustre: 11137:0:(mdd_device.c:1577:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
Aug 03 09:31:38 oak-md1-s2 kernel: Lustre: 11271:0:(mdd_device.c:1577:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
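For readers unfamiliar with the return code in the kernel messages above: Lustre reports errors as negated errno values, so -22 corresponds to errno 22. The following is a minimal sketch (the helper name `describe_rc` is hypothetical, not part of Lustre or Robinhood) that maps such a code to its symbolic name using only the Python standard library:

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Map a negated errno (as printed in kernel logs) to its name and message."""
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names such as "EINVAL".
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


print(describe_rc(-22))
```

On Linux this identifies -22 as EINVAL (invalid argument). It does not by itself say which argument the MDT rejected.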
Robinhood (also running 2.10.4) shows this:
2018/08/03 10:00:47 [13766/22] ChangeLog | ERROR: llapi_changelog_clear("oak-MDT0000", "cl1", 13975842301) returned -22
2018/08/03 10:00:47 [13766/22] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
2018/08/03 10:00:47 [13766/16] llapi | cannot purge records for 'cl1'
2018/08/03 10:00:47 [13766/16] ChangeLog | ERROR: llapi_changelog_clear("oak-MDT0000", "cl1", 13975842303) returned -22
2018/08/03 10:00:47 [13766/16] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
2018/08/03 10:00:47 [13766/4] llapi | cannot purge records for 'cl1'
2018/08/03 10:00:47 [13766/4] ChangeLog | ERROR: llapi_changelog_clear("oak-MDT0000", "cl1", 13975842304) returned -22
2018/08/03 10:00:47 [13766/4] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
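When triaging a longer Robinhood log, it can help to pull out which record indices failed to clear. The sketch below assumes the log lines have the same shape as the excerpt above; the regular expression and the function name `failed_clears` are illustrative, not part of Robinhood:

```python
import re

# Matches lines such as:
#   ... ERROR: llapi_changelog_clear("oak-MDT0000", "cl1", 13975842301) returned -22
CLEAR_ERR = re.compile(
    r'llapi_changelog_clear\("(?P<mdt>[^"]+)", "(?P<user>[^"]+)", '
    r'(?P<index>\d+)\) returned (?P<rc>-?\d+)'
)


def failed_clears(lines):
    """Return (mdt, user, record_index, rc) for each failed clear call found."""
    found = []
    for line in lines:
        m = CLEAR_ERR.search(line)
        if m:
            found.append((m["mdt"], m["user"], int(m["index"]), int(m["rc"])))
    return found


sample = [
    '2018/08/03 10:00:47 [13766/22] ChangeLog | ERROR: '
    'llapi_changelog_clear("oak-MDT0000", "cl1", 13975842301) returned -22',
    "2018/08/03 10:00:47 [13766/16] llapi | cannot purge records for 'cl1'",
    '2018/08/03 10:00:47 [13766/16] ChangeLog | ERROR: '
    'llapi_changelog_clear("oak-MDT0000", "cl1", 13975842303) returned -22',
]

for mdt, user, index, rc in failed_clears(sample):
    print(mdt, user, index, rc)
```

Sorting the extracted indices shows whether the failing clears are contiguous or scattered, which may help when comparing against the changelog user's current position on the MDT.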
Oak's MDT usage is as follows:
[root@oak-md1-s2 ~]# df -h -t lustre
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/md1-rbod1-mdt0  1.3T  131G 1022G  12% /mnt/oak/mdt/0
[root@oak-md1-s2 ~]# df -i -t lustre
Filesystem                    Inodes     IUsed     IFree IUse% Mounted on
/dev/mapper/md1-rbod1-mdt0 873332736 266515673 606817063   31% /mnt/oak/mdt/0
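As a quick check on the inode figures above, the usage percentage can be recomputed from the IUsed and IFree columns. This is a minimal sketch; the function name `use_percent` is hypothetical, and the note about rounding reflects how GNU df appears to round percentages up:

```python
import math


def use_percent(used: int, free: int) -> float:
    """Percentage of used entries out of used + free."""
    return 100.0 * used / (used + free)


# IUsed and IFree taken from the `df -i` output above.
inode_pct = use_percent(266515673, 606817063)
print(f"{inode_pct:.2f}% of inodes used (df shows {math.ceil(inode_pct)}%)")
```

Tracking this value over time, rather than a single snapshot, would show whether uncleared changelog records are actually consuming MDT space.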
I'm concerned that the MDT might fill up with changelogs. Could you please assist in troubleshooting this issue?
Thanks!
Stephane
Issue Links
- is related to
  - LU-12577 chlg_load failed to process llog -2 or -5 on client (Resolved)
  - LU-11426 2/2 Olafs agree: changelog entries are emitted out of order (Resolved)
  - LU-12134 llog_reader (incorrectly?) reports a corrupted changelog (Closed)
- is related to
  - LU-12098 changelog_deregister appears not to reliably clear all changelog entries (Resolved)
We're also seeing this with the following combination: a SLES 11 SP4 client
> lustre-client-2.12.4-1.x86_64
> robinhood-lustre-3.1.5-1.lustre2.12.x86_64
against a 2.10.8 RHEL 7.6 server
> kernel-3.10.0-957.1.3.el7_lustre.x86_64
> kmod-lustre-2.10.8-1.el7.x86_64
> kmod-lustre-osd-ldiskfs-2.10.8-1.el7.x86_64
> lustre-2.10.8-1.el7.x86_64
> lustre-osd-ldiskfs-mount-2.10.8-1.el7.x86_64
> lustre-resource-agents-2.10.8-1.el7.x86_64
Robinhood will run fine for a while, then stop clearing changelogs. There is nothing untoward in the MDS logs at the same time, although there was a load peak with CPUs in 'wait' (Grafana plots from collectd can be attached). LNet traffic from the MDS wasn't particularly impressive.
The Robinhood server is one of the few machines we've tested with a 2.12 client. We're still mostly on 2.10.8 or 2.7 (Cray), as we still have a 2.5.x filesystem (Sonexion) and don't want to move too far ahead with clients.