[LU-11205] Failure to clear the changelog for user 1 on MDT - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.6
Labels:
- ORNL
Environment:
CentOS 7.4 (3.10.0-693.2.2.el7_lustre.pl1.x86_64)

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

Hello,

We're seeing the following messages on Oak's MDT in 2.10.4:

Aug 03 09:21:39 oak-md1-s2 kernel: Lustre: 11137:0:(mdd_device.c:1577:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
Aug 03 09:31:38 oak-md1-s2 kernel: Lustre: 11271:0:(mdd_device.c:1577:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22

Robinhood (also running 2.10.4) shows this:

2018/08/03 10:00:47 [13766/22] ChangeLog | ERROR: llapi_changelog_clear("oak-MDT0000", "cl1", 13975842301) returned -22
2018/08/03 10:00:47 [13766/22] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
2018/08/03 10:00:47 [13766/16] llapi | cannot purge records for 'cl1'
2018/08/03 10:00:47 [13766/16] ChangeLog | ERROR: llapi_changelog_clear("oak-MDT0000", "cl1", 13975842303) returned -22
2018/08/03 10:00:47 [13766/16] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.
2018/08/03 10:00:47 [13766/4] llapi | cannot purge records for 'cl1'
2018/08/03 10:00:47 [13766/4] ChangeLog | ERROR: llapi_changelog_clear("oak-MDT0000", "cl1", 13975842304) returned -22
2018/08/03 10:00:47 [13766/4] EntryProc | Error -22 performing callback at stage STAGE_CHGLOG_CLR.

Oak's MDT usage is as follow:

[root@oak-md1-s2 ~]# df -h -t lustre
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/md1-rbod1-mdt0  1.3T  131G 1022G  12% /mnt/oak/mdt/0
[root@oak-md1-s2 ~]# df -i -t lustre
Filesystem                    Inodes     IUsed     IFree IUse% Mounted on
/dev/mapper/md1-rbod1-mdt0 873332736 266515673 606817063   31% /mnt/oak/mdt/0

I'm concerned that the MDT might fill up with changelogs. Could you please assist in troubleshooting this issue?
Thanks!
Stephane

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

changelog-reader.tgz
0.8 kB
26/Jun/19 12:26 PM
dk_ornl_20190328_1216.gz
4.61 MB
28/Mar/19 4:22 PM
dk_ornl_20190328.gz
4.59 MB
28/Mar/19 3:36 PM
dk.1547747365.gz
8.56 MB
17/Jan/19 6:26 PM
dk.1547747668.gz
8.58 MB
17/Jan/19 6:26 PM
dk.1547828521.gz
7.96 MB
18/Jan/19 4:32 PM
f2_llog_reader_20190328.gz
1.03 MB
28/Mar/19 3:36 PM
lu-11205-ssec.log.gz
647 kB
29/Mar/19 1:37 PM
ornl_0x1_0xd8_0x0.gz
1.19 MB
29/Mar/19 10:04 AM

Issue Links

is related to

LU-12577 chlg_load failed to process llog -2 or -5 on client

Resolved

LU-11426 2/2 Olafs agree: changelog entries are emitted out of order

Resolved

LU-12134 llog_reader (incorrectly?) reports a corrupted changelog

Closed

is related to

LU-12098 changelog_deregister appears not to reliably clear all changelog entries

Resolved

Activity

[LU-11205] Failure to clear the changelog for user 1 on MDT

James A Simmons added a comment - 07/Aug/19 1:23 PM

Was it better? In our testing it did make the problem go away. I wonder if the problem is less common.

James A Simmons added a comment - 07/Aug/19 1:23 PM Was it better? In our testing it did make the problem go away. I wonder if the problem is less common.

Olaf Weber (Inactive) added a comment - 05/Aug/19 3:40 PM

When I tested the code posted in the review earlier today, I still saw out-of-order and lost changelog records. From this and other tests it does appear that lost changelog records are out-of-order records that are not reliably (and often reliably not) returned to userspace.

Olaf Weber (Inactive) added a comment - 05/Aug/19 3:40 PM When I tested the code posted in the review earlier today, I still saw out-of-order and lost changelog records. From this and other tests it does appear that lost changelog records are out-of-order records that are not reliably (and often reliably not) returned to userspace.

James A Simmons added a comment - 02/Aug/19 2:16 PM

running lfs changelog clear 0 restores the changelog_size back to 4711216. There is some strange issue there but that is independent of this patch. The testing has gone very well!! Thanks Boyko.

James A Simmons added a comment - 02/Aug/19 2:16 PM running lfs changelog clear 0 restores the changelog_size back to 4711216. There is some strange issue there but that is independent of this patch. The testing has gone very well!! Thanks Boyko.

James A Simmons added a comment - 01/Aug/19 4:17 PM

We did some testing and saw these results. Before we started robinhood to purge things we had:

lctl get_param ..changelog_size mdd.f2-tds-MDT0000.changelog_size=4711216

Afterwards we saw

lctl get_param ..changelog_size mdd.f2-tds-MDT0000.changelog_size=928368

Does this patch help resolve some of the orphan records as well that we couldn't cleanup before?

James A Simmons added a comment - 01/Aug/19 4:17 PM We did some testing and saw these results. Before we started robinhood to purge things we had: lctl get_param . .changelog_size mdd.f2-tds-MDT0000.changelog_size=4711216 Afterwards we saw lctl get_param . .changelog_size mdd.f2-tds-MDT0000.changelog_size=928368 Does this patch help resolve some of the orphan records as well that we couldn't cleanup before?

James A Simmons added a comment - 30/Jul/19 2:23 PM - edited

Yep, setting it up right now. Thank you.

James A Simmons added a comment - 30/Jul/19 2:23 PM - edited Yep, setting it up right now. Thank you.

Alexander Boyko added a comment - 30/Jul/19 2:22 PM - edited

@James A Simmons, could you try the patch above? It also includes the reproducer for this issue.

== sanity test 160j: Verify that reordered records could be cancelled in changelog =================== 08:27:16 (1563352036)
mdd.lustre-MDT0000.changelog_mask=+hsm
mdd.lustre-MDT0001.changelog_mask=+hsm
Registered 2 changelog users: 'cl1 cl1'
fail_loc=0x8000015d
lustre-MDT0000.2 01CREAT 08:27:18.551297912 2019.07.17 0x0 t=[0x200000402:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp p=[0x200000402:0x3:0x0] f2
lustre-MDT0000.3 06UNLNK 08:27:18.553672555 2019.07.17 0x1 t=[0x200000402:0xa:0x0] j=rm.0 ef=0xf u=0:0 nid=192.168.101.35@tcp p=[0x200000402:0x5:0x0] f3
lustre-MDT0000.4 14SATTR 08:27:18.554017844 2019.07.17 0x14 t=[0x200000402:0x8:0x0] j=chmod.0 ef=0xf u=0:0 nid=192.168.101.35@tcp
lustre-MDT0000.5 11CLOSE 08:27:18.555499959 2019.07.17 0x42 t=[0x200000402:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp
lustre-MDT0000.1 01CREAT 08:27:18.550045424 2019.07.17 0x0 t=[0x200000402:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp p=[0x200000402:0x7:0x0] 1
lustre-MDT0000.6 11CLOSE 08:27:19.562156592 2019.07.17 0x42 t=[0x200000402:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp
2
lustre-MDT0000: clear the changelog for cl1 to record #2
3
lustre-MDT0000: clear the changelog for cl1 to record #3
4
lustre-MDT0000: clear the changelog for cl1 to record #4
5
lustre-MDT0000: clear the changelog for cl1 to record #5
1
lustre-MDT0000: clear the changelog for cl1 to record #1
lfs changelog_clear: cannot purge records for 'cl1': Invalid argument (22)
changelog_clear: record out of range: 1
 sanity test_160j: @@@@@@ FAIL: Fail to clear changelog record 1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5919:error()
  = /usr/lib64/lustre/tests/sanity.sh:13140:test_160j()
  = /usr/lib64/lustre/tests/test-framework.sh:6203:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6242:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6038:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:13145:main()
Dumping lctl log to /tmp/test_logs/1563352024/sanity.test_160j.*.1563352042.log

Alexander Boyko added a comment - 30/Jul/19 2:22 PM - edited @James A Simmons, could you try the patch above? It also includes the reproducer for this issue. == sanity test 160j: Verify that reordered records could be cancelled in changelog =================== 08:27:16 (1563352036) mdd.lustre-MDT0000.changelog_mask=+hsm mdd.lustre-MDT0001.changelog_mask=+hsm Registered 2 changelog users: 'cl1 cl1' fail_loc=0x8000015d lustre-MDT0000.2 01CREAT 08:27:18.551297912 2019.07.17 0x0 t=[0x200000402:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp p=[0x200000402:0x3:0x0] f2 lustre-MDT0000.3 06UNLNK 08:27:18.553672555 2019.07.17 0x1 t=[0x200000402:0xa:0x0] j=rm.0 ef=0xf u=0:0 nid=192.168.101.35@tcp p=[0x200000402:0x5:0x0] f3 lustre-MDT0000.4 14SATTR 08:27:18.554017844 2019.07.17 0x14 t=[0x200000402:0x8:0x0] j=chmod.0 ef=0xf u=0:0 nid=192.168.101.35@tcp lustre-MDT0000.5 11CLOSE 08:27:18.555499959 2019.07.17 0x42 t=[0x200000402:0x13:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp lustre-MDT0000.1 01CREAT 08:27:18.550045424 2019.07.17 0x0 t=[0x200000402:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp p=[0x200000402:0x7:0x0] 1 lustre-MDT0000.6 11CLOSE 08:27:19.562156592 2019.07.17 0x42 t=[0x200000402:0x12:0x0] j=touch.0 ef=0xf u=0:0 nid=192.168.101.35@tcp 2 lustre-MDT0000: clear the changelog for cl1 to record #2 3 lustre-MDT0000: clear the changelog for cl1 to record #3 4 lustre-MDT0000: clear the changelog for cl1 to record #4 5 lustre-MDT0000: clear the changelog for cl1 to record #5 1 lustre-MDT0000: clear the changelog for cl1 to record #1 lfs changelog_clear: cannot purge records for 'cl1': Invalid argument (22) changelog_clear: record out of range: 1 sanity test_160j: @@@@@@ FAIL: Fail to clear changelog record 1 Trace dump: = /usr/lib64/lustre/tests/test-framework.sh:5919:error() = /usr/lib64/lustre/tests/sanity.sh:13140:test_160j() = /usr/lib64/lustre/tests/test-framework.sh:6203:run_one() = /usr/lib64/lustre/tests/test-framework.sh:6242:run_one_logged() = /usr/lib64/lustre/tests/test-framework.sh:6038:run_test() = /usr/lib64/lustre/tests/sanity.sh:13145:main() Dumping lctl log to /tmp/test_logs/1563352024/sanity.test_160j.*.1563352042.log

Gerrit Updater added a comment - 30/Jul/19 11:56 AM

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/35650
Subject: ~~LU-11205~~ mdd: process mixing changelog records
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f9f82470637f7769884a863c8ef90d79fcdfc12f

Gerrit Updater added a comment - 30/Jul/19 11:56 AM Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/35650 Subject: LU-11205 mdd: process mixing changelog records Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: f9f82470637f7769884a863c8ef90d79fcdfc12f

Alexander Boyko added a comment - 30/Jul/19 11:37 AM

I've created a fix, submitting soon.

Alexander Boyko added a comment - 30/Jul/19 11:37 AM I've created a fix, submitting soon.

James A Simmons added a comment - 22/Jul/19 3:52 PM

I tried the fix from Boyko and it didn't resolve the problem

James A Simmons added a comment - 22/Jul/19 3:52 PM I tried the fix from Boyko and it didn't resolve the problem

Stephane Thiell added a comment - 19/Jul/19 2:21 AM

Hi @aboyko

This seems very promising!! I guess all Robinhood users are hitting this issue with recent versions of Lustre. It's unclear to me if it's wasting space on the MDTs, but it's definitively very confusing for Lustre sysadmins.

Are you going to submit your patch to gerrit (http://review.whamcloud.com/) so that it can be properly reviewed by others?

Thanks again!

Stephane Thiell added a comment - 19/Jul/19 2:21 AM Hi @aboyko This seems very promising!! I guess all Robinhood users are hitting this issue with recent versions of Lustre. It's unclear to me if it's wasting space on the MDTs, but it's definitively very confusing for Lustre sysadmins. Are you going to submit your patch to gerrit ( http://review.whamcloud.com/ ) so that it can be properly reviewed by others? Thanks again!

Alexander Boyko added a comment - 15/Jul/19 12:17 PM - edited

I want to add my thoughts

mdd_changelog_store() fill the record index in sync way for mdd,

rec->cr.cr_index = ++mdd->mdd_cl.mc_index;

after that

llog_add()~~>llog_cat_add_rec~~>takes write semaphore and do llog_write_rec( ..LOG_NEXT_IDX..). Next index simply adds to the end of llog.

So assume we have two threads in same time, cr_index = 1 and cr_index = 2 before llog_add.

There is no any guarantee that first cr_index would be the first for llog_write_rec.

To guaranty the same transaction for reordered cr_index, we need to set cr_index after

llog_th = thandle_get_sub(env, th, ctxt->loc_handle->lgh_obj);

diff --git a/lustre/mdd/mdd_dir.c b/lustre/mdd/mdd_dir.c
index 3a24e21..21ad94b 100644
--- a/lustre/mdd/mdd_dir.c
+++ b/lustre/mdd/mdd_dir.c
@@ -806,13 +806,6 @@ int mdd_changelog_store(const struct lu_env *env, struct mdd_device *mdd,
        rec->cr_hdr.lrh_type = CHANGELOG_REC;
        rec->cr.cr_time = cl_time();

-       spin_lock(&mdd->mdd_cl.mc_lock);
-       /* NB: I suppose it's possible llog_add adds out of order wrt cr_index,
-        * but as long as the MDD transactions are ordered correctly for e.g.
-        * rename conflicts, I don't think this should matter. */
-       rec->cr.cr_index = ++mdd->mdd_cl.mc_index;
-       spin_unlock(&mdd->mdd_cl.mc_lock);
-
        ctxt = llog_get_context(obd, LLOG_CHANGELOG_ORIG_CTXT);
        if (ctxt == NULL)
                return -ENXIO;
@@ -821,6 +814,14 @@ int mdd_changelog_store(const struct lu_env *env, struct mdd_device *mdd,
        if (IS_ERR(llog_th))
                GOTO(out_put, rc = PTR_ERR(llog_th));

+       spin_lock(&mdd->mdd_cl.mc_lock);
+       /* NB: I suppose it's possible llog_add adds out of order wrt cr_index,
+        * but as long as the MDD transactions are ordered correctly for e.g.
+        * rename conflicts, I don't think this should matter. */
+       rec->cr.cr_index = ++mdd->mdd_cl.mc_index;
+       spin_unlock(&mdd->mdd_cl.mc_lock);
+
+
        /* nested journal transaction */
        rc = llog_add(env, ctxt->loc_handle, &rec->cr_hdr, NULL, llog_th);

Alexander Boyko added a comment - 15/Jul/19 12:17 PM - edited I want to add my thoughts mdd_changelog_store() fill the record index in sync way for mdd, rec->cr.cr_index = ++mdd->mdd_cl.mc_index; after that llog_add() >llog_cat_add_rec >takes write semaphore and do llog_write_rec( ..LOG_NEXT_IDX..). Next index simply adds to the end of llog. So assume we have two threads in same time, cr_index = 1 and cr_index = 2 before llog_add. There is no any guarantee that first cr_index would be the first for llog_write_rec. To guaranty the same transaction for reordered cr_index, we need to set cr_index after llog_th = thandle_get_sub(env, th, ctxt->loc_handle->lgh_obj); diff --git a/lustre/mdd/mdd_dir.c b/lustre/mdd/mdd_dir.c index 3a24e21..21ad94b 100644 --- a/lustre/mdd/mdd_dir.c +++ b/lustre/mdd/mdd_dir.c @@ -806,13 +806,6 @@ int mdd_changelog_store(const struct lu_env *env, struct mdd_device *mdd, rec->cr_hdr.lrh_type = CHANGELOG_REC; rec->cr.cr_time = cl_time(); - spin_lock(&mdd->mdd_cl.mc_lock); - /* NB: I suppose it's possible llog_add adds out of order wrt cr_index, - * but as long as the MDD transactions are ordered correctly for e.g. - * rename conflicts, I don't think this should matter. */ - rec->cr.cr_index = ++mdd->mdd_cl.mc_index; - spin_unlock(&mdd->mdd_cl.mc_lock); - ctxt = llog_get_context(obd, LLOG_CHANGELOG_ORIG_CTXT); if (ctxt == NULL) return -ENXIO; @@ -821,6 +814,14 @@ int mdd_changelog_store(const struct lu_env *env, struct mdd_device *mdd, if (IS_ERR(llog_th)) GOTO(out_put, rc = PTR_ERR(llog_th)); + spin_lock(&mdd->mdd_cl.mc_lock); + /* NB: I suppose it's possible llog_add adds out of order wrt cr_index, + * but as long as the MDD transactions are ordered correctly for e.g. + * rename conflicts, I don't think this should matter. */ + rec->cr.cr_index = ++mdd->mdd_cl.mc_index; + spin_unlock(&mdd->mdd_cl.mc_lock); + + /* nested journal transaction */ rc = llog_add(env, ctxt->loc_handle, &rec->cr_hdr, NULL, llog_th);

People

Assignee:: Mikhail Pershin

Reporter:: Stephane Thiell

Votes:: 3 Vote for this issue

Watchers:: 26 Start watching this issue

Dates

Created:: 03/Aug/18 5:06 PM

Updated:: 16/Apr/20 1:46 AM

Resolved:: 21/Sep/19 4:31 AM