[LU-3446] changelog index reset on MDT restart Created: 07/Jun/13  Updated: 20/Aug/13  Resolved: 20/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Critical
Reporter: Ned Bass Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: HB, changelog
Environment:

mds: lustre-2.4.0-RC2_2chaos_2.6.32_358.6.1.3chaos.ch5.1.ch5.1.x86_64


Severity: 3
Rank (Obsolete): 8596

 Description   

After a maintenance downtime in which we updated our vesta Lustre servers from 2.4.0-RC1_3chaos to 2.4.0-RC2_2chaos, the changelog current index reverted to zero. The registered user cl1 retained its highest index value:

# vesta-mds1 /root > lctl get_param mdd.fsv-MDT0000.changelog_users
mdd.fsv-MDT0000.changelog_users=
current index: 9158128
ID    index
cl1   445461827

Here is a debug log entry from when the MDS started:

00000004:00000080:1.0:1370534969.929148:0:17901:0:(mdd_device.c:308:mdd_changelog_llog_init()) changelog starting index=0

LLNL-bug-id: TOSS-2103



 Comments   
Comment by Peter Jones [ 08/Jun/13 ]

Bobijam

Could you please help with this one?

Thanks

Peter

Comment by Zhenyu Xu [ 09/Jun/13 ]

The message at mdd_device.c:308 comes from mdd initializing LLOG_CHANGELOG_ORIG_CTXT. After that, we come to the LLOG_CHANGELOG_USER_ORIG_CTXT initialization, where changelog_user_init_cb() sets mdd->mdd_cl.mc_lastuser. Then we go back to mdd_changelog_llog_init():

        /* If we have registered users, assume we want changelogs on */
        if (mdd->mdd_cl.mc_lastuser > 0) {
                rc = mdd_changelog_on(env, mdd, 1);
                if (rc < 0)
                        GOTO(out_uclose, rc);
        }

And mdd_changelog_on() -> mdd_changelog_write_head() adds back mdd->mdd_cl.mc_index.

Comment by Andreas Dilger [ 10/Jun/13 ]

Bobijam, does your comment indicate that this should not be an actual problem? Do we have a ChangeLog test that verifies the ChangeLog index after a restart?

Comment by Zhenyu Xu [ 13/Jun/13 ]

Ned,

When did the cl1 changelog_users read happen, after the MDT restart?

Here's my local VM situation.

[10:38 AM]root@test1:~/work/lustre/lustre/tests/
# lctl get_param mdd.lustre-MDT0000.changelog_users
mdd.lustre-MDT0000.changelog_users=current index: 58
ID    index
cl1   0
cl2   37

[10:42 AM]root@test1:~/work/lustre/lustre/tests/
# umount /mnt/mds1

[10:43 AM]root@test1:~/work/lustre/lustre/tests/
# mount -t lustre /dev/sdb /mnt/mds1

[10:43 AM]root@test1:~/work/lustre/lustre/tests/
# lctl get_param mdd.lustre-MDT0000.changelog_users
mdd.lustre-MDT0000.changelog_users=current index: 58
ID    index
cl1   0
cl2   37
Comment by Ned Bass [ 13/Jun/13 ]

Bobijam, yes, the changelog_users read in the description happened after the MDT restart.

Before this update, the current index was preserved between MDT restarts (as far as I know).

We had a reboot yesterday, and, now that I check, it appears that the index was again reset:

# vesta-mds1 /root > lctl get_param mdd.fsv-MDT0000.changelog_users
mdd.fsv-MDT0000.changelog_users=
current index: 444830
ID    index
cl1   445461827
cl2   18248038
Comment by Ned Bass [ 13/Jun/13 ]

It seems that if the registered changelog user has cleared all changelog records by the time the MDT is stopped, then when the MDT restarts changelog_init_cb() is never called from llog_reverse_process(), so mdd->mdd_cl.mc_index is left at 0.

This reproduces the problem in a VM for me:

FSTYPE=zfs llmount.sh
lctl --device lustre-MDT0000 changelog_register
touch /mnt/lustre/{1,2,3,4}
lfs changelog_clear lustre-MDT0000 cl1 4
umount /mnt/mds1
mount.lustre lustre-mdt1/mdt1  /mnt/mds1
lctl get_param mdd.lustre-MDT0000.changelog_users
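
To make the symptom easy to spot after the final get_param step, the output can be checked mechanically. The following is a minimal sketch, not part of Lustre: check_changelog_index is a hypothetical helper name, and it assumes the changelog_users output has the shape shown in this ticket (a "current index: N" line followed by "clN  index" rows). It flags any registered user whose index is ahead of the current index, which is the inconsistency reported here.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): reads changelog_users output on
# stdin and reports registered users whose index exceeds the current index.
# Returns 1 if any such user is found, 0 otherwise.
check_changelog_index() {
    awk '
        /current index:/ {
            # The value follows the last colon; this also covers output where
            # "current index" shares a line with the parameter name.
            n = split($0, parts, ":")
            current = parts[n] + 0
            seen = 1
            next
        }
        /^cl[0-9]+[ \t]/ {
            # Rows look like "cl1   445461827": user ID, then its index.
            if (seen && $2 + 0 > current) {
                printf "%s index %s is ahead of current index %s\n", $1, $2, current
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Assuming the same device name as in the steps above, it could be used as:
lctl get_param mdd.lustre-MDT0000.changelog_users | check_changelog_index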
Comment by Ned Bass [ 13/Jun/13 ]

Patch for master:

http://review.whamcloud.com/#change,6642

Comment by Christopher Morrone [ 17/Jun/13 ]

This needs to be fixed for 2.4.1 too, so you should really add that Fix Version as well.

Comment by Jodi Levi (Inactive) [ 16/Jul/13 ]

When should we expect to see the patch for 2.4.1?

Comment by Ned Bass [ 16/Jul/13 ]

Patch for b2_4:

http://review.whamcloud.com/#/c/7012/

Comment by Peter Jones [ 20/Aug/13 ]

Landed for 2.4.1 and 2.5
