[LU-3446] changelog index reset on MDT restart Created: 07/Jun/13 Updated: 20/Aug/13 Resolved: 20/Aug/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ned Bass | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, changelog |
| Environment: | mds: lustre-2.4.0-RC2_2chaos_2.6.32_358.6.1.3chaos.ch5.1.ch5.1.x86_64 |
| Severity: | 3 |
| Rank (Obsolete): | 8596 |
| Description |
|
After a maintenance downtime in which we updated our vesta Lustre servers from 2.4.0-RC1_3chaos to 2.4.0-RC2_2chaos, the changelog current index reverted to zero. The registered user cl1 retained its highest index value:
# vesta-mds1 /root > lctl get_param mdd.fsv-MDT0000.changelog_users
mdd.fsv-MDT0000.changelog_users=
current index: 9158128
ID    index
cl1   445461827
Here is a debug log entry from when the MDS started:
00000004:00000080:1.0:1370534969.929148:0:17901:0:(mdd_device.c:308:mdd_changelog_llog_init()) changelog starting index=0
LLNL-bug-id: TOSS-2103 |
| Comments |
| Comment by Peter Jones [ 08/Jun/13 ] |
|
Bobijam, could you please help with this one? Thanks, Peter
| Comment by Zhenyu Xu [ 09/Jun/13 ] |
|
The mdd_device.c:308 message comes from the MDD initializing LLOG_CHANGELOG_ORIG_CTXT. After that, we initialize LLOG_CHANGELOG_USER_ORIG_CTXT, where changelog_user_init_cb() sets mdd->mdd_cl.mc_lastuser. Then we return to mdd_changelog_llog_init():
/* If we have registered users, assume we want changelogs on */
if (mdd->mdd_cl.mc_lastuser > 0) {
        rc = mdd_changelog_on(env, mdd, 1);
        if (rc < 0)
                GOTO(out_uclose, rc);
}
And mdd_changelog_on() |
| Comment by Andreas Dilger [ 10/Jun/13 ] |
|
Bobijam, does your comment indicate that this should not be an actual problem? Do we have a ChangeLog test that verifies the ChangeLog index after a restart? |
| Comment by Zhenyu Xu [ 13/Jun/13 ] |
|
Ned, was the changelog_users output for cl1 read after the MDT restart? Here's what I see on my local VM:
[10:38 AM]root@test1:~/work/lustre/lustre/tests/ # lctl get_param mdd.lustre-MDT0000.changelog_users
mdd.lustre-MDT0000.changelog_users=
current index: 58
ID    index
cl1   0
cl2   37
[10:42 AM]root@test1:~/work/lustre/lustre/tests/ # umount /mnt/mds1
[10:43 AM]root@test1:~/work/lustre/lustre/tests/ # mount -t lustre /dev/sdb /mnt/mds1
[10:43 AM]root@test1:~/work/lustre/lustre/tests/ # lctl get_param mdd.lustre-MDT0000.changelog_users
mdd.lustre-MDT0000.changelog_users=
current index: 58
ID    index
cl1   0
cl2   37 |
| Comment by Ned Bass [ 13/Jun/13 ] |
|
Bobijam, yes, the changelog_users output in the description was read after the MDT restart. Before this update, the current index was preserved across MDT restarts (as far as I know). We had a reboot yesterday, and now that I check, it appears that the index was reset again:
# vesta-mds1 /root > lctl get_param mdd.fsv-MDT0000.changelog_users
mdd.fsv-MDT0000.changelog_users=
current index: 444830
ID    index
cl1   445461827
cl2   18248038 |
| Comment by Ned Bass [ 13/Jun/13 ] |
|
It seems that if the registered changelog users have cleared all changelog records by the time the MDT is stopped, then on restart changelog_init_cb() is never called from llog_reverse_process(), so mdd->mdd_cl.mc_index is left at 0. This reproduces the problem in a VM for me:
FSTYPE=zfs llmount.sh
lctl --device lustre-MDT0000 changelog_register
touch /mnt/lustre/{1,2,3,4}
lfs changelog_clear lustre-MDT0000 cl1 4
umount /mnt/mds1
mount.lustre lustre-mdt1/mdt1 /mnt/mds1
lctl get_param mdd.lustre-MDT0000.changelog_users
|
| Comment by Ned Bass [ 13/Jun/13 ] |
|
Patch for master: |
| Comment by Christopher Morrone [ 17/Jun/13 ] |
|
This needs to be fixed for 2.4.1 too, so you should really add that Fix Version as well.
| Comment by Jodi Levi (Inactive) [ 16/Jul/13 ] |
|
When should we expect to see the patch for 2.4.1? |
| Comment by Ned Bass [ 16/Jul/13 ] |
|
Patch for b2_4: |
| Comment by Peter Jones [ 20/Aug/13 ] |
|
Landed for 2.4.1 and 2.5 |