[LU-12098] changelog_deregister appears not to reliably clear all changelog entries Created: 22/Mar/19 Updated: 15/Jul/19 Resolved: 21/May/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.6 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Olaf Faaland | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ORNL, llnl | ||
| Environment: |
Lustre 2.10.6_2.chaos |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
After lctl changelog_deregister, changelogs were not fully cleared: [root@porteri:~]# pdsh -g mds 'lctl get_param "*.*.changelog_users"' | dshbak -c ---------------- eporter81 ---------------- mdd.lustre3-MDT0000.changelog_users=current index: 3856956771 ID index ---------------- eporter82 ---------------- mdd.lustre3-MDT0001.changelog_users=current index: 5663390742 ID index [root@porteri:~]# pdsh -g mds 'lctl get_param "*.*.changelog_size"' | dshbak -c ---------------- eporter81 ---------------- mdd.lustre3-MDT0000.changelog_size=15384080 ---------------- eporter82 ---------------- mdd.lustre3-MDT0001.changelog_size=25036880872 |
| Comments |
| Comment by Olaf Faaland [ 22/Mar/19 ] |
|
For our tracking, our internal ticket is TOSS-4471 |
| Comment by Olaf Faaland [ 22/Mar/19 ] |
|
I have not yet been able to test to see if it can be produced in master or 2.12. I will attempt that next week. |
| Comment by Olaf Faaland [ 22/Mar/19 ] |
|
In the case of one of MDT0001, the 'lctl deregister_changelog' command was issued and the MDS crashed. However, MDT0000 never went down. |
| Comment by Olaf Faaland [ 22/Mar/19 ] |
|
We can work around this in the future, on Lustre 2.10, by issuing an "lfs changelog_clear" first. For these MDTs, should I re-register a changelog user and then issue changelog_clear and changelog_deregister? |
| Comment by Peter Jones [ 22/Mar/19 ] |
|
Sébastien Could you please advise here? Thanks Peter |
| Comment by Sebastien Buisson [ 25/Mar/19 ] |
|
Hi, My understanding is that changelog_deregister is supposed to issue a clear first, so there should be no difference if clear is not called before. And I am not sure registering a new changelog user to issue a clear and then deregister would help, because the newly created changelog user will have its index pointing to the latest existing index on the MDT, hence preventing to clear older indexes. The only way I can see to clear those remaining changelog records is to stop then restart the MDTs. Indeed, upon stop, the changelog record files are completely cleaned up. In your case, there might be a bug which prevents former llog files from being deleted while MDT is online. Changelogs records are split into multiple llog files on MDTs, each one containing up to around 16k records. Once an llog file becomes full (ie contains ~16k records), an additional llog file is created to store the next Changelog records. When a clear is issued for index N, all llog files for which maximum record index is lower than N are removed. Only remain the llog files that have records with index greater than or equal to N. If the llog file containing record N is the current, latest one, then it cannot be removed, because it is being used. No matter how many records are in there, the space consumed on the MDT by this llog file will be freed later, when a new clear is issued. In order to investigate, could you please access the impacted MDTs as ZFS, and run the following command to help finding the actual location of the changelog entries on the MDTs: # llog_reader /zfs_mount/changelog_catalog It should output something like this: Header size : 8192 Time : Mon Mar 25 15:36:53 2019 Number of records: 4 Target uuid : ----------------------- #01 (064)id=[0x6:0x1:0x0]:0 path=oi.1/0x1:0x6:0x0 #02 (064)id=[0x7:0x1:0x0]:0 path=oi.1/0x1:0x7:0x0 #03 (064)id=[0x8:0x1:0x0]:0 path=oi.1/0x1:0x8:0x0 #04 (064)id=[0x9:0x1:0x0]:0 path=oi.1/0x1:0x9:0x0 What is important is the number of records (4 in this example), and the path to every llog file holding changelog records. In this example the files are oi.1/0x1:0x{6,7,8,9}:0x0. Then, for each of these files, it would be interesting to see how many records they contain: # for i in 6 7 8 9; do llog_reader /zfs_mount/oi.1/0x1:0x${i}:0x0 | grep records ; done
Number of records: 16577
Number of records: 16373
Number of records: 16320
Number of records: 10731
Thanks. |
| Comment by Olaf Faaland [ 27/Mar/19 ] |
That is my understanding too, but I believe the changelog_deregister first disables the changelog user. So if the MDT goes down while the changelog_deregister is occurring, how could one resume the clearing of the changelog records? I believe that they cannot be cleared at that point, at least by the user. But I am not certain of that. On the other hand, I believe the changelog_clear can just be invoked again after the MDT comes back up. I haven't had a machine to test either theory on, nor time to just walk the code to see, yet.
Do you recall where in the code this occurs? Is this for empty changelog record files (Number of records: 0) or for orphaned records like the ones we have?
I verified all the changelog_catalog entries point to existing files, and all those files contain changelog records. I'll post more details later today. |
| Comment by Olaf Faaland [ 28/Mar/19 ] |
|
Here is a summary for each of the two MDTs. I counted the number of records of type "changelog" found within the files the changelog_catalog points to. MIN is the minimum number of records seen after looking at all the files; MAX is same except maximum number, and AVG and TOT are average and total (sum). Note that porter81 is hosting MDT 0 and porter82 is hosting MDT 1 ---------------- eporter81 ---------------- Changelog_catalog header: Header size : 8192 Number of records: 2 Missing changelog files: 0 Existing changelog files: 2 checking oi.1/0x1:0x12b4a:0x0 checking oi.1/0x1:0x12b4b:0x0 all files checked MAX 59712 MIN 17606 TOT 77318 AVG 38659 ---------------- eporter82 ---------------- Changelog_catalog header: Header size : 8192 Number of records: 2911 Missing changelog files: 0 Existing changelog files: 2911 checking oi.1/0x1:0x2e14f:0x0 checking oi.1/0x1:0x2e150:0x0 <... 2909 rows removed ...> all files checked MAX 64101 MIN 11590 TOT 185533101 AVG 63735 |
| Comment by Olaf Faaland [ 28/Mar/19 ] |
|
Hi Sebastien, |
| Comment by Sebastien Buisson [ 28/Mar/19 ] |
|
Hi, changelog_deregister calls mdd_changelog_user_purge(). This one will invoke mdd_changelog_user_purge_cb(), which is responsible for (a) cancelling changelog records with indexes lower than that of the changelog user if not used by other user and (b) unregistering this changelog user. After that, mdd_changelog_user_purge() calls mdd_changelog_llog_cancel() to (c) purge changelog files if necessary. So you are right, if the MDS crashes in-between (b) and (c), you can end up with cancelled changelogs that are not freed. By introducing a new fail_loc for test purpose, I was able to make my MDS crash between step (b) and step (c). After reboot, the changelog user was not registered any more, but all its changelog entries were still on the MDT. So I registered a new changelog user, then immediately deregistered it, and it successfully cleared the pending changelog entries. It confirm that this technique is valid to get rid of remaining changelog entries. # lfs changelog_clear lustre-MDT0001 cl1 0 lfs changelog_clear: cannot purge records for 'cl1': No such file or directory (2) changelog_clear: no changelog user: cl1 |
| Comment by Sebastien Buisson [ 29/Mar/19 ] |
|
>> Indeed, upon stop, the changelog record files are completely cleaned up. The code path is the following: mdd_device_shutdown
mdd_changelog_fini
llog_cat_close
llog_destroy
So when an MDT is unmounted, llog_destroy is called for Changelog files that are referenced in a catalog file, and for which the number of records they contain is set to 0. |
| Comment by Sebastien Buisson [ 29/Mar/19 ] |
|
The information you gave with 'lctl get_param "..changelog_users"' and 'lctl get_param "..changelog_size"' on the one hand, and the information contained in the catalog and changelogs files (number of records, etc) on the other hand, is consistent with what I have when using my reproducer that crashes between steps (b) and (c). To sum up the situation, no more changelog users are registered, but the catalog file points to valid changelog files that are full of entries ('Number of records:' is not 0). On my test system I was able to get rid of these remaining changelogs not used anymore by any user by registering a new changelogs user and then deregistering it. # lctl get_param mdd.seb-MDT000*.changelog_size mdd.seb-MDT0000.changelog_size=0 mdd.seb-MDT0001.changelog_size=13186960 After de-registering: # lctl get_param mdd.seb-MDT000*.changelog_size mdd.seb-MDT0000.changelog_size=0 mdd.seb-MDT0001.changelog_size=25384 |
| Comment by Olaf Faaland [ 31/Mar/19 ] |
|
Sebastien, |
| Comment by Olaf Faaland [ 10/Apr/19 ] |
|
Hi Sebastien, Are you planning to re-order the de-register code to prevent this problem? |
| Comment by Sebastien Buisson [ 11/Apr/19 ] |
|
Hi Olaf, This is very good news! Regarding the way to solve this issue, the current implementation might be imposed by concurrency and/or deadlock concerns. So I was thinking about modifying 'changelog_deregister' so that it calls internally the 'changelog_clear' routines before actually proceeding to the consumer deregister. This way, if a crash occurs during the process, we could not end up with changelog entries not cleared but changelog consumer deregistered. I will try to push a patch for this. Thanks, |
| Comment by Gerrit Updater [ 16/Apr/19 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/34688 |
| Comment by Gerrit Updater [ 21/May/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34688/ |
| Comment by Peter Jones [ 21/May/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 21/May/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34921 |
| Comment by Gerrit Updater [ 08/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34921/ |