<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:14:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14958] configurable hash table size for jbd2</title>
                <link>https://jira.whamcloud.com/browse/LU-14958</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The revoke hash table can be enormous with a multi-GB journal: this may result in millions of revoke records being loaded and inserted into a dedicated hash table during journal replay.&lt;br/&gt;
Currently the revoke hash table&apos;s size is hard-coded as 256, so every slot may accumulate too many records.&lt;/p&gt;

&lt;p&gt;A simple benchmark of that code (number of revoke records vs. replay time):&lt;br/&gt;
1048576 - 95 seconds&lt;br/&gt;
2097152 - 580 seconds&lt;br/&gt;
In the field it can be up to 30M records to find/insert.&lt;/p&gt;

&lt;p&gt;With 8192 buckets in the hash table:&lt;br/&gt;
4194304 - 59 seconds&lt;br/&gt;
8388608 - 247 seconds&lt;/p&gt;</description>
                <environment></environment>
        <key id="65770">LU-14958</key>
            <summary>configurable hash table size for jbd2</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="bzzz">Alex Zhuravlev</reporter>
                        <labels>
                    </labels>
                <created>Mon, 23 Aug 2021 19:40:19 +0000</created>
                <updated>Fri, 15 Sep 2023 04:13:23 +0000</updated>
                            <resolved>Sat, 29 Apr 2023 03:14:34 +0000</resolved>
                                                    <fixVersion>Lustre 2.16.0</fixVersion>
                    <fixVersion>Lustre 2.15.3</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="311239" author="bzzz" created="Thu, 26 Aug 2021 05:09:37 +0000"  >&lt;p&gt;The same benchmark with rhashtable:&lt;/p&gt;

&lt;p&gt;1048576 in 1854304 (2 s)&lt;br/&gt;
2097152 in 3750792 (3 s)&lt;br/&gt;
4194304 in 7594669 (7 s)&lt;br/&gt;
8388608 in 15385228 (15 s)&lt;br/&gt;
16777216 in 31040219 (31 s)&lt;br/&gt;
33554432 in 62284580 (62 s)&lt;/p&gt;
</comment>
                            <comment id="312229" author="bzzz" created="Tue, 7 Sep 2021 17:03:13 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=pjones&quot; class=&quot;user-hover&quot; rel=&quot;pjones&quot;&gt;pjones&lt;/a&gt;, I&apos;m waiting for feedback from linux-ext4@, and then I plan to make patches for master.&lt;/p&gt;</comment>
                            <comment id="314593" author="gerrit" created="Mon, 4 Oct 2021 15:22:47 +0000"  >&lt;p&gt;&quot;Alex Zhuravlev &amp;lt;bzzz@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/45122&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45122&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14958&quot; title=&quot;configurable hash table size for jbd2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14958&quot;&gt;&lt;del&gt;LU-14958&lt;/del&gt;&lt;/a&gt; kernel: use rhashtable for revoke records in jbd2&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: cfdf255409758332bb0fc406c8b86578bbf6070b&lt;/p&gt;</comment>
                            <comment id="314718" author="bzzz" created="Tue, 5 Oct 2021 14:55:18 +0000"  >&lt;p&gt;I did some measurements and tend to think the current approach should be changed a bit. I ran sanity-benchmark (1724 seconds total); all lookups in the revoke table at runtime (as opposed to journal replay) were counted:&lt;br/&gt;
dm-0-8: 4666 lookups 7637 insertions in revoke table&lt;br/&gt;
only 4666 lookups and 7637 insertions over 1700 seconds on the MDS. Currently insertion is as simple as:&lt;br/&gt;
       spin_lock(&amp;amp;journal-&amp;gt;j_revoke_lock);&lt;br/&gt;
       list_add(&amp;amp;record-&amp;gt;hash, hash_list);&lt;br/&gt;
       spin_unlock(&amp;amp;journal-&amp;gt;j_revoke_lock);&lt;br/&gt;
and this must be cheaper than an insertion into rhashtable, which does a lookup internally. I think there would be a runtime performance penalty if we switch to rhashtable. Probably we need a more flexible scheme where rhashtable is used during journal replay and a simple list during regular operation.&lt;/p&gt;</comment>
                            <comment id="317922" author="eaujames" created="Wed, 10 Nov 2021 21:14:23 +0000"  >&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;The CEA hit this issue on a ClusterStore version &quot;2.12.4.2_cray&quot; with a 4G journal:&lt;/p&gt;

&lt;p&gt;Robinhood stopped dequeuing changelogs because of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14158&quot; title=&quot;lfs changelog do not display old changelog after changelog_catalog  wrapped arround&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14158&quot;&gt;&lt;del&gt;LU-14158&lt;/del&gt;&lt;/a&gt;. After one night changelog_catalog was 50% full.&lt;br/&gt;
We tried to deregister the changelog, but the cancel process was too slow (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt;), so we decided to crash the MDT.&lt;br/&gt;
We were then unable to mount the target. With &quot;perf&quot; we identified that the mount process was stuck in journal recovery (find_revoke_record).&lt;br/&gt;
We tried e2fsck and it got stuck in journal recovery too. With strace we identified that the recovery process was not actually stuck, just far too slow (several days to complete).&lt;/p&gt;

&lt;p&gt;To bring the filesystem back into production quickly (losing in-flight IOs), we:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;saved the raw journal to a file&lt;/li&gt;
	&lt;li&gt;removed the need_recovery and has_journal flags&lt;/li&gt;
	&lt;li&gt;ran e2fsck -f to correct inconsistencies with the journal removed&lt;/li&gt;
	&lt;li&gt;reapplied the has_journal ext4 flag to recreate the journal&lt;/li&gt;
	&lt;li&gt;killed/crashed all the current jobs (in-flight IOs were lost)&lt;/li&gt;
	&lt;li&gt;mounted Lustre&lt;/li&gt;
	&lt;li&gt;executed an LFSCK on the MDT to correct Lustre inconsistencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were &amp;gt;64M revoke entries inside the jbd2 journal.&lt;br/&gt;
For now we think it is changelog_deregister that creates so many revoke records.&lt;/p&gt;</comment>
                            <comment id="318088" author="eaujames" created="Fri, 12 Nov 2021 15:01:31 +0000"  >&lt;p&gt;I successfully reproduced the issue on 2.12.7 LTS with changelog_deregister and a 1228M journal:&lt;/p&gt;

&lt;p&gt;I executed changelog_deregister with 17794/64768 entries in changelog_catalog (27%, 17794 llog plain files).&lt;br/&gt;
The revoke records appeared to increase linearly while removing llog plain files: 2150 revoke entries/s at a removal rate of 56 llogs/min.&lt;br/&gt;
When I crashed the MDS there were ~20M revoke entries in the journal, and journal recovery took about 40 min.&lt;/p&gt;</comment>
                            <comment id="318089" author="bzzz" created="Fri, 12 Nov 2021 15:25:28 +0000"  >&lt;p&gt;Many thanks for the report; I&apos;m going to try to reproduce this locally.&lt;/p&gt;</comment>
                            <comment id="318256" author="eaujames" created="Mon, 15 Nov 2021 14:57:43 +0000"  >&lt;p&gt;I retested the case with 2.12.7 + &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt;&apos;s patch:&lt;/p&gt;

&lt;p&gt;I executed changelog_deregister with 17794/64768 entries in changelog_catalog (27%, 17794 llog plain files).&lt;br/&gt;
The revoke records grew linearly while removing llog plain files: 109 revoke entries/s at a removal rate of 2160 llogs/min.&lt;br/&gt;
When I crashed the MDS there were ~50k revoke entries in the journal, and journal recovery took about 1 s.&lt;/p&gt;

&lt;p&gt;So &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt; seems to resolve the issue for &quot;changelog_deregister&quot;.&lt;/p&gt;</comment>
                            <comment id="318261" author="bzzz" created="Mon, 15 Nov 2021 15:54:06 +0000"  >&lt;p&gt;If you have the time and inclination, could you please collect more data? Without the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt; patch, run changelog_deregister with a very high number of changelog records (as in your example) and dump cat /proc/fs/jbd2/&amp;lt;mds device&amp;gt;/info just before crashing the MDS.&lt;br/&gt;
The revoke records are generated when metadata blocks are freed, so there must be the same number of revoke records, but the time distribution is very different, I guess.&lt;/p&gt;</comment>
                            <comment id="318319" author="eaujames" created="Tue, 16 Nov 2021 09:43:32 +0000"  >&lt;p&gt;Hello,&lt;br/&gt;
I have retested without &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt; (it is the same set of changelogs each time: I untar an archive to restore the changelog state).&lt;br/&gt;
This time, I waited for the deregister to finish (as in the test with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt;). It took 154 min to complete.&lt;br/&gt;
Here is &quot;/proc/fs/jbd2/info&quot; before the crash:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;1858 transactions (1850 requested), each up to 262144 blocks
average: 
  0ms waiting for transaction
  0ms request delay
  4988ms running transaction
  8ms transaction was being locked
  0ms flushing data (in ordered mode)
  32ms logging transaction
  43059us average transaction commit time
  392638 handles per transaction
  49 blocks per transaction
  50 logged blocks per transaction
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Before the crash there were 30489150 revoke records in the on-disk journal (counted with &quot;logdump -a&quot;; see  &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/41380/41380_jbd2_debugfs.gz&quot; title=&quot;jbd2_debugfs.gz attached to LU-14958&quot;&gt;jbd2_debugfs.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;).&lt;br/&gt;
The recovery took 98 min.&lt;/p&gt;

&lt;p&gt;I added a &quot;perf probe&quot; on insert_revoke_hash() and recorded events during the deregister:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;   100.00%   100.00%  (ffffffffc0d3ea10)
            |
            ---__GI___ioctl
		system_call_fastpath+37
		sys_ioctl+161
		do_vfs_ioctl+928
		obd_class_ioctl+210
		class_handle_ioctl+6349
		mdt_iocontrol+1516
		mdd_iocontrol+966
		mdd_changelog_clear+1619
		mdd_changelog_llog_cancel+208
		llog_changelog_cancel+84
		llog_cat_process+46
		llog_cat_process_or_fork+481
		llog_process_or_fork+188
		llog_process_thread+2143
		llog_cat_process_cb+151
		llog_handle_put+72
		llog_osd_close+78
		lu_object_put+197
		lu_object_free+104
		osd_object_delete+468
		iput+252
		evict+180
		ldiskfs_evict_inode+1418
		ldiskfs_truncate+784
               |          
               |--60.87%--ldiskfs_ind_truncate+689
               |          |          
               |           --60.83%--ldiskfs_free_branches
               |                     |          
               |                      --60.75%--ldiskfs_free_branches+247
               |                                ldiskfs_free_branches+653
               |                                |          
               |                                |--36.95%--ldiskfs_free_data+323
							  ldiskfs_clear_blocks+401
							  ldiskfs_free_blocks+411
							  __ldiskfs_forget+290
							  insert_revoke_hash+1
               |                                |          
               |                                |          
               |                                |          
               |                                |          
               |                                |          
               |                                 --23.81%--ldiskfs_free_data+275
							   ldiskfs_clear_blocks+401
							   ldiskfs_free_blocks+411
							   __ldiskfs_forget+290
							   insert_revoke_hash+1
               |          
                --38.68%--ldiskfs_ind_truncate+785
                          |          
				ldiskfs_free_branches+247
				ldiskfs_free_branches+653
				ldiskfs_free_data+323
				ldiskfs_clear_blocks+401
				ldiskfs_free_blocks+411
				__ldiskfs_forget+290
				insert_revoke_hash+1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Something I noticed is that the number of on-disk revoke records is stable after the deregister (even if we do some operations on the MDT). It looks like some kind of leak to me, but I don&apos;t understand when these records are supposed to be cancelled.&lt;/p&gt;</comment>
                            <comment id="318320" author="eaujames" created="Tue, 16 Nov 2021 09:56:47 +0000"  >&lt;p&gt;Concerning the difference between the tests with and without &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt;, at the end of the changelog_deregister:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt;: I have 50k on-disk revoke records (~3 revoke records per llog plain file)&lt;/li&gt;
	&lt;li&gt;without &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt;: I have 30M on-disk revoke records (~1700 revoke records per llog plain file)&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="318321" author="eaujames" created="Tue, 16 Nov 2021 10:06:48 +0000"  >&lt;p&gt;Here is the &quot;dumpe2fs -h&quot; output after recovery:  &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/41381/41381_dumpe2fs.txt&quot; title=&quot;dumpe2fs.txt attached to LU-14958&quot;&gt;dumpe2fs.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;</comment>
                            <comment id="318327" author="bzzz" created="Tue, 16 Nov 2021 13:15:27 +0000"  >&lt;blockquote&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
1858 transactions (1850 requested), each up to 262144 blocks
average: 
  0ms waiting &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; transaction
  0ms request delay
  4988ms running transaction
  8ms transaction was being locked
  0ms flushing data (in ordered mode)
  32ms logging transaction
  43059us average transaction commit time
  392638 handles per transaction
  49 blocks per transaction
  50 logged blocks per transaction
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;/blockquote&gt;

&lt;p&gt;Thanks a lot! And, if possible, the same with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt; applied, if you have a few spare cycles. Thanks in advance!&lt;/p&gt;

&lt;p&gt;I think this confirms the theory that &quot;deregistering&quot; is basically CPU-bound and produces a lot of tiny transactions which aren&apos;t checkpointed (I guess there is no need to: there is no memory pressure, as the number of modified blocks is tiny). Given they aren&apos;t checkpointed before the crash, JBD has to replay them all and needs to skip revoked blocks, so it fills and looks up a big revoke table.&lt;br/&gt;
The situation changes with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt;: the process is less CPU-bound and generates bigger transactions, and (somehow) transactions get checkpointed more frequently, leaving less to replay.&lt;/p&gt;

&lt;p&gt;The important question here is whether we still need to fix JBD. I tend to think so, as there are other use cases where external processes (like llsom_sync) may consume and clear the changelog at a high rate, which would result in tiny transactions, as before the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt; patch.&lt;/p&gt;</comment>
                            <comment id="318346" author="eaujames" created="Tue, 16 Nov 2021 16:15:33 +0000"  >&lt;p&gt;Here is &quot;/proc/fs/jbd2/info&quot; with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14688&quot; title=&quot;Changelog cancel improvement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14688&quot;&gt;&lt;del&gt;LU-14688&lt;/del&gt;&lt;/a&gt; applied:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;131 transactions (127 requested), each up to 262144 blocks
average: 
  0ms waiting for transaction
  0ms request delay
  4770ms running transaction
  13ms transaction was being locked
  0ms flushing data (in ordered mode)
  60ms logging transaction
  91593us average transaction commit time
  826 handles per transaction
  372 blocks per transaction
  373 logged blocks per transaction
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="365189" author="gerrit" created="Wed, 8 Mar 2023 03:27:47 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/45122/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/45122/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14958&quot; title=&quot;configurable hash table size for jbd2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14958&quot;&gt;&lt;del&gt;LU-14958&lt;/del&gt;&lt;/a&gt; kernel: use rhashtable for revoke records in jbd2&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: c3bb2b778d6b40a5cecb01993b55fcc107305b4a&lt;/p&gt;</comment>
                            <comment id="370389" author="gerrit" created="Mon, 24 Apr 2023 19:59:47 +0000"  >&lt;p&gt;&quot;Jian Yu &amp;lt;yujian@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50730&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50730&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14958&quot; title=&quot;configurable hash table size for jbd2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14958&quot;&gt;&lt;del&gt;LU-14958&lt;/del&gt;&lt;/a&gt; kernel: use rhashtable for revoke records in jbd2&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 53e110a884423b137b44ea44b0f1327b1535bfa1&lt;/p&gt;</comment>
                            <comment id="370967" author="gerrit" created="Sat, 29 Apr 2023 01:47:27 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50730/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50730/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14958&quot; title=&quot;configurable hash table size for jbd2&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14958&quot;&gt;&lt;del&gt;LU-14958&lt;/del&gt;&lt;/a&gt; kernel: use rhashtable for revoke records in jbd2&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 52bc546e91e415315e6cf9a46608264122e64ef3&lt;/p&gt;</comment>
                            <comment id="370970" author="pjones" created="Sat, 29 Apr 2023 03:14:34 +0000"  >&lt;p&gt;Seems to be merged for 2.15.3 and 2.16&lt;/p&gt;</comment>
                            <comment id="386096" author="adilger" created="Fri, 15 Sep 2023 04:13:23 +0000"  >&lt;p&gt;It looks like this patch solved the problem with the kernel&apos;s journal revoke record handling, but there is still a similar problem in the e2fsprogs handling of journal replay.  We hit an issue with slow journal recovery while deleting a large number of changelogs, and &quot;&lt;tt&gt;tune2fs&lt;/tt&gt;&quot; and &quot;&lt;tt&gt;e2fsck&lt;/tt&gt;&quot; were hung in journal recovery for hours before being interrupted.  The problem was eventually fixed by updating the server to include this fix in ldiskfs and then mounting the filesystem so that journal recovery runs in the kernel.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="77911">LU-17117</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="41381" name="dumpe2fs.txt" size="1982" author="eaujames" created="Tue, 16 Nov 2021 10:06:44 +0000"/>
                            <attachment id="41380" name="jbd2_debugfs.gz" size="111403873" author="eaujames" created="Tue, 16 Nov 2021 09:29:38 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i022j3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>