<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:49:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12029] do not try to muck with max_sectors_kb on multipath configurations</title>
                <link>https://jira.whamcloud.com/browse/LU-12029</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;There have been many reports lately that increasing the sector count on multipath configurations breaks the multipath.&lt;/p&gt;

&lt;p&gt;It looks like we really need to stop adjusting the values there and just print a warning, so that users can investigate whether any larger count works and, if so, incorporate it into their configuration by some other means.&lt;/p&gt;

&lt;p&gt;There are just too many patches to list here.&lt;/p&gt;</description>
                <environment></environment>
        <key id="55014">LU-12029</key>
            <summary>do not try to muck with max_sectors_kb on multipath configurations</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Wed, 27 Feb 2019 15:18:06 +0000</created>
                <updated>Fri, 26 Jan 2024 23:47:49 +0000</updated>
                                            <version>Lustre 2.13.0</version>
                    <version>Lustre 2.10.7</version>
                    <version>Lustre 2.12.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="252278" author="chrisw" created="Tue, 30 Jul 2019 17:25:26 +0000"  >&lt;p&gt;For installations with many (&amp;gt;50) attached disks, the additional overhead of running l_tunedisk on every change/add event is significant: one test with an 84-disk enclosure showed a &apos;udev settle&apos; time of 55s with l_tunedisk in place and 34s without.&lt;/p&gt;

&lt;p&gt;While I realize that removing 99-lustre-server.rules entirely might not be palatable for some customers, could it at least be moved to /usr/lib/udev/rules.d so that it can be stubbed out easily? This seems like a more appropriate place for it, since that&apos;s the directory for &apos;system&apos; files, and the Lustre RPM is (IMO) the &apos;system&apos; in this case.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Chris&lt;/p&gt;</comment>
                            <comment id="277151" author="adilger" created="Mon, 10 Aug 2020 22:58:45 +0000"  >&lt;p&gt;The primary reason the &lt;tt&gt;l_tunedisk&lt;/tt&gt; udev script exists is that &lt;tt&gt;dm_multipath&lt;/tt&gt; devices &quot;forget&quot; their &lt;tt&gt;max_sectors_kb&lt;/tt&gt; (and other) settings when they disconnect and reconnect as new SCSI devices (e.g. because of a brief cable disconnect or SCSI bus reset).  For the kernel, this causes the underlying SCSI device (e.g. &lt;tt&gt;/dev/sdb&lt;/tt&gt;) to disappear and reappear, possibly with a new device name, and to be reset back to the default settings.&lt;/p&gt;

&lt;p&gt;The upper-layer &lt;tt&gt;dm_multipath&lt;/tt&gt; device still reports the larger size for &lt;tt&gt;max_sectors_kb&lt;/tt&gt;, but the reconnected device has been reset to the smaller default value (no problem is hit if the user-specified &lt;tt&gt;max_sectors_kb&lt;/tt&gt; is smaller than the default). This causes SCSI errors as reported in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9551&quot; title=&quot;I/O errors when lustre uses multipath devices&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9551&quot;&gt;&lt;del&gt;LU-9551&lt;/del&gt;&lt;/a&gt; (and many other tickets) because in-flight IO is larger than what the &quot;new&quot; device queue will accept:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Mar 31 00:02:44 oss01 kernel: blk_cloned_rq_check_limits: over max size limit.
Mar 31 00:02:44 oss01 kernel: device-mapper: multipath: Failing path 8:160.
:
Mar 31 00:17:30 oss01 kernel: blk_update_request: I/O error, dev dm-17, sector 1182279680
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It would be best to fix the root of this problem in the kernel in the &lt;tt&gt;dm_multipath&lt;/tt&gt; device driver code, rather than continue to make &lt;tt&gt;l_tunedisk&lt;/tt&gt; or other udev scripts increasingly complex to handle this case.  Since this is (IMHO) a bug in the core kernel, it should also be submitted upstream.&lt;/p&gt;

&lt;p&gt;Firstly, the &quot;&lt;tt&gt;blk_cloned_rq_check_limits: over max size limit.&lt;/tt&gt;&quot; error message should be improved to print the device name, actual request size, and the current queue size limit to make it clear where the error lies (too large a request, or too small a limit).  This is just a one-line change to this function.&lt;/p&gt;

&lt;p&gt;Secondly, the &lt;tt&gt;dm_multipath&lt;/tt&gt; code needs to remember the &lt;tt&gt;max_sectors_kb&lt;/tt&gt; (and other) block device settings set on the multipath device.  It &lt;em&gt;should&lt;/em&gt; already have these parameters stored in its own queue settings; it just needs to automatically set those parameters on the underlying devices when they are re-added to the multipath, &lt;b&gt;before any IO is submitted there&lt;/b&gt;.  This might benefit from flags indicating which parameters were tuned away from the defaults, so that it doesn&apos;t touch parameters that have never actually been changed.&lt;/p&gt;

&lt;p&gt;This should properly handle already in-flight IOs that were generated &quot;while&quot; the device was being connected, and avoid the gap between the &quot;new&quot; device being added to the multipath and the (potentially several-second) delay before the &quot;udev&quot; script is run to (re-)tune the low-level block device queue values.  I don&apos;t &lt;em&gt;think&lt;/em&gt; it would be too hard to patch the &lt;tt&gt;dm_multipath&lt;/tt&gt; code to do this, but I haven&apos;t looked at the code in detail.&lt;/p&gt;

&lt;p&gt;AFAIK, there is already code in &lt;tt&gt;dm_multipath&lt;/tt&gt; to limit the size of &lt;tt&gt;max_sectors_kb&lt;/tt&gt; (and other parameters) to the minimum value reported by any of the underlying storage paths &lt;b&gt;at setup time&lt;/b&gt;, and there is code to pass the tuning written to &lt;tt&gt;/sys/block/dm-X/queue/max_sectors_kb&lt;/tt&gt; down to &lt;tt&gt;/sys/block/sdX,sdY,sdZ/queue/max_sectors_kb&lt;/tt&gt; &lt;b&gt;at the time it is set&lt;/b&gt;, but this essentially needs to be made &quot;persistent&quot; when a device is reconnected to the multipath.  In &lt;em&gt;theory&lt;/em&gt; it would be possible for a new path to be reintroduced with a smaller limit (e.g. connected via a &quot;worse&quot; HBA or iSCSI transport), and that new limit should also &quot;bubble up&quot; to the higher levels (if it doesn&apos;t already), but it is far more likely that the previously-tuned parameters can be set on the new device again because it was just a temporary blip in connectivity (flakey/disconnected cable) and is still the same device.&lt;/p&gt;</comment>
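The mismatch described above (the dm device advertising a larger max_sectors_kb than a freshly reset path device) can be observed directly through sysfs. A minimal sketch, assuming the standard /sys/block/&lt;dm&gt;/slaves layout; the SYSFS variable and the check_dm_limits name are illustrative only, not part of Lustre or multipath-tools:

```shell
# Sketch: flag any path device whose max_sectors_kb has fallen below the
# dm-multipath device's value -- the condition that triggers
# "blk_cloned_rq_check_limits: over max size limit".
# SYSFS is parameterized only so the logic can be exercised without real devices.
SYSFS=${SYSFS:-/sys}

check_dm_limits() {
    dm=$1
    dm_max=$(cat "$SYSFS/block/$dm/queue/max_sectors_kb")
    for slave in "$SYSFS/block/$dm/slaves/"*; do
        [ -e "$slave" ] || continue
        path=$(basename "$slave")
        path_max=$(cat "$SYSFS/block/$path/queue/max_sectors_kb")
        if [ "$path_max" -lt "$dm_max" ]; then
            echo "$path: max_sectors_kb $path_max is below the dm device value $dm_max"
        fi
    done
}
```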
                            <comment id="334221" author="adilger" created="Tue, 10 May 2022 01:58:03 +0000"  >&lt;p&gt;According to RedHat, the &lt;tt&gt;/etc/multipath.conf&lt;/tt&gt; has a parameter for setting &lt;tt&gt;max_sectors_kb&lt;/tt&gt; at setup time:&lt;br/&gt;
&lt;a href=&quot;https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/dm_multipath/config_file_multipath&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/dm_multipath/config_file_multipath&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;tt&gt;max_sectors_kb&lt;/tt&gt;&lt;br/&gt;
(Red Hat Enterprise Linux Release 6.9 and later) Sets the &lt;tt&gt;max_sectors_kb&lt;/tt&gt; device queue parameter to the specified value on all underlying paths of a multipath device before the multipath device is first activated. When a multipath device is created, the device inherits the &lt;tt&gt;max_sectors_kb&lt;/tt&gt; value from the path devices. Manually raising this value for the multipath device or lowering this value for the path devices can cause multipath to create I/O operations larger than the path devices allow. Using the &lt;tt&gt;max_sectors_kb&lt;/tt&gt; parameter is an easy way to set these values before a multipath device is created on top of the path devices and prevent invalid-sized I/O operations from being passed. If this parameter is not set by the user, the path devices have it set by their device driver, and the multipath device inherits it from the path devices.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;It still isn&apos;t clear from this description whether it is any better than calling &lt;tt&gt;tune_devices.sh&lt;/tt&gt; from udev, since it only mentions &quot;&lt;em&gt;before a multipath device is created&lt;/em&gt;&quot; and says nothing about what happens when a path reconnects (which is the core issue here).&lt;/p&gt;</comment>
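For reference, the setting the Red Hat documentation describes is a one-line addition to the defaults (or devices/multipaths) section of /etc/multipath.conf; the value 16384 here is purely illustrative and must match what the storage paths actually support:

```
defaults {
    # Applied to all path devices before the multipath device is first
    # activated; per the documentation above, this does not by itself
    # re-apply the value when a path reconnects.
    max_sectors_kb 16384
}
```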
                            <comment id="378902" author="eaujames" created="Mon, 17 Jul 2023 13:39:56 +0000"  >&lt;p&gt;We recently hit this issue at the CEA during disk firmware update on SFA18K:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;The firmware update triggers an ALUA event on the device.&lt;/li&gt;
	&lt;li&gt;The udev rule 99-lustre-server.rules is triggered on each server that sees the block device (mounted or unmounted).&lt;/li&gt;
	&lt;li&gt;The rule runs l_tunedisk on each VM.&lt;/li&gt;
	&lt;li&gt;osd_is_lustre/ldiskfs_is_lustre tries to access the raw device (via the debugfs/e2fsprogs API) concurrently on every server (lots of 4k reads).&lt;/li&gt;
	&lt;li&gt;The targets hang.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;One target is seen on 8 VMs, and one SFA pool contains 2 VDs, so the udev event is triggered 16 times for each firmware update. The OSTs are large: over 620T.&lt;/p&gt;

&lt;p&gt;To mitigate this issue, I think we should avoid using debugfs/e2fsprogs on the raw devices to identify whether a device is used by Lustre, and run l_tunedisk only for mounted devices (on device add/change rules).&lt;/p&gt;</comment>
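The proposed mitigation could be sketched as a shell helper that consults /proc/mounts instead of probing the raw device; the is_mounted_lustre name and the optional mounts-file argument are hypothetical, for illustration only, and are not taken from the actual patch:

```shell
# Sketch of the proposed mitigation: decide whether to run l_tunedisk by
# checking the mount table for a mounted Lustre target, instead of probing
# the raw device with debugfs/e2fsprogs.
is_mounted_lustre() {
    # $1 = block device, $2 = mounts file (defaults to /proc/mounts)
    awk -v dev="$1" '$1 == dev { if ($3 == "lustre") found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# A udev-invoked script could then gate the tuning step, e.g.:
#   if is_mounted_lustre "$DEVNAME"; then l_tunedisk "$DEVNAME"; fi
```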
                            <comment id="378975" author="gerrit" created="Mon, 17 Jul 2023 18:46:00 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51695&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51695&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12029&quot; title=&quot;do not try to muck with max_sectors_kb on multipath configurations&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12029&quot;&gt;LU-12029&lt;/a&gt; utils: l_tunedisk only tune mounted target&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 217a0475d4a1bf3df394e1008ff4937f60a12e9d&lt;/p&gt;</comment>
                            <comment id="379397" author="eaujames" created="Wed, 19 Jul 2023 18:41:39 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;I have submitted a pull request on multipathd: &lt;a href=&quot;https://github.com/opensvc/multipath-tools/pull/69&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/opensvc/multipath-tools/pull/69&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the following kernel patch, this should work fine:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit 3ae706561637331aa578e52bb89ecbba5edcb7a9
Author: Mike Snitzer &amp;lt;snitzer@redhat.com&amp;gt;
Date:   Wed Sep 26 23:45:45 2012 +0100

    dm: retain table limits when swapping to new table with no devices
    
    Add a safety net that will re-use the DM device&apos;s existing limits in the
    event that DM device has a temporary table that doesn&apos;t have any
    component devices.  This is to reduce the chance that requests not
    respecting the hardware limits will reach the device.
    
    DM recalculates queue limits based only on devices which currently exist
    in the table.  This creates a problem in the event all devices are
    temporarily removed such as all paths being lost in multipath.  DM will
    reset the limits to the maximum permissible, which can then assemble
    requests which exceed the limits of the paths when the paths are
    restored.  The request will fail the blk_rq_check_limits() test when
    sent to a path with lower limits, and will be retried without end by
    multipath.  This became a much bigger issue after v3.6 commit fe86cdcef
    (&quot;block: do not artificially constrain max_sectors for stacking
    drivers&quot;).
    
    Reported-by: David Jeffery &amp;lt;djeffery@redhat.com&amp;gt;
    Signed-off-by: Mike Snitzer &amp;lt;snitzer@redhat.com&amp;gt;
    Signed-off-by: Alasdair G Kergon &amp;lt;agk@redhat.com&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And the following multipathd path:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;commit 8fd48686d72ee10e8665f03399da128e8c1362bd
Author: Benjamin Marzinski &amp;lt;bmarzins@redhat.com&amp;gt;
Date:   Fri Apr 7 01:16:37 2017 -0500

    libmultipath: don&apos;t set max_sectors_kb on reloads                                             

    Multipath was setting max_sectors_kb on the multipath device and all its                      
    path devices both when the device was created, and when it was reloaded.                      
    The problem with this is that while this would set max_sectors_kb on all                      
    the devices under multipath, it couldn&apos;t set this on devices on top of                        
    multipath.  This meant that if a user lowered max_sectors_kb on an                            
    already existing multipath device with a LV on top of it, the LV could                        
    send down IO that was too large for the new max_sectors_kb value,                             
    because the LV was still using the old value.  The solution to this is                        
    to only set max_sectors_kb to the configured value when the device is                         
    originally created, not when it is later reloaded.  Since not all paths                       
    may be present when the device is original created, on reloads multipath                      
    still needs to make sure that the max_sectors_kb value on all the path                        
    devices is the same as the value of the multipath device. But if this                         
    value doesn&apos;t match the configuration value, that&apos;s o.k.                                      

    This means that the max_sectors_kb value for a multipath device won&apos;t                         
    change after it have been initially created. All of the devices created                       
    on top of the multipath device will inherit that value, and all of the                        
    devices will use it all the way down, so IOs will never be mis-sized.                         

    I also moved sysfs_set_max_sectors_kb to configure.c, since it is only                        
    called from there, and it it makes use of static functions from there.                        

    Signed-off-by: Benjamin Marzinski &amp;lt;bmarzins@redhat.com&amp;gt;                                       
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt; </comment>
                            <comment id="379403" author="adilger" created="Wed, 19 Jul 2023 19:12:15 +0000"  >&lt;p&gt;It looks like the DM patch landed as v3.6-rc7-5-g3ae706561637, so it should be present in el8.7 and later server kernels (3.10).  Can you confirm that the libmultipath patch is also included in el8 installs?&lt;/p&gt;</comment>
                            <comment id="379413" author="eaujames" created="Wed, 19 Jul 2023 19:25:52 +0000"  >&lt;p&gt;I am on Rocky Linux 8.8, and the libmultipath patch is present.&lt;br/&gt;
This patch only sets max_sectors_kb at multipath device init (if specified in the configuration), and then keeps any value a user sets on the device.&lt;/p&gt;

&lt;p&gt;So, as a workaround, a value can be set for max_sectors_kb inside multipath.conf.&lt;/p&gt;</comment>
                            <comment id="385268" author="eaujames" created="Fri, 8 Sep 2023 12:20:34 +0000"  >&lt;p&gt;The multipath patch landed in multipath-tools 0.9.6: &lt;a href=&quot;https://github.com/opensvc/multipath-tools/pull/68/commits/bbb77f318ee483292f50a7782aecaecc7e60f727&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/opensvc/multipath-tools/pull/68/commits/bbb77f318ee483292f50a7782aecaecc7e60f727&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Should we remove the 99-lustre-server.rules? &lt;/p&gt;</comment>
                            <comment id="385426" author="adilger" created="Sun, 10 Sep 2023 18:59:15 +0000"  >&lt;p&gt;Etienne, thanks for submitting the patch upstream.  I don&apos;t think we can remove this until at least the main distros ship a version of multipath-tools that includes your fix.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="55632">LU-12297</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="46284">LU-9551</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="55853">LU-12387</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00cfr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>