<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:36:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3682] &quot;tunefs.lustre --erase_params&quot; corrupts running MGS when run against device node symlink</title>
                <link>https://jira.whamcloud.com/browse/LU-3682</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A script was erroneously running tunefs.lustre --erase_params against a running MGS.  I would have expected this to refuse to run (similar to how mkfs.lustre will refuse to run on a device that is a running target).  Instead, it does run.  The first and second times it is run, it appears to succeed.  After the third run the MGS appears to be corrupted.&lt;/p&gt;

&lt;p&gt;After some experimentation I think this only happens when passing tunefs.lustre a device node symlink.  On this system this was happening like this:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# ls -l /dev/disk/by-id/scsi-1dev.target0
lrwxrwxrwx 1 root root 9 Jul 30 10:10 /dev/disk/by-id/scsi-1dev.target0 -&amp;gt; ../../sdb
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running against the /dev/sdb path seems to be safe (gives a 17 status code):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# tunefs.lustre --erase-params /dev/sdb ; echo $?
checking &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

17
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;...while running against the symlink seems to be unsafe (note that it returns 0 twice, before returning an error code and junk output):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[root@storage-0 log]# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[root@storage-0 log]# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:
Index:      10
Lustre FS:
Mount type: h
Flags:      0
              ()
Persistent mount opts:
Parameters:&#65533;


tunefs.lustre FATAL: must set target type: MDT,OST,MGS
tunefs.lustre: exiting with 22 (Invalid argument)
22
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At the least, this tool should resolve symlinks so that it can detect and refuse a running target.  Ideally, it would also use multi-mount protection to be safe even when run from a different server while the target is mounted somewhere else.&lt;/p&gt;</description>
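<![CDATA[
Until the tool resolves symlinks itself, a wrapper that canonicalizes the path and refuses a mounted device defuses the footgun. This is an illustrative sketch, not part of Lustre; the wrapper name and the guard are assumptions:

```shell
# Illustrative guard (not shipped with Lustre): resolve the symlink and
# refuse to run tunefs.lustre against a device listed in /proc/mounts.
safe_tunefs() {
    dev=$(readlink -f "$1") || return 1
    if grep -q "^$dev " /proc/mounts; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    shift
    tunefs.lustre "$@" "$dev"
}
```

For example, `safe_tunefs /dev/disk/by-id/scsi-1dev.target0 --erase-params` would have refused here, because the symlink resolves to /dev/sdb, which appears in /proc/mounts while the MGS is running.
]]>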
                <environment></environment>
        <key id="20143">LU-3682</key>
            <summary>&quot;tunefs.lustre --erase_params&quot; corrupts running MGS when run against device node symlink</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="4" iconUrl="https://jira.whamcloud.com/images/icons/statuses/reopened.png" description="This issue was once resolved, but the resolution was deemed incorrect. From here issues are either marked assigned or resolved.">Reopened</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="emoly.liu">Emoly Liu</assignee>
                                    <reporter username="john">John Spray</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Thu, 1 Aug 2013 12:04:46 +0000</created>
                <updated>Mon, 31 Jan 2022 04:45:01 +0000</updated>
                                            <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="64162" author="emoly.liu" created="Tue, 13 Aug 2013 09:58:16 +0000"  >&lt;p&gt;John, could you please provide your mkfs options for the MGS device?&lt;/p&gt;

&lt;p&gt;I tried many times on my local machine, but still can&apos;t reproduce this failure. It always returned 0 whether I ran &quot;tunefs.lustre --erase_params&quot; against the symlink or not.&lt;/p&gt;

&lt;p&gt;BTW, according to the tunefs.lustre manual, &quot;changes made here will affect a filesystem only when the target is next mounted&quot;, so it should be OK to run it against a running target.&lt;/p&gt;
</comment>
                            <comment id="64163" author="john" created="Tue, 13 Aug 2013 10:47:12 +0000"  >&lt;p&gt;The MGS was created with no fancy options (by IML), pretty much just&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;mkfs.lustre --mgs /dev/disk/by-id/&amp;lt;foo&amp;gt; --failnode XYZ
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I did find that simply creating and then tunefs-ing the MGS didn&apos;t reproduce the corruption: it was happening when I had some MDTs and OSTs as well, and I had recently been doing writeconfs on those.  I don&apos;t have a more specific set of steps than that I&apos;m afraid.&lt;/p&gt;</comment>
                            <comment id="64164" author="john" created="Tue, 13 Aug 2013 10:48:03 +0000"  >&lt;p&gt;I was chatting to Johann before opening this ticket about whether the target should be shut down first:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[8/1/13 12:00:03 PM] John Spray: is tunefs.lustre meant to be safe to run against a mounted target?
[8/1/13 12:00:30 PM] John Spray: (when doing things like --writeconf and --erase_params?)
[8/1/13 12:03:40 PM] John Spray: the writeconf procedure does ask one to stop the MDT + OST before doing it, but elsewhere in the manual (e.g. in &lt;span class=&quot;code-quote&quot;&gt;&quot;Changing the Address of a Failover Node&quot;&lt;/span&gt;) it doesn&apos;t say one way or the other.
[8/1/13 12:41:04 PM] Johann Lombardi: john: writeconf definitely requires to shut down the target. FYI, there is also a mount option to trigger a writeconf.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="64257" author="emoly.liu" created="Wed, 14 Aug 2013 16:32:19 +0000"  >&lt;p&gt;Can you tell me your tunefs.lustre version? Your test showed flags=0x74 (MGS needs_index first_time update), but the index option is invalid for an MGS.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;   Read previous values:&lt;br/&gt;
Target:     MGS&lt;br/&gt;
Index:      unassigned&lt;br/&gt;
Lustre FS:&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x74&lt;br/&gt;
              (MGS needs_index first_time update )&lt;br/&gt;
Persistent mount opts: user_xattr,errors=remount-ro&lt;br/&gt;
Parameters:&lt;/p&gt;


&lt;p&gt;   Permanent disk data:&lt;br/&gt;
Target:     MGS&lt;br/&gt;
Index:      unassigned&lt;br/&gt;
Lustre FS:&lt;br/&gt;
Mount type: ldiskfs&lt;br/&gt;
Flags:      0x74&lt;br/&gt;
              (MGS needs_index first_time update )&lt;br/&gt;
Persistent mount opts: user_xattr,errors=remount-ro&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;In my test, mkfs/tunefs.lustre is v2.4.90 and all MGS/MDT/OST are running. I tried different options and it seems that &quot;--erase-params&quot; fails only when the --failnode option is applied. It looks like this:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@centos6-3 tests]# tunefs.lustre --erase-params /tmp/lustre-mgs; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:  
Mount type: ldiskfs
Flags:      0x4
              (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=10.211.55.5@tcp


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:  
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

tunefs.lustre: Unable to mount /dev/loop3: Invalid argument

tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)
22
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But in your test (the output I quoted), failnode was not set when the failure happened, so it would be very helpful if you could reproduce it again and tell me how.&lt;br/&gt;
BTW, could you update the Lustre code, run a test with lustre/utils/(mkfs,tunefs).lustre, and let me know the result? Thanks.&lt;/p&gt;

&lt;p&gt;I also found a small bug in mkfs_lustre.c, but it should not cause this failure.&lt;/p&gt;</comment>
                            <comment id="64465" author="john" created="Mon, 19 Aug 2013 13:59:11 +0000"  >&lt;p&gt;Reproduced with:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;2.4.0-2.6.32_358.6.2.el6_lustre.g230b174.x86_64_gd3f91c4.x86_64 (i.e. the 2.4.0 release) with e2fsprogs e2fsprogs-1.42.7.wc1-7.&lt;/li&gt;
	&lt;li&gt;lustre-2.4.91-2.6.32_358.14.1.el6_lustre.x86_64.x86_64 (i.e. latest master) with e2fsprogs master.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;With latest master, running tunefs.lustre on an MGS is broken (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3768&quot; title=&quot;&amp;quot;tunefs.lustre: &amp;#39;----index&amp;#39; only valid for MDT,OST&amp;quot; on a stand-alone MGS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3768&quot;&gt;&lt;del&gt;LU-3768&lt;/del&gt;&lt;/a&gt;; I think you already found this), so I reproduced on another target instead, with the same result.&lt;/p&gt;

&lt;p&gt;My environment is two CentOS 6.4 virtual machines (called storage-0 and storage-1) using iSCSI storage devices.&lt;/p&gt;

&lt;p&gt;storage-0 is 172.16.252.175@tcp0&lt;br/&gt;
storage-1 is 172.16.252.176@tcp0&lt;/p&gt;

&lt;p&gt;Here&apos;s how the filesystem is created:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[storage_1] sudo: mkfs.lustre --mdt --index=0 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.175@tcp /dev/disk/by-id/scsi-1dev.target1
[storage_0] sudo: mkfs.lustre --mgs --failnode=172.16.252.176@tcp /dev/disk/by-id/scsi-1dev.target0
[storage_1] sudo: mkfs.lustre --ost --index=1 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.175@tcp /dev/disk/by-id/scsi-1dev.target3
[storage_0] sudo: mkfs.lustre --ost --index=0 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.176@tcp /dev/disk/by-id/scsi-1dev.target2
[storage_0] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target0 /mnt/lustre/test0-MGS
[storage_1] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target1 /mnt/lustre/test0-MDT0000
[storage_0] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target2 /mnt/lustre/test0-OST0000
[storage_1] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target3 /mnt/lustre/test0-OST0001
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, while the filesystem is running, here&apos;s me reproducing the bug:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[vagrant@storage-0 ~]$ ls -l /dev/disk/by-id/scsi-1dev.target0
lrwxrwxrwx 1 root root 9 Aug 19 13:06 /dev/disk/by-id/scsi-1dev.target0 -&amp;gt; ../../sdb
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x4
              (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=172.16.252.176@tcp


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      0
Lustre FS:
Mount type: ext3
Flags:      0
              ()
Persistent mount opts:
Parameters:


tunefs.lustre FATAL: must set target type: MDT,OST,MGS
tunefs.lustre: exiting with 22 (Invalid argument)
22
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="64807" author="emoly.liu" created="Thu, 22 Aug 2013 02:02:40 +0000"  >&lt;p&gt;John, I can reproduce this problem following your steps. I will investigate it.&lt;/p&gt;</comment>
                            <comment id="64816" author="emoly.liu" created="Thu, 22 Aug 2013 08:12:42 +0000"  >&lt;p&gt;The root cause of this problem is in check_mtab_entry(). In this check, the mounted SCSI device path correctly returns EEXIST, but the symlink to it wrongly passes the check.&lt;/p&gt;

&lt;p&gt;I will fix it.&lt;/p&gt;</comment>
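<![CDATA[
The shell equivalent of the corrected check is simple: canonicalize the user-supplied path before comparing it with the device names in /proc/mounts, since the kernel records the resolved name (/dev/sdb), not the by-id symlink. A minimal sketch with a hypothetical helper (the real fix lives in C, in check_mtab_entry()):

```shell
# Sketch of the corrected logic: /proc/mounts holds canonical device
# names, so resolve symlinks first, then match on the first field.
is_mounted() {
    dev=$(readlink -f "$1") || return 1
    grep -q "^$dev " /proc/mounts
}
```

With a plain string comparison, /dev/disk/by-id/scsi-1dev.target0 never matches the /dev/sdb entry the kernel records, which is why the symlink slipped past the existing check.
]]>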
                            <comment id="64923" author="emoly.liu" created="Fri, 23 Aug 2013 02:41:03 +0000"  >&lt;p&gt;The patch for master is at &lt;a href=&quot;http://review.whamcloud.com/7433&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7433&lt;/a&gt;.&lt;br/&gt;
b2_4 also needs this one.&lt;/p&gt;</comment>
                            <comment id="65115" author="emoly.liu" created="Tue, 27 Aug 2013 01:40:33 +0000"  >&lt;p&gt;A similar issue happens with a running loop device. I am working on a patch to fix loop device path resolution and prevent tunefs.lustre from running against a mounted device.&lt;/p&gt;</comment>
                            <comment id="67175" author="pjones" created="Fri, 20 Sep 2013 21:35:36 +0000"  >&lt;p&gt;Landed for 2.5.0&lt;/p&gt;</comment>
                            <comment id="67388" author="pjones" created="Tue, 24 Sep 2013 16:57:38 +0000"  >&lt;p&gt;Patch reverted due to regression &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3991&quot; title=&quot;mkfs.lustre failed to copy pool name&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3991&quot;&gt;&lt;del&gt;LU-3991&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="67508" author="emoly.liu" created="Wed, 25 Sep 2013 07:21:39 +0000"  >&lt;p&gt;I resubmitted a patch at &lt;a href=&quot;http://review.whamcloud.com/#/c/7754&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/7754&lt;/a&gt;. I hope this time it will fix the problem thoroughly and won&apos;t cause the ZFS failure.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="21073">LU-3991</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="20368">LU-3768</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvwzb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9506</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>