<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:41:25 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4293] lfs_migrate is failing with a volatile file Operation not permitted error</title>
                <link>https://jira.whamcloud.com/browse/LU-4293</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;&quot;lfs_migrate -y&quot; aborts with this error&lt;/p&gt;

&lt;p&gt;cannot swap layouts between &amp;lt;filename&amp;gt; and a volatile file (Operation not permitted)&lt;/p&gt;

&lt;p&gt;This seems to happen for all files. The lfs_migrate operation aborts on the first file.&lt;/p&gt;</description>
                <environment>Lustre 2.4.1 RHEL6 2.6.32-358.18.1.el6_lustre.x86_64</environment>
        <key id="22208">LU-4293</key>
            <summary>lfs_migrate is failing with a volatile file Operation not permitted error</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="wbaudler">Wolfgang Baudler</reporter>
                        <labels>
                    </labels>
                <created>Fri, 22 Nov 2013 13:42:13 +0000</created>
                <updated>Fri, 21 Mar 2014 03:40:43 +0000</updated>
                            <resolved>Fri, 21 Feb 2014 18:52:14 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.4.1</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="72543" author="adilger" created="Fri, 29 Nov 2013 22:40:28 +0000"  >&lt;p&gt;I investigated this on my home system, since I was seeing this problem intermittently as well.&lt;/p&gt;

&lt;p&gt;In my case, the problem occurs because the affected files were created with Lustre 1.8 or earlier and have IGIF FIDs (these show up with inode numbers &amp;lt; 4B).  Migrating files created under 2.x works (these have very large inode numbers):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ ls -i /mnt/tmp/tmp/1M
144115440203858997 1024 /mnt/tmp/tmp/1M
$ lfs path2fid /mnt/tmp/tmp/1M
[0x200003ab4:0x435:0x0]
$ lfs migrate -i 1 /mnt/tmp/tmp/1M

$ ls -i /mnt/tmp/tmp/4M
2422700 4096 /mnt/tmp/tmp/4M
$ lfs path2fid /mnt/tmp/tmp/4M
[0x24f7ac:0x2d77b0e5:0x0]
$ lfs migrate -i 1 /mnt/tmp/tmp/4M
cannot swap layouts between /mnt/tmp/tmp/4M and a volatile file (Operation not permitted)
error: migrate: migrate stripe file &apos;/mnt/tmp/tmp/4M&apos; failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is because there is a check in &lt;tt&gt;mdd_layout_swap_allowed()&lt;/tt&gt; that prevents layout swap for IGIF FIDs:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; mdd_layout_swap_allowed(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env,
                                   struct mdd_object *o1,
                                   struct mdd_object *o2)
{
        &lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_fid     *fid1, *fid2;

        fid1 = mdo2fid(o1);
        fid2 = mdo2fid(o2);

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!fid_is_norm(fid1) || !fid_is_norm(fid2) ||
            (mdd_object_type(o1) != mdd_object_type(o2)))
                RETURN(-EPERM);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This was done to prevent clients from being able to swap the contents of regular files with internal system files by using their (internal) IGIF FIDs.&lt;/p&gt;
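
&lt;p&gt;To make the distinction concrete, here is a minimal Python sketch that classifies a FID string by its sequence number. The sequence ranges are assumptions recalled from lustre_idl.h rather than values stated in this ticket, and the helper names parse_fid() and classify() are hypothetical.&lt;/p&gt;

```python
# Sequence ranges as recalled from Lustre's lustre_idl.h; treat them as
# assumptions and check them against your own source tree.
FID_SEQ_IGIF = 12
FID_SEQ_IGIF_MAX = 0xFFFFFFFF
FID_SEQ_NORMAL = 0x200000400


def parse_fid(text):
    """Parse '[seq:oid:ver]' as printed by 'lfs path2fid'."""
    seq, oid, ver = (int(part, 16) for part in text.strip("[]").split(":"))
    return seq, oid, ver


def classify(text):
    """Return 'igif', 'normal', or 'other' based on the FID sequence."""
    seq, _oid, _ver = parse_fid(text)
    if seq >= FID_SEQ_IGIF and FID_SEQ_IGIF_MAX >= seq:
        return "igif"
    if seq >= FID_SEQ_NORMAL:
        return "normal"
    return "other"


if __name__ == "__main__":
    for fid in ("[0x200003ab4:0x435:0x0]", "[0x24f7ac:0x2d77b0e5:0x0]"):
        print(fid, classify(fid))
```

&lt;p&gt;With the two FIDs shown above, the 1M file classifies as normal and the 4M file as IGIF. The IGIF sequence 0x24f7ac is the inode number 2422700 reported by ls -i.&lt;/p&gt;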

&lt;p&gt;I&apos;m not yet sure how to detect the difference between filenames that are visible in the namespace and ones that are accessed by IGIF FID.  It does seem that &quot;lfs fid2path&quot; and $MOUNT/.lustre/fid/ &lt;em&gt;can&lt;/em&gt; detect the difference between IGIF and FID access with my 2.4.1 server.  I took a normal FID and then figured out its IGIF FID by looking at the MDT inode&apos;s inum/generation directly, and got a &quot;no such file or directory&quot;, so that is a good start.&lt;/p&gt;

&lt;p&gt;However, it also appears that some files in the MDT root directory (e.g. backups of fld, seq_srv, seq_cli, etc) are readable via $MOUNT/.lustre/fid/ and have an IGIF FID assigned to them.  The shell also thinks that these FIDs have write permission (i.e. test -w &quot;$MOUNT/.lustre/fid/&lt;span class=&quot;error&quot;&gt;&amp;#91;0x2686:0xc40fa169:0x0&amp;#93;&lt;/span&gt;&quot; returns 0), even though I get a permission denied error trying to modify them, so normal write permission checks will fail.  That might be a problem with LFSCK adding these files into the OI when they shouldn&apos;t be.  The originals of these files correctly have SEQ 0x200000001 and get an error from obf_lookup(), but I think it makes sense to mark all files in the top-level MDT/OST root directory inaccessible, and only add files under ROOT to the OI.&lt;/p&gt;

&lt;p&gt;In the short term, &quot;lfs_migrate&quot; &lt;em&gt;should&lt;/em&gt; fall back to using rsync internally if &quot;lfs migrate&quot; returns an error, but I haven&apos;t tested this.  It would also be useful to fix the error message printed by &quot;lfs migrate&quot;, since I find the current one confusing.  I don&apos;t think it needs to mention anything about volatile files.&lt;/p&gt;</comment>
                            <comment id="72885" author="adilger" created="Thu, 5 Dec 2013 09:56:30 +0000"  >&lt;p&gt;I also see that the failure in &quot;&lt;tt&gt;lfs migrate&lt;/tt&gt;&quot; is causing volatile inodes to be leaked on the MDT.  I see:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;e2fsck 1.42.7.wc1 (12-Apr-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 116
Connect to /lost+found? no

Unattached inode 117
Connect to /lost+found? no

Unattached inode 118
Connect to /lost+found? no

Unattached inode 409
Connect to /lost+found? no
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The unattached inodes look like:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;debugfs 1.42.7.wc2 (12-Apr-2013)
debugfs:  stat &amp;lt;116&amp;gt;
Inode: 116   Type: regular    Mode:  0644   Flags: 0x0
Generation: 2124528570    Version: 0x00000001:00001b6d
User:     0   Group:     0   Size: 0
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5296625c:00000000 -- Wed Nov 27 14:21:32 2013
 atime: 0x52966270:00000000 -- Wed Nov 27 14:21:52 2013
 mtime: 0x52966270:00000000 -- Wed Nov 27 14:21:52 2013
crtime: 0x5296625c:087ac7fc -- Wed Nov 27 14:21:32 2013
Size of extra inode fields: 28
Extended attributes stored in inode body:
  lma = &quot;00 00 00 00 00 00 00 00 00 04 00 00 02 00 00 00 45 07 00 00 00 00 00 00 &quot; (24)
  lma: fid=[0x200000400:0x745:0x0] compat=0 incompat=0
  link = &quot;df f1 ea 11 01 00 00 00 3a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 22 00 00 00 02 00 00 00 07 00 00 00 01 00 00 00 00 2e 0c 13 14 12 3a 56 4f 4c 41 54 49 4c 45 3a 3a &quot; (58)
  lov = &quot;d0 0b d1 0b 01 00 00 00 45 07 00 00 00 00 00 00 00 04 00 00 02 00 00 00 00 00 10 00 01 00 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00 00 &quot; (56)
BLOCKS:
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where the &quot;&lt;tt&gt;link&lt;/tt&gt;&quot; xattr contains the &quot;&lt;tt&gt;.^L^S^T^R:VOLATILE&lt;/tt&gt;&quot; string.&lt;/p&gt;</comment>
                            <comment id="73018" author="adilger" created="Fri, 6 Dec 2013 21:47:58 +0000"  >&lt;p&gt;Bruno, does the layout swap failure here have any relation to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3834&quot; title=&quot;hsm_cdt_request_completed() may clear HS_RELEASED on failed restore&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3834&quot;&gt;&lt;del&gt;LU-3834&lt;/del&gt;&lt;/a&gt;?  I saw many leaked objects after failures calling &quot;lfs migrate&quot; on an IGIF file.&lt;/p&gt;</comment>
                            <comment id="73059" author="bfaccini" created="Mon, 9 Dec 2013 09:56:08 +0000"  >&lt;p&gt;Andreas, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3834&quot; title=&quot;hsm_cdt_request_completed() may clear HS_RELEASED on failed restore&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3834&quot;&gt;&lt;del&gt;LU-3834&lt;/del&gt;&lt;/a&gt; is about how to handle and recover from layout swap failures during HSM restore operations and avoid leaving the file in an inconsistent state, whereas this ticket is about a specific condition and its consequence. I will try to push a patch soon based on your already very detailed analysis, thanks!&lt;/p&gt;</comment>
                            <comment id="73609" author="bfaccini" created="Mon, 16 Dec 2013 18:56:06 +0000"  >&lt;p&gt;On second thought, it may be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3834&quot; title=&quot;hsm_cdt_request_completed() may clear HS_RELEASED on failed restore&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3834&quot;&gt;&lt;del&gt;LU-3834&lt;/del&gt;&lt;/a&gt; after all, given my recent discovery there that the layout swap-back upon failure may not work as expected!&lt;/p&gt;</comment>
                            <comment id="73736" author="bfaccini" created="Wed, 18 Dec 2013 07:55:53 +0000"  >&lt;p&gt;Andreas,&lt;br/&gt;
After the failure, is the original file still accessible ?&lt;/p&gt;

&lt;p&gt;Also, since the main problem in this ticket appears to be that &quot;lfs migrate&quot; on 1.8 files with IGIF FIDs fails and leaks volatile objects on the MDT, why not first change the test in mdd_layout_swap_allowed() to also allow fid_is_igif() files? We could then address the special cases of system/ROOT files and the failure/recovery handling.&lt;/p&gt;

&lt;p&gt;Or do you still think all cases must be handled in &quot;lfs migrate&quot; by switching to rsync if not a system/ROOT file, and that the leak should be addressed separately?&lt;/p&gt;</comment>
                            <comment id="73737" author="adilger" created="Wed, 18 Dec 2013 09:22:01 +0000"  >&lt;p&gt;Uploaded a patch to correctly handle &quot;lfs migrate&quot; failure in lfs_migrate and do fallback to rsync copy+rename:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/8616&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8616&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not a complete solution to the problem here, but is useful in any case.&lt;/p&gt;
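
&lt;p&gt;For readers who want the shape of that fallback, here is a minimal sketch in Python rather than the shell used by lfs_migrate. It is not the patch itself: the function name, the temporary file suffix, and the rsync options are hypothetical, and it leaves out the restriping and data verification a real tool would need.&lt;/p&gt;

```python
import os
import subprocess


def migrate_with_fallback(path, ost_index, run=subprocess.call):
    """Try 'lfs migrate'; on failure, copy to a temp file and rename it.

    'run' is injectable so the flow can be exercised without Lustre.
    Returns 'migrate' or 'rsync' to say which path succeeded.
    """
    if run(["lfs", "migrate", "-i", str(ost_index), path]) == 0:
        return "migrate"

    tmp = path + ".tmp_migrate"  # hypothetical temp-file name
    # Hypothetical rsync invocation; the options used by the patch may differ.
    if run(["rsync", "-a", path, tmp]) != 0:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise RuntimeError("migrate and rsync fallback both failed: " + path)
    os.replace(tmp, path)  # rename the copy over the original
    return "rsync"
```

&lt;p&gt;Calling migrate_with_fallback(&quot;/mnt/tmp/tmp/4M&quot;, 1) would first attempt the layout swap and only copy and rename if that command exits non-zero.&lt;/p&gt;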

&lt;p&gt;Bruno, yes, after the migration failure the original file is still accessible.  Ideally, the MDD would verify whether the object is in the namespace, which would be a good way to know whether it can be migrated.  I think allowing migration of IGIF FIDs is a potential problem, but no worse than other issues a client might have, so that would be a solution for 2.6 if no others are available.&lt;/p&gt;</comment>
                            <comment id="74308" author="bfaccini" created="Fri, 3 Jan 2014 16:46:23 +0000"  >&lt;p&gt;Sorry to be late, but I am back on this one.&lt;/p&gt;

&lt;p&gt;Andreas, sorry to ask, but can you explain to me how the files created in the MDT root directory get an IGIF assigned?&lt;/p&gt;

&lt;p&gt;I can also confirm that, as part of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3834&quot; title=&quot;hsm_cdt_request_completed() may clear HS_RELEASED on failed restore&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3834&quot;&gt;&lt;del&gt;LU-3834&lt;/del&gt;&lt;/a&gt;, while using fault injection during layout swap to verify the patch behavior, I reproduced the volatile object leak on the MDT (the inode link count is 1 and e2fsck reports &quot;Unattached inode&quot;). In my case, for a single forced layout swap error, I see one orphan inode with the &quot;.^L^S^T^R:VOLATILE&quot;/LUSTRE_VOLATILE_HDR linkEA but also one with &quot;i_am_nobody&quot;. Did you see this as well?&lt;/p&gt;

&lt;p&gt;In any case, this clearly indicates that there is something to address and fix upon layout swap error.&lt;/p&gt;
</comment>
                            <comment id="74322" author="adilger" created="Fri, 3 Jan 2014 22:57:52 +0000"  >&lt;p&gt;I created the IGIF files under 1.8 and upgraded to 2.x. You could also get the same effect by mounting a 2.4 MDT as ldiskfs, deleting the &quot;lma&quot; xattr, then remounting and running LFSCK to fix the OI.&lt;/p&gt;</comment>
                            <comment id="74344" author="bfaccini" created="Sat, 4 Jan 2014 16:18:50 +0000"  >&lt;p&gt;Hmm, thanks. I realize I should have read the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4392&quot; title=&quot;LFSCK shouldn&amp;#39;t assign FIDs, linkEA to files in &amp;quot;/&amp;quot; directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4392&quot;&gt;&lt;del&gt;LU-4392&lt;/del&gt;&lt;/a&gt; sub-task and learned more about LFSCK behavior before asking, sorry!&lt;/p&gt;

&lt;p&gt;So now, do we really need to detect files that LFSCK wrongly assigned an IGIF to (to be fixed in the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4392&quot; title=&quot;LFSCK shouldn&amp;#39;t assign FIDs, linkEA to files in &amp;quot;/&amp;quot; directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4392&quot;&gt;&lt;del&gt;LU-4392&lt;/del&gt;&lt;/a&gt; sub-task)? If not, the fix for this ticket&apos;s original issue could simply be to add a fid_is_igif() test in mdd_layout_swap_allowed() for both files having their layouts swapped.&lt;br/&gt;
I may be missing some special cases for files with IGIF FIDs here, since you wrote about &quot;internal system files by using their (internal) IGIF FIDs&quot;?&lt;/p&gt;
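
&lt;p&gt;As a toy model of that proposal (not the kernel code and not the patch under review), the relaxed check could look like the Python sketch below. The sequence ranges are assumptions recalled from lustre_idl.h, and swap_allowed() with its allow_igif switch is a hypothetical stand-in for mdd_layout_swap_allowed().&lt;/p&gt;

```python
import errno

# Sequence ranges as recalled from lustre_idl.h (assumptions, not from this ticket).
FID_SEQ_IGIF, FID_SEQ_IGIF_MAX = 12, 0xFFFFFFFF
FID_SEQ_NORMAL = 0x200000400


def fid_is_igif(seq):
    return seq >= FID_SEQ_IGIF and FID_SEQ_IGIF_MAX >= seq


def fid_is_norm(seq):
    return seq >= FID_SEQ_NORMAL


def swap_allowed(seq1, seq2, type1, type2, allow_igif=False):
    """Return 0 if the swap passes the FID/type check, else -EPERM."""
    def ok(seq):
        return fid_is_norm(seq) or (allow_igif and fid_is_igif(seq))

    if not ok(seq1) or not ok(seq2) or type1 != type2:
        return -errno.EPERM
    return 0
```

&lt;p&gt;With allow_igif left off, a file with sequence 0x24f7ac is rejected as in the report; with it on, the same file passes, while a local-file sequence such as 0x200000001 is still refused.&lt;/p&gt;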

&lt;p&gt;I will also work on the MDT volatile object leak upon layout swap failure, maybe as part of a new ticket.&lt;/p&gt;</comment>
                            <comment id="74381" author="bfaccini" created="Mon, 6 Jan 2014 10:00:07 +0000"  >&lt;p&gt;Patch to allow layout swap for IGIF file is at &lt;a href=&quot;http://review.whamcloud.com/8737&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8737&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="74881" author="adilger" created="Tue, 14 Jan 2014 01:16:28 +0000"  >&lt;p&gt;I wonder if the -EPERM error seen on non-IGIF files is due to some file ownership problem like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3826&quot; title=&quot;copytool restore should set owner of volatile file before copy&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3826&quot;&gt;&lt;del&gt;LU-3826&lt;/del&gt;&lt;/a&gt; or similar?&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; ((attr1-&amp;gt;la_uid != attr2-&amp;gt;la_uid) ||
            (attr1-&amp;gt;la_gid != attr2-&amp;gt;la_gid))
                RETURN(-EPERM);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will fail if lfs_migrate is not creating the file with the same ownership as the original file?  If lfs_migrate is running as root, then it should be able to fchown() the file after it is created.  The actual permissions don&apos;t matter, because the temporary file will be deleted, but this is proof that the caller of the migrate has permission to do this swap.&lt;/p&gt;</comment>
                            <comment id="76448" author="yujian" created="Fri, 7 Feb 2014 08:48:36 +0000"  >&lt;p&gt;Patch &lt;a href=&quot;http://review.whamcloud.com/8616&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8616&lt;/a&gt; was cherry-picked to Lustre b2_5 branch.&lt;/p&gt;</comment>
                            <comment id="76617" author="bfaccini" created="Mon, 10 Feb 2014 17:24:07 +0000"  >&lt;p&gt;My patch at &lt;a href=&quot;http://review.whamcloud.com/8737&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8737&lt;/a&gt; now seems to address the original need for this ticket (allow legitimate IGIFs, handle the root special case, ...); I only have to answer the reviewers&apos; comments.&lt;/p&gt;</comment>
                            <comment id="76700" author="aalba6675" created="Tue, 11 Feb 2014 07:52:05 +0000"  >&lt;p&gt;I have also observed this on a filesystem created with 2.1.x and migrated to 2.4.2.&lt;br/&gt;
Exactly the same error message but this happens only with some directories.&lt;br/&gt;
Some directories lfs_migrate&apos;d perfectly.&lt;/p&gt;</comment>
                            <comment id="77085" author="bogl" created="Fri, 14 Feb 2014 15:05:06 +0000"  >&lt;p&gt;backport to b2_5:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/9278&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/9278&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="77635" author="pjones" created="Fri, 21 Feb 2014 18:52:14 +0000"  >&lt;p&gt;Landed for 2.5.1 and 2.6&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="16875">LU-2445</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="20566">LU-3826</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="20184">LU-3696</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>lfs_migrate</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw9x3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11777</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10023"><![CDATA[4]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>