<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:37:13 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10677] can&apos;t delete directory</title>
                <link>https://jira.whamcloud.com/browse/LU-10677</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hiya,&lt;/p&gt;

&lt;p&gt;we have 2 MDS&apos;s with 1 MDT on one of them and 2 MDTs on the other. so 3 MDT&apos;s in total. each MDT consists of 2 hardware raid1 mirrors with zmirror putting those together into one zfs MDT in one zpool. so 4-way replication.&lt;/p&gt;

&lt;p&gt;latest centos7.4 kernels  3.10.0-693.17.1.el7.x86_64 everywhere. nopti set on lustre servers. 8 OSS&apos;s if that matters. multipath on all lustre servers. purely software raidz3 on OSS&apos;s.&lt;/p&gt;

&lt;p&gt;we are testing DNE2 with 3-way dir striping, and also with default inheritance to all sub-dirs.&lt;br/&gt;
the below test fails and seems repeatable.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# lfs setdirstripe -c 3 mdt0-2
# lfs setdirstripe -D -c 3 mdt0-2
# chown rhumble mdt0-2
[rhumble@farnarkle2 ~]$ for f in /dagg/old_stuff/rjh/mdtest/mdt*; do echo === $f === ; time ( cd $f ; for g in {0000..9999}; do mkdir $g; for h in {00..99}; do mkdir $g/$h; done; done ) ; time rm -rf $f/*; done
...
=== /dagg/old_stuff/rjh/mdtest/mdt0-2 ===

real    57m21.053s
user    8m36.378s
sys     18m25.963s
rm: cannot remove &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/2556&#8217;: Directory not empty

real    72m52.257s
user    0m4.197s
sys     7m59.024s
...

[rhumble@farnarkle2 ~]$ ls -al /dagg/old_stuff/rjh/mdtest/mdt0-2/2556
total 894
drwxrwxr-x 3 rhumble hpcadmin  76800 Feb 16 03:33 .
drwxr-xr-x 3 rhumble hpcadmin 838656 Feb 16 15:46 ..
[rhumble@farnarkle2 ~]$ rmdir /dagg/old_stuff/rjh/mdtest/mdt0-2/2556
rmdir: failed to remove &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/2556&#8217;: Directory not empty
[rhumble@farnarkle2 ~]$ rm -rf /dagg/old_stuff/rjh/mdtest/mdt0-2/2556
rm: cannot remove &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/2556&#8217;: Directory not empty
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;there aren&apos;t any problems seen with the other 4 dirs tested.&lt;br/&gt;
the other 4 dirs are mdt0, mdt1, and mdt2, which have dir striping set to only that mdt and no default (-D) set, and a directory with 3-way dir striping and no default (-D) set. ie.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none

[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt1
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none

[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt2
lmv_stripe_count: 0 lmv_stripe_offset: 2 lmv_hash_type: none

[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000b7bd:0x4a1b:0x0]
     1           [0x28001639c:0x4a58:0x0]
     2           [0x680016b6b:0x4a58:0x0]

[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2-no-inherit
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000bfa7:0xa63a:0x0]
     1           [0x2800182f7:0xa69f:0x0]
     2           [0x680018abd:0xa697:0x0]

[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt0
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none

[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt1
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none

[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt2
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none

[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt0-2
lmv_stripe_count: 3 lmv_stripe_offset: -1 lmv_hash_type: fnv_1a_64

[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt0-2-no-inherit/
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the un-removable directories have only appeared on the 3-way -D directory, so I suspect the bug is to do with DNE2 and the -D inheritance stuff in particular.&lt;/p&gt;

&lt;p&gt;I also re-ran the test with all 3 MDT&apos;s on one MDS, and the same thing happened - one directory was un-removable by any means.&lt;/p&gt;

&lt;p&gt;there&apos;s nothing in dmesg or syslog.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
                <environment>zfs 0.7.5, OPA, skylake, centos7</environment>
        <key id="50849">LU-10677</key>
            <summary>can&apos;t delete directory</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="3" iconUrl="https://jira.whamcloud.com/images/icons/statuses/inprogress.png" description="This issue is being actively worked on at the moment by the assignee.">In Progress</status>
                    <statusCategory id="4" key="indeterminate" colorName="inprogress"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Fri, 16 Feb 2018 09:56:37 +0000</created>
                <updated>Thu, 6 Dec 2018 08:37:15 +0000</updated>
                                            <version>Lustre 2.10.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="221161" author="rjh" created="Fri, 16 Feb 2018 14:11:39 +0000"  >&lt;p&gt;oh, and our lustre (client and server) were patched with &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-10212&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/browse/LU-10212&lt;/a&gt; &lt;a href=&quot;https://review.whamcloud.com/#/c/29992/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/29992/&lt;/a&gt;  &apos;cos we were hitting ESTALE on clients.&lt;/p&gt;</comment>
                            <comment id="221531" author="adilger" created="Fri, 23 Feb 2018 02:07:04 +0000"  >&lt;p&gt;Robin, after the &quot;rm -r&quot; fails, if you do &quot;ls -lR&quot; of the directory, does it show any entries left behind, or is the directory empty?  It is possible there is some problem iterating the directory entries during the rm that leaves something behind. &lt;/p&gt;

&lt;p&gt;Lai, are you able to reproduce this?&lt;/p&gt;</comment>
                            <comment id="221538" author="scadmin" created="Fri, 23 Feb 2018 03:08:41 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;the directory 2556/ is empty, but can&apos;t be rmdir&apos;d.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="221554" author="laisiyao" created="Fri, 23 Feb 2018 11:22:05 +0000"  >&lt;p&gt;I&apos;m trying to reproduce this, but haven&apos;t seen it at smaller scale on master or 2.10 yet. In the meantime, could you do the following to dump debug logs?&lt;br/&gt;
1. &apos;lctl set_param debug=-1&apos; on 2 MDS&apos;s.&lt;br/&gt;
2. &apos;lctl clear&apos; on 2 MDS&apos;s to clear previous debug logs.&lt;br/&gt;
3. &apos;rm /dagg/old_stuff/rjh/mdtest/mdt0-2/2556&apos;&lt;br/&gt;
4. &apos;lctl dk /tmp/10677-`hostname`.log&apos; to collect debug logs on 2 MDS, and attach them here.&lt;/p&gt;

&lt;p&gt;Also could you run &apos;lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2/2556&apos; and &apos;stat /dagg/old_stuff/rjh/mdtest/mdt0-2/2556&apos;, and post the output here?&lt;/p&gt;</comment>
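(For reference, the four numbered steps above can be combined into a single dry-run script. The `run` wrapper below only prints each command; replace it with `"$@"` to actually execute on the MDS's and client. The directory path is the one from the ticket.)

```shell
# Dry-run sketch of the debug-log collection steps above (LU-10677).
# run() only echoes; swap its body for "$@" to execute for real.
DIR=/dagg/old_stuff/rjh/mdtest/mdt0-2/2556     # the troubled directory
run() { echo "+ $*"; }
run lctl set_param debug=-1                    # step 1: full debug on both MDS's
run lctl clear                                 # step 2: drop previous debug logs
run rm -rf "$DIR"                              # step 3: retry the failing rm (on a client)
run lctl dk "/tmp/10677-$(hostname).log"       # step 4: dump debug logs to attach
```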
                            <comment id="221613" author="scadmin" created="Sat, 24 Feb 2018 01:42:15 +0000"  >&lt;p&gt;Hi Lai,&lt;/p&gt;

&lt;p&gt;yeah, sorry, I should have mentioned - scale is important.&lt;br/&gt;
the above reproducer makes 10k dirs, each with 100 subdirs.&lt;br/&gt;
I also ran a bunch with 1k dirs, each with 100 subdirs, and they all worked fine.&lt;br/&gt;
so something about 10k (and presumably above) triggers it for us.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="221614" author="scadmin" created="Sat, 24 Feb 2018 01:52:10 +0000"  >&lt;p&gt;Hi Lai,&lt;/p&gt;

&lt;p&gt;I couldn&apos;t delete these dirs (there are 2 now), but I did mv them so I could do more mdtests, so in the below and the attached the paths differ slightly from the above, but it&apos;s the same dir.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/corrupted/2556
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000c067:0x12e91:0x0]
     1           [0x2800183d4:0x12f1b:0x0]
     2           [0x680018bb6:0x12f24:0x0]

[root@farnarkle1 ~]# stat /dagg/old_stuff/rjh/mdtest/corrupted/2556
  File: &#8216;/dagg/old_stuff/rjh/mdtest/corrupted/2556&#8217;
  Size: 76800           Blocks: 150        IO Block: 131072 directory
Device: ef57e2ach/4015514284d   Inode: 144116014454481792  Links: 3
Access: (0775/drwxrwxr-x)  Uid: ( 1040/ rhumble)   Gid: (10190/hpcadmin)
Access: 2018-02-24 12:46:01.000000000 +1100
Modify: 2018-02-16 03:33:20.000000000 +1100
Change: 2018-02-16 20:30:03.000000000 +1100
 Birth: -
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="221615" author="scadmin" created="Sat, 24 Feb 2018 02:04:25 +0000"  >&lt;p&gt;and just in case it isn&apos;t obvious from the logs, the current arrangement of MDTs is&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cexec mds: df
************************* mds *************************
--------- warble1---------
Filesystem                    1K-blocks     Used Available Use% Mounted on
...
warble1-MGT-pool/MGT             245248     1920    241280   1% /lustre/MGT
warble1-dagg-MDT2-pool/MDT2   632920832 10578560 622340224   2% /lustre/dagg/MDT2
warble1-dagg-MDT1-pool/MDT1   632918016 11151360 621764608   2% /lustre/dagg/MDT1
warble1-images-MDT0-pool/MDT0 109846912  1504384 108340480   2% /lustre/images/MDT0
--------- warble2---------
Filesystem                  1K-blocks    Used Available Use% Mounted on
...
warble2-dagg-MDT0-pool/MDT0 632916480 8628480 624285952   2% /lustre/dagg/MDT0
warble2-home-MDT0-pool/MDT0 109847680  546944 109298688   1% /lustre/home/MDT0
warble2-apps-MDT0-pool/MDT0 109834880 3831680 106001152   4% /lustre/apps/MDT0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so dagg mdt0 on warble2, and mdt1,2 on warble1. each has their own zpool.&lt;/p&gt;</comment>
                            <comment id="221618" author="laisiyao" created="Sat, 24 Feb 2018 03:06:51 +0000"  >&lt;p&gt;the &apos;stat&apos; result shows the &apos;Links&apos; count is 3, while the correct link count for an empty dir should be 2; I&apos;m afraid some error in &apos;rm&apos; caused one link not to be released. You said there is nothing in dmesg or syslog - did you check them on the MDS&apos;s?&lt;/p&gt;</comment>
                            <comment id="221719" author="scadmin" created="Tue, 27 Feb 2018 02:16:01 +0000"  >&lt;p&gt;Hi Lai,&lt;/p&gt;

&lt;p&gt;there is nothing in dmesg or syslog. either when this happens the first time, or on subsequent attempts to remove the directory.&lt;br/&gt;
to check this again, I re-ran my reproducer above. 1 attempt with 10k dirs and 1 attempt with 20k dirs didn&apos;t produce any corrupted dirs, but the next run got 2 corrupted (unremoveable) dirs, and again only on the -c3 -D directory -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[rhumble@farnarkle2 ~]$ logger mdt0-2 start 20k ; for f in /dagg/old_stuff/rjh/mdtest/mdt*; do echo === $f === ; time ( cd $f ; for g in {00000..19999}; do mkdir $g; mkdir $g/{00..99}; done ) ; time rm -rf $f/*; done ; logger mdt0-2 end 20k
=== /dagg/old_stuff/rjh/mdtest/mdt0 ===

real    7m24.288s
user    0m26.297s
sys     1m35.283s

real    18m17.211s
user    0m5.137s
sys     4m41.977s
=== /dagg/old_stuff/rjh/mdtest/mdt0-2 ===

real    58m7.293s
user    0m26.627s
sys     2m8.602s
rm: cannot remove &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/15892&#8217;: Directory not empty
rm: cannot remove &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/18199&#8217;: Directory not empty

real    144m0.824s
user    0m9.201s
sys     16m53.631s
=== /dagg/old_stuff/rjh/mdtest/mdt0-2-no-inherit ===

real    6m36.442s
user    0m24.291s
sys     1m22.405s

real    18m10.672s
user    0m4.968s
sys     4m49.614s
=== /dagg/old_stuff/rjh/mdtest/mdt1 ===

real    6m38.597s
user    0m24.619s
sys     1m19.193s

real    17m1.638s
user    0m5.000s
sys     4m12.672s
=== /dagg/old_stuff/rjh/mdtest/mdt2 ===

real    6m34.660s
user    0m23.893s
sys     1m15.038s

real    16m46.268s
user    0m4.694s
sys     4m6.953s
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;looking at the broken dirs -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[rhumble@farnarkle2 ~]$ ls -alR /dagg/old_stuff/rjh/mdtest/mdt0-2
/dagg/old_stuff/rjh/mdtest/mdt0-2:
total 1770
drwxr-xr-x 4 rhumble hpcadmin 1625088 Feb 27 04:15 ./
drwxr-xr-x 8 rhumble hpcadmin   33280 Feb 16 20:30 ../
drwxrwxr-x 3 rhumble hpcadmin   76800 Feb 27 03:46 15892/
drwxrwxr-x 3 rhumble hpcadmin   76800 Feb 27 04:02 18199/

/dagg/old_stuff/rjh/mdtest/mdt0-2/15892:
total 1662
drwxrwxr-x 3 rhumble hpcadmin   76800 Feb 27 03:46 ./
drwxr-xr-x 4 rhumble hpcadmin 1625088 Feb 27 04:15 ../

/dagg/old_stuff/rjh/mdtest/mdt0-2/18199:
total 1662
drwxrwxr-x 3 rhumble hpcadmin   76800 Feb 27 04:02 ./
drwxr-xr-x 4 rhumble hpcadmin 1625088 Feb 27 04:15 ../

[rhumble@farnarkle2 ~]$ lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2/15892
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000f650:0x29df:0x0]
     1           [0x28001b433:0x2a42:0x0]
     2           [0x68001bc55:0x2a42:0x0]

[rhumble@farnarkle2 ~]$ stat /dagg/old_stuff/rjh/mdtest/mdt0-2/15892
  File: &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/15892&#8217;
  Size: 76800           Blocks: 150        IO Block: 131072 directory
Device: ef57e2ach/4015514284d   Inode: 144116245963231287  Links: 3
Access: (0775/drwxrwxr-x)  Uid: ( 1040/ rhumble)   Gid: (10190/hpcadmin)
Access: 2018-02-27 03:46:11.000000000 +1100
Modify: 2018-02-27 03:46:11.000000000 +1100
Change: 2018-02-27 03:46:11.000000000 +1100
 Birth: -

[rhumble@farnarkle2 ~]$ stat /dagg/old_stuff/rjh/mdtest/mdt0-2/18199
  File: &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/18199&#8217;
  Size: 76800           Blocks: 150        IO Block: 131072 directory
Device: ef57e2ach/4015514284d   Inode: 468376269654884927  Links: 3
Access: (0775/drwxrwxr-x)  Uid: ( 1040/ rhumble)   Gid: (10190/hpcadmin)
Access: 2018-02-27 04:02:50.000000000 +1100
Modify: 2018-02-27 04:02:50.000000000 +1100
Change: 2018-02-27 04:02:50.000000000 +1100
 Birth: -

[rhumble@farnarkle2 ~]$ rmdir /dagg/old_stuff/rjh/mdtest/mdt0-2/18199 /dagg/old_stuff/rjh/mdtest/mdt0-2/15892
rmdir: failed to remove &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/18199&#8217;: Directory not empty
rmdir: failed to remove &#8216;/dagg/old_stuff/rjh/mdtest/mdt0-2/15892&#8217;: Directory not empty
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;one difference between the 2 recent ok runs, and this incorrect run, is that I moved the 3 MDTs apart onto their (normal) 2 separate MDS&apos;s for this run. for the 2 previous runs all the MDTs had been on one MDS. however as the first post in this ticket said, I&apos;ve also been able to hit the bug when all the MDTs were on one MDS. it does seem slightly easier to trigger when they&apos;re spread out though.&lt;/p&gt;

&lt;p&gt;please note we have also moved to zfs 0.7.6. the first corruption in this ticket was with 0.7.5. so ZFS version doesn&apos;t seem to matter.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
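(A scaled-down sketch of the reproducer above - 10 dirs of 5 subdirs instead of 20k of 100 - runnable on any ordinary filesystem, where the removal leaves nothing behind. On the 3-way -D striped Lustre directory, this same mkdir/rm pattern at large scale is what intermittently left unremovable dirs.)

```shell
# Miniature version of the mkdir/rm -rf reproducer from LU-10677.
top=$(mktemp -d)
for g in $(seq -w 0 9); do
  mkdir "$top/$g"
  for h in $(seq -w 0 4); do mkdir "$top/$g/$h"; done
done
made=$(ls "$top" | wc -l)    # 10 top-level dirs created
rm -rf "$top"/*
left=$(ls "$top" | wc -l)    # 0 remain on a healthy filesystem
rmdir "$top"
```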
                            <comment id="221793" author="laisiyao" created="Tue, 27 Feb 2018 07:12:48 +0000"  >&lt;p&gt;I see, it looks like there is a race in refcounting somewhere; I&apos;ll check the related code.&lt;/p&gt;</comment>
                            <comment id="222714" author="scadmin" created="Wed, 7 Mar 2018 17:54:35 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I&apos;d also greatly appreciate advice on how to delete the residual directories created from the above testing in order to tidy up the filesystem.&lt;br/&gt;
thanks.&lt;/p&gt;

&lt;p&gt;I presume I have to mount the MDTs as zfs and rm something by hand?&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="222769" author="laisiyao" created="Thu, 8 Mar 2018 06:40:56 +0000"  >&lt;p&gt;yes, I think so.&lt;/p&gt;</comment>
                            <comment id="223916" author="rjh" created="Mon, 19 Mar 2018 03:20:51 +0000"  >&lt;p&gt;can you be more specific please.&lt;br/&gt;
thanks.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="224495" author="yong.fan" created="Sun, 25 Mar 2018 17:44:00 +0000"  >&lt;p&gt;The 10677-warble2.log shows that:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:00000001:13.0:1519436864.845186:0:332748:0:(mdd_dir.c:371:mdd_dir_is_empty()) Process entered
00000004:00000001:13.0:1519436864.845186:0:332748:0:(lod_object.c:382:lod_striped_it_init()) Process entered
00080000:00000001:13.0:1519436864.845187:0:332748:0:(osd_index.c:147:osd_index_it_init()) Process entered
00080000:00000010:13.0:1519436864.845187:0:332748:0:(osd_index.c:156:osd_index_it_init()) slab-alloced &apos;it&apos;: 280 at ffff88bdc4f16f08.
00080000:00000010:13.0:1519436864.845188:0:332748:0:(osd_index.c:106:osd_zap_cursor_init()) kmalloced &apos;t&apos;: 56 at ffff88bb0640a940.
00080000:00000001:13.0:1519436864.845188:0:332748:0:(osd_index.c:170:osd_index_it_init()) Process leaving (rc=18446612947367194376 : -131126342357240 : ffff88bdc4f16f08)
00080000:00000001:13.0:1519436864.845188:0:332748:0:(osd_index.c:726:osd_dir_it_init()) Process leaving (rc=18446612947367194376 : -131126342357240 : ffff88bdc4f16f08)
00000004:00000001:13.0:1519436864.845189:0:332748:0:(lod_object.c:461:lod_striped_it_get()) Process entered
00080000:00000001:13.0:1519436864.845190:0:332748:0:(osd_index.c:746:osd_dir_it_get()) Process entered
00080000:00000001:13.0:1519436864.845190:0:332748:0:(osd_index.c:760:osd_dir_it_get()) Process leaving (rc=1 : 1 : 1)
00000004:00000001:13.0:1519436864.845190:0:332748:0:(lod_object.c:509:lod_striped_it_next()) Process entered
00080000:00000001:13.0:1519436864.845191:0:332748:0:(osd_index.c:832:osd_dir_it_next()) Process entered
00080000:00000001:13.0:1519436864.845191:0:332748:0:(osd_index.c:845:osd_dir_it_next()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:13.0:1519436864.845191:0:332748:0:(lod_object.c:522:lod_striped_it_next()) Process leaving (rc=0 : 0 : 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;==&amp;gt; for &quot;dot&quot;.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:00000001:13.0:1519436864.845192:0:332748:0:(lod_object.c:509:lod_striped_it_next()) Process entered
00080000:00000001:13.0:1519436864.845192:0:332748:0:(osd_index.c:832:osd_dir_it_next()) Process entered
00080000:00000001:13.0:1519436864.845192:0:332748:0:(osd_index.c:845:osd_dir_it_next()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:13.0:1519436864.845192:0:332748:0:(lod_object.c:522:lod_striped_it_next()) Process leaving (rc=0 : 0 : 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;==&amp;gt; for &quot;dotdot&quot;.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:00000001:13.0:1519436864.845193:0:332748:0:(lod_object.c:509:lod_striped_it_next()) Process entered
00080000:00000001:13.0:1519436864.845193:0:332748:0:(osd_index.c:832:osd_dir_it_next()) Process entered
00080000:00000001:13.0:1519436864.845198:0:332748:0:(osd_index.c:862:osd_dir_it_next()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:13.0:1519436864.845198:0:332748:0:(lod_object.c:522:lod_striped_it_next()) Process leaving (rc=0 : 0 : 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;==&amp;gt; another unknown entry.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00080000:00000001:13.0:1519436864.845199:0:332748:0:(osd_index.c:177:osd_index_it_fini()) Process entered
00080000:00000010:13.0:1519436864.845199:0:332748:0:(osd_index.c:119:osd_zap_cursor_fini()) kfreed &apos;zc&apos;: 56 at ffff88bb0640a940.
00080000:00000010:13.0:1519436864.845200:0:332748:0:(osd_index.c:186:osd_index_it_fini()) slab-freed &apos;(it)&apos;: 280 at ffff88bdc4f16f08.
00080000:00000001:13.0:1519436864.845200:0:332748:0:(osd_index.c:188:osd_index_it_fini()) Process leaving
00000004:00000001:13.0:1519436864.845200:0:332748:0:(mdd_dir.c:399:mdd_dir_is_empty()) Process leaving (rc=18446744073709551577 : -39 : ffffffffffffffd9)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If this log is from an attempt to remove the troubled directory, it means that the first stripe of the striped directory contains at least one unknown entry (neither &quot;dot&quot; nor &quot;dotdot&quot;). But according to your description, such an entry is invisible to the client via &quot;ls -al&quot;. Since it is a test environment, I would suggest that if you can reproduce the problem, you mount the related MDT as &quot;zfs&quot; and run &quot;ls -alR&quot; on the troubled directory (its first stripe) via ZPL. If it still shows nothing, then I suspect there are some dummy entries left in that stripe.&lt;/p&gt;

&lt;p&gt;NOTE: Please do NOT remove the troubled directory via ZPL directly.&lt;/p&gt;</comment>
                            <comment id="224515" author="rjh" created="Mon, 26 Mar 2018 08:56:08 +0000"  >&lt;p&gt;Hi. thanks for looking into this.&lt;/p&gt;

&lt;p&gt;actually it&apos;s not a test filesystem now, it&apos;s the main production filesystem.&lt;br/&gt;
it doesn&apos;t seem possible to mount a zfs snapshot whilst lustre is running (please let me know how to do this if it&apos;s possible), so I did a short cluster downtime.&lt;/p&gt;

&lt;p&gt;we have (at least) 4 dirs so far -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john5 corrupted]# ls -alR
.:
total 365
drwxrwxr-x 6 root    root     33280 Feb 27 13:06 .
drwxr-xr-x 8 rhumble hpcadmin 33280 Feb 16 20:30 ..
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 0291
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 15892
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 18199
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 03:33 2556

./0291:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 .
drwxrwxr-x 6 root    root     33280 Feb 27 13:06 ..

./15892:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 .
drwxrwxr-x 6 root    root     33280 Feb 27 13:06 ..

./18199:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 .
drwxrwxr-x 6 root    root     33280 Feb 27 13:06 ..

./2556:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 03:33 .
drwxrwxr-x 6 root    root     33280 Feb 27 13:06 ..
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which can&apos;t be deleted&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john5 corrupted]# rmdir *
rmdir: failed to remove &#8216;0291&#8217;: Directory not empty
rmdir: failed to remove &#8216;15892&#8217;: Directory not empty
rmdir: failed to remove &#8216;18199&#8217;: Directory not empty
rmdir: failed to remove &#8216;2556&#8217;: Directory not empty
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;with striping&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john5 corrupted]# lfs getdirstripe *
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     1           [0x280019a41:0xa9dc:0x0]
     2           [0x68001a211:0xa9dd:0x0]
     0           [0x20000cf21:0xa9dd:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000f650:0x29df:0x0]
     1           [0x28001b433:0x2a42:0x0]
     2           [0x68001bc55:0x2a42:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x68001bc57:0x21e1:0x0]
     0           [0x20000f651:0x224b:0x0]
     1           [0x28001b435:0x2272:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000c067:0x12e91:0x0]
     1           [0x2800183d4:0x12f1b:0x0]
     2           [0x680018bb6:0x12f24:0x0]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I picked the first one and mounted MDT1 as zfs&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;zfs set canmount=on warble1-dagg-MDT1-pool/MDT1
umount /lustre/dagg/MDT1
zfs mount warble1-dagg-MDT1-pool/MDT1
find /warble1-dagg-MDT1-pool/MDT1 -name 0x280019a41:0xa9dc:0x0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and there is indeed something in there. 2 levels of dirs in fact -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[warble1]root: ls -alR /warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0
/warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0:
total 16571
drwxrwxr-x 3 rhumble hpcadmin 2 Feb 16 20:07 .
drwxr-xr-x 0 root    root     0 Jan  1  1970 ..
drwxrwxr-x 5 rhumble hpcadmin 2 Feb 16 19:28 58

/warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0/58:
total 75
drwxrwxr-x 5 rhumble hpcadmin 2 Feb 16 19:28 .
drwxrwxr-x 3 rhumble hpcadmin 2 Feb 16 20:07 ..
drwxrwxr-x 2 rhumble hpcadmin 2 Feb 16 19:28 [0x280019a41:0xa9f1:0x0]:0

/warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0/58/[0x280019a41:0xa9f1:0x0]:0:
total 50
drwxrwxr-x 2 rhumble hpcadmin 2 Feb 16 19:28 .
drwxrwxr-x 5 rhumble hpcadmin 2 Feb 16 19:28 ..
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="224528" author="yong.fan" created="Mon, 26 Mar 2018 13:22:57 +0000"  >&lt;blockquote&gt;
&lt;p&gt;/warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0/58:&lt;br/&gt;
total 75&lt;br/&gt;
drwxrwxr-x 5 rhumble hpcadmin 2 Feb 16 19:28 .&lt;br/&gt;
drwxrwxr-x 3 rhumble hpcadmin 2 Feb 16 20:07 ..&lt;br/&gt;
drwxrwxr-x 2 rhumble hpcadmin 2 Feb 16 19:28 &lt;span class=&quot;error&quot;&gt;&amp;#91;0x280019a41:0xa9f1:0x0&amp;#93;&lt;/span&gt;:0&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This is not a common directory; instead, it is a shard of another striped directory - &quot;58&quot;. Normally, if you did not set a stripe for the directory &quot;58&quot;, it should have 3 shards, inherited by default from its parent. But as shown, there is only one shard. So I suspect something went wrong when the striped directory &quot;58&quot; was removed: the other two shards were destroyed, but removing the local shard failed. With the corrupted striped directory &quot;58&quot;, we cannot remove the parent directory - I assume it is &quot;0291&quot;.&lt;/p&gt;

&lt;p&gt;Please find a new client, run the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;ls -ailR 0291
ls -ailR 0291/58
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="224532" author="rjh" created="Mon, 26 Mar 2018 13:35:33 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john6 corrupted]# ls -ailR 0291
0291:
total 108
180145747155549248 drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 .
144115977343280607 drwxrwxr-x 6 root    root     33280 Feb 27 13:06 ..
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john6 corrupted]# ls -ailR 0291/58
0291/58:
total 150
180145747155549269 drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 16 19:28 .
180145747155549248 drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 ..
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="224533" author="yong.fan" created="Mon, 26 Mar 2018 13:43:22 +0000"  >&lt;blockquote&gt;
&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@john6 corrupted&amp;#93;&lt;/span&gt;# ls -ailR 0291/58&lt;br/&gt;
0291/58:&lt;br/&gt;
total 150&lt;br/&gt;
180145747155549269 drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 16 19:28 .&lt;br/&gt;
180145747155549248 drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 ..&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;So the corrupted directory &quot;58&quot; is still visible. Please try &quot;rm -rf 0291/58&quot;, although I am afraid it will not succeed. Please collect -1 level debug logs on the MDTs during the rm. Thanks!&lt;/p&gt;</comment>
                            <comment id="224535" author="rjh" created="Mon, 26 Mar 2018 14:02:31 +0000"  >&lt;p&gt;wow! that worked. well done! &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john6 corrupted]# rm -rf 0291/58
[root@john6 corrupted]# ls -ailR 0291/58
ls: cannot access 0291/58: No such file or directory
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and I could remove the parent now too&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john6 corrupted]# rmdir 0291
[root@john6 corrupted]# ls -l
total 225
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 15892
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 18199
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 03:33 2556
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so now 3 problem dirs instead of 4.&lt;/p&gt;

&lt;p&gt;I suspect I can probably just&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;rmdir {00..99}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;in each of those and they&apos;ll be ok again too... sweet. I won&apos;t do it just yet though in case you want to experiment with other things.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="224537" author="rjh" created="Mon, 26 Mar 2018 14:26:34 +0000"  >&lt;p&gt;ok, this is crazy - each of the 3 remaining dirs has a &apos;58&apos; subdir -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john6 corrupted]# for f in *; do cd $f; echo $f; for g in {00..99}; do ls -ld $g; done; cd .. ; done 2&amp;gt;/dev/null
15892
drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 27 01:39 58
18199
drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 27 01:46 58
2556
drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 16 02:32 58
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and each of the &apos;58&apos; shards is on the same MDT as the parent dir -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john6 corrupted]# lfs getdirstripe 15892
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000f650:0x29df:0x0]
     1           [0x28001b433:0x2a42:0x0]
     2           [0x68001bc55:0x2a42:0x0]
[root@john6 corrupted]# lfs getdirstripe 15892/58
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000f650:0x29f4:0x0]
     1           [0x28001b433:0x2a57:0x0]
     2           [0x68001bc55:0x2a57:0x0]
[root@john6 corrupted]# lfs getdirstripe 18199
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x68001bc57:0x21e1:0x0]
     0           [0x20000f651:0x224b:0x0]
     1           [0x28001b435:0x2272:0x0]
[root@john6 corrupted]# lfs getdirstripe 18199/58
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     2           [0x68001bc57:0x21f6:0x0]
     0           [0x20000f651:0x2260:0x0]
     1           [0x28001b435:0x2287:0x0]
[root@john6 corrupted]# lfs getdirstripe 2556
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000c067:0x12e91:0x0]
     1           [0x2800183d4:0x12f1b:0x0]
     2           [0x680018bb6:0x12f24:0x0]
[root@john6 corrupted]# lfs getdirstripe 2556/58
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx           FID[seq:oid:ver]
     0           [0x20000c067:0x12ea6:0x0]
     1           [0x2800183d4:0x12f30:0x0]
     2           [0x680018bb6:0x12f39:0x0]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I thought 42 was the magic number, not 58 &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="224538" author="yong.fan" created="Mon, 26 Mar 2018 14:44:06 +0000"  >&lt;p&gt;That is unbelievable... Whether it is &quot;58&quot; or &quot;42&quot;, it is just a name; there is nothing special about it. Could it be related to your test parameters? What parameters did you use to reproduce the trouble? Have you adjusted them between runs? If you change them, you may get different results.&lt;/p&gt;</comment>
                            <comment id="224542" author="rjh" created="Mon, 26 Mar 2018 15:27:43 +0000"  >&lt;p&gt;the reproducer is the 1-liner in the first post. I&apos;ve run it with 1k, 10k and 20k for the first level dirs, but always (as far as I can remember) {00..99} for the second level dirs. 10k and 20k don&apos;t always create problems, but eventually do. 1k seems too small to generate problems.&lt;/p&gt;

&lt;p&gt;could there be something in Lustre directory randomisation for which 58 is special?&lt;br/&gt;
that&apos;s the only thing that really makes sense...&lt;br/&gt;
&apos;58&apos; is the 59th dir created in the parent, so that&apos;s 111011. is 4 special somehow because we have 3 MDT&apos;s to hash across? more buckets in a hash table? hash seed? dunno... I&apos;m just guessing.&lt;/p&gt;

&lt;p&gt;I&apos;ll run a few more tests the same and see if they also come up with &apos;58&apos;, and then try a few variations.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
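Whether a name like "58" could be special to the hashing can be checked numerically. The `lfs getdirstripe` output above reports `lmv_hash_type: fnv_1a_64`, i.e. the 64-bit FNV-1a hash. A minimal sketch, assuming the shard index is simply FNV-1a-64(name) mod stripe_count (an assumption for illustration; the exact in-kernel mapping may differ):

```python
# 64-bit FNV-1a hash, as named by "lmv_hash_type: fnv_1a_64" above.
# Assumption (illustrative): shard index = fnv1a_64(name) % stripe_count.

FNV64_OFFSET = 0xcbf29ce484222325
FNV64_PRIME = 0x100000001b3

def fnv1a_64(data: bytes) -> int:
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return h

def shard_index(name: str, stripe_count: int = 3) -> int:
    return fnv1a_64(name.encode()) % stripe_count

for name in ("42", "55", "58"):
    print(name, shard_index(name))
```

Nothing in FNV-1a singles out a particular name; every name simply maps to one of the stripe_count shards, which is consistent with "58" later turning out not to be special.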
                            <comment id="224544" author="yong.fan" created="Mon, 26 Mar 2018 16:30:49 +0000"  >&lt;p&gt;Please try adjusting the count of 2nd-level sub-dirs.&lt;/p&gt;</comment>
                            <comment id="224803" author="rjh" created="Thu, 29 Mar 2018 15:33:43 +0000"  >&lt;p&gt;I ran some more. they take a while, so apologies for the delay.&lt;br/&gt;
after a couple of runs through via the usual command line I got an invisible directory:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;7149/55
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so, darn, &apos;58&apos; isn&apos;t so special, that would have been awesome &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;I also tried new variants with more subdirs but didn&apos;t hit any problems. then I tried a variant with less subdirs&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;time for g in {00000..99999}; do mkdir $g; for h in {0..9}; do mkdir $g/$h; done; done ; time for g in {000..999}; do rm -rf ${g}* ; done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and got 4 busted dirs on the first run. as per usual these 4 dirs appear empty to ls -l&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;ls /dagg/old_stuff/rjh/mdtest/mdt0-2/*/
/dagg/old_stuff/rjh/mdtest/mdt0-2/24538/:
/dagg/old_stuff/rjh/mdtest/mdt0-2/46886/:
/dagg/old_stuff/rjh/mdtest/mdt0-2/59234/:
/dagg/old_stuff/rjh/mdtest/mdt0-2/93410/:
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;but actually have these dirs in them which I can see if I explicitly ls -l them&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;24538/9
46886/5
59234/5
93410/5
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so oddly 5* subdirs still seems preferred, but there&apos;s one 9* now, so it&apos;s possibly just a preference for mid-located dirs for some reason.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="224810" author="yong.fan" created="Thu, 29 Mar 2018 16:38:16 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=rjh&quot; class=&quot;user-hover&quot; rel=&quot;rjh&quot;&gt;rjh&lt;/a&gt;, thanks for the further verification. The results are as expected. The remaining shard of the striped directory is not related to any special name; instead, I suspect it may be caused by an incomplete rmdir of a striped directory. Although we try to guarantee atomicity for cross-MDT modifications, it is difficult to say (or prove) that all the corner cases are handled properly, especially the error-handling paths. Now we need the logs: please help collect the MDS-side Lustre kernel debug logs (the more detailed the better) when reproducing the trouble. Thanks!&lt;/p&gt;</comment>
                            <comment id="225091" author="rjh" created="Wed, 4 Apr 2018 08:48:55 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;the MDS logs for rm&apos;ing the &apos;58&apos; dir have been attached to this ticket for a while now.&lt;/p&gt;

&lt;p&gt;are you asking for logs whilst I create 1M dirs? surely that will overflow all the log buffers on the MDS&apos;s and be useless?&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="225393" author="yong.fan" created="Sun, 8 Apr 2018 11:06:32 +0000"  >&lt;p&gt;What I need are the Lustre kernel debug logs from when the rmdir of a striped directory (such as &quot;58&quot;) &lt;tt&gt;failed&lt;/tt&gt;. It is difficult to know in advance which striped directory will be the one.&lt;/p&gt;</comment>
                            <comment id="225415" author="scadmin" created="Mon, 9 Apr 2018 07:54:46 +0000"  >&lt;p&gt;ok. I understand now. thanks.&lt;br/&gt;
I&apos;ll try to find a shorter reproducer and run &apos;lctl debug_daemon&apos; with &apos;-1&apos; whilst running the &apos;rm -rf&apos; parts.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="225881" author="scadmin" created="Thu, 12 Apr 2018 15:38:21 +0000"  >&lt;p&gt;Hi Fan Yong,&lt;/p&gt;

&lt;p&gt;just to let you know I&apos;ve put this on the backburner and won&apos;t be doing any more on this bug until we have resolved most of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10887&quot; title=&quot;2 MDTs stuck in WAITING&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10887&quot;&gt;&lt;del&gt;LU-10887&lt;/del&gt;&lt;/a&gt; and have repaired the current fs errors. basically I&apos;m reluctant to make things worse by adding more errors to the fs at the moment.&lt;/p&gt;

&lt;p&gt;it&apos;ll be interesting to see if a successful lfsck pass can repair the current 93410/5 59234/5 46886/5 24538/9 &apos;hidden&apos; directories.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="225886" author="yong.fan" created="Thu, 12 Apr 2018 16:03:46 +0000"  >&lt;p&gt;Hi Robin,&lt;/p&gt;

&lt;p&gt;As you can see, we already have three patches for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10887&quot; title=&quot;2 MDTs stuck in WAITING&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10887&quot;&gt;&lt;del&gt;LU-10887&lt;/del&gt;&lt;/a&gt;: one is for the LFSCK trouble (&lt;a href=&quot;https://review.whamcloud.com/31915&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/31915&lt;/a&gt;), another is for an object leak (&lt;a href=&quot;https://review.whamcloud.com/31929&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/31929&lt;/a&gt;), and the 3rd one (&lt;a href=&quot;https://review.whamcloud.com/29228&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/29228&lt;/a&gt;) is for showing mount options. Please keep that ticket updated when you need more help.&lt;/p&gt;</comment>
                            <comment id="227184" author="scadmin" created="Thu, 3 May 2018 12:14:04 +0000"  >&lt;p&gt;just so you know, these directories that can&apos;t be deleted &apos;cos they have &apos;hidden&apos; dirs in them are still there and behave the same after the namespace lfsck in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10988&quot; title=&quot;LBUG in lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10988&quot;&gt;&lt;del&gt;LU-10988&lt;/del&gt;&lt;/a&gt;. so lfsck missed them somehow.&lt;/p&gt;

&lt;p&gt;there are at least 9 directories like this at the moment.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="227187" author="yong.fan" created="Thu, 3 May 2018 12:31:28 +0000"  >&lt;blockquote&gt;
&lt;p&gt;just so you know, these directories that can&apos;t be deleted &apos;cos they have &apos;hidden&apos; dirs in them are still there and behave the same after the namespace lfsck in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10988&quot; title=&quot;LBUG in lfsck&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10988&quot;&gt;&lt;del&gt;LU-10988&lt;/del&gt;&lt;/a&gt;. so lfsck missed them somehow.&lt;br/&gt;
there are at least 9 directories like this at the moment.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I am afraid these 9 directories are the ones skipped by the namespace LFSCK. I would suggest enabling LFSCK debug (lctl set_param debug+=lfsck) on the MDTs, then re-running the namespace LFSCK. Please collect the debug logs on the MDTs; they will tell us which directories were skipped and why. Please use a large debug buffer and mask unnecessary debug components to avoid debug buffer overflow.&lt;/p&gt;</comment>
                            <comment id="229830" author="pjones" created="Sat, 30 Jun 2018 13:35:51 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=scadmin&quot; class=&quot;user-hover&quot; rel=&quot;scadmin&quot;&gt;scadmin&lt;/a&gt; when do you expect to be able to run this test?&lt;/p&gt;</comment>
                            <comment id="229847" author="scadmin" created="Mon, 2 Jul 2018 11:26:37 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;sorry. yes, I should have got back to this, but it&apos;s been a pretty low priority for us. and high risk as it turns out &apos;cos the MDS crashed whilst running the lfsck (see &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11111&quot; title=&quot;crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj-&amp;gt;mod_flags &amp;amp; ORPHAN_OBJ)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11111&quot;&gt;&lt;del&gt;LU-11111&lt;/del&gt;&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;we did get as far as this though -&amp;gt;&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace | grep striped_shards_skipped
striped_shards_skipped: 3
striped_shards_skipped: 4
striped_shards_skipped: 4
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which perhaps is enough to help.&lt;/p&gt;

&lt;p&gt;would you like me to upload the debug_file.txt.gz to somewhere, or grep for something? it&apos;s about 300M gzip&apos;d.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="230271" author="pjones" created="Sat, 14 Jul 2018 15:17:43 +0000"  >&lt;p&gt;Robin&lt;/p&gt;

&lt;p&gt;There is a Whamcloud ftp site that you could upload large files to. I can give you the details directly (i.e. via email) if you wish to do this.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="238082" author="laisiyao" created="Thu, 6 Dec 2018 08:37:15 +0000"  >&lt;p&gt;I&apos;m working on a patch to improve lfsck for this and will provide it later.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="42780">LU-8990</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="48454">LU-10028</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="29903" name="10677-rm58-warble1.log" size="41535386" author="rjh" created="Mon, 26 Mar 2018 14:16:00 +0000"/>
                            <attachment id="29902" name="10677-rm58-warble2.log" size="17955656" author="rjh" created="Mon, 26 Mar 2018 14:12:47 +0000"/>
                            <attachment id="29620" name="10677-warble1.log" size="22560985" author="scadmin" created="Sat, 24 Feb 2018 02:02:36 +0000"/>
                            <attachment id="29619" name="10677-warble2.log" size="2964437" author="scadmin" created="Sat, 24 Feb 2018 01:56:36 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>dne</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzsx3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>