[LU-10677] can't delete directory Created: 16/Feb/18 Updated: 06/Dec/18 |
|
| Status: | In Progress |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | SC Admin (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: | zfs 0.7.5, OPA, skylake, centos7 |
| Epic/Theme: | dne |
| Severity: | 3 |
| Description |
|
Hiya, we have 2 MDSes with 1 MDT on one of them and 2 MDTs on the other, so 3 MDTs in total. each MDT consists of 2 hardware RAID1 mirrors, with a zmirror putting those together into one ZFS MDT in one zpool, so 4-way replication. latest CentOS 7.4 kernels (3.10.0-693.17.1.el7.x86_64) everywhere. nopti set on the Lustre servers. 8 OSSes, if that matters. multipath on all Lustre servers. purely software raidz3 on the OSSes.

we are testing DNE2 with 3-way dir striping, and also with default inheritance to all sub-dirs:

# lfs setdirstripe -c 3 mdt0-2
# lfs setdirstripe -D -c 3 mdt0-2
# chown rhumble mdt0-2
[rhumble@farnarkle2 ~]$ for f in /dagg/old_stuff/rjh/mdtest/mdt*; do echo === $f === ; time ( cd $f ; for g in {0000..9999}; do mkdir $g; for h in {00..99}; do mkdir $g/$h; done; done ) ; time rm -rf $f/*; done
...
=== /dagg/old_stuff/rjh/mdtest/mdt0-2 ===
real 57m21.053s
user 8m36.378s
sys 18m25.963s
rm: cannot remove ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/2556’: Directory not empty
real 72m52.257s
user 0m4.197s
sys 7m59.024s
...
[rhumble@farnarkle2 ~]$ ls -al /dagg/old_stuff/rjh/mdtest/mdt0-2/2556
total 894
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 03:33 .
drwxr-xr-x 3 rhumble hpcadmin 838656 Feb 16 15:46 ..
[rhumble@farnarkle2 ~]$ rmdir /dagg/old_stuff/rjh/mdtest/mdt0-2/2556
rmdir: failed to remove ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/2556’: Directory not empty
[rhumble@farnarkle2 ~]$ rm -rf /dagg/old_stuff/rjh/mdtest/mdt0-2/2556
rm: cannot remove ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/2556’: Directory not empty
there aren't any problems seen with the other 4 dirs tested.

[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt1
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none
[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt2
lmv_stripe_count: 0 lmv_stripe_offset: 2 lmv_hash_type: none
[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000b7bd:0x4a1b:0x0]
1 [0x28001639c:0x4a58:0x0]
2 [0x680016b6b:0x4a58:0x0]
[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2-no-inherit
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000bfa7:0xa63a:0x0]
1 [0x2800182f7:0xa69f:0x0]
2 [0x680018abd:0xa697:0x0]
[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt0
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none
[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt1
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none
[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt2
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none
[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt0-2
lmv_stripe_count: 3 lmv_stripe_offset: -1 lmv_hash_type: fnv_1a_64
[root@farnarkle1 ~]# lfs getdirstripe -D /dagg/old_stuff/rjh/mdtest/mdt0-2-no-inherit/
lmv_stripe_count: 0 lmv_stripe_offset: -1 lmv_hash_type: none
the un-removable directories have only appeared on the 3-way -D directory, so I suspect the bug is to do with DNE2 and the -D inheritance in particular. I also re-ran the test with all 3 MDTs on one MDS, and the same thing happened: one directory was un-removable by any means. there's nothing in dmesg or syslog. cheers, |
| Comments |
| Comment by Robin Humble [ 16/Feb/18 ] |
|
oh, and our Lustre (client and server) were patched with https://jira.hpdd.intel.com/browse/LU-10212 (https://review.whamcloud.com/#/c/29992/) 'cos we were hitting ESTALE on clients. |
| Comment by Andreas Dilger [ 23/Feb/18 ] |
|
Robin, after the "rm -r" fails, if you do "ls -lR" of the directory, does it show any entries left behind, or is the directory empty? It is possible there is some problem iterating the directory entries during the rm that leaves something behind.

Lai, are you able to reproduce this? |
| Comment by SC Admin (Inactive) [ 23/Feb/18 ] |
|
Hi Andreas, the directory 2556/ is empty, but can't be rmdir'd. cheers, |
| Comment by Lai Siyao [ 23/Feb/18 ] |
|
I'm trying to reproduce this, but I haven't seen it at smaller scale on master or 2.10 yet. In the meantime, could you do the following to dump debug logs? Also, could you run 'lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2/2556' and 'stat /dagg/old_stuff/rjh/mdtest/mdt0-2/2556', and post the output here? |
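A typical capture sequence for such a debug dump (a sketch, not the verbatim instructions; the buffer size and output path are placeholder choices):

# on each MDS: enlarge the debug buffer, enable full debugging, clear it
lctl set_param debug_mb=1024
olddebug=$(lctl get_param -n debug)
lctl set_param debug=-1
lctl clear
# ... reproduce the failing rm/rmdir from a client ...
# dump the kernel debug buffer to a file, then restore the previous mask
lctl dk > /tmp/lu-10677-$(hostname).dk
lctl set_param debug="$olddebug"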
| Comment by SC Admin (Inactive) [ 24/Feb/18 ] |
|
Hi Lai, yeah, sorry, I should have mentioned - scale is important. cheers, |
| Comment by SC Admin (Inactive) [ 24/Feb/18 ] |
|
Hi Lai, I couldn't delete these dirs (there are 2 now), but I did mv them so I could do more mdtests, so in the below and in the attached logs the paths differ slightly from the above, but it's the same dir.

[root@farnarkle1 ~]# lfs getdirstripe /dagg/old_stuff/rjh/mdtest/corrupted/2556
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000c067:0x12e91:0x0]
1 [0x2800183d4:0x12f1b:0x0]
2 [0x680018bb6:0x12f24:0x0]
[root@farnarkle1 ~]# stat /dagg/old_stuff/rjh/mdtest/corrupted/2556
File: ‘/dagg/old_stuff/rjh/mdtest/corrupted/2556’
Size: 76800 Blocks: 150 IO Block: 131072 directory
Device: ef57e2ach/4015514284d Inode: 144116014454481792 Links: 3
Access: (0775/drwxrwxr-x) Uid: ( 1040/ rhumble) Gid: (10190/hpcadmin)
Access: 2018-02-24 12:46:01.000000000 +1100
Modify: 2018-02-16 03:33:20.000000000 +1100
Change: 2018-02-16 20:30:03.000000000 +1100
Birth: -
|
| Comment by SC Admin (Inactive) [ 24/Feb/18 ] |
|
and just in case it isn't obvious from the logs, the current arrangement of MDTs is

# cexec mds: df
************************* mds *************************
--------- warble1---------
Filesystem 1K-blocks Used Available Use% Mounted on
...
warble1-MGT-pool/MGT 245248 1920 241280 1% /lustre/MGT
warble1-dagg-MDT2-pool/MDT2 632920832 10578560 622340224 2% /lustre/dagg/MDT2
warble1-dagg-MDT1-pool/MDT1 632918016 11151360 621764608 2% /lustre/dagg/MDT1
warble1-images-MDT0-pool/MDT0 109846912 1504384 108340480 2% /lustre/images/MDT0
--------- warble2---------
Filesystem 1K-blocks Used Available Use% Mounted on
...
warble2-dagg-MDT0-pool/MDT0 632916480 8628480 624285952 2% /lustre/dagg/MDT0
warble2-home-MDT0-pool/MDT0 109847680 546944 109298688 1% /lustre/home/MDT0
warble2-apps-MDT0-pool/MDT0 109834880 3831680 106001152 4% /lustre/apps/MDT0

so dagg MDT0 is on warble2, and MDT1 and MDT2 are on warble1. each has its own zpool. |
| Comment by Lai Siyao [ 24/Feb/18 ] |
|
the 'stat' result shows a 'Links' count of 3, while the correct link count for an empty directory is 2 (one for its own '.' entry and one for its entry in the parent; each subdirectory's '..' adds one more). I'm afraid some error in 'rm' left one link unreleased. You said there is nothing in dmesg or syslog, but did you check them on the MDSes? |
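For reference, the link-count arithmetic is easy to demonstrate on any local filesystem (a tiny illustration; the /tmp path is a placeholder):

$ mkdir /tmp/d
$ stat -c %h /tmp/d     # 2 links: /tmp/d's own "." plus its entry in /tmp
2
$ mkdir /tmp/d/sub
$ stat -c %h /tmp/d     # 3 links: /tmp/d/sub's ".." adds one more
3
$ rmdir /tmp/d/sub; stat -c %h /tmp/d
2
$ rmdir /tmp/d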
| Comment by SC Admin (Inactive) [ 27/Feb/18 ] |
|
Hi Lai, there is nothing in dmesg or syslog, either when this happens the first time or on subsequent attempts to remove the directory.

[rhumble@farnarkle2 ~]$ logger mdt0-2 start 20k ; for f in /dagg/old_stuff/rjh/mdtest/mdt*; do echo === $f === ; time ( cd $f ; for g in {00000..19999}; do mkdir $g; mkdir $g/{00..99}; done ) ; time rm -rf $f/*; done ; logger mdt0-2 end 20k
=== /dagg/old_stuff/rjh/mdtest/mdt0 ===
real 7m24.288s
user 0m26.297s
sys 1m35.283s
real 18m17.211s
user 0m5.137s
sys 4m41.977s
=== /dagg/old_stuff/rjh/mdtest/mdt0-2 ===
real 58m7.293s
user 0m26.627s
sys 2m8.602s
rm: cannot remove ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/15892’: Directory not empty
rm: cannot remove ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/18199’: Directory not empty
real 144m0.824s
user 0m9.201s
sys 16m53.631s
=== /dagg/old_stuff/rjh/mdtest/mdt0-2-no-inherit ===
real 6m36.442s
user 0m24.291s
sys 1m22.405s
real 18m10.672s
user 0m4.968s
sys 4m49.614s
=== /dagg/old_stuff/rjh/mdtest/mdt1 ===
real 6m38.597s
user 0m24.619s
sys 1m19.193s
real 17m1.638s
user 0m5.000s
sys 4m12.672s
=== /dagg/old_stuff/rjh/mdtest/mdt2 ===
real 6m34.660s
user 0m23.893s
sys 1m15.038s
real 16m46.268s
user 0m4.694s
sys 4m6.953s
looking at the broken dirs ->

[rhumble@farnarkle2 ~]$ ls -alR /dagg/old_stuff/rjh/mdtest/mdt0-2
/dagg/old_stuff/rjh/mdtest/mdt0-2:
total 1770
drwxr-xr-x 4 rhumble hpcadmin 1625088 Feb 27 04:15 ./
drwxr-xr-x 8 rhumble hpcadmin 33280 Feb 16 20:30 ../
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 15892/
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 18199/
/dagg/old_stuff/rjh/mdtest/mdt0-2/15892:
total 1662
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 ./
drwxr-xr-x 4 rhumble hpcadmin 1625088 Feb 27 04:15 ../
/dagg/old_stuff/rjh/mdtest/mdt0-2/18199:
total 1662
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 ./
drwxr-xr-x 4 rhumble hpcadmin 1625088 Feb 27 04:15 ../
[rhumble@farnarkle2 ~]$ lfs getdirstripe /dagg/old_stuff/rjh/mdtest/mdt0-2/15892
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000f650:0x29df:0x0]
1 [0x28001b433:0x2a42:0x0]
2 [0x68001bc55:0x2a42:0x0]
[rhumble@farnarkle2 ~]$ stat /dagg/old_stuff/rjh/mdtest/mdt0-2/15892
File: ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/15892’
Size: 76800 Blocks: 150 IO Block: 131072 directory
Device: ef57e2ach/4015514284d Inode: 144116245963231287 Links: 3
Access: (0775/drwxrwxr-x) Uid: ( 1040/ rhumble) Gid: (10190/hpcadmin)
Access: 2018-02-27 03:46:11.000000000 +1100
Modify: 2018-02-27 03:46:11.000000000 +1100
Change: 2018-02-27 03:46:11.000000000 +1100
Birth: -
[rhumble@farnarkle2 ~]$ stat /dagg/old_stuff/rjh/mdtest/mdt0-2/18199
File: ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/18199’
Size: 76800 Blocks: 150 IO Block: 131072 directory
Device: ef57e2ach/4015514284d Inode: 468376269654884927 Links: 3
Access: (0775/drwxrwxr-x) Uid: ( 1040/ rhumble) Gid: (10190/hpcadmin)
Access: 2018-02-27 04:02:50.000000000 +1100
Modify: 2018-02-27 04:02:50.000000000 +1100
Change: 2018-02-27 04:02:50.000000000 +1100
Birth: -
[rhumble@farnarkle2 ~]$ rmdir /dagg/old_stuff/rjh/mdtest/mdt0-2/18199 /dagg/old_stuff/rjh/mdtest/mdt0-2/15892
rmdir: failed to remove ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/18199’: Directory not empty
rmdir: failed to remove ‘/dagg/old_stuff/rjh/mdtest/mdt0-2/15892’: Directory not empty
one difference between the 2 recent OK runs and this incorrect run is that I moved the 3 MDTs apart onto their (normal) 2 separate MDSes for this run; for the 2 previous runs all the MDTs had been on one MDS. however, as the first post in this ticket said, I've also been able to hit the bug when all the MDTs were on one MDS. it does seem slightly easier to trigger when they're spread out, though.

please note we have also moved to zfs 0.7.6. the first corruption in this ticket was with 0.7.5, so the ZFS version doesn't seem to matter. cheers, |
| Comment by Lai Siyao [ 27/Feb/18 ] |
|
I see, it looks like there is a race in refcounting somewhere; I'll check the related code. |
| Comment by SC Admin (Inactive) [ 07/Mar/18 ] |
|
Hi, I'd also greatly appreciate advice on how to delete the residual directories created by the above testing, in order to tidy up the filesystem. I presume I have to mount the MDTs as ZFS and rm something by hand? cheers, |
| Comment by Lai Siyao [ 08/Mar/18 ] |
|
yes, I think so. |
| Comment by Robin Humble [ 19/Mar/18 ] |
|
can you be more specific, please? cheers, |
| Comment by nasf (Inactive) [ 25/Mar/18 ] |
|
The 10677-warble2.log shows that:

00000004:00000001:13.0:1519436864.845186:0:332748:0:(mdd_dir.c:371:mdd_dir_is_empty()) Process entered
00000004:00000001:13.0:1519436864.845186:0:332748:0:(lod_object.c:382:lod_striped_it_init()) Process entered
00080000:00000001:13.0:1519436864.845187:0:332748:0:(osd_index.c:147:osd_index_it_init()) Process entered
00080000:00000010:13.0:1519436864.845187:0:332748:0:(osd_index.c:156:osd_index_it_init()) slab-alloced 'it': 280 at ffff88bdc4f16f08.
00080000:00000010:13.0:1519436864.845188:0:332748:0:(osd_index.c:106:osd_zap_cursor_init()) kmalloced 't': 56 at ffff88bb0640a940.
00080000:00000001:13.0:1519436864.845188:0:332748:0:(osd_index.c:170:osd_index_it_init()) Process leaving (rc=18446612947367194376 : -131126342357240 : ffff88bdc4f16f08)
00080000:00000001:13.0:1519436864.845188:0:332748:0:(osd_index.c:726:osd_dir_it_init()) Process leaving (rc=18446612947367194376 : -131126342357240 : ffff88bdc4f16f08)
00000004:00000001:13.0:1519436864.845189:0:332748:0:(lod_object.c:461:lod_striped_it_get()) Process entered
00080000:00000001:13.0:1519436864.845190:0:332748:0:(osd_index.c:746:osd_dir_it_get()) Process entered
00080000:00000001:13.0:1519436864.845190:0:332748:0:(osd_index.c:760:osd_dir_it_get()) Process leaving (rc=1 : 1 : 1)
00000004:00000001:13.0:1519436864.845190:0:332748:0:(lod_object.c:509:lod_striped_it_next()) Process entered
00080000:00000001:13.0:1519436864.845191:0:332748:0:(osd_index.c:832:osd_dir_it_next()) Process entered
00080000:00000001:13.0:1519436864.845191:0:332748:0:(osd_index.c:845:osd_dir_it_next()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:13.0:1519436864.845191:0:332748:0:(lod_object.c:522:lod_striped_it_next()) Process leaving (rc=0 : 0 : 0)  ==> for "dot"
00000004:00000001:13.0:1519436864.845192:0:332748:0:(lod_object.c:509:lod_striped_it_next()) Process entered
00080000:00000001:13.0:1519436864.845192:0:332748:0:(osd_index.c:832:osd_dir_it_next()) Process entered
00080000:00000001:13.0:1519436864.845192:0:332748:0:(osd_index.c:845:osd_dir_it_next()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:13.0:1519436864.845192:0:332748:0:(lod_object.c:522:lod_striped_it_next()) Process leaving (rc=0 : 0 : 0)  ==> for "dotdot"
00000004:00000001:13.0:1519436864.845193:0:332748:0:(lod_object.c:509:lod_striped_it_next()) Process entered
00080000:00000001:13.0:1519436864.845193:0:332748:0:(osd_index.c:832:osd_dir_it_next()) Process entered
00080000:00000001:13.0:1519436864.845198:0:332748:0:(osd_index.c:862:osd_dir_it_next()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:13.0:1519436864.845198:0:332748:0:(lod_object.c:522:lod_striped_it_next()) Process leaving (rc=0 : 0 : 0)  ==> another unknown entry
00080000:00000001:13.0:1519436864.845199:0:332748:0:(osd_index.c:177:osd_index_it_fini()) Process entered
00080000:00000010:13.0:1519436864.845199:0:332748:0:(osd_index.c:119:osd_zap_cursor_fini()) kfreed 'zc': 56 at ffff88bb0640a940.
00080000:00000010:13.0:1519436864.845200:0:332748:0:(osd_index.c:186:osd_index_it_fini()) slab-freed '(it)': 280 at ffff88bdc4f16f08.
00080000:00000001:13.0:1519436864.845200:0:332748:0:(osd_index.c:188:osd_index_it_fini()) Process leaving
00000004:00000001:13.0:1519436864.845200:0:332748:0:(mdd_dir.c:399:mdd_dir_is_empty()) Process leaving (rc=18446744073709551577 : -39 : ffffffffffffffd9)

If this log is from an attempt to remove the trouble directory, it means the first stripe of the striped directory contains at least one unknown entry (neither "dot" nor "dotdot"); mdd_dir_is_empty() returns -39 (-ENOTEMPTY).
But according to your description, such an entry is invisible to the client via "ls -al". Since it is a test environment, I would suggest that if you can reproduce the trouble, you mount the related MDT as "zfs" and run "ls -alR" on the trouble directory (its first stripe) via ZPL. If that still shows nothing, then I suspect there are some dummy entries left in that stripe. NOTE: please do NOT remove the trouble directory via ZPL directly. |
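For concreteness, a sketch of that inspection on the MDS, using the warble1 MDT1 dataset name from the df output above and the MDT1 shard FID reported for 2556 earlier ([0x2800183d4:0x12f1b:0x0]); adapt both to the directory being examined. This is read-only inspection only; as noted, nothing should be removed via ZPL.

# take the MDT out of Lustre service and mount the dataset via ZPL
umount /lustre/dagg/MDT1
zfs set canmount=on warble1-dagg-MDT1-pool/MDT1
zfs mount warble1-dagg-MDT1-pool/MDT1

# locate the shard object by its FID (find prints the full path) and list it
find /warble1-dagg-MDT1-pool/MDT1 -name '0x2800183d4:0x12f1b:0x0'
ls -alR "$(find /warble1-dagg-MDT1-pool/MDT1 -name '0x2800183d4:0x12f1b:0x0')"

# when done, hand the dataset back to Lustre
zfs umount warble1-dagg-MDT1-pool/MDT1
zfs set canmount=off warble1-dagg-MDT1-pool/MDT1
mount -t lustre warble1-dagg-MDT1-pool/MDT1 /lustre/dagg/MDT1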
| Comment by Robin Humble [ 26/Mar/18 ] |
|
Hi. thanks for looking into this. actually it's not a test filesystem now, it's the main production filesystem. we have (at least) 4 dirs so far ->

[root@john5 corrupted]# ls -alR
.:
total 365
drwxrwxr-x 6 root root 33280 Feb 27 13:06 .
drwxr-xr-x 8 rhumble hpcadmin 33280 Feb 16 20:30 ..
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 0291
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 15892
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 18199
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 03:33 2556

./0291:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 .
drwxrwxr-x 6 root root 33280 Feb 27 13:06 ..

./15892:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 .
drwxrwxr-x 6 root root 33280 Feb 27 13:06 ..

./18199:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 .
drwxrwxr-x 6 root root 33280 Feb 27 13:06 ..

./2556:
total 108
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 03:33 .
drwxrwxr-x 6 root root 33280 Feb 27 13:06 ..

which can't be deleted

[root@john5 corrupted]# rmdir *
rmdir: failed to remove ‘0291’: Directory not empty
rmdir: failed to remove ‘15892’: Directory not empty
rmdir: failed to remove ‘18199’: Directory not empty
rmdir: failed to remove ‘2556’: Directory not empty

with striping

[root@john5 corrupted]# lfs getdirstripe *
lmv_stripe_count: 3 lmv_stripe_offset: 1 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
1 [0x280019a41:0xa9dc:0x0]
2 [0x68001a211:0xa9dd:0x0]
0 [0x20000cf21:0xa9dd:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000f650:0x29df:0x0]
1 [0x28001b433:0x2a42:0x0]
2 [0x68001bc55:0x2a42:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x68001bc57:0x21e1:0x0]
0 [0x20000f651:0x224b:0x0]
1 [0x28001b435:0x2272:0x0]
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000c067:0x12e91:0x0]
1 [0x2800183d4:0x12f1b:0x0]
2 [0x680018bb6:0x12f24:0x0]
I picked the first one and mounted MDT1 as zfs

zfs set canmount=on warble1-dagg-MDT1-pool/MDT1
umount /lustre/dagg/MDT1
zfs mount warble1-dagg-MDT1-pool/MDT1
find /warble1-dagg-MDT1-pool/MDT1 -name 0x280019a41:0xa9dc:0x0

and there is indeed something in there. 2 levels of dirs in fact ->

[warble1]root: ls -alR /warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0
/warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0:
total 16571
drwxrwxr-x 3 rhumble hpcadmin 2 Feb 16 20:07 .
drwxr-xr-x 0 root root 0 Jan 1 1970 ..
drwxrwxr-x 5 rhumble hpcadmin 2 Feb 16 19:28 58

/warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0/58:
total 75
drwxrwxr-x 5 rhumble hpcadmin 2 Feb 16 19:28 .
drwxrwxr-x 3 rhumble hpcadmin 2 Feb 16 20:07 ..
drwxrwxr-x 2 rhumble hpcadmin 2 Feb 16 19:28 [0x280019a41:0xa9f1:0x0]:0

/warble1-dagg-MDT1-pool/MDT1/oi.65/0x280019a41:0xa9dc:0x0/58/[0x280019a41:0xa9f1:0x0]:0:
total 50
drwxrwxr-x 2 rhumble hpcadmin 2 Feb 16 19:28 .
drwxrwxr-x 5 rhumble hpcadmin 2 Feb 16 19:28 ..

cheers, |
| Comment by nasf (Inactive) [ 26/Mar/18 ] |
This is not a common directory; instead, it is a shard of another striped directory, "58". Normally, even if you did not set a stripe on the directory "58", it should have 3 shards by default, inherited from its parent. But as shown, there is only one shard. So I suspect something went wrong when the striped directory "58" was removed: the other two shards were destroyed, but removing the local shard failed. With the corrupted striped directory "58" still present, we cannot remove the parent directory (I assume it is "0291"). Please find a new client and run the following:

ls -ailR 0291
ls -ailR 0291/58 |
| Comment by Robin Humble [ 26/Mar/18 ] |
[root@john6 corrupted]# ls -ailR 0291
0291:
total 108
180145747155549248 drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 .
144115977343280607 drwxrwxr-x 6 root root 33280 Feb 27 13:06 ..
[root@john6 corrupted]# ls -ailR 0291/58
0291/58:
total 150
180145747155549269 drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 16 19:28 .
180145747155549248 drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 20:07 .. |
| Comment by nasf (Inactive) [ 26/Mar/18 ] |
So the corrupted directory "58" is still visible. Please try your luck with "rm -rf 0291/58", although I am afraid it will not succeed. Please collect -1 level debug logs on the MDTs during the rm. Thanks! |
| Comment by Robin Humble [ 26/Mar/18 ] |
|
wow! that worked. well done!

[root@john6 corrupted]# rm -rf 0291/58
[root@john6 corrupted]# ls -ailR 0291/58
ls: cannot access 0291/58: No such file or directory

and I could remove the parent now too

[root@john6 corrupted]# rmdir 0291
[root@john6 corrupted]# ls -l
total 225
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 03:46 15892
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 27 04:02 18199
drwxrwxr-x 3 rhumble hpcadmin 76800 Feb 16 03:33 2556

so now 3 problem dirs instead of 4. I suspect I can probably just rmdir {00..99}
in each of those and they'll be ok again too... sweet. I won't do it just yet though in case you want to experiment with other things. cheers, |
| Comment by Robin Humble [ 26/Mar/18 ] |
|
ok, this is crazy: each of the 3 remaining dirs has a '58' subdir ->

[root@john6 corrupted]# for f in *; do cd $f; echo $f; for g in {00..99}; do ls -ld $g; done; cd .. ; done 2>/dev/null
15892
drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 27 01:39 58
18199
drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 27 01:46 58
2556
drwxrwxr-x 2 rhumble hpcadmin 76800 Feb 16 02:32 58
and each of the '58' shards is on the same MDT as the parent dir ->

[root@john6 corrupted]# lfs getdirstripe 15892
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000f650:0x29df:0x0]
1 [0x28001b433:0x2a42:0x0]
2 [0x68001bc55:0x2a42:0x0]
[root@john6 corrupted]# lfs getdirstripe 15892/58
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000f650:0x29f4:0x0]
1 [0x28001b433:0x2a57:0x0]
2 [0x68001bc55:0x2a57:0x0]
[root@john6 corrupted]# lfs getdirstripe 18199
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x68001bc57:0x21e1:0x0]
0 [0x20000f651:0x224b:0x0]
1 [0x28001b435:0x2272:0x0]
[root@john6 corrupted]# lfs getdirstripe 18199/58
lmv_stripe_count: 3 lmv_stripe_offset: 2 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
2 [0x68001bc57:0x21f6:0x0]
0 [0x20000f651:0x2260:0x0]
1 [0x28001b435:0x2287:0x0]
[root@john6 corrupted]# lfs getdirstripe 2556
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000c067:0x12e91:0x0]
1 [0x2800183d4:0x12f1b:0x0]
2 [0x680018bb6:0x12f24:0x0]
[root@john6 corrupted]# lfs getdirstripe 2556/58
lmv_stripe_count: 3 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x20000c067:0x12ea6:0x0]
1 [0x2800183d4:0x12f30:0x0]
2 [0x680018bb6:0x12f39:0x0]
I thought 42 was the magic number, not 58. cheers, |
| Comment by nasf (Inactive) [ 26/Mar/18 ] |
|
That is unbelievable... whether "58" or "42", it is just a name, nothing special. Maybe it is related to your test parameters? What parameters did you use to reproduce the trouble? Have you adjusted them between runs? If you change them, you may get different results. |
| Comment by Robin Humble [ 26/Mar/18 ] |
|
the reproducer is the 1-liner in the first post. I've run it with 1k, 10k and 20k for the first-level dirs, but always (as far as I can remember) {00..99} for the second-level dirs. 10k and 20k don't always create problems, but eventually do. 1k seems too small to generate problems.

could there be something in Lustre directory randomisation for which 58 is special? I'll run a few more tests the same and see if they also come up with '58', and then try a few variations. cheers, |
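A quick way to check where particular names land (a sketch, with a placeholder test path): fnv_1a_64 maps each name to a stripe deterministically, so the same second-level names should hash to the same stripe on every run. If lmv_stripe_offset for an unstriped subdirectory reports the MDT holding it, as the mdt0/mdt1/mdt2 output at the top of this ticket suggests, this shows the placement directly:

# make a throwaway 3-stripe dir and see which stripe each name lands on
lfs setdirstripe -c 3 /dagg/tmp/hashtest
for g in 5 9 42 55 58; do
    mkdir /dagg/tmp/hashtest/$g
    echo -n "$g: "
    lfs getdirstripe /dagg/tmp/hashtest/$g | grep lmv_stripe_offset
done
rm -rf /dagg/tmp/hashtest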
| Comment by nasf (Inactive) [ 26/Mar/18 ] |
|
Please adjust the 2nd-level sub-dir counts and try again. |
| Comment by Robin Humble [ 29/Mar/18 ] |
|
I ran some more. they take a while, so apologies for the delay. this time the leftover was 7149/55. so, darn, '58' isn't so special; that would have been awesome. I also tried new variants with more subdirs but didn't hit any problems. then I tried a variant with fewer subdirs:

time for g in {00000..99999}; do mkdir $g; for h in {0..9}; do mkdir $g/$h; done; done ; time for g in {000..999}; do rm -rf ${g}* ; done
and got 4 busted dirs on the first run. as per usual these 4 dirs appear empty to ls -l:

ls /dagg/old_stuff/rjh/mdtest/mdt0-2/*/
/dagg/old_stuff/rjh/mdtest/mdt0-2/24538/:
/dagg/old_stuff/rjh/mdtest/mdt0-2/46886/:
/dagg/old_stuff/rjh/mdtest/mdt0-2/59234/:
/dagg/old_stuff/rjh/mdtest/mdt0-2/93410/:

but they actually have these dirs in them, which I can see if I explicitly ls -l them:

24538/9
46886/5
59234/5
93410/5

so oddly 5* subdirs still seem preferred, but there's one 9* now, so it's possibly just a preference for mid-located names for some reason. cheers, |
| Comment by nasf (Inactive) [ 29/Mar/18 ] |
|
rjh, thanks for the further verification. The results are as expected: the remaining shard of the striped directory is not related to any special name; instead, I suspect it is related to some incomplete rmdir of a striped directory. Although we try to guarantee atomicity for cross-MDT modifications, it is difficult to say (or prove) that all the corner cases are handled properly, especially the error-handling paths. Now we need the logs; please help collect the MDS-side Lustre kernel debug logs (the more detail the better) when reproducing the trouble. Thanks! |
| Comment by Robin Humble [ 04/Apr/18 ] |
|
Hi, the MDS logs for rm'ing the '58' dir have been attached to this ticket for a while now. are you asking for logs whilst I create 1M dirs? surely that would overflow all the log buffers on the MDSes and be useless? cheers, |
| Comment by nasf (Inactive) [ 08/Apr/18 ] |
|
What I need is the Lustre kernel debug logs from when the rmdir of the striped directory fails (such as for "58"). It is difficult to know in advance which striped directory will be the one. |
| Comment by SC Admin (Inactive) [ 09/Apr/18 ] |
|
ok. I understand now. thanks. cheers, |
| Comment by SC Admin (Inactive) [ 12/Apr/18 ] |
|
Hi Fan Yong, just to let you know I've put this on the backburner and won't be doing any more on this bug until we have resolved most of the related issues. it'll be interesting to see if a successful lfsck pass can repair the current 93410/5, 59234/5, 46886/5, 24538/9 'hidden' directories. cheers, |
| Comment by nasf (Inactive) [ 12/Apr/18 ] |
|
Hi Robin, As you can see, we already have three patches for |
| Comment by SC Admin (Inactive) [ 03/May/18 ] |
|
just so you know, these directories that can't be deleted 'cos they have 'hidden' dirs in them are still there and behave the same after the namespace lfsck. there are at least 9 directories like this at the moment. cheers, |
| Comment by nasf (Inactive) [ 03/May/18 ] |
I am afraid these 9 directories are ones skipped by the namespace LFSCK. I would suggest enabling LFSCK debug (lctl set_param debug+=lfsck) on the MDTs, then re-running the namespace LFSCK. Please collect the debug logs on the MDTs; they will tell us which directories are skipped and why. Please use a large debug buffer and mask unnecessary debug components to avoid debug buffer overflow.
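A sketch of that sequence on each MDS, using the dagg MDT device naming visible later in this ticket; the buffer size and the masked component are arbitrary example choices:

# enlarge the debug buffer, add the lfsck debug mask, clear the buffer
lctl set_param debug_mb=1024
lctl set_param debug+=lfsck
lctl set_param debug=-dlmtrace    # example of masking an unneeded component
lctl clear

# re-run the namespace LFSCK (repeat for each dagg MDT device on this MDS)
lctl lfsck_start -M dagg-MDT0000 -t namespace

# after it completes, check the status and dump the debug buffer
lctl get_param -n mdd.dagg-MDT0000.lfsck_namespace
lctl dk > /tmp/lfsck-namespace-$(hostname).dk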
| Comment by Peter Jones [ 30/Jun/18 ] |
|
scadmin, when do you expect to be able to run this test? |
| Comment by SC Admin (Inactive) [ 02/Jul/18 ] |
|
Hi, sorry. yes, I should have got back to this, but it's been a pretty low priority for us. and high risk as it turns out, 'cos the MDS crashed whilst running the lfsck. we did get as far as this though ->

# lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace | grep striped_shard
striped_shards_skipped: 3
striped_shards_skipped: 4
striped_shards_skipped: 4

which perhaps is enough to help. would you like me to upload the debug_file.txt.gz to somewhere, or grep for something? it's about 300M gzip'd. cheers, |
| Comment by Peter Jones [ 14/Jul/18 ] |
|
Robin, there is a Whamcloud FTP site that you could upload large files to. I can give you the details directly (i.e. via email) if you wish to do this. Peter |
| Comment by Lai Siyao [ 06/Dec/18 ] |
|
I'm working on a patch to improve LFSCK handling of this, and will provide it later. |