[LU-11481] corrupt directory | Created: 08/Oct/18 | Updated: 25/Feb/19 | Resolved: 15/Feb/19

| Status:            | Resolved |
| Project:           | Lustre |
| Component/s:       | None |
| Affects Version/s: | Lustre 2.10.4 |
| Fix Version/s:     | Lustre 2.10.7 |
| Type:              | Bug |
| Priority:          | Critical |
| Reporter:          | Olaf Faaland |
| Assignee:          | Lai Siyao |
| Resolution:        | Fixed |
| Votes:             | 0 |
| Labels:            | llnl |
| Severity:          | 2 |
| Environment:       | server: RHEL 7.4 derivative, zfs-0.7.11-4llnl.ch6.x86_64, lustre-2.10.4_1.chaos |
| Description |
|
A directory has an entry for the subdirectory "2fe", but the object ID stored for that entry does not exist:

alias ll="ls -l"
[root@catalyst101:~]# ll /p/lustre3/videousr/YLI/mmcommons/data/images_v1
ls: cannot access /p/lustre3/videousr/YLI/mmcommons/data/images_v1/2fe: No such file or directory
total 0
d????????? ? ? ? ? ? 2fe

Examining images_v1 with zdb on the MDT shows that 2fe refers to an object ID that is invalid:

[root@porter81:snap]# zdb -ddddd porter81/mdt0 533741247
Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0
Object lvl iblk dblk dsize dnsize lsize %full type
533741247 2 128K 16K 231K 512 528K 100.00 ZFS directory
192 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
dnode maxblkid: 32
path ???<object#533741247>
uid 0
gid 2093
atime Mon Oct 8 11:01:28 2018
mtime Wed Oct 3 15:53:08 2018
ctime Wed Oct 3 15:53:08 2018
crtime Mon Oct 1 20:53:54 2018
gen 1090081
mode 42700
size 2
parent 533740502
links 3
pflags 0
rdev 0x0000000000000000
SA xattrs: 204 bytes, 3 entries
trusted.lma = \000\000\000\000\000\000\000\0002@\000\000\002\000\000\000\245\037\001\000\000\000\000\000
trusted.link = \337\361\352\021\001\000\000\0003\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\033\000\000\000\002\000\000@F\000\0001\213\000\000\000\000images_v1
trusted.version = \022\231\236+\011\000\000\000
Fat ZAP stats:
Pointer table:
1024 elements
zt_blk: 0
zt_numblks: 0
zt_shift: 10
zt_blks_copied: 0
zt_nextblk: 0
ZAP entries: 1
Leaf blocks: 32
Total blocks: 33
zap_block_type: 0x8000000000000001
zap_magic: 0x2f52ab2ab
zap_salt: 0x3e3cbee7f
Leafs with 2^n pointers:
5: 32 ********************************
Blocks with n*5 entries:
0: 32 ********************************
Blocks n/10 full:
1: 32 ********************************
Entries with n chunks:
4: 1 *
Buckets with n entries:
0: 16383 ****************************************
1: 1 *
2fe = 533742980 (type: Directory)
Indirect blocks:
0 L1 6:1a0095d000:a00 20000L/a00P F=33 B=1133009/1133009
0 L0 4:d99372200:200 4000L/200P F=1 B=1133009/1133009
4000 L0 4:2b78affa00:e00 4000L/e00P F=1 B=1132989/1132989
8000 L0 4:1a409fa00:e00 4000L/e00P F=1 B=1133008/1133008
c000 L0 4:dbecc8800:e00 4000L/e00P F=1 B=1133003/1133003
10000 L0 4:2d07544a00:e00 4000L/e00P F=1 B=1132997/1132997
14000 L0 5:11130c9600:e00 4000L/e00P F=1 B=1133005/1133005
18000 L0 5:1053a11c00:e00 4000L/e00P F=1 B=1132991/1132991
1c000 L0 4:2d07545800:e00 4000L/e00P F=1 B=1132997/1132997
20000 L0 6:1a41dd7c00:e00 4000L/e00P F=1 B=1133002/1133002
24000 L0 5:112ca4cc00:e00 4000L/e00P F=1 B=1133007/1133007
28000 L0 5:559e31000:e00 4000L/e00P F=1 B=1133000/1133000
2c000 L0 4:d91a7e000:e00 4000L/e00P F=1 B=1133004/1133004
30000 L0 4:d99372400:e00 4000L/e00P F=1 B=1133009/1133009
34000 L0 4:265bf62800:e00 4000L/e00P F=1 B=1132993/1132993
38000 L0 6:134c5fcc00:e00 4000L/e00P F=1 B=1132992/1132992
3c000 L0 5:559e31e00:e00 4000L/e00P F=1 B=1133000/1133000
40000 L0 5:11130ca400:e00 4000L/e00P F=1 B=1133005/1133005
44000 L0 4:dbeccac00:e00 4000L/e00P F=1 B=1133003/1133003
48000 L0 4:2b78b02200:e00 4000L/e00P F=1 B=1132989/1132989
4c000 L0 6:134c5ff400:e00 4000L/e00P F=1 B=1132992/1132992
50000 L0 4:1a40a2400:e00 4000L/e00P F=1 B=1133008/1133008
54000 L0 5:11130cb200:e00 4000L/e00P F=1 B=1133005/1133005
58000 L0 6:19f0f10c00:e00 4000L/e00P F=1 B=1132991/1132991
5c000 L0 4:1a40a3200:e00 4000L/e00P F=1 B=1133008/1133008
60000 L0 7:b97b6aa00:e00 4000L/e00P F=1 B=1133004/1133004
64000 L0 5:112ca4f400:e00 4000L/e00P F=1 B=1133007/1133007
68000 L0 4:17f825800:e00 4000L/e00P F=1 B=1132999/1132999
6c000 L0 6:1a2429de00:e00 4000L/e00P F=1 B=1132995/1132995
70000 L0 6:1a41dd9a00:e00 4000L/e00P F=1 B=1133002/1133002
74000 L0 7:129d29e800:e00 4000L/e00P F=1 B=1133007/1133007
78000 L0 4:dbeccca00:e00 4000L/e00P F=1 B=1133003/1133003
7c000 L0 4:17f826600:e00 4000L/e00P F=1 B=1132999/1132999
80000 L0 5:569fa5000:e00 4000L/e00P F=1 B=1132994/1132994
segment [0000000000000000, 0000000000084000) size 528K
[root@porter81:snap]# zdb -ddddd porter81/mdt0 533742980
Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0
Object lvl iblk dblk dsize dnsize lsize %full type
zdb: dmu_bonus_hold(533742980) failed, errno 2
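The cross-check shown above can be scripted. Below is a hedged sketch, not part of the original ticket, assuming a ZFS-backed MDT with the pool imported; the dataset name and object numbers are the ones from this ticket, and the script is guarded so it is harmless on a host without zdb:

```shell
#!/bin/sh
# Sketch only: cross-check a directory entry against its backing object,
# mirroring the zdb session above. Dataset and object numbers are taken
# from this ticket; run on the MDS with the pool imported.
dataset="porter81/mdt0"
if command -v zdb >/dev/null 2>&1; then
    # The directory object's ZAP should list the entry "2fe = 533742980".
    zdb -ddddd "$dataset" 533741247 | grep '2fe ='
    # Dumping the referenced object fails with errno 2 if it is missing.
    zdb -ddddd "$dataset" 533742980 2>&1 | tail -n 1
    out="checked"
else
    out="zdb not available on this host"
fi
echo "$out"
```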
This is on a new file system that has not yet been used by end users, but to which we attempted to copy data. More specifically:
|
| Comments |
| Comment by Olaf Faaland [ 08/Oct/18 ] |
|
We are uncertain whether lfs migrate was involved. If there is anything I could look for within either of the MDTs to determine whether lfs migrate was attempting to migrate this directory, or its parent, to help corroborate or rule that out, let me know. |
| Comment by Olaf Faaland [ 08/Oct/18 ] |
|
I've attached the console logs for the two MDSs; the files are named "console.porter{81,82}.tgz". This corruption was discovered early in the day on 2018-10-04, and the first attempt to copy this directory tree to the new file system was started on 2018-09-22, so the damage must have occurred during the period covered by these logs. |
| Comment by Olaf Faaland [ 08/Oct/18 ] |
|
MDT0000 is in pool porter81. Both pools report state "ONLINE", which means no errors. We do not have debug logs for MDT0000; it was stopped and restarted on 2018-10-04. |
| Comment by Peter Jones [ 09/Oct/18 ] |
|
Lai, could you please advise? Thanks. Peter |
| Comment by Lai Siyao [ 09/Oct/18 ] |
|
The only clue I see is from porter81:

2018-10-04 04:29:58 [125075.971428] LustreError: 166475:0:(mdt_open.c:1515:mdt_reint_open()) lustre3-MDT0000: name '2fe' present, but FID [0x200004031:0x1216b:0x0] is invalid

But it only tells us that the FID of '2fe' doesn't exist. Can you use LFSCK to fix this inconsistency? |
| Comment by Olaf Faaland [ 09/Oct/18 ] |
|
Yes, I can run lfsck. Is there anything else I can look for that might give a clue as to how this happened? |
| Comment by Lai Siyao [ 10/Oct/18 ] |
|
It may be related to directory migration: in 2.10, directory migration migrates the dirents of all sub-files to the target first and then migrates the inodes of the sub-files. If it fails in the middle, it may leave some dirents pointing nowhere. |
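The failure mode described in the comment above (names copied in one phase, inodes in a later phase, with an interruption in between) can be illustrated with a toy shell model. This is an analogy added for illustration, not Lustre code: symlinks stand in for dirents, and regular files stand in for the backing objects.

```shell
#!/bin/sh
# Toy model of 2.10-style directory migration: names are moved to the
# target directory first, backing "objects" second. An interruption
# between the two phases leaves names that resolve to nothing.
set -eu
work=$(mktemp -d)
mkdir "$work/objects" "$work/src" "$work/dst"

# Phase 0: the source directory has entries backed by objects.
for name in 2fd 2fe 2ff; do
    echo data > "$work/objects/$name.obj"
    ln -s "$work/objects/$name.obj" "$work/src/$name"
done

# Phase 1: migrate all dirents (names) to the target.
mv "$work/src"/* "$work/dst/"

# Simulated crash mid-phase-2: one backing object is lost before the
# object migration completes.
rm "$work/objects/2fe.obj"

# The name is still present, but resolving it fails -- analogous to
# "name '2fe' present, but FID [...] is invalid".
if [ -e "$work/dst/2fe" ]; then
    result="resolves"
else
    result="dangling"
fi
echo "2fe is $result"
rm -rf "$work"
```

Running this prints "2fe is dangling": the entry survives the interrupted migration while its target does not, just as the dirent for 2fe survived while object 533742980 did not.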
| Comment by Olaf Faaland [ 11/Oct/18 ] |
|
I started lfsck about 40 hours ago, via:

pdsh -w e81 lctl lfsck_start --all --create-ostobj on --create-mdtobj on --delay-create-ostobj on --orphan

The invalid directory entry has been removed. Ignoring the "0" valued results, lfsck_query now reports:

[root@porteri:toss-4318]# pdsh -w e81 lctl lfsck_query | awk "\$NF > 0 {print}"
e81: layout_mdts_completed: 2
e81: layout_osts_completed: 79
e81: layout_osts_unknown: 1
e81: namespace_mdts_completed: 2
e81: namespace_repaired: 1
1. Is there any way for me to know in more detail what it changed? |
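One possible answer to the question above, hedged: namespace LFSCK keeps per-MDT repair statistics in a proc parameter. The parameter name below is from 2.10-era Lustre and the exact fields may differ by version; the sketch is guarded so it is safe to run off-cluster.

```shell
#!/bin/sh
# Sketch: dump namespace LFSCK statistics on an MDS. The output includes
# repair counters and scan positions, which narrow down what LFSCK
# changed. Parameter path is an assumption based on 2.10-era Lustre.
if command -v lctl >/dev/null 2>&1; then
    out=$(lctl get_param -n mdd.*.lfsck_namespace 2>&1)
else
    out="lctl not found (run this on a Lustre MDS)"
fi
echo "$out"
```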
| Comment by Olaf Faaland [ 11/Oct/18 ] |
|
I found that many (possibly all) of the OSTs had BUG: reports in their console logs, and that one OST had been failed over to its partner. I will create a ticket for that issue if there isn't one already, and link to it here. After I powered on the OSS that was off and the OST moved back, I re-ran lfsck_query and got this instead:

[root@porteri:toss-4318]# pdsh -w e81 lctl lfsck_query | awk "\$NF > 0 {print}"
e81: layout_mdts_completed: 2
e81: layout_osts_completed: 80
e81: namespace_mdts_completed: 2
e81: namespace_repaired: 1
This file system has 80 OSTs and 2 MDTs. Does that mean lfsck has now completed on all of them? Thanks |
| Comment by Lai Siyao [ 12/Oct/18 ] |
|
The layout_osts_unknown result is probably a bug; maybe that OST crashed, or the server replied with a status that is not known. The second run should have finished lfsck on all MDTs and OSTs. I need to check the BUG info to understand it better, and see whether I can inject an error in the code to reproduce that output. |
| Comment by Lai Siyao [ 01/Nov/18 ] |
|
Can you provide the BUG info? |
| Comment by Olaf Faaland [ 05/Nov/18 ] |
|
Hello Lai, Sorry for the long delay. Please see https://jira.whamcloud.com/browse/LU-11620 for one stack related to lfsck. |
| Comment by Olaf Faaland [ 03/Jan/19 ] |
|
Hi. What's the plan for this issue (the creation of the inconsistency, not lfsck)? Thanks. |
| Comment by Lai Siyao [ 04/Jan/19 ] |
|
Directory migration was rewritten, which fixed many migration issues, but the rewrite is in 2.12. |
| Comment by Olaf Faaland [ 04/Jan/19 ] |
OK, then you're saying: Is that correct? |
| Comment by Gerrit Updater [ 04/Jan/19 ] |
|
Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/33960 |
| Comment by Olaf Faaland [ 04/Jan/19 ] |
|
In case the answer to (a) is "yes", I've uploaded a patch for b2_10. |
| Comment by Lai Siyao [ 14/Jan/19 ] |
|
Yes, Olaf. |
| Comment by Olaf Faaland [ 17/Jan/19 ] |
|
Hello Lai, I've added you as a reviewer on my patch, which at last update passed all tests except sanity-scrub test_9, which looks to me like it's unrelated to my patch, but maybe I'm mistaken. Can you kick it so that review-dne-part-2, which includes sanity-scrub, is re-tested? Thanks |
| Comment by Peter Jones [ 18/Jan/19 ] |
|
I re-triggered it |
| Comment by Gerrit Updater [ 29/Jan/19 ] |
|
Pushed against Master by mistake. This one will be abandoned.
|
| Comment by Gerrit Updater [ 15/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33960/ |
| Comment by Peter Jones [ 15/Feb/19 ] |
|
Landed for 2.10.7. Not needed on master. |