Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.10.4
-
server: RHEL 7.4 derivative, zfs-0.7.11-4llnl.ch6.x86_64, lustre-2.10.4_1.chaos
client: RHEL 7.4 derivative, lustre-2.8.2_4.chaos-1
We are using DNE 1 with two MDTs on two servers, porter81 and porter82
for the zfs tag, see https://github.com/LLNL/zfs/releases
for our 2.10 tag, see https://github.com/LLNL/lustre/
for our 2.8 tag, see lustre-release-fe-llnl on gerrit
-
2
Description
A directory has an entry for subdirectory "2fe", but the object referenced by the ID stored for that entry does not exist:
alias ll="ls -l"
[root@catalyst101:~]# ll /p/lustre3/videousr/YLI/mmcommons/data/images_v1
ls: cannot access /p/lustre3/videousr/YLI/mmcommons/data/images_v1/2fe: No such file or directory
total 0
d????????? ? ? ? ? ? 2fe
And when using zdb on the MDT to examine images_v1, one sees that 2fe refers to an object ID that is invalid:
[root@porter81:snap]# zdb -ddddd porter81/mdt0 533741247
Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0
Object lvl iblk dblk dsize dnsize lsize %full type
533741247 2 128K 16K 231K 512 528K 100.00 ZFS directory
192 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
dnode maxblkid: 32
path ???<object#533741247>
uid 0
gid 2093
atime Mon Oct 8 11:01:28 2018
mtime Wed Oct 3 15:53:08 2018
ctime Wed Oct 3 15:53:08 2018
crtime Mon Oct 1 20:53:54 2018
gen 1090081
mode 42700
size 2
parent 533740502
links 3
pflags 0
rdev 0x0000000000000000
SA xattrs: 204 bytes, 3 entries
trusted.lma = \000\000\000\000\000\000\000\0002@\000\000\002\000\000\000\245\037\001\000\000\000\000\000
trusted.link = \337\361\352\021\001\000\000\0003\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\033\000\000\000\002\000\000@F\000\0001\213\000\000\000\000images_v1
trusted.version = \022\231\236+\011\000\000\000
Fat ZAP stats:
Pointer table:
1024 elements
zt_blk: 0
zt_numblks: 0
zt_shift: 10
zt_blks_copied: 0
zt_nextblk: 0
ZAP entries: 1
Leaf blocks: 32
Total blocks: 33
zap_block_type: 0x8000000000000001
zap_magic: 0x2f52ab2ab
zap_salt: 0x3e3cbee7f
Leafs with 2^n pointers:
5: 32 ********************************
Blocks with n*5 entries:
0: 32 ********************************
Blocks n/10 full:
1: 32 ********************************
Entries with n chunks:
4: 1 *
Buckets with n entries:
0: 16383 ****************************************
1: 1 *
2fe = 533742980 (type: Directory)
Indirect blocks:
0 L1 6:1a0095d000:a00 20000L/a00P F=33 B=1133009/1133009
0 L0 4:d99372200:200 4000L/200P F=1 B=1133009/1133009
4000 L0 4:2b78affa00:e00 4000L/e00P F=1 B=1132989/1132989
8000 L0 4:1a409fa00:e00 4000L/e00P F=1 B=1133008/1133008
c000 L0 4:dbecc8800:e00 4000L/e00P F=1 B=1133003/1133003
10000 L0 4:2d07544a00:e00 4000L/e00P F=1 B=1132997/1132997
14000 L0 5:11130c9600:e00 4000L/e00P F=1 B=1133005/1133005
18000 L0 5:1053a11c00:e00 4000L/e00P F=1 B=1132991/1132991
1c000 L0 4:2d07545800:e00 4000L/e00P F=1 B=1132997/1132997
20000 L0 6:1a41dd7c00:e00 4000L/e00P F=1 B=1133002/1133002
24000 L0 5:112ca4cc00:e00 4000L/e00P F=1 B=1133007/1133007
28000 L0 5:559e31000:e00 4000L/e00P F=1 B=1133000/1133000
2c000 L0 4:d91a7e000:e00 4000L/e00P F=1 B=1133004/1133004
30000 L0 4:d99372400:e00 4000L/e00P F=1 B=1133009/1133009
34000 L0 4:265bf62800:e00 4000L/e00P F=1 B=1132993/1132993
38000 L0 6:134c5fcc00:e00 4000L/e00P F=1 B=1132992/1132992
3c000 L0 5:559e31e00:e00 4000L/e00P F=1 B=1133000/1133000
40000 L0 5:11130ca400:e00 4000L/e00P F=1 B=1133005/1133005
44000 L0 4:dbeccac00:e00 4000L/e00P F=1 B=1133003/1133003
48000 L0 4:2b78b02200:e00 4000L/e00P F=1 B=1132989/1132989
4c000 L0 6:134c5ff400:e00 4000L/e00P F=1 B=1132992/1132992
50000 L0 4:1a40a2400:e00 4000L/e00P F=1 B=1133008/1133008
54000 L0 5:11130cb200:e00 4000L/e00P F=1 B=1133005/1133005
58000 L0 6:19f0f10c00:e00 4000L/e00P F=1 B=1132991/1132991
5c000 L0 4:1a40a3200:e00 4000L/e00P F=1 B=1133008/1133008
60000 L0 7:b97b6aa00:e00 4000L/e00P F=1 B=1133004/1133004
64000 L0 5:112ca4f400:e00 4000L/e00P F=1 B=1133007/1133007
68000 L0 4:17f825800:e00 4000L/e00P F=1 B=1132999/1132999
6c000 L0 6:1a2429de00:e00 4000L/e00P F=1 B=1132995/1132995
70000 L0 6:1a41dd9a00:e00 4000L/e00P F=1 B=1133002/1133002
74000 L0 7:129d29e800:e00 4000L/e00P F=1 B=1133007/1133007
78000 L0 4:dbeccca00:e00 4000L/e00P F=1 B=1133003/1133003
7c000 L0 4:17f826600:e00 4000L/e00P F=1 B=1132999/1132999
80000 L0 5:569fa5000:e00 4000L/e00P F=1 B=1132994/1132994
segment [0000000000000000, 0000000000084000) size 528K
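For readers who want the Lustre FID of images_v1 itself, the trusted.lma value printed above can be decoded by hand. The following is a minimal sketch, not part of the original report. It assumes the xattr follows Lustre's struct lustre_mdt_attrs layout (u32 compat, u32 incompat, then a FID of u64 seq, u32 oid, u32 ver), that the fields are little-endian on disk, and that zdb renders non-printable bytes as three-digit octal escapes with printable bytes shown literally. The helper names unescape_zdb and decode_lma are hypothetical.

```python
import re
import struct


def unescape_zdb(value: str) -> bytes:
    """Convert zdb's printable xattr form back to raw bytes.

    Assumption: every non-printable byte appears as a backslash followed by
    exactly three octal digits; anything else is a literal ASCII character.
    """
    out = bytearray()
    for m in re.finditer(r"\\([0-7]{3})|(.)", value, re.S):
        if m.group(1) is not None:
            out.append(int(m.group(1), 8))
        else:
            out.append(ord(m.group(2)))
    return bytes(out)


def decode_lma(raw: bytes):
    """Unpack the first 24 bytes as compat, incompat and the self FID.

    Assumption: little-endian layout matching struct lustre_mdt_attrs.
    """
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw[:24])
    return compat, incompat, f"[{seq:#x}:{oid:#x}:{ver:#x}]"


# Value copied from the trusted.lma line of the zdb output for object 533741247.
LMA = (r"\000\000\000\000\000\000\000\0002@\000\000\002\000\000\000"
       r"\245\037\001\000\000\000\000\000")

if __name__ == "__main__":
    print(decode_lma(unescape_zdb(LMA)))
```

Under those assumptions the value decodes to FID [0x200004032:0x11fa5:0x0] with both flag words zero, which can be compared with what a client's lfs path2fid reports for the same directory.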
[root@porter81:snap]# zdb -ddddd porter81/mdt0 533742980
Dataset porter81/mdt0 [ZPL], ID 148, cr_txg 98, 910G, 61852198 objects, rootbp DVA[0]=<4:88d9c400:200> DVA[1]=<5:25ca03c200:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=1214040L/1214040P fill=61852198 cksum=139cf672b7:5dc8d6146f6:f8e6add4f57c:1e27e38477f5c0
Object lvl iblk dblk dsize dnsize lsize %full type
zdb: dmu_bonus_hold(533742980) failed, errno 2
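The manual check above (dump the directory, then dump each object it points to) could be scripted to look for other dangling entries. This is a rough sketch only, assuming the zdb output format matches what is shown in this ticket: directory entries appear as "name = objid (type: ...)" and a missing object yields "dmu_bonus_hold(objid) failed, errno 2". The function names are hypothetical, and run_zdb simply wraps the same zdb -ddddd invocation used above.

```python
import re
import subprocess

# Matches ZAP directory entries such as "2fe = 533742980 (type: Directory)".
ENTRY_RE = re.compile(r"^\s*(\S.*?) = (\d+) \(type: ([^)]+)\)\s*$", re.M)
# Matches the failure zdb prints when the object does not exist (errno 2 = ENOENT).
MISSING_RE = re.compile(r"dmu_bonus_hold\((\d+)\) failed, errno 2")


def run_zdb(dataset: str, objid: int) -> str:
    """Run 'zdb -ddddd <dataset> <objid>' and return stdout plus stderr."""
    proc = subprocess.run(["zdb", "-ddddd", dataset, str(objid)],
                          capture_output=True, text=True)
    return proc.stdout + proc.stderr


def parse_entries(zdb_text: str):
    """Return (name, object id, type) for each directory entry in a dump."""
    return [(n, int(o), t) for n, o, t in ENTRY_RE.findall(zdb_text)]


def is_missing(zdb_text: str, objid: int) -> bool:
    """True if the dump reports that objid could not be held (ENOENT)."""
    return any(int(m) == objid for m in MISSING_RE.findall(zdb_text))


def dangling_entries(dataset: str, dir_objid: int):
    """List entries of one directory whose target object does not exist."""
    entries = parse_entries(run_zdb(dataset, dir_objid))
    return [e for e in entries if is_missing(run_zdb(dataset, e[1]), e[1])]
```

With the dataset and object ID from this ticket, dangling_entries("porter81/mdt0", 533741247) would be expected to report the 2fe entry, provided zdb is run on the MDS as root. It checks a single directory and does not recurse.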
This is on a new file system that has not been used by end-users yet, but which we attempted to copy data to. More specifically:
1. We copied about 500 million files/dirs to it
2. We tried to use lfs migrate -M to move some large subtrees from one MDT to another, but that failed due to a Lustre 2.8 bug with lfs migrate
3. We deleted most of the files/dirs
- As far as I can recall, the servers did not crash while we were performing the copy and delete operations, but I cannot be certain of that.
- We inspected the console logs on the servers and clients but found nothing indicating that object creation or destruction had failed.