Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.1.0, Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0
-
None
-
single node setup.
-
3
-
7624
Description
While investigating a Cray issue we created a debug patch to verify that the correct mode is assigned to the md object after revalidation, but the result is very confusing:
the same FID is assigned to different files.
00000004:00000002:2.0:1365510202.512393:0:19372:0:(mdt_reint.c:281:mdt_md_create()) @@@ Create (bdir->[0x200000bd0:0x1:0x0]) in [0x200000400:0x7:0x0] req@ffff880036324928 x1431841008517781/t0(17179869191) o36->1e287f29-a399-fb10-7bd0-d9d83ebc1ce1@0@lo:0/0 lens 456/536 e 0 to 0 dl 1365510223 ref 1 fl Complete:/4/0 rc 0/0
00000004:00000002:6.0:1365510375.912867:0:23947:0:(mdt_reint.c:281:mdt_md_create()) @@@ Create (f9-2->[0x200000bd0:0x1:0x0]) in [0x61ab:0x54d71902:0x0] req@ffff88002a88c008 x1431841008518517/t0(0) o36->e35c9b0d-1221-a3ae-0357-637f30903536@0@lo:0/0 lens 456/536 e 0 to 0 dl 1365510395 ref 1 fl Interpret:/0/0 rc 0/0
As you can see, bdir and f9-2 have the same FID.
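A check of this kind can be automated when going through a dumped debug log. The following is only a minimal userspace sketch, assuming the "Create (NAME->[FID]) in [PARENT]" layout seen in the two mdt_md_create() lines above; parse_create_line() and same_fid_different_name() are hypothetical helpers written for illustration and are not part of Lustre.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of Lustre: extract the entry name and the FID
 * from an mdt_md_create() debug line of the form
 * "... Create (NAME->[FID]) in [PARENT] ...".
 * Returns 0 on success, -1 if the line does not match or a buffer is too small. */
int parse_create_line(const char *line, char *name, size_t name_sz,
                      char *fid, size_t fid_sz)
{
    const char *start, *arrow, *end;
    size_t len;

    assert(line != NULL && name != NULL && fid != NULL);

    start = strstr(line, "Create (");
    if (start == NULL)
        return -1;
    start += strlen("Create (");

    /* The name runs up to the "->[" that introduces the FID. */
    arrow = strstr(start, "->[");
    if (arrow == NULL)
        return -1;
    len = (size_t)(arrow - start);
    if (len >= name_sz)
        return -1;
    memcpy(name, start, len);
    name[len] = '\0';

    /* The FID runs up to the closing bracket. */
    arrow += strlen("->[");
    end = strchr(arrow, ']');
    if (end == NULL)
        return -1;
    len = (size_t)(end - arrow);
    if (len >= fid_sz)
        return -1;
    memcpy(fid, arrow, len);
    fid[len] = '\0';
    return 0;
}

/* Returns 1 when two create lines assign the same FID to different names,
 * 0 when they do not, and -1 when either line cannot be parsed. */
int same_fid_different_name(const char *a, const char *b)
{
    char name_a[256], name_b[256], fid_a[64], fid_b[64];

    if (parse_create_line(a, name_a, sizeof(name_a), fid_a, sizeof(fid_a)) ||
        parse_create_line(b, name_b, sizeof(name_b), fid_b, sizeof(fid_b)))
        return -1;
    return strcmp(fid_a, fid_b) == 0 && strcmp(name_a, name_b) != 0;
}
```

Passing the two log lines quoted above to same_fid_different_name() returns 1, because both carry [0x200000bd0:0x1:0x0] while the names are bdir and f9-2. The comparison is purely textual, so it only works while both lines print the FID in the same format.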
Reproducing this is very easy:
1) Apply a patch (to hit the assert and panic), or dump the log after the test loop.
bash-3.2$ git diff
diff --git a/lustre/llite/llite_lib.c b/lustre/llite/llite_lib.c
index 7dad8f9..c40cfd2 100644
--- a/lustre/llite/llite_lib.c
+++ b/lustre/llite/llite_lib.c
@@ -1654,6 +1654,20 @@ void ll_update_inode(struct inode *inode, struct lustre_md *md)
         struct ll_sb_info *sbi = ll_i2sbi(inode);
         LASSERT ((lsm != NULL) == ((body->valid & OBD_MD_FLEASIZE) != 0));
+        if (body->valid & OBD_MD_FLTYPE) {
+                if ((!S_ISDIR(inode->i_mode) && S_ISDIR(body->mode)) ||
+                    (S_ISDIR(inode->i_mode) && !S_ISDIR(body->mode))) {
+//              if ((inode->i_mode & S_IFMT) != (body->mode & S_IFMT)) {
+                        CERROR("ino %p->%lu type is %s but server reply has %s "
+                               "type (%o and %o). FID "DFID"\n", inode, inode->i_ino,
+                               S_ISDIR(inode->i_mode) ? "directory" :"file",
+                               S_ISDIR(body->mode) ? "directory" : "file",
+                               inode->i_mode & S_IFMT, body->mode & S_IFMT,
+                               PFID(&lli->lli_fid));
+                        LBUG();
+                }
+        }
+
         if (lsm != NULL) {
                 LASSERT(S_ISREG(inode->i_mode));
2) Run PTLDEBUG=-1 SUBSYSTEM=-1 DEBUG_SIZE=400 llmount.sh to set up a single-node Lustre cluster.
3) Run replay-dual with a simple script:
for ((i=0; i!=1000; i++)); do
    for ((j=1; j!=10; j++)); do
        echo "Iteration $i"
        SLOW=YES ONLY=$j DIR=/mnt/lustre sh ./replay-dual.sh
    done
done
In 90% of runs you hit the bug after the first loop.
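The condition tested by the debug patch in step 1 can be tried out in userspace. The following is a rough sketch using only the standard <sys/stat.h> macros; dir_type_mismatch() mirrors the active check in the patch (directory versus non-directory), and fmt_type_mismatch() mirrors the commented-out stricter variant that compares the full S_IFMT bits. Both function names are made up for illustration and do not exist in Lustre.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/stat.h>

/* Illustrative only: same test as the active condition in the debug patch.
 * Returns 1 when exactly one of the two modes describes a directory. */
int dir_type_mismatch(mode_t cached_mode, mode_t reply_mode)
{
    return (!S_ISDIR(cached_mode) && S_ISDIR(reply_mode)) ||
           (S_ISDIR(cached_mode) && !S_ISDIR(reply_mode));
}

/* Illustrative only: same test as the commented-out line in the patch.
 * Returns 1 when the file type bits differ in any way, for example a
 * regular file versus a character device, which the check above ignores. */
int fmt_type_mismatch(mode_t cached_mode, mode_t reply_mode)
{
    return (cached_mode & S_IFMT) != (reply_mode & S_IFMT);
}
```

With a cached mode of S_IFDIR | 0755 and a reply mode of S_IFREG | 0644, both functions report a mismatch. With S_IFREG versus S_IFCHR only fmt_type_mismatch() does, which is the difference between the two variants in the patch. Permission bits do not affect either result.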