After more investigation, I think that the operation sequences on the client when failure occurred were as following:
1) The thread "54770" was one IOR thread which was trying to parsing "/p/lcrater2/swltest/SWL/Hype2/IO/37988/test_dir/miranda_io.out.10022". Before processed the last component "miranda_io.out.10022", it needed to lookup the "test_dir" firstly.
2) The dentry "ffff8804302af900" for the "test_dir" existed in memory at that time, but was marked as "DCACHE_LUSTRE_INVALID". So "ll_compare()" against such dentry "ffff8804302af900" returned unmatched for the thread "54770". So the thread "54770" created new negative dentry "ffff8804245c3800" for the "test_dir" (reverse guess from the result), and triggered real lookup:
00000080:00002000:6.0:1337981358.181442:0:54770:0:(dcache.c:103:ll_dcompare()) found name test_dir(ffff8804302af900) - flags 0/4000000 - refc 25
00000080:00200000:6.0:1337981358.181444:0:54770:0:(namei.c:454:ll_lookup_it()) VFS Op:name=test_dir,dir=144237863481960082/33582994(ffff8803b184d838),intent=0
...
00800000:00000002:6.0:1337981358.181449:0:54770:0:(lmv_intent.c:405:lmv_intent_lookup()) LOOKUP_INTENT with fid1=[0x2006f9298:0xe692:0x0], fid2=[0x0:0x0:0x0], name='test_dir' -> mds #0
00000002:00010000:6.0:1337981358.181450:0:54770:0:(mdc_locks.c:917:mdc_intent_lock()) (name: test_dir,[0x0:0x0:0x0]) in obj [0x2006f9298:0xe692:0x0], intent: lookup flags 00
3) The thread "54770" succeeded to re-obtain related ibits lock from MDS, and tried to update inode and dentry for the "test_dir" on the client:
00000002:00002000:6.0:1337981358.181799:0:54770:0:(mdc_locks.c:844:mdc_finish_intent_lock()) D_IT dentry test_dir intent: lookup status 0 disp b rc 0
...
00000080:00200000:6.0:1337981358.181809:0:54770:0:(namei.c:151:ll_iget()) got inode: ffff8803b184d3f8 for [0x2006f9298:0xe693:0x0]
4) When the thread "54770" was in the function "ll_splice_alias()" to process the "test_dir", another thread "54766" (the failure thread as described above) began to lookup the "test_dir" for parsing "/p/lcrater2/swltest/SWL/Hype2/IO/37988/test_dir/miranda_io.out.10018".
5) There was race condition between the thread "54770" and the thread "54766" for the dentry "test_dir":
struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
{
struct dentry *new;
if (inode) {
new = ll_find_alias(inode, de);
if (new) {
ll_dops_init(new, 1, 1);
(5.1)===> d_rehash(de);
(5.3)===> d_move(new, de);
iput(inode);
CDEBUG(D_DENTRY,
"Reuse dentry %p inode %p refc %d flags %#x\n",
new, new->d_inode, d_refcount(new), new->d_flags);
return new;
}
}
...
}
int ll_dcompare(struct dentry *parent, struct qstr *d_name, struct qstr *name)
{
...
(5.2)===> CDEBUG(D_DENTRY,"found name %.*s(%p) - flags %d/%x - refc %d\n",
name->len, name->name, dchild,
d_mountpoint(dchild), dchild->d_flags & DCACHE_LUSTRE_INVALID,
atomic_read(&dchild->d_count));
...
}
5.1) The thread "54770" found the former dentry "ffff8804302af900", and tried to reuse "ffff8804302af900". But firstly, it rehashed the negative dentry "ffff8804245c3800" before established the inode alias.
5.2) The thread "54766" found the negative dentry "ffff8804245c3800" in "__d_lookup()", which was just rehashed by the thread "54770". Since the negative dentry "ffff8804245c3800" is not marked as "DCACHE_LUSTRE_INVALID", the thread "54766" regarded it as valid entry for the "test_dir".
5.3) The thread "54770" called "d_move()" to establish relative inode alias. After that, the exported dentry for "test_dir" became the valid "ffff8804302af900".
00000080:00010000:6.0:1337981358.181811:0:54770:0:(namei.c:410:ll_lookup_it_finish()) setting l_data to inode ffff8803b184d3f8 (144237863481960083/33582994)
...
00000080:00002000:2.0:1337981358.181812:0:54766:0:(dcache.c:103:ll_dcompare()) found name 37988(ffff8804302af540) - flags 0/0 - refc 6
00000080:00200000:2.0:1337981358.181812:0:54766:0:(dcache.c:359:ll_revalidate_it()) VFS Op:name=37988,intent=0
00000080:00200000:2.0:1337981358.181813:0:54766:0:(file.c:2465:ll_inode_permission()) VFS Op:inode=144237863481960082/33582994(ffff8803b184d838), inode mode 41c0 mask 1
00000080:00002000:2.0:1337981358.181814:0:54766:0:(dcache.c:103:ll_dcompare()) found name test_dir(ffff8804245c3800) - flags 0/0 - refc 1
00000080:00002000:6.0:1337981358.181815:0:54770:0:(namei.c:378:ll_splice_alias()) Reuse dentry ffff8804302af900 inode ffff8803b184d3f8 refc 26 flags 0x4000000
00000080:00010000:6.0:1337981358.181817:0:54770:0:(dcache.c:324:ll_lookup_finish_locks()) setting l_data to inode ffff8803b184d3f8 (144237863481960083/33582994)
00000080:00010000:6.0:1337981358.181818:0:54770:0:(dcache.c:233:ll_intent_drop_lock()) releasing lock with cookie 0x829fa7d835b2db86 from it ffff880431f49b98
So here, we should not make the negative dentry "ffff8804245c3800" to be visible to others in "ll_splice_alias()". Either remove "rehash" or mark the dentry as "DCACHE_LUSTRE_INVALID". So need more suitable processing for that.
Thanks Prakash. We will track landing this code under
LU-506so I am closing this ticket as a duplicate of that. As to whether this fix will also address the instances you may have observed prior to applying the intially flawedLU-1234patch, it may well do because the cache mechanism has been altered by this change.