[LU-3] Missing concurrent access control between statahead and VFS create/unlink operations causes dcache confusion Created: 16/Sep/10  Updated: 08/Oct/11  Resolved: 08/Oct/11

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Major
Reporter: nasf (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File b15962_b18.patch    
Severity: 3
Bugzilla ID: 15962
Epic: statahead
Rank (Obsolete): 10307

 Description   

As originally reported on Lustre Bugzilla:
https://bugzilla.lustre.org/show_bug.cgi?id=15962

This issue causes an LBUG on the client side like the following:

=====================
LustreError: 10521:0:(namei.c:826:ll_create_node()) ASSERTION(list_empty(&inode->i_dentry)) failed^M
LustreError: 10521:0:(namei.c:826:ll_create_node()) LBUG^M
Pid: 10521, comm: dir_create.sh^M
^M
Call Trace:^M
[<00000000f8e00640>] libcfs_debug_dumpstack+0x50/0x70 [libcfs]^M
[<00000000f8e00d9d>] lbug_with_loc+0x6d/0xd0 [libcfs]^M
[<00000000f8e0c1f6>] libcfs_assertion_failed+0x66/0x70 [libcfs]^M
[<00000000f927c762>] ll_create_nd+0xb62/0xbd0 [lustre]^M
[<00000000c0482b33>] vfs_create+0xc8/0x12f^M
[<00000000c04854e4>] open_namei+0x16a/0x5f9^M
[<00000000c0474b02>] do_filp_open+0x1c/0x31^M
[<00000000c0474b55>] do_sys_open+0x3e/0xae^M
[<00000000c0474bf2>] sys_open+0x16/0x18^M
[<00000000c0404f17>] syscall_call+0x7/0xb^M
=====================

It has been reported many times on lustre-1.8 in the racer test, and it can be reproduced locally by running racer repeatedly.



 Comments   
Comment by nasf (Inactive) [ 16/Sep/10 ]

This is a mixed patch containing:

1) introduce the rw_semaphore "lli_sa_rwsem" for synchronization between statahead and the VFS create operation.
2) introduce the spin_lock "lli_sa_lock" (dedicated) to replace "lli_lock" (shared) for statahead-related critical-section protection.
3) fix the statahead-related LBUG reported on the lustre-discuss mailing list recently: "(statahead.c:289:ll_sai_entry_fini()) LBUG".
4) drop unnecessary debug messages.
5) code cleanup.

This patch makes the current statahead on lustre-1.8 run more smoothly.

Comment by nasf (Inactive) [ 10/Oct/10 ]

The patch has been attached to Lustre bugzilla bug 15962, to be inspected by Oracle's engineers.

Comment by Kit Westneat (Inactive) [ 11/Jan/11 ]

Any progress on this? We have seen this happen at several sites and in fact have statahead disabled by default in our builds.

Comment by nasf (Inactive) [ 12/Jan/11 ]

The patch attached to bug 15962 is workable, but some inspectors dislike it because it is a hack, both the patch itself and the statahead implementation it builds on. So it may not land on the main branch. Currently, we are considering a larger rework of the statahead implementation to avoid those hacks.

If you want, you can apply the patch as a temporary solution.

Comment by nasf (Inactive) [ 09/Feb/11 ]

new patch is in inspection:
http://review.whamcloud.com/#change,2

Comment by Sarah Liu [ 06/Apr/11 ]

got this issue when running runracer on three clients:
build: lustre-master/rhel5-x86_64/#16
-------------------------------------------------------

Apr 6 19:33:18 client-21 kernel: Lustre: DEBUG MARKER: == runracer test 1: racer on clients: client-5-ib,client-21-ib,client-22-ib DURATION=120 ============= 19:30:15 (1302143415)
Apr 6 19:33:18 client-21 xinetd[3131]: EXIT: shell status=0 pid=11966 duration=0(sec)
Apr 6 19:33:18 client-21 xinetd[3131]: START: shell pid=11980 from=192.168.4.5
Apr 6 19:33:18 client-21 rshd[11981]: root@192.168.4.5 as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; LUSTRE="/usr/lib64/lustre" sh -c "DURATION=120 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer ");echo XXRETCODE:$?'
Apr 6 19:33:34 client-21 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.4.128@o2ib. The mds_getattr operation failed with -2
Apr 6 19:33:34 client-21 kernel: LustreError: Skipped 5 previous similar messages
Apr 6 19:33:53 client-21 kernel: LustreError: 21677:0:(file.c:2151:ll_inode_revalidate_fini()) failure -2 inode 144115205339619325
Apr 6 19:34:08 client-21 kernel: LustreError: 1339:0:(file.c:2151:ll_inode_revalidate_fini()) failure -116 inode 144115205322901335
Apr 6 19:34:19 client-21 kernel: LustreError: 10966:0:(file.c:2151:ll_inode_revalidate_fini()) failure -2 inode 144115205339624220
Apr 6 19:35:07 client-21 kernel: LustreError: 24950:0:(namei.c:743:ll_create_node()) ASSERTION(list_empty(&inode->i_dentry)) failed
Apr 6 19:35:07 client-21 kernel: LustreError: 24950:0:(namei.c:743:ll_create_node()) LBUG
Apr 6 19:35:07 client-21 kernel: Pid: 24950, comm: file_concat.sh
Apr 6 19:35:07 client-21 kernel:
Apr 6 19:35:07 client-21 kernel: Call Trace:
Apr 6 19:35:07 client-21 kernel: [<ffffffff886625f1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Apr 6 19:35:07 client-21 kernel: [<ffffffff88662b2a>] lbug_with_loc+0x7a/0xd0 [libcfs]
Apr 6 19:35:07 client-21 kernel: [<ffffffff8866d960>] cfs_tracefile_init+0x0/0x10a [libcfs]
Apr 6 19:35:07 client-21 kernel: [<ffffffff88aa00dc>] ll_create_nd+0x35c/0x9d0 [lustre]
Apr 6 19:35:07 client-21 kernel: [<ffffffff80022924>] d_alloc+0x174/0x1a9
Apr 6 19:35:07 client-21 kernel: [<ffffffff8003a579>] vfs_create+0xe6/0x158
Apr 6 19:35:07 client-21 kernel: [<ffffffff8001b0d9>] open_namei+0x19d/0x6d5
Apr 6 19:35:07 client-21 kernel: [<ffffffff80027533>] do_filp_open+0x1c/0x38
Apr 6 19:35:07 client-21 kernel: [<ffffffff80019e5d>] do_sys_open+0x44/0xbe
Apr 6 19:35:07 client-21 kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
Apr 6 19:35:07 client-21 kernel:

Comment by nasf (Inactive) [ 16/Jun/11 ]

For 2.6.38 or newer kernels, RCU is used for a significant part of the path walk; this is known as "rcu-walk" path walking. Even so, the parent's "i_mutex" is still required by the caller of "do_lookup()" to prevent concurrent lookup/create operations on the same name under the same parent.

======
mutex_lock(&dir->i_mutex);
/*
 * First re-do the cached lookup just in case it was created
 * while we waited for the directory semaphore, or the first
 * lookup failed due to an unrelated rename.
 *
 * This could use version numbering or similar to avoid unnecessary
 * cache lookups, but then we'd have to do the first lookup in the
 * non-racy way. However in the common case here, everything should
 * be hot in cache, so would it be a big win?
 */
dentry = d_lookup(parent, name);
if (likely(!dentry)) {
	dentry = d_alloc_and_lookup(parent, name, nd);
	mutex_unlock(&dir->i_mutex);
	if (IS_ERR(dentry))
		goto fail;
	goto done;
}

/*
 * Uhhuh! Nasty case: the cache was re-populated while
 * we waited on the semaphore. Need to revalidate.
 */
mutex_unlock(&dir->i_mutex);
======

So we still have to adjust statahead to make it controllable under concurrent access mode with other VFS create/unlink operations.

Comment by nasf (Inactive) [ 07/Jul/11 ]

The patch for removing statahead dcache hack:

http://review.whamcloud.com/#change,1208

Comment by nasf (Inactive) [ 08/Oct/11 ]

The race conditions caused by statahead will be fixed by the patch(es) for ORNL-7.

Generated at Sat Feb 10 01:02:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.