[LU-3] Missing concurrent access control between statahead and VFS create/unlink operations causes dcache confusion Created: 16/Sep/10 Updated: 08/Oct/11 Resolved: 08/Oct/11 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Bugzilla ID: | 15962 |
| Epic: | statahead |
| Rank (Obsolete): | 10307 |
| Description |
|
As originally reported on Lustre Bugzilla: This issue causes an LBUG on the client side, as follows: ===================== It has been reported many times on lustre-1.8 in the racer test. It can be reproduced locally by running the racer test repeatedly. |
| Comments |
| Comment by nasf (Inactive) [ 16/Sep/10 ] |
|
This is a mixed patch containing: 1) the introduction of an rw_semaphore, "lli_sa_rwsem", for synchronization between statahead and the VFS create operation. This patch will make the current statahead on lustre-1.8 run more smoothly. |
| Comment by nasf (Inactive) [ 10/Oct/10 ] |
|
The patch has been attached to Lustre bugzilla bug 15962, to be inspected by Oracle's engineers. |
| Comment by Kit Westneat (Inactive) [ 11/Jan/11 ] |
|
Any progress on this? We have seen this happen at several sites and in fact have statahead disabled by default in our builds. |
| Comment by nasf (Inactive) [ 12/Jan/11 ] |
|
The patch attached to bug 15962 works, but some inspectors do not like it because they consider it a hack, and they say the same of the statahead implementation itself, not only of the patch. So it may not land on the main branch. Currently, we are considering a major rework of the statahead implementation to avoid those hacks. In the meantime, if you want, you can apply the patch as a temporary solution. |
| Comment by nasf (Inactive) [ 09/Feb/11 ] |
|
New patch is in inspection: |
| Comment by Sarah Liu [ 06/Apr/11 ] |
|
got this issue when running runracer on three clients: Apr 6 19:33:18 client-21 kernel: Lustre: DEBUG MARKER: == runracer test 1: racee |
| Comment by nasf (Inactive) [ 16/Jun/11 ] |
|
For 2.6.38 or newer kernels, RCU is used for a significant part of the path walk. This is known as "rcu-walk" path walking. Even so, the caller of "do_lookup()" still needs the parent's "i_mutex" to prevent concurrent lookup/create operations on the same name under the same parent. ======
So we still have to adjust statahead so that it is properly synchronized with concurrent VFS create/unlink operations. |
| Comment by nasf (Inactive) [ 07/Jul/11 ] |
|
The patch for removing statahead dcache hack: |
| Comment by nasf (Inactive) [ 08/Oct/11 ] |
|
The race conditions caused by statahead will be fixed by the patch(es) for ORNL-7 |