[LU-5153] LustreError: 14404:0:(fld_index.c:176:fld_index_create()) ASSERTION( mutex_is_locked(&fld->lsf_lock) ) failed: Created: 06/Jun/14 Updated: 16/Jun/14 Resolved: 16/Jun/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne, dne2, patch | ||
| Environment: |
Large DNE system on CentOS, upgrading from 2.5 with remote directories to 2.6/master. Occurred on some MDSes when trying to start the MDTs. |
||
| Severity: | 3 |
| Rank (Obsolete): | 14218 |
| Description |
|
When trying to start our large 2.5 DNE test bed with 2.6, we hit the following assertion: LustreError: 14404:0:(fld_index.c:176:fld_index_create()) ASSERTION( mutex_is_locked(&fld->lsf_lock) ) failed:

Looking at the assertion and other call chains to this function, I see that the mutex in question is usually taken around calls to:

The problematic call chain was introduced by this commit:

1. Add local FLDB to each MDT, so OSD/OUT can check whether
2. OSD will only do local lookup when checking remote FID.
3. During upgrade, MDTn(n != 0) needs to retrieve its fldb
4. MDT should also use LWP(instead of OSP) to communicate

Signed-off-by: wang di <di.wang@intel.com>

I will generate a patch. |
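The precondition behind the assertion can be illustrated with a small user-space sketch. This is not Lustre code and not the patch for this ticket: `demo_fld`, `demo_index_create()` and the other `demo_*` names are hypothetical, and the `lock_held` flag is only a stand-in for the kernel's `mutex_is_locked()`, which pthreads does not provide. It shows a function that asserts its caller holds the lock, and a caller that takes the lock around the call, as the other call chains reportedly do.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the server-side FLD structure (not Lustre's). */
struct demo_fld {
    pthread_mutex_t lock;       /* plays the role of lsf_lock */
    bool            lock_held;  /* user-space substitute for mutex_is_locked() */
    int             nr_entries; /* number of index entries created so far */
};

static void demo_fld_init(struct demo_fld *fld)
{
    pthread_mutex_init(&fld->lock, NULL);
    fld->lock_held = false;
    fld->nr_entries = 0;
}

static void demo_fld_lock(struct demo_fld *fld)
{
    pthread_mutex_lock(&fld->lock);
    fld->lock_held = true;
}

static void demo_fld_unlock(struct demo_fld *fld)
{
    fld->lock_held = false;
    pthread_mutex_unlock(&fld->lock);
}

/*
 * Must be called with fld->lock held. The assert mirrors the kind of
 * precondition check that fired in this ticket: a call chain that skips
 * the lock aborts here instead of silently racing.
 */
static int demo_index_create(struct demo_fld *fld)
{
    assert(fld->lock_held);
    return ++fld->nr_entries;
}

/* A caller that satisfies the precondition by locking around the call. */
static int demo_index_create_locked(struct demo_fld *fld)
{
    int rc;

    demo_fld_lock(fld);
    rc = demo_index_create(fld);
    demo_fld_unlock(fld);
    return rc;
}
```

Calling `demo_index_create()` directly, without `demo_fld_lock()`, trips the assert, which is analogous to what the new upgrade path did on the non-primary MDSes.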
| Comments |
| Comment by Patrick Farrell (Inactive) [ 06/Jun/14 ] |
|
Patch here: |
| Comment by Patrick Farrell (Inactive) [ 06/Jun/14 ] |
|
A bit of further investigation shows this is happening when upgrading 2.5 DNE systems to 2.6. This LBUG occurs on all of the non-primary MDSes. (This also matches what's expected in the code, as this code is only called when updating those non-primary MDSes.) |
| Comment by Peter Jones [ 06/Jun/14 ] |
|
Di, could you please review this patch? Thanks, Peter |
| Comment by Jodi Levi (Inactive) [ 16/Jun/14 ] |
|
Patch landed to Master. Please reopen ticket if more work is needed. |