[LU-5260] Null pointer dereference in ll_cl_find Created: 26/Jun/14 Updated: 08/Jul/14 Resolved: 08/Jul/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
SLES11 SP3 2.6 clients, CentOS 2.6 servers with striped directories. |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 14674 | ||||||||
| Description |
|
During testing of 2.6 clients and servers (with striped directories), we lost a client to a null pointer dereference here:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Pid: 13639, comm: memfill3 Tainted: P 3.0.101-0.15.1_1.0502.8131-cray

This is an untouched copy of master from last week, no other patches. The most recent commit:

This is revive of git commit 83ae17df2bdce837e62473aec27c03d67312c8ea.
Signed-off-by: Bobi Jam <bobijam.xu@intel.com> |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 26/Jun/14 ] |
|
I'll make the dump available shortly. A few thoughts: This was encountered during testing of striped directories, though it is not clear whether that is relevant. |
| Comment by Patrick Farrell (Inactive) [ 26/Jun/14 ] |
|
Dump: |
| Comment by Jodi Levi (Inactive) [ 26/Jun/14 ] |
|
Jinshan, |
| Comment by Jinshan Xiong (Inactive) [ 26/Jun/14 ] |
|
It looks like the ll_cl_context{} list was corrupted. Please try the following debug patch and see what happens. |
| Comment by Patrick Farrell (Inactive) [ 26/Jun/14 ] |
|
Jinshan - We've only hit this once so far. Do you think that patch would be OK to temporarily commit to Cray's copy of master, so it's part of all of our master testing? If it's expected to have a large perf impact, I'd have to limit it to test runs looking for this issue. |
| Comment by Patrick Farrell (Inactive) [ 30/Jun/14 ] |
|
On another test run, we hit what I assume is the same bug; a CPU stall when searching the list in ll_cl_find. We're going to arrange a test run with Jinshan's debug patch. |
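
Illustrative note: the two symptoms reported so far (a NULL pointer dereference and a CPU stall while searching the list) are both consistent with walking a corrupted linked list. The sketch below shows, in generic terms, how a list search can detect that condition instead of crashing or spinning. It is not the debug patch and not the Lustre code: `struct node`, `struct ctx`, `ctx_find`, the `owner` key, and the `max_nodes` bound are all hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the Lustre definitions. */
struct node {
    struct node *next;
    struct node *prev;
};

struct ctx {
    struct node link;   /* list linkage; first member, so a cast is valid */
    const void *owner;  /* key the search matches on (assumed) */
};

enum find_status { FIND_OK, FIND_MISSING, FIND_CORRUPT };

/* Initialise an empty circular list: the head points at itself. */
static void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert n right after head. */
static void list_add(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Search the list for an entry whose owner matches.
 * A NULL link or a broken back-pointer is reported as corruption rather
 * than dereferenced, and the walk is bounded by max_nodes so a cycle that
 * never returns to head cannot stall the CPU.
 */
static enum find_status ctx_find(struct node *head, const void *owner,
                                 size_t max_nodes, struct ctx **out)
{
    struct node *pos = head->next;
    size_t seen = 0;

    *out = NULL;
    while (pos != head) {
        if (pos == NULL || pos->next == NULL || pos->next->prev != pos)
            return FIND_CORRUPT;
        if (++seen > max_nodes)
            return FIND_CORRUPT;

        struct ctx *c = (struct ctx *)pos;
        if (c->owner == owner) {
            *out = c;
            return FIND_OK;
        }
        pos = pos->next;
    }
    return FIND_MISSING;
}
```

A check like this costs a few pointer comparisons per node visited, so its overhead depends on how long the list typically is.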
| Comment by Jinshan Xiong (Inactive) [ 02/Jul/14 ] |
|
After taking a further look, I think I found the problem. I'm creating the patch and will share it with you shortly. |
| Comment by Jinshan Xiong (Inactive) [ 02/Jul/14 ] |
|
patch is at: http://review.whamcloud.com/10955 |
| Comment by Andreas Dilger [ 04/Jul/14 ] |
|
This was introduced by http://review.whamcloud.com/10503 . |
| Comment by Jodi Levi (Inactive) [ 08/Jul/14 ] |
|
Patch landed to Master. |