[LU-815] "BUG: unable to handle kernel NULL pointer dereference" in lprocfs_rd_import() Created: 02/Nov/11 Updated: 08/Sep/16 Resolved: 15/Dec/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Lustre Bull | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre-2.0, RHEL6.0 |
||
| Severity: | 3 |
| Bugzilla ID: | 24,449 |
| Rank (Obsolete): | 6530 |
| Description |
|
We've been hitting this problem for several months when reading "/proc/fs/lustre/osc/<OST>/import". I saw there may be a related patch (BZ#22032 - WC's git: 839280926956f16552194fe803ba21096770ebc4) which was integrated for Lustre-2.1. What do you think of this? If 22032's patch is not related, does this sound like a known problem to you? ============================================================================== Pid: 29413, comm: grep Not tainted 2.6.32-30.el6.Bull.14.x86_64 #1 bullx super-node And further in-depth analysis clearly indicates this problem comes from a race between a process reading Thanks, |
| Comments |
| Comment by Andreas Dilger [ 02/Nov/11 ] |
|
Looking at git for 839280926956f16552194fe803ba21096770ebc4, it definitely seems related, but "git describe" shows that this should be included in v2_0_0-rc1a, which means it should already be in the Lustre 2.0.0 release. Are you running the official 2.0.0 release, or some earlier build? The other possibility is that this is related to the patch in http://review.whamcloud.com/1544 ( |
| Comment by Diego Moreno (Inactive) [ 03/Nov/11 ] |
|
Actually I don't think the patch in 839280926956f16552194fe803ba21096770ebc4 is in the official 2.0.0 release. In git we can see this patch was introduced between the 2.0.52.0 and 2.0.53.0 tags, so the result shown by git describe is very strange. We're going to integrate this patch into our 2.0.0 (which is the official release) and if we still have the problem we'll try with Thanks Andreas |
| Comment by Peter Jones [ 15/Dec/11 ] |
|
Any feedback on this ticket? Have you been able to try the suggested fix yet? If not, when do you expect to be able to do so? |
| Comment by Sebastien Buisson (Inactive) [ 15/Dec/11 ] |
|
Yes, we integrated the proposed patch, and delivered it to the customer. But we do not have any feedback yet. |
| Comment by Andreas Dilger [ 15/Dec/11 ] |
|
I'm going to mark this fixed in 2.1.0. Please reopen if the customer hits this problem again. |
| Comment by Bruno Faccini (Inactive) [ 04/Oct/12 ] |
|
Hmm, even when running Lustre 2.1.1 (including the fix for BZ#22032) we can still reproduce the same crash/Oops! So I would like to re-open this JIRA ... Again the crash is due to imp->imp_connection being NULL and being dereferenced in lprocfs_rd_import(). So I am back to my earlier fix idea, which Bull R&D did not choose at the time in favor of BZ#22032: imp->imp_connection must also be accessed under imp->imp_lock protection, and a NULL value must be detected. Patch against b2_1 is at http://review.whamcloud.com/4187 |
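For readers following the fix idea, here is a minimal user-space sketch of the pattern described in this comment: read the connection pointer only while holding the import lock, and handle the NULL case instead of dereferencing it. This is not the code from http://review.whamcloud.com/4187. The struct names, the c_remote_name field, the output format, and the use of a pthread mutex in place of the kernel spinlock are all assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the Lustre structures. The real
 * import structure has many more fields and uses a kernel spinlock. */
struct sketch_connection {
    const char *c_remote_name;   /* assumed field, for illustration only */
};

struct sketch_import {
    pthread_mutex_t imp_lock;                  /* stands in for imp->imp_lock */
    struct sketch_connection *imp_connection;  /* may be NULL at read time */
};

/* Format the current connection into buf. The pointer is read and used only
 * while imp_lock is held, and a NULL connection is reported rather than
 * dereferenced. Returns the value returned by snprintf(). */
static int sketch_rd_import(struct sketch_import *imp, char *buf, size_t len)
{
    int rc;

    pthread_mutex_lock(&imp->imp_lock);
    if (imp->imp_connection == NULL)
        rc = snprintf(buf, len, "current_connection: <none>\n");
    else
        rc = snprintf(buf, len, "current_connection: %s\n",
                      imp->imp_connection->c_remote_name);
    pthread_mutex_unlock(&imp->imp_lock);

    return rc;
}
```

The check only closes the race if every writer that sets or clears imp_connection also takes the same lock; whether that holds on a given branch has to be verified against the actual code.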
| Comment by Bruno Faccini (Inactive) [ 05/Oct/12 ] |
|
Oops, thanks to Andreas for asking me to review the code and patches top-down, starting with the master branch! And bingo, a similar patch has already been applied starting with b2_3, it comes from JIRA Patch on master is at http://review.whamcloud.com/2995, so it needs to be cherry-picked from there to be applied to the b2_1/b2_2 branches. In the meantime, should I "Abandon" my change on Gerrit and point to the master change? |