[LU-3411] Encountered a NULL pointer exception in function osd_read_prep Created: 28/May/13 Updated: 01/Oct/13 Resolved: 06/Aug/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | James A Simmons | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: |
OSS running Lustre 2.4.0-RC2 on a RHEL6.4 system. |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 8429 |
| Description |
|
While running a test application against the file system, one of the OSSes crashed and rebooted. We managed to get a crash dump, and early analysis determined that the crash occurred in osd_read_prep. Here is what running dmesg in crash showed:
[ 6304.888367] BUG: unable to handle kernel NULL pointer dereference at 0000000000000036 |
| Comments |
| Comment by Peter Jones [ 28/May/13 ] |
|
Alex, could you please comment on this one? Thanks, Peter
| Comment by Andreas Dilger [ 28/May/13 ] |
|
When you get a chance, can you please decode osd_read_prep+0x326/0x3b0 to a specific line number, and verify that 0x36 is a valid structure offset for a NULL pointer that is being accessed on that line. |
| Comment by James A Simmons [ 28/May/13 ] |
|
Uploaded the vmcore and debuginfo RPMs to ftp.whamcloud.com/uploads/
| Comment by James A Simmons [ 28/May/13 ] |
crash> l *(osd_read_prep+0x326)
0xffffffffa0d31776 is in osd_read_prep (/usr/src/debug/lustre-2.4.0/lustre/osd-ldiskfs/osd_io.c:930).
925 cfs_gettimeofday(&end);
926 timediff = cfs_timeval_sub(&end, &start, NULL);
927 lprocfs_counter_add(osd->od_stats, LPROC_OSD_GET_PAGE, timediff);
928
929 if (iobuf->dr_npages) {
930 rc = osd->od_fsops->fs_map_inode_pages(inode, iobuf->dr_pages,
931 iobuf->dr_npages,
932 iobuf->dr_blocks,
933 0, NULL);
934 rc = osd_do_bio(osd, inode, iobuf);
|
| Comment by Alex Zhuravlev [ 29/May/13 ] |
|
Given that iobuf has already been used in a few places, and the proximity of 0x36 to: I tend to think it's osd->od_fsops being NULL, though I'd expect struct fsfilt_operations to have exact offsets.
| Comment by Alex Zhuravlev [ 29/May/13 ] |
|
Just discussed with Oleg. This seems to be the result of incorrect error handling in osd_mount() around o->od_fsops = fsfilt_get_ops(mt_str(LDD_MT_LDISKFS)); because fsfilt_get_ops() can actually return -ENOENT. That gives us the -2 offset relative to 0x38 (0x38 - 2 = 0x36, the faulting address).
| Comment by Alex Zhuravlev [ 30/May/13 ] |
| Comment by Oleg Drokin [ 30/May/13 ] |
|
OK, so the patch fixes the crash. The pressing question I have: how did this path get triggered at all? It's not as if the fsfilt_ldiskfs.ko module was missing, right? So how come?
| Comment by James A Simmons [ 30/May/13 ] |
|
On the second attempt we did not have this problem, so fsfilt_ldiskfs.ko was there. Yes, it's very strange.
| Comment by James A Simmons [ 06/Aug/13 ] |
|
This patch was merged into the 2.4 branch. We can close this ticket now.
| Comment by Peter Jones [ 06/Aug/13 ] |
|
Landed for 2.5 |