[LU-7061] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: osd_scrub_refresh_mapping+0x39d/0x410
Created: 31/Aug/15 Updated: 04/Dec/19 Resolved: 10/Sep/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0, Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andriy Skulysh | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [] osd_scrub_refresh_mapping+0x39d/0x410 [osd_ldiskfs]
PGD 424f4067 PUD 379e9067 PMD 0
Oops: 0000 1 SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:0b.0/virtio3/block/vdb/queue/scheduler
CPU 0
Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Pid: 7379, comm: mount.lustre Not tainted 2.6.32-431.17.1.el6_lustreb_neo_stable_us_unlabeled #1 Red Hat KVM
RIP: 0010:[] [] osd_scrub_refresh_mapping+0x39d/0x410 [osd_ldiskfs]
RSP: 0018:ffff88005b9af858 EFLAGS: 00010202
RAX: 00000000ffffffff RBX: ffff88005b8b8000 RCX: 0000000000000000
RDX: ffffffffa0b97b49 RSI: ffffffffa0b9c2e8 RDI: ffffffffa0bb4820
RBP: ffff88005b9af8b8 R08: 20737365636f7250 R09: 0000000000000001
R10: 20737365636f7250 R11: 0a64657265746e65 R12: 00000000ffffffe2
R13: 0000000000000001 R14: ffffffffffffffe2 R15: ffffffffa0b97f88
FS: 00007f21e28ec700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000004 CR3: 000000005df01000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.lustre (pid: 7379, threadinfo ffff88005b9ae000, task ffff88005b9caaa0)
Stack:
 ffffffffa0b9ee9c ffff88005b8b2000 ffff88005b9af898 ffffffffffffffe2
 ffff88005b9af801 0000000000000000 ffff8800414c43c8 ffff88005b8b8000
 ffffffffa0b97fb0 ffff88005b8b2000 ffff8800414c43c0 ffff88005d5c2d80
Call Trace:
 [] osd_scrub_setup+0xe25/0xf30 [osd_ldiskfs]
 [] ? lfsck_key_init+0xd9/0x190 [lfsck]
 [] osd_device_alloc+0x717/0x990 [osd_ldiskfs]
 [] obd_setup+0x1bf/0x290 [obdclass]
 [] class_setup+0x208/0x870 [obdclass]
 [] class_process_config+0xc5c/0x1ac0 [obdclass]
 [] ? libcfs_log_return+0x28/0x40 [libcfs]
 [] ? lustre_cfg_new+0x40b/0x6f0 [obdclass]
 [] do_lcfg+0x158/0x450 [obdclass]
 [] lustre_start_simple+0x94/0x200 [obdclass]
 [] server_fill_super+0x1061/0x1b92 [obdclass]
 [] ? libcfs_log_return+0x28/0x40 [libcfs]
 [] lustre_fill_super+0x1d8/0x550 [obdclass]
 [] ? lustre_fill_super+0x0/0x550 [obdclass]
 [] get_sb_nodev+0x5f/0xa0
 [] lustre_get_sb+0x25/0x30 [obdclass]
 [] vfs_kern_mount+0x7b/0x1b0
 [] do_kern_mount+0x52/0x130
 [] do_mount+0x2fb/0x930
 [] sys_mount+0x90/0xe0
 [] system_call_fastpath+0x16/0x1b
Code: 48 c7 c7 20 48 bb a0 c7 05 71 d6 02 00 75 00 00 00 48 c7 05 72 d6 02 00 00 00 00 00 c7 05 60 d6 02 00 00 00 00 10 44 89 74 24 18 <8b> 41 04 48 8b 53 28 45 8b 4f 08 4d 8b 07 89 44 24 10 8b 01 48
RIP [] osd_scrub_refresh_mapping+0x39d/0x410 [osd_ldiskfs] |
| Comments |
| Comment by Gerrit Updater [ 31/Aug/15 ] |
|
Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/16138 |
| Comment by Andriy Skulysh [ 31/Aug/15 ] |
| Comment by Peter Jones [ 02/Sep/15 ] |
|
Fan Yong, could you please review this proposed fix? Thanks, Peter |
| Comment by nasf (Inactive) [ 03/Sep/15 ] |
|
The patch looks good. The BUG is caused by accessing a NULL pointer in the osd_journal_start_sb() failure handler. But I also want to know what caused osd_journal_start_sb() to fail before the NULL pointer access, maybe EROFS? Andriy, would you please show the new OI scrub debug logs after applying your patch? Thanks! |
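The register dump in the description can be checked against the EROFS guess. The sketch below is an illustrative reading of the Oops, not something confirmed in this ticket. It assumes that R12/R14 hold the return code of the failed call, which the trace does not prove. The helper `as_signed` is hypothetical and introduced only for this example. It also checks that the faulting address in CR2 equals RCX + 4, which matches the marked instruction bytes `<8b> 41 04` (a 32-bit load from offset 4 of the pointer in RCX).

```python
import errno


def as_signed(value: int, bits: int) -> int:
    """Interpret an unsigned register value as a two's-complement integer."""
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


# Register values copied from the Oops in the description.
r12 = 0x00000000FFFFFFE2  # low 32 bits look like a negative int
r14 = 0xFFFFFFFFFFFFFFE2  # the same value sign-extended to 64 bits
rcx = 0x0000000000000000  # base pointer used by the faulting load
cr2 = 0x0000000000000004  # faulting address reported by the CPU

# Assumption: R12/R14 carry the error code of the failed call.
rc32 = as_signed(r12 & 0xFFFFFFFF, 32)
rc64 = as_signed(r14, 64)

# Map the positive errno number to its symbolic name on this platform.
name = errno.errorcode.get(-rc64, "unknown")
print(f"rc32={rc32} rc64={rc64} errno name={name}")

# A NULL base pointer plus a field offset of 4 gives the CR2 value.
fault = rcx + 4
print(f"fault address={fault:#x} matches CR2: {fault == cr2}")
```

Both registers decode to -30, and errno 30 is EROFS on Linux, so the dump is at least consistent with a read-only filesystem error preceding the NULL dereference. The debug logs requested above would still be needed to confirm it.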
| Comment by Gerrit Updater [ 10/Sep/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16138/ |
| Comment by Joseph Gmitter (Inactive) [ 10/Sep/15 ] |
|
Landed for 2.8.0 |
| Comment by Niu Yawei (Inactive) [ 10/Oct/15 ] |
|
Hit this again: https://testing.hpdd.intel.com/test_sets/5d97088a-6e55-11e5-b983-5254006e85c2 |
| Comment by Gerrit Updater [ 03/Jun/16 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/20620 |
| Comment by Gerrit Updater [ 15/Aug/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20620/ |