[LU-2845] NULL pointer deref in osp_precreate_thread() Created: 20/Feb/13  Updated: 20/Mar/13  Resolved: 20/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Ned Bass Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: LB
Environment:

2.6.32-279.9.1.1chaos.ch5.1.x86_64


Severity: 3
Rank (Obsolete): 6885

 Description   

After reproducing LU-2843 to fill the changelog catalog, mounting the MDT results in a NULL pointer dereference. (This is a separate incident using a different MDT from LU-2844).

Lustre: Lustre: Build Version: 2.3.58-g4f361a2-CHANGED-2.6.32-279.9.1.1chaos.ch5.1.x86_64
LNet: Added LNI 192.168.122.43@tcp [8/256/0/180]
LNet: Accept secure, port 988
LDISKFS-fs (sda): recovery complete
LDISKFS-fs (sda): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: lustre-MDT0000: used disk, loading
Lustre: 4351:0:(mdt_lproc.c:380:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /home/bass6/lustre-release/lustre/tests/../utils/l_getidentity
LustreError: 4351:0:(osd_io.c:1075:osd_ldiskfs_write_record()) sda: error reading offset 0 (block 0): rc = -28
LustreError: 4351:0:(llog_osd.c:157:llog_osd_write_blob()) lustre-MDT0000-osd: error writing log record: rc = -28
LustreError: 4351:0:(osd_io.c:1075:osd_ldiskfs_write_record()) sda: error reading offset 0 (block 0): rc = -28
LustreError: 4351:0:(llog_osd.c:157:llog_osd_write_blob()) lustre-MDT0000-osd: error writing log record: rc = -28
LustreError: 4351:0:(llog_cat.c:371:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88007ce78900
LustreError: 4351:0:(osp_sync.c:1029:osp_sync_init()) lustre-OST0000-osc-MDT0000: can't initialize llog: rc = -28
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa1055b64>] osp_precreate_thread+0x1e4/0x1460 [osp]
PGD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/module/fid/initstate
CPU 2 
Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) ebtable_nat ebtables fuse autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate vhost_net macvtap macvlan tun kvm virtio_balloon virtio_net sg snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk sd_mod crc_t10dif pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 4358, comm: osp-pre-0 Tainted: P           ---------------    2.6.32-279.9.1.1chaos.ch5.1.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa1055b64>]  [<ffffffffa1055b64>] osp_precreate_thread+0x1e4/0x1460 [osp]
RSP: 0018:ffff88005ac43df0  EFLAGS: 00010246
RAX: ffff88005d266238 RBX: ffff88005b3b3800 RCX: 0000000000000000
RDX: ffff88005b3b39c0 RSI: 0000000000000282 RDI: 0000000000000282
RBP: ffff88005ac43f40 R08: ffff88005ac43e88 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88005b3b3918
R13: ffff88005ac43eb0 R14: 0000000000000000 R15: 0000000000000001
FS:  00007f126b13a700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process osp-pre-0 (pid: 4358, threadinfo ffff88005ac42000, task ffff88007ab3caa0)
Stack:
 000000000019f72b 0000001f0000000a 00000000000e525d 000000000019f738
<d> 0000001f0000000a ffff88005b3b3920 ffff88005b3b3918 ffff88005b3b3980
<d> ffff88007ab3caa0 ffff88007ab3caa0 ffff88007ab3caa0 ffff88007ab3caa0
Call Trace:
 [<ffffffff810602a0>] ? default_wake_function+0x0/0x20
 [<ffffffffa1055980>] ? osp_precreate_thread+0x0/0x1460 [osp]
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffffa1055980>] ? osp_precreate_thread+0x0/0x1460 [osp]
 [<ffffffffa1055980>] ? osp_precreate_thread+0x0/0x1460 [osp]
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 00 85 d2 0f 84 80 11 00 00 c7 83 00 01 00 00 00 00 00 00 c7 83 04 01 00 00 00 00 00 00 48 8b 83 a8 00 00 00 4c 8b b0 b8 0d 00 00 <49> 83 3e 00 0f 84 72 04 00 00 48 89 df 49 be 00 00 00 00 ff ff 
RIP  [<ffffffffa1055b64>] osp_precreate_thread+0x1e4/0x1460 [osp]
 RSP <ffff88005ac43df0>
CR2: 0000000000000000
crash> bt
PID: 4358   TASK: ffff88007ab3caa0  CPU: 2   COMMAND: "osp-pre-0"
 #0 [ffff88005ac439e0] machine_kexec at ffffffff8103283b
 #1 [ffff88005ac43a40] crash_kexec at ffffffff810ba492
 #2 [ffff88005ac43b10] oops_end at ffffffff81501e60
 #3 [ffff88005ac43b40] no_context at ffffffff81043bfb
 #4 [ffff88005ac43b90] __bad_area_nosemaphore at ffffffff81043e85
 #5 [ffff88005ac43be0] bad_area_nosemaphore at ffffffff81043f53
 #6 [ffff88005ac43bf0] __do_page_fault at ffffffff810446b1
 #7 [ffff88005ac43d10] do_page_fault at ffffffff81503e3e
 #8 [ffff88005ac43d40] page_fault at ffffffff815011f5
    [exception RIP: osp_precreate_thread+484]
    RIP: ffffffffa1055b64  RSP: ffff88005ac43df0  RFLAGS: 00010246
    RAX: ffff88005d266238  RBX: ffff88005b3b3800  RCX: 0000000000000000
    RDX: ffff88005b3b39c0  RSI: 0000000000000282  RDI: 0000000000000282
    RBP: ffff88005ac43f40   R8: ffff88005ac43e88   R9: 00000000ffffffff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff88005b3b3918
    R13: ffff88005ac43eb0  R14: 0000000000000000  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88005ac43f48] kernel_thread at ffffffff8100c14a
crash>
(gdb) l *(osp_precreate_thread+0x1e4)
0xeb84 is in osp_precreate_thread (/home/bass6/lustre-release/lustre/osp/osp_precreate.c:926).
921                             d->opd_new_connection = 0;
922                             d->opd_got_disconnected = 0;
923                             break;
924                     }
925
926                     if (d->opd_obd->u.cli.cl_seq->lcs_exp == NULL) {
927                             /* Get new sequence for client first */
928                             LASSERT(d->opd_exp != NULL);
929                             d->opd_obd->u.cli.cl_seq->lcs_exp =
930                             class_export_get(d->opd_exp);
(gdb) 


 Comments   
Comment by Peter Jones [ 21/Feb/13 ]

Bobijam

could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 22/Feb/13 ]

patch tracking at http://review.whamcloud.com/5508

commit message
LU-2845 osp: fix osp precreate thread init error handling
    
    If osp device hasn't connected OST, osp_precreate_thread() quitting
    path should heed to that and bypass the normal quitting path.

Comment by Minh Diep [ 19/Mar/13 ]

hit this on lustre-master build #1324

LustreError: 11-0: widelust-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.^M
LustreError: 32617:0:(osd_io.c:997:osd_ldiskfs_read()) widelust-MDT0000/: can't read 32@32 on ino 113: rc = 0^M
LustreError: 32617:0:(llog_osd.c:1221:llog_osd_get_cat_list()) widelust-MDT0000-osd: error reading CATALOGS: rc = -14^M
LustreError: 32617:0:(osp_sync.c:944:osp_sync_llog_init()) widelust-OST0001-osc-MDT0000: can't get id from catalogs: rc = -14^M
LustreError: 32617:0:(osp_sync.c:1044:osp_sync_init()) widelust-OST0001-osc-MDT0000: can't initialize llog: rc = -14^M
BUG: unable to handle kernel NULL pointer dereference at (null)^M
IP: [<ffffffffa0f6ab34>] osp_precreate_thread+0x1e4/0x1460 [osp]^M
PGD 0 ^M
Oops: 0000 1 SMP ^M
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map^M
CPU 15 ^M
Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa microcode serio_raw mlx4_ib ib_mad ib_core mlx4_en mlx4_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core ses enclosure sg igb dca ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]^M
^M
Pid: 32652, comm: osp-pre-1 Not tainted 2.6.32-279.19.1.el6_lustre.gc4681d8.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH^M
RIP: 0010:[<ffffffffa0f6ab34>] [<ffffffffa0f6ab34>] osp_precreate_thread+0x1e4/0x1460 [osp]^M
RSP: 0018:ffff880ef822fdf0 EFLAGS: 00010246^M
RAX: ffff880ef820c2f8 RBX: ffff880ef81b4800 RCX: 0000000000000002^M
RDX: ffff880ef81b49c0 RSI: 0000000000000282 RDI: 0000000000000282^M
RBP: ffff880ef822ff40 R08: ffff880ef822fe88 R09: 00000000ffffffff^M
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880ef81b4918^M
R13: ffff880ef822feb0 R14: 0000000000000000 R15: 0000000000000001^M
FS: 00007f73dc6df700(0000) GS:ffff88085c4e0000(0000) knlGS:0000000000000000^M
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M
CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000000006e0^M
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000^M
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400^M
Process osp-pre-1 (pid: 32652, threadinfo ffff880ef822e000, task ffff880ef8204ae0)^M
Stack:^M
0000000000000000 0000000000000000 0000000000000000 0000000000000000^M
<d> 0000000000000000 ffff880ef81b4920 ffff880ef81b4918 ffff880ef81b4980^M
<d> ffff880ef8204ae0 ffff880ef8204ae0 ffff880ef8204ae0 ffff880ef8204ae0^M
Call Trace:^M
[<ffffffff8105fa40>] ? default_wake_function+0x0/0x20^M
[<ffffffffa0f6a950>] ? osp_precreate_thread+0x0/0x1460 [osp]^M
[<ffffffff8100c0ca>] child_rip+0xa/0x20^M
[<ffffffffa0f6a950>] ? osp_precreate_thread+0x0/0x1460 [osp]^M
[<ffffffffa0f6a950>] ? osp_precreate_thread+0x0/0x1460 [osp]^M
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20^M
Code: 00 85 d2 0f 84 80 11 00 00 c7 83 00 01 00 00 00 00 00 00 c7 83 04 01 00 00 00 00 00 00 48 8b 83 a8 00 00 00 4c 8b b0 b8 0d 00 00 <49> 83 3e 00 0f 84 72 04 00 00 48 89 df 49 be 00 00 00 00 ff ff ^M
RIP [<ffffffffa0f6ab34>] osp_precreate_thread+0x1e4/0x1460 [osp]^M
RSP <ffff880ef822fdf0>^M
CR2: 0000000000000000^M

Comment by Peter Jones [ 20/Mar/13 ]

Landed for 2.4

Generated at Sat Feb 10 06:19:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.