LU-11175 (Lustre)

NULL pointer dereference in idle_timeout_show in recovery-small test 57


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.12.0
    • Fix Version/s: Lustre 2.12.0
    • Labels: None
    • Severity: 3

    Description

      This is in code added by https://review.whamcloud.com/32719 for LU-8066, but the regression is likely actually due to the landing of LU-7236 (idle import disconnection): the code blindly assumes that the client import (cli->cl_import) is in place when it is not. There are probably many other places that make the same assumption.

      Is an auditing campaign needed to find the other places that make this assumption?

      [ 1930.731848] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 23:58:28 (1532491108)
      [ 1931.971139] BUG: unable to handle kernel NULL pointer dereference at 0000000000000338
      [ 1931.972794] IP: [<ffffffffa075326f>] idle_timeout_show+0x1f/0x30 [osc]
      [ 1931.973668] PGD 6fb39067 PUD 78ab6067 PMD 0 
      [ 1931.974182] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 1931.974726] Modules linked in: loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) jbd2 mbcache lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) dm_flakey dm_mod libcfs(OE) crc_t10dif crct10dif_generic crct10dif_common rpcsec_gss_krb5 ata_generic pata_acpi ttm drm_kms_helper drm i2c_piix4 ata_piix pcspkr i2c_core virtio_balloon serio_raw virtio_console virtio_blk libata floppy ip_tables
      [ 1931.979448] CPU: 3 PID: 18210 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #2
      [ 1931.980483] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 1931.981001] task: ffff880046fc8800 ti: ffff8800acb88000 task.ti: ffff8800acb88000
      [ 1931.981968] RIP: 0010:[<ffffffffa075326f>]  [<ffffffffa075326f>] idle_timeout_show+0x1f/0x30 [osc]
      [ 1931.982980] RSP: 0018:ffff8800acb8bde0  EFLAGS: 00010246
      [ 1931.983507] RAX: 0000000000000000 RBX: ffff880070915800 RCX: ffffffffa0753250
      [ 1931.984152] RDX: 0000000000000000 RSI: ffffffffa0779433 RDI: ffff88008d32c000
      [ 1931.984769] RBP: ffff8800acb8bde0 R08: ffff880079a49738 R09: 0000000000000000
      [ 1931.985332] R10: 0000000000001000 R11: 0000000000000000 R12: ffffffffa0372d60
      [ 1931.985867] R13: ffff8800acb8bf18 R14: 0000000000000001 R15: ffff880070915800
      [ 1931.986431] FS:  00007f1bcb331740(0000) GS:ffff8800bc980000(0000) knlGS:0000000000000000
      [ 1931.987404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1931.987919] CR2: 0000000000000338 CR3: 0000000027d70000 CR4: 00000000000006e0
      [ 1931.988488] Call Trace:
      [ 1931.988980]  [<ffffffffa0318ff6>] lustre_attr_show+0x16/0x20 [obdclass]
      [ 1931.989521]  [<ffffffff8129a24c>] sysfs_kf_seq_show+0xcc/0x1e0
      [ 1931.990037]  [<ffffffff81298953>] kernfs_seq_show+0x23/0x30
      [ 1931.990571]  [<ffffffff81234bd5>] seq_read+0x115/0x3f0
      [ 1931.991072]  [<ffffffff8129951d>] kernfs_fop_read+0xfd/0x170
      [ 1931.991610]  [<ffffffff8120d91c>] vfs_read+0x9c/0x170
      [ 1931.992115]  [<ffffffff8120e7df>] SyS_read+0x7f/0xf0
      [ 1931.992636]  [<ffffffff8178383b>] ? system_call_after_swapgs+0xc8/0x160
      [ 1931.993166]  [<ffffffff817838e9>] system_call_fastpath+0x16/0x1b
      [ 1931.993742]  [<ffffffff8178383b>] ? system_call_after_swapgs+0xc8/0x160
      [ 1931.994299] Code: e8 47 07 93 e0 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 d0 48 8b 97 30 f4 ff ff 48 c7 c6 33 94 77 a0 48 89 c7 31 c0 48 89 e5 <8b> 92 38 03 00 00 e8 86 c4 c6 e0 5d 48 98 c3 66 90 0f 1f 44 00 
      [ 1931.996406] RIP  [<ffffffffa075326f>] idle_timeout_show+0x1f/0x30 [osc]
      
      (gdb) l *(idle_timeout_show+0x1f)
      0xe29f is in idle_timeout_show (/home/green/git/lustre-release/lustre/osc/lproc_osc.c:618).
      613	{
      614		struct obd_device *obd = container_of(kobj, struct obd_device,
      615						      obd_kset.kobj);
      616		struct client_obd *cli = &obd->u.cli;
      617	
      618		return sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout);
      619	}
      (gdb) p/x &((struct obd_import *)0)->imp_idle_timeout
      $3 = 0x338
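      The p/x command above computes the offset of imp_idle_timeout inside struct obd_import by pretending the structure starts at address 0. When the base pointer really is NULL, the address the load touches is just that offset, which is why the oops reports CR2 = 0000000000000338. The C sketch below shows that arithmetic; fake_obd_import and idle_timeout_load_address() are hypothetical names, and the padding is chosen only to reproduce the 0x338 offset from the gdb output, not to mirror the real structure layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct obd_import: the padding array only
 * places imp_idle_timeout at offset 0x338, matching the gdb output.
 * The real layout is different and depends on the build.
 */
struct fake_obd_import {
	char         imp_other_fields[0x338]; /* placeholder for earlier members */
	unsigned int imp_idle_timeout;
};

/*
 * Address read by "imp->imp_idle_timeout": the base pointer plus the
 * member offset. With imp == NULL (base 0) the result is the offset
 * itself, i.e. the small faulting address seen in the oops.
 */
static uintptr_t idle_timeout_load_address(uintptr_t imp_base)
{
	return imp_base + offsetof(struct fake_obd_import, imp_idle_timeout);
}
```

      A faulting address this close to zero is therefore a strong hint that the structure pointer, not the member, is the problem.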
      

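      So it looks like cli->cl_import is NULL here: the faulting address (CR2 = 0000000000000338) equals the offset of imp_idle_timeout within struct obd_import, and RDX, the register dereferenced by the faulting instruction, is 0.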


            People

              Assignee: Alex Zhuravlev (bzzz)
              Reporter: Oleg Drokin (green)
