Lustre / LU-10763

Use after free in lmv_striped_read_page


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0
    • Severity: 3

    Description

      After the issues reported in LU-10762, winding down the sanity test leads to the following crash:

      [13407.392152] Lustre: DEBUG MARKER: == sanity test complete, duration 13357 sec ========================================================== 18:41:43 (1520206903)
      [13415.646714] LustreError: 1942:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x280002b10:0x5:0x0] without object type: valid = 0x100000001
      [13415.648323] LustreError: 1942:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
      [13416.174418] LustreError: 1966:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x280002b10:0x10:0x0] without object type: valid = 0x100000001
      [13416.182023] LustreError: 1966:0:(namei.c:87:ll_set_inode()) Skipped 2 previous similar messages
      [13416.183013] LustreError: 1966:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
      [13416.184665] LustreError: 1966:0:(llite_lib.c:2355:ll_prep_inode()) Skipped 2 previous similar messages
      [13417.245051] LustreError: 2011:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x280002b10:0x26:0x0] without object type: valid = 0x100000001
      [13417.257155] LustreError: 2011:0:(namei.c:87:ll_set_inode()) Skipped 6 previous similar messages
      [13417.258625] LustreError: 2011:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
      [13417.259615] LustreError: 2011:0:(llite_lib.c:2355:ll_prep_inode()) Skipped 6 previous similar messages
      [13430.306910] LustreError: 2312:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x200000403:0x25:0x0] without object type: valid = 0x100000001
      [13430.317961] LustreError: 2312:0:(llite_lib.c:2355:ll_prep_inode()) new_inode -fatal: rc -12
      [13430.954786] BUG: unable to handle kernel paging request at ffff8800b4f19fe0
      [13430.963030] IP: [<ffffffffa02d5fee>] lmv_striped_read_page.isra.30+0x33b/0x5f9 [lmv]
      [13430.964636] PGD 2e75067 PUD 33fa01067 PMD 33f859067 PTE 80000000b4f19060
      [13430.965335] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [13430.966026] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper drm i2c_piix4 ata_piix virtio_balloon pcspkr serio_raw virtio_blk i2c_core virtio_console libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
      [13430.971838] CPU: 11 PID: 2316 Comm: ll_sa_1811 Tainted: P        W  OE  ------------   3.10.0-debug #2
      [13430.973531] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13430.974112] task: ffff8802de2947c0 ti: ffff880284680000 task.ti: ffff880284680000
      [13430.975001] RIP: 0010:[<ffffffffa02d5fee>]  [<ffffffffa02d5fee>] lmv_striped_read_page.isra.30+0x33b/0x5f9 [lmv]
      [13430.975946] RSP: 0018:ffff880284683bf0  EFLAGS: 00010282
      [13430.976415] RAX: ffff88027197c018 RBX: ffff8800899f2fd0 RCX: 0000000000000073
      [13430.977028] RDX: 0000000000000003 RSI: ffffffffa02d7c65 RDI: 0000000000000001
      [13430.978421] RBP: ffff880284683c60 R08: 000000000000002e R09: 0000000280000403
      [13430.979076] R10: 0000000000000000 R11: 0000000000000025 R12: 0000000000000001
      [13430.979630] R13: ffff8802cf4adfc8 R14: ffff8800b4f19fd0 R15: ffff8802cf4adf80
      [13430.980111] FS:  0000000000000000(0000) GS:ffff88033e560000(0000) knlGS:0000000000000000
      [13430.981017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [13430.982254] CR2: ffff8800b4f19fe0 CR3: 0000000297ddd000 CR4: 00000000000006e0
      [13430.982933] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [13430.983621] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [13430.984306] Stack:
      [13430.992434]  ffff880284683cd0 ffff8800899f2018 ffffea0002267c80 ffff8800899f2000
      [13430.993756]  0000000000000068 ffff8801bba89e00 00088802a5468000 ffff8800899f2fa0
      [13430.995084]  0000000000000030 ffff8801bba89e00 ffff880284683cc8 2f9bc3b7c9eb6d8e
      [13430.996367] Call Trace:
      [13430.996967]  [<ffffffffa02bfbbb>] lmv_read_page+0x32b/0x3a0 [lmv]
      [13430.997958]  [<ffffffffa16543d8>] ll_get_dir_page+0xc8/0x2d0 [lustre]
      [13430.998730]  [<ffffffffa1690cf0>] ? ll_dom_lock_cancel+0x390/0x390 [lustre]
      [13430.999463]  [<ffffffffa16a7cb3>] ll_statahead_thread+0x293/0x11d0 [lustre]
      [13431.000167]  [<ffffffff810af8e4>] ? finish_task_switch+0x44/0x180
      [13431.000858]  [<ffffffff810b7cc0>] ? wake_up_state+0x20/0x20
      [13431.001621]  [<ffffffffa16a7a20>] ? ll_agl_thread+0x4d0/0x4d0 [lustre]
      [13431.002317]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [13431.002951]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [13431.003637]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [13431.004443]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [13431.005131] Code: ff ff ff e9 8b 01 00 00 49 63 c4 44 89 e2 4c 89 ff 48 ff c0 48 c1 e0 05 49 8d 74 07 08 4c 8b 76 10 e8 55 f5 ff ff 4d 85 f6 74 cd <49> 8b 46 10 49 89 47 18 45 8b 6e 18 66 45 85 ed 75 24 41 0f b7 
      [13431.007816] RIP  [<ffffffffa02d5fee>] lmv_striped_read_page.isra.30+0x33b/0x5f9 [lmv]
      

      I have several samples of this.

      (gdb) l *(lmv_striped_read_page+0x33b)
      0xb64 is in lmv_striped_read_page (/home/green/git/lustre-release/lustre/lmv/lmv_obd.c:2360).
      2355			/* end of directory */
      2356			if (!next) {
      2357				ctxt->ldc_hash = MDS_DIR_END_OFF;
      2358				break;
      2359			}
      2360			ctxt->ldc_hash = le64_to_cpu(next->lde_hash);
      2361	
      2362			ent_size = le16_to_cpu(next->lde_reclen);
      2363	
      2364			/* the last entry lde_reclen is 0, but it might not be the last
      

      So it appears that we are getting a bad 'next' pointer from the directory entry list here.
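
      The faulting instruction corresponds to the `next->lde_hash` read at line 2360, i.e. `next` still points into a stripe's directory page after that page has apparently been released (DEBUG_PAGEALLOC unmaps freed pages, which is why the stale read oopses instead of returning garbage). A minimal user-space sketch of the pattern, with hypothetical `dirent_ish`/`page_ish` types standing in for the real `lu_dirent`/page structures (not Lustre code), showing why the fields must be copied out while the page is still pinned:

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for struct lu_dirent: just the two
       * fields the loop at lmv_obd.c:2360-2362 reads. */
      struct dirent_ish {
              uint64_t lde_hash;
              uint16_t lde_reclen;
      };

      /* Hypothetical stand-in for a directory page full of entries. */
      struct page_ish {
              struct dirent_ish ents[4];
      };

      int main(void)
      {
              struct page_ish *page = malloc(sizeof(*page));
              page->ents[1].lde_hash = 0x2000;
              page->ents[1].lde_reclen = 32;

              struct dirent_ish *next = &page->ents[1];

              /* BUGGY ordering (what the oops suggests happened):
               *
               *     free(page);                  release the page ...
               *     hash = next->lde_hash;       ... then use-after-free
               *
               * SAFE ordering: copy everything needed out of the entry
               * while the page is still valid, then release it. */
              uint64_t hash = next->lde_hash;   /* read before release */
              free(page);                        /* page may now go away */

              printf("resume hash: %#llx\n", (unsigned long long)hash);
              assert(hash == 0x2000);
              return 0;
      }
      ```

      With DEBUG_PAGEALLOC the buggy ordering faults immediately at the first dereference, matching the `unable to handle kernel paging request` address landing just past a page boundary (ffff8800b4f19fe0).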

           People

             Assignee: WC Triage (wc-triage)
             Reporter: Oleg Drokin (green)
             Votes: 0
             Watchers: 2
