Lustre / LU-13254

crash at lu_object_find() in mdt_lvbo_fill()

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.14.0
    • None
    • None
    • 3

    Description

      A few recent runs in Oleg's test environment showed this trace:

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000026
      IP: [<ffffffffa033793d>] lu_object_find+0xd/0x20 [obdclass]
      PGD 0
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) crc32_generic libcfs(OE) dm_flakey dm_mod crc_t10dif crct10dif_generic crct10dif_common rpcsec_gss_krb5 pcspkr squashfs i2c_piix4 i2c_core binfmt_misc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi serio_raw ata_piix libata
      CPU: 0 PID: 8810 Comm: mdt_out00_001 Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.7-debug #1
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      task: ffff8800a5f94780 ti: ffff8800aa2f0000 task.ti: ffff8800aa2f0000
      RIP: 0010:[<ffffffffa033793d>]  [<ffffffffa033793d>] lu_object_find+0xd/0x20 [obdclass]
      RSP: 0018:ffff8800aa2f3b58  EFLAGS: 00010246
      RAX: 0000000000000006 RBX: ffff8800a5f86448 RCX: 0000000000000000
      RDX: ffff8800a5f86448 RSI: ffff8800c05cf000 RDI: ffff88009c38a400
      RBP: ffff8800aa2f3b58 R08: ffff880106431000 R09: ffff8800b5ee8080
      R10: ffff8800a5f86000 R11: ffff8800aa2f3876 R12: ffff88009c38a400
      R13: ffff8800c05cf000 R14: ffff8800ac23fe70 R15: ffff88009c38a400
      FS:  0000000000000000(0000) GS:ffff88011e200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000026 CR3: 00000000a2856000 CR4: 00000000000006f0
      Call Trace:
       [<ffffffffa0cd7e9b>] mdt_object_find+0x4b/0x170 [mdt]
       [<ffffffffa0d10dc0>] mdt_lvbo_fill+0x530/0xa80 [mdt]
       [<ffffffffa05f1f5d>] ldlm_handle_enqueue0+0x5cd/0x15f0 [ptlrpc]
       [<ffffffffa061ba50>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
       [<ffffffffa067a292>] tgt_enqueue+0x62/0x210 [ptlrpc]
       [<ffffffffa0682f55>] tgt_request_handle+0x965/0x1620 [ptlrpc]
       [<ffffffffa020bdde>] ? libcfs_nid2str_r+0xfe/0x130 [lnet]
       [<ffffffffa0625f60>] ptlrpc_server_handle_request+0x250/0xb10 [ptlrpc]
       [<ffffffff810c6941>] ? __wake_up_common_lock+0x91/0xc0
       [<ffffffff810c6250>] ? sched_feat_set+0xf0/0xf0
       [<ffffffffa062a1c0>] ptlrpc_main+0xcb0/0x1cb0 [ptlrpc]
       [<ffffffff810c665d>] ? finish_task_switch+0x5d/0x1b0
       [<ffffffffa0629510>] ? ptlrpc_register_service+0xff0/0xff0 [ptlrpc]
       [<ffffffff810b8254>] kthread+0xe4/0xf0
       [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff817e5ddd>] ret_from_fork_nospec_begin+0x7/0x21
       [<ffffffff810b8170>] ? kthread_create_on_node+0x140/0x140
      

      The problem is caused by incorrectly initialized mdt_thread_info values, particularly mti_mdt. Interestingly, none of these values is actually needed in mdt_lvbo_fill(): only a couple of fields are used there, as temporary storage for the FID and the data buffer.
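
To illustrate the failure mode described above, here is a minimal user-space sketch in C. It is not the Lustre code and not the actual patch: every type and function name (`struct device`, `struct thread_info`, `struct lvbo_scratch`, `object_find`, `lvbo_fill_via_info`, `lvbo_fill_explicit`) is a hypothetical stand-in. The stale device pointer is modeled as NULL and reported as an error code, whereas the kernel would oops on it. The sketch only contrasts a fill routine that trusts a device pointer cached in a per-thread structure with one that receives the device explicitly and keeps its own small scratch area for the FID and data buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real Lustre definitions. */
struct fid {
    unsigned long long seq;
    unsigned int oid;
    unsigned int ver;
};

struct device {
    int id;
};

/* Plays the role of a per-thread info structure such as mdt_thread_info. */
struct thread_info {
    struct device *ti_dev;      /* analogous to mti_mdt; may be stale */
    struct fid     ti_fid;      /* temporary FID storage */
    char           ti_buf[64];  /* temporary data buffer */
};

/* Only the temporary storage the fill routine actually needs. */
struct lvbo_scratch {
    struct fid fid;
    char       buf[64];
};

/*
 * Stand-in for an object lookup. It returns -1 for a missing device so the
 * sketch stays runnable; kernel code would dereference the bad pointer.
 */
static int object_find(const struct device *dev, const struct fid *fid)
{
    if (dev == NULL || fid == NULL)
        return -1;
    return dev->id;
}

/* Fragile pattern: relies on info->ti_dev having been initialized. */
static int lvbo_fill_via_info(struct thread_info *info, const struct fid *fid)
{
    assert(info != NULL && fid != NULL);
    info->ti_fid = *fid;
    return object_find(info->ti_dev, &info->ti_fid);
}

/* Sturdier pattern: explicit device, private scratch storage only. */
static int lvbo_fill_explicit(const struct device *dev,
                              struct lvbo_scratch *scratch,
                              const struct fid *fid)
{
    assert(scratch != NULL && fid != NULL);
    scratch->fid = *fid;
    memset(scratch->buf, 0, sizeof(scratch->buf));
    return object_find(dev, &scratch->fid);
}
```

With a zeroed `struct thread_info`, `lvbo_fill_via_info()` fails because it has no valid device to look the object up on. `lvbo_fill_explicit()` succeeds whenever the caller passes a valid device, regardless of the per-thread state.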

          People

            tappro Mikhail Pershin
            tappro Mikhail Pershin