Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9394

lu_object_find_try - kernel NULL pointer dereference

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • Soak performance cluster.
    • 3
    • 9223372036854775807

    Description

      Soak cluster had completed MDS failover of MDT0002, lfsck was aborted on the MDS after timeout.
      soak-2 OSS node crashed while under load.
      crash dump is available on soak-2, dmesg attached.
      Unfortunately console logged had failed, so some data is not available.
      Syslog:

      Apr 22 15:02:44 soak-2 kernel: LustreError: Skipped 3 previous similar messages
      Apr 22 15:02:44 soak-2 kernel: Lustre: soaked-MDT0002-lwp-OST0006: Connection restored to 192.168.1.110@o2ib10 (at 192.168.1.110@o2ib10)
      Apr 22 15:02:44 soak-2 kernel: Lustre: Skipped 4 previous similar messages
      Apr 22 15:03:09 soak-2 kernel: LustreError: 167-0: soaked-MDT0002-lwp-OST0012: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
      Apr 22 15:03:09 soak-2 kernel: LustreError: Skipped 1 previous similar message
      Apr 22 15:03:09 soak-2 kernel: Lustre: soaked-MDT0002-lwp-OST0012: Connection restored to 192.168.1.110@o2ib10 (at 192.168.1.110@o2ib10)
      Apr 22 15:03:09 soak-2 kernel: Lustre: Skipped 1 previous similar message
      Apr 22 15:07:11 soak-2 kernel: Lustre: soaked-OST0006: deleting orphan objects from 0x480000401:15286395 to 0x480000401:15292401
      Apr 22 15:07:11 soak-2 kernel: Lustre: soaked-OST0000: deleting orphan objects from 0x300000400:15323501 to 0x300000400:15333313
      Apr 22 15:07:11 soak-2 kernel: Lustre: soaked-OST000c: deleting orphan objects from 0x600000400:15327168 to 0x600000400:15338321
      Apr 22 15:07:11 soak-2 kernel: Lustre: soaked-OST0012: deleting orphan objects from 0x740000402:15313619 to 0x740000402:15327665
      Apr 22 15:21:56 soak-2 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="3806" x-info="http://www.rsyslog.com"] start
      

      Dmesg from vmcore:

      32845.239323] Lustre: soaked-OST000c: deleting orphan objects from 0x600000400:15327168 to 0x600000400:15338321
      [32845.239325] Lustre: soaked-OST0012: deleting orphan objects from 0x740000402:15313619 to 0x740000402:15327665
      [33465.017357] LustreError: 15131:0:(osd_object.c:427:osd_object_init()) soaked-OST000c: lookup [0x1000c0000:0x21c57a8:0x0]/0x387ef2 failed: rc = 17
      [33465.035118] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
      [33465.045485] IP: [<ffffffffa0a590d8>] lu_object_find_try+0x178/0x2b0 [obdclass]
      [33465.055153] PGD 0
      [33465.058735] Oops: 0000 [#1] SMP
      [33465.063750] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) 8021q garp mrp stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd dm_round_robin iTCO_wdt iTCO_vendor_support mei_me pcspkr ses ipmi_devintf enclosure mei ntb sg ipmi_si i2c_i801 lpc_ich sb_edac ipmi_msghandler ioatdma edac_core shpchp wmi nfsd dm_multipath nfs_acl dm_mod lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache
      [33465.151994]  jbd2 zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate sd_mod crc_t10dif crct10dif_generic mlx4_en mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb isci ttm ahci crct10dif_pclmul crct10dif_common libsas ptp crc32c_intel libahci pps_core mlx4_core drm mpt2sas dca libata raid_class i2c_algo_bit scsi_transport_sas devlink i2c_core fjes
      [33465.195938] CPU: 15 PID: 15131 Comm: ldlm_cn01_018 Tainted: P           OE  ------------   3.10.0-514.10.2.el7_lustre.x86_64 #1
      [33465.211995] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [33465.225976] task: ffff88042a2dbec0 ti: ffff88073b4f8000 task.ti: ffff88073b4f8000
      [33465.236319] RIP: 0010:[<ffffffffa0a590d8>]  [<ffffffffa0a590d8>] lu_object_find_try+0x178/0x2b0 [obdclass]
      [33465.249161] RSP: 0018:ffff88073b4fbaa8  EFLAGS: 00010207
      [33465.256998] RAX: 0000000000000011 RBX: ffff880825cd6cc0 RCX: ffff8800ad22a918
      [33465.266794] RDX: 00000001002a0021 RSI: ffffea0009502a80 RDI: ffff8800ad22a930
      [33465.276797] RBP: ffff88073b4fbb08 R08: ffff8802540aaa80 R09: 00000001002a0020
      [33465.286649] R10: 00000000540aad01 R11: ffffea0009502a80 R12: ffff8803fd4d5020
      [33465.296218] R13: ffff8804b928e1f0 R14: 0000000000000000 R15: ffff88080b3d9000
      [33465.305840] FS:  0000000000000000(0000) GS:ffff88082d9c0000(0000) knlGS:0000000000000000
      [33465.316722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [33465.324590] CR2: 0000000000000011 CR3: 00000000019ba000 CR4: 00000000000407e0
      [33465.333919] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [33465.343445] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [33465.353428] Stack:
      [33465.357274]  ffff88073b4fbc00 fffffffffffffffe ffff88073b4fbb30 000000000002246f
      [33465.367105]  ffff8800ad22a800 ffff88000000000d 00000000a848b037 ffff88073b4fbc00
      [33465.376714]  ffff88080b3d9000 ffff8803fd4d5020 0000000000000000 00000001000c0000
      [33465.386284] Call Trace:
      [33465.390755]  [<ffffffffa0a592bc>] lu_object_find_at+0xac/0xe0 [obdclass]
      [33465.400101]  [<ffffffffa0fb27db>] ? ofd_key_init+0x3b/0xd0 [ofd]
      [33465.408740]  [<ffffffffa0a59306>] lu_object_find+0x16/0x20 [obdclass]
      [33465.417192]  [<ffffffffa0fca4e5>] ofd_object_find+0x35/0x100 [ofd]
      [33465.425447]  [<ffffffffa0fd8d12>] ofd_lvbo_update+0x4b2/0xfc8 [ofd]
      [33465.434020]  [<ffffffffa0fd7ca9>] ? ofd_lvbo_free+0x59/0xe0 [ofd]
      [33465.442680]  [<ffffffffa0c566f2>] ldlm_request_cancel+0x132/0x720 [ptlrpc]
      [33465.451967]  [<ffffffffa0c5dc6a>] ldlm_handle_cancel+0xba/0x250 [ptlrpc]
      [33465.461254]  [<ffffffffa0c5df41>] ldlm_cancel_handler+0x141/0x490 [ptlrpc]
      [33465.470567]  [<ffffffffa0c8df1b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [33465.480874]  [<ffffffffa09359a8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
      [33465.489808]  [<ffffffffa0c8bac8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [33465.498533]  [<ffffffff810c5092>] ? default_wake_function+0x12/0x20
      [33465.507323]  [<ffffffff810ba2e8>] ? __wake_up_common+0x58/0x90
      [33465.515728]  [<ffffffffa0c91f40>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
      [33465.524220]  [<ffffffffa0c914a0>] ? ptlrpc_register_service+0xe60/0xe60 [ptlrpc]
      [33465.534131]  [<ffffffff810b06ff>] kthread+0xcf/0xe0
      [33465.540714]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
      [33465.549134]  [<ffffffff81696c98>] ret_from_fork+0x58/0x90
      [33465.556096]  [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
      [33465.564386] Code: c6 62 e0 48 83 f8 fe 0f 85 54 ff ff ff 48 8b 7d a0 4c 89 f1 4c 89 e2 4c 89 fe e8 24 fb ff ff 48 3d 00 f0 ff ff 0f 87 36 ff ff ff <48> 8b 38 ba 10 00 00 00 4c 89 e6 48 89 45 a8 e8 04 80 8c e0 85
      [33465.588258] RIP  [<ffffffffa0a590d8>] lu_object_find_try+0x178/0x2b0 [obdclass]
      [33465.597871]  RSP <ffff88073b4fbaa8>
      

      Attachments

        1. soak-2.vmcore-dmesg.txt
          235 kB
          Cliff White
        2. soak-8.log.gz
          166 kB
          Cliff White

        Issue Links

          Activity

            People

              bzzz Alex Zhuravlev
              cliffw Cliff White (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: