  Lustre / LU-9678

BUG: Kernel NULL pointer - lu_site_purge_objects


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.10.0
    • Fix Version/s: Lustre 2.10.0
    • Environment: Soak cluster testing LU-9049 patch, lustre-reviews build 48203, patch set 8.
    • Severity: 3
    • 9223372036854775807

    Description

Soak was running the oss_failover scenario: soak-5 was stopped and its OSTs were being imported and mounted on soak-4. soak-4 crashed after the first OST was mounted (soaked-OST0002).

      [13397.201448] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 172.16.1.46@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [13397.201449] LustreError: 137-5: soaked-OST0009_UUID: not available for connect from 172.16.1.46@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [13397.201450] LustreError: Skipped 112 previous similar messages
      [13397.201452] LustreError: Skipped 112 previous similar messages
      [13397.287999] LustreError: Skipped 1 previous similar message
      [13465.838060] Lustre: soaked-OST0002: Client 0202294a-5831-bbc6-ca11-92524b9ad7b5 (at 172.16.1.45@o2ib1) reconnecting
      [13465.852142] Lustre: soaked-OST0002: Connection restored to 0202294a-5831-bbc6-ca11-92524b9ad7b5 (at 172.16.1.45@o2ib1)
      [13533.044979] LustreError: 137-5: soaked-OST0003_UUID: not available for connect from 172.16.1.49@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [13533.044980] LustreError: 137-5: soaked-OST000f_UUID: not available for connect from 172.16.1.49@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
      [13533.044983] LustreError: Skipped 368 previous similar messages
      [13533.102556] LustreError: Skipped 2 previous similar messages
      [13567.173667] BUG: unable to handle kernel NULL pointer dereference at 0000000000000032
      [13567.184556] IP: [<ffffffffa0d98588>] lu_site_purge_objects+0x78/0x520 [obdclass]
      [13567.194924] PGD 0
      [13567.198876] Oops: 0000 [#1] SMP
      [13567.204233] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) 8021q garp mrp stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ipmi_devintf iTCO_vendor_support pcspkr dm_round_robin ses enclosure mei_me ntb ipmi_ssif sg lpc_ich sb_edac shpchp i2c_i801 mei edac_core ipmi_si ipmi_msghandler ioatdma wmi dm_multipath dm_mod nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) zlib_deflate sd_mod crc_t10dif crct10dif_generic mlx4_en mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt isci igb fb_sys_fops ttm crct10dif_pclmul ahci ptp crct10dif_common libsas crc32c_intel libahci pps_core drm mlx4_core mpt2sas libata dca raid_class i2c_algo_bit scsi_transport_sas devlink i2c_core fjes
      [13567.341368] CPU: 3 PID: 1315 Comm: arc_prune Tainted: P           OE  ------------   3.10.0-514.21.1.el7_lustre.x86_64 #1
      [13567.354994] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [13567.368895] task: ffff88040bc40000 ti: ffff88040bc3c000 task.ti: ffff88040bc3c000
      [13567.378641] RIP: 0010:[<ffffffffa0d98588>]  [<ffffffffa0d98588>] lu_site_purge_objects+0x78/0x520 [obdclass]
      [13567.391088] RSP: 0018:ffff88040bc3fd08  EFLAGS: 00010217
      [13567.398460] RAX: 0000000000000000 RBX: ffff8807607f4000 RCX: 0000000000000001
      [13567.407914] RDX: 0000000000000009 RSI: 0000000000000001 RDI: ffff88040bc3fdb0
      [13567.417333] RBP: ffff88040bc3fda0 R08: 0000000000019b40 R09: ffff88017fc07500
      [13567.426689] R10: ffffffffa11ed330 R11: 0000000000000005 R12: 0000000000002710
      [13567.436019] R13: ffff88040c790af0 R14: 0000000000000000 R15: ffff8807607f41f0
      [13567.445339] FS:  0000000000000000(0000) GS:ffff88042e0c0000(0000) knlGS:0000000000000000
      [13567.455741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13567.457890] Lustre: soaked-OST0009: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
      [13567.476592] CR2: 0000000000000032 CR3: 00000000019be000 CR4: 00000000000407e0
      [13567.485878] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [13567.495173] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [13567.498623] LustreError: 21759:0:(osd_oi.c:503:osd_oid()) soaked-OST0009-osd: unsupported quota oid: 0x16
      [13567.516449] Stack:
      [13567.520018]  ffff88039b538000 ffff88040bc3fdb0 ffff880400000001 ffff880400000000
      [13567.529611]  ffffffffa11ed34b 00000000000000a8 ffff88040bc3fd68 00000009a0d97654
      [13567.539243]  ffff88040bc3fdb0 0000000000002710 ffff88040c790af0 ffff88040bc3fd60
      [13567.548850] Call Trace:
      [13567.552825]  [<ffffffffa11ed34b>] ? osp_key_init+0x3b/0xd0 [osp]
      [13567.561423]  [<ffffffffa0ad3123>] arc_prune_func+0x53/0xe0 [osd_zfs]
      [13567.570393]  [<ffffffffa2c5d10f>] arc_prune_task+0x1f/0x30 [zfs]
      [13567.579031]  [<ffffffffa048c6de>] taskq_thread+0x22e/0x440 [spl]
      [13567.587659]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
      [13567.595723]  [<ffffffffa048c4b0>] ? taskq_thread_spawn+0x60/0x60 [spl]
      [13567.604525]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
      [13567.611730]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
      [13567.620746]  [<ffffffff81697798>] ret_from_fork+0x58/0x90
      [13567.628739]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
      [13567.637718] Code: 0f 84 47 03 00 00 83 7d a4 ff 48 8d 45 c0 48 89 45 c0 48 89 45 c8 0f 84 07 04 00 00 41 8b 47 08 be 01 00 00 00 89 45 80 49 8b 07 <0f> b6 48 32 2a 48 36 8b 45 a4 99 d3 e6 f7 fe 83 c0 01 89 85 7c
      [13567.662467] RIP  [<ffffffffa0d98588>] lu_site_purge_objects+0x78/0x520 [obdclass]
      [13567.672646]  RSP <ffff88040bc3fd08>
      [13567.678345] CR2: 0000000000000032
      [13567.688413] ---[ end trace d781eec03cdf2214 ]---
      [13567.771189] Kernel panic - not syncing: Fatal exception
      

      Attempting to get a crash dump; will add more data as it is found.
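
      For reference, the faulting bytes in the Code: line appear to decode to movzbl 0x32(%rax),%ecx with RAX = 0, i.e. a byte read at offset 0x32 of a NULL pointer inside lu_site_purge_objects, reached from the osd_zfs ARC prune callback (arc_prune_task -> arc_prune_func -> lu_site_purge_objects). One plausible reading is that the prune callback ran against a target whose object cache was not yet (or no longer) set up during the failover mount. The following is only a minimal user-space C sketch of that pattern with hypothetical names; it is not the Lustre code or the eventual fix, and simply illustrates the kind of NULL guard such a callback needs.

      /*
       * Hypothetical, self-contained sketch; none of these names come from the
       * Lustre tree. A memory-pressure callback (prune_cb) may fire while the
       * device's object cache pointer is still NULL (setup not finished) or
       * already cleared (umount/failover), so it must check before purging.
       */
      #include <stdio.h>

      struct obj_site {
              int nr_cached;                  /* objects currently cached */
      };

      struct osd_device {
              struct obj_site *site;          /* NULL until setup completes */
      };

      /* Purge up to 'nr' cached objects; dereferences 'site' unconditionally. */
      static int site_purge_objects(struct obj_site *site, int nr)
      {
              int purged = site->nr_cached < nr ? site->nr_cached : nr;

              site->nr_cached -= purged;
              return purged;
      }

      /* Callback registered with the cache-shrinking mechanism (cf. ARC prune). */
      static void prune_cb(struct osd_device *osd, int nr)
      {
              if (osd->site == NULL)          /* device not ready: nothing to purge */
                      return;
              printf("purged %d objects\n", site_purge_objects(osd->site, nr));
      }

      int main(void)
      {
              struct osd_device osd = { .site = NULL };
              struct obj_site site = { .nr_cached = 42 };

              prune_cb(&osd, 10000);          /* guard skips the purge, no crash */
              osd.site = &site;               /* "mount" finished */
              prune_cb(&osd, 10000);          /* purges the 42 cached objects */
              return 0;
      }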



          People

            Assignee: Lai Siyao (laisiyao)
            Reporter: Cliff White (cliffw, Inactive)
            Votes: 0
            Watchers: 4
