LU-11630: sanity-hsm test_13: Crash in mdt_cdt_waiting_cb

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Severity: 3

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/2ee0cf8e-e18f-11e8-815b-52540065bddc

      test_13 failed with the following error:

      trevis-20vm10 crashed during sanity-hsm test_13
      

      This crash is hit quite frequently in Maloo, based on the crash reports I am receiving.

      The stack trace is:

      [ 4192.752828] Lustre: DEBUG MARKER: == sanity-hsm test 13: Recursively import and restore a directory ==================================== 02:54:47 (1541472887)
      [ 4246.836980] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [ 4246.837917] IP: [<ffffffffc12578b7>] mdt_cdt_waiting_cb.isra.25+0x357/0xc10 [mdt]
      [ 4246.838762] PGD 0 
      [ 4246.838995] Oops: 0000 [#1] SMP 
      [ 4246.839364] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod zfs(POE) zunicode(POE) zavl(POE) icp(POE) iosf_mbi zcommon(POE) znvpair(POE) crc32_pclmul ghash_clmulni_intel spl(OE) aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 joydev pcspkr virtio_balloon i2c_core parport_pc parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi ata_piix
      [ 4246.847577]  libata crct10dif_pclmul 8139too crct10dif_common virtio_blk crc32c_intel 8139cp serio_raw virtio_pci virtio_ring virtio mii floppy
      [ 4246.848953] CPU: 0 PID: 18857 Comm: hsm_cdtr Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.14.4.el7_lustre.x86_64 #1
      [ 4246.850110] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 4246.850666] task: ffff9b4cf697eeb0 ti: ffff9b4d1906c000 task.ti: ffff9b4d1906c000
      [ 4246.851387] RIP: 0010:[<ffffffffc12578b7>]  [<ffffffffc12578b7>] mdt_cdt_waiting_cb.isra.25+0x357/0xc10 [mdt]
      [ 4246.852364] RSP: 0018:ffff9b4d1906f9a0  EFLAGS: 00010287
      [ 4246.852885] RAX: 0000000000000000 RBX: 0000000000000048 RCX: ffff9b4d1906fe30
      [ 4246.853564] RDX: 00000000ffffffff RSI: ffff9b4d39a2cd60 RDI: 00000000ffffffff
      [ 4246.854248] RBP: ffff9b4d1906f9e8 R08: 0000000000000001 R09: ffff9b4d23dc6c38
      [ 4246.854947] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9b4d127744c8
      [ 4246.855626] R13: ffff9b4d1906fe30 R14: ffff9b4d12973730 R15: ffff9b4d0ef4fa58
      [ 4246.856310] FS:  0000000000000000(0000) GS:ffff9b4d3fc00000(0000) knlGS:0000000000000000
      [ 4246.857089] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4246.857644] CR2: 0000000000000004 CR3: 0000000078cce000 CR4: 00000000000606f0
      [ 4246.858336] Call Trace:
      [ 4246.858608]  [<ffffffffc125b702>] mdt_coordinator_cb+0x162/0x290 [mdt]
      [ 4246.859350]  [<ffffffffc0ba5732>] llog_process_thread+0x892/0x15a0 [obdclass]
      [ 4246.860079]  [<ffffffffc0bc8c84>] ? lprocfs_oh_tally+0x34/0x40 [obdclass]
      [ 4246.860753]  [<ffffffffc125b5a0>] ? mdt_cdt_started_cb.isra.27+0x410/0x410 [mdt]
      [ 4246.861487]  [<ffffffffc0ba64fc>] llog_process_or_fork+0xbc/0x450 [obdclass]
      [ 4246.862188]  [<ffffffffc0babbd9>] llog_cat_process_cb+0x239/0x250 [obdclass]
      [ 4246.862886]  [<ffffffffc0ba5732>] llog_process_thread+0x892/0x15a0 [obdclass]
      [ 4246.863592]  [<ffffffffc0bab9a0>] ? llog_cat_cancel_records+0x3c0/0x3c0 [obdclass]
      [ 4246.864345]  [<ffffffffc0ba64fc>] llog_process_or_fork+0xbc/0x450 [obdclass]
      [ 4246.865040]  [<ffffffffc0bab9a0>] ? llog_cat_cancel_records+0x3c0/0x3c0 [obdclass]
      [ 4246.865788]  [<ffffffffc0baae89>] llog_cat_process_or_fork+0x199/0x2a0 [obdclass]
      [ 4246.866523]  [<ffffffffc125b5a0>] ? mdt_cdt_started_cb.isra.27+0x410/0x410 [mdt]
      [ 4246.867259]  [<ffffffffc0baafbe>] llog_cat_process+0x2e/0x30 [obdclass]
      [ 4246.867919]  [<ffffffffc124c496>] cdt_llog_process+0xc6/0x3a0 [mdt]
      [ 4246.868548]  [<ffffffffc125b5a0>] ? mdt_cdt_started_cb.isra.27+0x410/0x410 [mdt]
      [ 4246.869277]  [<ffffffffc12560b1>] mdt_coordinator+0x541/0x19f0 [mdt]
      [ 4246.869932]  [<ffffffffa16bef10>] ? wake_up_atomic_t+0x30/0x30
      [ 4246.870514]  [<ffffffffc1255b70>] ? mdt_hsm_user_request_mask_seq_write+0x30/0x30 [mdt]
      [ 4246.871287]  [<ffffffffa16bdf21>] kthread+0xd1/0xe0
      [ 4246.871769]  [<ffffffffa16bde50>] ? insert_kthread_work+0x40/0x40
      [ 4246.872375]  [<ffffffffa1d255f7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 4246.873013]  [<ffffffffa16bde50>] ? insert_kthread_work+0x40/0x40
      [ 4246.873602] Code: e3 4d 85 ff 0f 85 2f fe ff ff 41 80 7d 49 00 0f 84 7f 05 00 00 83 ef 01 4c 63 f7 41 89 7d 54 49 c1 e6 04 4d 03 75 58 49 8b 46 08 <44> 2b 58 04 45 89 5d 4c 49 83 7e 08 00 0f 84 c7 05 00 00 49 63 
      [ 4246.876673] RIP  [<ffffffffc12578b7>] mdt_cdt_waiting_cb.isra.25+0x357/0xc10 [mdt]
      [ 4246.877442]  RSP <ffff9b4d1906f9a0>
      [ 4246.877792] CR2: 0000000000000004
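
      As a quick sanity check of the oops above (a sketch, not a root-cause analysis): the bytes marked on the Code: line, 44 2b 58 04, decode as a 32-bit SUB reading memory at [RAX+4]. With RAX = 0 that matches the faulting address in CR2. The Python below redoes that arithmetic from lines copied out of the trace. The helper names are made up for illustration, and the decoder only handles this one encoding form (optional REX prefix, one opcode byte, ModRM with mod=01 and no SIB byte); it is not a general disassembler.

```python
import re

# Lines copied from the oops above (Code: line trimmed to the bytes around the fault).
OOPS = """\
[ 4246.836980] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[ 4246.852885] RAX: 0000000000000000 RBX: 0000000000000048 RCX: ffff9b4d1906fe30
[ 4246.857644] CR2: 0000000000000004 CR3: 0000000078cce000 CR4: 00000000000606f0
[ 4246.873602] Code: 4d 03 75 58 49 8b 46 08 <44> 2b 58 04 45 89 5d 4c
"""

# Register names in x86-64 encoding order, spelled the way the oops prints them.
REG_NAMES = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
             "R08", "R09", "R10", "R11", "R12", "R13", "R14", "R15"]


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in the dump."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text):
    """Return the instruction bytes starting at the one marked <xx> on the Code: line."""
    tokens = re.search(r"Code: (.*)", text).group(1).split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_base_disp8(insn):
    """Decode [base + disp8] for: optional REX, one opcode byte, ModRM (mod=01, no SIB)."""
    rex = 0
    if 0x40 <= insn[0] <= 0x4F:      # REX prefix
        rex, insn = insn[0], insn[1:]
    modrm, disp = insn[1], insn[2]   # insn[0] is the opcode byte
    mod, rm = modrm >> 6, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the mod=01, no-SIB form is handled in this sketch")
    if disp >= 0x80:                 # disp8 is sign-extended
        disp -= 0x100
    return REG_NAMES[rm | ((rex & 1) << 3)], disp


regs = parse_registers(OOPS)
base, disp = decode_base_disp8(faulting_bytes(OOPS))
addr = (regs[base] + disp) & 0xFFFFFFFFFFFFFFFF
print(f"faulting access: [{base}+{disp:#x}] = {addr:#x}, CR2 = {regs['CR2']:#x}")
# prints: faulting access: [RAX+0x4] = 0x4, CR2 = 0x4
```

      The preceding bytes, 49 8b 46 08, decode as a 64-bit load into RAX from [R14+8], so the NULL value appears to be a pointer that was just read from memory and then dereferenced at offset 4.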
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-hsm test_13 - trevis-20vm10 crashed during sanity-hsm test_13

            People

              wc-triage WC Triage
              maloo Maloo