Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9994

sanity-scrub test_5: test failed to respond and timed out

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.11.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/35ecbbdc-99c2-11e7-b778-5254006e85c2.

      The sub-test test_5 failed with the following error:

      test failed to respond and timed out
      

      This failure is a panic on MDS. It has a different stack trace than LU-5449 so I'm creating a new ticket.
      Panic seen:

      20:49:39:[15096.775030] LustreError: 1321:0:(llog_cat.c:269:llog_cat_id2handle()) lustre-MDT0000-osp-MDT0002: error opening log id [0x2:0x402:0x2]:0: rc = -2
      20:49:39:[15096.776495] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
      20:49:39:[15096.777310] IP: [<ffffffffc07d002a>] llog_process_thread+0x3a/0x1460 [obdclass]
      20:49:39:[15096.778071] PGD 78ab5067 PUD 78ab4067 PMD 0 
      20:49:39:[15096.778551] Oops: 0000 [#1] SMP 
      20:49:39:[15096.778921] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel nfsd aesni_intel lrw gf128mul ppdev glue_helper ablk_helper cryptd i2c_piix4 nfs_acl lockd joydev pcspkr i2c_core virtio_balloon auth_rpcgss grace parport_pc parport sunrpc ip_tables ata_generic pata_acpi ext4 mbcache jbd2 ata_piix virtio_blk libata 8139too crct10dif_pclmul crct10dif_common crc32c_intel serio_raw virtio_pci 8139cp virtio_ring virtio mii floppy
      20:49:39:[15096.788032] CPU: 1 PID: 1321 Comm: lod0002_rec0000 Tainted: G           OE  ------------   3.10.0-693.2.2.el7_lustre.x86_64 #1
      20:49:39:[15096.789136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      20:49:39:[15096.789694] task: ffff880067595ee0 ti: ffff8800606bc000 task.ti: ffff8800606bc000
      20:49:39:[15096.790413] RIP: 0010:[<ffffffffc07d002a>]  [<ffffffffc07d002a>] llog_process_thread+0x3a/0x1460 [obdclass]
      20:49:39:[15096.791385] RSP: 0018:ffff8800606bfb28  EFLAGS: 00010246
      20:49:39:[15096.791910] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800606bffd8
      20:49:39:[15096.792588] RDX: ffff8800606bfbb8 RSI: 0000000000000000 RDI: ffff88006797f180
      20:49:39:[15096.793279] RBP: ffff8800606bfbe0 R08: 0000000000019be0 R09: ffff88007d001a00
      20:49:39:[15096.793977] R10: ffffffffc07d14a6 R11: 000000000000000f R12: 0000000000000000
      20:49:39:[15096.794662] R13: ffff8800667ebc00 R14: ffff88006797f180 R15: 0000000000000000
      20:49:39:[15096.795352] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
      20:49:39:[15096.796149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      20:49:39:[15096.796704] CR2: 0000000000000060 CR3: 0000000077c09000 CR4: 00000000000406e0
      20:49:39:[15096.797387] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      20:49:39:[15096.798084] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      20:49:39:[15096.798778] Stack:
      20:49:39:[15096.798987]  ffff8800606bfe58 ffff8800606bfb90 ffffffffc05eeba7 ffff880000000010
      20:49:39:[15096.799777]  ffff8800606bfba0 ffff8800606bfb60 0000000075f08900 ffff88007a318000
      20:49:39:[15096.800574]  ffff88006797f180 0000000000000001 0000000000000001 0000000000000402
      20:49:39:[15096.801374] Call Trace:
      20:49:39:[15096.801634]  [<ffffffffc05eeba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      20:49:39:[15096.802295]  [<ffffffffc07d5ccf>] ? llog_cat_cleanup+0x15f/0x380 [obdclass]
      20:49:39:[15096.802990]  [<ffffffffc0eea6b0>] ? lodname2mdt_index+0x2f0/0x2f0 [lod]
      20:49:39:[15096.803639]  [<ffffffffc07d150c>] llog_process_or_fork+0xbc/0x450 [obdclass]
      20:49:39:[15096.804337]  [<ffffffffc07d6a3a>] llog_cat_process_cb+0x20a/0x220 [obdclass]
      20:49:39:[15096.805040]  [<ffffffffc07d0865>] llog_process_thread+0x875/0x1460 [obdclass]
      20:49:39:[15096.805752]  [<ffffffffc07d6830>] ? llog_cat_process_common+0x440/0x440 [obdclass]
      20:49:39:[15096.806488]  [<ffffffffc07d150c>] llog_process_or_fork+0xbc/0x450 [obdclass]
      20:49:39:[15096.807184]  [<ffffffffc07d6830>] ? llog_cat_process_common+0x440/0x440 [obdclass]
      20:49:39:[15096.807936]  [<ffffffffc07d59b9>] llog_cat_process_or_fork+0x199/0x2a0 [obdclass]
      20:49:39:[15096.808673]  [<ffffffffc0f1ae2a>] ? lod_sub_prep_llog+0x24a/0x783 [lod]
      20:49:39:[15096.809322]  [<ffffffffc0eea6b0>] ? lodname2mdt_index+0x2f0/0x2f0 [lod]
      20:49:39:[15096.809976]  [<ffffffffc07d5aee>] llog_cat_process+0x2e/0x30 [obdclass]
      20:49:39:[15096.810613]  [<ffffffffc0ee6a89>] lod_sub_recovery_thread+0x439/0xc80 [lod]
      20:49:39:[15096.811300]  [<ffffffffc0ee6650>] ? lod_trans_stop+0x340/0x340 [lod]
      20:49:39:[15096.811920]  [<ffffffff810b098f>] kthread+0xcf/0xe0
      20:49:39:[15096.812393]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      20:49:39:[15096.812987]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      20:49:39:[15096.813516]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      20:49:39:[15096.814113] Code: 41 54 53 48 81 ec 90 00 00 00 4c 8b 27 48 8b 47 18 65 48 8b 34 25 28 00 00 00 48 89 75 d0 31 f6 f6 05 d2 d5 e3 ff 01 48 89 7d 88 <4d> 8b 6c 24 60 48 89 45 80 c7 45 c4 00 00 00 00 74 0d f6 05 b9 
      20:49:39:[15096.817178] RIP  [<ffffffffc07d002a>] llog_process_thread+0x3a/0x1460 [obdclass]
      20:49:39:[15096.817925]  RSP <ffff8800606bfb28>
      20:49:39:[15096.818270] CR2: 0000000000000060
      

      Info required for matching: sanity-scrub 5

      Attachments

        Issue Links

          Activity

            People

              bfaccini Bruno Faccini (Inactive)
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: