  Lustre / LU-15948

Interop conf-sanity test_32d: MDS hit NMI watchdog: BUG: soft lockup


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.9
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/77959384-2fe3-4f55-bb38-984f7bf61760

      test_32d failed with the following error:

      onyx-124vm8 crashed during conf-sanity test_32d
      

      MDS crash

      [ 4534.035258] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      [ 4534.065338] Lustre: MGS: Connection restored to MGC10.240.30.39@tcp_0 (at 0@lo)
      [ 4534.351344] Lustre: 2461:0:(obd_mount.c:968:lustre_check_exclusion()) Excluding t32fs-OST0000 (on exclusion list)
      [ 4534.353004] Lustre: 2461:0:(obd_mount.c:968:lustre_check_exclusion()) Skipped 1 previous similar message
      [ 4580.197426] Lustre: t32fs-MDT0000: Imperative Recovery not enabled, recovery window 60-180
      [ 4580.607165] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.t32fs-MDT0000.uuid
      [ 4580.926690] Lustre: DEBUG MARKER: tunefs.lustre --dryrun /tmp/t32/ost
      [ 4581.303203] Lustre: DEBUG MARKER: mount -t lustre -onomgs -oloop,mgsnode=10.240.30.39@tcp /tmp/t32/ost /tmp/t32/mnt/ost
      [ 4581.548860] LDISKFS-fs (loop1): file extents enabled, maximum tree depth=5
      [ 4581.550342] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
      [ 4620.308604] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [llog_process_th:2780]
      [ 4620.312080] Modules linked in: ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper parport_pc cryptd pcspkr joydev virtio_balloon i2c_piix4 parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi
      [ 4620.325003]  virtio_net net_failover failover ata_piix virtio_blk libata crct10dif_pclmul crct10dif_common floppy crc32c_intel serio_raw virtio_pci virtio_ring virtio [last unloaded: libcfs]
      [ 4620.327797] CPU: 1 PID: 2780 Comm: llog_process_th Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.49.1.el7_lustre.x86_64 #1
      [ 4620.329699] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 4620.330568] task: ffff98515cc4c200 ti: ffff985155fc8000 task.ti: ffff985155fc8000
      [ 4620.331691] RIP: 0010:[<ffffffffb918b795>]  [<ffffffffb918b795>] _raw_spin_unlock_irqrestore+0x15/0x20
      [ 4620.333145] RSP: 0018:ffff985155fcb678  EFLAGS: 00000246
      [ 4620.333954] RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff98515cc4c200
      [ 4620.335028] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
      [ 4620.336102] RBP: ffff985155fcb678 R08: ffff985148661fb0 R09: 0000000000000001
      [ 4620.337177] R10: 0000000000000001 R11: 000000000000000f R12: ffffffffb8a2b59e
      [ 4620.338249] R13: ffff985155fcb678 R14: ffffffffb8ae6321 R15: ffff985155fcb610
      [ 4620.339327] FS:  0000000000000000(0000) GS:ffff98517fd00000(0000) knlGS:0000000000000000
      [ 4620.340538] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4620.341409] CR2: 00007fa275698504 CR3: 00000000bb604000 CR4: 00000000001606e0
      [ 4620.342485] Call Trace:
      [ 4620.342909]  [<ffffffffb8ac6a46>] prepare_to_wait+0x56/0x90
      [ 4620.343806]  [<ffffffffc0966129>] lnet_discover_peer_locked+0x1e9/0x430 [lnet]
      [ 4620.344899]  [<ffffffffb8ac6f50>] ? wake_up_atomic_t+0x30/0x30
      [ 4620.345794]  [<ffffffffc0966425>] LNetPrimaryNID+0xb5/0x1f0 [lnet]
      [ 4620.346781]  [<ffffffffc0c906ce>] ptlrpc_connection_get+0x3e/0x450 [ptlrpc]
      [ 4620.347856]  [<ffffffffc0c84b4c>] ptlrpc_uuid_to_connection+0xec/0x1a0 [ptlrpc]
      [ 4620.348993]  [<ffffffffc0c563a2>] import_set_conn+0xb2/0x7a0 [ptlrpc]
      [ 4620.350002]  [<ffffffffc0c57c39>] client_obd_setup+0xd19/0x1430 [ptlrpc]
      [ 4620.351029]  [<ffffffffc1586e03>] lwp_setup.isra.5+0x363/0xc40 [osp]
      [ 4620.351999]  [<ffffffffc085e217>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 4620.353007]  [<ffffffffc15878d8>] lwp_device_alloc+0x1f8/0x590 [osp]
      [ 4620.354009]  [<ffffffffc09fa5e9>] obd_setup+0x119/0x280 [obdclass]
      [ 4620.354962]  [<ffffffffc09fa9f8>] class_setup+0x2a8/0x840 [obdclass]
      [ 4620.355942]  [<ffffffffc09fdaa6>] class_process_config+0x1726/0x2830 [obdclass]
      [ 4620.357056]  [<ffffffffc085e217>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 4620.358067]  [<ffffffffc0a01fc8>] do_lcfg+0x258/0x500 [obdclass]
      [ 4620.358998]  [<ffffffffc0a067f8>] lustre_start_simple+0x88/0x210 [obdclass]
      [ 4620.360070]  [<ffffffffc0a2fcdc>] client_lwp_config_process+0xb4c/0xe10 [obdclass]
      [ 4620.361223]  [<ffffffffc09c285b>] llog_process_thread+0x94b/0x1af0 [obdclass]
      [ 4620.362313]  [<ffffffffc09c4414>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
      [ 4620.363492]  [<ffffffffc09c4370>] ? llog_backup+0x500/0x500 [obdclass]
      [ 4620.364481]  [<ffffffffb8ac5e61>] kthread+0xd1/0xe0
      [ 4620.365226]  [<ffffffffb8ac5d90>] ? insert_kthread_work+0x40/0x40
      [ 4620.366149]  [<ffffffffb9195df7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 4620.367137]  [<ffffffffb8ac5d90>] ? insert_kthread_work+0x40/0x40
      [ 4620.368065] Code: 06 bd 98 ff 66 90 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 e8 e6 bc 98 ff 66 90 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 48 
      [ 4620.372845] Kernel panic - not syncing: softlockup: hung tasks
      [ 4620.373732] CPU: 1 PID: 2780 Comm: llog_process_th Kdump: loaded Tainted: G           OEL ------------   3.10.0-1160.49.1.el7_lustre.x86_64 #1
      [ 4620.375636] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 4620.376516] Call Trace:
      [ 4620.376903]  <IRQ>  [<ffffffffb9183539>] dump_stack+0x19/0x1b
      [ 4620.377820]  [<ffffffffb917d241>] panic+0xe8/0x21f
      [ 4620.378568]  [<ffffffffb8b4ee2a>] watchdog_timer_fn+0x20a/0x220
      [ 4620.379471]  [<ffffffffb8b4ec20>] ? watchdog+0x40/0x40
      [ 4620.380267]  [<ffffffffb8aca25e>] __hrtimer_run_queues+0x10e/0x270
      [ 4620.381215]  [<ffffffffb8aca7bf>] hrtimer_interrupt+0xaf/0x1d0
      [ 4620.382108]  [<ffffffffb8a5cdfb>] local_apic_timer_interrupt+0x3b/0x60
      [ 4620.383104]  [<ffffffffb919aa23>] smp_apic_timer_interrupt+0x43/0x60
      [ 4620.384066]  [<ffffffffb9196fba>] apic_timer_interrupt+0x16a/0x170
      [ 4620.384997]  <EOI>  [<ffffffffb918b795>] ? _raw_spin_unlock_irqrestore+0x15/0x20
      [ 4620.386144]  [<ffffffffb8ac6a46>] prepare_to_wait+0x56/0x90
      [ 4620.387001]  [<ffffffffc0966129>] lnet_discover_peer_locked+0x1e9/0x430 [lnet]
      [ 4620.388093]  [<ffffffffb8ac6f50>] ? wake_up_atomic_t+0x30/0x30
      [ 4620.388982]  [<ffffffffc0966425>] LNetPrimaryNID+0xb5/0x1f0 [lnet]
      [ 4620.389936]  [<ffffffffc0c906ce>] ptlrpc_connection_get+0x3e/0x450 [ptlrpc]
      [ 4620.391016]  [<ffffffffc0c84b4c>] ptlrpc_uuid_to_connection+0xec/0x1a0 [ptlrpc]
      [ 4620.392137]  [<ffffffffc0c563a2>] import_set_conn+0xb2/0x7a0 [ptlrpc]
      [ 4620.393139]  [<ffffffffc0c57c39>] client_obd_setup+0xd19/0x1430 [ptlrpc]
      [ 4620.394157]  [<ffffffffc1586e03>] lwp_setup.isra.5+0x363/0xc40 [osp]
      [ 4620.395125]  [<ffffffffc085e217>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 4620.396125]  [<ffffffffc15878d8>] lwp_device_alloc+0x1f8/0x590 [osp]
      [ 4620.397124]  [<ffffffffc09fa5e9>] obd_setup+0x119/0x280 [obdclass]
      [ 4620.398074]  [<ffffffffc09fa9f8>] class_setup+0x2a8/0x840 [obdclass]
      [ 4620.399054]  [<ffffffffc09fdaa6>] class_process_config+0x1726/0x2830 [obdclass]
      [ 4620.400165]  [<ffffffffc085e217>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 4620.401185]  [<ffffffffc0a01fc8>] do_lcfg+0x258/0x500 [obdclass]
      [ 4620.402119]  [<ffffffffc0a067f8>] lustre_start_simple+0x88/0x210 [obdclass]
      [ 4620.403189]  [<ffffffffc0a2fcdc>] client_lwp_config_process+0xb4c/0xe10 [obdclass]
      [ 4620.404339]  [<ffffffffc09c285b>] llog_process_thread+0x94b/0x1af0 [obdclass]
      [ 4620.405425]  [<ffffffffc09c4414>] llog_process_thread_daemonize+0xa4/0xe0 [obdclass]
      [ 4620.406598]  [<ffffffffc09c4370>] ? llog_backup+0x500/0x500 [obdclass]
      [ 4620.407583]  [<ffffffffb8ac5e61>] kthread+0xd1/0xe0
      [ 4620.408332]  [<ffffffffb8ac5d90>] ? insert_kthread_work+0x40/0x40
      [ 4620.409257]  [<ffffffffb9195df7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 4620.410247]  [<ffffffffb8ac5d90>] ? insert_kthread_work+0x40/0x40
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      conf-sanity test_32d - onyx-124vm8 crashed during conf-sanity test_32d


            People

              Assignee: WC Triage
              Reporter: Maloo
              Votes: 0
              Watchers: 5
