Lustre / LU-12416

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [mount.lustre:11956]

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

    Description

      After the multi-rail branch merge, mount.lustre hits a soft lockup followed by a kernel panic:

      [   80.154928] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [mount.lustre:11956]
      [   80.157527] Kernel panic - not syncing: softlockup: hung tasks
      [   80.158948] CPU: 0 PID: 11956 Comm: mount.lustre Tainted: G           OEL ------------   3.10.0-862.14.4.el7.x86_64 #22
      [   80.161585] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   80.163751] Call Trace:
      [   80.164430]  <IRQ>  [<ffffffff92c22b0e>] dump_stack+0x19/0x1b
      [   80.165605]  [<ffffffff92c1d7af>] panic+0xe8/0x21f
      [   80.166601]  [<ffffffff9262c838>] ? show_regs+0x58/0x210
      [   80.167670]  [<ffffffff9273a761>] watchdog_timer_fn+0x231/0x240
      [   80.169127]  [<ffffffff9273a530>] ? watchdog+0x40/0x40
      [   80.170551]  [<ffffffff926bdd93>] __hrtimer_run_queues+0xf3/0x270
      [   80.172526]  [<ffffffff926be31f>] hrtimer_interrupt+0xaf/0x1d0
      [   80.173902]  [<ffffffff926573ab>] local_apic_timer_interrupt+0x3b/0x60
      [   80.175189]  [<ffffffff92c38a13>] smp_apic_timer_interrupt+0x43/0x60
      [   80.176421]  [<ffffffff92c352b2>] apic_timer_interrupt+0x162/0x170
      [   80.178102]  <EOI>  [<ffffffffc074db71>] ? lnet_peer_ni_alloc+0x61/0x390 [lnet]
      [   80.180384]  [<ffffffff92703494>] ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x17
      [   80.182187]  [<ffffffffc06c47b8>] cfs_percpt_unlock+0x38/0xb0 [libcfs]
      [   80.183421]  [<ffffffffc0756757>] lnet_discover_peer_locked+0x77/0x3d0 [lnet]
      [   80.184929]  [<ffffffff926bab40>] ? wake_up_atomic_t+0x30/0x30
      [   80.186831]  [<ffffffffc0756b20>] LNetPrimaryNID+0x70/0x1a0 [lnet]
      [   80.188202]  [<ffffffffc0b295ee>] ptlrpc_connection_get+0x3e/0x450 [ptlrpc]
      [   80.190003]  [<ffffffffc0b1d94c>] ptlrpc_uuid_to_connection+0xec/0x1a0 [ptlrpc]
      [   80.192225]  [<ffffffffc0aefcd2>] import_set_conn+0xb2/0x7a0 [ptlrpc]
      [   80.193890]  [<ffffffffc0af1d49>] client_obd_setup+0xd19/0x1430 [ptlrpc]
      [   80.195201]  [<ffffffffc06b994f>] ? cfs_hash_buckets_realloc+0x1bf/0x690 [libcfs]
      [   80.196932]  [<ffffffffc0e85aae>] mgc_setup+0x3e/0x650 [mgc]
      [   80.198397]  [<ffffffffc084259c>] obd_setup+0x15c/0x280 [obdclass]
      [   80.199982]  [<ffffffffc06ba18c>] ? cfs_hash_create+0x36c/0xa20 [libcfs]
      [   80.201576]  [<ffffffffc0843888>] class_setup+0x2a8/0x840 [obdclass]
      [   80.203390]  [<ffffffffc0846b2e>] class_process_config+0x191e/0x2840 [obdclass]
      [   80.205245]  [<ffffffffc0838e92>] ? class_add_uuid+0x282/0x4c0 [obdclass]
      [   80.206614]  [<ffffffffc084ae78>] do_lcfg+0x258/0x500 [obdclass]
      [   80.207822]  [<ffffffffc084f6a8>] lustre_start_simple+0x88/0x210 [obdclass]
      [   80.209462]  [<ffffffffc08504a5>] lustre_start_mgc+0xc75/0x2420 [obdclass]
      [   80.210917]  [<ffffffffc084f6a8>] ? lustre_start_simple+0x88/0x210 [obdclass]
      [   80.213187]  [<ffffffffc087d2eb>] server_fill_super+0xbfb/0x1890 [obdclass]
      [   80.214570]  [<ffffffffc08526b8>] lustre_fill_super+0x328/0x950 [obdclass]
      [   80.216089]  [<ffffffffc0852390>] ? lustre_common_put_super+0x270/0x270 [obdclass]
      [   80.218447]  [<ffffffff928100bf>] mount_nodev+0x4f/0xb0
      [   80.219779]  [<ffffffffc084a888>] lustre_mount+0x38/0x60 [obdclass]
      [   80.221244]  [<ffffffff92810c3e>] mount_fs+0x3e/0x1b0
      [   80.222538]  [<ffffffff9282e177>] vfs_kern_mount+0x67/0x110
      [   80.224375]  [<ffffffff9283079f>] do_mount+0x1ef/0xce0
      [   80.226543]  [<ffffffff927e836c>] ? kmem_cache_alloc_trace+0x3c/0x200
      [   80.228959]  [<ffffffff928315d3>] SyS_mount+0x83/0xd0
      [   80.230329]  [<ffffffff92c3429b>] system_call_fastpath+0x22/0x27
      

      Important: this occurs when the node has only one CPU.
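
The trace above mixes the watchdog's own interrupt frames with the frames of the stuck mount.lustre task, and several entries are marked with "?", which the kernel prints for addresses found on the stack that the unwinder could not confirm as part of the active call chain. As a reading aid, here is a minimal Python sketch that keeps only the task-context frames (everything from the <EOI> marker onward) and flags the "?" entries. The names `FRAME_RE`, `parse_frames` and `SAMPLE` are hypothetical helpers written for this report, not part of Lustre or any kernel tooling, and the regular expression assumes the el7-style frame format shown above.

```python
import re

# One frame of an el7-style (3.10) call trace, for example:
#   [   80.182187]  [<ffffffffc06c47b8>] cfs_percpt_unlock+0x38/0xb0 [libcfs]
# A "?" before the symbol marks an entry the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<maybe>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(trace, task_context_only=True):
    """Return (symbol, module, reliable) tuples, innermost frame first."""
    frames = []
    for line in trace.splitlines():
        if task_context_only and "<EOI>" in line:
            # Frames printed before <EOI> belong to the timer interrupt
            # that ran the watchdog, not to the stuck task: drop them.
            frames.clear()
        m = FRAME_RE.search(line)
        if m:
            reliable = m.group("maybe") is None
            frames.append((m.group("sym"), m.group("mod") or "kernel", reliable))
    return frames


# A few lines copied verbatim from the trace in this report.
SAMPLE = """\
[   80.176421]  [<ffffffff92c352b2>] apic_timer_interrupt+0x162/0x170
[   80.178102]  <EOI>  [<ffffffffc074db71>] ? lnet_peer_ni_alloc+0x61/0x390 [lnet]
[   80.180384]  [<ffffffff92703494>] ? __raw_callee_save___pv_queued_spin_unlock+0x10/0x17
[   80.182187]  [<ffffffffc06c47b8>] cfs_percpt_unlock+0x38/0xb0 [libcfs]
[   80.183421]  [<ffffffffc0756757>] lnet_discover_peer_locked+0x77/0x3d0 [lnet]
[   80.184929]  [<ffffffff926bab40>] ? wake_up_atomic_t+0x30/0x30
[   80.186831]  [<ffffffffc0756b20>] LNetPrimaryNID+0x70/0x1a0 [lnet]
"""

reliable_chain = [sym for sym, mod, ok in parse_frames(SAMPLE) if ok]
print(reliable_chain)
# ['cfs_percpt_unlock', 'lnet_discover_peer_locked', 'LNetPrimaryNID']
```

Run against the full trace, the confirmed task-context frames begin with cfs_percpt_unlock called from lnet_discover_peer_locked, which is reached from LNetPrimaryNID and ptlrpc_connection_get. This only shows where the CPU was executing when the watchdog fired; it does not by itself establish the cause of the lockup.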

            People

              wc-triage WC Triage
              vsaveliev Vladimir Saveliev
              Votes: 0
              Watchers: 5
