Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17349

sanity-quota test_81: Kernel panic - not syncing: softlockup: hung tasks

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      test_81 failed with the following error:

      trevis-68vm7 crashed during sanity-quota test_81
      

      Dmesg from vmcore:

      [42692.257956] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-quota test 81: Race qmt_start_pool_recalc with qmt_pool_free =============================== 05:00:18 \(1700888418\)
      [42692.536584] Lustre: DEBUG MARKER: == sanity-quota test 81: Race qmt_start_pool_recalc with qmt_pool_free =============================== 05:00:18 (1700888418)
      [42692.749799] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
      [42693.171476] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osp.*.destroys_in_flight
      [42694.725049] Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0
      [42695.134885] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null
      [42695.547239] Lustre: DEBUG MARKER: lfs --list-commands
      [42695.915540] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ugp
      [42701.359171] Lustre: DEBUG MARKER: /usr/sbin/lctl mark User quota \(block hardlimit:20 MB\)
      [42701.570888] Lustre: DEBUG MARKER: User quota (block hardlimit:20 MB)
      [42701.765689] Lustre: DEBUG MARKER: lctl pool_new lustre.qpool1
      [42704.157598] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.qpool1 				2>/dev/null || echo foo
      [42704.557285] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param fail_loc=0x80000A07 fail_val=30
      [42704.949475] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_add lustre.qpool1 lustre-OST[0-0/1]
      [42706.444500] LustreError: 12680:0:(fail.c:130:__cfs_fail_timeout_set()) cfs_fail_timeout id a07 sleeping for 30000ms
      [42707.325568] Lustre: DEBUG MARKER: lctl get_param -n lod.lustre-MDT0000-mdtlov.pools.qpool1 |
      				sort -u | tr '\n' ' ' 
      [42707.702378] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
      [42708.083111] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
      [42711.531547] sched: RT throttling activated
      [42734.489940] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 21s! [ldlm_bl_03:9064]
      [42734.492361] Modules linked in: nfsd nfs_acl dm_flakey lustre(OE) obdecho(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) ptlrpc_gss(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper joydev ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi
      [42734.888235]  virtio_blk virtio_net net_failover failover crct10dif_pclmul crct10dif_common crc32c_intel ata_piix serio_raw libata virtio_pci virtio_ring virtio floppy [last unloaded: dm_flakey]
      [42734.926022] CPU: 1 PID: 9064 Comm: ldlm_bl_03 Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.83.1.el7_lustre.ddn17.x86_64 #1
      [42734.928412] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [42734.929490] task: ffff88382c5f1080 ti: ffff88382f584000 task.ti: ffff88382f584000
      [42734.930874] RIP: 0010:[<ffffffff931b9e1b>]  [<ffffffff931b9e1b>] native_safe_halt+0xb/0x30
      [42735.066856] RSP: 0018:ffff88382f587c50  EFLAGS: 00000202
      [42735.067898] RAX: 0000000000000005 RBX: ffff88382f587c10 RCX: 0000000000000000
      [42735.069221] RDX: 0000000000000005 RSI: 0000000000000005 RDI: ffff93b84581c000
      [42735.070544] RBP: ffff88382f587c50 R08: 0000000000000000 R09: 00000000000000b0
      [42735.071866] R10: 0000000000000000 R11: 0000000000000001 R12: 0000090900000010
      [42735.073188] R13: 0000236800000000 R14: 000000000000154f R15: 0000000000533f52
      [42735.074498] FS:  0000000000000000(0000) GS:ffff88387fd00000(0000) knlGS:0000000000000000
      [42735.076001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [42735.077070] CR2: 0000000000519001 CR3: 00000000bb992000 CR4: 00000000000606e0
      [42735.089216] Call Trace:
      [42735.097974]  [<ffffffff92a6fd95>] kvm_wait+0x65/0x70
      [42735.103299]  [<ffffffff92b1e95e>] __pv_queued_spin_lock_slowpath+0x29e/0x2e0
      [42735.111331]  [<ffffffff931ac21a>] queued_spin_lock_slowpath+0xb/0x13
      [42735.112554]  [<ffffffff931ba734>] _raw_spin_lock+0x24/0x30
      [42735.114265]  [<ffffffffc06cdec9>] cfs_hash_spin_lock+0x9/0x10 [libcfs]
      [42735.115552]  [<ffffffffc06cea57>] cfs_hash_for_each_relax+0x147/0x450 [libcfs]
      [42735.581459]  [<ffffffffc0bb9e30>] ? ldlm_lock_mode_downgrade+0x340/0x340 [ptlrpc]
      [42735.582936]  [<ffffffffc0bb9e30>] ? ldlm_lock_mode_downgrade+0x340/0x340 [ptlrpc]
      [42735.584409]  [<ffffffffc06d1ed5>] cfs_hash_for_each_nolock+0x75/0x1d0 [libcfs]
      [42735.586451]  [<ffffffffc0bbac8c>] ldlm_reprocess_recovery_done+0x3c/0x120 [ptlrpc]
      [42735.588445]  [<ffffffffc0bbba3c>] ldlm_export_cancel_locks+0x17c/0x190 [ptlrpc]
      [42735.598443]  [<ffffffffc0be72f0>] ldlm_bl_thread_main+0x780/0xab0 [ptlrpc]
      [42735.600524]  [<ffffffff92acc680>] ? wake_up_atomic_t+0x40/0x40
      [42735.601684]  [<ffffffffc0be6b70>] ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc]
      [42735.603122]  [<ffffffff92acb511>] kthread+0xd1/0xe0
      [42735.604049]  [<ffffffff92acb440>] ? insert_kthread_work+0x40/0x40
      [42735.605692]  [<ffffffff931c51f7>] ret_from_fork_nospec_begin+0x21/0x21
      [42735.606909]  [<ffffffff92acb440>] ? insert_kthread_work+0x40/0x40
      [42735.608052] Code: ff ff 66 0f 1f 44 00 00 0f 00 2d fd e7 04 00 e9 14 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 90 55 48 89 e5 66 66 66 66 90 fb f4 <5d> c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f 00 2d 21 f8 04 00 eb 
      [42735.613869] Kernel panic - not syncing: softlockup: hung tasks
      [42735.614975] CPU: 1 PID: 9064 Comm: ldlm_bl_03 Kdump: loaded Tainted: G           OEL ------------   3.10.0-1160.83.1.el7_lustre.ddn17.x86_64 #1
      [42735.617327] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [42735.618420] Call Trace:
      [42735.618895]  <IRQ>  [<ffffffff931b1bec>] dump_stack+0x19/0x1f
      [42735.620023]  [<ffffffff931ab708>] panic+0xe8/0x21f
      [42735.626270]  [<ffffffff92b56f5a>] watchdog_timer_fn+0x20a/0x220
      [42735.627420]  [<ffffffff92b56d50>] ? watchdog+0x50/0x50
      [42735.628385]  [<ffffffff92acfa0e>] __hrtimer_run_queues+0x10e/0x270
      [42735.629567]  [<ffffffff92acff9f>] hrtimer_interrupt+0xaf/0x1e0
      [42735.636517]  [<ffffffff92a5fa3b>] local_apic_timer_interrupt+0x3b/0x70
      [42735.637771]  [<ffffffff931ca053>] smp_apic_timer_interrupt+0x43/0x60
      [42735.638948]  [<ffffffff931c63f2>] apic_timer_interrupt+0x172/0x180
      [42735.640111]  <EOI>  [<ffffffff931b9e1b>] ? native_safe_halt+0xb/0x30
      [42735.641339]  [<ffffffff92a6fd95>] kvm_wait+0x65/0x70
      [42735.642265]  [<ffffffff92b1e95e>] __pv_queued_spin_lock_slowpath+0x29e/0x2e0
      [42735.643592]  [<ffffffff931ac21a>] queued_spin_lock_slowpath+0xb/0x13
      [42735.644781]  [<ffffffff931ba734>] _raw_spin_lock+0x24/0x30
      [42735.645820]  [<ffffffffc06cdec9>] cfs_hash_spin_lock+0x9/0x10 [libcfs]
      [42735.647043]  [<ffffffffc06cea57>] cfs_hash_for_each_relax+0x147/0x450 [libcfs]
      [42735.648417]  [<ffffffffc0bb9e30>] ? ldlm_lock_mode_downgrade+0x340/0x340 [ptlrpc]
      [42735.649822]  [<ffffffffc0bb9e30>] ? ldlm_lock_mode_downgrade+0x340/0x340 [ptlrpc]
      [42735.651229]  [<ffffffffc06d1ed5>] cfs_hash_for_each_nolock+0x75/0x1d0 [libcfs]
      [42735.652587]  [<ffffffffc0bbac8c>] ldlm_reprocess_recovery_done+0x3c/0x120 [ptlrpc]
      [42735.654041]  [<ffffffffc0bbba3c>] ldlm_export_cancel_locks+0x17c/0x190 [ptlrpc]
      [42735.655419]  [<ffffffffc0be72f0>] ldlm_bl_thread_main+0x780/0xab0 [ptlrpc]
      [42735.656699]  [<ffffffff92acc680>] ? wake_up_atomic_t+0x40/0x40
      [42735.657818]  [<ffffffffc0be6b70>] ? ldlm_handle_bl_callback+0x400/0x400 [ptlrpc]
      [42735.659216]  [<ffffffff92acb511>] kthread+0xd1/0xe0
      [42735.660139]  [<ffffffff92acb440>] ? insert_kthread_work+0x40/0x40
      [42735.661276]  [<ffffffff931c51f7>] ret_from_fork_nospec_begin+0x21/0x21
      [42735.662488]  [<ffffffff92acb440>] ? insert_kthread_work+0x40/0x40 

      Attachments

        Issue Links

          Activity

            People

              scherementsev Sergey Cheremencev
              scherementsev Sergey Cheremencev
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: