Lustre / LU-17884

Soft lockup after test ends

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      This issue was created by maloo for S Buisson <sbuisson@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/d533e8c9-2124-4c8c-93ed-0679aa9f2d88

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b_es-reviews/18667 - 4.18.0-477.27.1.el8_8.x86_64
      servers: https://build.whamcloud.com/job/lustre-b_es-reviews/18667 - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64

      [13584.283487] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [dist_txn-1:512872]
      [13584.288569] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc pcspkr joydev virtio_balloon i2c_piix4 ext4 mbcache jbd2 ata_generic ata_piix libata crc32c_intel virtio_net serio_raw net_failover virtio_blk failover [last unloaded: libcfs]
      [13584.298845] CPU: 1 PID: 512872 Comm: dist_txn-1 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
      [13584.301372] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13584.302527] RIP: 0010:native_safe_halt+0xe/0x20
      [13584.303499] Code: 00 f0 80 48 02 20 48 8b 00 a8 08 75 c0 e9 79 ff ff ff 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 46 90 60 00 fb f4 <e9> dd 01 40 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
      [13584.307067] RSP: 0018:ffffbd2782f3fb98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [13584.308571] RAX: 0000000000000003 RBX: ffff9fb1a0a8d850 RCX: 0000000000000008
      [13584.309988] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9fb1a0a8d850
      [13584.311402] RBP: ffff9fb23fd33d40 R08: 0000000000000008 R09: 0000000000000024
      [13584.312815] R10: 0000000000000002 R11: ffffbd2782f3fc18 R12: 0000000000000000
      [13584.314220] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000080000
      [13584.315628] FS:  0000000000000000(0000) GS:ffff9fb23fd00000(0000) knlGS:0000000000000000
      [13584.317204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13584.318355] CR2: 00005573c765d4b0 CR3: 00000000a4410001 CR4: 00000000001706e0
      [13584.319771] Call Trace:
      [13584.320342]  kvm_wait+0x58/0x60
      [13584.321053]  __pv_queued_spin_lock_slowpath+0x268/0x2a0
      [13584.322163]  _raw_spin_lock+0x1e/0x30
      [13584.322945]  osp_check_and_set_rpc_version+0x9f/0x240 [osp]
      [13584.324184]  ? osp_md_write+0x3a0/0x5d0 [osp]
      [13584.325102]  ? dt_record_write+0x32/0x120 [obdclass]
      [13584.326469]  ? llog_osd_write_rec+0x6e9/0x1a30 [obdclass]
      [13584.327618]  ? llog_osd_regular_fid_del_name_entry+0x29c/0x5d0 [obdclass]
      [13584.329042]  ? llog_write_rec+0x3f7/0x520 [obdclass]
      [13584.330098]  ? llog_cancel_arr_rec+0x3b2/0xbd0 [obdclass]
      [13584.331259]  ? llog_cat_cancel_arr_rec+0x1d5/0x430 [obdclass]
      [13584.332487]  ? llog_cat_cancel_records+0x61/0x190 [obdclass]
      [13584.333766]  ? distribute_txn_commit_thread+0x3cf/0xbb0 [ptlrpc]
      [13584.335530]  ? distribute_txn_commit_batchid_update+0x860/0x860 [ptlrpc]
      [13584.336959]  ? kthread+0x134/0x150
      [13584.337702]  ? set_kthread_struct+0x50/0x50
      [13584.338569]  ? ret_from_fork+0x35/0x40
      [13584.339360] Kernel panic - not syncing: softlockup: hung tasks
      [13584.340553] CPU: 1 PID: 512872 Comm: dist_txn-1 Kdump: loaded Tainted: G           OEL   --------- -  - 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1
      [13584.343089] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13584.344251] Call Trace:
      [13584.344820]  <IRQ>
      [13584.345284]  dump_stack+0x41/0x60
      [13584.346021]  panic+0xe7/0x2ac
      [13584.346688]  watchdog_timer_fn.cold.10+0x85/0x9e
      [13584.347674]  ? watchdog+0x30/0x30
      [13584.348398]  __hrtimer_run_queues+0x101/0x280
      [13584.349330]  hrtimer_interrupt+0x100/0x220
      [13584.350190]  smp_apic_timer_interrupt+0x6a/0x130
      [13584.351152]  apic_timer_interrupt+0xf/0x20
      [13584.352011]  </IRQ>
      [13584.352491] RIP: 0010:native_safe_halt+0xe/0x20
      [13584.353417] Code: 00 f0 80 48 02 20 48 8b 00 a8 08 75 c0 e9 79 ff ff ff 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 46 90 60 00 fb f4 <e9> dd 01 40 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
      [13584.357002] RSP: 0018:ffffbd2782f3fb98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [13584.358509] RAX: 0000000000000003 RBX: ffff9fb1a0a8d850 RCX: 0000000000000008
      [13584.359945] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff9fb1a0a8d850
      [13584.361371] RBP: ffff9fb23fd33d40 R08: 0000000000000008 R09: 0000000000000024
      [13584.362795] R10: 0000000000000002 R11: ffffbd2782f3fc18 R12: 0000000000000000
      [13584.364215] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000080000
      [13584.365636]  kvm_wait+0x58/0x60
      [13584.366316]  __pv_queued_spin_lock_slowpath+0x268/0x2a0
      [13584.367379]  _raw_spin_lock+0x1e/0x30
      [13584.368147]  osp_check_and_set_rpc_version+0x9f/0x240 [osp]
      [13584.369299]  ? osp_md_write+0x3a0/0x5d0 [osp]
      [13584.370225]  ? dt_record_write+0x32/0x120 [obdclass]
      [13584.371303]  ? llog_osd_write_rec+0x6e9/0x1a30 [obdclass]
      [13584.372449]  ? llog_osd_regular_fid_del_name_entry+0x29c/0x5d0 [obdclass]
      [13584.373856]  ? llog_write_rec+0x3f7/0x520 [obdclass]
      [13584.374925]  ? llog_cancel_arr_rec+0x3b2/0xbd0 [obdclass]
      [13584.376070]  ? llog_cat_cancel_arr_rec+0x1d5/0x430 [obdclass]
      [13584.377283]  ? llog_cat_cancel_records+0x61/0x190 [obdclass]
      [13584.378481]  ? distribute_txn_commit_thread+0x3cf/0xbb0 [ptlrpc]
      [13584.379779]  ? distribute_txn_commit_batchid_update+0x860/0x860 [ptlrpc]
      [13584.381203]  ? kthread+0x134/0x150
      [13584.381930]  ? set_kthread_struct+0x50/0x50
      [13584.382806]  ? ret_from_fork+0x35/0x40
      

            People

              wc-triage WC Triage
              maloo Maloo