LU-17954

replay-single test_0d: MDS crash BUG: soft lockup [tgt_recover_0]


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.5, Lustre 2.15.6
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5c5a7e20-3817-4d4e-bf94-c0c463e0bdfd

      test_0d failed with the following error:

      test_0d returned 1
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b2_15/88 - 5.4.0-110-generic
      servers: https://build.whamcloud.com/job/lustre-b2_15/88 - 4.18.0-513.24.1.el8_lustre.x86_64


      MDS dmesg

      [ 2568.298513] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
      [ 2568.300460] Lustre: lustre-MDT0000: Denying connection for new client 7ca0c85b-8289-4bc1-8f66-fba11c1a6d22 (at 10.240.25.238@tcp), waiting for 2 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 0:59
      [ 2568.304177] Lustre: Skipped 2 previous similar messages
      [ 2633.866079] Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
      [ 2633.867849] Lustre: lustre-MDT0000: disconnecting 1 stale clients
      [ 2635.128884] Lustre: lustre-MDT0000: Denying connection for new client 7ca0c85b-8289-4bc1-8f66-fba11c1a6d22 (at 10.240.25.238@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 0:01
      [ 2635.133415] Lustre: Skipped 12 previous similar messages
      [ 2650.944188] Lustre: lustre-MDT0000: Recovery over after 1:23, of 2 clients 1 recovered and 1 was evicted.
      [ 2679.928995] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [tgt_recover_0:220356]
      [ 2679.932486] Modules linked in: dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver intel_rapl_msr nfs lockd grace intel_rapl_common fscache crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcspkr i2c_piix4 virtio_balloon sunrpc ext4 mbcache jbd2 ata_generic ata_piix crc32c_intel serio_raw libata virtio_net virtio_blk net_failover failover [last unloaded: dm_flakey]
      [ 2679.982751] CPU: 1 PID: 220356 Comm: tgt_recover_0 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-513.24.1.el8_lustre.x86_64 #1
      [ 2679.985241] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 2679.986403] RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs]
      [ 2680.060896] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 86 a4 41 c1 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
      [ 2680.064472] RSP: 0018:ffffb70e83d73df8 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
      [ 2680.065998] RAX: ffffb70e82afe008 RBX: 0000000000000000 RCX: 000000000000000e
      [ 2680.067394] RDX: ffffb70e82ae1000 RSI: ffffb70e83d73e28 RDI: ffff892a03c8c600
      [ 2680.068809] RBP: 0000000000000000 R08: ffffb70e83d73dc0 R09: ffffb70e83d73dc8
      [ 2680.070236] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [ 2680.071657] R13: 0000000000000001 R14: ffff892a03c8c600 R15: 0000000000000004
      [ 2680.073055] FS:  0000000000000000(0000) GS:ffff892abfd00000(0000) knlGS:0000000000000000
      [ 2680.074619] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2680.075772] CR2: 00007ffcac990dc8 CR3: 0000000092e10006 CR4: 00000000001706e0
      [ 2680.077199] Call Trace:
      [ 2680.159866]  <IRQ>
      [ 2680.160376]  ? watchdog_timer_fn.cold.10+0x46/0x9e
      [ 2680.161403]  ? watchdog+0x30/0x30
      [ 2680.162109]  ? __hrtimer_run_queues+0x101/0x280
      [ 2680.163067]  ? hrtimer_interrupt+0x100/0x220
      [ 2680.163974]  ? smp_apic_timer_interrupt+0x6a/0x130
      [ 2680.164970]  ? apic_timer_interrupt+0xf/0x20
      [ 2680.165843]  </IRQ>
      [ 2680.166311]  ? cfs_hash_for_each_relax+0x173/0x460 [libcfs]
      [ 2680.167449]  ? cfs_hash_for_each_relax+0x16a/0x460 [libcfs]
      [ 2680.168569]  ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc]
      [ 2681.900765]  ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc]
      [ 2681.901977]  cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs]
      [ 2681.903101]  ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
      [ 2682.220190]  target_recovery_thread+0xdd0/0x12c0 [ptlrpc]
      [ 2682.224441]  ? replay_request_or_update.isra.29+0x9e0/0x9e0 [ptlrpc]
      [ 2682.225853]  kthread+0x134/0x150
      [ 2682.226590]  ? set_kthread_struct+0x50/0x50
      [ 2682.227460]  ret_from_fork+0x35/0x40
      [ 2682.228332] Kernel panic - not syncing: softlockup: hung tasks
      [ 2682.229509] CPU: 1 PID: 220356 Comm: tgt_recover_0 Kdump: loaded Tainted: G           OEL   --------- -  - 4.18.0-513.24.1.el8_lustre.x86_64 #1
      [ 2682.232109] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 2682.233288] Call Trace:
      [ 2682.233856]  <IRQ>
      [ 2682.234330]  dump_stack+0x41/0x60
      [ 2682.235071]  panic+0xe7/0x2ac
      [ 2682.235742]  ? __switch_to_asm+0x11/0x80
      [ 2682.236569]  watchdog_timer_fn.cold.10+0x85/0x9e
      [ 2682.237628]  ? watchdog+0x30/0x30
      [ 2682.238339]  __hrtimer_run_queues+0x101/0x280
      [ 2682.239338]  hrtimer_interrupt+0x100/0x220
      [ 2682.240190]  smp_apic_timer_interrupt+0x6a/0x130
      [ 2682.241164]  apic_timer_interrupt+0xf/0x20
      [ 2682.242044]  </IRQ>
      [ 2682.242571] RIP: 0010:cfs_hash_for_each_relax+0x173/0x460 [libcfs]
      [ 2682.243845] Code: 24 38 00 00 00 00 8b 40 2c 89 44 24 14 49 8b 46 38 48 8d 74 24 30 4c 89 f7 48 8b 00 e8 86 a4 41 c1 48 85 c0 0f 84 e6 01 00 00 <48> 8b 18 48 85 db 0f 84 c0 01 00 00 49 8b 46 28 48 89 de 4c 89 f7
      [ 2682.247715] RSP: 0018:ffffb70e83d73df8 EFLAGS: 00010282 ORIG_RAX: ffffffffffffff13
      [ 2682.249215] RAX: ffffb70e82afe008 RBX: 0000000000000000 RCX: 000000000000000e
      [ 2682.250697] RDX: ffffb70e82ae1000 RSI: ffffb70e83d73e28 RDI: ffff892a03c8c600
      [ 2682.252104] RBP: 0000000000000000 R08: ffffb70e83d73dc0 R09: ffffb70e83d73dc8
      [ 2682.253560] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [ 2682.255013] R13: 0000000000000001 R14: ffff892a03c8c600 R15: 0000000000000004
      [ 2682.256439]  ? cfs_hash_for_each_relax+0x16a/0x460 [libcfs]
      [ 2682.257583]  ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc]
      [ 2682.258814]  ? ldlm_lock_mode_downgrade+0x300/0x300 [ptlrpc]
      [ 2682.260053]  cfs_hash_for_each_nolock+0x11f/0x1f0 [libcfs]
      [ 2682.261188]  ldlm_reprocess_recovery_done+0x8b/0x100 [ptlrpc]
      [ 2682.262432]  target_recovery_thread+0xdd0/0x12c0 [ptlrpc]
      [ 2682.263636]  ? replay_request_or_update.isra.29+0x9e0/0x9e0 [ptlrpc]
      [ 2682.264969]  kthread+0x134/0x150
      [ 2682.265656]  ? set_kthread_struct+0x50/0x50
      [ 2682.266507]  ret_from_fork+0x35/0x40
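
      The trace above shows the tgt_recover_0 thread spinning inside cfs_hash_for_each_relax (reached from target_recovery_thread -> ldlm_reprocess_recovery_done -> cfs_hash_for_each_nolock) long enough for the kernel soft-lockup watchdog to fire and panic the node. For reference, the sketch below is a minimal userspace analogy of that watchdog mechanism, not kernel or Lustre code; the file name, thresholds, and iteration counts are invented for illustration. A worker thread loops without yielding or updating its heartbeat, and a monitor thread flags it, which is essentially what watchdog_timer_fn did to tgt_recover_0 after 22s here.

      /*
       * Minimal userspace analogy of the soft-lockup watchdog seen above.
       * Illustration only -- not kernel code and not the Lustre hash
       * iteration; names, thresholds and iteration counts are invented.
       *
       * A worker thread spins through a long loop (standing in for
       * cfs_hash_for_each_relax walking a large hash). A watchdog thread
       * checks a heartbeat timestamp; if the worker has not updated it for
       * LOCKUP_SECS, the watchdog reports a "soft lockup", much as
       * watchdog_timer_fn reported tgt_recover_0 stuck for 22s.
       *
       * Build: cc -O2 -pthread softlockup_demo.c -o softlockup_demo
       */
      #include <pthread.h>
      #include <sched.h>
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>

      #define LOCKUP_SECS 2                 /* report if no heartbeat this long */
      #define WORK_ITEMS  4000000000UL      /* tune so the loop outlives it     */

      static _Atomic time_t heartbeat;              /* last "reschedule" time   */
      static atomic_bool    yield_periodically;     /* set true to see the fix  */

      static void *worker(void *arg)
      {
          volatile unsigned long sink = 0;

          (void)arg;
          for (unsigned long i = 0; i < WORK_ITEMS; i++) {
              sink += i;                            /* per-bucket work stand-in */
              if (yield_periodically && i % 1000000 == 0) {
                  /* analogous to cond_resched(): proof of life, then yield CPU */
                  atomic_store(&heartbeat, time(NULL));
                  sched_yield();
              }
          }
          return NULL;
      }

      static void *watchdog(void *arg)
      {
          (void)arg;
          for (;;) {
              time_t last;

              sleep(1);
              last = atomic_load(&heartbeat);
              if (time(NULL) - last > LOCKUP_SECS) {
                  fprintf(stderr, "watchdog: BUG: soft lockup - worker stuck for %lds!\n",
                          (long)(time(NULL) - last));
                  return NULL;
              }
          }
      }

      int main(void)
      {
          pthread_t w, wd;

          atomic_store(&heartbeat, time(NULL));
          pthread_create(&w, NULL, worker, NULL);
          pthread_create(&wd, NULL, watchdog, NULL);
          pthread_join(w, NULL);        /* worker eventually finishes...       */
          pthread_cancel(wd);           /* ...then stop the watchdog checker   */
          pthread_join(wd, NULL);
          return 0;
      }

      Setting yield_periodically to true shows the usual remedy for long but legitimate loops: update the heartbeat and yield periodically, analogous to calling cond_resched() inside the iteration so the watchdog is not tripped.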
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      replay-single test_0d - test_0d returned 1

People

    Assignee: WC Triage
    Reporter: Maloo
    Votes: 0
    Watchers: 2
