Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1872

OST soft lockup in recovery scale

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.3.0
    • Fix Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Labels:
      None
    • Severity:
      3
    • Rank (Obsolete):
      4450

      Description

      Sep  5 21:59:48 hyperion-dit30 kernel: BUG: soft lockup - CPU#9 stuck for 67s! [ll_ost_io00_016:10900]
      Sep  5 21:59:48 hyperion-dit30 kernel: Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) exportfs ost(U) mgc(U) ldiskfs(U) mbcache jbd2 lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ib_srp scsi_transport_srp ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm_intel kvm sg sd_mod crc_t10dif serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ata_generic pata_acpi ata_piix ioatdma i7core_edac edac_core qla2xxx scsi_transport_fc scsi_tgt ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc igb dca mlx4_en mlx4_core [last unloaded: scsi_wait_scan]
      Sep  5 21:59:48 hyperion-dit30 kernel: CPU 9
      Sep  5 21:59:48 hyperion-dit30 kernel: Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) exportfs ost(U) mgc(U) ldiskfs(U) mbcache jbd2 lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ib_srp scsi_transport_srp ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa mlx4_ib ib_mad ib_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm_intel kvm sg sd_mod crc_t10dif serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ata_generic pata_acpi ata_piix ioatdma i7core_edac edac_core qla2xxx scsi_transport_fc scsi_tgt ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc igb dca mlx4_en mlx4_core [last unloaded: scsi_wait_scan]
      
      Sep  5 21:59:48 hyperion-dit30 kernel:
      Sep  5 21:59:48 hyperion-dit30 kernel: Pid: 10900, comm: ll_ost_io00_016 Tainted: P           ---------------    2.6.32-279.5.1.el6_lustre.x86_64 #1 Supermicro X8DTG-D/X8DTG-D
      Sep  5 21:59:48 hyperion-dit30 kernel: RIP: 0010:[<ffffffff8127e927>]  [<ffffffff8127e927>] __write_lock_failed+0x7/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: RSP: 0018:ffff8800282a3d68  EFLAGS: 00000287
      Sep  5 21:59:48 hyperion-dit30 kernel: RAX: ffffffffa0428ca0 RBX: ffff8800282a3d70 RCX: 0000000000000001
      Sep  5 21:59:48 hyperion-dit30 kernel: RDX: ffff8800282a3e28 RSI: 0000000000000001 RDI: ffff88033735b080
      Sep  5 21:59:48 hyperion-dit30 kernel: RBP: ffffffff8100bc13 R08: ffff8800282ae0e0 R09: 000018ff3747ec29
      Sep  5 21:59:48 hyperion-dit30 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800282a3ce0
      Sep  5 21:59:48 hyperion-dit30 kernel: R13: ffff8802dbbba810 R14: ffff88033735b080 R15: ffffffff8150604b
      Sep  5 21:59:48 hyperion-dit30 kernel: FS:  00002aaaab47e700(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
      Sep  5 21:59:48 hyperion-dit30 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Sep  5 21:59:48 hyperion-dit30 kernel: CR2: 00002aaaab3fa450 CR3: 0000000336d57000 CR4: 00000000000006e0
      Sep  5 21:59:48 hyperion-dit30 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Sep  5 21:59:48 hyperion-dit30 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Sep  5 21:59:48 hyperion-dit30 kernel: Process ll_ost_io00_016 (pid: 10900, threadinfo ffff880278b9c000, task ffff8802a70b9540)
      Sep  5 21:59:48 hyperion-dit30 kernel: Stack:
      Sep  5 21:59:48 hyperion-dit30 kernel: ffffffff815005b7 ffff8800282a3d80 ffffffffa040b702 ffff8800282a3da0
      Sep  5 21:59:48 hyperion-dit30 kernel: <d> ffffffffa040b677 0000000000000000 ffff88033735b080 ffff8800282a3e00
      Sep  5 21:59:48 hyperion-dit30 kernel: <d> ffffffffa040d588 000000018110dd48 ffff8800282a3e28 0000000000000092
      Sep  5 21:59:48 hyperion-dit30 kernel: Call Trace:
      Sep  5 21:59:48 hyperion-dit30 kernel: <IRQ>
      
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff815005b7>] ? _write_lock+0x17/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040b702>] ? cfs_hash_rw_lock+0x12/0x30 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040b677>] ? cfs_hash_for_each_enter+0x37/0xb0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040d588>] ? cfs_hash_for_each_tight+0x48/0x2a0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0793fd0>] ? job_cleanup_callback+0x0/0x60 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040d853>] ? cfs_hash_for_each_safe+0x13/0x20 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa079400b>] ? job_cleanup_callback+0x3b/0x60 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81057fac>] ? scheduler_tick+0xcc/0x260
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8107e897>] ? run_timer_softirq+0x197/0x340
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff810a21d0>] ? tick_sched_timer+0x0/0xc0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81096c50>] ? hrtimer_interrupt+0x140/0x250
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81073ca5>] ? irq_exit+0x85/0x90
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81506050>] ? smp_apic_timer_interrupt+0x70/0x9b
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: <EOI>
      
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81279955>] ? strncmp+0x45/0x60
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0793fae>] ? job_stat_keycmp+0x5e/0x80 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040ad95>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040af17>] ? cfs_hash_dual_bd_lookup_locked+0x37/0x70 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040c1a4>] ? cfs_hash_lookup+0x54/0xa0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81115438>] ? find_or_create_page+0x68/0xb0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa079458b>] ? lprocfs_job_stats_log+0x9b/0x870 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0e0134a>] ? filter_preprw_write+0x1d5a/0x2320 [obdfilter]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa04b7edb>] ? lnet_ni_send+0x4b/0x110 [lnet]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa096aaab>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0e02710>] ? filter_preprw+0x80/0xa0 [obdfilter]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa08a381c>] ? obd_preprw+0x12c/0x3d0 [ost]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa08aa98a>] ? ost_brw_write+0x87a/0x1600 [ost]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8127ce76>] ? vsnprintf+0x2b6/0x5f0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa092909c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09291f8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa08b102c>] ? ost_handle+0x360c/0x4850 [ost]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0405561>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0401364>] ? libcfs_id2str+0x74/0xb0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa093885d>] ? ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa03f565e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa092fdf7>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0939e49>] ? ptlrpc_main+0xb59/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09392f0>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09392f0>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09392f0>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: Code: d8 1f 00 00 48 8b 5b 20 48 83 eb 07 48 39 d9 73 06 48 89 01 31 c0 c3 b8 f2 ff ff ff c3 90 90 90 90 90 90 90 f0 81 07 00 00 00 01 <f3> 90 81 3f 00 00 00 01 75 f6 f0 81 2f 00 00 00 01 0f 85 e2 ff
      Sep  5 21:59:48 hyperion-dit30 kernel: Call Trace:
      
      Sep  5 21:59:48 hyperion-dit30 kernel: <IRQ>  [<ffffffff815005b7>] ? _write_lock+0x17/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040b702>] ? cfs_hash_rw_lock+0x12/0x30 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040b677>] ? cfs_hash_for_each_enter+0x37/0xb0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040d588>] ? cfs_hash_for_each_tight+0x48/0x2a0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0793fd0>] ? job_cleanup_callback+0x0/0x60 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040d853>] ? cfs_hash_for_each_safe+0x13/0x20 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa079400b>] ? job_cleanup_callback+0x3b/0x60 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81057fac>] ? scheduler_tick+0xcc/0x260
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8107e897>] ? run_timer_softirq+0x197/0x340
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff810a21d0>] ? tick_sched_timer+0x0/0xc0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81096c50>] ? hrtimer_interrupt+0x140/0x250
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81073ca5>] ? irq_exit+0x85/0x90
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81506050>] ? smp_apic_timer_interrupt+0x70/0x9b
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: <EOI>  [<ffffffff81279955>] ? strncmp+0x45/0x60
      
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0793fae>] ? job_stat_keycmp+0x5e/0x80 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040ad95>] ? cfs_hash_bd_lookup_intent+0x65/0x130 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040af17>] ? cfs_hash_dual_bd_lookup_locked+0x37/0x70 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa040c1a4>] ? cfs_hash_lookup+0x54/0xa0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff81115438>] ? find_or_create_page+0x68/0xb0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa079458b>] ? lprocfs_job_stats_log+0x9b/0x870 [obdclass]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0e0134a>] ? filter_preprw_write+0x1d5a/0x2320 [obdfilter]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa04b7edb>] ? lnet_ni_send+0x4b/0x110 [lnet]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa096aaab>] ? null_alloc_rs+0x1ab/0x3b0 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0e02710>] ? filter_preprw+0x80/0xa0 [obdfilter]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa08a381c>] ? obd_preprw+0x12c/0x3d0 [ost]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa08aa98a>] ? ost_brw_write+0x87a/0x1600 [ost]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8127ce76>] ? vsnprintf+0x2b6/0x5f0
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa092909c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09291f8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa08b102c>] ? ost_handle+0x360c/0x4850 [ost]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0405561>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0401364>] ? libcfs_id2str+0x74/0xb0 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa093885d>] ? ptlrpc_server_handle_request+0x40d/0xea0 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa03f565e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa092fdf7>] ? ptlrpc_wait_event+0xa7/0x2a0 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff810533f3>] ? __wake_up+0x53/0x70
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa0939e49>] ? ptlrpc_main+0xb59/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09392f0>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09392f0>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffffa09392f0>] ? ptlrpc_main+0x0/0x1860 [ptlrpc]
      Sep  5 21:59:48 hyperion-dit30 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
      

        Attachments

          Activity

            People

            • Assignee:
              niu Niu Yawei (Inactive)
              Reporter:
              cliffw Cliff White (Inactive)
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: