Lustre / LU-13483

Apparently infinite recursion in lnet_finalize()

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.12.4
    • lustre 2.12.4 + patches
      lustre 2.10.8 + patches
      RHEL 7.8 + patches
    • 3

    Description

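      A router node crashes with what appears to be infinite recursion in LNet. In the trace below, lnet_finalize() is re-entered through lnet_msg_decommit(), lnet_return_tx_credits_locked(), and lnet_post_send_locked() until the kiblnd_sd_00_00 thread overruns its stack.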

      [15037.327128] Thread overran stack, or stack corrupted
      [15037.332674] Oops: 0000 [#1] SMP
      [15037.336294] Modules linked in: ko2iblnd(OE) lnet(OE) libcfs(OE) mlx4_ib mlx4_en rpcrdma ib_iser iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass pcspkr zfs(POE) ib_qib rdmavt lpc_ich zunicode(POE) zavl(POE) icp(POE) joydev zcommon(POE) znvpair(POE) spl(OE) mlx4_core devlink sg i2c_i801 ioatdma ipmi_si ipmi_devintf ipmi_msghandler ib_ipoib rdma_ucm ib_uverbs ib_umad acpi_cpufreq iw_cxgb4 rdma_cm iw_cm ib_cm iw_cxgb3 ib_core sch_fq_codel binfmt_misc msr_safe(OE) ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache overlay(T) ext4 mbcache jbd2 dm_service_time sd_mod crc_t10dif crct10dif_generic be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs 8021q mgag200 garp mrp
      [15037.415946]  stp drm_kms_helper llc syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul crct10dif_common ttm crc32_pclmul crc32c_intel ghash_clmulni_intel ahci drm mpt2sas igb isci libahci aesni_intel lrw gf128mul libsas glue_helper dca ablk_helper raid_class ptp cryptd libata dm_multipath scsi_transport_sas drm_panel_orientation_quirks pps_core wmi i2c_algo_bit sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
      [15037.461455] CPU: 3 PID: 21567 Comm: kiblnd_sd_00_00 Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1
      [15037.475718] Hardware name: cray cray-2628-lr/S2600GZ, BIOS SE5C600.86B.02.06.0002.101320150901 10/13/2015
      [15037.486395] task: ffff9fd2b7775230 ti: ffff9fd27c1e8000 task.ti: ffff9fd27c1e8000
      [15037.494744] RIP: 0010:[<ffffffffa17acd4d>]  [<ffffffffa17acd4d>] strnlen+0xd/0x40
      [15037.503106] RSP: 0018:ffff9fd27c1e7e80  EFLAGS: 00010086
      [15037.509032] RAX: ffffffffa1e86261 RBX: ffffffffa2402fd6 RCX: fffffffffffffffe
      [15037.516994] RDX: 00000000c0ab26ee RSI: ffffffffffffffff RDI: 00000000c0ab26ee
      [15037.524957] RBP: ffff9fd27c1e7e80 R08: 000000000000ffff R09: 000000000000ffff
      [15037.532917] R10: 0000000000000000 R11: ffff9fd27c1e7e46 R12: 00000000c0ab26ee
      [15037.540880] R13: ffffffffa24033a0 R14: 00000000ffffffff R15: 0000000000000000
      [15037.548842] FS:  0000000000000000(0000) GS:ffff9fd2be8c0000(0000) knlGS:0000000000000000
      [15037.557871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [15037.564282] CR2: 00000000c0ab26ee CR3: 0000001ffbd04000 CR4: 00000000001607e0
      [15037.572244] Call Trace:
      [15037.574990]  [<ffffffffc08e759a>] ? cfs_print_to_console+0x7a/0x1c0 [libcfs]
      [15037.582862]  [<ffffffffc08eda74>] ? libcfs_debug_vmsg2+0x574/0xbb0 [libcfs]
      [15037.590635]  [<ffffffffc08ee107>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [15037.598037]  [<ffffffffc0a7e226>] ? lnet_finalize+0x976/0x9f0 [lnet]
      ...
      [15039.259572]  [<ffffffffc0a7d9e9>] ? lnet_finalize+0x139/0x9f0 [lnet]
      [15039.266666]  [<ffffffffc08ee107>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [15039.274052]  [<ffffffffc0a87b8a>] ? lnet_post_send_locked+0x42a/0xa40 [lnet]
      [15039.281922]  [<ffffffffc0a89e38>] ? lnet_return_tx_credits_locked+0x238/0x4a0 [lnet]
      [15039.290583]  [<ffffffffc0a7c88c>] ? lnet_msg_decommit+0xec/0x700 [lnet]
      [15039.297960]  [<ffffffffc0a7dc3f>] ? lnet_finalize+0x38f/0x9f0 [lnet]
      [15039.305055]  [<ffffffffc09bc75d>] ? kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [15039.312634]  [<ffffffffc09c7d19>] ? kiblnd_scheduler+0x8c9/0x1160 [ko2iblnd]
      [15039.320502]  [<ffffffffa142d59e>] ? __switch_to+0xce/0x5a0
      [15039.326626]  [<ffffffffa14e29b0>] ? wake_up_state+0x20/0x20
      [15039.332835]  [<ffffffffa14b46bc>] ? mod_timer+0x11c/0x260
      [15039.338859]  [<ffffffffc09c7450>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
      [15039.346338]  [<ffffffffa14cca01>] ? kthread+0xd1/0xe0
      [15039.351974]  [<ffffffffa14cc930>] ? insert_kthread_work+0x40/0x40
      [15039.358776]  [<ffffffffa1bbff77>] ? ret_from_fork_nospec_begin+0x21/0x21
      [15039.366253]  [<ffffffffa14cc930>] ? insert_kthread_work+0x40/0x40
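
When triaging a trace like the one above, it can help to confirm mechanically which functions repeat. The following is a minimal sketch, not part of the original report: the helper names `parse_frames` and `repeated_functions` are hypothetical, the regular expression assumes the `[<addr>] ? func+0xoff/0xlen [module]` frame layout seen in this log, and `SAMPLE` is an excerpt copied from the trace. Frames marked `?` are ones the kernel's unwinder is not certain about, so the counts are a hint rather than proof of the recursion.

```python
import re
from collections import Counter

# Matches kernel call-trace entries such as
#   [15039.259572]  [<ffffffffc0a7d9e9>] ? lnet_finalize+0x139/0x9f0 [lnet]
# The leading "[timestamp]" is ignored; the module suffix is optional.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (function, module) tuples in the order they appear in the log."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("func"), match.group("module")))
    return frames


def repeated_functions(frames, min_count=2):
    """Return {function: count} for functions seen at least min_count times."""
    counts = Counter(func for func, _ in frames)
    return {func: n for func, n in counts.items() if n >= min_count}


# Excerpt copied from the trace in this report (the elided middle is omitted).
SAMPLE = """\
[15037.598037]  [<ffffffffc0a7e226>] ? lnet_finalize+0x976/0x9f0 [lnet]
[15039.259572]  [<ffffffffc0a7d9e9>] ? lnet_finalize+0x139/0x9f0 [lnet]
[15039.266666]  [<ffffffffc08ee107>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15039.274052]  [<ffffffffc0a87b8a>] ? lnet_post_send_locked+0x42a/0xa40 [lnet]
[15039.281922]  [<ffffffffc0a89e38>] ? lnet_return_tx_credits_locked+0x238/0x4a0 [lnet]
[15039.290583]  [<ffffffffc0a7c88c>] ? lnet_msg_decommit+0xec/0x700 [lnet]
[15039.297960]  [<ffffffffc0a7dc3f>] ? lnet_finalize+0x38f/0x9f0 [lnet]
[15039.305055]  [<ffffffffc09bc75d>] ? kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    print(repeated_functions(frames))
```

Run against the full console log rather than this excerpt, the same helpers would show how many times each LNet function appears between the top of the stack and kiblnd_scheduler().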
      

            People

              ashehata Amir Shehata (Inactive)
              ofaaland Olaf Faaland