Lustre / LU-11278

LNet failures on Power8

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • None
    • Lustre 2.12.0
    • None
    • Power8 running RHEL7 alt kernel.
    • 3

    Description

      Since the LNet health merge, I'm seeing a new bug on Power8 platforms. Currently I can't even ping the MGS from my Power8 client. I see the following backtrace:

      [  172.614537] [c000001fffdc7180] [c000000000c683a0] _raw_spin_unlock_bh+0x50/0x80
      [  172.614610] [c000001fffdc71b0] [c000000000a4d0e8] peernet2id+0x78/0xd0
      [  172.614670] [c000001fffdc71f0] [c000000000acd06c] netlink_broadcast_filtered+0x31c/0x740
      [  172.614749] [c000001fffdc72b0] [d00000000cc33298] rdma_nl_multicast+0x58/0x90 [ib_core]
      [  172.614826] [c000001fffdc72f0] [d00000000cc3a270] send_mad+0x4e0/0x6a0 [ib_core]
      [  172.614903] [c000001fffdc7390] [d00000000cc3bdcc] ib_sa_path_rec_get+0x21c/0x5b0 [ib_core]
      [  172.614977] [c000001fffdc7460] [d00000000fc629e4] path_rec_start+0xb4/0x190 [ib_ipoib]
      [  172.615051] [c000001fffdc7500] [d00000000fc65e1c] ipoib_start_xmit+0x63c/0x7e0 [ib_ipoib]
      [  172.615122] [c000001fffdc75b0] [c000000000a6906c] dev_hard_start_xmit+0xec/0x2f0
      [  172.615193] [c000001fffdc7640] [c000000000ab7ef4] sch_direct_xmit+0x164/0x260
      [  172.615264] [c000001fffdc76e0] [c000000000a69908] __dev_queue_xmit+0x698/0x9e0
      [  172.615335] [c000001fffdc7790] [c000000000a7900c] neigh_connected_output+0xfc/0x170
      [  172.615406] [c000001fffdc77e0] [c000000000a80194] neigh_update+0x644/0x790
      [  172.615465] [c000001fffdc7860] [c000000000b400a8] arp_process+0x2c8/0x850
      [  172.615525] [c000001fffdc7940] [c000000000b407cc] arp_rcv+0x19c/0x230
      [  172.615584] [c000001fffdc79b0] [c000000000a57c4c] __netif_receive_skb_core+0x73c/0x1010
      [  172.615694] [c000001fffdc7a70] [c000000000a5f8a8] netif_receive_skb_internal+0x58/0x160
      [  172.615831] [c000001fffdc7ab0] [c000000000a62e38] napi_gro_receive+0x1c8/0x2f0
      [  172.615983] [c000001fffdc7af0] [d00000000c8e85fc] mlx5i_handle_rx_cqe+0x20c/0x3c0 [mlx5_core]
      [  172.616158] [c000001fffdc7ba0] [d00000000c8e7878] mlx5e_poll_rx_cq+0x278/0xb50 [mlx5_core]
      [  172.616308] [c000001fffdc7c30] [d00000000c8e8a30] mlx5e_napi_poll+0x160/0xe50 [mlx5_core]
      [  172.616444] [c000001fffdc7cf0] [c000000000a62aec] net_rx_action+0x3bc/0x540
      [  172.616559] [c000001fffdc7e00] [c000000000c690cc] __do_softirq+0x14c/0x3dc
      [  172.616675] [c000001fffdc7ef0] [c0000000001423d4] irq_exit+0x1e4/0x1f0
      [  172.616791] [c000001fffdc7f20] [c000000000017190] __do_irq+0xa0/0x200
      [  172.616905] [c000001fffdc7f90] [c00000000002ea40] call_do_irq+0x14/0x24
      [  172.617019] [c000000ffba43a40] [c000000000017390] do_IRQ+0xa0/0x120
      [  172.617135] [c000000ffba43aa0] [c000000000008bd4] hardware_interrupt_common+0x114/0x120

            People

              ashehata Amir Shehata (Inactive)
              simmonsja James A Simmons