Lustre: LU-5151

Oops in lnet_return_rx_credits_locked

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0
    • Environment:
      Cray router connecting InfiniBand to the Gemini interconnect.
    • Severity:
      3
    • Rank (Obsolete):
      14214

      Description

      While testing Lustre 2.6 in my Cray test environment, I keep losing my routers. The NMI produces the following backtrace:

      2014-06-05T16:45:09.828951-04:00 c0-0c0s2n3 Pid: 4554, comm: kiblnd_sd_01_01 Tainted: P N 3.0.82-0.7.9_1.0502.7780-cray_gem_s #1
      2014-06-05T16:45:09.828965-04:00 c0-0c0s2n3 RIP: 0010:[<ffffffffa0341831>] [<ffffffffa0341831>] lnet_return_rx_credits_locked+0x171/0x310 [lnet]
      2014-06-05T16:45:09.828971-04:00 c0-0c0s2n3 RSP: 0018:ffff8803ea379bb0 EFLAGS: 00010286
      2014-06-05T16:45:09.858936-04:00 c0-0c0s2n3 RAX: dead000000200200 RBX: ffff880317d5a800 RCX: 00000000ffffffff
      2014-06-05T16:45:09.858949-04:00 c0-0c0s2n3 RDX: dead000000100100 RSI: 0000000000000001 RDI: ffff880317d5a800
      2014-06-05T16:45:09.858960-04:00 c0-0c0s2n3 RBP: ffff8803ea379be0 R08: ffff8803e821c860 R09: ffff880317d5a850
      2014-06-05T16:45:09.858970-04:00 c0-0c0s2n3 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880317d5a800
      2014-06-05T16:45:09.858977-04:00 c0-0c0s2n3 R13: ffff8803daf91880 R14: 00000000fffffff5 R15: 0000000000000001
      2014-06-05T16:45:09.888794-04:00 c0-0c0s2n3 FS: 00007f28c44457a0(0000) GS:ffff880407cc0000(0000) knlGS:0000000000000000
      2014-06-05T16:45:09.888807-04:00 c0-0c0s2n3 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      2014-06-05T16:45:09.888818-04:00 c0-0c0s2n3 CR2: 000000000063c800 CR3: 000000031f33f000 CR4: 00000000000007e0
      2014-06-05T16:45:09.888824-04:00 c0-0c0s2n3 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      2014-06-05T16:45:09.888834-04:00 c0-0c0s2n3 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      2014-06-05T16:45:09.918910-04:00 c0-0c0s2n3 Process kiblnd_sd_01_01 (pid: 4554, threadinfo ffff8803ea378000, task ffff8803e89480c0)
      2014-06-05T16:45:09.918924-04:00 c0-0c0s2n3 Stack:
      2014-06-05T16:45:09.918940-04:00 c0-0c0s2n3 ffff8803ea379bd0 ffff880317d5a800 0000000000000001 0000000000000001
      2014-06-05T16:45:09.918951-04:00 c0-0c0s2n3 00000000fffffff5 0000000000000001 ffff8803ea379c10 ffffffffa0338b28
      2014-06-05T16:45:09.918956-04:00 c0-0c0s2n3 ffff880317d5a918 dead000000200200 ffff880317d5a800 ffff8803e9b18d80
      2014-06-05T16:45:09.918961-04:00 c0-0c0s2n3 Call Trace:
      2014-06-05T16:45:09.918966-04:00 c0-0c0s2n3 [<ffffffffa0338b28>] lnet_msg_decommit+0xf8/0x6b0 [lnet]
      2014-06-05T16:45:09.948770-04:00 c0-0c0s2n3 [<ffffffffa0339b47>] lnet_finalize+0x297/0x7d0 [lnet]
      2014-06-05T16:45:09.948783-04:00 c0-0c0s2n3 [<ffffffffa03465ed>] lnet_parse+0xc2d/0x1b80 [lnet]
      2014-06-05T16:45:09.948794-04:00 c0-0c0s2n3 [<ffffffffa03db68a>] kiblnd_handle_rx+0x30a/0x690 [ko2iblnd]
      2014-06-05T16:45:09.948805-04:00 c0-0c0s2n3 [<ffffffffa03e03af>] kiblnd_rx_complete+0x34f/0x420 [ko2iblnd]
      2014-06-05T16:45:09.948815-04:00 c0-0c0s2n3 [<ffffffffa03e0d25>] kiblnd_scheduler+0x7c5/0x970 [ko2iblnd]
      2014-06-05T16:45:09.948821-04:00 c0-0c0s2n3 [<ffffffff810672fe>] kthread+0x9e/0xb0
      2014-06-05T16:45:09.978765-04:00 c0-0c0s2n3 [<ffffffff81481874>] kernel_thread_helper+0x4/0x10
      2014-06-05T16:45:09.978785-04:00 c0-0c0s2n3 Code: c2 0f 85 2b 01 00 00 8d 41 01 85 c0 41 89 45 48 0f 8f dc fe ff ff 49 8b 7d 20 be 01 00 00 00 48 83 ef 10 48 8b 47 18 48 8b 57 10
      2014-06-05T16:45:10.004304-04:00 c0-0c0s2n3 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 48 89 47 10
      2014-06-05T16:45:10.004326-04:00 c0-0c0s2n3 RIP [<ffffffffa0341831>] lnet_return_rx_credits_locked+0x171/0x310 [lnet]
      2014-06-05T16:45:10.004333-04:00 c0-0c0s2n3 RSP <ffff8803ea379bb0>
      2014-06-05T16:45:10.029888-04:00 c0-0c0s2n3 --[ end trace 17126666cf42dece ]--

            People

            • Assignee:
              Liang Zhen (liang, Inactive)
              Reporter:
              James A Simmons (simmonsja)
            • Votes:
              0
              Watchers:
              8