  Lustre / LU-10103

LBUG: lib-move.c:2121:lnet_send()) ASSERTION( msg->msg_txpeer == ((void *)0) ) failed

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.10.2
    • Fix Version/s: None
    • Labels:
    • Environment:
      Soak test cluster
    • Severity:
      3

      Description

      Testing https://review.whamcloud.com/29341 (reverts the patch for LU-9810, to determine whether
      preferring Fast Reg breaks mounting targets).
      The system mounts fine (LU-10068), but after a few hours the routers hit an LBUG:

      Oct  5 16:25:31 soak-14 kernel: LNet: 2153:0:(o2iblnd_modparams.c:253:kiblnd_tunables_setup()) Invalid map_on_demand (0), expects 1 - 256. Using default of 256
      Oct  5 16:25:31 soak-14 kernel: LNet: Using FMR for registration
      Oct  5 16:25:31 soak-14 kernel: LNetError: 4:0:(o2iblnd_cb.c:2304:kiblnd_passive_connect()) Can't accept conn from 192.168.1.121@o2ib on NA (ib1:0:192.168.1.114): bad dst nid 192.168.1.114@o2ib
      Oct  5 16:25:31 soak-14 kernel: LNet: Added LNI 192.168.1.114@o2ib [8/256/0/180]
      Oct  5 16:25:31 soak-14 kernel: LNet: Added LNI 172.16.1.14@o2ib1 [128/2048/0/180]
      Oct  5 16:25:31 soak-14 sshd[2130]: Received disconnect from 10.10.1.116 port 38944:11: disconnected by user
      Oct  5 16:25:31 soak-14 sshd[2130]: Disconnected from 10.10.1.116 port 38944
      Oct  5 16:25:31 soak-14 sshd[2130]: pam_unix(sshd:session): session closed for user root
      Oct  5 16:25:31 soak-14 systemd-logind: Removed session 4.
      Oct  5 16:25:31 soak-14 systemd: Removed slice User Slice of root.
      Oct  5 16:25:31 soak-14 systemd: Stopping User Slice of root.
      Oct  5 16:37:04 soak-14 kernel: LNetError: 1979:0:(lib-move.c:2121:lnet_send()) ASSERTION( msg->msg_txpeer == ((void *)0) ) failed:
      Oct  5 16:37:04 soak-14 kernel: LNetError: 1979:0:(lib-move.c:2121:lnet_send()) LBUG
      Oct  5 16:37:04 soak-14 kernel: Pid: 1979, comm: lnet_discovery
      Oct  5 16:37:05 soak-14 kernel: Call Trace:
      Oct  5 16:37:05 soak-14 kernel: [<ffffffffc09ec7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      Oct  5 16:37:05 soak-14 kernel: [<ffffffffc09ec83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      Oct  5 16:37:05 soak-14 kernel: [<ffffffffc0a7179e>] lnet_send+0x17e/0x180 [lnet]
      Oct  5 16:37:05 soak-14 kernel: [<ffffffffc0a80ef8>] lnet_peer_discovery_complete+0x178/0x320 [lnet]
      Oct  5 16:37:05 soak-14 kernel: [<ffffffffc0a868a8>] lnet_peer_discovery+0x588/0x1030 [lnet]
      Oct  5 16:37:05 soak-14 kernel: [<ffffffff810b1910>] ? autoremove_wake_function+0x0/0x40
      Oct  5 16:37:05 soak-14 kernel: [<ffffffffc0a86320>] ? lnet_peer_discovery+0x0/0x1030 [lnet]
      Oct  5 16:37:05 soak-14 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
      Oct  5 16:37:05 soak-14 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Oct  5 16:37:05 soak-14 kernel: [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
      Oct  5 16:37:05 soak-14 kernel: [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      Oct  5 16:37:05 soak-14 kernel:
      Oct  5 16:37:05 soak-14 kernel: Kernel panic - not syncing: LBUG
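
      The trace shows the lnet_discovery thread calling lnet_send() from lnet_peer_discovery_complete(),
      and lnet_send() asserting that msg->msg_txpeer is still NULL. The C sketch below is a simplified,
      hypothetical model of that precondition only. It assumes, as the call trace suggests, that discovery
      completion re-sends messages that were held while the peer was being discovered. The structs,
      model_send(), model_discovery_complete() and the pending queue are illustrative names, not LNet's
      actual code; only the field name msg_txpeer comes from the log. It does not identify the root cause.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for LNet structures.
 * Only the field name msg_txpeer is taken from the assertion in the log. */
struct peer_ni {
	const char *nid;
};

struct msg {
	struct peer_ni *msg_txpeer; /* set once a transmit peer is chosen */
	struct msg *next;           /* link on the hypothetical pending queue */
};

struct peer {
	struct peer_ni ni;
	struct msg *pendq;          /* hypothetical: messages parked during discovery */
};

/* Models the precondition checked at lib-move.c:2121: a message entering
 * the send path must not already be bound to a transmit peer.
 * Returns false where the kernel would instead LBUG. */
static bool model_send(struct peer *lp, struct msg *m)
{
	if (m->msg_txpeer != NULL)
		return false;       /* kernel: ASSERTION(...) failed, then LBUG */
	m->msg_txpeer = &lp->ni;    /* bind the message to a transmit peer */
	return true;
}

/* Models discovery completion re-driving every parked message.
 * Detaches the queue first, then returns how many messages would
 * have tripped the assertion. */
static int model_discovery_complete(struct peer *lp)
{
	int violations = 0;
	struct msg *m = lp->pendq;

	lp->pendq = NULL;
	while (m != NULL) {
		struct msg *next = m->next;

		m->next = NULL;
		if (!model_send(lp, m))
			violations++;
		m = next;
	}
	return violations;
}
```

      In this model a message queued with msg_txpeer == NULL is sent normally, while one that already
      carries a transmit peer is counted as a violation, which is the condition the kernel turns into the
      panic above.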
      

            People

            • Assignee: ashehata Amir Shehata
            • Reporter: cliffw Cliff White (Inactive)
            • Votes: 0
            • Watchers: 4