
kiblnd_connd kernel BUG at lib/list_debug.c:53!

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.15.0

    Description

      PID: 1626765  TASK: ffff89a7a1ccc740  CPU: 1   COMMAND: "kiblnd_connd"
       #0 [ffffa395c6a3bbe0] machine_kexec at ffffffffb346156e
       #1 [ffffa395c6a3bc38] __crash_kexec at ffffffffb358f94d
       #2 [ffffa395c6a3bd00] crash_kexec at ffffffffb359083d
       #3 [ffffa395c6a3bd18] oops_end at ffffffffb342434d
       #4 [ffffa395c6a3bd38] do_trap at ffffffffb3420b13
       #5 [ffffa395c6a3bd80] do_invalid_op at ffffffffb3421476
       #6 [ffffa395c6a3bda0] invalid_op at ffffffffb3e00d64
          [exception RIP: __list_del_entry_valid.cold.1+52]
          RIP: ffffffffb38913c8  RSP: ffffa395c6a3be58  RFLAGS: 00010046
          RAX: 0000000000000054  RBX: 0000000000000000  RCX: 0000000000000007
          RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffff89a7bec567c0
          RBP: 0000000000000202   R8: 0000000000000000   R9: 0000000000aaaaaa
          R10: 0000000000000000  R11: 0000000000000001  R12: 000000006165a0f9
          R13: 0000000000000001  R14: dead000000000200  R15: ffff89a7a1e82418
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffffa395c6a3be50] __list_del_entry_valid.cold.1 at ffffffffb38913c8
       #8 [ffffa395c6a3be58] kiblnd_connd at ffffffffc170305f [ko2iblnd]
       #9 [ffffa395c6a3bf10] kthread at ffffffffb35043a6
      #10 [ffffa395c6a3bf50] ret_from_fork at ffffffffb3e0023f
      
      [root@snx11922n000 ~]# pdsh -g lustre 'lctl get_param version' | dshbak -c
      ----------------
      snx11922n[002-005]
      ----------------
      version=2.14.55_10_g620dd1b
      [root@snx11922n000 ~]#
      
      620dd1bf6d (es/dev/wc/testing-m5) LU-14437 gnilnd: Use NSEC_PER_USEC to convert nsec to usec
      6bcc0798a4 LU-14402 osd-ldiskfs: disable pagecache bypass feature
      9e6d2fa865 LU-14402 osd-ldiskfs: Page cache pages dirtied in writeback
      f248dcbdda LUS-9546 build: Cray obs build support for master
      b8e2e1c76d LU-14391 lnet: add config file support
      6be8ea0e7b LU-14392 gnilnd: re-enable large i/o buffers
      2de9d6bb32 LU-14409 ldiskfs: Add support for SUSE 5.3.18-24.46.1
      966362f836 LU-15074 build: Use strlcpy if strscpy is not available
      f8a7abd63f LU-13906 build: consistent use of %{name}
      1a409a3e6a LU-14711 osc: Do not attempt sending empty pages
      09e2e43241 (tag: v2_14_55, tag: 2.14.55) New tag 2.14.55
      

        Activity

          [LU-15125] kiblnd_connd kernel BUG at lib/list_debug.c:53!
          pjones Peter Jones added a comment -

          Landed for 2.15


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45316/
          Subject: LU-15125 o2iblnd: wrong list used for kib_connd_waits
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: f6f1e395cd26369d7441a70eb5d598ea64f1589a

          gerrit Gerrit Updater added a comment -

          "Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45316
          Subject: LU-15125 o2iblnd: wrong list used for kib_connd_waits
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: c3bbdf8477cd89cdaed09e3b8b7d5db7266adc17

          hornc Chris Horn added a comment - - edited

          Ah, the issue was actually with https://review.whamcloud.com/#/c/43419/

          @@ -3571,9 +3571,10 @@ kiblnd_connd (void *arg)
           			spin_lock_irqsave(lock, flags);
           		}
          
          -		if (!list_empty(&kiblnd_data.kib_connd_waits)) {
          -			conn = list_entry(kiblnd_data.kib_connd_waits.next,
          -					  struct kib_conn, ibc_list);
          +		conn = list_first_entry_or_null(&kiblnd_data.kib_connd_waits,
          +						struct kib_conn,
          +						ibc_sched_list);
          +		if (conn) {
           			list_del(&conn->ibc_list);
           			spin_unlock_irqrestore(lock, flags);
          
          

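          The wrong list_head field is used in the list_first_entry_or_null() macro: ibc_sched_list instead of ibc_list. The macro therefore computes conn from the wrong member offset, and the following list_del(&conn->ibc_list) operates on an address that is not the queued entry.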
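
          To illustrate the offset problem outside the kernel, here is a minimal userspace sketch. Everything in it is an assumption made for illustration: struct demo_conn is a hypothetical cut-down stand-in for struct kib_conn, and the list helpers are simplified re-implementations of the <linux/list.h> ones, not the kernel code itself. It only shows that naming the wrong member in list_first_entry_or_null() yields a pointer that misses the real object by the distance between the two members.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? \
	 container_of((head)->next, type, member) : NULL)

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical stand-in for struct kib_conn; only the two list_head
 * members matter for this demonstration. */
struct demo_conn {
	int id;
	struct list_head ibc_list;       /* linkage used on the waits queue */
	struct list_head ibc_sched_list; /* linkage for a different queue */
};

/* Correct: the member named here matches the one that was queued. */
static struct demo_conn *first_wait_right(struct list_head *waits)
{
	return list_first_entry_or_null(waits, struct demo_conn, ibc_list);
}

/* Buggy: same list, but the pointer is derived from the wrong member
 * offset, so it does not point at the start of the queued object. */
static struct demo_conn *first_wait_wrong(struct list_head *waits)
{
	return list_first_entry_or_null(waits, struct demo_conn,
					ibc_sched_list);
}
```

          Queueing an element through its ibc_list member and then calling both helpers shows first_wait_right() returning the element and first_wait_wrong() returning an address sizeof(struct list_head) bytes earlier. In the sketch the pointers are only compared, never dereferenced; in kiblnd_connd() the skewed pointer was passed on to list_del().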

          hornc Chris Horn added a comment -

          I suspect https://review.whamcloud.com/#/c/38845/ is responsible since there is no other recent change to kiblnd_connd().


          People

            Assignee: hornc Chris Horn
            Reporter: hornc Chris Horn