  Lustre / LU-9794

kernel panic during obdsurvey test run

Details

    • Bug
    • Resolution: Not a Bug
    • Minor
    • None
    • Lustre 2.10.0
    • None
    • 3
    • 9223372036854775807

    Description

      OSS0-A213:~/oss-213-tar-files/benchmark/obd # sh ./obdsurvey-script.sh
      /usr/bin/obdfilter-survey: line 242: ( << 16) | ( << 8) | : syntax error: operand expected (error token is "<< 16) | ( << 8) | ")
      /usr/bin/obdfilter-survey: line 254: [: -lt: unary operator expected
      Fri Jul 14 13:14:41 PDT 2017 Obdfilter-survey for case=disk from OSS0-A213
      ost 8 sz 2048000000K rsz 1024K obj 8 thr 8 write 3354.53 [ 0.00, 1769.93] rewrite 3272.87 [ 0.00, 2786.88] read 3064.40 ERROR
      ost 8 sz 2048000000K rsz 1024K obj 8 thr 16 write 3461.43 [ 0.00, 1821.87] rewrite 3395.52 [ 0.00, 1627.93] read 3629.66 ERROR
      ost 8 sz 2048000000K rsz 1024K obj 8 thr 32 write 3561.64 [ 0.00, 1824.88] rewrite 3465.79 [ 0.00, 1714.86] read 10853.31 ERROR
      ost 8 sz 2048000000K rsz 1024K obj 8 thr 64 write 3636.38 [ 0.00, 1966.92] rewrite
      Message from syslogd@OSS0-A213 at Jul 14 14:50:09 ...
      kernel:[347802.884464] VERIFY3(0 == remove_reference(hdr, ((void *)0), tag)) failed (0 == 3)

      Message from syslogd@OSS0-A213 at Jul 14 14:50:09 ...
      kernel:[347802.884465] PANIC at arc.c:3069:arc_buf_destroy()

      OSS0-A213:~/oss-213-tar-files/benchmark/obd # zpool status
      pool: mgs
      state: ONLINE
      scan: none requested
      config:

      NAME                   STATE    READ WRITE CKSUM
      mgs                    ONLINE      0     0     0
        35000c5008355dc6f    ONLINE      0     0     0

      errors: No known data errors

      pool: ost1
      state: ONLINE
      scan: none requested
      config:

      NAME                   STATE    READ WRITE CKSUM
      ost1                   ONLINE      0     0     0
        raidz2-0             ONLINE      0     0     0
          35000c5008357281b  ONLINE      0     0     0
          35000c50083552aeb  ONLINE      0     0     0
          35000c500837f27fb  ONLINE      0     0     0
          35000c5008355705f  ONLINE      0     0     0
          35000c500837f234b  ONLINE      0     0     0
          35000c500837ecd6f  ONLINE      0     0     0
          35000c500837f2b37  ONLINE      0     0     0
          35000c500835535cf  ONLINE      0     0     0
          35000c5008355cad3  ONLINE      0     0     0
          35000c500835554f7  ONLINE      0     0     0
          35000c5008355c11b  ONLINE      0     0     0

      errors: No known data errors

      pool: ost2
      state: ONLINE
      status: One or more devices has experienced an unrecoverable error. An
      attempt was made to correct the error. Applications are unaffected.
      action: Determine if the device needs to be replaced, and clear the errors
      using 'zpool clear' or replace the device with 'zpool replace'.
      see: http://zfsonlinux.org/msg/ZFS-8000-9P
      scan: none requested
      config:

      NAME                   STATE    READ WRITE CKSUM
      ost2                   ONLINE      0     0     0
        raidz2-0             ONLINE      0     0     0
          35000c50083552443  ONLINE      0     0     0
          35000c50083555057  ONLINE      0     0     0
          35000c5008355cedb  ONLINE      0     0     0
          35000c5008355ddcf  ONLINE      0     0     0
          35000c500837f2b93  ONLINE      0     0     0
          35000c5008355bbdf  ONLINE      0     0     0
          35000c500837f249f  ONLINE      0     0     0
          35000c5008355c33b  ONLINE      0     0     0
          35000c50083571913  ONLINE      0     0     0
          35000c5008355dd6f  ONLINE      0     0    10
          35000c5008355d73f  ONLINE      0     0     7

      errors: No known data errors

      pool: ost3
      state: ONLINE
      status: One or more devices has experienced an error resulting in data
      corruption. Applications may be affected.
      action: Restore the file in question if possible. Otherwise restore the
      entire pool from backup.
      see: http://zfsonlinux.org/msg/ZFS-8000-8A
      scan: none requested
      config:

      NAME                   STATE    READ WRITE CKSUM
      ost3                   ONLINE      0     0   134
        raidz2-0             ONLINE      0     0   901
          35000c50083571d7b  ONLINE      0     0     4
          35000c50083570b87  ONLINE      0     0     0
          35000c5008355bba7  ONLINE      0     0     0
          35000c50083571937  ONLINE      0     0     0
          35000c5008355bd23  ONLINE      0     0     7
          35000c5008355d007  ONLINE      0     0     4
          35000c50083572bb3  ONLINE      0     0     0
          35000c500837f11ef  ONLINE      0     0     0
          35000c5008355e177  ONLINE      0     0     0
          35000c500837f23f7  ONLINE      0     0     1
          35000c500835529ef  ONLINE      0     0    21

      errors: 7 data errors, use '-v' for a list

      pool: ost4
      state: ONLINE
      status: One or more devices has experienced an error resulting in data
      corruption. Applications may be affected.
      action: Restore the file in question if possible. Otherwise restore the
      entire pool from backup.
      see: http://zfsonlinux.org/msg/ZFS-8000-8A
      scan: none requested
      config:

      NAME                   STATE    READ WRITE CKSUM
      ost4                   ONLINE      0     0    24
        raidz2-0             ONLINE      0     0   354
          35000c5008355e3ef  ONLINE      0     0    12
          35000c5008355c19b  ONLINE      0     0     6
          35000c5008355b603  ONLINE      0     0     0
          35000c5008355cd5b  ONLINE      0     0     3
          35000c5008355c0f3  ONLINE      0     0     0
          35000c500837f24c7  ONLINE      0     0     1
          35000c500837f2a37  ONLINE      0     0     0
          35000c50083555217  ONLINE      0     0     0
          35000c500837f2a77  ONLINE      0     0     0
          35000c5008355e01f  ONLINE      0     0     7
          35000c5008355ce6b  ONLINE      0     0     0

      errors: 1 data errors, use '-v' for a list
      OSS0-A213:~/oss-213-tar-files/benchmark/obd #

      dmesg
      [262188.139753] Lustre: Echo OBD driver; http://www.lustre.org/
      [262532.913462] perf interrupt took too long (2604 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
      [263037.870130] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [263037.870205] LustreError: 28451:0:(class_obd.c:387:class_handle_ioctl()) OBD ioctl: device not setup 30
      [264430.833796] perf interrupt took too long (5036 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
      [277208.904437] perf interrupt took too long (10037 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
      [334273.467060] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [334273.467074] LustreError: 23712:0:(class_obd.c:387:class_handle_ioctl()) OBD ioctl: device not setup 30
      [334273.467077] LustreError: 23712:0:(class_obd.c:387:class_handle_ioctl()) Skipped 3 previous similar messages
      [335724.942682] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [335724.942696] LustreError: 14287:0:(class_obd.c:387:class_handle_ioctl()) OBD ioctl: device not setup 30
      [335724.942698] LustreError: 14287:0:(class_obd.c:387:class_handle_ioctl()) Skipped 3 previous similar messages
      [338549.537566] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [338549.537596] LustreError: 30054:0:(class_obd.c:387:class_handle_ioctl()) OBD ioctl: device not setup 30
      [338549.537598] LustreError: 30054:0:(class_obd.c:387:class_handle_ioctl()) Skipped 3 previous similar messages
      [338981.231985] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [338981.232155] LustreError: 13121:0:(class_obd.c:387:class_handle_ioctl()) OBD ioctl: device not setup 30
      [338981.232157] LustreError: 13121:0:(class_obd.c:387:class_handle_ioctl()) Skipped 3 previous similar messages
      [347802.884464] VERIFY3(0 == remove_reference(hdr, ((void *)0), tag)) failed (0 == 3)
      [347802.884465] PANIC at arc.c:3069:arc_buf_destroy()
      [347802.884466] Showing stack for process 9032
      [347802.884468] CPU: 4 PID: 9032 Comm: z_rd_int_1 Tainted: P OE N 4.4.21-69-default #1
      [347802.884471] Hardware name: Supermicro X10DRH/X10DRH-IT, BIOS 2.0 12/17/2015
      [347802.884478] 0000000000000000 ffffffff8130d890 ffffffffa12eff30 ffff881f6c40fd30
      [347802.884483] ffffffffa066f08f ffff881fb4ab42c0 0000000000000030 ffff881f6c40fd40
      [347802.884487] ffff881f6c40fce0 2833594649524556 6d6572203d3d2030 656665725f65766f
      [347802.884488] Call Trace:
      [347802.884497] [<ffffffff81019a59>] dump_trace+0x59/0x310
      [347802.884502] [<ffffffff81019dfa>] show_stack_log_lvl+0xea/0x170
      [347802.884505] [<ffffffff8101ab81>] show_stack+0x21/0x40
      [347802.884507] [<ffffffff8130d890>] dump_stack+0x5c/0x7c
      [347802.884518] [<ffffffffa066f08f>] spl_panic+0xbf/0xf0 [spl]
      [347802.884589] [<ffffffffa11c1b0f>] arc_buf_destroy+0xef/0x100 [zfs]
      [347802.884609] [<ffffffffa11c7ad9>] dbuf_read_done+0x79/0xd0 [zfs]
      [347802.884626] [<ffffffffa11bf00f>] arc_read_done+0x15f/0x2a0 [zfs]
      [347802.884655] [<ffffffffa126da2b>] zio_done+0x2eb/0xb80 [zfs]
      [347802.884683] [<ffffffffa12686c1>] zio_execute+0x81/0xe0 [zfs]
      [347802.884690] [<ffffffffa066d24d>] taskq_thread+0x22d/0x430 [spl]
      [347802.884695] [<ffffffff8109980d>] kthread+0xbd/0xe0
      [347802.884699] [<ffffffff815e177f>] ret_from_fork+0x3f/0x70
      [347802.885767] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70

      [347802.885768] Leftover inexact backtrace:

      [347802.885771] [<ffffffff81099750>] ? kthread_park+0x50/0x50
      [578289.860341] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [578290.864148] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [578292.864333] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [578292.864337] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) Skipped 1 previous similar message
      [578295.864567] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [578295.864571] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) Skipped 2 previous similar messages
      [578300.865021] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [578300.865025] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) Skipped 4 previous similar messages
      [578309.865780] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [578309.865784] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) Skipped 8 previous similar messages
      [578326.867232] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
      [578326.867236] LustreError: 10923:0:(echo_client.c:1039:echo_device_free()) Skipped 16 previous similar messages
      OSS0-A213:~/oss-213-tar-files/benchmark/obd #
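
      The `zpool status` output in this report is long, and the rows with non-zero CKSUM counts are easy to miss by eye. The following is a minimal sketch, not part of the original report, of how such output could be scanned for them. It assumes each config row has exactly five whitespace-separated fields (NAME STATE READ WRITE CKSUM) with plain integer counters, as in the output pasted here; abbreviated counters such as `1.2K` would be skipped. The function name `rows_with_checksum_errors` and the `SAMPLE` text (a shortened copy of the ost2 section) are illustrative only.

```python
def rows_with_checksum_errors(zpool_status_text):
    """Return {pool: [(name, cksum), ...]} for config rows with CKSUM > 0.

    Every config row is considered, so pool-level and vdev-level rows
    (e.g. "ost3", "raidz2-0") are reported alongside individual disks.
    """
    results = {}
    pool = None
    for line in zpool_status_text.splitlines():
        fields = line.split()
        # A "pool: <name>" line starts a new pool section.
        if len(fields) >= 2 and fields[0] == "pool:":
            pool = fields[1]
            continue
        # Config rows have five fields; skip the header row itself.
        if pool is None or len(fields) != 5 or fields[0] == "NAME":
            continue
        name, _state, read, write, cksum = fields
        # Ignore five-word prose lines such as "errors: No known data errors".
        if not (read.isdigit() and write.isdigit() and cksum.isdigit()):
            continue
        if int(cksum) > 0:
            results.setdefault(pool, []).append((name, int(cksum)))
    return results


# Shortened copy of the ost2 section from the report above.
SAMPLE = """\
pool: ost2
state: ONLINE
config:

NAME STATE READ WRITE CKSUM
ost2 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
35000c50083571913 ONLINE 0 0 0
35000c5008355dd6f ONLINE 0 0 10
35000c5008355d73f ONLINE 0 0 7

errors: No known data errors
"""

for pool_name, rows in rows_with_checksum_errors(SAMPLE).items():
    for row_name, count in rows:
        print(f"{pool_name}: {row_name} CKSUM={count}")
```

      Because the parser splits on whitespace, it works on the output whether or not the column alignment survived the paste.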

          People

            wc-triage WC Triage
            abea@supermicro.com Abe