Lustre / LU-9465

Kernel NULL pointer: osd_object.c:427:osd_object_init()) soaked-OST0005: lookup [0x440000401:0x195026b:0x0]/0x920ea8 failed: rc = 17

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.10.0
    • Soak stress cluster
    • 3
    • 9,394

    Description

      soak-7 survived several failovers, the most recent at 2017-05-07 07:41:31.
      The soak cluster failed over soak-10 at 2017-05-07 18:23:22.
      Immediately after finishing recovery, soak-7 crashed.
      The OSS had reconnected to the recently failed-over MDT on soak-10/11:

      May  7 18:22:39 soak-7 kernel: LustreError: 11-0: soaked-MDT0003-lwp-OST0011: operation obd_ping to node 192.168.1.110@o2ib10 failed: rc = -107
      May  7 18:22:39 soak-7 kernel: Lustre: soaked-MDT0003-lwp-OST0005: Connection to soaked-MDT0003 (at 192.168.1.110@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
      May  7 18:22:39 soak-7 kernel: Lustre: Skipped 2 previous similar messages
      May  7 18:22:39 soak-7 kernel: LustreError: Skipped 3 previous similar messages
      May  7 18:23:21 soak-7 kernel: LNet: 228:0:(o2iblnd_cb.c:2421:kiblnd_passive_connect()) Conn stale 192.168.1.111@o2ib10 version 12/12 incarnation 1494181401470091/1494181401470091
      May  7 18:23:21 soak-7 kernel: Lustre: soaked-OST0005: Connection restored to  (at 192.168.1.111@o2ib10)
      May  7 18:23:21 soak-7 kernel: Lustre: Skipped 2 previous similar messages
      May  7 18:23:22 soak-7 kernel: LNet: 7422:0:(o2iblnd_cb.c:1377:kiblnd_reconnect_peer()) Abort reconnection of 192.168.1.111@o2ib10: connected
      May  7 18:23:29 soak-7 kernel: LustreError: 167-0: soaked-MDT0003-lwp-OST0011: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
      May  7 18:23:29 soak-7 kernel: LustreError: Skipped 1 previous similar message
      May  7 18:23:43 soak-7 kernel: Lustre: soaked-OST0005: deleting orphan objects from 0x440000401:26279429 to 0x440000401:26291121
      May  7 18:23:43 soak-7 kernel: Lustre: soaked-OST0011: deleting orphan objects from 0x780000401:26209136 to 0x780000401:26218273
      May  7 18:23:43 soak-7 kernel: Lustre: soaked-OST000b: deleting orphan objects from 0x5c0000400:26329949 to 0x5c0000400:26339745
      May  7 18:23:43 soak-7 kernel: Lustre: soaked-OST0017: deleting orphan objects from 0x8c0000401:26229632 to 0x8c0000401:26238017
      May  7 18:23:54 soak-7 kernel: LustreError: 167-0: soaked-MDT0003-lwp-OST000b: This client was evicted by soaked-MDT0003; in progress operations using this service will fail.
      May  7 18:23:54 soak-7 kernel: Lustre: soaked-MDT0003-lwp-OST0017: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
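
The orphan-cleanup lines above and the FID in the issue title share the sequence 0x440000401 on soaked-OST0005, so it can help to compare the numbers directly. The following is a minimal sketch, not a Lustre tool: the helper names are hypothetical, and it assumes that the orphan messages print the object ID in decimal while the bracketed FID prints every field in hex. It only compares values taken from this report and says nothing about why the lookup failed.

```python
import re

# Orphan-cleanup messages copied from the syslog excerpt in this report.
LOG_LINES = [
    "Lustre: soaked-OST0005: deleting orphan objects from 0x440000401:26279429 to 0x440000401:26291121",
    "Lustre: soaked-OST0011: deleting orphan objects from 0x780000401:26209136 to 0x780000401:26218273",
    "Lustre: soaked-OST000b: deleting orphan objects from 0x5c0000400:26329949 to 0x5c0000400:26339745",
    "Lustre: soaked-OST0017: deleting orphan objects from 0x8c0000401:26229632 to 0x8c0000401:26238017",
]

# FID from the failed lookup in the issue title, in [seq:oid:ver] form.
FAILED_FID = "[0x440000401:0x195026b:0x0]"

# Assumption: sequence is hex, object IDs in these messages are decimal.
ORPHAN_RE = re.compile(
    r"(?P<target>\S+): deleting orphan objects from "
    r"(?P<seq>0x[0-9a-f]+):(?P<first>\d+) to 0x[0-9a-f]+:(?P<last>\d+)"
)


def parse_fid(text):
    """Split a bracketed FID into (seq, oid, ver), reading each field as hex."""
    seq, oid, ver = (int(part, 16) for part in text.strip("[]").split(":"))
    return seq, oid, ver


def orphan_ranges(lines):
    """Map each sequence to (target, first_oid, last_oid) from the log lines."""
    ranges = {}
    for line in lines:
        m = ORPHAN_RE.search(line)
        if m:
            ranges[int(m["seq"], 16)] = (
                m["target"], int(m["first"]), int(m["last"])
            )
    return ranges


if __name__ == "__main__":
    seq, oid, _ver = parse_fid(FAILED_FID)
    target, first, last = orphan_ranges(LOG_LINES)[seq]
    print(f"{target}: orphan range {first}..{last}, failed lookup oid {oid}")
    print("inside orphan range" if first <= oid <= last else "outside orphan range")
```

Under those assumptions, the object ID in the failed lookup decodes to a value above the end of the orphan range logged for soaked-OST0005.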
      

      Then a hard crash:

      [38854.133273] Lustre: soaked-MDT0003-lwp-OST0017: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
      [38854.147850] Lustre: Skipped 3 previous similar messages
      [55622.538966] perf: interrupt took too long (5010 > 5007), lowering kernel.perf_event_max_sample_rate to 39000
      [60371.183844] LustreError: 16407:0:(osd_object.c:427:osd_object_init()) soaked-OST0005: lookup [0x440000401:0x195026b:0x0]/0x920ea8 failed: rc = 17
      [60371.201275] BUG: unable to handle kernel NULL pointer dereference at 0000000000000011
      [60371.211442] IP: [<ffffffffa0a0d328>] lu_object_find_try+0x178/0x2b0 [obdclass]
      [60371.221570] PGD 0
      [60371.225825] Oops: 0000 [#1] SMP
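
One detail worth checking in the excerpt above: the lookup reports rc = 17 (a positive value, EEXIST on Linux), and the faulting address 0000000000000011 is also 17. The sketch below models the Linux kernel's ERR_PTR()/IS_ERR() pointer-encoding convention in Python to show why a positive error code would not be recognized as an error pointer. This is only a hypothesis consistent with the two numbers; the report does not state the root cause, and the function names here are illustrative models, not the kernel's C macros.

```python
MAX_ERRNO = 4095            # bound the Linux kernel uses for ERR_PTR-encoded errors
WORD_MASK = (1 << 64) - 1   # model a 64-bit unsigned long

# Linux errno numbers for the rc values that appear in this report.
LINUX_ERRNO = {17: "EEXIST", 107: "ENOTCONN"}

# Values copied from the crash excerpt.
RC_FROM_LOG = 17
FAULT_ADDRESS = 0x0000000000000011


def err_ptr(rc):
    """Model ERR_PTR(): reinterpret a signed error code as a 64-bit pointer value."""
    return rc & WORD_MASK


def is_err(ptr):
    """Model IS_ERR(): true only for the top MAX_ERRNO addresses."""
    return ptr >= ((-MAX_ERRNO) & WORD_MASK)


if __name__ == "__main__":
    for rc in (RC_FROM_LOG, -RC_FROM_LOG):
        ptr = err_ptr(rc)
        name = LINUX_ERRNO.get(abs(rc), "unknown")
        print(f"rc={rc:4d} ({name}) -> pointer {ptr:#018x}, is_err={is_err(ptr)}")
    print("fault address equals positive rc as pointer:",
          err_ptr(RC_FROM_LOG) == FAULT_ADDRESS)
```

In this model a negative code such as -17 lands in the range that an IS_ERR-style check catches, while a positive 17 becomes the small non-NULL address 0x11, which passes both a NULL check and an error-pointer check. Whether that is what happened in lu_object_find_try() would need to be confirmed against the vmcore.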
      

      A crash dump is available on the node; vmcore-dmesg is attached.

      Attachments

        1. soak-8.console.log
          2.23 MB
        2. soak-8.vmcore-dmesg.txt
          1010 kB
        3. vmcore-dmesg.txt
          226 kB

            People

              laisiyao Lai Siyao
              cliffw Cliff White (Inactive)