Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9504

LBUG ptlrpc_handle_rs()) ASSERTION( lock != ((void *)0) ) failed

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.10.0
    • Lustre 2.10.0
    • Soak stress cluster
    • 3
    • 9223372036854775807

    Description

      Loaded latest master build, started soak. Only fault induced was router drop.
      Soak-9 is the second MDS (MDT0001)
      soak-9 has a hard crash in normal operation:
      Soak is started, mount completes

      [  893.779340] LustreError: 11-0: soaked-OST000c-osc-MDT0001: operation ost_connect to node 192.168.1.102@o2ib10 failed: rc = -16^M
      [  893.801270] LustreError: Skipped 139 previous similar messages^M
      [  894.095801] Lustre: soaked-MDT0003-osp-MDT0001: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)^M
      [  894.110919] Lustre: Skipped 7 previous similar messages^M
      [  894.253639] Lustre: soaked-MDT0001: recovery is timed out, evict stale exports^M
      [  894.265333] Lustre: soaked-MDT0001: disconnecting 28 stale clients^M
      [  894.278063] Lustre: soaked-MDT0001: Recovery over after 5:01, of 31 clients 3 recovered and 28 were evicted.^M
      [ 1465.548946] Lustre: soaked-MDT0001: Client 8e539072-a775-2171-7825-433ade3d0c39 (at 192.168.1.132@o2ib100) reconnecting^M
      [ 1465.563698] Lustre: soaked-MDT0001: Connection restored to 8e539072-a775-2171-7825-433ade3d0c39 (at 192.168.1.132@o2ib100)^M
      [ 1465.579478] Lustre: Skipped 30 previous similar messages^M
      [ 1757.516066] Lustre: soaked-MDT0001: Client 6fb4512c-e89d-8233-88ad-b696d11c9821 (at 192.168.1.138@o2ib100) reconnecting^M
      [ 1757.531619] Lustre: soaked-MDT0001: Connection restored to 6fb4512c-e89d-8233-88ad-b696d11c9821 (at 192.168.1.138@o2ib100)^M
      [ 1904.507160] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting^M
      [ 1942.000624] Lustre: soaked-MDT0001: Client 0a5dee9d-f606-8dd1-e9d4-d42a75c735e1 (at 192.168.1.129@o2ib100) reconnecting^M
      [ 2436.463475] Lustre: soaked-MDT0001: Client 2d2ea61b-cf5f-add6-6e06-30458f85a726 (at 192.168.1.139@o2ib100) reconnecting^M
      [ 2436.478139] Lustre: soaked-MDT0001: Connection restored to 2d2ea61b-cf5f-add6-6e06-30458f85a726 (at 192.168.1.139@o2ib100)^M
      [ 2436.493810] Lustre: Skipped 2 previous similar messages^M
      [ 2733.438247] Lustre: soaked-MDT0001: Client cb715667-fb6c-b895-e632-274b232c5bc9 (at 192.168.1.119@o2ib100) reconnecting^M
      [ 3117.401693] Lustre: soaked-MDT0001: Client 0a5dee9d-f606-8dd1-e9d4-d42a75c735e1 (at 192.168.1.129@o2ib100) reconnecting^M
      [ 3117.416360] Lustre: soaked-MDT0001: Connection restored to 0a5dee9d-f606-8dd1-e9d4-d42a75c735e1 (at 192.168.1.129@o2ib100)^M
      [ 3117.430856] Lustre: Skipped 1 previous similar message^M
      [ 3359.388740] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting^M
      [ 3497.321905] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting^M
      [ 3951.338201] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting^M
      [ 3951.353148] Lustre: soaked-MDT0001: Connection restored to 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100)^M
      [ 3951.368647] Lustre: Skipped 2 previous similar messages^M
      [ 4415.324113] Lustre: soaked-MDT0001: Client 6ae0a03c-6567-9e19-d483-8743882e83e1 (at 192.168.1.116@o2ib100) reconnecting^M
      [ 4501.233379] Lustre: 3876:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1494873743/real 1494873743]  req@ffff8803b3329b00 x1567484370368192/t0(0) o104->soaked-MDT0001@192.168.1.139@o2ib100:15/16 lens 296/224 e 0 to 1 dl 1494873750 ref 1 fl Rpc:X/0/ffffffff rc 0/-1^M
      [ 4501.273401] Lustre: 3876:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 7 previous similar messages^M
      [ 4632.525998] LustreError: 3049:0:(service.c:2229:ptlrpc_handle_rs()) ASSERTION( lock != ((void *)0) ) failed: ^M
      [ 4632.539820] LustreError: 3049:0:(service.c:2229:ptlrpc_handle_rs()) LBUG^M
      [ 4632.550400] Pid: 3049, comm: ptlrpc_hr01_004^M
      [ 4632.557537] ^M
      [ 4632.557537] Call Trace:^M
      [ 4632.566532]  [<ffffffffa080e7ee>] libcfs_call_trace+0x4e/0x60 [libcfs]^M
      [ 4632.575923]  [<ffffffffa080e87c>] lbug_with_loc+0x4c/0xb0 [libcfs]^M
      [ 4632.585006]  [<ffffffffa0ba2bed>] ptlrpc_hr_main+0x83d/0x8f0 [ptlrpc]^M
      [ 4632.594107]  [<ffffffff810c8345>] ? sched_clock_cpu+0x85/0xc0^M
      [ 4632.602416]  [<ffffffff810c54c0>] ? default_wake_function+0x0/0x20^M
      [ 4632.611086]  [<ffffffffa0ba23b0>] ? ptlrpc_hr_main+0x0/0x8f0 [ptlrpc]^M
      [ 4632.620009]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0^M
      [ 4632.627027]  [<ffffffff810b0980>] ? kthread+0x0/0xe0^M
      [ 4632.634106]  [<ffffffff81697318>] ret_from_fork+0x58/0x90^M
      [ 4632.641571]  [<ffffffff810b0980>] ? kthread+0x0/0xe0^M
      [ 4632.648510] ^M
      [ 4632.651470] Kernel panic - not syncing: LBUG^M
      [ 4632.657497] CPU: 11 PID: 3049 Comm: ptlrpc_hr01_004 Tainted: P           OE  ------------   3.10.0-514.16.1.el7_lustre.x86_64 #1^M
      [ 4632.673033] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013^M
      [ 4632.686895]  ffffffffa082cdac 00000000329e47aa ffff8808212b7d30 ffffffff81686d1f^M
      [ 4632.696571]  ffff8808212b7db0 ffffffff8168014a ffffffff00000008 ffff8808212b7dc0^M
      [ 4632.706233]  ffff8808212b7d60 00000000329e47aa 00000000329e47aa ffff88082d8cf838^M
      [ 4632.715875] Call Trace:^M
      [ 4632.715875] Call Trace:^M
      [ 4632.719901]  [<ffffffff81686d1f>] dump_stack+0x19/0x1b^M
      [ 4632.726949]  [<ffffffff8168014a>] panic+0xe3/0x1f2^M
      [ 4632.733611]  [<ffffffffa080e894>] lbug_with_loc+0x64/0xb0 [libcfs]^M
      [ 4632.741856]  [<ffffffffa0ba2bed>] ptlrpc_hr_main+0x83d/0x8f0 [ptlrpc]^M
      [ 4632.750326]  [<ffffffff810c8345>] ? sched_clock_cpu+0x85/0xc0^M
      [ 4632.757970]  [<ffffffff810c54c0>] ? wake_up_state+0x20/0x20^M
      [ 4632.765508]  [<ffffffffa0ba23b0>] ? ptlrpc_svcpt_stop_threads+0x590/0x590 [ptlrpc]^M
      [ 4632.775290]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0^M
      [ 4632.781973]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140^M
      [ 4632.790485]  [<ffffffff81697318>] ret_from_fork+0x58/0x90^M
      [ 4632.797662]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140^M
      

      System then crashed, crash dump is available on the node.
      vmcore-dmesg attached.

      Attachments

        1. lu-9504-6-07.soak-11.log.txt
          55 kB
          Cliff White
        2. lustre-log.1496836992.5097.txt.gz
          941 kB
          Cliff White
        3. lustre-log.1496843934.5221.txt.gz
          11.17 MB
          Cliff White
        4. lustre-log.1496861499.5292.txt.gz
          12.61 MB
          Cliff White
        5. lustre-log.1496861511.5224.txt.gz
          0.4 kB
          Cliff White
        6. lustre-log.1496861518.5296.txt.gz
          354 kB
          Cliff White
        7. lustre-log.1497161984.5008.preLBUG.txt.gz
          4.81 MB
          Cliff White
        8. lustre-log.1497165943.4581.txt.gz
          4.44 MB
          Cliff White
        9. lustre-log.1497166143.4807.postLBUG.txt.gz
          3.23 MB
          Cliff White
        10. vmcore-dmesg.txt
          136 kB
          Cliff White

        Issue Links

          Activity

            People

              laisiyao Lai Siyao
              cliffw Cliff White (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: