Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6334

racer test_1: ldlm_lock_destroy_internal() LBUG

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.7.0
    • OpenSFS cluster running lustre 2.7.0-RC-4 build # 29 with two MDSs with two MDTs each, three OSSs with two OSTs each and three clients.
    • 3
    • 17740

    Description

      Racer test 1 hangs and second MDS crashes. From the client test log:

      == racer test 1: racer on clients: c11,c12,c13 DURATION=900 == 01:00:24 (1425546
      024)
      racers pids: 24447 24448 24449 24450 24452 24455 24457 24463
      c12: ./file_exec.sh: line 12: 32546 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12:  3014 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 22190 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12: 20868 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12: 27709 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12:  4393 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12: 30807 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12: 27894 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12:  6553 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12:  8077 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c13: ./file_exec.sh: line 12: 15490 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 25110 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 22466 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 13403 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 31697 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12:  4156 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 13081 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 13991 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 31225 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12:  5779 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 23907 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c12: ./file_exec.sh: line 12: 14579 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 13126 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 12056 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 12121 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 21271 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 26034 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 24871 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 27385 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12:  2404 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12:  5912 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12:  6463 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12:  8118 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 10024 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 10552 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 17440 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 21595 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12:  3143 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12:  9341 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12: 18994 Bus error               (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_exec.sh: line 12:  2735 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      c11: ./file_mknod.sh: line 11:  5348 Terminated              $MCREATE $file 2> /dev/null
      c11: ./file_mknod.sh: line 11:  5349 Terminated              $MCREATE $file 2> /dev/null
      c11: ./file_mknod.sh: line 11:  5323 Terminated              $MCREATE $file 2> /dev/null
      …
      

      The second MDS crashed with the following on the console:

      Message from syslogd@mds02 at Mar  5 01:02:21 ...
       kernel:LustreError: 24677:0:(ldlm_lock.c:364:ldlm_lock_destroy_internal()) LBUG
      

      In the dmesg before the crash:

      <4>Lustre: DEBUG MARKER: == racer test 1: racer on clients: c11,c12,c13 DURATION=900 == 01:00:24 (1425546024)
      <3>LustreError: 25276:0:(mdt_reint.c:1575:mdt_reint_migrate_internal()) scratch-MDT0001: can not migrate striped dir [0x3c0002b17:0xa:0x0]: rc = -1
      <3>LustreError: 10002:0:(mdd_dir.c:4034:mdd_migrate()) scratch-MDD0003: [0x4400013a9:0x4:0x0]0 is already opened count 1: rc = -16
      <3>LustreError: 9999:0:(mdt_reint.c:1536:mdt_reint_migrate_internal()) scratch-MDT0001: parent [0x3c0002b10:0x41e:0x0] is still on the same MDT, which should be migrated first: rc = -1
      <3>LustreError: 9993:0:(mdt_reint.c:1173:mdt_reint_link()) scratch-MDT0001: source inode [0x400002347:0x25:0x0] on remote MDT from [0x3c0002b12:0x7:0x0]
      <3>LustreError: 9993:0:(mdd_dir.c:4034:mdd_migrate()) scratch-MDD0001: [0x3c0002b17:0x6:0x0]13 is already opened count 1: rc = -16
      <3>LustreError: 25282:0:(mdt_reint.c:1536:mdt_reint_migrate_internal()) scratch-MDT0003: parent [0x4400013a0:0xa:0x0] is still on the same MDT, which should be migrated first: rc = -1
      <3>LustreError: 27158:0:(mdt_reint.c:1173:mdt_reint_link()) scratch-MDT0003: source inode [0x400002348:0x3c:0x0] on remote MDT from [0x4400013aa:0x1:0x0]
      <3>LustreError: 27158:0:(mdt_reint.c:1173:mdt_reint_link()) Skipped 1 previous similar message
      …
      

      and call trace:

      <3>LustreError: 24677:0:(ldlm_lock.c:363:ldlm_lock_destroy_internal()) ### lock 
      still on resource ns: mdt-scratch-MDT0003_UUID lock: ffff8806b331dd00/0x652e27c3
      5471591f lrc: 3/0,0 mode: PR/PR res: [0x4400013a5:0x1e94:0x0].0 bits 0x20 rrc: 1
       type: IBT flags: 0x50200000000000 nid: 192.168.2.111@o2ib remote: 0xe6ac309763e
      dc3fe expref: 30 pid: 9995 timeout: 0 lvb_type: 0
      <3>LustreError: 9997:0:(mdt_open.c:1564:mdt_cross_open()) scratch-MDT0001: [0x3c0002b14:0x3d8:0x0] doesn't exist!: rc = -14
      <3>LustreError: 9997:0:(mdt_open.c:1564:mdt_cross_open()) Skipped 6 previous similar messages
      <0>LustreError: 24677:0:(ldlm_lock.c:364:ldlm_lock_destroy_internal()) LBUG
      <4>Pid: 24677, comm: mdt00_001
      <4>
      <4>Call Trace:
      <4> [<ffffffffa0565895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4> [<ffffffffa0565e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4> [<ffffffffa089b2c1>] ldlm_lock_destroy_internal+0x251/0x2c0 [ptlrpc]
      <4> [<ffffffffa089cef5>] ldlm_lock_destroy+0x35/0x130 [ptlrpc] 
      <4> [<ffffffffa089d525>] ldlm_lock_enqueue+0x155/0x9d0 [ptlrpc]
      <4> [<ffffffffa08c964b>] ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc]
      <4> [<ffffffffa094a2f1>] tgt_enqueue+0x61/0x230 [ptlrpc]
      <4> [<ffffffffa094af3e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
      <4> [<ffffffffa08fa9b1>] ptlrpc_main+0xe41/0x1960 [ptlrpc]
      <4> [<ffffffffa08f9b70>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
      <4> [<ffffffff8109abf6>] kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      <4>
      <0>Kernel panic - not syncing: LBUG
      <4>Pid: 24677, comm: mdt00_001 Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
      <4>Call Trace:
      <4> [<ffffffff81528fdc>] ? panic+0xa7/0x16f
      <4> [<ffffffffa0565eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4> [<ffffffffa089b2c1>] ? ldlm_lock_destroy_internal+0x251/0x2c0 [ptlrpc]
      <4> [<ffffffffa089cef5>] ? ldlm_lock_destroy+0x35/0x130 [ptlrpc]
      <4> [<ffffffffa089d525>] ? ldlm_lock_enqueue+0x155/0x9d0 [ptlrpc]
      <4> [<ffffffffa08c964b>] ? ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc]
      <4> [<ffffffffa094a2f1>] ? tgt_enqueue+0x61/0x230 [ptlrpc]
      <4> [<ffffffffa094af3e>] ? tgt_request_handle+0x8be/0x1000 [ptlrpc]
      <4> [<ffffffffa08fa9b1>] ? ptlrpc_main+0xe41/0x1960 [ptlrpc]
      <4> [<ffffffffa08f9b70>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
      <4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
      <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      I will upload vmcore and node logs when they are available.

      Attachments

        Issue Links

          Activity

            People

              di.wang Di Wang
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: