[LU-12295] MDS Panic on DNE2 directory removing

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.10.5
    • Labels: None
    • Severity: 3

    Description

      The MDS panics when handling of a remote object fails.

      Steps to reproduce are as follows:

      1) Create and delete files and directories under a striped directory:
      [client]# lfs mkdir -c 2 -i 0 /mnt/lustre/dir
      [client]# lfs mkdir -c 2 -i 0 -D /mnt/lustre/dir
      [client]# while :; do rm -rf /mnt/lustre/dir/*;  ./mdtest -v -n 1000 -p 1 -i 3 -d /mnt/lustre/dir; done
      
      2) Simulate an ENOSPC error during remote object handling (i.e., in out_tx_write_exec()):
      [MDS1]# while :; do sysctl lnet.fail_loc=0x1704 ; sleep 3; sysctl lnet.fail_loc=0; sleep 5; done
      


      MDS console and dump:


      Message from syslogd@rx200-076 at May 10 20:08:27 ...
       kernel:LustreError: 20269:0:(osd_handler.c:3229:osd_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
      
      Message from syslogd@rx200-076 at May 10 20:08:27 ...
       kernel:LustreError: 20269:0:(osd_handler.c:3229:osd_destroy()) LBUG
      
       [9798957.173503] Call Trace:
      [9798957.190509]  [<ffffffffb3b0d78e>] dump_stack+0x19/0x1b
      [9798957.223630]  [<ffffffffb3b07a90>] panic+0xe8/0x21f
      [9798957.254673]  [<ffffffffc0ad18cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [9798957.294020]  [<ffffffffc1133dd0>] osd_destroy+0x710/0x750 [osd_ldiskfs]
      [9798957.335950]  [<ffffffffc1132bcd>] ? osd_ref_del+0x1ad/0x6a0 [osd_ldiskfs]
      [9798957.378897]  [<ffffffffc1132141>] ? osd_attr_set+0x201/0xae0 [osd_ldiskfs]
      [9798957.422331]  [<ffffffffb3b120d2>] ? down_write+0x12/0x3d
      [9798957.456457]  [<ffffffffc0f6c851>] out_obj_destroy+0x101/0x2c0 [ptlrpc]
      [9798957.497826]  [<ffffffffc0f6cac0>] out_tx_destroy_exec+0x20/0x190 [ptlrpc]
      [9798957.540746]  [<ffffffffc0f67591>] out_tx_end+0xe1/0x5c0 [ptlrpc]
      [9798957.578950]  [<ffffffffc0f6b6d3>] out_handle+0x1453/0x1bc0 [ptlrpc]
      [9798957.618701]  [<ffffffffc0efbf72>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
      [9798957.661558]  [<ffffffffc0f5fc69>] ? tgt_request_preprocess.isra.26+0x299/0x790 [ptlrpc]
      [9798957.711684]  [<ffffffffc0f6138a>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [9798957.755032]  [<ffffffffc0f09e4b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [9798957.803047]  [<ffffffffc0f06478>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [9798957.845811]  [<ffffffffb34cee92>] ? default_wake_function+0x12/0x20
      [9798957.885436]  [<ffffffffb34c4abb>] ? __wake_up_common+0x5b/0x90
      [9798957.922487]  [<ffffffffc0f0d592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [9798957.962103]  [<ffffffffc0f0cb00>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      [9798958.008436]  [<ffffffffb34bae31>] kthread+0xd1/0xe0
      [9798958.039672]  [<ffffffffb34bad60>] ? insert_kthread_work+0x40/0x40
      [9798958.078163]  [<ffffffffb3b1f5f7>] ret_from_fork_nospec_begin+0x21/0x21
      [9798958.119234]  [<ffffffffb34bad60>] ? insert_kthread_work+0x40/0x40
      
      


      Could you please look into this one?
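      The assertion that fires in osd_destroy() accepts a directory inode only when it is already unlinked or its link count is exactly 1 or 2. The sketch below shows that predicate in plain C so it can be tested on its own. It contrasts the fatal form, which mirrors LASSERT()/LBUG() with assert(), with a non-fatal variant in the spirit of the patch subject "don't LBUG() if dir nlink is wrong".

      This is illustrative only and is NOT the code from https://review.whamcloud.com/39734/. The following are hypothetical:
      • `struct fake_inode`, `dir_nlink_ok()`, `destroy_dir_asserting()` and `destroy_dir_checked()`;
      • the choice of -EIO as the error;
      • the example nlink value of 3.

      It also assumes that osd_inode_unlinked() means "i_nlink == 0".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two inode properties the check reads. */
struct fake_inode {
	unsigned int i_nlink;
	bool is_dir;
};

/* Assumption: osd_inode_unlinked() is true when the link count is 0. */
bool fake_inode_unlinked(const struct fake_inode *inode)
{
	return inode->i_nlink == 0;
}

/* Same condition as the failing ASSERTION() in osd_destroy(). */
bool dir_nlink_ok(const struct fake_inode *inode)
{
	return fake_inode_unlinked(inode) ||
	       inode->i_nlink == 1 || inode->i_nlink == 2;
}

/* Fatal form: like LASSERT(), an unexpected nlink aborts the process
 * (on the MDS, LBUG() panics the whole node). */
void destroy_dir_asserting(const struct fake_inode *inode)
{
	if (inode->is_dir)
		assert(dir_nlink_ok(inode));
	/* ... the actual destroy would continue here ... */
}

/* Hypothetical non-fatal form: report the inconsistency and return an
 * error so that only this one request fails. */
int destroy_dir_checked(const struct fake_inode *inode)
{
	if (inode->is_dir && !dir_nlink_ok(inode)) {
		fprintf(stderr, "directory has unexpected nlink %u\n",
			inode->i_nlink);
		return -EIO;
	}
	/* ... the actual destroy would continue here ... */
	return 0;
}
```

      Consider a directory left with a hypothetical nlink of 3. destroy_dir_asserting() would abort on it, whereas destroy_dir_checked() prints a message and returns -EIO. Where the real fix places its check is not shown here; the two patch sets name different layers (osd-ldiskfs, then mdd).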


        Activity

          pjones Peter Jones added a comment -

          Landed for 2.14


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39734/
          Subject: LU-12295 mdd: don't LBUG() if dir nlink is wrong
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: afa39b3cceabccd19e7c412ff90667e95cbfe3e8

          gerrit Gerrit Updater added a comment -

          Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39734
          Subject: LU-12295 osd-ldiskfs: don't LBUG() if dir nlink is wrong
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 1f563d379c6415b93fbc50d5613e532ebd6a9d34

          ofaaland Olaf Faaland added a comment -

          I don't recall seeing this specific bug at LLNL, but we've seen a variety of failures when MDTs run out of space. It would be nice to work on them so that users can recover on their own by deleting files/directories, and so that readdir/stat/open/close succeed while the housecleaning is being done.

          green Oleg Drokin added a comment -

          hm, it looks like I hit a very similar failure in master-next two days ago and yesterday:

          [ 5930.469393] LustreError: 9370:0:(osd_handler.c:3573:osd_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed: 
          [ 5930.502768] LustreError: 9370:0:(osd_handler.c:3573:osd_destroy()) LBUG
          [ 5930.505164] Pid: 9370, comm: mdt_rdpg07_003 3.10.0-7.6-debug #1 SMP Wed Nov 7 21:55:08 EST 2018
          [ 5930.509233] Call Trace:
          [ 5930.511319]  [<ffffffffa02b27dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
          [ 5930.514891]  [<ffffffffa02b288c>] lbug_with_loc+0x4c/0xa0 [libcfs]
          [ 5930.522770]  [<ffffffffa0c4eeb3>] osd_destroy+0x713/0x750 [osd_ldiskfs]
          [ 5930.527762]  [<ffffffffa0e8f83b>] lod_sub_destroy+0x1bb/0x450 [lod]
          [ 5930.531206]  [<ffffffffa0e777a0>] lod_destroy+0x140/0x820 [lod]
          [ 5930.546681]  [<ffffffffa0d39e26>] mdd_close+0x846/0xf30 [mdd]
          [ 5930.549991]  [<ffffffffa0db7aab>] mdt_mfd_close+0x3fb/0x850 [mdt]
          [ 5930.555677]  [<ffffffffa0dbd401>] mdt_close_internal+0xb1/0x220 [mdt]
          [ 5930.560137]  [<ffffffffa0dbd790>] mdt_close+0x220/0x740 [mdt]
          [ 5930.564650]  [<ffffffffa072eb05>] tgt_request_handle+0x915/0x15c0 [ptlrpc]
          [ 5930.567750]  [<ffffffffa06d12b9>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
          [ 5930.584402]  [<ffffffffa06d52bc>] ptlrpc_main+0xb6c/0x20b0 [ptlrpc]
          [ 5930.585599]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
          [ 5930.587608]  [<ffffffff817c4c5d>] ret_from_fork_nospec_begin+0x7/0x21
          [ 5930.588809]  [<ffffffffffffffff>] 0xffffffffffffffff
          [ 5930.589680] Kernel panic - not syncing: LBUG
          

          and

          [13720.662563] LustreError: 14705:0:(osd_handler.c:3573:osd_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed: 
          [13720.683253] LustreError: 14705:0:(osd_handler.c:3573:osd_destroy()) LBUG
          [13720.684186] Pid: 14705, comm: mdt04_003 3.10.0-7.6-debug #1 SMP Wed Nov 7 21:55:08 EST 2018
          [13720.685838] Call Trace:
          [13720.686625]  [<ffffffffa02cb7dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
          [13720.688731]  [<ffffffffa02cb88c>] lbug_with_loc+0x4c/0xa0 [libcfs]
          [13720.690977]  [<ffffffffa0c2aeb3>] osd_destroy+0x713/0x750 [osd_ldiskfs]
          [13720.701737]  [<ffffffffa0e6b83b>] lod_sub_destroy+0x1bb/0x450 [lod]
          [13720.707438]  [<ffffffffa0e537a0>] lod_destroy+0x140/0x820 [lod]
          [13720.712593]  [<ffffffffa0d0aa63>] mdd_finish_unlink+0x123/0x410 [mdd]
          [13720.714811]  [<ffffffffa0d0cce4>] mdd_unlink+0x9c4/0xad0 [mdd]
          [13720.719251]  [<ffffffffa0dc177f>] mdo_unlink+0x43/0x45 [mdt]
          [13720.721165]  [<ffffffffa0d83c15>] mdt_reint_unlink+0xb25/0x13e0 [mdt]
          [13720.728197]  [<ffffffffa0d8a7c0>] mdt_reint_rec+0x80/0x210 [mdt]
          [13720.734164]  [<ffffffffa0d66a40>] mdt_reint_internal+0x780/0xb50 [mdt]
          [13720.736305]  [<ffffffffa0d71aa7>] mdt_reint+0x67/0x140 [mdt]
          [13720.744742]  [<ffffffffa0727b05>] tgt_request_handle+0x915/0x15c0 [ptlrpc]
          [13720.758897]  [<ffffffffa06ca2b9>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
          [13720.798963]  [<ffffffffa06ce2bc>] ptlrpc_main+0xb6c/0x20b0 [ptlrpc]
          [13720.801378]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
          [13720.822348]  [<ffffffff817c4c5d>] ret_from_fork_nospec_begin+0x7/0x21
          [13720.824379]  [<ffffffffffffffff>] 0xffffffffffffffff
          [13720.826530] Kernel panic - not syncing: LBUG
          

          I have crash dumps too.


          People

            Assignee: Lai Siyao (laisiyao)
            Reporter: Tatsushi Takamura (takamura)
            Votes: 0
            Watchers: 7
