[LU-11668] mdd_parent_fid()) ASSERTION( (((mdd_object_type(obj)) & 00170000) == 0040000) ) failed

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0, Lustre 2.14.0
    • Affects Version/s: Lustre 2.12.0
    • Labels: None

    Description

      I hit this assertion in current master-next testing, but I don't see anything obvious among the included patches that would lead to it, so perhaps it is a rare race that just happened to trigger?

       [ 6095.328424] Lustre: DEBUG MARKER: == racer test 1: racer on clients: centos-30.localnet DURATION=2700 ================================== 02:51:44 (1542181904)
       [ 6097.825252] Lustre: lfs: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200000403:0x5:0x0], use llapi_layout_get_by_path()
       [ 6101.235171] Lustre: DEBUG MARKER: racer test_1: @@@@@@ FAIL: generate lss conf (mds1)
       [ 6106.472165] LustreError: 4856:0:(mdt_lvb.c:430:mdt_lvbo_fill()) lustre-MDT0000: small buffer size 448 for EA 496 (max_mdsize 496): rc = -34
       [ 6108.575804] LustreError: 26511:0:(mdt_lvb.c:430:mdt_lvbo_fill()) lustre-MDT0001: small buffer size 448 for EA 472 (max_mdsize 472): rc = -34
       [ 6361.959073] 9[28537]: segfault at 8 ip 00007f20a23dc958 sp 00007fffccffcf80 error 4 in ld-2.17.so[7f20a23d1000+22000]
       [ 6469.162820] LustreError: 26494:0:(mdd_dir.c:222:mdd_parent_fid()) ASSERTION( (((mdd_object_type(obj)) & 00170000) == 0040000) ) failed: 
       [ 6469.214647] LustreError: 26494:0:(mdd_dir.c:222:mdd_parent_fid()) LBUG
       [ 6469.215925] Pid: 26494, comm: mdt00_001 3.10.0-7.6-debug #1 SMP Wed Nov 7 21:55:08 EST 2018
       [ 6469.219120] Call Trace:
       [ 6469.222463]  [<ffffffffa02637dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
       [ 6469.250486]  [<ffffffffa026388c>] lbug_with_loc+0x4c/0xa0 [libcfs]
       [ 6469.251895]  [<ffffffffa100ef22>] mdd_is_parent+0x4d2/0x510 [mdd]
       [ 6469.253469]  [<ffffffffa100f164>] mdd_is_subdir+0x204/0x240 [mdd]
       [ 6469.315072]  [<ffffffffa108f8a0>] mdt_reint_rename_internal.isra.47+0x810/0x2750 [mdt]
       [ 6469.318228]  [<ffffffffa109689b>] mdt_reint_rename_or_migrate.isra.51+0x19b/0x860 [mdt]
       [ 6469.340401]  [<ffffffffa1096f93>] mdt_reint_rename+0x13/0x20 [mdt]
       [ 6469.358495]  [<ffffffffa10984f0>] mdt_reint_rec+0x80/0x210 [mdt]
       [ 6469.400446]  [<ffffffffa1075882>] mdt_reint_internal+0x6b2/0xa50 [mdt]
       [ 6469.405016]  [<ffffffffa1080997>] mdt_reint+0x67/0x140 [mdt]
       [ 6469.406310]  [<ffffffffa05c3365>] tgt_request_handle+0xaf5/0x1590 [ptlrpc]
       [ 6469.412532]  [<ffffffffa0567436>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc]
       [ 6469.415111]  [<ffffffffa056b329>] ptlrpc_main+0xa99/0x1f60 [ptlrpc]
       [ 6469.416569]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
       [ 6469.417870]  [<ffffffff817c4c77>] ret_from_fork_nospec_end+0x0/0x39
       [ 6469.419494]  [<ffffffffffffffff>] 0xffffffffffffffff
       [ 6469.420822] Kernel panic - not syncing: LBUG
      

      crashdump: 192.168.123.130-2018-11-14-02:58:09 git source: 46bcdb588e22abf162af9a486107c7b59b438dd2

        Activity

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35047/
          Subject: LU-11668 mdd: use mdd_object_fid() instead of mdo2fid()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 7de9babe6f9af6dfdb20360211f8ecea344b0500

          gerrit Gerrit Updater added a comment -

          Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35047
          Subject: LU-11668 mdd: use mdd_object_fid() instead of mdo2fid()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: cfa0e83b83cca5b47c005d3934ee1abba07313ba

          green Oleg Drokin added a comment -

          I hit this in current master-next with the new debug print. Running racer:

          [ 5609.701558] LustreError: 29511:0:(mdd_dir.c:225:mdd_parent_fid()) ASSERTION( S_ISDIR(mdd_object_type(obj)) ) failed: lustre-MDD0000: FID [0x200000003:0xa:0x0] is not a directory type = 100000
          [ 5609.713377] LustreError: 29511:0:(mdd_dir.c:225:mdd_parent_fid()) LBUG
          [ 5609.714440] Pid: 29511, comm: mdt07_012 3.10.0-7.6-debug #1 SMP Wed Nov 7 21:55:08 EST 2018
          [ 5609.716491] Call Trace:
          [ 5609.717566]  [<ffffffffa02077dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
          [ 5609.719016]  [<ffffffffa020788c>] lbug_with_loc+0x4c/0xa0 [libcfs]
          [ 5609.722670]  [<ffffffffa0fe5ab4>] mdd_parent_fid+0x374/0x3b0 [mdd]
          [ 5609.724305]  [<ffffffffa0fe5bc0>] mdd_is_parent+0xd0/0x1a0 [mdd]
          [ 5609.725641]  [<ffffffffa0fe5e94>] mdd_is_subdir+0x204/0x240 [mdd]
          [ 5609.726669]  [<ffffffffa10642d0>] mdt_reint_rename_internal.isra.46+0x810/0x2750 [mdt]
          [ 5609.728468]  [<ffffffffa106e14b>] mdt_reint_rename_or_migrate.isra.51+0x19b/0x860 [mdt]
          [ 5609.730274]  [<ffffffffa106e843>] mdt_reint_rename+0x13/0x20 [mdt]
          [ 5609.731149]  [<ffffffffa106e8d0>] mdt_reint_rec+0x80/0x210 [mdt]
          [ 5609.732097]  [<ffffffffa104b723>] mdt_reint_internal+0x6e3/0xab0 [mdt]
          [ 5609.732988]  [<ffffffffa10568e7>] mdt_reint+0x67/0x140 [mdt]
          [ 5609.734283]  [<ffffffffa05f5605>] tgt_request_handle+0xaf5/0x1590 [ptlrpc]
          [ 5609.735808]  [<ffffffffa05993a9>] ptlrpc_server_handle_request+0x259/0xad0 [ptlrpc]
          [ 5609.737741]  [<ffffffffa059d36c>] ptlrpc_main+0xb5c/0x2040 [ptlrpc]
          [ 5609.738705]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
          [ 5609.739563]  [<ffffffff817c4c77>] ret_from_fork_nospec_end+0x0/0x39
          [ 5609.740589]  [<ffffffffffffffff>] 0xffffffffffffffff
          [ 5609.741650] Kernel panic - not syncing: LBUG
          
          pjones Peter Jones added a comment -

          Landed for 2.12


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33709/
          Subject: LU-11668 mdt: check parent type in rename/migrate
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 214b12adc315c4adc3c56deb7e790fdc6f0095c8

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33700/
          Subject: LU-11668 debug: print object type in mdd_parent_fid
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 9a4a99b81a267a098b92ec10991af27b7f3cae7e

          gerrit Gerrit Updater added a comment -

          Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33709
          Subject: LU-11668 mdt: check parent type in rename/migrate
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 3ede5bf5d6539ac2a8ca4831722bff367d1aed68

          adilger Andreas Dilger added a comment -

          I think it makes sense to just check the parent type in the MDT code, since there could be all kinds of reasons for it to be wrong. In this case, it is likely that racer moved or deleted a directory in which the client was about to rename a file, and another thread created a regular file with the same name in its place. The mdt_reint_rename_internal() code should just check the type after the parent is looked up, and return -ENOTDIR if it isn't a directory.

          The best place for that check may be mdt_object_find_check(), since it is only called for parent directories; in that case it would be better to rename it to mdt_parent_find_check() or similar.

          Could you please work on a patch today? This is one of the last blockers for 2.12 that doesn't have a patch yet.

          laisiyao Lai Siyao added a comment -

          I can't find any clue in the code, so let's see what type this object is. Since the parent FID is read from disk and the system may be inconsistent, in the future we may turn this assertion into a check and return an error if the object is not a directory.


          gerrit Gerrit Updater added a comment -

          Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33700
          Subject: LU-11668 debug: print object type in mdd_parent_fid
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: a4c91e284d998de40fd08bb18e046fb23bd0044d

          People

            Assignee: laisiyao Lai Siyao
            Reporter: green Oleg Drokin