
LBUG mdt_handler.c:222:mdt_lock_pdo_mode()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.11.0
    • Labels: None
    • Environment: trevis, full DNE
      servers: el7.4, zfs, branch master, v2.10.53.1, b3642
      clients: el7.4, branch master, v2.10.53.1, b3642
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      https://testing.hpdd.intel.com/test_sessions/00511056-db38-4995-a4f8-b35856fa09f7

      racer, test_1: test failed to respond and timed out

      From MDS console:

      06:18:07:[19681.903127] LustreError: 22680:0:(mdt_handler.c:222:mdt_lock_pdo_mode()) ASSERTION( lh->mlh_pdo_mode == LCK_MINMODE ) failed: 
      06:18:07:[19681.908979] LustreError: 22680:0:(mdt_handler.c:222:mdt_lock_pdo_mode()) LBUG
      06:18:07:[19681.912061] Pid: 22680, comm: mdt00_037
      06:18:07:[19681.914860] 
      06:18:07:[19681.914860] Call Trace:
      06:18:07:[19681.920079]  [<ffffffffc06817ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      06:18:07:[19681.923062]  [<ffffffffc068183c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      06:18:07:[19681.926146]  [<ffffffffc11b59db>] mdt_object_local_lock+0xa4b/0xad0 [mdt]
      06:18:07:[19681.928946]  [<ffffffffc11b56db>] ? mdt_object_local_lock+0x74b/0xad0 [mdt]
      06:18:07:[19681.931741]  [<ffffffffc11a5f10>] ? mdt_blocking_ast+0x0/0x2e0 [mdt]
      06:18:07:[19681.934380]  [<ffffffffc0e33770>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
      06:18:07:[19681.937367]  [<ffffffffc11b5ad0>] mdt_object_lock_internal+0x70/0x330 [mdt]
      06:18:07:[19681.940203]  [<ffffffffc11b5ad0>] ? mdt_object_lock_internal+0x70/0x330 [mdt]
      06:18:07:[19681.943129]  [<ffffffffc11b5db0>] mdt_object_lock+0x20/0x30 [mdt]
      06:18:07:[19681.945893]  [<ffffffffc11fa3f4>] mdt_lock_objects_in_linkea+0x748/0xa68 [mdt]
      06:18:07:[19681.948705]  [<ffffffffc11ca148>] mdt_reint_migrate_internal.isra.38+0x8c8/0x16d0 [mdt]
      06:18:07:[19681.951559]  [<ffffffffc0692002>] ? cfs_hash_bd_from_key+0x32/0xb0 [libcfs]
      06:18:07:[19681.954250]  [<ffffffffc0c41788>] ? lu_object_put+0x148/0x3d0 [obdclass]
      06:18:07:[19681.956946]  [<ffffffffc11cb1b5>] mdt_reint_rename_or_migrate.isra.39+0x265/0x860 [mdt]
      06:18:07:[19681.959450]  [<ffffffffc11c0181>] ? mdt_root_squash+0x21/0x430 [mdt]
      06:18:07:[19681.961948]  [<ffffffff8132c212>] ? strlcpy+0x42/0x60
      06:18:07:[19681.964254]  [<ffffffffc11cb7c0>] mdt_reint_migrate+0x10/0x20 [mdt]
      06:18:07:[19681.966606]  [<ffffffffc11cf790>] mdt_reint_rec+0x80/0x210 [mdt]
      06:18:07:[19681.969134]  [<ffffffffc11b131b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      06:18:07:[19681.971704]  [<ffffffffc11bcda7>] mdt_reint+0x67/0x140 [mdt]
      06:18:07:[19681.974204]  [<ffffffffc0ecc225>] tgt_request_handle+0x925/0x1370 [ptlrpc]
      06:18:07:[19681.976507]  [<ffffffffc0e750c6>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
      06:18:07:[19681.979113]  [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
      06:18:07:[19681.981580]  [<ffffffffc0e78862>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      06:18:07:[19681.983929]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
      06:18:07:[19681.986432]  [<ffffffff816a8f00>] ? __schedule+0x330/0x8b0
      06:18:07:[19681.988691]  [<ffffffffc0e77dd0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      06:18:07:[19681.990751]  [<ffffffff810b098f>] kthread+0xcf/0xe0
      06:18:07:[19681.993085]  [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      06:18:07:[19681.995015]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      06:18:07:[19681.997259]  [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      06:18:07:[19681.999174] 
      06:18:07:[19682.000888] Kernel panic - not syncing: LBUG
      06:18:07:[19682.001870] CPU: 1 PID: 22680 Comm: mdt00_037 Tainted: P           OE  ------------   3.10.0-693.1.1.el7_lustre.x86_64 #1
      06:18:07:[19682.001870] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      06:18:07:[19682.001870]  ffff8800664d1f00 0000000068d88604 ffff88004cf8b898 ffffffff816a3d6d
      06:18:07:[19682.001870]  ffff88004cf8b918 ffffffff8169dc54 ffffffff00000008 ffff88004cf8b928
      06:18:07:[19682.001870]  ffff88004cf8b8c8 0000000068d88604 0000000068d88604 ffff88007fd0f8b8
      06:18:07:[19682.001870] Call Trace:
      06:18:07:[19682.001870]  [<ffffffff816a3d6d>] dump_stack+0x19/0x1b
      06:18:07:[19682.001870]  [<ffffffff8169dc54>] panic+0xe8/0x20d
      06:18:07:[19682.001870]  [<ffffffffc0681854>] lbug_with_loc+0x64/0xb0 [libcfs]
      06:18:07:[19682.001870]  [<ffffffffc11b59db>] mdt_object_local_lock+0xa4b/0xad0 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11b56db>] ? mdt_object_local_lock+0x74b/0xad0 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11a5f10>] ? mdt_obd_reconnect+0x1d0/0x1d0 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc0e33770>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc]
      06:18:07:[19682.001870]  [<ffffffffc11b5ad0>] mdt_object_lock_internal+0x70/0x330 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11b5ad0>] ? mdt_object_lock_internal+0x70/0x330 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11b5db0>] mdt_object_lock+0x20/0x30 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11fa3f4>] mdt_lock_objects_in_linkea+0x748/0xa68 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11ca148>] mdt_reint_migrate_internal.isra.38+0x8c8/0x16d0 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc0692002>] ? cfs_hash_bd_from_key+0x32/0xb0 [libcfs]
      06:18:07:[19682.001870]  [<ffffffffc0c41788>] ? lu_object_put+0x148/0x3d0 [obdclass]
      06:18:07:[19682.001870]  [<ffffffffc11cb1b5>] mdt_reint_rename_or_migrate.isra.39+0x265/0x860 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11c0181>] ? mdt_root_squash+0x21/0x430 [mdt]
      06:18:07:[19682.001870]  [<ffffffff8132c212>] ? strlcpy+0x42/0x60
      06:18:07:[19682.001870]  [<ffffffffc11cb7c0>] mdt_reint_migrate+0x10/0x20 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11cf790>] mdt_reint_rec+0x80/0x210 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11b131b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc11bcda7>] mdt_reint+0x67/0x140 [mdt]
      06:18:07:[19682.001870]  [<ffffffffc0ecc225>] tgt_request_handle+0x925/0x1370 [ptlrpc]
      06:18:07:[19682.001870]  [<ffffffffc0e750c6>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
      06:18:07:[19682.001870]  [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
      06:18:07:[19682.001870]  [<ffffffffc0e78862>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      06:18:07:[19682.001870]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
      06:18:07:[19682.001870]  [<ffffffff816a8f00>] ? __schedule+0x330/0x8b0
      06:18:07:[19682.001870]  [<ffffffffc0e77dd0>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
      06:18:07:[19682.001870]  [<ffffffff810b098f>] kthread+0xcf/0xe0
      06:18:07:[19682.001870]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      06:18:07:[19682.001870]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
      06:18:07:[19682.001870]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
      

          Activity

            [LU-10067] LBUG mdt_handler.c:222:mdt_lock_pdo_mode()
            mdiep Minh Diep added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29597/
            Subject: LU-10067 mdt: reinit lock when fail to try lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 37e4bcaad4b4cd1f539c257f7424850e51d685c1

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/29597
            Subject: LU-10067 mdt: reinit lock when fail to try lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 24bd3ed58b9fd6a80f5e9c000119c15076d38053
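
Read together, the failed assertion (`lh->mlh_pdo_mode == LCK_MINMODE`) and the patch subject ("reinit lock when fail to try lock") suggest a general pattern: a lock handle that records its mode before a non-blocking attempt is left in a stale state if that attempt fails, so it has to be reinitialized before it is reused. The sketch below only illustrates that pattern. It is not the Lustre code or the content of the patch, and every name in it (`lock_handle`, `MODE_MIN`, `handle_init`, `handle_trylock`, `lock_with_retry`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the Lustre definitions. */
enum lock_mode { MODE_MIN = 0, MODE_READ, MODE_WRITE };

struct lock_handle {
    enum lock_mode pdo_mode; /* expected to be MODE_MIN before each attempt */
    bool held;
};

/* Put the handle back into its pristine state. */
void handle_init(struct lock_handle *lh)
{
    lh->pdo_mode = MODE_MIN;
    lh->held = false;
}

/*
 * Non-blocking lock attempt. Returns 0 on success, -1 on contention.
 * The assert plays the role of the ASSERTION in the console log: the
 * handle must not carry a mode left over from an earlier attempt.
 * "contended" is a test knob that simulates another holder.
 */
int handle_trylock(struct lock_handle *lh, enum lock_mode mode, bool contended)
{
    assert(lh->pdo_mode == MODE_MIN);
    lh->pdo_mode = mode;     /* mode is recorded before the attempt */
    if (contended)
        return -1;           /* failure leaves pdo_mode stale */
    lh->held = true;
    return 0;
}

/* Try once; if that fails, reinitialize the handle before retrying. */
int lock_with_retry(struct lock_handle *lh, enum lock_mode mode,
                    bool first_try_contended)
{
    if (handle_trylock(lh, mode, first_try_contended) == 0)
        return 0;
    handle_init(lh);         /* without this, the retry would hit the assert */
    return handle_trylock(lh, mode, false);
}
```

In this sketch, dropping the `handle_init()` call in `lock_with_retry()` makes the second `handle_trylock()` abort on its assert whenever the first attempt was contended, which is the same shape of failure as the LBUG in the description.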

            jamesanunez James Nunez (Inactive) added a comment -

            Hongchao,

            Would you please comment on this issue?

            James

            jamesanunez James Nunez (Inactive) added a comment -

            In the test_log, we see

            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_27457
            trevis-6vm4: Warning: Permanently added 'trevis-6vm4' (ECDSA) to the list of known hosts.
            trevis-6vm4: Warning: Permanently added 'trevis-6vm3' (ECDSA) to the list of known hosts.
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_list -F lustre 2>/dev/null
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_list -F lustre 2>/dev/null
            trevis-6vm4: Warning: Permanently added 'trevis-6vm8' (ECDSA) to the list of known hosts.
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_29957
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_list -F lustre 2>/dev/null
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_destroy -F lustre -n lss_29957 -f
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_12954
            trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_2287
            trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_14963
            trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
            CMD: trevis-6vm4 /usr/sbin/lctl snapshot_create -F lustre -n lss_12585
            trevis-6vm4: Can't lock the snapshot config file /etc/ldev.conf (2): Resource temporarily unavailable
            trevis-6vm4: Connection closed by UNKNOWN port 65535
            …

            Although racer test_1 hangs frequently during autotesting and does fail with “Resource temporarily unavailable” messages in the test log, we don’t see the LBUG in the MDS logs in previous runs of racer.

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: jcasper James Casper (Inactive)