Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8880

sanity test_1: fails w/DNE @@@ faked source MDT

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • Lustre 2.9.0
    • None
    • 3
    • 9223372036854775807

    Description

      The `/usr/bin/lfs setdirstripe -i1 -c2 /mnt/lustre/d1.sanity` command issued by sanity.sh test_1 hangs when mdt2mdt connections are secured with the krb5p security flavor. This test has failed with SSK flavors skpi and ski; other flavors have not yet been tested.

      == MDS2 logs ==

      [  341.920887] Lustre: DEBUG MARKER: == sanity test 1: mkdir; remkdir; rmdir ============================================================== 09:13:33 (1480515213)
      [  341.954824] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d368ed00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
      [  343.088324] Lustre: 2602:0:(sec_gss.c:2063:gss_svc_handle_init()) create svc ctx ffff8802051cc440: user from 172.18.0.18@tcp authenticated as mds
      [  343.088328] Lustre: 2602:0:(sec_gss.c:2063:gss_svc_handle_init()) Skipped 9 previous similar messages
      [  343.088346] Lustre: 2602:0:(sec_gss.c:371:gss_cli_ctx_uptodate()) server installed reverse ctx ffff8802051a6900 idx 0x35ebf0324b78e726, expiry 1480599087(+83873s)
      [  343.088348] Lustre: 2602:0:(sec_gss.c:371:gss_cli_ctx_uptodate()) Skipped 9 previous similar messages
      [  352.954000] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880203a13300 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  363.953599] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880204ecc600 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  374.953137] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880205722850 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  382.074673] LNet: Service thread pid 3152 was inactive for 40.12s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [  382.074679] Pid: 3152, comm: mdt00_003
      [  382.074681] 
      Call Trace:
      [  382.074692]  [<ffffffff8163bd69>] schedule+0x29/0x70
      [  382.074697]  [<ffffffff816399c5>] schedule_timeout+0x175/0x2d0
      [  382.074704]  [<ffffffff8108bff0>] ? process_timeout+0x0/0x10
      [  382.074723]  [<ffffffffa0642ef1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
      [  382.074777]  [<ffffffffa0970fa0>] ? ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
      [  382.074820]  [<ffffffffa0971791>] ldlm_completion_ast+0x5a1/0x910 [ptlrpc]
      [  382.074860]  [<ffffffffa095e730>] ? ldlm_resource_add_lock+0x70/0x1b0 [ptlrpc]
      [  382.074867]  [<ffffffff810b8940>] ? default_wake_function+0x0/0x20
      [  382.074910]  [<ffffffffa0973448>] ldlm_cli_enqueue_fini+0x938/0xdb0 [ptlrpc]
      [  382.074952]  [<ffffffffa09763b2>] ldlm_cli_enqueue+0x6c2/0x810 [ptlrpc]
      [  382.074994]  [<ffffffffa09711f0>] ? ldlm_completion_ast+0x0/0x910 [ptlrpc]
      [  382.075012]  [<ffffffffa0cfda80>] ? mdt_remote_blocking_ast+0x0/0x590 [mdt]
      [  382.075032]  [<ffffffffa0eafc82>] osp_md_object_lock+0x172/0x2e0 [osp]
      [  382.075054]  [<ffffffffa0e56c51>] ? mdd_la_get+0x81/0x1a0 [mdd]
      [  382.075077]  [<ffffffffa0decb20>] lod_object_lock+0xf0/0x950 [lod]
      [  382.075093]  [<ffffffffa06524be>] ? cfs_hash_bd_add_locked+0x1e/0x80 [libcfs]
      [  382.075105]  [<ffffffffa0652869>] ? cfs_hash_spin_unlock+0x9/0x10 [libcfs]
      [  382.075171]  [<ffffffffa077fdb7>] ? lu_object_find_try+0x207/0x2b0 [obdclass]
      [  382.075177]  [<ffffffffa0e5410b>] mdd_object_lock+0x3b/0xd0 [mdd]
      [  382.075186]  [<ffffffffa0d0ca72>] mdt_remote_object_lock+0x1e2/0x710 [mdt]
      [  382.075202]  [<ffffffffa077ff0c>] ? lu_object_find_at+0xac/0xe0 [obdclass]
      [  382.075228]  [<ffffffffa09a3bc7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
      [  382.075237]  [<ffffffffa0d0dfc6>] mdt_object_lock_internal+0x166/0x300 [mdt]
      [  382.075244]  [<ffffffffa0d0e174>] mdt_object_lock+0x14/0x20 [mdt]
      [  382.075254]  [<ffffffffa0d246a1>] mdt_md_create+0x421/0xba0 [mdt]
      [  382.075268]  [<ffffffffa075d838>] ? lprocfs_counter_add+0x128/0x1a0 [obdclass]
      [  382.075285]  [<ffffffffa0793ef1>] ? lprocfs_job_stats_log+0xd1/0x630 [obdclass]
      [  382.075294]  [<ffffffffa0d24f8b>] mdt_reint_create+0x16b/0x350 [mdt]
      [  382.075302]  [<ffffffffa0d26490>] mdt_reint_rec+0x80/0x210 [mdt]
      [  382.075309]  [<ffffffffa0d08db2>] mdt_reint_internal+0x5b2/0x9b0 [mdt]
      [  382.075317]  [<ffffffffa0d141e7>] mdt_reint+0x67/0x140 [mdt]
      [  382.075351]  [<ffffffffa0a05225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
      [  382.075378]  [<ffffffffa09b11ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [  382.075404]  [<ffffffffa09aed68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [  382.075407]  [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
      [  382.075409]  [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
      [  382.075434]  [<ffffffffa09b5260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
      [  382.075458]  [<ffffffffa09b47c0>] ? ptlrpc_main+0x0/0x1de0 [ptlrpc]
      [  382.075461]  [<ffffffff810a5b8f>] kthread+0xcf/0xe0
      [  382.075463]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
      [  382.075466]  [<ffffffff81646c98>] ret_from_fork+0x58/0x90
      [  382.075469]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
      
      [  382.075472] LustreError: dumping log to /tmp/lustre-log.1480515253.3152
      [  385.952974] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880205375800 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  396.952564] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880204d3db00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  418.952006] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d72c8050 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  418.952150] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 1 previous similar message
      [  451.951069] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d04f0f00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  451.951177] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 2 previous similar messages
      [  517.949644] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d0514600 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  517.949761] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 5 previous similar messages
      [  641.950918] Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 172.18.0.18@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [  641.950949] LustreError: 3152:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1480515213, 300s ago), entering recovery for lustre-MDT0000_UUID@172.18.0.18@tcp ns: lustre-MDT0000-osp-MDT0001 lock: ffff880211f20e00/0xd408ec1632ce7580 lrc: 4/0,1 mode: --/EX res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 1 type: IBT flags: 0x1000001000000 nid: local remote: 0xf8f327625a9c3943 expref: -99 pid: 3152 timeout: 0 lvb_type: 0
      [  641.951489] Lustre: lustre-MDT0000-osp-MDT0001: Connection restored to 172.18.0.18@tcp (at 172.18.0.18@tcp)
      [  641.951493] Lustre: Skipped 3 previous similar messages
      [  652.950923] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8802051a8900 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  652.951025] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 12 previous similar messages
      [  916.946544] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d04d3c00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  916.946651] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 23 previous similar messages
      [  936.436828] Lustre: 2596:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
        req@ffff8800d368ea00 x1552432529149200/t0(0) o36->c91cef1e-2de0-ca74-2459-2c580f0f0023@172.18.0.22@tcp:-1/-1 lens 672/568 e 24 to 0 dl 1480515813 ref 2 fl Interpret:/0/0 rc 0/0
      [  942.942067] Lustre: lustre-MDT0001: Client c91cef1e-2de0-ca74-2459-2c580f0f0023 (at 172.18.0.22@tcp) reconnecting
      [  942.942090] Lustre: lustre-MDT0001: Connection restored to c91cef1e-2de0-ca74-2459-2c580f0f0023 (at 172.18.0.22@tcp)
      

      The test was executed with:

      export SEC=krb5p; ./auster -f multinode -rsv -d /import/results/krb5p sanity --only 1
      

      Attachments

        Issue Links

          Activity

            People

              jfilizetti Jeremy Filizetti
              nblavend Nathan Lavender (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: