Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6444

Hard Failover recovery-mds-scale test_failover_ost: test_failover_ost returned 3

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.8.0
    • None
    • client and server: lustre-master build #2967 zfs
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/83ac0620-d67e-11e4-8a24-5254006e85c2.

      The sub-test test_failover_ost failed with the following error:

      test_failover_ost returned 3
      

      MDS console

      Lustre: lustre-MDT0000: Recovery over after 0:09, of 2 clients 2 recovered and 0 were evicted.
      Lustre: 2817:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1427510311/real 1427510311]  req@ffff88006b6859c0 x1496853024604548/t0(0) o8->lustre-OST0002-osc-MDT0000@10.2.4.226@tcp:28/4 lens 400/544 e 0 to 1 dl 1427510316 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 2817:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1427510321/real 1427510321]  req@ffff8800722326c0 x1496853024604620/t0(0) o8->lustre-OST0006-osc-MDT0000@10.2.4.226@tcp:28/4 lens 400/544 e 0 to 1 dl 1427510331 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 2817:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      LNet: Service thread pid 3181 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 3181, comm: mdt00_003
      
      Call Trace:
       [<ffffffff8152b162>] schedule_timeout+0x192/0x2e0
       [<ffffffff810874f0>] ? process_timeout+0x0/0x10
       [<ffffffffa1325951>] osp_precreate_reserve+0x3d1/0x810 [osp]
       [<ffffffffa131b740>] ? osp_object_free+0x2d0/0x4a0 [osp]
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa131fbb6>] osp_declare_object_create+0x1a6/0x650 [osp]
       [<ffffffffa125d81a>] lod_qos_declare_object_on+0x12a/0x4f0 [lod]
       [<ffffffffa12603f3>] lod_alloc_rr.clone.2+0xbb3/0x1020 [lod]
       [<ffffffffa079dfb1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa1261442>] lod_qos_prep_create+0xbe2/0x19e0 [lod]
       [<ffffffffa1254d42>] lod_declare_striped_object+0x162/0x9b0 [lod]
       [<ffffffffa125c547>] lod_declare_object_create+0x2c7/0x460 [lod]
       [<ffffffffa12c4506>] mdd_declare_object_create_internal+0x116/0x340 [mdd]
       [<ffffffffa12bd79e>] mdd_create+0x68e/0x1730 [mdd]
       [<ffffffffa118f7b8>] mdo_create+0x18/0x50 [mdt]
       [<ffffffffa1196bbf>] mdt_reint_open+0x1f8f/0x2c70 [mdt]
       [<ffffffffa0906cbc>] ? upcall_cache_get_entry+0x29c/0x880 [obdclass]
       [<ffffffffa1180cad>] mdt_reint_rec+0x5d/0x200 [mdt]
       [<ffffffffa116511b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
       [<ffffffffa11655e6>] mdt_intent_reint+0x1f6/0x430 [mdt]
       [<ffffffffa1163bd4>] mdt_intent_policy+0x494/0xce0 [mdt]
       [<ffffffffa0abf4f9>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
       [<ffffffffa0aeb52b>] ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc]
       [<ffffffffa0b6bbb1>] tgt_enqueue+0x61/0x230 [ptlrpc]
       [<ffffffffa0b6c7fe>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
       [<ffffffffa0b1c661>] ptlrpc_main+0xe41/0x1960 [ptlrpc]
       [<ffffffffa0b1b820>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      LustreError: dumping log to /tmp/lustre-log.1427510345.3181
      LNet: Service thread pid 3181 completed after 80.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration:               86400
      Server failover period: 1200 seconds
      Exited after:           37 seconds
      

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: