  1. Lustre
  2. LU-4005

Test failure on test suite recovery-double-scale, subtest test_pairwise_fail

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.5.0
    • 3
    • 10717

    Description

      This issue was created by maloo for wangdi <di.wang@intel.com>

      12:35:31:Lustre: 2065:0:(client.c:1896:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1379618921/real 1379618921] req@ffff88006fd91800 x1446635201626172/t0(0) o250->MGC10.10.4.198@tcp@0@lo:26/25 lens 400/544 e 0 to 1 dl 1379618936 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      12:35:31:LustreError: 2021:0:(client.c:1080:ptlrpc_import_delay_req()) @@@ send limit expired req@ffff88007985e000 x1446635201626188/t0(0) o253->MGC10.10.4.198@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      12:35:34:LustreError: 2021:0:(client.c:1080:ptlrpc_import_delay_req()) Skipped 1 previous similar message
      12:35:35:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
      12:35:36:LNet: 2285:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      12:35:36:LNet: 2286:0:(debug.c:218:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
      12:35:37:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P2 2>/dev/null
      12:35:38:Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 3 clients reconnect
      12:35:40:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Failing type2=OST item2=ost1 ...
      12:35:42:Lustre: DEBUG MARKER: Failing type2=OST item2=ost1 ...
      12:35:44:Lustre: lustre-MDT0001: Recovery over after 0:05, of 3 clients 3 recovered and 0 were evicted.
      12:35:45:LustreError: 2096:0:(lod_lov.c:707:lod_initialize_objects()) ASSERTION( cfs_bitmap_check(md->lod_ost_descs.ltd_tgt_bitmap, idx) ) failed:
      12:35:46:LustreError: 2096:0:(lod_lov.c:707:lod_initialize_objects()) LBUG
      12:35:47:Pid: 2096, comm: mdt00_001
      12:35:47:
      12:35:48:Call Trace:
      12:35:50: [<ffffffffa04a4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      12:35:51: [<ffffffffa04a4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      12:35:52: [<ffffffffa0ed8f0b>] lod_initialize_objects+0x95b/0xc00 [lod]
      12:35:55: [<ffffffffa0ed93dc>] lod_parse_striping+0x22c/0x330 [lod]
      12:35:56: [<ffffffffa0eda83e>] ? lod_get_lov_ea+0xae/0x3c0 [lod]
      12:35:57: [<ffffffffa0edadf4>] lod_load_striping+0x2a4/0x4b0 [lod]
      12:35:57: [<ffffffffa0ee5bab>] lod_declare_object_destroy+0x16b/0x390 [lod]
      12:35:58: [<ffffffffa0f502b0>] mdd_declare_finish_unlink+0x90/0x170 [mdd]
      12:35:58: [<ffffffffa0f5628a>] mdd_unlink+0x3ca/0xe30 [mdd]
      12:36:00: [<ffffffffa0e17078>] mdo_unlink+0x18/0x50 [mdt]
      12:36:02: [<ffffffffa0e1a460>] mdt_reint_unlink+0x820/0x1010 [mdt]
      12:36:03: [<ffffffffa0e16d71>] mdt_reint_rec+0x41/0xe0 [mdt]
      12:36:05: [<ffffffffa0dfec63>] mdt_reint_internal+0x4c3/0x780 [mdt]
      12:36:05: [<ffffffffa0dfef64>] mdt_reint+0x44/0xe0 [mdt]
      12:36:06: [<ffffffffa0e01a5a>] mdt_handle_common+0x52a/0x1470 [mdt]
      12:36:07: [<ffffffffa0e3b3c5>] mds_regular_handle+0x15/0x20 [mdt]
      12:36:08: [<ffffffffa07b6ad5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      12:36:10: [<ffffffffa04a554e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      12:36:10: [<ffffffffa04b640f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      12:36:10: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      12:36:10: [<ffffffffa07b7e1d>] ptlrpc_main+0xacd/0x1710 [ptlrpc]
      12:36:11: [<ffffffffa07b7350>] ? ptlrpc_main+0x0/0x1710 [ptlrpc]
      12:36:11: [<ffffffff81096a36>] kthread+0x96/0xa0
      12:36:11: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      12:36:12: [<ffffffff810969a0>] ? kthread+0x0/0xa0
      12:36:14: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      12:36:14:
      12:36:15:Kernel panic - not syncing: LBUG
      12:36:16:Pid: 2096, comm: mdt00_001 Not tainted 2.6.32-358.18.1.el6_lustre.ga32e49e.x86_64 #1
      12:36:16:Call Trace:
      12:36:17: [<ffffffff8150de58>] ? panic+0xa7/0x16f
      12:36:18: [<ffffffffa04a4eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      12:36:22: [<ffffffffa0ed8f0b>] ? lod_initialize_objects+0x95b/0xc00 [lod]
      12:36:22: [<ffffffffa0ed93dc>] ? lod_parse_striping+0x22c/0x330 [lod]
      12:36:23: [<ffffffffa0eda83e>] ? lod_get_lov_ea+0xae/0x3c0 [lod]
      12:36:24: [<ffffffffa0edadf4>] ? lod_load_striping+0x2a4/0x4b0 [lod]
      12:36:24: [<ffffffffa0ee5bab>] ? lod_declare_object_destroy+0x16b/0x390 [lod]
      12:36:25: [<ffffffffa0f502b0>] ? mdd_declare_finish_unlink+0x90/0x170 [mdd]
      12:36:27: [<ffffffffa0f5628a>] ? mdd_unlink+0x3ca/0xe30 [mdd]
      12:36:29: [<ffffffffa0e17078>] ? mdo_unlink+0x18/0x50 [mdt]
      12:36:29: [<ffffffffa0e1a460>] ? mdt_reint_unlink+0x820/0x1010 [mdt]
      12:36:31: [<ffffffffa0e16d71>] ? mdt_reint_rec+0x41/0xe0 [mdt]
      12:36:32: [<ffffffffa0dfec63>] ? mdt_reint_internal+0x4c3/0x780 [mdt]
      12:36:33: [<ffffffffa0dfef64>] ? mdt_reint+0x44/0xe0 [mdt]
      12:36:34: [<ffffffffa0e01a5a>] ? mdt_handle_common+0x52a/0x1470 [mdt]
      12:36:35: [<ffffffffa0e3b3c5>] ? mds_regular_handle+0x15/0x20 [mdt]
      12:36:36: [<ffffffffa07b6ad5>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
      12:36:36: [<ffffffffa04a554e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
      12:36:36: [<ffffffffa04b640f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
      12:36:37: [<ffffffff81055ad3>] ? __wake_up+0x53/0x70
      12:36:38: [<ffffffffa07b7e1d>] ? ptlrpc_main+0xacd/0x1710 [ptlrpc]
      12:36:39: [<ffffffffa07b7350>] ? ptlrpc_main+0x0/0x1710 [ptlrpc]
      12:36:40: [<ffffffff81096a36>] ? kthread+0x96/0xa0
      12:36:41: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      12:36:41: [<ffffffff810969a0>] ? kthread+0x0/0xa0
      12:36:41: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
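
The LBUG above is raised because the expression cfs_bitmap_check(md->lod_ost_descs.ltd_tgt_bitmap, idx) evaluated false in lod_initialize_objects(). As a rough illustration only of what a bit test of this kind expresses, the sketch below models the bitmap as a Python integer. The helper names bitmap_check and check_stripes are hypothetical; this is not the Lustre implementation, and it makes no claim about why the bit was unset in this run.

```python
# Simplified model (an assumption, not Lustre code): an int stands in for
# the target bitmap, with bit N set when target index N is present.

def bitmap_check(bitmap: int, idx: int) -> bool:
    """Return True if bit `idx` is set in the integer bitmap."""
    return (bitmap >> idx) & 1 == 1


def check_stripes(tgt_bitmap: int, stripe_indices: list[int]) -> list[int]:
    """Require every stripe index to be present in the bitmap.

    Raises AssertionError for the first index whose bit is not set,
    which is the analogue of the failed ASSERTION in the log.
    """
    for idx in stripe_indices:
        if not bitmap_check(tgt_bitmap, idx):
            raise AssertionError(f"index {idx} is not set in the target bitmap")
    return stripe_indices


if __name__ == "__main__":
    targets = 0b101  # hypothetical: indices 0 and 2 present, index 1 absent
    print(check_stripes(targets, [0, 2]))
    try:
        check_stripes(targets, [0, 1])
    except AssertionError as exc:
        print("assertion failed:", exc)
```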

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/cdb9d64a-21b5-11e3-afee-52540035b04c.

      The sub-test test_pairwise_fail failed with the following error:

      test failed to respond and timed out

      Info required for matching: recovery-double-scale pairwise_fail
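
When comparing this failure against other reports, it can help to reduce the console log to the assertion site and the call-trace frames. The following is a minimal sketch, assuming log lines in the format shown in the description; the names ASSERT_RE, FRAME_RE, summarize and SAMPLE are hypothetical and not part of maloo or Lustre. Frames marked with "?" are skipped, since the kernel prints that marker for addresses it found on the stack but could not confirm as part of the call chain.

```python
import re

# Matches e.g. "(lod_lov.c:707:lod_initialize_objects()) ASSERTION( expr ) failed"
ASSERT_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r"\s+ASSERTION\(\s*(?P<expr>.*?)\s*\)\s+failed"
)

# Matches e.g. "[<ffffffffa0ed8f0b>] lod_initialize_objects+0x95b/0xc00 [lod]"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def summarize(log_text):
    """Return (assertion_dict_or_None, [(function, module_or_None), ...])."""
    assertion = None
    frames = []
    for line in log_text.splitlines():
        m = ASSERT_RE.search(line)
        if m and assertion is None:
            assertion = m.groupdict()
            continue
        m = FRAME_RE.search(line)
        # Keep only frames without the "?" marker.
        if m and not m.group("unreliable"):
            frames.append((m.group("func"), m.group("module")))
    return assertion, frames


# A few lines copied from the log in the description.
SAMPLE = """\
12:35:45:LustreError: 2096:0:(lod_lov.c:707:lod_initialize_objects()) ASSERTION( cfs_bitmap_check(md->lod_ost_descs.ltd_tgt_bitmap, idx) ) failed:
12:35:52: [<ffffffffa0ed8f0b>] lod_initialize_objects+0x95b/0xc00 [lod]
12:35:56: [<ffffffffa0eda83e>] ? lod_get_lov_ea+0xae/0x3c0 [lod]
12:36:11: [<ffffffff81096a36>] kthread+0x96/0xa0
"""

if __name__ == "__main__":
    assertion, frames = summarize(SAMPLE)
    print(assertion)
    print(frames)
```

The resulting (function, module) list can be compared between reports without being thrown off by timestamps or load addresses.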

            People

              wc-triage WC Triage
              maloo Maloo