Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5154

Failure on test suite recovery-double-scale test_pairwise_fail

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.6.0
    • None
    • client and server: lustre-master build # 2052
    • 3
    • 14222

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/f5b45d64-e1d6-11e3-8cc0-52540035b04c.

      The sub-test test_pairwise_fail failed with the following error:

      test failed to respond and timed out

      client 3 dmesg

      Lustre: 1763:0:(llite_lib.c:2738:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.1.4.10@tcp:10.1.4.6@tcp:/lustre/fid: [0x200024621:0x2f14:0x0]/ may get corrupted (rc -108)
      Lustre: 1763:0:(llite_lib.c:2738:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.1.4.10@tcp:10.1.4.6@tcp:/lustre/fid: [0x200024621:0x2eec:0x0]/ may get corrupted (rc -108)
      Lustre: 1763:0:(llite_lib.c:2738:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.1.4.10@tcp:10.1.4.6@tcp:/lustre/fid: [0x200024621:0x2d28:0x0]/ may get corrupted (rc -108)
      Lustre: 1764:0:(llite_lib.c:2738:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.1.4.10@tcp:10.1.4.6@tcp:/lustre/fid: [0x200024621:0x2d86:0x0]/ may get corrupted (rc -108)
      Lustre: 1764:0:(llite_lib.c:2738:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.1.4.10@tcp:10.1.4.6@tcp:/lustre/fid: [0x200024621:0x2d94:0x0]/ may get corrupted (rc -108)
      Lustre: lustre-OST0004-osc-ffff88007d400c00: Connection restored to lustre-OST0004 (at 10.1.4.9@tcp)
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark                             Failing type2=OST item2=ost5 ... 
      Lustre: DEBUG MARKER: Failing type2=OST item2=ost5 ...
      INFO: task tar:3715 blocked for more than 120 seconds.
            Not tainted 2.6.32-431.17.1.el6.x86_64 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      tar           D 0000000000000000     0  3715   3713 0x00000080
       ffff8800704d39e8 0000000000000082 0000000000000000 ffffffff810097cc
       ffff88007b8c4ad8 0000000000000000 00000000004d39a8 ffff8800022143c0
       ffff88007133daf8 ffff8800704d3fd8 000000000000fbc8 ffff88007133daf8
      Call Trace:
       [<ffffffff810097cc>] ? __switch_to+0x1ac/0x320
       [<ffffffff81528a95>] schedule_timeout+0x215/0x2e0
       [<ffffffffa0747470>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
       [<ffffffff81528713>] wait_for_common+0x123/0x180
       [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
       [<ffffffff8152882d>] wait_for_completion+0x1d/0x20
       [<ffffffffa0b66c5c>] osc_io_setattr_end+0xbc/0x190 [osc]
       [<ffffffffa0998370>] ? lov_io_end_wrapper+0x0/0x100 [lov]
       [<ffffffffa055e0e0>] cl_io_end+0x60/0x150 [obdclass]
       [<ffffffffa055ec60>] ? cl_io_start+0x0/0x140 [obdclass]
       [<ffffffffa0998461>] lov_io_end_wrapper+0xf1/0x100 [lov]
       [<ffffffffa09981ae>] lov_io_call+0x8e/0x130 [lov]
       [<ffffffffa0999f3c>] lov_io_end+0x4c/0xf0 [lov]
       [<ffffffffa055e0e0>] cl_io_end+0x60/0x150 [obdclass]
       [<ffffffffa0562e52>] cl_io_loop+0xc2/0x1b0 [obdclass]
       [<ffffffffa0a7c4c8>] cl_setattr_ost+0x218/0x2f0 [lustre]
       [<ffffffffa0a4692c>] ll_setattr_raw+0xa2c/0x10d0 [lustre]
       [<ffffffffa0a47035>] ll_setattr+0x65/0xd0 [lustre]
       [<ffffffff811a6fc8>] notify_change+0x168/0x340
       [<ffffffff811bb4ac>] utimes_common+0xdc/0x1b0
       [<ffffffff810ec3fe>] ? call_rcu+0xe/0x10
       [<ffffffff811aa6b0>] ? mntput_no_expire+0x30/0x110
       [<ffffffff811bb650>] do_utimes+0xd0/0x170
       [<ffffffff811bb7f2>] sys_utimensat+0x32/0x90
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      Lustre: 1762:0:(client.c:1914:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1400706578/real 1400706578]  req@ffff8800721e6800 x1468746594693168/t0(0) o8->lustre-OST0006-osc-ffff88007d400c00@10.1.4.5@tcp:28/4 lens 400/544 e 0 to 1 dl 1400706606 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 1762:0:(client.c:1914:ptlrpc_expire_one_request()) Skipped 29 previous similar messages
      Lustre: lustre-OST0000-osc-ffff88007d400c00: Connection to lustre-OST0000 (at 10.1.4.9@tcp) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: Skipped 6 previous similar messages
      Lustre: lustre-OST0001-osc-ffff88007d400c00: Connection to lustre-OST0001 (at 10.1.4.9@tcp) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: lustre-OST0003-osc-ffff88007d400c00: Connection to lustre-OST0003 (at 10.1.4.9@tcp) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: lustre-OST0000-osc-ffff88007d400c00: Connection restored to lustre-OST0000 (at 10.1.4.5@tcp)
      Lustre: lustre-OST0001-osc-ffff88007d400c00: Connection restored to lustre-OST0001 (at 10.1.4.5@tcp)
      INFO: task tar:3715 blocked for more than 120 seconds.
      

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: