Lustre / LU-6740

racer test_1: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

    Description

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d259c19c-135d-11e5-b4b0-5254006e85c2.

      The sub-test test_1 failed with the following error:

      test failed to respond and timed out
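
      The racer invocation is visible in the DEBUG MARKER lines of the dmesg log
      below. As a rough reproduction sketch (assumptions: a client with the
      lustre-tests package installed and a Lustre filesystem mounted at
      /mnt/lustre; the parameters are taken only from the markers below):

        # Run racer for 900s against a single MDT, as recorded in the log below.
        DURATION=900 MDSCOUNT=1 LFS=/usr/bin/lfs \
            /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer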
      

      client dmesg

      [11746.904688] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-42vm5,onyx-42vm6.onyx.hpdd.intel.com DURATION=900 ============ 08:25:59 (1434295559)
      [11747.095999] Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
      [11747.096301] Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 				   LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
      [11749.430943] 19[7765]: segfault at 8 ip 00007f62842db6d8 sp 00007fff0b321d30 error 4 in ld-2.17.so[7f62842d0000+21000]
      [11750.135045] LustreError: 7765:0:(file.c:195:ll_close_inode_openhandle()) lustre-clilmv-ffff88003bc10000: inode [0x200000401:0x7:0x0] mdc close failed: rc = -13
      [11751.350622] LustreError: 13255:0:(file.c:195:ll_close_inode_openhandle()) lustre-clilmv-ffff88003bc10000: inode [0x200000401:0x7f:0x0] mdc close failed: rc = -13
      [11754.718383] LustreError: 20954:0:(file.c:195:ll_close_inode_openhandle()) lustre-clilmv-ffff880055503800: inode [0x200000402:0xd9:0x0] mdc close failed: rc = -13
      [11805.326196] Lustre: 14599:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434295611/real 1434295611]  req@ffff880041820300 x1503967237771580/t0(0) o36->lustre-MDT0000-mdc-ffff88003bc10000@10.2.4.203@tcp:12/10 lens 496/440 e 0 to 1 dl 1434295618 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [11805.326304] Lustre: lustre-MDT0000-mdc-ffff88003bc10000: Connection to lustre-MDT0000 (at 10.2.4.203@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [11805.337639] Lustre: 14599:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      [11806.326218] LustreError: 166-1: MGC10.2.4.203@tcp: Connection to MGS (at 10.2.4.203@tcp) was lost; in progress operations using this service will fail
      [11837.330050] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1434295639/real 0]  req@ffff880042460300 x1503967237772588/t0(0) o250->MGC10.2.4.203@tcp@10.2.4.203@tcp:26/25 lens 520/544 e 0 to 1 dl 1434295650 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      [11837.337236] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
      [11897.330310] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434295694/real 1434295705]  req@ffff880042460600 x1503967237773216/t0(0) o250->MGC10.2.4.203@tcp@10.2.4.203@tcp:26/25 lens 520/544 e 0 to 1 dl 1434295710 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [11897.338187] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      [11901.331794] LustreError: 11-0: lustre-OST0005-osc-ffff88003bc10000: operation obd_ping to node 10.2.4.204@tcp failed: rc = -107
      [11901.336716] Lustre: lustre-OST0005-osc-ffff88003bc10000: Connection to lustre-OST0005 (at 10.2.4.204@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [11901.342269] Lustre: Skipped 1 previous similar message
      [11901.345678] LustreError: 167-0: lustre-OST0005-osc-ffff88003bc10000: This client was evicted by lustre-OST0005; in progress operations using this service will fail.
      [11901.353266] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000403:0x4f5:0x0]/ may get corrupted (rc -108)
      [11901.358781] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000402:0x52f:0x0]/ may get corrupted (rc -108)
      [11901.364366] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000404:0x568:0x0]/ may get corrupted (rc -108)
      [11901.370257] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000404:0x595:0x0]//racer/5 may get corrupted (rc -108)
      [11901.376152] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000402:0x4b5:0x0]/ may get corrupted (rc -108)
      [11901.376321] LustreError: 29857:0:(ldlm_resource.c:778:ldlm_resource_complain()) lustre-OST0005-osc-ffff88003bc10000: namespace resource [0x6e:0x0:0x0].0 (ffff88005ada6d80) refcount nonzero (2) after lock cleanup; forcing cleanup.
      [11901.376323] LustreError: 29857:0:(ldlm_resource.c:1371:ldlm_resource_dump()) --- Resource: [0x6e:0x0:0x0].0 (ffff88005ada6d80) refcount = 3
      [11901.376325] LustreError: 29857:0:(ldlm_resource.c:1374:ldlm_resource_dump()) Granted locks (in reverse order):
      [11901.376328] LustreError: 29857:0:(ldlm_resource.c:1377:ldlm_resource_dump()) ### ### ns: lustre-OST0005-osc-ffff88003bc10000 lock: ffff880040392000/0xd582da8ac6cf044d lrc: 3/0,1 mode: PW/PW res: [0x6e:0x0:0x0].0 rrc: 3 type: EXT [4096->18446744073709551615] (req 4096->18446744073709551615) flags: 0x526400020000 nid: local remote: 0xcfcc00cc32bd357e expref: -99 pid: 14599 timeout: 0 lvb_type: 1
      [11901.376329] LustreError: 29857:0:(ldlm_resource.c:1392:ldlm_resource_dump()) Waiting locks:
      [11901.376332] LustreError: 29857:0:(ldlm_resource.c:1394:ldlm_resource_dump()) ### ### ns: lustre-OST0005-osc-ffff88003bc10000 lock: ffff88003f299a00/0xd582da8ac6cf04d9 lrc: 8/0,5 mode: --/PW res: [0x6e:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x106400020000 nid: local remote: 0xcfcc00cc32bd3626 expref: -99 pid: 13437 timeout: 0 lvb_type: 1
      [11901.376349] Lustre: lustre-OST0005-osc-ffff88003bc10000: Connection restored to lustre-OST0005 (at 10.2.4.204@tcp)
      [11901.425894] LustreError: 29860:0:(ldlm_resource.c:1371:ldlm_resource_dump()) --- Resource: [0x75:0x0:0x0].0 (ffff88004285b500) refcount = 2
      [11901.431499] LustreError: 29860:0:(ldlm_resource.c:1374:ldlm_resource_dump()) Granted locks (in reverse order):
      [11951.330163] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434295739/real 1434295742]  req@ffff880042462a00 x1503967237773848/t0(0) o38->lustre-MDT0000-mdc-ffff880055503800@10.2.4.203@tcp:12/10 lens 520/544 e 0 to 1 dl 1434295764 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [11951.339599] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
      [12000.512211] INFO: task dir_create.sh:5683 blocked for more than 120 seconds.
      [12000.515191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [12000.518122] dir_create.sh   D ffff88007fd13680     0  5683   5642 0x00000080
      [12000.521010]  ffff880052017c40 0000000000000082 ffff880052017fd8 0000000000013680
      [12000.523811]  ffff880052017fd8 0000000000013680 ffff880040eacfa0 ffff88004264ba90
      [12000.526622]  ffff88004264ba94 ffff880040eacfa0 00000000ffffffff ffff88004264ba98
      [12000.529425] Call Trace:
      [12000.531741]  [<ffffffff8160b3d9>] schedule_preempt_disabled+0x29/0x70
      [12000.534342]  [<ffffffff81609105>] __mutex_lock_slowpath+0xc5/0x1c0
      [12000.536900]  [<ffffffff8160856f>] mutex_lock+0x1f/0x2f
      [12000.539356]  [<ffffffff816021b3>] lookup_slow+0x33/0xa7
      [12000.541760]  [<ffffffff811d47ff>] link_path_walk+0x80f/0x8b0
      [12000.544169]  [<ffffffff8118071b>] ? do_wp_page+0x3bb/0x7d0
      [12000.546558]  [<ffffffff811d6ab3>] path_openat+0xa3/0x490
      [12000.548893]  [<ffffffff81082cd1>] ? __set_task_blocked+0x41/0xa0
      [12000.551250]  [<ffffffff811d829b>] do_filp_open+0x4b/0xb0
      [12000.553526]  [<ffffffff811e4d07>] ? __alloc_fd+0xa7/0x130
      [12000.555768]  [<ffffffff811c5f83>] do_sys_open+0xf3/0x1f0
      [12000.557964]  [<ffffffff811c609e>] SyS_open+0x1e/0x20
      [12000.560095]  [<ffffffff81614de9>] system_call_fastpath+0x16/0x1b
      [12000.562301] INFO: task dd:5871 blocked for more than 120 seconds.
      [12000.564450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [12000.566718] dd              D ffff88007fd13680     0  5871   5712 0x00000080
      [12000.568945]  ffff880079d83c60 0000000000000086 ffff880079d83fd8 0000000000013680
      [12000.571201]  ffff880079d83fd8 0000000000013680 ffff88005ae8a220 ffff8800790bf780
      [12000.573407]  ffff8800790bf784 ffff88005ae8a220 00000000ffffffff ffff8800790bf788
      [12000.575590] Call Trace:
      [12000.577310]  [<ffffffff8160b3d9>] schedule_preempt_disabled+0x29/0x70
      [12000.579309]  [<ffffffff81609105>] __mutex_lock_slowpath+0xc5/0x1c0
      [12000.581256]  [<ffffffffa09f4bd7>] ? mdc_pack_capa+0x87/0xf0 [mdc]
      [12000.583175]  [<ffffffff8160856f>] mutex_lock+0x1f/0x2f
      [12000.585017]  [<ffffffffa09e65d1>] mdc_close+0x1a1/0x990 [mdc]
      [12000.586885]  [<ffffffffa086c236>] lmv_close+0x276/0x580 [lmv]
      [12000.588712]  [<ffffffffa0aee2a8>] ll_close_inode_openhandle+0x258/0xe10 [lustre]
      [12000.590626]  [<ffffffffa0af06ad>] ll_md_real_close+0x11d/0x210 [lustre]
      [12000.592464]  [<ffffffffa0af0e6e>] ll_file_release+0x6ce/0xac0 [lustre]
      [12000.594292]  [<ffffffff811c8889>] __fput+0xe9/0x270
      [12000.595962]  [<ffffffff811c8b5e>] ____fput+0xe/0x10
      [12000.597588]  [<ffffffff81093c67>] task_work_run+0xa7/0xe0
      [12000.599216]  [<ffffffff81013b6c>] do_notify_resume+0x9c/0xb0
      [12000.600834]  [<ffffffff8161509d>] int_signal+0x12/0x17
      [12000.602383] INFO: task truncate:17086 blocked for more than 120 seconds.
      [12000.604023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [12000.605744] truncate        D ffff88007fd13680     0 17086   5742 0x00000080
      [12000.607427]  ffff880066b2fc00 0000000000000082 ffff880066b2ffd8 0000000000013680
      [12000.609130]  ffff880066b2ffd8 0000000000013680 ffff880042d316c0 ffff88002ae03110
      [12000.610789]  ffff88002ae03114 ffff880042d316c0 00000000ffffffff ffff88002ae03118
      [12000.612439] Call Trace:
      [12000.613631]  [<ffffffff8160b3d9>] schedule_preempt_disabled+0x29/0x70
      [12000.615120]  [<ffffffff81609105>] __mutex_lock_slowpath+0xc5/0x1c0
      [12000.616581]  [<ffffffff8160856f>] mutex_lock+0x1f/0x2f
      [12000.617961]  [<ffffffffa0b43117>] vvp_io_setattr_start+0x47/0x220 [lustre]
      [12000.619508]  [<ffffffffa06e8015>] cl_io_start+0x65/0x130 [obdclass]
      [12000.620986]  [<ffffffffa06eb4f5>] cl_io_loop+0xa5/0x190 [obdclass]
      [12000.622444]  [<ffffffffa0b3d940>] cl_setattr_ost+0x240/0x3c0 [lustre]
      [12000.623904]  [<ffffffffa0b11038>] ll_setattr_raw+0xa58/0xae0 [lustre]
      [12000.625387]  [<ffffffffa0b11123>] ll_setattr+0x63/0xc0 [lustre]
      [12000.626799]  [<ffffffff811e39b9>] notify_change+0x279/0x3d0
      [12000.628191]  [<ffffffff811c4d93>] do_truncate+0x73/0xc0
      [12000.629556]  [<ffffffff811c4f41>] vfs_truncate+0x161/0x180
      [12000.630935]  [<ffffffff811c4fec>] do_sys_truncate+0x8c/0xb0
      [12000.632327]  [<ffffffff81013b6c>] ? do_notify_resume+0x9c/0xb0
      [12000.633742]  [<ffffffff811c518e>] SyS_truncate+0xe/0x10
      [12000.635117]  [<ffffffff81614de9>] system_call_fastpath+0x16/0x1b
      [12000.636546] INFO: task dd:17689 blocked for more than 120 seconds.
      [12000.637989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      

    People

      Assignee: WC Triage (wc-triage)
      Reporter: Maloo (maloo)
      Votes: 0
      Watchers: 2
