Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d259c19c-135d-11e5-b4b0-5254006e85c2.
The sub-test test_1 failed with the following error:
test failed to respond and timed out
client dmesg
[11746.904688] Lustre: DEBUG MARKER: == racer test 1: racer on clients: onyx-42vm5,onyx-42vm6.onyx.hpdd.intel.com DURATION=900 ============ 08:25:59 (1434295559) [11747.095999] Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer [11747.096301] Lustre: DEBUG MARKER: DURATION=900 MDSCOUNT=1 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer [11749.430943] 19[7765]: segfault at 8 ip 00007f62842db6d8 sp 00007fff0b321d30 error 4 in ld-2.17.so[7f62842d0000+21000] [11750.135045] LustreError: 7765:0:(file.c:195:ll_close_inode_openhandle()) lustre-clilmv-ffff88003bc10000: inode [0x200000401:0x7:0x0] mdc close failed: rc = -13 [11751.350622] LustreError: 13255:0:(file.c:195:ll_close_inode_openhandle()) lustre-clilmv-ffff88003bc10000: inode [0x200000401:0x7f:0x0] mdc close failed: rc = -13 [11754.718383] LustreError: 20954:0:(file.c:195:ll_close_inode_openhandle()) lustre-clilmv-ffff880055503800: inode [0x200000402:0xd9:0x0] mdc close failed: rc = -13 [11805.326196] Lustre: 14599:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434295611/real 1434295611] req@ffff880041820300 x1503967237771580/t0(0) o36->lustre-MDT0000-mdc-ffff88003bc10000@10.2.4.203@tcp:12/10 lens 496/440 e 0 to 1 dl 1434295618 ref 2 fl Rpc:X/0/ffffffff rc 0/-1 [11805.326304] Lustre: lustre-MDT0000-mdc-ffff88003bc10000: Connection to lustre-MDT0000 (at 10.2.4.203@tcp) was lost; in progress operations using this service will wait for recovery to complete [11805.337639] Lustre: 14599:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [11806.326218] LustreError: 166-1: MGC10.2.4.203@tcp: Connection to MGS (at 10.2.4.203@tcp) was lost; in progress operations using this service will fail [11837.330050] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1434295639/real 0] req@ffff880042460300 x1503967237772588/t0(0) o250->MGC10.2.4.203@tcp@10.2.4.203@tcp:26/25 lens 520/544 e 0 to 1 dl 1434295650 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 [11837.337236] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [11897.330310] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434295694/real 1434295705] req@ffff880042460600 x1503967237773216/t0(0) o250->MGC10.2.4.203@tcp@10.2.4.203@tcp:26/25 lens 520/544 e 0 to 1 dl 1434295710 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [11897.338187] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [11901.331794] LustreError: 11-0: lustre-OST0005-osc-ffff88003bc10000: operation obd_ping to node 10.2.4.204@tcp failed: rc = -107 [11901.336716] Lustre: lustre-OST0005-osc-ffff88003bc10000: Connection to lustre-OST0005 (at 10.2.4.204@tcp) was lost; in progress operations using this service will wait for recovery to complete [11901.342269] Lustre: Skipped 1 previous similar message [11901.345678] LustreError: 167-0: lustre-OST0005-osc-ffff88003bc10000: This client was evicted by lustre-OST0005; in progress operations using this service will fail. [11901.353266] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000403:0x4f5:0x0]/ may get corrupted (rc -108) [11901.358781] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000402:0x52f:0x0]/ may get corrupted (rc -108) [11901.364366] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000404:0x568:0x0]/ may get corrupted (rc -108) [11901.370257] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000404:0x595:0x0]//racer/5 may get corrupted (rc -108) [11901.376152] Lustre: 9197:0:(llite_lib.c:2640:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.2.4.203@tcp:/lustre/fid: [0x200000402:0x4b5:0x0]/ may get corrupted (rc -108) [11901.376321] LustreError: 29857:0:(ldlm_resource.c:778:ldlm_resource_complain()) lustre-OST0005-osc-ffff88003bc10000: namespace resource [0x6e:0x0:0x0].0 (ffff88005ada6d80) refcount nonzero (2) after lock cleanup; forcing cleanup. [11901.376323] LustreError: 29857:0:(ldlm_resource.c:1371:ldlm_resource_dump()) --- Resource: [0x6e:0x0:0x0].0 (ffff88005ada6d80) refcount = 3 [11901.376325] LustreError: 29857:0:(ldlm_resource.c:1374:ldlm_resource_dump()) Granted locks (in reverse order): [11901.376328] LustreError: 29857:0:(ldlm_resource.c:1377:ldlm_resource_dump()) ### ### ns: lustre-OST0005-osc-ffff88003bc10000 lock: ffff880040392000/0xd582da8ac6cf044d lrc: 3/0,1 mode: PW/PW res: [0x6e:0x0:0x0].0 rrc: 3 type: EXT [4096->18446744073709551615] (req 4096->18446744073709551615) flags: 0x526400020000 nid: local remote: 0xcfcc00cc32bd357e expref: -99 pid: 14599 timeout: 0 lvb_type: 1 [11901.376329] LustreError: 29857:0:(ldlm_resource.c:1392:ldlm_resource_dump()) Waiting locks: [11901.376332] LustreError: 29857:0:(ldlm_resource.c:1394:ldlm_resource_dump()) ### ### ns: lustre-OST0005-osc-ffff88003bc10000 lock: ffff88003f299a00/0xd582da8ac6cf04d9 lrc: 8/0,5 mode: --/PW res: [0x6e:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x106400020000 nid: local remote: 0xcfcc00cc32bd3626 expref: -99 pid: 13437 timeout: 0 lvb_type: 1 [11901.376349] Lustre: lustre-OST0005-osc-ffff88003bc10000: Connection restored to lustre-OST0005 (at 10.2.4.204@tcp) [11901.425894] LustreError: 29860:0:(ldlm_resource.c:1371:ldlm_resource_dump()) --- Resource: [0x75:0x0:0x0].0 (ffff88004285b500) refcount = 2 [11901.431499] LustreError: 29860:0:(ldlm_resource.c:1374:ldlm_resource_dump()) Granted locks (in reverse order): [11951.330163] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434295739/real 1434295742] req@ffff880042462a00 x1503967237773848/t0(0) o38->lustre-MDT0000-mdc-ffff880055503800@10.2.4.203@tcp:12/10 lens 520/544 e 0 to 1 dl 1434295764 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [11951.339599] Lustre: 9196:0:(client.c:2003:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [12000.512211] INFO: task dir_create.sh:5683 blocked for more than 120 seconds. [12000.515191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12000.518122] dir_create.sh D ffff88007fd13680 0 5683 5642 0x00000080 [12000.521010] ffff880052017c40 0000000000000082 ffff880052017fd8 0000000000013680 [12000.523811] ffff880052017fd8 0000000000013680 ffff880040eacfa0 ffff88004264ba90 [12000.526622] ffff88004264ba94 ffff880040eacfa0 00000000ffffffff ffff88004264ba98 [12000.529425] Call Trace: [12000.531741] [<ffffffff8160b3d9>] schedule_preempt_disabled+0x29/0x70 [12000.534342] [<ffffffff81609105>] __mutex_lock_slowpath+0xc5/0x1c0 [12000.536900] [<ffffffff8160856f>] mutex_lock+0x1f/0x2f [12000.539356] [<ffffffff816021b3>] lookup_slow+0x33/0xa7 [12000.541760] [<ffffffff811d47ff>] link_path_walk+0x80f/0x8b0 [12000.544169] [<ffffffff8118071b>] ? do_wp_page+0x3bb/0x7d0 [12000.546558] [<ffffffff811d6ab3>] path_openat+0xa3/0x490 [12000.548893] [<ffffffff81082cd1>] ? __set_task_blocked+0x41/0xa0 [12000.551250] [<ffffffff811d829b>] do_filp_open+0x4b/0xb0 [12000.553526] [<ffffffff811e4d07>] ? __alloc_fd+0xa7/0x130 [12000.555768] [<ffffffff811c5f83>] do_sys_open+0xf3/0x1f0 [12000.557964] [<ffffffff811c609e>] SyS_open+0x1e/0x20 [12000.560095] [<ffffffff81614de9>] system_call_fastpath+0x16/0x1b [12000.562301] INFO: task dd:5871 blocked for more than 120 seconds. [12000.564450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12000.566718] dd D ffff88007fd13680 0 5871 5712 0x00000080 [12000.568945] ffff880079d83c60 0000000000000086 ffff880079d83fd8 0000000000013680 [12000.571201] ffff880079d83fd8 0000000000013680 ffff88005ae8a220 ffff8800790bf780 [12000.573407] ffff8800790bf784 ffff88005ae8a220 00000000ffffffff ffff8800790bf788 [12000.575590] Call Trace: [12000.577310] [<ffffffff8160b3d9>] schedule_preempt_disabled+0x29/0x70 [12000.579309] [<ffffffff81609105>] __mutex_lock_slowpath+0xc5/0x1c0 [12000.581256] [<ffffffffa09f4bd7>] ? mdc_pack_capa+0x87/0xf0 [mdc] [12000.583175] [<ffffffff8160856f>] mutex_lock+0x1f/0x2f [12000.585017] [<ffffffffa09e65d1>] mdc_close+0x1a1/0x990 [mdc] [12000.586885] [<ffffffffa086c236>] lmv_close+0x276/0x580 [lmv] [12000.588712] [<ffffffffa0aee2a8>] ll_close_inode_openhandle+0x258/0xe10 [lustre] [12000.590626] [<ffffffffa0af06ad>] ll_md_real_close+0x11d/0x210 [lustre] [12000.592464] [<ffffffffa0af0e6e>] ll_file_release+0x6ce/0xac0 [lustre] [12000.594292] [<ffffffff811c8889>] __fput+0xe9/0x270 [12000.595962] [<ffffffff811c8b5e>] ____fput+0xe/0x10 [12000.597588] [<ffffffff81093c67>] task_work_run+0xa7/0xe0 [12000.599216] [<ffffffff81013b6c>] do_notify_resume+0x9c/0xb0 [12000.600834] [<ffffffff8161509d>] int_signal+0x12/0x17 [12000.602383] INFO: task truncate:17086 blocked for more than 120 seconds. [12000.604023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12000.605744] truncate D ffff88007fd13680 0 17086 5742 0x00000080 [12000.607427] ffff880066b2fc00 0000000000000082 ffff880066b2ffd8 0000000000013680 [12000.609130] ffff880066b2ffd8 0000000000013680 ffff880042d316c0 ffff88002ae03110 [12000.610789] ffff88002ae03114 ffff880042d316c0 00000000ffffffff ffff88002ae03118 [12000.612439] Call Trace: [12000.613631] [<ffffffff8160b3d9>] schedule_preempt_disabled+0x29/0x70 [12000.615120] [<ffffffff81609105>] __mutex_lock_slowpath+0xc5/0x1c0 [12000.616581] [<ffffffff8160856f>] mutex_lock+0x1f/0x2f [12000.617961] [<ffffffffa0b43117>] vvp_io_setattr_start+0x47/0x220 [lustre] [12000.619508] [<ffffffffa06e8015>] cl_io_start+0x65/0x130 [obdclass] [12000.620986] [<ffffffffa06eb4f5>] cl_io_loop+0xa5/0x190 [obdclass] [12000.622444] [<ffffffffa0b3d940>] cl_setattr_ost+0x240/0x3c0 [lustre] [12000.623904] [<ffffffffa0b11038>] ll_setattr_raw+0xa58/0xae0 [lustre] [12000.625387] [<ffffffffa0b11123>] ll_setattr+0x63/0xc0 [lustre] [12000.626799] [<ffffffff811e39b9>] notify_change+0x279/0x3d0 [12000.628191] [<ffffffff811c4d93>] do_truncate+0x73/0xc0 [12000.629556] [<ffffffff811c4f41>] vfs_truncate+0x161/0x180 [12000.630935] [<ffffffff811c4fec>] do_sys_truncate+0x8c/0xb0 [12000.632327] [<ffffffff81013b6c>] ? do_notify_resume+0x9c/0xb0 [12000.633742] [<ffffffff811c518e>] SyS_truncate+0xe/0x10 [12000.635117] [<ffffffff81614de9>] system_call_fastpath+0x16/0x1b [12000.636546] INFO: task dd:17689 blocked for more than 120 seconds. [12000.637989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.