Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for John Hammond <jhammond@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/8ee96d5e-b56c-11e8-a7de-52540065bddc
test_156 failed with the following error:
Timeout occurred after 166 mins, last suite running was sanity, restarting cluster to continue tests
dd (pid 6412) is looping in ptlrpc_check_set():
[10125.205230] dd S ffff968d3a832f70 0 6412 31766 0x00000080 [10125.206015] Call Trace: [10125.206273] [<ffffffffac914029>] schedule+0x29/0x70 [10125.206797] [<ffffffffac9118d4>] schedule_timeout+0x174/0x2c0 [10125.207550] [<ffffffffac2a3750>] ? internal_add_timer+0x70/0x70 [10125.208213] [<ffffffffc0af5650>] ? ptlrpc_init_rq_pool+0x110/0x110 [ptlrpc] [10125.208966] [<ffffffffc0aff3b0>] ptlrpc_set_wait+0x480/0x790 [ptlrpc] [10125.209703] [<ffffffffac2cf670>] ? wake_up_state+0x20/0x20 [10125.210355] [<ffffffffc0aff73d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc] [10125.211068] [<ffffffffc0ae46a2>] ldlm_cli_enqueue+0x3d2/0x920 [ptlrpc] [10125.211774] [<ffffffffc0adf7c0>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc] [10125.212679] [<ffffffffc0c7e710>] ? osc_lock_lockless_cancel+0xe0/0xe0 [osc] [10125.213491] [<ffffffffc0c7dad0>] ? osc_lock_upcall+0x580/0x580 [osc] [10125.214167] [<ffffffffc0c74965>] osc_enqueue_base+0x2b5/0x6a0 [osc] [10125.214844] [<ffffffffc0c7d550>] ? osc_lock_lvb_update+0x330/0x330 [osc] [10125.215616] [<ffffffffc0c7f17b>] osc_lock_enqueue+0x38b/0x840 [osc] [10125.216297] [<ffffffffc0c7d550>] ? osc_lock_lvb_update+0x330/0x330 [osc] [10125.217058] [<ffffffffc0935de5>] cl_lock_enqueue+0x65/0x120 [obdclass] [10125.217778] [<ffffffffc0cd8285>] lov_lock_enqueue+0x95/0x150 [lov] [10125.218497] [<ffffffffc0935de5>] cl_lock_enqueue+0x65/0x120 [obdclass] [10125.219225] [<ffffffffc0936377>] cl_lock_request+0x67/0x1f0 [obdclass] [10125.219931] [<ffffffffc093a26b>] cl_io_lock+0x2bb/0x3d0 [obdclass] [10125.220688] [<ffffffffc093a5fb>] cl_io_loop+0x11b/0xc70 [obdclass] [10125.221444] [<ffffffffc0d30e02>] ll_file_io_generic+0x4e2/0xd10 [lustre] [10125.222178] [<ffffffffc0d31b82>] ll_file_aio_write+0x372/0x540 [lustre] [10125.222880] [<ffffffffc0d31df4>] ll_file_write+0xa4/0x170 [lustre] [10125.223591] [<ffffffffac41b490>] vfs_write+0xc0/0x1f0 [10125.224151] [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146 [10125.224845] [<ffffffffac41c2bf>] SyS_write+0x7f/0xf0 [10125.225434] [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146 [10125.226136] [<ffffffffac920795>] system_call_fastpath+0x1c/0x21 [10125.226762] [<ffffffffac9206e1>] ? system_call_after_swapgs+0xae/0x146
client logs show dd does 2000+ 1-second loops in ptlrpc_set_wait():
00010000:00010000:1.0:1536619472.378230:0:6412:0:(ldlm_request.c:942:ldlm_cli_enqueue()) ### client-side enqueue START, flags 0x40000000 ns: lustre-OST0002-osc-ffff968d3a345000 lock: ffff968d26d2c480/0x17ff4219e8fed853 lrc: 3/0,1 mode: --/PW res: [0x2ff0:0x0:0x0].0x0 rrc: 2 type: EXT [0->4095] (req 0->4095) flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 6412 timeout: 0 lvb_type: 1 00010000:00010000:1.0:1536619472.378239:0:6412:0:(ldlm_request.c:1014:ldlm_cli_enqueue()) ### sending request ns: lustre-OST0002-osc-ffff968d3a345000 lock: ffff968d26d2c480/0x17ff4219e8fed853 lrc: 3/0,1 mode: --/PW res: [0x2ff0:0x0:0x0].0x0 rrc: 2 type: EXT [0->4095] (req 0->4095) flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 6412 timeout: 0 lvb_type: 1 00000100:00080000:1.0:1536619472.378247:0:6412:0:(client.c:1569:ptlrpc_send_new_req()) @@@ req waiting for recovery: (FULL != CONNECTING) req@ffff968d218dcc00 x1611261872492656/t0(0) o101->lustre-OST0002-osc-ffff968d3a345000@10.9.5.125@tcp:28/4 lens 328/400 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1 00000100:00100000:1.0:1536619472.378251:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619473.377745:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00010000:00010000:1.0:1536619473.852020:0:18075:0:(ldlm_lockd.c:1669:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 2/0,0 mode: PR/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x420000010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1 00010000:00010000:1.0:1536619473.852031:0:18075:0:(ldlm_lockd.c:1700:ldlm_handle_bl_callback()) Lock ffff968cf7975440 already unused, calling callback (ffffffffc0c7e710) 00010000:00010000:1.0:1536619473.852034:0:18075:0:(ldlm_request.c:1251:ldlm_cli_cancel_local()) ### client-side cancel ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 3/0,0 mode: PR/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x428400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1 00010000:00010000:1.0:1536619473.852117:0:18075:0:(ldlm_request.c:1310:ldlm_cancel_pack()) ### packing ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 2/0,0 mode: --/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x4c29400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1 00010000:00010000:1.0:1536619473.852129:0:18075:0:(ldlm_request.c:1314:ldlm_cancel_pack()) 1 locks packed 00010000:00010000:1.0:1536619473.852141:0:18075:0:(ldlm_lockd.c:1709:ldlm_handle_bl_callback()) ### client blocking callback handler END ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 1/0,0 mode: --/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x4c29400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1 00010000:00010000:1.0:1536619473.852147:0:18075:0:(ldlm_lock.c:197:ldlm_lock_put()) ### final lock_put on destroyed lock, freeing it. ns: lustre-OST0001-osc-ffff968d3a345000 lock: ffff968cf7975440/0x17ff4219e8fed845 lrc: 0/0,0 mode: --/PR res: [0x2fd0:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x4c29400010000 nid: local remote: 0x61c83a90b60e6c16 expref: -99 pid: 6411 timeout: 0 lvb_type: 1 00000100:00100000:1.0:1536619473.852166:0:1881:0:(client.c:1625:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_01_01:ac35fb57-abc0-fa1f-db48-e4504a124ce3:1881:1611261872492672:10.9.5.125@tcp:103 00000100:00100000:1.0:1536619473.852624:0:1881:0:(client.c:2053:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_01_01:ac35fb57-abc0-fa1f-db48-e4504a124ce3:1881:1611261872492672:10.9.5.125@tcp:103 00000100:00100000:1.0:1536619474.377727:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619475.377720:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619476.377721:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619477.377737:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619478.377757:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619479.377746:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619480.377722:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619481.377735:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619482.377718:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619483.377739:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619484.377729:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619485.377712:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536619486.377724:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds ... 00000100:00100000:1.0:1536621583.377721:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536621584.377725:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds 00000100:00100000:1.0:1536621585.377723:0:6412:0:(client.c:2347:ptlrpc_set_wait()) set ffff968d23136180 going to sleep for 0 seconds
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_156 - Timeout occurred after 166 mins, last suite running was sanity, restarting cluster to continue tests
Attachments
Issue Links
- is related to
-
LU-11128 replay-single test timeout
- Resolved