[LU-7273] sanity-hsm test_107 failed with 'Copytool failed to stop in 20s ...' Created: 08/Oct/15 Updated: 28/Jan/16 Resolved: 28/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
autotest |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
sanity-hsm test 107 failed with error 'Copytool failed to stop in 20s ...' Logs are at https://testing.hpdd.intel.com/test_sets/070dca4c-6d12-11e5-ab7f-5254006e85c2 There are no errors in the logs. The last things seen in the test_log are: Copytool still running on shadow-22vm6 CMD: shadow-22vm6 pgrep -x lhsmtool_posix shadow-22vm6: 24751 Copytool still running on shadow-22vm6 CMD: shadow-22vm6 pgrep -x lhsmtool_posix shadow-22vm6: 24751 sanity-hsm test_107: @@@@@@ FAIL: Copytool failed to stop in 20s ... Same error has been hit in with this test in the past. One example is https://testing.hpdd.intel.com/test_sets/3bd00e02-6a47-11e5-9a21-5254006e85c2 |
| Comments |
| Comment by Gerrit Updater [ 09/Oct/15 ] |
|
Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/16782 |
| Comment by Bruno Faccini (Inactive) [ 09/Oct/15 ] |
|
The Copytool log shows that both archive actions in tes_107 have successfully completed. The Lustre debug log taken on Agent node seems to indicate that the main copytool thread was still running at the end of the 20s grace period allowed in copytool_cleanup(). More strange is that there are 2 log lines (ll_file_release()) for the single close() of mnt_fd in llapi_hsm_copytool_unregister() : 00000001:02000400:0.0:1444222093.159584:0:25285:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: pkill -INT -x lhsmtool_posix 00000080:00200000:1.0:1444222093.348446:0:24751:0:(dir.c:1099:ll_dir_ioctl()) VFS Op:inode=[0x200000007:0x1:0x0](ffff88007c2c37c0), cmd=0x401866d5 00000100:00100000:1.0:1444222093.348503:0:24751:0:(client.c:1627:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493076:10.1.5.16@tcp:60 00000100:00100000:1.0:1444222093.348551:0:24751:0:(client.c:2308:ptlrpc_set_wait()) set ffff88003733d2c0 going to sleep for 6 seconds 00010000:00080000:1.0:1444222093.349748:0:24751:0:(ldlm_request.c:1307:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 3378600000, Limit: 0) req@ffff880053624080 x1514365981493076/t0(0) o60->lustre-MDT0000-mdc-ffff88007ad1d000@10.1.5.16@tcp:12/10 lens 440/192 e 0 to 0 dl 1444222100 ref 2 fl Rpc:R/0/0 rc 0/0 00000100:00100000:1.0:1444222093.349766:0:24751:0:(client.c:2022:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493076:10.1.5.16@tcp:60 00000100:00100000:1.0:1444222093.349815:0:24751:0:(client.c:1627:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493080:10.1.5.12@tcp:60 00000100:00100000:1.0:1444222093.349833:0:24751:0:(client.c:2308:ptlrpc_set_wait()) set ffff88003733d2c0 going to sleep for 6 seconds 00010000:00080000:0.0:1444222093.351595:0:24751:0:(ldlm_request.c:1307:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 0, Limit: 31283) req@ffff880053624080 x1514365981493080/t0(0) o60->lustre-MDT0001-mdc-ffff88007ad1d000@10.1.5.12@tcp:12/10 lens 440/192 e 0 to 0 dl 1444222100 ref 2 fl Rpc:R/0/0 rc 0/0 00000100:00100000:0.0:1444222093.351611:0:24751:0:(client.c:2022:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493080:10.1.5.12@tcp:60 00000100:00100000:0.0:1444222093.351627:0:24751:0:(client.c:1627:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493084:10.1.5.12@tcp:60 00000100:00100000:0.0:1444222093.351641:0:24751:0:(client.c:2308:ptlrpc_set_wait()) set ffff88003733d2c0 going to sleep for 6 seconds 00010000:00080000:0.0:1444222093.352420:0:24751:0:(ldlm_request.c:1307:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 0, Limit: 31283) req@ffff880053624080 x1514365981493084/t0(0) o60->lustre-MDT0002-mdc-ffff88007ad1d000@10.1.5.12@tcp:12/10 lens 440/192 e 0 to 0 dl 1444222100 ref 2 fl Rpc:R/0/0 rc 0/0 00000100:00100000:0.0:1444222093.352425:0:24751:0:(client.c:2022:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493084:10.1.5.12@tcp:60 00000100:00100000:0.0:1444222093.352437:0:24751:0:(client.c:1627:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493088:10.1.5.12@tcp:60 00000100:00100000:0.0:1444222093.352463:0:24751:0:(client.c:2308:ptlrpc_set_wait()) set ffff88003733d2c0 going to sleep for 6 seconds 00010000:00080000:0.0:1444222093.353445:0:24751:0:(ldlm_request.c:1307:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 0, Limit: 31283) req@ffff880053624080 x1514365981493088/t0(0) o60->lustre-MDT0003-mdc-ffff88007ad1d000@10.1.5.12@tcp:12/10 lens 440/192 e 0 to 0 dl 1444222100 ref 2 fl Rpc:R/0/0 rc 0/0 00000100:00100000:0.0:1444222093.353451:0:24751:0:(client.c:2022:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493088:10.1.5.12@tcp:60 00000080:00200000:0.0:1444222093.353472:0:24751:0:(file.c:332:ll_file_release()) VFS Op:inode=[0x200000002:0x2:0x0](ffff88007c2c25c0) 00000002:00080000:0.0:1444222093.353504:0:24751:0:(mdc_request.c:776:mdc_close()) @@@ matched open req@ffff880055a93980 x1514365981492320/t4294975516(4294975516) o101->lustre-MDT0000-mdc-ffff88007ad1d000@10.1.5.16@tcp:12/10 lens 872/544 e 0 to 0 dl 1444222096 ref 1 fl Complete:RP/4/0 rc 0/301 00000100:00100000:0.0:1444222093.353514:0:24751:0:(client.c:1627:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493092:10.1.5.16@tcp:35 00000100:00100000:0.0:1444222093.353543:0:24751:0:(client.c:2308:ptlrpc_set_wait()) set ffff88003733d2c0 going to sleep for 6 seconds 00010000:00080000:0.0:1444222093.354529:0:24751:0:(ldlm_request.c:1307:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 3378600000, Limit: 0) req@ffff880053624080 x1514365981493092/t0(0) o35->lustre-MDT0000-mdc-ffff88007ad1d000@10.1.5.16@tcp:23/10 lens 512/424 e 0 to 0 dl 1444222100 ref 2 fl Rpc:R/0/0 rc 0/0 00000100:00100000:0.0:1444222093.354536:0:24751:0:(client.c:2022:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc lhsmtool_posix:706467d4-9a29-db95-1bb0-fb11e4273b13:24751:1514365981493092:10.1.5.16@tcp:35 00000002:00100000:0.0:1444222093.354547:0:24751:0:(mdc_request.c:693:mdc_free_open()) @@@ free open request req@ffff880055a93980 x1514365981492320/t4294975516(4294975516) o101->lustre-MDT0000-mdc-ffff88007ad1d000@10.1.5.16@tcp:12/10 lens 872/544 e 0 to 0 dl 1444222096 ref 1 fl Complete:R/4/0 rc 0/301 00000080:00200000:0.0:1444222093.354560:0:24751:0:(file.c:332:ll_file_release()) VFS Op:inode=[0x200000007:0x1:0x0](ffff88007c2c37c0) ..................... 00000001:02000400:0.0:1444222114.603558:0:25835:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: pgrep -x lhsmtool_posix 00000001:02000400:0.0:1444222115.032717:0:25890:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: /usr/sbin/lctl mark sanity-hsm test_107: @@@@@@ FAIL: Copytool failed to stop in 20s ... 00000080:00200000:0.0:1444222115.140512:0:24751:0:(file.c:332:ll_file_release()) VFS Op:inode=[0x200000007:0x1:0x0](ffff88007c2c37c0) 00000001:02000400:0.0:1444222115.184598:0:25918:0:(debug.c:335:libcfs_debug_mark_buffer()) DEBUG MARKER: sanity-hsm test_107: @@@@@@ FAIL: Copytool failed to stop in 20s ... with no other activity traced in-between. I have just pushed patch #16782 to allow full threads stacks dump on Agent nodes when CT fails to stop. |
| Comment by Gerrit Updater [ 28/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16782/ |
| Comment by Joseph Gmitter (Inactive) [ 28/Jan/16 ] |
|
Landed for 2.8 |