[LU-2354] replay-single test_89 failed to remount the OST Created: 19/Jan/12 Updated: 09/Jan/20 Resolved: 09/Jan/20
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Li Wei (Inactive) | Assignee: | Li Wei (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 2880 |
| Description |
https://maloo.whamcloud.com/test_sets/5c8be67a-4223-11e1-9650-5254004bbbd3

== replay-single test 89: no disk space leak on late ost connection ================================== 06:57:48 (1326898668)
Waiting for orphan cleanup...
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.432742 seconds, 24.2 MB/s
Stopping /mnt/ost1 (opts:)
Failing mds1 on node client-28vm7
Stopping /mnt/mds1 (opts:)
affected facets: mds1
Failover mds1 to client-28vm7
06:58:10 (1326898690) waiting for client-28vm7 network 900 secs ...
06:58:10 (1326898690) network interface is UP
Starting mds1: -o user_xattr,acl /dev/lvm-MDS/P0 /mnt/mds1
client-28vm7: debug=0x33f0404
client-28vm7: subsystem_debug=0xffb7e3ff
client-28vm7: debug_mb=32
Started lustre-MDT0000
Starting ost1: /dev/lvm-OSS/P0 /mnt/ost1
client-28vm8: mount.lustre: mount /dev/mapper/lvm--OSS-P0 at /mnt/ost1 failed: Transport endpoint is not connected
mount -t lustre /dev/lvm-OSS/P0 /mnt/ost1
Start of /dev/lvm-OSS/P0 on ost1 failed 107
Starting client: client-28vm5.lab.whamcloud.com: -o user_xattr,acl,flock client-28vm7@tcp:/lustre /mnt/lustre
debug=0x33f0404
subsystem_debug=0xffb7e3ff
debug_mb=32
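The mount attempt above exits with status 107. On Linux, errno 107 is ENOTCONN, which matches the "Transport endpoint is not connected" message from mount.lustre and the -107 seen later in the OSS debug log. A quick way to confirm the mapping (assuming a Linux/glibc build environment):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            /* 107 is the status reported by "Start of /dev/lvm-OSS/P0 on ost1 failed 107" */
            printf("errno %d: %s\n", ENOTCONN, strerror(ENOTCONN));
            /* On Linux this prints: errno 107: Transport endpoint is not connected */
            return 0;
    }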
| Comments |
| Comment by Li Wei (Inactive) [ 01/Feb/12 ] |
From the OSS debug log:

00000020:01000004:0.0:1326898692.027873:0:29847:0:(obd_mount.c:374:lustre_start_mgc()) MGC10.10.4.170@tcp: Set MGC reconnect 1
10000000:01000000:0.0:1326898692.027875:0:29847:0:(mgc_request.c:873:mgc_set_info_async()) InitRecov MGC10.10.4.170@tcp 1/d0:i0:r0:or0:FULL

An existing MGC was found during the OST mount process. Its import state was FULL.

00000020:01000004:0.0:1326898692.027877:0:29847:0:(obd_mount.c:837:server_start_targets()) starting target lustre-OST0000
00000020:01000004:0.0:1326898692.027881:0:29847:0:(obd_mount.c:814:server_register_target()) Registration , fs=lustre, 10.10.4.171@tcp, index=0000, flags=0x2
10000000:01000000:0.0:1326898692.027883:0:29847:0:(mgc_request.c:887:mgc_set_info_async()) register_target 0x2
10000000:01000000:0.0:1326898692.027900:0:29847:0:(mgc_request.c:838:mgc_target_register()) register
00000100:00100000:0.0:1326898692.027907:0:29847:0:(client.c:1332:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc mount.lustre:f41320d4-71d5-d208-e36e-4a91bdddc25a:29847:1391341666777465:10.10.4.170@tcp:253

The OST was trying to register with the MGS.

00000100:02020000:0.0:1326898692.029258:0:29847:0:(client.c:1062:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.10.4.170@tcp. The mgs_target_reg operation failed with -107
00000100:00080000:0.0:1326898692.029261:0:29847:0:(recover.c:214:ptlrpc_request_handle_notconn()) import MGC10.10.4.170@tcp of MGS@MGC10.10.4.170@tcp_0 abruptly disconnected: reconnecting
00000100:02020000:0.0:1326898692.029264:0:29847:0:(import.c:177:ptlrpc_set_import_discon()) 166-1: MGC10.10.4.170@tcp: Connection to service MGS via nid 10.10.4.170@tcp was lost; in progress operations using this service will fail.
00000100:00080000:0.0:1326898692.029266:0:29847:0:(import.c:180:ptlrpc_set_import_discon()) ffff88006b5ff800 MGS: changing import state from FULL to DISCONN
10000000:01000000:0.0:1326898692.029268:0:29847:0:(mgc_request.c:952:mgc_import_event()) import event 0x808001
00000100:00080000:0.0:1326898692.029269:0:29847:0:(recover.c:223:ptlrpc_request_handle_notconn()) import MGS@MGC10.10.4.170@tcp_0 for MGC10.10.4.170@tcp not replayable, auto-deactivating
00000100:00080000:0.0:1326898692.029270:0:29847:0:(import.c:207:ptlrpc_deactivate_and_unlock_import()) setting import MGS INVALID
[...]
00000100:00100000:0.0:1326898692.029286:0:29847:0:(client.c:2591:ptlrpc_abort_inflight()) @@@ inflight req@ffff88003cdacc00 x1391341666777465/t0(0) o-1->MGS@MGC10.10.4.170@tcp_0:26/25 lens 4736/192 e 0 to 0 dl 1326898709 ref 2 fl Rpc:R/ffffffff/ffffffff rc 0/-1

Because the MDT/MGS had restarted, the OST's registration request got an ENOTCONN reply. The MGC import went from FULL to DISCONN, was marked INVALID, and was deactivated. The in-flight registration RPC was aborted as well, without being re-sent.

00000100:00100000:0.0:1326898692.029336:0:29847:0:(client.c:1666:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc mount.lustre:f41320d4-71d5-d208-e36e-4a91bdddc25a:29847:1391341666777465:10.10.4.170@tcp:253
00000020:02020000:0.0:1326898692.029341:0:29847:0:(obd_mount.c:846:server_start_targets()) 0-0: lustre-OST0000: Required registration failed: -107
00000020:00020000:0.0:1326898692.029863:0:29847:0:(obd_mount.c:1171:lustre_server_mount()) Unable to start targets: -107

The registration failure was then passed up the stack, causing the OST mount to fail with -107.
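The sequence above suggests the mount path gives up on the first ENOTCONN instead of waiting for the MGC to reconnect and re-issuing the registration. The following is only a minimal sketch of the kind of bounded retry that would mask this race; the functions here are hypothetical stand-ins, not the actual Lustre mount/MGC code paths:

    #include <errno.h>
    #include <stdio.h>

    /*
     * Hypothetical stand-in for the registration RPC: the "MGS" rejects the
     * first two attempts with -ENOTCONN, then accepts, mimicking an MGS that
     * is still restarting after failover.
     */
    static int attempts;

    static int do_target_register(void)
    {
            return (++attempts < 3) ? -ENOTCONN : 0;
    }

    /*
     * Retry target registration when the MGS answers ENOTCONN because it is
     * still restarting.  A bounded number of attempts keeps a genuinely dead
     * MGS from hanging the mount forever.
     */
    static int register_with_retry(int max_retries)
    {
            int rc = -ENOTCONN;

            for (int i = 0; i < max_retries; i++) {
                    rc = do_target_register();
                    if (rc != -ENOTCONN)
                            break;  /* success, or a non-retryable error */
                    fprintf(stderr, "registration got -ENOTCONN, retrying (%d/%d)\n",
                            i + 1, max_retries);
            }
            return rc;
    }

    int main(void)
    {
            int rc = register_with_retry(5);

            printf("registration %s (rc = %d)\n", rc ? "failed" : "succeeded", rc);
            return rc ? 1 : 0;
    }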
| Comment by Andreas Dilger [ 09/Jan/20 ] |
Close old ticket. |