Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
When an OST is deactivated, some MDT-to-OST operations return -ENOTCONN. In some cases this results in -ENOTCONN being returned to the client as the Lustre message status, which the client ptlrpc code interprets as indicating that the client has been "abruptly disconnected" from the MDT. The client then reconnects and resends the operation, which fails the same way, causing a loop.
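The loop described above can be illustrated with a toy model. This is a minimal sketch, not Lustre code: the function names `mdt_handle_setattr` and `client_send`, the `ost_active_at` callback, and the `max_attempts` cap are all hypothetical and exist only for illustration. It assumes the server passes the OST error straight through as the reply status and that the client treats any -ENOTCONN status as a lost connection.

```python
ENOTCONN = 107  # Linux errno value, matching the -107 seen in the debug log


def mdt_handle_setattr(ost_active):
    """Toy MDT handler (hypothetical): syncing to a deactivated OST fails,
    and the error is returned unchanged as the reply status."""
    return 0 if ost_active else -ENOTCONN


def client_send(ost_active_at, max_attempts=10):
    """Toy client (hypothetical): a -ENOTCONN reply status is treated as an
    abrupt disconnect, so the client reconnects and resends the request.

    ost_active_at(attempt) reports whether the OST is active on that attempt.
    Returns (final_status, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        status = mdt_handle_setattr(ost_active_at(attempt))
        if status != -ENOTCONN:
            return status, attempt
        # Status was -ENOTCONN: "reconnect" and resend on the next iteration.
    return -ENOTCONN, max_attempts


# The request only completes once the OST is active again (attempt 4 here).
print(client_send(lambda attempt: attempt >= 4))
```

In this model the resend never succeeds while the OST stays deactivated, which is consistent with the reproducer, where the chgrp finishes only after the OST is reactivated.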
o:~# bash $LUSTRE/tests/llmount.sh
...
o:~# lctl set_param osp.lustre-OST0000-osc-MDT0000.active=0
osp.lustre-OST0000-osc-MDT0000.active=0
o:~# touch /mnt/lustre/f0
o:~# chown sanity: /mnt/lustre/f0
o:~# lctl set_param debug=+trace
debug=+trace
o:~# lctl set_param debug_mb=64
debug_mb=64
o:~# lctl clear
o:~# sudo -u sanity chgrp gsanity0 /mnt/lustre/f0 &
[1] 12744
o:~# sleep 4
o:~# lctl set_param osp.lustre-OST0000-osc-MDT0000.active=1
osp.lustre-OST0000-osc-MDT0000.active=1
o:~# wait
[1]+ Done sudo -u sanity chgrp gsanity0 /mnt/lustre/f0
o:~# lctl dk > /tmp/1.dk
With lots of lines elided:
00000100:00100000:0.0:1534175675.393208:0:12746:0:(client.c:1625:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc chgrp:7dc6380a-74bd-0c12-bf7e-6c11809b0af5:12746:1608699144600192:0@lo:36
00000100:00100000:0.0:1534175675.393458:0:11330:0:(service.c:2129:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc mdt00_001:7dc6380a-74bd-0c12-bf7e-6c11809b0af5+13:12746:x1608699144600192:12345-0@lo:36
00000004:00000001:0.0:1534175675.393963:0:11330:0:(osp_dev.c:815:osp_sync()) Process leaving (rc=18446744073709551509 : -107 : ffffffffffffff95)
00000004:00020000:0.0:1534175675.393967:0:11330:0:(lod_dev.c:1415:lod_sync()) lustre-MDT0000-mdtlov: can't sync ost 0: -107
00000004:00000001:0.0:1534175675.429293:0:11330:0:(lod_dev.c:1422:lod_sync()) Process leaving (rc=18446744073709551509 : -107 : ffffffffffffff95)
00000004:00000001:0.0:1534175675.429298:0:11330:0:(mdd_object.c:1174:mdd_attr_set()) Process leaving via out (rc=18446744073709551509 : -107 : 0xffffffffffffff95)
00000100:00100000:0.0:1534175675.429886:0:11330:0:(service.c:2179:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt00_001:7dc6380a-74bd-0c12-bf7e-6c11809b0af5+8:12746:x1608699144600192:12345-0@lo:36 Request processed in 36428us (36621us total) trans 0 rc -107/-107

Client:
00000100:00000001:0.0:1534175675.430424:0:12746:0:(client.c:1284:ptlrpc_check_status()) Process leaving (rc=18446744073709551509 : -107 : ffffffffffffff95)
00000100:00000001:0.0:1534175675.430429:0:12746:0:(recover.c:232:ptlrpc_request_handle_notconn()) Process entered
00000100:00080000:0.0:1534175675.430430:0:12746:0:(recover.c:236:ptlrpc_request_handle_notconn()) import lustre-MDT0000-mdc-ffff8eb4e8bb6800 of lustre-MDT0000_UUID@192.168.122.131@tcp abruptly disconnected: reconnecting
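For readers unfamiliar with the "Process leaving" lines, the three rc fields appear to be the same return code shown as an unsigned 64-bit value, a signed value, and hex. The sketch below checks that reading with plain two's-complement arithmetic; the helper name `to_signed64` is hypothetical, and the constant 107 is the Linux value of ENOTCONN (other platforms may differ).

```python
ENOTCONN_LINUX = 107  # Linux errno value for ENOTCONN (assumption: Linux host)


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed two's-complement value."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


rc_unsigned = 18446744073709551509  # first rc field in the log
rc_hex = 0xFFFFFFFFFFFFFF95         # third rc field in the log

print(to_signed64(rc_unsigned))                     # -107
print(rc_unsigned == rc_hex)                        # True
print(to_signed64(rc_unsigned) == -ENOTCONN_LINUX)  # True
```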