Details
Bug
Resolution: Fixed
Major
Lustre 2.1.0
3
5631
Description
An MDS failover happened during OST recovery, and the OST received two MDS connections from different IP addresses. The first connection was processed by the OST; the second caused class_fail_export() to be called in target_handle_connect(), and the OST was left in perpetual recovery.
Oct 26 17:09:07 snx11001n008 kernel: [ 838.638847] Lustre: 90700:0:(ldlm_lib.c:2007:target_recovery_init()) RECOVERY: service snx11001-OST0012, 3 recoverable clients, last_transno 54526017
Oct 26 17:09:07 snx11001n008 kernel: [ 838.708354] Lustre: snx11001-OST0012: Now serving snx11001-OST0012/ on /dev/md4 with recovery enabled
Oct 26 17:09:07 snx11001n008 kernel: [ 838.717732] Lustre: snx11001-OST0012: Will be in recovery for at least 15:00, or until 3 clients reconnect
Oct 26 17:11:05 snx11001n008 kernel: [ 956.648093] LustreError: 88011:0:(ldlm_lib.c:927:target_handle_connect()) snx11001-OST0012: NID 10.10.101.3@o2ib1 (snx11001-MDT0000-mdtlov_UUID) reconnected with 1 conn_cnt; cookies not random?
Oct 26 17:15:10 snx11001n008 kernel: [ 1201.718217] Lustre: 88009:0:(ldlm_lib.c:941:target_handle_connect()) snx11001-OST0012: connection from snx11001-MDT0000-mdtlov_UUID@10.10.101.3@o2ib1 recovering/t0 exp ffff88072cb90400 cur 1351289710 last 1351289346
Oct 26 17:18:40 snx11001n008 kernel: [ 1410.931800] Lustre: 88010:0:(ldlm_lib.c:854:target_handle_connect()) snx11001-OST0012: received MDS connection from NID 10.10.101.4@o2ib1, removing former export from NID 10.10.101.3@o2ib1
Oct 26 17:18:40 snx11001n008 kernel: [ 1410.948937] Lustre: 88010:0:(ldlm_lib.c:941:target_handle_connect()) snx11001-OST0012: connection from snx11001-MDT0000-mdtlov_UUID@10.10.101.4@o2ib1 recovering/t0 exp (null) cur 1351289920 last 0
Oct 26 17:18:40 snx11001n008 kernel: [ 1410.976334] LustreError: 88010:0:(ldlm_lib.c:974:target_handle_connect()) snx11001-OST0012: denying connection for new client 10.10.101.4@o2ib1 (snx11001-MDT0000-mdtlov_UUID): 0 clients in recovery for 381s
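To spot this pattern in other OSS logs, something like the following might help. It is only a rough sketch, not part of Lustre: the function names and the regular expression are my own, and it assumes the "connection from <UUID>@<NID>" message format shown in the log above. It reports any UUID that connected from more than one NID.

```python
import re

# Assumed message format (taken from the log above):
#   "... connection from <UUID>@<IPv4>@<LNet network> ..."
CONNECT_RE = re.compile(
    r"connection from (?P<uuid>[\w-]+)@(?P<nid>\d+\.\d+\.\d+\.\d+@\w+)"
)


def nids_by_uuid(log_text):
    """Map each connecting UUID to the distinct NIDs it used, in order seen."""
    seen = {}
    for match in CONNECT_RE.finditer(log_text):
        nids = seen.setdefault(match.group("uuid"), [])
        if match.group("nid") not in nids:
            nids.append(match.group("nid"))
    return seen


def uuids_with_multiple_nids(log_text):
    """Keep only the UUIDs that connected from more than one NID."""
    return {
        uuid: nids
        for uuid, nids in nids_by_uuid(log_text).items()
        if len(nids) > 1
    }


# Message tails copied from the log in this ticket (syslog prefixes dropped).
SAMPLE = (
    "snx11001-OST0012: connection from "
    "snx11001-MDT0000-mdtlov_UUID@10.10.101.3@o2ib1 recovering/t0\n"
    "snx11001-OST0012: received MDS connection from NID 10.10.101.4@o2ib1, "
    "removing former export from NID 10.10.101.3@o2ib1\n"
    "snx11001-OST0012: connection from "
    "snx11001-MDT0000-mdtlov_UUID@10.10.101.4@o2ib1 recovering/t0\n"
)

if __name__ == "__main__":
    for uuid, nids in uuids_with_multiple_nids(SAMPLE).items():
        print(uuid, "connected from", ", ".join(nids))
```

The pattern works on whole log text rather than line by line, so it also copes with logs where several syslog entries have been joined onto one line.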