Lustre / LU-2368

OSTs stuck in perpetual recovery

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.1.6
    • Affects Version/s: Lustre 2.1.0
    • 3
    • 5631

    Description

      An MDS failover happened during OST recovery, and the OST received two MDS connections from different IP addresses. The first connection was processed by the OST; the second caused class_fail_export() to be called in target_handle_connect(), which left the OST in perpetual recovery.

      Oct 26 17:09:07 snx11001n008 kernel: [  838.638847] Lustre: 90700:0:(ldlm_lib.c:2007:target_recovery_init()) RECOVERY: service snx11001-OST0012, 3 recoverable clients, last_transno 54526017
      Oct 26 17:09:07 snx11001n008 kernel: [  838.708354] Lustre: snx11001-OST0012: Now serving snx11001-OST0012/ on /dev/md4 with recovery enabled
      Oct 26 17:09:07 snx11001n008 kernel: [  838.717732] Lustre: snx11001-OST0012: Will be in recovery for at least 15:00, or until 3 clients reconnect
      Oct 26 17:11:05 snx11001n008 kernel: [  956.648093] LustreError: 88011:0:(ldlm_lib.c:927:target_handle_connect()) snx11001-OST0012: NID 10.10.101.3@o2ib1 (snx11001-MDT0000-mdtlov_UUID) reconnected with 1 conn_cnt; cookies not random?
      Oct 26 17:15:10 snx11001n008 kernel: [ 1201.718217] Lustre: 88009:0:(ldlm_lib.c:941:target_handle_connect()) snx11001-OST0012: connection from snx11001-MDT0000-mdtlov_UUID@10.10.101.3@o2ib1 recovering/t0 exp ffff88072cb90400 cur 1351289710 last 1351289346
      Oct 26 17:18:40 snx11001n008 kernel: [ 1410.931800] Lustre: 88010:0:(ldlm_lib.c:854:target_handle_connect()) snx11001-OST0012: received MDS connection from NID 10.10.101.4@o2ib1, removing former export from NID 10.10.101.3@o2ib1
      Oct 26 17:18:40 snx11001n008 kernel: [ 1410.948937] Lustre: 88010:0:(ldlm_lib.c:941:target_handle_connect()) snx11001-OST0012: connection from snx11001-MDT0000-mdtlov_UUID@10.10.101.4@o2ib1 recovering/t0 exp (null) cur 1351289920 last 0
      Oct 26 17:18:40 snx11001n008 kernel: [ 1410.976334] LustreError: 88010:0:(ldlm_lib.c:974:target_handle_connect()) snx11001-OST0012: denying connection for new client 10.10.101.4@o2ib1 (snx11001-MDT0000-mdtlov_UUID): 0 clients in recovery for 381s
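
To make the sequence in the excerpt above easier to follow, the target_handle_connect() events can be pulled out with a short script. This is only a sketch: the regular expressions are inferred from the log lines quoted in this report rather than from any Lustre log format definition, and the helper names connect_events and distinct_nids are hypothetical.

```python
import re

# Matches the target_handle_connect() console lines quoted in this report.
# The field layout is an assumption based on those sample lines only.
CONNECT_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<level>Lustre(?:Error)?):\s+"
    r"\d+:\d+:\(ldlm_lib\.c:(?P<line>\d+):target_handle_connect\(\)\)\s+"
    r"(?P<target>\S+):\s+(?P<msg>.*)"
)

# IPv4-style NIDs such as 10.10.101.3@o2ib1.
NID_RE = re.compile(r"\d+\.\d+\.\d+\.\d+@\w+")


def connect_events(lines):
    """Return one dict per target_handle_connect() line, in log order."""
    events = []
    for line in lines:
        m = CONNECT_RE.search(line)
        if m is None:
            continue  # not a target_handle_connect() message
        events.append({
            "uptime": float(m.group("uptime")),
            "level": m.group("level"),
            "target": m.group("target"),
            "nids": NID_RE.findall(m.group("msg")),
            "msg": m.group("msg"),
        })
    return events


def distinct_nids(events):
    """Return the NIDs seen across all events, in order of first appearance."""
    seen = []
    for event in events:
        for nid in event["nids"]:
            if nid not in seen:
                seen.append(nid)
    return seen
```

Run against the excerpt, distinct_nids should report both 10.10.101.3@o2ib1 and 10.10.101.4@o2ib1 for the same MDT UUID, which is the two-connection situation described in this report.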
      

          People

            Assignee: Keith Mannthey (Inactive)
            Reporter: Alexander Boyko
