Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • None
    • 3
    • Orion
    • 3019

    Description

      Observed when manually restarting ZFS-backed Lustre servers, specifically:

      1) umount /mnt/lustre/server
      2) zpool export <pool>
      3) zpool import <pool>
      4) mount -t lustre <pool>/server /mnt/lustre/server

      There is some suspicion that we may be rolling back to a previous uberblock because of how things were shut down, but that is only a suspicion until we have more data.

      2012-05-17 11:24:31 LustreError: 11-0: lstest-OST02e6-osc-ffff88063d2a2000: Communicating with 172.20.4.142@o2ib500, operation obd_ping failed with -107
      2012-05-17 11:24:31 Lustre: lstest-OST02ef-osc-ffff88063d2a2000: Connection to lstest-OST02ef (at 172.20.4.151@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      2012-05-17 11:24:31 Lustre: Skipped 19 previous similar messages
      2012-05-17 11:24:31 LustreError: Skipped 27 previous similar messages
      2012-05-17 11:24:53 Lustre: lstest-OST02b8-osc-ffff88063d2a2000: Connection to lstest-OST02b8 (at 172.20.4.96@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      2012-05-17 11:24:53 Lustre: Skipped 30 previous similar messages
      2012-05-17 11:24:53 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) lstest-OST029a_UUID went back in time (transno 4315649123 was previously committed, server now claims 4295433349)!  See https://bugzilla.lustre.org/show_bug.cgi?id=9646
      2012-05-17 11:24:53 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) Skipped 11 previous similar messages
      2012-05-17 11:24:57 Lustre: lstest-OST0286-osc-ffff88063d2a2000: Connection restored to lstest-OST0286 (at 172.20.4.46@o2ib500)
      2012-05-17 11:24:57 Lustre: Skipped 12 previous similar messages
      2012-05-17 11:25:02 Lustre: lstest-OST02e9-osc-ffff88063d2a2000: Connection restored to lstest-OST02e9 (at 172.20.4.145@o2ib500)
      2012-05-17 11:25:47 Lustre: lstest-OST02e3-osc-ffff88063d2a2000: Connection restored to lstest-OST02e3 (at 172.20.4.139@o2ib500)
      2012-05-17 11:25:47 Lustre: Skipped 25 previous similar messages
      2012-05-17 11:25:57 Lustre: lstest-OST028d-osc-ffff88063d2a2000: Connection restored to lstest-OST028d (at 172.20.4.53@o2ib500)
      2012-05-17 11:25:57 Lustre: Skipped 3 previous similar messages
      2012-05-17 11:26:11 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) lstest-OST02e4_UUID went back in time (transno 4320876109 was previously committed, server now claims 4296801416)!  See https://bugzilla.lustre.org/show_bug.cgi?id=9646
      2012-05-17 11:26:11 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) Skipped 67 previous similar messages
      

        Activity

          [LU-2086] Server went back in time
          jlevi Jodi Levi (Inactive) made changes -
          Fix Version/s Original: Lustre 2.4.0 [ 10154 ]
          ian Ian Colle (Inactive) made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          ian Ian Colle (Inactive) added a comment - edited

          The fix from patch 3519 was already landed to master as part of another patch.

          morrone Christopher Morrone (Inactive) made changes -
          Fix Version/s New: Lustre 2.4.0 [ 10154 ]
          Fix Version/s Original: Sprint 17 (Cherokee) [ 10121 ]
          Key Original: ORI-670 New: LU-2086
          Project Original: Orion [ 10051 ] New: Lustre [ 10000 ]
          Rank (Obsolete) Original: 6570000000
          Release Version History Original: Sprint 16 (Saline Valley) [ 10120 ]
          Workflow Original: jira [ 14555 ] New: Sub-task Blocking [ 16399 ]
          ian Ian Colle (Inactive) made changes -
          Labels New: topsequoia
          ian Ian Colle (Inactive) made changes -
          Fix Version/s New: Sprint 17 (Cherokee) [ 10121 ]
          Fix Version/s Original: Sprint 16 (Saline Valley) [ 10120 ]
          nedbass Ned Bass (Inactive) added a comment - Patch for master: http://review.whamcloud.com/3519
          nedbass Ned Bass (Inactive) added a comment - Patch for orion: http://review.whamcloud.com/3518

          It looks like ofd->ofd_last_transno gets reset to 4294967296 (1 << 32) when a target is restarted on the server. That is, lut_boot_epoch_update() is called with lut->lus_last_transno == 0. Then in ofd_txn_stop_cb() new transaction numbers are assigned starting from there and saved in the last_rcvd file.

          } else if (info->fti_transno == 0) {
                  info->fti_transno = ++ofd->ofd_last_transno;
          ...
                  return ofd_last_rcvd_update(info, txn);
          }
          

          However, exp->exp_last_committed keeps the value originally read from last_rcvd until the new transaction numbers overtake it. So the server keeps sending the client a last_committed transaction number that is larger than the transnos actually being issued, while writing the newer, lower numbers to last_rcvd. Now, if the server restarts, it sends the lcd_last_transno from last_rcvd to the client; if the saved transaction numbers never grew past the old last_committed value, the client concludes the server went back in time.
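
          To make the arithmetic concrete, here is a minimal standalone C model of the sequence described above. This is illustrative code only, not Lustre source: struct server_state and boot_epoch_update() are hypothetical stand-ins for the ofd_last_transno / exp_last_committed fields and the lut_boot_epoch_update() path discussed here.

          #include <stdio.h>
          #include <stdint.h>

          /* Hypothetical model of the server-side state; not Lustre source. */
          struct server_state {
                  uint64_t last_transno;   /* stands in for ofd_last_transno */
                  uint64_t last_committed; /* stands in for exp_last_committed */
          };

          /* Models the reset path: the boot epoch update runs with
           * last_transno == 0, so new transnos restart at 1 << 32. */
          static void boot_epoch_update(struct server_state *s)
          {
                  if (s->last_transno == 0)
                          s->last_transno = 1ULL << 32;   /* 4294967296 */
          }

          int main(void)
          {
                  struct server_state s = {
                          .last_transno = 0,
                          /* last pre-restart transno, from the trace below */
                          .last_committed = 4295182382ULL,
                  };

                  boot_epoch_update(&s);

                  /* Each new transaction gets the next number and is saved to
                   * last_rcvd, yet stays below the last_committed value the
                   * server still advertises to the client. */
                  for (int i = 0; i < 3; i++) {
                          uint64_t transno = s.last_transno++;
                          printf("transno %llu %s advertised last_committed %llu\n",
                                 (unsigned long long)transno,
                                 transno < s.last_committed ? "<" : ">=",
                                 (unsigned long long)s.last_committed);
                  }
                  return 0;
          }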

          To clarify, here's some debug data I captured with systemtap. I unmounted the OST at timestamp 1343860077 and the client connection was restored at 1343860152. It prints ofd->ofd_lut->lut_last_transno in ofd_txn_stop_cb() and lut->lus_last_transno in lut_boot_epoch_update(). There was only one active client.

          1343860077 ll_ost_io_114 (24102) ofd_txn_stop_cb lut_last_transno 4295182382
          1343860152 tgt_recov (24731) lut_boot_epoch_update() lut_last_transno 0
          1343860152 ll_ost_io_127 24728 ofd_txn_stop_cb lut_last_transno 4294967296
          1343860152 ll_ost_io_125 24726 ofd_txn_stop_cb lut_last_transno 4294967297
          1343860152 ll_ost_io_120 24721 ofd_txn_stop_cb lut_last_transno 4294967298
          
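          For comparison, the client-side check that emits the "went back in time" console message reduces to a single comparison on reconnect. The sketch below is a simplified model, not the actual ptlrpc_connect_interpret() code; check_reconnect() is a hypothetical helper, and the numbers are taken from the console log in the description.

          #include <stdio.h>
          #include <stdint.h>

          /* Simplified model of the client-side check: if the server now
           * claims a committed transno below one the client already saw
           * committed, the server appears to have gone back in time. */
          static void check_reconnect(const char *target,
                                      uint64_t prev_committed,
                                      uint64_t server_claims)
          {
                  if (server_claims < prev_committed)
                          printf("%s went back in time (transno %llu was "
                                 "previously committed, server now claims "
                                 "%llu)!\n", target,
                                 (unsigned long long)prev_committed,
                                 (unsigned long long)server_claims);
          }

          int main(void)
          {
                  /* Values from the console log in the description */
                  check_reconnect("lstest-OST02e4_UUID",
                                  4320876109ULL, 4296801416ULL);
                  return 0;
          }
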
          tappro Mikhail Pershin made changes -
          Fix Version/s New: Sprint 16 (Saline Valley) [ 10120 ]

          People

            Assignee: tappro Mikhail Pershin
            Reporter: behlendorf Brian Behlendorf
            Votes: 0
            Watchers: 4
