Details
-
Bug
-
Resolution: Fixed
-
Major
-
None
-
None
-
3
-
Orion
-
3019
Description
Observed when manually restarting zfs servers, specifically:
1) umount /mnt/lustre/server
2) export zpool
3) import zpool
4) mount zpool/server /mnt/lustre/server
There is some suspicion that we may be rolling back to a previous uberblock due to how things were shutdown but that's just a suspicion at the moment until we have more data.
2012-05-17 11:24:31 LustreError: 11-0: lstest-OST02e6-osc-ffff88063d2a2000: Communicating with 172.20.4.142@o2ib500, operation obd_ping failed with -107 2012-05-17 11:24:31 Lustre: lstest-OST02ef-osc-ffff88063d2a2000: Connection to lstest-OST02ef (at 172.20.4.151@o2ib500) was lost; in progress operations using this service will wait for recovery to complete 2012-05-17 11:24:31 Lustre: Skipped 19 previous similar messages 2012-05-17 11:24:31 LustreError: Skipped 27 previous similar messages 2012-05-17 11:24:53 Lustre: lstest-OST02b8-osc-ffff88063d2a2000: Connection to lstest-OST02b8 (at 172.20.4.96@o2ib500) was lost; in progress operations using this service will wait for recovery to complete 2012-05-17 11:24:53 Lustre: Skipped 30 previous similar messages 2012-05-17 11:24:53 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) lstest-OST029a_UUID went back in time (transno 4315649123 was previously committed, server now claims 4295433349)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646 2012-05-17 11:24:53 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) Skipped 11 previous similar messages 2012-05-17 11:24:57 Lustre: lstest-OST0286-osc-ffff88063d2a2000: Connection restored to lstest-OST0286 (at 172.20.4.46@o2ib500) 2012-05-17 11:24:57 Lustre: Skipped 12 previous similar messages 2012-05-17 11:25:02 Lustre: lstest-OST02e9-osc-ffff88063d2a2000: Connection restored to lstest-OST02e9 (at 172.20.4.145@o2ib500) 2012-05-17 11:25:47 Lustre: lstest-OST02e3-osc-ffff88063d2a2000: Connection restored to lstest-OST02e3 (at 172.20.4.139@o2ib500) 2012-05-17 11:25:47 Lustre: Skipped 25 previous similar messages 2012-05-17 11:25:57 Lustre: lstest-OST028d-osc-ffff88063d2a2000: Connection restored to lstest-OST028d (at 172.20.4.53@o2ib500) 2012-05-17 11:25:57 Lustre: Skipped 3 previous similar messages 2012-05-17 11:26:11 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) lstest-OST02e4_UUID went back in time (transno 4320876109 was previously committed, server now claims 4296801416)! See https://bugzilla.lustre.org/show_bug.cgi?id=9646 2012-05-17 11:26:11 LustreError: 23000:0:(import.c:930:ptlrpc_connect_interpret()) Skipped 67 previous similar messages
It looks like ofd->ofd_last_transno gets reset to 4294967296 (1 << 32) when a target is restarted on the server. That is, lut_boot_epoch_update() is called with lut->lus_last_transno == 0. Then in ofd_txn_stop_cb() new transaction numbers are assigned starting from there and saved in the last_rcvd file.
However exp->exp_last_committed keeps the value originally read from last_rcvd until the new transaction numbers overtake it. So the server keeps sending a larger last_committed transaction number than what is actually being used by the client, but it writes the newer, lower numbers to last_rcvd. Now if the server restarts it sends the lcd_last_transno from last_rcvd to the client. But if the saved transaction numbers never grew larger than the old last_committed value, the client thinks the server went back in time.
To clarify, here's some debug data I captured with systemtap. I unmounted the OST at timestamp 1343860077 and the client connection was restored at 1343860152. It prints ofd->ofd_lut->lut_last_transno in ofd_txn_stop_cb() and lut->lus_last_transno in lut_boot_epoch_update(). There was only one active client.