[LUDOC-446] Lustre server rolling upgrades - Whamcloud Community JIRA

Details

Type: Question/Request
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- llnl

Description

The lustre operations manual currently says this:

Lustre software release 2.x.y release (minor) upgrade:
•   All servers must be upgraded at the same time, while some or all clients may be upgraded.
•   Rolling upgrades are supported for minor releases allowing individual servers and clients to be upgraded
without stopping the Lustre file system.

The first sentence sounds like another way of saying "all the servers must be the same". The second sentence sounds like another way of saying "the servers can be different versions, but only temporarily".

As long as:

•   the "2.x" part of the version stays the same,
•   the "y" part changes to "y+1" or "y+2", and
•   both versions are actual tagged releases,

Can we update the targets within a single file system one at a time? If so, I'll propose a patch with what I think is better wording.
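
To make the conditions in the question concrete, here is a minimal sketch of a check for them. It only encodes the three conditions as asked above, not any confirmed Lustre support policy. The function names, the `max_step` parameter, and the example version strings are hypothetical, and whether a version is an actual tagged release is left to the caller.

```python
def parse_version(version):
    """Split a 'major.minor.patch' string such as '2.12.0' into three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def matches_question_conditions(old, new, max_step=2):
    """True if 'new' keeps the same 2.x and moves y forward by 1..max_step.

    This mirrors the conditions posed in the question only; it does not
    verify that either version is a tagged release.
    """
    old_v = parse_version(old)
    new_v = parse_version(new)
    same_series = old_v[:2] == new_v[:2]
    step = new_v[2] - old_v[2]
    return same_series and 1 <= step <= max_step


# Illustrative version strings only.
print(matches_question_conditions("2.12.0", "2.12.2"))  # same 2.x, y+2
print(matches_question_conditions("2.10.8", "2.12.0"))  # 2.x changes
```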

Attachments

Activity

Olaf Faaland added a comment - 26/Aug/19 12:43 AM

Hi Andreas,
Thanks in arrears. That makes sense, and I haven't gotten around to my proposed wording, but I will.

Peter,
I can't remove the topllnl label in this project, it appears, and can't assign this to myself. Can you make those changes, or make it so I can (either is fine with me)?

Andreas Dilger added a comment - 25/Jun/19 3:36 AM

Olaf, indeed it is possible to upgrade servers one-at-a-time, but I can't imagine why you'd want to? If one server is down temporarily for an upgrade, that means files on that server are temporarily inaccessible and clients will block until the server is restarted and recovery completes. The client recovery time will be pretty much the same whether a single server is upgraded or half/all of the servers are upgraded, but you would have many more recovery periods if you only upgrade one OSS at a time.

The recommended process for rolling upgrades is to fail over half of the MDT+OST targets to their backup nodes, upgrade the now-idle MDS+OSS nodes to the new release, then fail over all of the targets to the just-upgraded nodes, upgrade the other half of the MDS+OSS nodes, and fail back half of the targets to their original nodes. This involves 3 recovery periods, or correspondingly more if you have e.g. N-way HA failover clusters and are failing 1/N of the targets at a time.
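
The sequence above can be sketched as a small planner for the 2-way failover case. This is an illustration only, assuming each HA pair has two nodes that each normally serve their own targets. The `HAPair` class, the node names, and the step labels are hypothetical, and no Lustre or HA commands are invoked. It just lists the steps in order and counts those that move targets, since each of those triggers a client recovery period.

```python
from dataclasses import dataclass


@dataclass
class HAPair:
    """Two nodes that can serve each other's targets (hypothetical model)."""
    node_a: str
    node_b: str


def plan_rolling_upgrade(pairs):
    """Return the ordered (kind, description) steps for a half-and-half upgrade."""
    first_half = [pair.node_a for pair in pairs]
    second_half = [pair.node_b for pair in pairs]
    return [
        ("failover", f"fail over targets of {first_half} to partners {second_half}"),
        ("upgrade", f"upgrade now-idle nodes {first_half}"),
        ("failover", f"fail over all targets from {second_half} to upgraded {first_half}"),
        ("upgrade", f"upgrade now-idle nodes {second_half}"),
        ("failover", f"fail back targets of {second_half} to their original nodes"),
    ]


def recovery_periods(steps):
    """Each step that moves targets between nodes causes one recovery period."""
    return sum(1 for kind, _ in steps if kind == "failover")


# Hypothetical node names for one MDS pair and one OSS pair.
pairs = [HAPair("mds1", "mds2"), HAPair("oss1", "oss2")]
steps = plan_rolling_upgrade(pairs)
for number, (kind, description) in enumerate(steps, start=1):
    print(f"{number}. [{kind}] {description}")
print("recovery periods:", recovery_periods(steps))
```

Because whole halves move together, the number of recovery periods in this sketch does not grow with the number of HA pairs, which is the point made above about upgrading one OSS at a time.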

With FLR mirror/EC we will eventually be able to do this without any outage, but that would need all of the files (or at least all of the in-use files) to be redundant in some way. Conceivably, we could mirror all of the files using OSTs on a particular OSS before doing the upgrade, then un-mirror them afterward, but that would be relatively slow.

In any case, I'm all for improving the readability of the manual, so feel free to suggest better wording.
