[LU-3929] 2.1.6->2.4.1 rolling upgrade: lustre-MDT0000: recovery is timed out, evict stale exports Created: 11/Sep/13 Updated: 07/Jul/14 Resolved: 06/Jan/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn4 | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 10379 | ||||||||
| Description |
|
While performing a rolling upgrade from Lustre 2.1.6 to 2.4.1 RC2 along the path OSS->MDS->Client, one node at a time, the test failed after upgrading the MDS:

Starting the MDS service on fat-amd-3...
---------------- fat-amd-3 ----------------
debug=-1
subsystem_debug=all -lnet -lnd -pinger
debug_mb=100
pdsh -l root -t 100 -S -w fat-amd-3 "mkdir -p /mnt/mds1 && mount -t lustre -o user_xattr /dev/sdc1 /mnt/mds1"
Waiting 895 secs for fat-amd-3 recovery done. status: RECOVERING
<~snip~>
Waiting 5 secs for fat-amd-3 recovery done. status: RECOVERING
Waiting 0 secs for fat-amd-3 recovery done. status: RECOVERING
fat-amd-3 recovery not done in 900 sec. status: RECOVERING

On MDS fat-amd-3, "lctl get_param -n ..recovery_status" showed that:

---------------- fat-amd-3 ----------------
status: RECOVERING
recovery_start: 1378874775
time_remaining: 0
connected_clients: 2/4
req_replay_clients: 0
lock_repay_clients: 0
completed_clients: 2
evicted_clients: 0
replayed_requests: 0
queued_requests: 0
next_transno: 4294967297

Console log on MDS showed that:

Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 4 clients reconnect
Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
Lustre: lustre-MDT0000: disconnecting 2 stale clients
Lustre: lustre-MDT0000: recovery is timed out, evict stale exports

Maloo reports:

The same failure also occurred during a rolling upgrade from Lustre 2.1.6 to 2.4.0: |
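The recovery_status output above is a set of "key: value" lines, so the stuck state can be detected mechanically. The following is a minimal sketch, not part of the Lustre test framework: the function names `parse_recovery_status` and `recovery_stalled` are hypothetical, and the "stalled" rule (status still RECOVERING, time_remaining at 0, fewer clients connected than expected) is an assumption drawn only from the values reported in this ticket.

```python
# Sketch only: helper names and the "stalled" rule are assumptions
# based on the recovery_status output quoted in this ticket.

SAMPLE = """\
status: RECOVERING
recovery_start: 1378874775
time_remaining: 0
connected_clients: 2/4
req_replay_clients: 0
lock_repay_clients: 0
completed_clients: 2
evicted_clients: 0
replayed_requests: 0
queued_requests: 0
next_transno: 4294967297
"""


def parse_recovery_status(text):
    """Turn 'key: value' lines into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            status[key.strip()] = value.strip()
    return status


def recovery_stalled(status):
    """True if recovery has run out of time but is still RECOVERING
    with fewer clients connected than expected."""
    connected, expected = (
        int(n) for n in status["connected_clients"].split("/")
    )
    return (
        status["status"] == "RECOVERING"
        and int(status["time_remaining"]) == 0
        and connected < expected
    )


if __name__ == "__main__":
    parsed = parse_recovery_status(SAMPLE)
    print(parsed["connected_clients"], recovery_stalled(parsed))
```

Fed the output captured on fat-amd-3, the check flags the recovery as stalled: the timer has expired while only 2 of the 4 expected clients have connected.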
| Comments |
| Comment by Peter Jones [ 12/Sep/13 ] |
|
Hongchao, could you please assess this issue? Thanks, Peter |
| Comment by Hongchao Zhang [ 13/Sep/13 ] |
|
This issue is related to the LWP (Light Weight Proxy) connection. |
| Comment by Sebastien Buisson (Inactive) [ 21/Oct/13 ] |
|
Hi, We are affected by this error, which is a serious problem when a customer wants to upgrade its OSSes first and then its MDSes and clients. This ticket was opened a month ago but has made no progress since then. This is surprising, as I would consider it a major issue on the upgrade path from 2.1 to 2.4 (and 2.5 too). Am I missing something? Sebastien. |
| Comment by Hongchao Zhang [ 24/Oct/13 ] |
|
status update: |
| Comment by Hongchao Zhang [ 29/Oct/13 ] |
|
The patch is against b2_1 and is tracked at http://review.whamcloud.com/#/c/8086/ |
| Comment by Hongchao Zhang [ 19/Nov/13 ] |
|
The patch against master is tracked at http://review.whamcloud.com/#/c/8328/ |
| Comment by Sebastien Buisson (Inactive) [ 19/Nov/13 ] |
|
Hi, I have just tested patch http://review.whamcloud.com/#/c/8086/ for b2_1, and it works fine: a rolling upgrade from Lustre 2.1.6 plus this patch to 2.4.1 went smoothly. So now I am wondering what the purpose of the new patch http://review.whamcloud.com/#/c/8328/ for master is. Sebastien. |
| Comment by Hongchao Zhang [ 22/Nov/13 ] |
|
With the patch against master, more of the earlier Lustre versions can be upgraded to the new version. |
| Comment by Sebastien Buisson (Inactive) [ 22/Nov/13 ] |
|
Do you mean the master patch alone would be enough to be able to successfully upgrade from 2.1 with the path OSS->MDS->Client? |
| Comment by Oleg Drokin [ 23/Nov/13 ] |
|
Yes, the master patch alone should be enough to allow upgrades from unpatched 2.1 OSTs (i.e. those that do not have the 8086 patch). Can you give such a combination a try, please? We believe it's a better way, since it saves you the extra step of upgrading all your OSTs to 2.1.6+patch before you can update your MDS to 2.4+ and then update your OSTs again to 2.4+ (which is kind of overkill). |
| Comment by Sebastien Buisson (Inactive) [ 26/Nov/13 ] |
|
Hi, Here is the test I carried out:
It went off smoothly. So I confirm that the master patch is enough. And, as Oleg explained, having the patch in the target version simplifies the upgrade. Cheers, |
| Comment by Peter Jones [ 06/Jan/14 ] |
|
Landed for 2.5.1 |