Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0
- 3
- 10379
Description
While performing a rolling upgrade from Lustre 2.1.6 to 2.4.1 RC2, upgrading the nodes one at a time in the order OSS->MDS->Client, the test failed after the MDS was upgraded:
Starting the MDS service on fat-amd-3...
---------------- fat-amd-3 ----------------
debug=-1
subsystem_debug=all -lnet -lnd -pinger
debug_mb=100
pdsh -l root -t 100 -S -w fat-amd-3 "mkdir -p /mnt/mds1 && mount -t lustre -o user_xattr /dev/sdc1 /mnt/mds1"
Waiting 895 secs for fat-amd-3 recovery done. status: RECOVERING
<~snip~>
Waiting 5 secs for fat-amd-3 recovery done. status: RECOVERING
Waiting 0 secs for fat-amd-3 recovery done. status: RECOVERING
fat-amd-3 recovery not done in 900 sec. status: RECOVERING
On the MDS fat-amd-3, "lctl get_param -n ..recovery_status" showed:
---------------- fat-amd-3 ----------------
status: RECOVERING
recovery_start: 1378874775
time_remaining: 0
connected_clients: 2/4
req_replay_clients: 0
lock_repay_clients: 0
completed_clients: 2
evicted_clients: 0
replayed_requests: 0
queued_requests: 0
next_transno: 4294967297
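The symptom in the output above (status still RECOVERING, time_remaining at 0, and fewer clients connected than expected) can be checked mechanically. The following is a minimal Python sketch, assuming the output is available as one "key: value" field per line. The function names parse_recovery_status and is_stalled are hypothetical helpers for illustration, not part of Lustre or its test framework.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines of recovery_status output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the host banner
            fields[key.strip()] = value.strip()
    return fields


def is_stalled(fields):
    """Return True if recovery has run out of time but is still RECOVERING
    with fewer clients connected than expected (the state in this report)."""
    connected, expected = (int(n) for n in fields["connected_clients"].split("/"))
    return (
        fields.get("status") == "RECOVERING"
        and int(fields["time_remaining"]) == 0
        and connected < expected
    )


# Sample copied from the output reported for fat-amd-3.
SAMPLE = """\
---------------- fat-amd-3 ----------------
status: RECOVERING
recovery_start: 1378874775
time_remaining: 0
connected_clients: 2/4
req_replay_clients: 0
lock_repay_clients: 0
completed_clients: 2
evicted_clients: 0
replayed_requests: 0
queued_requests: 0
next_transno: 4294967297
"""

fields = parse_recovery_status(SAMPLE)
print(is_stalled(fields))
```

With the sample above, the check reports a stalled recovery: only 2 of the 4 expected clients are connected and no recovery time remains, yet the status has not left RECOVERING.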
The console log on the MDS showed:
Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 4 clients reconnect
Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
Lustre: lustre-MDT0000: disconnecting 2 stale clients
Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
Maloo reports:
https://maloo.whamcloud.com/test_sets/d91a2b68-1aa1-11e3-88ff-52540035b04c
https://maloo.whamcloud.com/test_sets/dae9450c-1a86-11e3-8ceb-52540035b04c
The same failure also occurred during a rolling upgrade from Lustre 2.1.6 to 2.4.0:
https://maloo.whamcloud.com/test_sets/c70af506-1ab5-11e3-8898-52540035b04c
Issue Links
- is related to
- LU-5298 The lwp device cannot be started when we migrate from Lustre 2.1 to Lustre 2.4
- Resolved