[LU-2157] rolling downgrade from 2.3.0 to 1.8.8-wc1 failed Created: 12/Oct/12 Updated: 28/Feb/18 Resolved: 28/Feb/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.0, Lustre 1.8.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 5182 | ||||||||
| Description |
|
After a successful rolling upgrade from Lustre 1.8.8-wc1 to 2.3.0 RC2 along the path OSS->MDS->Client, the rolling downgrade along the path Client->MDS->OSS failed at the stage of mounting the 1.8.8-wc1 client:

mount.lustre: mount fat-amd-1:/lustre at /mnt/lustre failed: Invalid argument
This may have multiple causes.
Is 'lustre' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.

Dmesg showed:

Lustre: Server MGS version (2.3.0.0) is much newer than client version (1.8.8)
Lustre: 6967:0:(obd_config.c:875:class_process_config()) Ignoring unknown param jobid_var=procname_uid
LustreError: 6967:0:(obd_config.c:1199:class_config_llog_handler()) Err -22 on cfg command:
Lustre: cmd=cf00f 0:(null) 1:sys.jobid_var=procname_uid 2:procname_uid
LustreError: 15b-f: MGC10.10.4.132@tcp: The configuration from log 'lustre-client' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.10.4.132@tcp: The configuration from log 'lustre-client' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 6955:0:(llite_lib.c:1095:ll_fill_super()) Unable to process log: -22
LustreError: 6955:0:(lov_obd.c:1009:lov_cleanup()) lov tgt 0 not cleaned! deathrow=0, lovrc=1
LustreError: 6955:0:(mdc_request.c:1498:mdc_precleanup()) client import never connected
LustreError: 6955:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
LustreError: 6955:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Lustre: client lustre-client(ffff880331cbc000) umount complete
LustreError: 6955:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-22)

Here is the test report of the rolling upgrade: https://maloo.whamcloud.com/test_sets/c3ef59ee-142a-11e2-af8d-52540035b04c |
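For anyone triaging similar logs, the negative return codes in the dmesg output above can be mapped to errno names with a few lines of Python. This is only a rough sketch and not part of any Lustre tooling: the helper name `decode_return_codes` and the regular expression are made up for illustration, and the symbolic names come from the errno table of whatever platform runs the script. On Linux, -22 is EINVAL, which matches the "Invalid argument" reported by mount.lustre.

```python
import errno
import re

# A negative integer that is not glued to a preceding word character or dot,
# so host names such as "fat-amd-1" and message IDs such as "15c-8" are skipped.
RC_PATTERN = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_return_codes(log_text):
    """Map each negative return code found in log_text to its errno name.

    Hypothetical helper for illustration; names are taken from the local
    platform's errno table, and unknown values are reported as "unknown".
    """
    codes = {}
    for match in RC_PATTERN.finditer(log_text):
        value = int(match.group(1))
        codes[-value] = errno.errorcode.get(value, "unknown")
    return codes


if __name__ == "__main__":
    sample = (
        "LustreError: 6955:0:(llite_lib.c:1095:ll_fill_super()) "
        "Unable to process log: -22\n"
        "LustreError: 6955:0:(obd_mount.c:2065:lustre_fill_super()) "
        "Unable to mount (-22)\n"
    )
    for code, name in sorted(decode_return_codes(sample).items()):
        print(code, name)
```

Running the same helper over the full dmesg excerpt would also pick up the -108 from the cancel RPC, which happens only during cleanup after the configuration log has already been rejected.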
| Comments |
| Comment by Jian Yu [ 12/Oct/12 ] |
|
The jobstats feature is disabled by default in 2.3.0, but test-framework.sh enables it when the auster test suite is run. In the rolling upgrade test above, parallel-scale was run, so jobstats was enabled after the upgrade. When jobstats was not enabled on 2.3.0 after the upgrade, the downgrade passed. |
| Comment by Andreas Dilger [ 13/Oct/12 ] |
|
I think the major problem here is that the client code does not skip the unknown config command as it should. This is bad for two reasons:
We need to verify whether this is a problem in only 1.8.8 or if it is also in newer releases. In my local Lustre filesystem (1.8.6) I have such a parameter for the OST that is properly ignored, so I wonder whether this is a regression in 1.8.8 or if the problem exists on the client only? Since it is unlikely that we will make another 2.3 release I would strongly prefer to fix this before the release. |
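To make the expected behaviour concrete, here is a toy model of replaying a configuration log in which an unknown parameter either aborts the replay with -EINVAL (what the 1.8.8 client appears to do in the log above) or is skipped and recorded (the behaviour described as correct). It is a sketch under stated assumptions, not the Lustre implementation: `process_config_log`, the `strict` flag, and the set of known parameter names are all hypothetical.

```python
import errno


def process_config_log(records, known_params, strict):
    """Replay "key=value" records in order.

    Hypothetical model, not Lustre code. Returns (rc, state, skipped):
    rc is 0 on success or -errno.EINVAL if strict mode hits an unknown key,
    state holds the parameters applied so far, skipped lists ignored records.
    """
    state = {}
    skipped = []
    for record in records:
        key, _, value = record.partition("=")
        if key not in known_params:
            if strict:
                # Abort the whole replay, so the mount would fail.
                return -errno.EINVAL, state, skipped
            # Tolerant mode: remember the record and keep going.
            skipped.append(record)
            continue
        state[key] = value
    return 0, state, skipped


if __name__ == "__main__":
    # Parameter names other than jobid_var are illustrative assumptions.
    known = {"timeout", "ldlm_timeout"}
    log = ["timeout=100", "jobid_var=procname_uid", "ldlm_timeout=20"]
    print(process_config_log(log, known, strict=True))
    print(process_config_log(log, known, strict=False))
```

In the tolerant case the later record is still applied, which is the property that matters for a rolling downgrade: a newer MGS can carry parameters the older client has never heard of without blocking the mount.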