[LU-5692] Lustre 2.5.3 client mounting Lustre 2.5.3 failed Created: 30/Sep/14  Updated: 25/Mar/16  Resolved: 25/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Haisong Cai (Inactive) Assignee: Jian Yu
Resolution: Done Votes: 0
Labels: sdsc
Environment:

Linux lustre-mds-8-0.local 2.6.32-431.23.3.el6_lustre.x86_64 #1 SMP Thu Aug 28 20:20:13 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux


Attachments: File lustre-mds-8-1_debug_kernel.gz     File lustre-mds-8-1_dmesg.gz     File lustre-mds-8-1_messages.gz     File oasis-rhino_messages.gz     JPEG File screendump.jpg    
Severity: 3
Rank (Obsolete): 15934

 Description   

We have a Lustre 2.4.2 file-system. It was upgraded from 1.8.7 without reformatting MDT/OST.

Recently, we decided to upgrade it to 2.5.3.

Prior to this upgrade, the file-system had clients running 2.4.2 and 2.5.3 at different times, all without problems. There are only 4 clients.

Yesterday, we began upgrading the servers to 2.5.3, MDS first. A client running 2.4.2 hung for 15+ minutes. We then rebooted the client, and the mount still hung.

We then tried to mount with two 2.5.3 clients, and both of them crashed.

At this point, only the MDS had been upgraded to 2.5.3; the OSSs were still running 2.4.2.

I am attaching MDS logs here.

Please advise.



 Comments   
Comment by Peter Jones [ 01/Oct/14 ]

Yu, Jian

Could you please assist with this issue?

Thanks

Peter

Comment by Andreas Dilger [ 03/Oct/14 ]

We do not test interoperability running MDS and OSS with different versions. Is there a particular reason you didn't upgrade the OSS at the same time? While that may not relate directly to your client problem, it introduces potential problems that could be easily avoided.
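
A minimal sketch of a pre-mount check for the mixed-version server state described above. The host names and the `find_version_mix` helper are hypothetical; the inventory is filled in by hand here and is not gathered from any Lustre tool.

```python
from collections import defaultdict


def find_version_mix(servers):
    """Group server names by Lustre version.

    Returns (groups, mixed), where mixed is True when more than one
    version is in use across the servers.
    """
    by_version = defaultdict(list)
    for name, version in servers.items():
        by_version[version].append(name)
    return dict(by_version), len(by_version) > 1


# Hypothetical inventory mirroring the state reported in this ticket:
# MDS already on 2.5.3, OSSs still on 2.4.2.
servers = {
    "mds-0": "2.5.3",
    "oss-0": "2.4.2",
    "oss-1": "2.4.2",
}

groups, mixed = find_version_mix(servers)
if mixed:
    print("WARNING: servers run mixed Lustre versions:", groups)
```

A check like this only flags the untested configuration before clients mount; it does not say whether a given version pair is actually compatible.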

Comment by Haisong Cai (Inactive) [ 03/Oct/14 ]

Hi Andreas,

The reason was that we are doing a rolling upgrade. Eventually, we will upgrade every server to the same version of Lustre.
But some of the file-systems have as many as 32 OSSs and 128 OSTs, so upgrading them all will take some time.

thanks,
Haisong

Comment by Jian Yu [ 07/Oct/14 ]

Hi Haisong,

Could you please gather the vmcore crash dump file for the Lustre 2.5.3 client and upload it to the "uploads/LU-5692" directory on ftp.hpdd.intel.com? Thanks!

Comment by Jian Yu [ 07/Oct/14 ]

I did an experiment on a small test cluster (2 Clients, 1 MGS/MDS, 1 OSS) with the following steps:

1. set up and start Lustre 1.8.8-wc1 filesystem
2. shutdown the entire Lustre 1.8.8-wc1 filesystem
3. clean upgrade all Lustre servers and clients at once to Lustre 2.4.2
4. start the entire Lustre 2.4.2 filesystem
5. run IOR and tar applications on the two live Lustre 2.4.2 Clients
6. rolling upgrade MGS/MDS to Lustre 2.5.3
7. rolling upgrade one Client to Lustre 2.5.3
8. rolling upgrade the other Client to Lustre 2.5.3
9. run IOR and tar applications on the two live Lustre 2.5.3 Clients
10. rolling upgrade OSS to Lustre 2.5.3
11. run IOR and Simul tests on the upgraded Lustre 2.5.3 filesystem

I tried to provision Lustre 1.8.7-wc1 servers but hit a kernel panic caused by the isci module, so I switched to Lustre 1.8.8-wc1. All of the above steps passed.
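
The rolling-upgrade part of the steps above (steps 6 to 10) can be modeled as a short sketch that records the cluster state after each single-node upgrade. The node names and the `rolling_upgrade` helper are hypothetical and only illustrate the order used in the test; no Lustre commands are run.

```python
def rolling_upgrade(nodes, order, target):
    """Yield (upgraded_node, snapshot) after each single-node upgrade.

    nodes  -- mapping of node name to its current Lustre version
    order  -- node names in the order they are upgraded
    target -- version each node is upgraded to
    """
    state = dict(nodes)  # work on a copy so the caller's dict is untouched
    for name in order:
        state[name] = target
        yield name, dict(state)


# Hypothetical names for the test cluster: 1 MGS/MDS, 2 clients, 1 OSS.
nodes = {
    "mgs-mds": "2.4.2",
    "client-1": "2.4.2",
    "client-2": "2.4.2",
    "oss": "2.4.2",
}
# Same order as in the test: MGS/MDS, then both clients, then the OSS.
order = ["mgs-mds", "client-1", "client-2", "oss"]

steps = list(rolling_upgrade(nodes, order, "2.5.3"))
for name, snapshot in steps:
    print(f"after upgrading {name}: {snapshot}")
```

Each intermediate snapshot is a mixed-version state the filesystem has to tolerate, which is why the test ran applications on the clients between upgrades.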

Comment by Haisong Cai (Inactive) [ 07/Oct/14 ]

Thank you for the information.

Our file-system was upgraded from 1.8.7. I guess we are out of luck with the rolling upgrade.

By the way, we have reconfigured the failed file-system for another purpose, so we are unable to produce a vmcore dump for the time being.

Haisong

Comment by John Fuchs-Chesney (Inactive) [ 25/Mar/16 ]

Hello Haisong,

We are marking this one as resolved/done.

If you need any further work done on this ticket, please let us know and we can re-open it.

Thanks,
~ jfc.
