
Lustre 2.5.3 client mounting Lustre 2.5.3 failed

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.3
    • Environment: Linux lustre-mds-8-0.local 2.6.32-431.23.3.el6_lustre.x86_64 #1 SMP Thu Aug 28 20:20:13 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
    • Severity: 3
    • 15934

    Description

      We have a Lustre 2.4.2 file-system. It was upgraded from 1.8.7 without reformatting MDT/OST.

      Recently, we decided to upgrade it to 2.5.3.

      Prior to this upgrade, the file-system had clients running 2.4.2 and 2.5.3 at different times without problems. There are only 4 clients.

      Yesterday, we upgraded the servers to 2.5.3, starting with the MDS. A client running 2.4.2 hung for 15+ minutes. We then rebooted the client and the mount still hung.

      We then tried to mount with two 2.5.3 clients and both of them crashed.

      At this point, only the MDS had been upgraded to 2.5.3 and the rest of the OSSs were still running 2.4.2.

      I am attaching MDS logs here.

      Please advise.
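
      A minimal sketch of the checks behind this report, assuming a placeholder MGS NID ("lustre-mds-8-0@tcp0"), fsname ("lustre") and mount point; none of these are taken from the actual cluster configuration:

        # Print the running Lustre version on each node (client, MDS, OSS)
        # to confirm which nodes are on 2.4.2 and which are on 2.5.3.
        lctl get_param version

        # Client mount against the MGS; NID, fsname and mount point are placeholders.
        mount -t lustre lustre-mds-8-0@tcp0:/lustre /mnt/lustre

        # If the mount hangs, inspect the client's MDC import state history.
        lctl get_param mdc.*.state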

      Attachments

        1. lustre-mds-8-1_debug_kernel.gz
          706 kB
        2. lustre-mds-8-1_dmesg.gz
          20 kB
        3. lustre-mds-8-1_messages.gz
          1.0 kB
        4. screendump.jpg
          2.98 MB
        5. oasis-rhino_messages.gz
          476 kB

        Activity

          [LU-5692] Lustre 2.5.3 client mounting Lustre 2.5.3 failed

          jfc John Fuchs-Chesney (Inactive) added a comment -

          Hello Haisong,

          We are marking this one as resolved/done.

          If you need any further work done on this ticket, please let us know and we can re-open it.

          Thanks,
          ~ jfc.

          haisong Haisong Cai (Inactive) added a comment -

          Thank you for the information.

          Our file-system was upgraded from 1.8.7. I guess we are out of luck for a rolling upgrade.

          By the way, we have reconfigured the failed file-system for another purpose, so we are unable to produce a vmcore dump for the time being.

          Haisong
          yujian Jian Yu added a comment -

          I did an experiment on a small test cluster (2 Clients, 1 MGS/MDS, 1 OSS) with the following steps:

          1. set up and start Lustre 1.8.8-wc1 filesystem
          2. shut down the entire Lustre 1.8.8-wc1 filesystem
          3. clean upgrade all Lustre servers and clients at once to Lustre 2.4.2
          4. start the entire Lustre 2.4.2 filesystem
          5. run IOR and tar applications on the two live Lustre 2.4.2 Clients
          6. rolling upgrade MGS/MDS to Lustre 2.5.3
          7. rolling upgrade one Client to Lustre 2.5.3
          8. rolling upgrade the other Client to Lustre 2.5.3
          9. run IOR and tar applications on the two live Lustre 2.5.3 Clients
          10. rolling upgrade OSS to Lustre 2.5.3
          11. run IOR and Simul tests on the upgraded Lustre 2.5.3 filesystem
          

          I tried to provision Lustre 1.8.7-wc1 servers but hit a kernel panic caused by the isci module, so I switched to Lustre 1.8.8-wc1. All of the above steps passed testing.

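          As a client-side sketch only of what one rolling-upgrade step above might look like on EL6 (the RPM names, the MGS NID "lustre-mds-8-0@tcp0", the fsname and the mount point are assumptions, not taken from this cluster):

            # On the client being upgraded: unmount, unload the Lustre modules,
            # upgrade the client packages, then remount.
            umount /mnt/lustre
            lustre_rmmod
            rpm -Uvh lustre-client-2.5.3-*.rpm lustre-client-modules-2.5.3-*.rpm
            mount -t lustre lustre-mds-8-0@tcp0:/lustre /mnt/lustre

            # A minimal IOR write/read pass from a live client (steps 5 and 9 above).
            mpirun -np 2 ior -a POSIX -w -r -o /mnt/lustre/ior_testfile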
          yujian Jian Yu added a comment -

          Hi Haisong,

          Could you please gather the vmcore crash dump file for the Lustre 2.5.3 client and upload it to the "uploads/LU-5692" directory on ftp.hpdd.intel.com? Thanks!

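          For reference, capturing a vmcore on an EL6 client generally requires kdump to be enabled before the crash; a minimal sketch follows (the crashkernel setting and local dump path are examples, and the upload path is the one requested in the comment above):

            # Reserve memory for the crash kernel and enable kdump (reboot required).
            grubby --update-kernel=ALL --args="crashkernel=auto"
            chkconfig kdump on
            service kdump start

            # After the client crashes again, the dump lands under /var/crash/.
            # Upload it to the requested directory on ftp.hpdd.intel.com.
            lftp -e "cd uploads/LU-5692; put /var/crash/<host-timestamp>/vmcore; bye" ftp.hpdd.intel.com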

          haisong Haisong Cai (Inactive) added a comment -

          Hi Andreas,

          The reason was to allow a rolling upgrade, so eventually we will upgrade every server to the same version of Lustre.
          But some of the file-systems have as many as 32 OSSs and 128 OSTs, so upgrading them all will take some time.

          thanks,
          Haisong

          adilger Andreas Dilger added a comment -

          We do not test interoperability running MDS and OSS with different versions. Is there a particular reason you didn't upgrade the OSS at the same time? While that may not relate directly to your client problem, it introduces potential problems that could be easily avoided.
          pjones Peter Jones added a comment -

          Yu, Jian

          Could you please assist with this issue?

          Thanks

          Peter


          People

            Assignee: yujian Jian Yu
            Reporter: haisong Haisong Cai (Inactive)
            Votes: 0
            Watchers: 7
