Clean downgrade from 2.7.0 to 2.6.0 failed: fail to init namespace LFSCK component: rc = -5

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.7.0, Lustre 2.8.0
    • Lustre 2.7.0
    • None
    • 3
    • 17682

    Description

      1. Formatted and set up Lustre 2.6.0, then did a clean upgrade of the system to 2.7.0; this succeeded.

      2. Downgraded the system to 2.6.0; mounting the system failed.

      MDS shows:

      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
      LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
      LustreError: 33604:0:(lfsck_namespace.c:1786:lfsck_namespace_setup()) lustre-MDT0000-osd: fail to init namespace LFSCK component: rc = -5
      LustreError: 33604:0:(mdd_device.c:1051:mdd_prepare()) lustre-MDD0000: failed to initialize lfsck: rc = -5
      LustreError: 33604:0:(obd_mount_server.c:1769:server_fill_super()) Unable to start targets: -5
      LustreError: 33712:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x1010000:0x0], rc:-5
      Lustre: Failing over lustre-MDT0000
      LustreError: 33712:0:(qsd_reint.c:54:qsd_reint_completion()) Skipped 1 previous similar message
      Lustre: 33604:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324784/real 1425324784]  req@ffff88081d695400 x1494561337639040/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1425324790 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: server umount lustre-MDT0000 complete
      LustreError: 33604:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-5)
      Lustre: DEBUG MARKER: Using TIMEOUT=20
      Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
      LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.
      LustreError: 34030:0:(lfsck_namespace.c:1786:lfsck_namespace_setup()) lustre-MDT0000-osd: fail to init namespace LFSCK component: rc = -5
      LustreError: 34030:0:(mdd_device.c:1051:mdd_prepare()) lustre-MDD0000: failed to initialize lfsck: rc = -5
      LustreError: 34030:0:(obd_mount_server.c:1769:server_fill_super()) Unable to start targets: -5
      LustreError: 34125:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
      LustreError: 34126:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x1010000:0x0], rc:-5
      Lustre: Failing over lustre-MDT0000
      Lustre: 34030:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324879/real 1425324879]  req@ffff8804187e4800 x1494561337639168/t0(0) o251->MGC10.2.4.47@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1425324885 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: server umount lustre-MDT0000 complete
      LustreError: 34030:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-5)
      Lustre: DEBUG MARKER: Using TIMEOUT=20
      Lustre: DEBUG MARKER: upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted
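To make the failure chain in the MDS log easier to follow, here is a minimal sketch of a parser for the `LustreError: pid:n:(file:line:func()) message` lines shown above. It is not part of Lustre or its test suite; the function name `parse_lustre_error`, the regular expressions, and the small errno table are assumptions made for illustration. The table lists only the Linux errno values that appear in this report (5, 11, 19).

```python
import re

# Linux errno names for the codes seen in this report (assumed Linux values).
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN", 19: "ENODEV"}

# Matches lines such as:
# LustreError: 33604:0:(lfsck_namespace.c:1786:lfsck_namespace_setup()) msg: rc = -5
LINE_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)"
)
# Return code at the end of the message, written as "rc = -5" or "rc:-5".
RC_RE = re.compile(r"rc\s*[=:]\s*(-?\d+)\s*$")


def parse_lustre_error(text):
    """Return a dict describing one LustreError line, or None if it does not match."""
    m = LINE_RE.search(text)
    if m is None:
        return None
    rec = {
        "pid": int(m.group("pid")),
        "file": m.group("file"),
        "line": int(m.group("line")),
        "func": m.group("func"),
        "msg": m.group("msg"),
        "rc": None,
        "errno": None,
    }
    rc = RC_RE.search(rec["msg"])
    if rc:
        rec["rc"] = int(rc.group(1))
        if rec["rc"] < 0:
            # Kernel code reports failures as negated errno values.
            rec["errno"] = LINUX_ERRNO.get(-rec["rc"])
    return rec
```

Applied to the first `lfsck_namespace_setup()` line, this yields the source file, line 1786, and `rc = -5`, which the table maps to `EIO`. Lines in the `LustreError: 11-0:` form do not match this pattern and return `None`.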
      

      client shows:

      Setup mgs, mdt, osts
      Starting mds1: -o user_xattr,acl  /dev/sdb1 /mnt/mds1
      Start of /dev/sdb1 on mds1 failed 5
      Starting ost1:   /dev/sdb1 /mnt/ost1
      Start of /dev/sdb1 on ost1 failed 19
      Starting client: onyx-28: -o user_xattr,flock onyx-25@tcp:/lustre /mnt/lustre
      Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324792/real 1425324792]  req@ffff8804364bec00 x1494557626728452/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324797 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      LustreError: 76865:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8804364be800 x1494557626728456/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 76878:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8804364be000 x1494557626728464/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324817/real 1425324817]  req@ffff8804364be400 x1494557626728468/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324827 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      LustreError: 76865:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8804364be800 x1494557626728460/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Lustre: Unmounted lustre-client
      LustreError: 76865:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-5)
      Starting client onyx-23.onyx.hpdd.intel.com,onyx-27,onyx-28: -o user_xattr,flock onyx-25@tcp:/lustre /mnt/lustre
      Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324829/real 1425324829]  req@ffff8804364be000 x1494557626728472/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324834 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      LustreError: 76949:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8804364bec00 x1494557626728476/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 76962:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8804364be400 x1494557626728484/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      Lustre: 74562:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1425324854/real 1425324854]  req@ffff8804364be000 x1494557626728488/t0(0) o250->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 400/544 e 0 to 1 dl 1425324864 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      LustreError: 76949:0:(client.c:1083:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8804364bec00 x1494557626728480/t0(0) o101->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 15c-8: MGC10.2.4.47@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Lustre: Unmounted lustre-client
      LustreError: 76949:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-5)
      Using TIMEOUT=20
      Lustre: DEBUG MARKER: Using TIMEOUT=20
      jobstats not supported by server
      disable quota as required
       upgrade-downgrade : @@@@@@ FAIL: NAME=ncli not mounted 
      

          Activity

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13945/
            Subject: LU-6321 lfsck: make lfsck_namespace trace file as index
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ca8067522d6a6928e33dc8d34d5ad208c7eb535f

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13946/
            Subject: LU-6321 lfsck: make lfsck_namespace trace file as index
            Project: fs/lustre-release
            Branch: b2_7
            Current Patch Set:
            Commit: 1ece3b3ffdf3dc112be19dd0ee2563b3e22d4b57

            yong.fan nasf (Inactive) added a comment -

            Thanks Sarah!

            sarah Sarah Liu added a comment -

            Fan Yong, the patch works!

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13946
            Subject: LU-6321 lfsck: make lfsck_namespace trace file as index
            Project: fs/lustre-release
            Branch: b2_7
            Current Patch Set: 1
            Commit: b6ef06c39f4c0dfff1e22f2ab6d805b816a08857

            yong.fan nasf (Inactive) added a comment -

            Sarah, would you please help verify the above patch when you have time? Thanks!

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13945
            Subject: LU-6321 lfsck: make lfsck_namespace trace file as index
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2398502aaa7c4e9153f9c89ecb3086169daae81d

            yong.fan nasf (Inactive) added a comment -

            Originally, the "lfsck_namespace" file stored both the namespace LFSCK statistics and the FIDs to be double scanned. To improve namespace LFSCK performance, since Lustre 2.7 we split the single trace file into multiple files named "lfsck_namespace_xx". After that change, the original "lfsck_namespace" file only needs to record the namespace LFSCK statistics, so we made it a regular file, NOT an index file. After a downgrade to Lustre 2.6, the old LFSCK expects an index trace file, finds a regular file instead, and fails.

            Two solutions for that:
            1) On Lustre 2.6, manually remove the old "lfsck_namespace" file with the target mounted as ldiskfs.
            2) Patch the Lustre 2.7 and master code to make "lfsck_namespace" an index file.

            I prefer the latter solution.
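The file-type mismatch described in this comment can be pictured with a small toy model. This is only a sketch under stated assumptions: `TraceFile`, `FileKind`, `trace_file_written_by_2_7`, and `setup_namespace_lfsck_2_6` are hypothetical names invented here, not Lustre functions, and the real check lives in kernel C code that is not shown in this ticket. The model assumes only what the comment and logs state: 2.7 leaves "lfsck_namespace" as a regular file, 2.6 expects an index file, and the setup fails with rc = -5.

```python
from dataclasses import dataclass
from enum import Enum

EIO = 5  # Linux errno value; the MDS log reports rc = -5


class FileKind(Enum):
    REGULAR = "regular"
    INDEX = "index"


@dataclass
class TraceFile:
    name: str
    kind: FileKind


def trace_file_written_by_2_7(patched):
    """Hypothetical: what 2.7 leaves on disk, without or with the proposed patch."""
    kind = FileKind.INDEX if patched else FileKind.REGULAR
    return TraceFile("lfsck_namespace", kind)


def setup_namespace_lfsck_2_6(trace):
    """Hypothetical stand-in for the 2.6 setup path, which expects an index file."""
    if trace.kind is not FileKind.INDEX:
        return -EIO  # mirrors "fail to init namespace LFSCK component: rc = -5"
    return 0


# Unpatched 2.7 followed by a downgrade: 2.6 rejects the regular file.
print(setup_namespace_lfsck_2_6(trace_file_written_by_2_7(patched=False)))
# Patched 2.7 (solution 2): 2.6 finds the index file it expects.
print(setup_namespace_lfsck_2_6(trace_file_written_by_2_7(patched=True)))
```

In this model, solution 2 removes the mismatch at the source, so no manual step is needed on the downgraded 2.6 system.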

            adilger Andreas Dilger added a comment -

            This may relate to the "LU-5820 lfsck: use multiple namespace LFSCK trace files" patch http://review.whamcloud.com/12809 or the "LU-5707 lfsck: store namespace LFSCK statistics info in new EA" patch http://review.whamcloud.com/12321, which were supposed to allow the LFSCK code to ignore the new namespace log file and create a new one?
            green Oleg Drokin added a comment -

            Getting an MDS-side debug log with an increased debug level would be great, to better understand what failed and why.

            People

              yong.fan nasf (Inactive)
              sarah Sarah Liu