[LU-2606] Lustre 2.4 unable to start on 2.1 disks

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.4.0
    • 3
    • 6079

    Description

      
      

      [ 363.060205] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=off. Opts:
      [ 363.108765] Lustre: MGC192.168.69.5@tcp: Reactivating import
      [ 364.599695] Lustre: lustre-MDT0000: used disk, loading
      [ 364.611720] Lustre: 12041:0:(mdt_lproc.c:418:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /Users/shadow/work/lustre/work/BUGS/MRP-509/lustre.13/lustre/utils/l_getidentity
      [ 364.643496] Lustre: lustre-MDT0000: Temporarily refusing client connection from 0@lo
      [ 364.647787] LustreError: 11-0: an error occurred while communicating with 0@lo. The mds_connect operation failed with -11
      [ 364.650814] Lustre: lustre-MDT0000: No usr space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
      [ 364.656135] Lustre: lustre-MDT0000: No grp space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
      [ 364.875511] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=off. Opts:
      [ 365.137956] LustreError: 12167:0:(ofd_fs.c:254:ofd_groups_init()) groups file is corrupted? size = 4
      [ 365.144003] LustreError: 12167:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-5)
      [ 365.146446] LustreError: 12167:0:(obd_config.c:1546:class_config_llog_handler()) MGC192.168.69.5@tcp: cfg command failed: rc = -5
      [ 365.150286] Lustre: cmd=cf003 0:lustre-OST0000 1:dev 2:0 3:f
      [ 365.152806] LustreError: 15c-8: MGC192.168.69.5@tcp: The configuration from log 'lustre-OST0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [ 365.162154] LustreError: 12130:0:(obd_mount.c:1848:server_start_targets()) failed to start server lustre-OST0000: -5
      [ 365.165806] LustreError: 12130:0:(obd_mount.c:2400:server_fill_super()) Unable to start targets: -5
      [ 365.168976] LustreError: 12130:0:(obd_mount.c:1352:lustre_disconnect_osp()) Can't end config log lustre
      [ 365.172533] LustreError: 12130:0:(obd_mount.c:2114:server_put_super()) lustre-OST0000: failed to disconnect osp-on-ost (rc=-2)!
      [ 365.177472] LustreError: 12130:0:(obd_config.c:619:class_cleanup()) Device 13 not setup
      [ 365.180974] LustreError: 12130:0:(obd_mount.c:1420:lustre_stop_osp()) Can not find osp-on-ost lustre-MDT0000-osp-OST0000
      [ 365.184942] LustreError: 12130:0:(obd_mount.c:2159:server_put_super()) lustre-OST0000: Fail to stop osp-on-ost!
      [ 365.191390] Lustre: server umount lustre-OST0000 complete
      [ 365.193673] LustreError: 12130:0:(obd_mount.c:2988:lustre_fill_super()) Unable to mount /dev/loop1 (-5)
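
      The ofd_groups_init() error above is what fails the OST start. A quick way to see what a 2.1-formatted OST actually has on disk is to mount the backend as plain ldiskfs and look at the LAST_GROUP file directly. This is only a sketch: the loop device comes from the log above, the /mnt/ost mountpoint is arbitrary, and the LAST_GROUP path follows the layout shown in Di Wang's comment further down.

          # inspect the OST backend read-only as ldiskfs (mountpoint is arbitrary)
          mkdir -p /mnt/ost
          mount -t ldiskfs -o ro /dev/loop1 /mnt/ost
          stat -c 'size=%s' /mnt/ost/LAST_GROUP   # the failing 2.1 disk reports size=4 here
          od -An -tx1 /mnt/ost/LAST_GROUP         # raw contents of the last-group file
          umount /mnt/ost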

      Attachments

        Issue Links

          Activity

            [LU-2606] Lustre 2.4 unable to start on 2.1 disks

            gerrit Gerrit Updater added a comment -
            Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/15731
            Subject: LU-2606 osp: add procfs values for OST reserved size
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a69eed85d902e2b15a960430e4652fbcc3c0bc33

            adilger Andreas Dilger added a comment -
            Closing bug per Di's comment that the fix has been merged.

            jlevi Jodi Levi (Inactive) added a comment -
            Are there any updates on the test of this patch?

            adilger Andreas Dilger added a comment -
            Shadow, any update from your testing of the patch?

            shadow Alexey Lyashkov added a comment -
            I will retest today.

            di.wang Di Wang added a comment -

            This patch http://review.whamcloud.com/#change,4325 (already merged) should fix this problem. I will update the test image later.

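            For anyone verifying the fix against a real 2.1 disk, the change can be pulled from Gerrit and applied to a local lustre-release tree. A minimal sketch, assuming patch set 1 of change 4325 and anonymous HTTP access to the review server (Gerrit publishes changes under refs/changes/<last two digits>/<change number>/<patch set>):

                # fetch change 4325 and put it on a local test branch
                git clone git://git.whamcloud.com/fs/lustre-release.git
                cd lustre-release
                git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/25/4325/1
                git checkout -b lu-2606-test FETCH_HEAD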

            adilger Andreas Dilger added a comment -
            Di, could you please make a patch for this, and also fix the test image at the same time.

            di.wang Di Wang added a comment -

            Yes, we would not need the group file anymore after that patch has landed. I just checked our disk2_1-ldiskfs.tar.bz2 (under tests), and it seems the LAST_GROUP size is zero, which is why conf-sanity.sh did not catch this error.

            [root@testnode tests]# mount -t ldiskfs -o loop ./ost /mnt/mds1
            [root@testnode tests]# ls /mnt/mds1/
            CONFIGS health_check LAST_GROUP last_rcvd lost+found O
            [root@testnode tests]# ls /mnt/mds1/LAST_GROUP -l
            -rwx------ 1 root root 0 Mar 14 2012 /mnt/mds1/LAST_GROUP
            [root@testnode tests]# od -x /mnt/mds1/LAST_GROUP
            0000000
            [root@testnode tests]# stat /mnt/mds1/LAST_GROUP
            File: `/mnt/mds1/LAST_GROUP'
            Size: 0 Blocks: 0 IO Block: 4096 regular empty file
            Device: 700h/1792d Inode: 17 Links: 1
            Access: (0700/-rwx------) Uid: ( 0/ root) Gid: ( 0/ root)
            Access: 2013-10-28 05:59:59.362696108 -0700
            Modify: 2012-03-14 23:16:54.069859989 -0700
            Change: 2012-03-14 23:16:54.069859989 -0700

            Apparently, we need to make a better disk image here.

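            If the sample image is regenerated, LAST_GROUP could be made to match what a real 2.1 OST carries (a 4-byte file, per the "size = 4" in the description) so that conf-sanity exercises the same code path. A rough sketch, reusing the mount commands from the transcript above; the 4-byte zero record is only an assumed stand-in for the real 2.1 contents:

                # give the test image's LAST_GROUP the 4-byte size seen on real 2.1 OSTs
                mount -t ldiskfs -o loop ./ost /mnt/mds1
                dd if=/dev/zero of=/mnt/mds1/LAST_GROUP bs=4 count=1   # assumed 4-byte "last group = 0" record
                stat -c 'size=%s' /mnt/mds1/LAST_GROUP                 # expect size=4 now
                umount /mnt/mds1
                # then repack disk2_1-ldiskfs.tar.bz2 from the updated files as before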

            bzzz Alex Zhuravlev added a comment -
            A bit unexpected, because we do have a test for the case in conf-sanity.sh.
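            Once the image carries a non-empty LAST_GROUP, that upgrade check can be re-run on its own from the test directory. A sketch, assuming test_32a is the old-image upgrade case in this tree and using the test framework's standard ONLY= filter:

                # run just the image-upgrade test from lustre/tests
                cd lustre/tests
                ONLY=32a sh conf-sanity.sh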

            People

              Assignee: di.wang Di Wang
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: