  Lustre / LU-2606

Lustre 2.4 unable to start on 2.1 disks

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version: Lustre 2.4.0
    • Fix Version: Lustre 2.4.0
    • Severity: 3
    • 6079

    Description

      
      

      [ 363.060205] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=off. Opts:
      [ 363.108765] Lustre: MGC192.168.69.5@tcp: Reactivating import
      [ 364.599695] Lustre: lustre-MDT0000: used disk, loading
      [ 364.611720] Lustre: 12041:0:(mdt_lproc.c:418:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /Users/shadow/work/lustre/work/BUGS/MRP-509/lustre.13/lustre/utils/l_getidentity
      [ 364.643496] Lustre: lustre-MDT0000: Temporarily refusing client connection from 0@lo
      [ 364.647787] LustreError: 11-0: an error occurred while communicating with 0@lo. The mds_connect operation failed with -11
      [ 364.650814] Lustre: lustre-MDT0000: No usr space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
      [ 364.656135] Lustre: lustre-MDT0000: No grp space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
      [ 364.875511] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=off. Opts:
      [ 365.137956] LustreError: 12167:0:(ofd_fs.c:254:ofd_groups_init()) groups file is corrupted? size = 4
      [ 365.144003] LustreError: 12167:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-5)
      [ 365.146446] LustreError: 12167:0:(obd_config.c:1546:class_config_llog_handler()) MGC192.168.69.5@tcp: cfg command failed: rc = -5
      [ 365.150286] Lustre: cmd=cf003 0:lustre-OST0000 1:dev 2:0 3:f
      [ 365.152806] LustreError: 15c-8: MGC192.168.69.5@tcp: The configuration from log 'lustre-OST0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [ 365.162154] LustreError: 12130:0:(obd_mount.c:1848:server_start_targets()) failed to start server lustre-OST0000: -5
      [ 365.165806] LustreError: 12130:0:(obd_mount.c:2400:server_fill_super()) Unable to start targets: -5
      [ 365.168976] LustreError: 12130:0:(obd_mount.c:1352:lustre_disconnect_osp()) Can't end config log lustre
      [ 365.172533] LustreError: 12130:0:(obd_mount.c:2114:server_put_super()) lustre-OST0000: failed to disconnect osp-on-ost (rc=-2)!
      [ 365.177472] LustreError: 12130:0:(obd_config.c:619:class_cleanup()) Device 13 not setup
      [ 365.180974] LustreError: 12130:0:(obd_mount.c:1420:lustre_stop_osp()) Can not find osp-on-ost lustre-MDT0000-osp-OST0000
      [ 365.184942] LustreError: 12130:0:(obd_mount.c:2159:server_put_super()) lustre-OST0000: Fail to stop osp-on-ost!
      [ 365.191390] Lustre: server umount lustre-OST0000 complete
      [ 365.193673] LustreError: 12130:0:(obd_mount.c:2988:lustre_fill_super()) Unable to mount /dev/loop1 (-5)

    Attachments

    Issue Links

    Activity

            [LU-2606] Lustre 2.4 unable to start on 2.1 disks

            gerrit Gerrit Updater added a comment -

            Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/15731
            Subject: LU-2606 osp: add procfs values for OST reserved size
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a69eed85d902e2b15a960430e4652fbcc3c0bc33

            adilger Andreas Dilger added a comment - Closing bug per Di's comment that the fix has been merged.

            jlevi Jodi Levi (Inactive) added a comment - Are there any updates on the test of this patch?

            adilger Andreas Dilger added a comment - Shadow, any update from your testing of the patch?

            shadow Alexey Lyashkov added a comment - I will retest today.
            di.wang Di Wang added a comment -

            This patch http://review.whamcloud.com/#change,4325 (already merged) should fix this problem. I will update the test image later.

            adilger Andreas Dilger added a comment - Di, could you please make a patch for this, and also fix the test image at the same time.
            di.wang Di Wang added a comment -

            Yes, we will not need the group file anymore after that patch lands. I just checked our disk2_1-ldiskfs.tar.bz2 (under tests), and it seems the
            LAST_GROUP file's size is zero, which is why conf-sanity.sh did not catch this error.

            [root@testnode tests]# mount -t ldiskfs -o loop ./ost /mnt/mds1
            [root@testnode tests]# ls /mnt/mds1/
            CONFIGS  health_check  LAST_GROUP  last_rcvd  lost+found  O
            [root@testnode tests]# ls /mnt/mds1/LAST_GROUP -l
            -rwx------ 1 root root 0 Mar 14  2012 /mnt/mds1/LAST_GROUP
            [root@testnode tests]# od -x /mnt/mds1/LAST_GROUP
            0000000
            [root@testnode tests]# stat /mnt/mds1/LAST_GROUP
              File: `/mnt/mds1/LAST_GROUP'
              Size: 0         Blocks: 0          IO Block: 4096   regular empty file
            Device: 700h/1792d  Inode: 17  Links: 1
            Access: (0700/-rwx------)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2013-10-28 05:59:59.362696108 -0700
            Modify: 2012-03-14 23:16:54.069859989 -0700
            Change: 2012-03-14 23:16:54.069859989 -0700

            Apparently, we need to make a better disk image here.
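
            Below is a minimal sketch of why the 0-byte file in the test image slips past the size check while the 4-byte file on a real 2.1 disk does not. The function is hypothetical, modeled only on the "groups file is corrupted? size = 4" message, not the actual ofd_groups_init() code:

                #include <stdio.h>
                #include <stdint.h>

                /* Hypothetical model of the LAST_GROUP size check: an empty file
                 * means "no groups recorded yet", a full __u64 record parses
                 * cleanly, and anything else looks like corruption, which is
                 * where the 4-byte __u32 written by Lustre 2.1 lands. */
                static const char *check_last_group(size_t size)
                {
                        if (size == 0)
                                return "empty: start from group 0 (test image case)";
                        if (size == sizeof(uint64_t))
                                return "valid __u64 record (2.4 on-disk format)";
                        return "groups file is corrupted?";
                }

                int main(void)
                {
                        size_t sizes[] = { 0, 4, 8 };

                        for (int i = 0; i < 3; i++)
                                printf("size = %zu -> %s\n",
                                       sizes[i], check_last_group(sizes[i]));
                        return 0;
                }

            Fed the three interesting sizes, the sketch accepts 0 and 8 but flags 4, matching Di's observation that the empty test image could never reproduce the failure.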

            bzzz Alex Zhuravlev added a comment - a bit unexpected because we do have a test for the case in conf-sanity.sh

            adilger Andreas Dilger added a comment -

            I think that this problem will be fixed as soon as the next patches in the DNE series land, since the group file is no longer used?

            If that isn't the case, Di, can you make the code able to accept a 4-byte file and just treat it as a __u32 instead of a __u64?
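
            For illustration, a minimal sketch of the compatibility read Andreas suggests: accept a legacy 4-byte __u32 file and widen it, instead of insisting on a __u64. The function name and types are hypothetical, not the actual ofd_fs.c interface, and real code would also byte-swap with le64_to_cpu()/le32_to_cpu():

                #include <errno.h>
                #include <stdint.h>
                #include <string.h>

                /* buf/size are the raw contents of the on-disk last-group file.
                 * Accept the 8-byte (__u64) 2.4 format, the legacy 4-byte
                 * (__u32) 2.1 format, and an empty file; reject anything else. */
                static int last_group_read(const void *buf, size_t size,
                                           uint64_t *group)
                {
                        if (size == sizeof(uint64_t)) {
                                memcpy(group, buf, sizeof(uint64_t));
                        } else if (size == sizeof(uint32_t)) {
                                uint32_t old;

                                memcpy(&old, buf, sizeof(old));
                                *group = old;   /* widen the 2.1 value */
                        } else if (size == 0) {
                                *group = 0;     /* freshly formatted target */
                        } else {
                                return -EINVAL; /* genuinely corrupted */
                        }
                        return 0;
                }

            With a read along these lines, a 2.1 disk's 4-byte LAST_GROUP would mount cleanly instead of failing with -5.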

            People

              Assignee: di.wang Di Wang
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 6
