[LU-2606] Lustre 2.4 unable to start on 2.1 disks

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.4.0
    • 3
    • 6079

    Description

      
      

      [ 363.060205] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=off. Opts:
      [ 363.108765] Lustre: MGC192.168.69.5@tcp: Reactivating import
      [ 364.599695] Lustre: lustre-MDT0000: used disk, loading
      [ 364.611720] Lustre: 12041:0:(mdt_lproc.c:418:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /Users/shadow/work/lustre/work/BUGS/MRP-509/lustre.13/lustre/utils/l_getidentity
      [ 364.643496] Lustre: lustre-MDT0000: Temporarily refusing client connection from 0@lo
      [ 364.647787] LustreError: 11-0: an error occurred while communicating with 0@lo. The mds_connect operation failed with -11
      [ 364.650814] Lustre: lustre-MDT0000: No usr space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
      [ 364.656135] Lustre: lustre-MDT0000: No grp space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
      [ 364.875511] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=off. Opts:
      [ 365.137956] LustreError: 12167:0:(ofd_fs.c:254:ofd_groups_init()) groups file is corrupted? size = 4
      [ 365.144003] LustreError: 12167:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-5)
      [ 365.146446] LustreError: 12167:0:(obd_config.c:1546:class_config_llog_handler()) MGC192.168.69.5@tcp: cfg command failed: rc = -5
      [ 365.150286] Lustre: cmd=cf003 0:lustre-OST0000 1:dev 2:0 3:f
      [ 365.152806] LustreError: 15c-8: MGC192.168.69.5@tcp: The configuration from log 'lustre-OST0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [ 365.162154] LustreError: 12130:0:(obd_mount.c:1848:server_start_targets()) failed to start server lustre-OST0000: -5
      [ 365.165806] LustreError: 12130:0:(obd_mount.c:2400:server_fill_super()) Unable to start targets: -5
      [ 365.168976] LustreError: 12130:0:(obd_mount.c:1352:lustre_disconnect_osp()) Can't end config log lustre
      [ 365.172533] LustreError: 12130:0:(obd_mount.c:2114:server_put_super()) lustre-OST0000: failed to disconnect osp-on-ost (rc=-2)!
      [ 365.177472] LustreError: 12130:0:(obd_config.c:619:class_cleanup()) Device 13 not setup
      [ 365.180974] LustreError: 12130:0:(obd_mount.c:1420:lustre_stop_osp()) Can not find osp-on-ost lustre-MDT0000-osp-OST0000
      [ 365.184942] LustreError: 12130:0:(obd_mount.c:2159:server_put_super()) lustre-OST0000: Fail to stop osp-on-ost!
      [ 365.191390] Lustre: server umount lustre-OST0000 complete
      [ 365.193673] LustreError: 12130:0:(obd_mount.c:2988:lustre_fill_super()) Unable to mount /dev/loop1 (-5)
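
      The ofd_groups_init() error above is what fails the OST start. A quick way to see what a 2.1-formatted OST actually has on disk is to mount the backend as plain ldiskfs and look at the LAST_GROUP file directly. This is only a sketch: the loop device comes from the log above, the /mnt/ost mountpoint is arbitrary, and the LAST_GROUP path follows the layout shown in Di Wang's comment further down.

          # inspect the OST backend read-only as ldiskfs (mountpoint is arbitrary)
          mkdir -p /mnt/ost
          mount -t ldiskfs -o ro /dev/loop1 /mnt/ost
          stat -c 'size=%s' /mnt/ost/LAST_GROUP   # the failing 2.1 disk reports size=4 here
          od -An -tx1 /mnt/ost/LAST_GROUP         # raw contents of the last-group file
          umount /mnt/ost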

      Attachments

        Issue Links

          Activity

            [LU-2606] Lustre 2.4 unable to start on 2.1 disks

            gerrit Gerrit Updater added a comment -
            Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/15731
            Subject: LU-2606 osp: add procfs values for OST reserved size
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: a69eed85d902e2b15a960430e4652fbcc3c0bc33

            adilger Andreas Dilger added a comment -
            Closing bug per Di's comment that the fix has been merged.

            jlevi Jodi Levi (Inactive) added a comment -
            Are there any updates on the test of this patch?

            adilger Andreas Dilger added a comment -
            Shadow, any update from your testing of the patch?

            shadow Alexey Lyashkov added a comment -
            I will retest today.

            di.wang Di Wang added a comment -

            This patch http://review.whamcloud.com/#change,4325 (already merged) should fix this problem. I will update the test image later.

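            For anyone verifying the fix against a real 2.1 disk, the change can be pulled from Gerrit and applied to a local lustre-release tree. A minimal sketch, assuming patch set 1 of change 4325 and anonymous HTTP access to the review server (Gerrit publishes changes under refs/changes/<last two digits>/<change number>/<patch set>):

                # fetch change 4325 and put it on a local test branch
                git clone git://git.whamcloud.com/fs/lustre-release.git
                cd lustre-release
                git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/25/4325/1
                git checkout -b lu-2606-test FETCH_HEAD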

            adilger Andreas Dilger added a comment -
            Di, could you please make a patch for this, and also fix the test image at the same time.

            di.wang Di Wang added a comment -

            Yes, we would not need the group file anymore after that patch has landed. I just checked our disk2_1-ldiskfs.tar.bz2 (under tests), and it seems the LAST_GROUP size is zero, which is why conf-sanity.sh did not catch this error.

            [root@testnode tests]# mount -t ldiskfs -o loop ./ost /mnt/mds1
            [root@testnode tests]# ls /mnt/mds1/
            CONFIGS health_check LAST_GROUP last_rcvd lost+found O
            [root@testnode tests]# ls /mnt/mds1/LAST_GROUP -l
            -rwx------ 1 root root 0 Mar 14 2012 /mnt/mds1/LAST_GROUP
            [root@testnode tests]# od -x /mnt/mds1/LAST_GROUP
            0000000
            [root@testnode tests]# stat /mnt/mds1/LAST_GROUP
            File: `/mnt/mds1/LAST_GROUP'
            Size: 0 Blocks: 0 IO Block: 4096 regular empty file
            Device: 700h/1792d Inode: 17 Links: 1
            Access: (0700/-rwx------) Uid: ( 0/ root) Gid: ( 0/ root)
            Access: 2013-10-28 05:59:59.362696108 -0700
            Modify: 2012-03-14 23:16:54.069859989 -0700
            Change: 2012-03-14 23:16:54.069859989 -0700

            Apparently, we need to make a better disk image here.

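            If the sample image is regenerated, LAST_GROUP could be made to match what a real 2.1 OST carries (a 4-byte file, per the "size = 4" in the description) so that conf-sanity exercises the same code path. A rough sketch, reusing the mount commands from the transcript above; the 4-byte zero record is only an assumed stand-in for the real 2.1 contents:

                # give the test image's LAST_GROUP the 4-byte size seen on real 2.1 OSTs
                mount -t ldiskfs -o loop ./ost /mnt/mds1
                dd if=/dev/zero of=/mnt/mds1/LAST_GROUP bs=4 count=1   # assumed 4-byte "last group = 0" record
                stat -c 'size=%s' /mnt/mds1/LAST_GROUP                 # expect size=4 now
                umount /mnt/mds1
                # then repack disk2_1-ldiskfs.tar.bz2 from the updated files as before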

            bzzz Alex Zhuravlev added a comment -
            A bit unexpected, because we do have a test for the case in conf-sanity.sh.
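            Once the image carries a non-empty LAST_GROUP, that upgrade check can be re-run on its own from the test directory. A sketch, assuming test_32a is the old-image upgrade case in this tree and using the test framework's standard ONLY= filter:

                # run just the image-upgrade test from lustre/tests
                cd lustre/tests
                ONLY=32a sh conf-sanity.sh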

            People

              Assignee: di.wang Di Wang
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: