[LU-2606] Lustre 2.4 is unable to start on 2.1 disks Created: 11/Jan/13  Updated: 27/Jul/15  Resolved: 07/Feb/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Alexey Lyashkov Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: HB

Issue Links:
Related
is related to LU-1445 fid on OST landing Resolved
Severity: 3
Rank (Obsolete): 6079

 Description   

[ 363.060205] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=off. Opts:
[ 363.108765] Lustre: MGC192.168.69.5@tcp: Reactivating import
[ 364.599695] Lustre: lustre-MDT0000: used disk, loading
[ 364.611720] Lustre: 12041:0:(mdt_lproc.c:418:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /Users/shadow/work/lustre/work/BUGS/MRP-509/lustre.13/lustre/utils/l_getidentity
[ 364.643496] Lustre: lustre-MDT0000: Temporarily refusing client connection from 0@lo
[ 364.647787] LustreError: 11-0: an error occurred while communicating with 0@lo. The mds_connect operation failed with -11
[ 364.650814] Lustre: lustre-MDT0000: No usr space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
[ 364.656135] Lustre: lustre-MDT0000: No grp space accounting support. Please consider running tunefs.lustre --quota on an unmounted filesystem to enable quota accounting.
[ 364.875511] LDISKFS-fs (loop1): mounted filesystem with ordered data mode. quota=off. Opts:
[ 365.137956] LustreError: 12167:0:(ofd_fs.c:254:ofd_groups_init()) groups file is corrupted? size = 4
[ 365.144003] LustreError: 12167:0:(obd_config.c:572:class_setup()) setup lustre-OST0000 failed (-5)
[ 365.146446] LustreError: 12167:0:(obd_config.c:1546:class_config_llog_handler()) MGC192.168.69.5@tcp: cfg command failed: rc = -5
[ 365.150286] Lustre: cmd=cf003 0:lustre-OST0000 1:dev 2:0 3:f
[ 365.152806] LustreError: 15c-8: MGC192.168.69.5@tcp: The configuration from log 'lustre-OST0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 365.162154] LustreError: 12130:0:(obd_mount.c:1848:server_start_targets()) failed to start server lustre-OST0000: -5
[ 365.165806] LustreError: 12130:0:(obd_mount.c:2400:server_fill_super()) Unable to start targets: -5
[ 365.168976] LustreError: 12130:0:(obd_mount.c:1352:lustre_disconnect_osp()) Can't end config log lustre
[ 365.172533] LustreError: 12130:0:(obd_mount.c:2114:server_put_super()) lustre-OST0000: failed to disconnect osp-on-ost (rc=-2)!
[ 365.177472] LustreError: 12130:0:(obd_config.c:619:class_cleanup()) Device 13 not setup
[ 365.180974] LustreError: 12130:0:(obd_mount.c:1420:lustre_stop_osp()) Can not find osp-on-ost lustre-MDT0000-osp-OST0000
[ 365.184942] LustreError: 12130:0:(obd_mount.c:2159:server_put_super()) lustre-OST0000: Fail to stop osp-on-ost!
[ 365.191390] Lustre: server umount lustre-OST0000 complete
[ 365.193673] LustreError: 12130:0:(obd_mount.c:2988:lustre_fill_super()) Unable to mount /dev/loop1 (-5)



 Comments   
Comment by Andreas Dilger [ 11/Jan/13 ]

I think that this problem will be fixed as soon as the next patches in the DNE series land, since the group file is no longer used?

If that isn't the case, Di, can you make the code accept a 4-byte file and just treat it as a __u32 instead of a __u64?
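A size-based decode along the lines suggested above might look like the C sketch below. It is a hypothetical illustration, not code from the Lustre tree or from the patch that resolved this issue: the name decode_last_group() is invented, accepting a zero-length file mirrors what the conf-sanity test image apparently already exercised, and the little-endian byte order is an assumption (Lustre's on-disk structures are generally little-endian).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (NOT the real ofd_groups_init() code): decode the
 * contents of a LAST_GROUP file whose size may differ between on-disk
 * formats. ASSUMPTION: the stored value is little-endian, so the bytes
 * are assembled explicitly instead of being cast to a host integer.
 *
 * Returns 0 and sets *last_group on success, or -EIO when the size
 * matches neither an empty file, a __u32 record, nor a __u64 record.
 */
static int decode_last_group(const unsigned char *buf, size_t size,
                             uint64_t *last_group)
{
        uint64_t val = 0;
        size_t i;

        switch (size) {
        case 0:                 /* empty file: nothing recorded yet */
                *last_group = 0;
                return 0;
        case sizeof(uint32_t):  /* 4-byte record, as reported in this bug */
        case sizeof(uint64_t):  /* 8-byte record */
                for (i = 0; i < size; i++)
                        val |= (uint64_t)buf[i] << (8 * i);
                *last_group = val;
                return 0;
        default:
                return -EIO;    /* same -5 (-EIO) code seen in the log */
        }
}
```

Widening a 4-byte value into a 64-bit result keeps a single code path for both record sizes, while any other size is still rejected as corruption.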

Comment by Alex Zhuravlev [ 11/Jan/13 ]

A bit unexpected, because we do have a test for this case in conf-sanity.sh.

Comment by Di Wang [ 11/Jan/13 ]

Yes, we will not need the group file anymore once that patch has landed. I just checked our disk2_1-ldiskfs.tar.bz2 (under tests): its LAST_GROUP file is zero bytes, which is why conf-sanity.sh did not catch this error.

[root@testnode tests]# mount -t ldiskfs -o loop ./ost /mnt/mds1
[root@testnode tests]# ls /mnt/mds1/
CONFIGS health_check LAST_GROUP last_rcvd lost+found O
[root@testnode tests]# ls /mnt/mds1/LAST_GROUP -l
-rwx------ 1 root root 0 Mar 14 2012 /mnt/mds1/LAST_GROUP
[root@testnode tests]# od -x /mnt/mds1/LAST_GROUP
0000000
[root@testnode tests]# stat /mnt/mds1/LAST_GROUP
File: `/mnt/mds1/LAST_GROUP'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 700h/1792d Inode: 17 Links: 1
Access: (0700/-rwx------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-10-28 05:59:59.362696108 -0700
Modify: 2012-03-14 23:16:54.069859989 -0700
Change: 2012-03-14 23:16:54.069859989 -0700

Apparently, we need to make a better disk image here.

Comment by Andreas Dilger [ 14/Jan/13 ]

Di, could you please make a patch for this, and also fix the test image at the same time.

Comment by Di Wang [ 23/Jan/13 ]

This patch http://review.whamcloud.com/#change,4325 (already merged) should fix this problem. I will update the test image later.

Comment by Alexey Lyashkov [ 24/Jan/13 ]

I will retest today.

Comment by Andreas Dilger [ 30/Jan/13 ]

Shadow, any update from your testing of the patch?

Comment by Jodi Levi (Inactive) [ 06/Feb/13 ]

Are there any updates on the test of this patch?

Comment by Andreas Dilger [ 07/Feb/13 ]

Closing bug per Di's comment that the fix has been merged.

Comment by Gerrit Updater [ 27/Jul/15 ]

Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/15731
Subject: LU-2606 osp: add procfs values for OST reserved size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a69eed85d902e2b15a960430e4652fbcc3c0bc33

Generated at Sat Feb 10 01:26:38 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.