[LU-6279] Failed adding new MDT after upgrade from a single MDT system Created: 25/Feb/15  Updated: 27/Feb/15  Resolved: 27/Feb/15

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: Text File debug.out    
Severity: 3
Rank (Obsolete): 17602

 Description   

test steps:
1. create a filesystem with a single MDT under 2.4
2. mount the filesystem and create dir-1/file-1
3. shut down the system and upgrade to 2.5.3
4. mount the filesystem again and create dir-2/file-2
5. shut down the system again and then upgrade to 2.7.0
6. format another three disks as MDTs under 2.7.0 (see the command sketch after the console output below)
7. successfully mounted the first MDT (the single MDT formatted under 2.4)
8. tried to mount the second MDT and got the following errors:

Lustre: Lustre: Build Version: 2.6.94--PRISTINE-2.6.32-431.29.2.el6_lustre.x86_64
LNet: Added LNI 10.2.4.47@tcp [8/256/0/180]
LNet: Accept secure, port 988
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424843466/real 1424843466]  req@ffff8807976469c0 x1494056660107304/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.2.4.56@tcp:24/4 lens 400/544 e 0 to 1 dl 1424843471 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843466/real 1424843466]  req@ffff880797646cc0 x1494056660107300/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843471 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LDISKFS-fs (sdb2): mounted filesystem with ordered data mode. quota=on. Opts: 
LDISKFS-fs (sdb2): mounted filesystem with ordered data mode. quota=on. Opts: 
LustreError: 140-5: Server lustre-MDT0001 requested index 1, but that index is already in use. Use --writeconf to force
LustreError: 10216:0:(mgs_handler.c:439:mgs_target_reg()) Failed to write lustre-MDT0001 log (-98)
LustreError: 15f-b: lustre-MDT0001: cannot register this server with the MGS: rc = -98. Is the MGS running?
LustreError: 10283:0:(obd_mount_server.c:1783:server_fill_super()) Unable to start targets: -98
LustreError: 10283:0:(obd_mount_server.c:1498:server_put_super()) no obd lustre-MDT0001
LustreError: 10283:0:(obd_mount_server.c:137:server_deregister_mount()) lustre-MDT0001 not registered
Lustre: server umount lustre-MDT0001 complete
LustreError: 10283:0:(obd_mount.c:1339:lustre_fill_super()) Unable to mount  (-98)
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843491/real 1424843491]  req@ffff88079787c0c0 x1494056660107440/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843501 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843516/real 1424843516]  req@ffff88079787c6c0 x1494056660107452/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843531 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843541/real 1424843541]  req@ffff88079787c6c0 x1494056660107472/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.2.4.56@tcp:24/4 lens 400/544 e 0 to 1 dl 1424843561 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 1 previous similar message
LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_connect to node 10.2.4.56@tcp failed: rc = -16
Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[root@onyx-25 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/sdb1 on /mnt/mds1 type lustre (rw,acl,user_xattr)
[root@onyx-25 ~]# dl
-bash: dl: command not found
[root@onyx-25 ~]# lctl dl
  0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 10
  1 UP mgs MGS MGS 9
  2 UP mgc MGC10.2.4.47@tcp 1dab9afc-c4b3-8c46-e219-e7a04558654f 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
  5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 13
  6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
  7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
  8 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
  9 UP osp lustre-MDT0001-osp-MDT0000 lustre-MDT0000-mdtlov_UUID 5
 10 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
[root@onyx-25 ~]# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1       20642428 1916464  17677388  10% /
tmpfs           16393952       0  16393952   0% /dev/shm
/dev/sdb1        7498624  444148   6542260   7% /mnt/mds1
[root@onyx-25 ~]# 
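For reference, step 6 is normally done by giving each additional MDT its own increasing index. The sketch below is only an illustration: the fsname ("lustre") and MGS NID (10.2.4.47@tcp) are taken from the console output above, while the device names and mount point for the new MDTs are placeholders.

# Format the three additional MDTs with increasing indices; index 0 is
# already taken by the original MDT formatted under 2.4.
mkfs.lustre --mdt --fsname=lustre --mgsnode=10.2.4.47@tcp --index=1 /dev/sdb2
mkfs.lustre --mdt --fsname=lustre --mgsnode=10.2.4.47@tcp --index=2 /dev/sdb3
mkfs.lustre --mdt --fsname=lustre --mgsnode=10.2.4.47@tcp --index=3 /dev/sdb4

# Mount each new MDT after the original MDT0000 (which also hosts the MGS) is up.
mount -t lustre /dev/sdb2 /mnt/mds2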


 Comments   
Comment by Sarah Liu [ 25/Feb/15 ]

Debug log attached (debug.out).

If you need any more logs, please just let me know.
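For reference, the attached debug.out was presumably captured with the usual lctl debug dump; a sketch only (the added debug mask and the output path are just examples):

# Optionally widen the kernel debug mask before reproducing the failed mount.
lctl set_param debug=+info

# Dump the kernel debug buffer to a file once the mount has failed.
lctl dk /tmp/debug.out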

Comment by Andreas Dilger [ 25/Feb/15 ]

Sarah, I think there was a mistake in your process for adding the additional MDTs. It looks like they were all formatted with mkfs.lustre --mdt --index=1, but the index should increase for each new MDT being added. Otherwise the MGS will refuse to let the new MDT connect:

LustreError: 140-5: Server lustre-MDT0001 requested index 1, but that index is already in use.
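If the indices were in fact increased, a quick way to confirm which index the failing target actually carries, and which targets the MGS already knows about, is sketched below (/dev/sdb2 is the device the second MDT was mounted from in the console output; rc = -98 is -EADDRINUSE):

# Print the on-disk configuration of the second MDT without modifying it;
# the "Index:" line shows the index it was formatted with.
tunefs.lustre --dryrun /dev/sdb2

# List the devices already set up on the MDS/MGS node; an existing
# lustre-MDT0001 entry means index 1 is already registered.
lctl dl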
Comment by Sarah Liu [ 25/Feb/15 ]

Hi Andreas,

I did increase the index number, but mounting the MDT formatted with index=1 still failed; anyway, I will run it again to confirm.

Comment by Sarah Liu [ 27/Feb/15 ]

I reran the test and it didn't hit the problem, so I am closing the bug.
