[LU-6279] Failed adding new MDT after upgrade from a single MDT system Created: 25/Feb/15 Updated: 27/Feb/15 Resolved: 27/Feb/15 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 17602 |
| Description |
|
test steps:

Lustre: Build Version: 2.6.94--PRISTINE-2.6.32-431.29.2.el6_lustre.x86_64
LNet: Added LNI 10.2.4.47@tcp [8/256/0/180]
LNet: Accept secure, port 988
LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424843466/real 1424843466] req@ffff8807976469c0 x1494056660107304/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.2.4.56@tcp:24/4 lens 400/544 e 0 to 1 dl 1424843471 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843466/real 1424843466] req@ffff880797646cc0 x1494056660107300/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843471 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LDISKFS-fs (sdb2): mounted filesystem with ordered data mode. quota=on. Opts:
LDISKFS-fs (sdb2): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 140-5: Server lustre-MDT0001 requested index 1, but that index is already in use. Use --writeconf to force
LustreError: 10216:0:(mgs_handler.c:439:mgs_target_reg()) Failed to write lustre-MDT0001 log (-98)
LustreError: 15f-b: lustre-MDT0001: cannot register this server with the MGS: rc = -98. Is the MGS running?
LustreError: 10283:0:(obd_mount_server.c:1783:server_fill_super()) Unable to start targets: -98
LustreError: 10283:0:(obd_mount_server.c:1498:server_put_super()) no obd lustre-MDT0001
LustreError: 10283:0:(obd_mount_server.c:137:server_deregister_mount()) lustre-MDT0001 not registered
Lustre: server umount lustre-MDT0001 complete
LustreError: 10283:0:(obd_mount.c:1339:lustre_fill_super()) Unable to mount (-98)
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843491/real 1424843491] req@ffff88079787c0c0 x1494056660107440/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843501 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843516/real 1424843516] req@ffff88079787c6c0 x1494056660107452/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843531 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843541/real 1424843541] req@ffff88079787c6c0 x1494056660107472/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.2.4.56@tcp:24/4 lens 400/544 e 0 to 1 dl 1424843561 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 1 previous similar message
LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_connect to node 10.2.4.56@tcp failed: rc = -16
Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
[root@onyx-25 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/sdb1 on /mnt/mds1 type lustre (rw,acl,user_xattr)
[root@onyx-25 ~]# dl
-bash: dl: command not found
[root@onyx-25 ~]# lctl dl
  0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 10
  1 UP mgs MGS MGS 9
  2 UP mgc MGC10.2.4.47@tcp 1dab9afc-c4b3-8c46-e219-e7a04558654f 5
  3 UP mds MDS MDS_uuid 3
  4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
  5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 13
  6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
  7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
  8 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
  9 UP osp lustre-MDT0001-osp-MDT0000 lustre-MDT0000-mdtlov_UUID 5
 10 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
[root@onyx-25 ~]# df
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/sda1             20642428 1916464  17677388  10% /
tmpfs                 16393952       0  16393952   0% /dev/shm
/dev/sdb1              7498624  444148   6542260   7% /mnt/mds1
[root@onyx-25 ~]#
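The 140-5 error above means the MGS already has an entry for index 1 in its configuration logs. One way to confirm which index a target device was actually formatted with, before trying to mount it, is to read its stored parameters back without touching the device. A minimal sketch, assuming /dev/sdb2 is the device holding the new MDT:

# Print the target's stored configuration (fsname, index, mgsnode)
# without modifying the device.
tunefs.lustre --dryrun /dev/sdb2

The "Use --writeconf to force" hint in the log refers to regenerating the configuration logs (tunefs.lustre --writeconf on the targets), which should only be done deliberately, since the logs are erased and rewritten on the next mount. |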
| Comments |
| Comment by Sarah Liu [ 25/Feb/15 ] |
|
Attached the debug log. If you need any more logs, please let me know. |
| Comment by Andreas Dilger [ 25/Feb/15 ] |
|
Sarah, I think there was a mistake in your process for adding the additional MDTs. It looks like they were all formatted with mkfs.lustre --mdt --index=1, but the index should increase for each new MDT being added. Otherwise the MGS refuses to let the new MDT connect:

LustreError: 140-5: Server lustre-MDT0001 requested index 1, but that index is already in use.
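For reference, the sequence for adding MDTs to an existing single-MDT filesystem would look roughly like the following. This is a minimal sketch, not the exact commands from this test run; the device names, mount points, and the MGS NID 10.2.4.47@tcp are assumptions based on the logs above:

# MDT0000 already exists, so each newly added MDT must take the next
# unused index: the first addition gets index 1, the second index 2, etc.
mkfs.lustre --mdt --fsname=lustre --index=1 --mgsnode=10.2.4.47@tcp /dev/sdb2
mkfs.lustre --mdt --fsname=lustre --index=2 --mgsnode=10.2.4.47@tcp /dev/sdc1
mount -t lustre /dev/sdb2 /mnt/mds2
mount -t lustre /dev/sdc1 /mnt/mds3

Reusing an index the MGS has already recorded produces exactly the rc = -98 (EADDRINUSE) registration failure seen in the description. |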
| Comment by Sarah Liu [ 25/Feb/15 ] |
|
Hi Andreas, I did increase the index number; it was the MDT formatted with index=1 that failed to mount. Anyway, I will run it again to confirm. |
| Comment by Sarah Liu [ 27/Feb/15 ] |
|
I reran the test and it didn't hit the problem, so I am closing the bug. |