[LU-4138] Problem with migrating from 1 MDT to 2 MDT Created: 23/Oct/13  Updated: 08/May/15  Resolved: 08/May/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Haisong Cai (Inactive) Assignee: Minh Diep
Resolution: Fixed Votes: 0
Labels: Sdsc
Environment:

CentOS 6.3, Lustre 2.4.0


Severity: 3
Rank (Obsolete): 11229

 Description   

Background:

The objective is to upgrade our Lustre software from 1.8.7 to 2.4.*.
We also want to split our current active/standby MDS pair with a shared MDT into
2 MDTs running active/active.

The requirement is that all data stay in place during the upgrade/MDS
split.

= We went through the Lustre software upgrade from 1.8.7 (CentOS/el5) to
2.4.0 (CentOS/el6) successfully. During this process, we kept the
single MDS/MDT.

= We then configured 2 other machines as the new MDS servers.
We transferred the network interfaces to one of the new MDS servers.

= We formatted the MDT on the new MDS:

mkfs.lustre --reformat --fsname=rhino --param mdt.quota_type=ug --mgs --mdt --index=0 /dev/md0

= We copied the existing MDT contents and ea.bak files over to the new server
(with GNU tar version 1.27):

/usr/local/bin/tar czvf /share/apps/tmp/rhino_mdt.tgz --sparse .

getfattr -R -d -m '.*' -e hex -P . > /tmp/ea-$(date +%Y%m%d).bak
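For context, both backup commands are normally run from the top of the old MDT mounted as type ldiskfs. A minimal sketch of the backup side, assuming the old MDT device is /dev/old_mdt (a placeholder) and /mnt is a free mount point:

# On the old 1.8.7 MDS: mount the MDT as ldiskfs and capture data plus extended attributes
mount -t ldiskfs /dev/old_mdt /mnt
cd /mnt
getfattr -R -d -m '.*' -e hex -P . > /share/apps/tmp/ea-$(date +%Y%m%d).bak
/usr/local/bin/tar czvf /share/apps/tmp/rhino_mdt.tgz --sparse .
cd /
umount /mnt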

= We then ran:

/usr/local/bin/tar xzvpf /share/apps/tmp/rhino_mdt.tgz --sparse

setfattr --restore=/share/apps/tmp/ea-20131023.bak
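The restore likewise has to run against the new MDT mounted as type ldiskfs rather than lustre (the working procedure in the comments below does exactly this). A minimal sketch, assuming the newly formatted device is /dev/md0 as in the mkfs command above:

# On the new MDS: mount the freshly formatted MDT as ldiskfs and restore the backup
mount -t ldiskfs /dev/md0 /mnt
cd /mnt
/usr/local/bin/tar xzvpf /share/apps/tmp/rhino_mdt.tgz --sparse
setfattr --restore=/share/apps/tmp/ea-20131023.bak
cd /
umount /mnt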

= We attempted to mount the new MDT:

mount -t lustre /dev/md1 /rhino

= We got errors:

mount.lustre: mount /dev/md1 at /rhino failed: File exists

[from dmesg]

LDISKFS-fs (md1): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: 13422:0:(mgs_llog.c:238:mgs_fsdb_handler()) MDT using 1.8 OSC name scheme
LustreError: 140-5: Server rhino-MDT0000 requested index 0, but that index is already in use. Use --writeconf to force
LustreError: 13376:0:(mgs_llog.c:3625:mgs_write_log_target()) Can't get index (-98)
LustreError: 13376:0:(mgs_handler.c:408:mgs_handle_target_reg()) Failed to write rhino-MDT0000 log (-98)
LustreError: 13321:0:(obd_mount_server.c:1124:server_register_target()) rhino-MDT0000: error registering with the MGS: rc = -98 (not fatal)
Lustre: 13423:0:(obd_config.c:1428:class_config_llog_handler()) For 1.8 interoperability, rename obd type from mds to mdt
Lustre: rhino-MDT0000: used disk, loading
Lustre: 13423:0:(mdt_handler.c:4946:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
Lustre: 13423:0:(mdt_handler.c:4946:mdt_process_config()) Skipped 1 previous similar message
LustreError: 13423:0:(genops.c:320:class_newdev()) Device rhino-OST0000-osc already exists at 8, won't add
LustreError: 13423:0:(obd_config.c:374:class_attach()) Cannot create device rhino-OST0000-osc of type osp : -17
LustreError: 13423:0:(obd_config.c:1553:class_config_llog_handler()) MGC192.168.95.245@tcp: cfg command failed: rc = -17
Lustre: cmd=cf001 0:rhino-OST0000-osc 1:osp 2:rhino-mdtlov_UUID
LustreError: 15c-8: MGC192.168.95.245@tcp: The configuration from log 'rhino-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 13321:0:(obd_mount_server.c:1258:server_start_targets()) failed to start server rhino-MDT0000: -17
LustreError: 13321:0:(obd_mount_server.c:1700:server_fill_super()) Unable to start targets: -17
LustreError: 13321:0:(obd_mount_server.c:849:lustre_disconnect_lwp()) rhino-MDT0000-lwp-MDT0000: Can't end config log rhino-client.
LustreError: 13321:0:(obd_mount_server.c:1427:server_put_super()) rhino-MDT0000: failed to disconnect lwp. (rc=-2)
Lustre: Failing over rhino-MDT0000
LustreError: 137-5: rhino-MDT0000_UUID: not available for connect from 192.168.95.248@tcp (no target)
Lustre: 13321:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1382564544/real 1382564544] req@ffff880343c20c00 x1449708353487088/t0(0) o251->MGC192.168.95.245@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1382564550 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount rhino-MDT0000 complete
LustreError: 13321:0:(obd_mount.c:1275:lustre_fill_super()) Unable to mount (-17)



 Comments   
Comment by Peter Jones [ 24/Oct/13 ]

Yu, Jian

Could you please advise on this one?

Peter

Comment by Minh Diep [ 24/Oct/13 ]

Haisong,

I just noticed this
LustreError: 15c-8: MGC192.168.95.245@tcp: The configuration from log 'rhino-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.

The IP address on the MGC seems to be the old/previous IP; can you confirm? If you moved the MDS with an IP change, we should --writeconf to wipe the old IP as well, no?

Comment by Andreas Dilger [ 24/Oct/13 ]

Better would be to use "lctl replace_nids" instead of a whole writeconf.

Comment by Andreas Dilger [ 24/Oct/13 ]

I think the goal is to replace the old MDS hardware with a new node and disks.

Lustre is confused because you are formatting the new MDS and then restoring from the tar backup, but for some reason the MDT thinks it is new. Maybe the label on the MDS needs to be fixed? What does "e2label /dev/md1" report?

I submitted a patch under LU-14 to add the "--replace" option to mkfs.lustre for this case, but it is not in 2.4.

Note that I would also recommend using Lustre 2.4.1 instead of 2.4.0 so you get the other fixes included there.

Comment by Minh Diep [ 24/Oct/13 ]
e2label /dev/md1
rhino-MDT0000

The new server has been configured with the same IP address as the old one. My question above was incorrect because I logged into a different server. We tried lctl replace_nids /dev/md1 <nids>, but it requires the MGT/MDT to be mounted, which failed:

[root@lustre-mds-0-0 modprobe.d]# lctl replace_nids /dev/md1 192.168.95.245@tcp
No device found for name MGS: Invalid argument
This command must be run on the MGS.
error: replace_nids: Invalid argument
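For reference, lctl replace_nids takes the Lustre target name rather than the block device, and it has to be run with the MGS service started; on a combined MGS/MDT the usual trick is to mount with the nosvc option first. A sketch under those assumptions (not verified on this system):

# Start only the MGS/MGC on the combined MGS/MDT device, without the MDT service
mount -t lustre -o nosvc /dev/md0 /mnt/mdt
# Replace the NIDs recorded for the MDT target (target name, not /dev/md0)
lctl replace_nids rhino-MDT0000 192.168.95.245@tcp
umount /mnt/mdt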

Comment by Minh Diep [ 24/Oct/13 ]

Here are the commands I ran to make it work. I will try a couple more different scenarios.

reboot
mkfs.lustre --reformat --fsname=rhino --param mdt.quota_type=ug --mgs --mdt --index=0 /dev/md0
mount -t ldiskfs /dev/md0 /mnt
cd /mnt
ls
/usr/local/bin/tar xzvpf /share/apps/tmp/rhino_mdt.tgz --sparse
setfattr --restore=/share/apps/tmp/ea-20131023.bak
cd
umount /mnt
tunefs.lustre --writeconf --reformat --fsname=rhino /dev/md0
tunefs.lustre --writeconf --reformat --fsname=rhino --mgs --mdt /dev/md0
mount -t lustre /dev/md0 /rhino/
lctl dl

Comment by Jian Yu [ 25/Oct/13 ]

Hi Minh,

tunefs.lustre --writeconf --reformat --fsname=rhino --mgs --mdt /dev/md0

IMHO, the "--reformat" option is not needed here. So, do I understand correctly that running "tunefs.lustre --writeconf" can resolve the original "index is already in use" failure because the "LDD_F_WRITECONF" flag is set?
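One way to confirm whether that flag actually ended up on disk is tunefs.lustre --dryrun, which prints the stored parameters and flags without changing anything; the Flags line should include writeconf when LDD_F_WRITECONF is set. For example:

# Print the on-disk Lustre configuration (flags and parameters) without modifying it
tunefs.lustre --dryrun /dev/md0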

Comment by Minh Diep [ 25/Oct/13 ]

Hi YuJian,

No, tunefs.lustre --writeconf --mgs --mdt /dev/md0 did not solve the issue; in this case, that is a bug.
I had to use either (not both) --reformat --fsname=rhino or --reformat --fsname=rhino --mgs --mdt.

Comment by Haisong Cai (Inactive) [ 08/Nov/13 ]

Minh,

The IP address is the one from the previous server running 1.8.7.
Our test has been "data in place" from the existing 1.8.7 servers to
2.4.*; we are going to keep the IPs.
Am I missing something here?

It's true that in our test case, MGC192.168.95.245@tcp was used by
different hardware.
But we have moved the cable from the old server to the new one. The old
server has no Lustre running.

Haisong

Comment by Minh Diep [ 14/Nov/13 ]

I found that we needed to --writeconf and unmount all the OSTs while working on the MDS/MDT. So far it seems like user error. I will take this bug and verify further.

Comment by Minh Diep [ 15/Nov/13 ]

Procedure to restore and upgrade MDS and add another MDT

On MDS0

1. mkfs.lustre --reformat --fsname=rhino --param mdt.quota_type=ug --mgs --mdt --index=0 /dev/md0
2. mount -t ldiskfs /dev/md0 /mnt
3. cd /mnt
4. /usr/local/bin/tar xzvpf /share/apps/tmp/rhino_mdt.tgz --sparse
5. setfattr --restore=/share/apps/tmp/ea-20131023.bak
6. cd; umount /mnt
7. tunefs.lustre --erase-params /dev/md0
8. tunefs.lustre --writeconf --param="failover.node=<MDS1 nid>" --mgs --mdt /dev/md0
9. mount -t lustre -o writeconf /dev/md0 /rhino/
10. umount /rhino
11. mount -t ldiskfs /dev/md0 /mnt
12. ls -l /mnt/CONFIG* (check the timestamp of the files to see if they are current)
13. llog_reader /mnt/CONFIG*/rhino-MDT0000 (the output should show about 7 to 9 lines with current timestamps)
14. umount /mnt
15. mount -t lustre /dev/md0 /rhino

On MDS1

1. mkfs.lustre --reformat --fsname rhino --param mdt.quota_type=ug --mgsnode <MDS0 nid> --failnode <MDS0 nid> --mdt --index 1 /dev/md1
2. mount -t lustre /dev/md1 /rhino
3. Run lctl dl on both MDS0 and MDS1 to see if they have MDT0001

On OSS (repeat on all OSS)

1. tunefs.lustre --erase-params /dev/sdb (repeat for all devices)
2. tunefs.lustre --writeconf --param="ost.quota_type=ug" --param="failover.mode=failout" --mgsnode=<MDS0 nid> --mgsnode=<MDS1 nid> --ost /dev/sdb (repeat on all devices)
3. mount -t lustre -o writeconf /dev/sdb /rhino/sdb (repeat on all devices)

On Clients

mount -t lustre <MDS0 nid>:<MDS1 nid>:/rhino /rhino

NOTE: do not use --servicenode option due to LU-4243
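A quick client-side sanity check once both MDTs are mounted might look like the following (a sketch; lfs mkdir -i needs a 2.4+ client and, by default, root privileges, and the directory name is just an example):

# Confirm the client sees inode usage for both metadata targets
lfs df -i /rhino
# Create a test directory explicitly on MDT0001 to confirm it is serving requests
lfs mkdir -i 1 /rhino/mdt1_test
# List the local Lustre devices; MDC connections to both MDTs should appear
lctl dl | grep mdc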

Comment by Minh Diep [ 21/Nov/13 ]

Latest status: we were able to test failover while traffic was going to both MDTs.

SDSC will continue testing and will tear the test cluster down and start over again from 1.8.9 to verify the procedure before the production upgrade.

Comment by John Fuchs-Chesney (Inactive) [ 12/Mar/14 ]

Hello Minh and Haisong,
Any further progress on this issue?
Should we mark this as resolved?
Thanks,
~ jfc.

Comment by Minh Diep [ 12/Mar/14 ]

Sure, you can close it for now. We'll reopen it when we move to 2 MDTs later this year.

Comment by Andreas Dilger [ 08/May/15 ]

Closing per last comments.

Note that with newer Lustre it is possible to use mkfs.lustre --replace --index=0 to replace an existing target with one that has been restored from a file-level backup. This has been tested with OST replacement, but it should also work with MDTs (it marks the target so that it doesn't try to register with the MGS as a new device).
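For example, with such a version the restore above might format the replacement device roughly like this (a sketch based on the comment, not a command line tested on this system):

# Format the replacement MDT so it re-registers under the existing index instead of as a new target
mkfs.lustre --reformat --replace --fsname=rhino --mgs --mdt --index=0 /dev/md0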
