Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- None
- Affects Version: Lustre 2.4.0
- Environment: CentOS 6.3, Lustre 2.4.0
- 3
- 11229
Description
Background:
Our objective is to upgrade our Lustre software from 1.8.7 to 2.4.*. We also want
to split our current active/standby MDS pair with a shared MDT into two MDTs
running active/active. The requirement is that all data remain in place during
the upgrade/MDS split.
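(For reference, a sketch of how the second MDT for the planned active/active split might later be formatted on the second new MDS; the device name below is an assumption, and the MGS NID is the one that appears in the log further down:)
mkfs.lustre --fsname=rhino --mdt --index=1 --mgsnode=192.168.95.245@tcp /dev/md0   # hypothetical device on the second MDS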
= We went through the Lustre software upgrade from 1.8.7 (CentOS/el5) to
2.4.0 (CentOS/el6) successfully. During this process we kept the single
MDS/MDT.
= We then configured two other machines as the new MDS servers.
We transferred the network interfaces to one of the new MDS servers.
= We formatted the MDT on the new MDS:
mkfs.lustre --reformat --fsname=rhino --param mdt.quota_type=ug --mgs --mdt --index=0 /dev/md0
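(Note: as the dmesg output below reports, mdt.quota_type is obsolete in 2.4. A minimal sketch of the same format step without that parameter, enabling quota afterwards with lctl instead, assuming everything else stays the same:)
mkfs.lustre --reformat --fsname=rhino --mgs --mdt --index=0 /dev/md0
lctl conf_param rhino.quota.mdt=ug    # 2.4-style quota enablement, run on the MGS once the target is mounted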
= We copied the existing MDT contents and the ea.bak file over to the new server
(with GNU tar version 1.27):
/usr/local/bin/tar czvf /share/apps/tmp/rhino_mdt.tgz --sparse .
getfattr -R -d -m '.*' -e hex -P . > /tmp/ea$(date +%Y%m%d).bak
= We then ran:
/usr/local/bin/tar xzvpf /share/apps/tmp/rhino_mdt.tgz --sparse
setfattr --restore=/share/apps/tmp/ea-20131023.bak
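(A possible sanity check before mounting, to confirm the extended attributes survived the restore; the output file name here is hypothetical:)
getfattr -R -d -m '.*' -e hex -P . > /tmp/ea-restored.bak
diff /share/apps/tmp/ea-20131023.bak /tmp/ea-restored.bak   # any difference would mean EAs were lost in the copy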
= We attempted to mount the new MDT:
mount -t lustre /dev/md1 /rhino
= We got errors:
mount.lustre: mount /dev/md1 at /rhino failed: File exists
[from dmesg]
LDISKFS-fs (md1): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: 13422:0:(mgs_llog.c:238:mgs_fsdb_handler()) MDT using 1.8 OSC name scheme
LustreError: 140-5: Server rhino-MDT0000 requested index 0, but that index is already in use. Use --writeconf to force
LustreError: 13376:0:(mgs_llog.c:3625:mgs_write_log_target()) Can't get index (-98)
LustreError: 13376:0:(mgs_handler.c:408:mgs_handle_target_reg()) Failed to write rhino-MDT0000 log (-98)
LustreError: 13321:0:(obd_mount_server.c:1124:server_register_target()) rhino-MDT0000: error registering with the MGS: rc = -98 (not fatal)
Lustre: 13423:0:(obd_config.c:1428:class_config_llog_handler()) For 1.8 interoperability, rename obd type from mds to mdt
Lustre: rhino-MDT0000: used disk, loading
Lustre: 13423:0:(mdt_handler.c:4946:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
Lustre: 13423:0:(mdt_handler.c:4946:mdt_process_config()) Skipped 1 previous similar message
LustreError: 13423:0:(genops.c:320:class_newdev()) Device rhino-OST0000-osc already exists at 8, won't add
LustreError: 13423:0:(obd_config.c:374:class_attach()) Cannot create device rhino-OST0000-osc of type osp : -17
LustreError: 13423:0:(obd_config.c:1553:class_config_llog_handler()) MGC192.168.95.245@tcp: cfg command failed: rc = -17
Lustre: cmd=cf001 0:rhino-OST0000-osc 1:osp 2:rhino-mdtlov_UUID
LustreError: 15c-8: MGC192.168.95.245@tcp: The configuration from log 'rhino-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 13321:0:(obd_mount_server.c:1258:server_start_targets()) failed to start server rhino-MDT0000: -17
LustreError: 13321:0:(obd_mount_server.c:1700:server_fill_super()) Unable to start targets: -17
LustreError: 13321:0:(obd_mount_server.c:849:lustre_disconnect_lwp()) rhino-MDT0000-lwp-MDT0000: Can't end config log rhino-client.
LustreError: 13321:0:(obd_mount_server.c:1427:server_put_super()) rhino-MDT0000: failed to disconnect lwp. (rc=-2)
Lustre: Failing over rhino-MDT0000
LustreError: 137-5: rhino-MDT0000_UUID: not available for connect from 192.168.95.248@tcp (no target)
Lustre: 13321:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1382564544/real 1382564544] req@ffff880343c20c00 x1449708353487088/t0(0) o251->MGC192.168.95.245@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1382564550 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: server umount rhino-MDT0000 complete
LustreError: 13321:0:(obd_mount.c:1275:lustre_fill_super()) Unable to mount (-17)
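(The 140-5 message above suggests --writeconf. A minimal sketch of that approach, following the standard writeconf procedure of stopping all targets, regenerating the configuration logs on the MDT and on every OST, then mounting the MDT first:)
tunefs.lustre --writeconf /dev/md1    # regenerate the config log; repeat on each OST device
mount -t lustre /dev/md1 /rhino       # mount the MDT first, then the OSTs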