LU-4138: Problem with migrating from 1 MDT to 2 MDTs


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.0
    • Environment: CentOS 6.3, Lustre 2.4.0
    • Severity: 3

    Description

      Background:

      The objective is to upgrade our Lustre software from 1.8.7 to 2.4.*.
      We also want to split our current active/standby MDS pair with a
      shared MDT into 2 MDTs in an active/active configuration.

      The requirement is that all data remain in place during the
      upgrade/MDS split.
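
      For reference, a minimal sketch of the two-MDT, active/active layout
      we are aiming for (the hostnames, NIDs, and device paths below are
      hypothetical):

      # Hypothetical sketch: each MDS serves one MDT and backs up the other.
      # On mds1: MGS plus MDT0000, with mds2 as the failover service node.
      mkfs.lustre --fsname=rhino --mgs --mdt --index=0 \
          --servicenode=mds1@tcp --servicenode=mds2@tcp /dev/md0

      # On mds2: MDT0001, pointing at the MGS, with mds1 as failover.
      mkfs.lustre --fsname=rhino --mdt --index=1 --mgsnode=mds1@tcp \
          --servicenode=mds2@tcp --servicenode=mds1@tcp /dev/md0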

      = We went through the Lustre software upgrade from 1.8.7 (CentOS/el5)
      to 2.4.0 (CentOS/el6) successfully. During this process, we kept the
      single MDS/MDT.

      = We then configured 2 other machines as the new MDS servers.
      We transferred the network interfaces to one of the new MDS servers.

      = We formatted the MDT on the new MDS:

      mkfs.lustre --reformat --fsname=rhino --param mdt.quota_type=ug --mgs --mdt --index=0 /dev/md0
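
      As a sanity check, the freshly formatted target can be inspected
      without mounting it; tunefs.lustre with --dryrun only prints the
      target configuration and changes nothing on disk:

      # Print the target's parameters without modifying anything
      tunefs.lustre --dryrun /dev/md0
      # Expect Target: rhino-MDT0000, Index: 0, and the MDT and MGS flags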

      = We copied the existing MDT contents and the EA backup file over to
      the new server (using GNU tar version 1.27). The backup was created
      on the old MDT with:

      /usr/local/bin/tar czvf /share/apps/tmp/rhino_mdt.tgz --sparse .

      getfattr -R -d -m '.*' -e hex -P . > /tmp/ea-$(date +%Y%m%d).bak
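
      For context, both backup commands are run from the root of the old
      MDT mounted as type ldiskfs; a minimal sketch of the whole sequence,
      assuming a hypothetical device /dev/old_mdt and mount point /mnt/mdt:

      # Mount the old MDT as plain ldiskfs (read-only) and work from its root
      mount -t ldiskfs -o ro /dev/old_mdt /mnt/mdt
      cd /mnt/mdt
      /usr/local/bin/tar czvf /share/apps/tmp/rhino_mdt.tgz --sparse .
      getfattr -R -d -m '.*' -e hex -P . > /tmp/ea-$(date +%Y%m%d).bak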

      = We then ran, from the root of the new MDT:

      /usr/local/bin/tar xzvpf /share/apps/tmp/rhino_mdt.tgz --sparse

      setfattr --restore=/share/apps/tmp/ea-20131023.bak
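
      One quick way to verify the restore is to re-dump the extended
      attributes from the new MDT and diff them against the saved backup
      (the ea-restored.bak file name is hypothetical):

      # From the root of the restored MDT: dump the EAs again and compare
      getfattr -R -d -m '.*' -e hex -P . > /tmp/ea-restored.bak
      diff /share/apps/tmp/ea-20131023.bak /tmp/ea-restored.bak && echo "EAs match"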

      = We attempted to mount the new MDT:

      mount -t lustre /dev/md1 /rhino

      = We got errors:

      mount.lustre: mount /dev/md1 at /rhino failed: File exists

      [from dmesg]

      LDISKFS-fs (md1): mounted filesystem with ordered data mode. quota=on. Opts:
      Lustre: 13422:0:(mgs_llog.c:238:mgs_fsdb_handler()) MDT using 1.8 OSC name scheme
      LustreError: 140-5: Server rhino-MDT0000 requested index 0, but that index is already in use. Use --writeconf to force
      LustreError: 13376:0:(mgs_llog.c:3625:mgs_write_log_target()) Can't get index (-98)
      LustreError: 13376:0:(mgs_handler.c:408:mgs_handle_target_reg()) Failed to write rhino-MDT0000 log (-98)
      LustreError: 13321:0:(obd_mount_server.c:1124:server_register_target()) rhino-MDT0000: error registering with the MGS: rc = -98 (not fatal)
      Lustre: 13423:0:(obd_config.c:1428:class_config_llog_handler()) For 1.8 interoperability, rename obd type from mds to mdt
      Lustre: rhino-MDT0000: used disk, loading
      Lustre: 13423:0:(mdt_handler.c:4946:mdt_process_config()) For interoperability, skip this mdt.quota_type. It is obsolete.
      Lustre: 13423:0:(mdt_handler.c:4946:mdt_process_config()) Skipped 1 previous similar message
      LustreError: 13423:0:(genops.c:320:class_newdev()) Device rhino-OST0000-osc already exists at 8, won't add
      LustreError: 13423:0:(obd_config.c:374:class_attach()) Cannot create device rhino-OST0000-osc of type osp : -17
      LustreError: 13423:0:(obd_config.c:1553:class_config_llog_handler()) MGC192.168.95.245@tcp: cfg command failed: rc = -17
      Lustre: cmd=cf001 0:rhino-OST0000-osc 1:osp 2:rhino-mdtlov_UUID
      LustreError: 15c-8: MGC192.168.95.245@tcp: The configuration from log 'rhino-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      LustreError: 13321:0:(obd_mount_server.c:1258:server_start_targets()) failed to start server rhino-MDT0000: -17
      LustreError: 13321:0:(obd_mount_server.c:1700:server_fill_super()) Unable to start targets: -17
      LustreError: 13321:0:(obd_mount_server.c:849:lustre_disconnect_lwp()) rhino-MDT0000-lwp-MDT0000: Can't end config log rhino-client.
      LustreError: 13321:0:(obd_mount_server.c:1427:server_put_super()) rhino-MDT0000: failed to disconnect lwp. (rc=-2)
      Lustre: Failing over rhino-MDT0000
      LustreError: 137-5: rhino-MDT0000_UUID: not available for connect from 192.168.95.248@tcp (no target)
      Lustre: 13321:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1382564544/real 1382564544] req@ffff880343c20c00 x1449708353487088/t0(0) o251->MGC192.168.95.245@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1382564550 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: server umount rhino-MDT0000 complete
      LustreError: 13321:0:(obd_mount.c:1275:lustre_fill_super()) Unable to mount (-17)
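
      The messages above suggest the copied MDT still carries the old
      configuration logs: the MGS believes rhino-MDT0000 is already
      registered (rc = -98), and replaying the log tries to recreate the
      already-existing rhino-OST0000-osc device (rc = -17). The console
      output itself points at the usual remedy, regenerating the logs with
      --writeconf; a hedged sketch (the OST device paths are hypothetical):

      # With the whole filesystem stopped, erase and regenerate config logs.
      # On the new MDS (device path as used above):
      tunefs.lustre --writeconf /dev/md1
      # On each OSS, for every OST device (paths hypothetical):
      tunefs.lustre --writeconf /dev/ostN
      # Then remount in order: MGS/MDT first, followed by the OSTs.
      mount -t lustre /dev/md1 /rhino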


          People

            mdiep Minh Diep
            haisong Haisong Cai (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue
