Lustre / LU-6279

Failed adding new MDT after upgrade from a single MDT system

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 17602

    Description

      Test steps:
      1. Create a filesystem with a single MDT under 2.4.
      2. Mount the filesystem and create dir-1/file-1.
      3. Shut down the system and upgrade to 2.5.3.
      4. Mount the filesystem again and create dir-2/file-2.
      5. Shut down the system again and upgrade to 2.7.0.
      6. Format another three disks as MDTs under 2.7.0.
      7. Successfully mount the first MDT (the single MDT formatted under 2.4).
      8. Try to mount the second MDT, which fails with the following errors:

      Lustre: Lustre: Build Version: 2.6.94--PRISTINE-2.6.32-431.29.2.el6_lustre.x86_64
      LNet: Added LNI 10.2.4.47@tcp [8/256/0/180]
      LNet: Accept secure, port 988
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. quota=on. Opts: 
      Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1424843466/real 1424843466]  req@ffff8807976469c0 x1494056660107304/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.2.4.56@tcp:24/4 lens 400/544 e 0 to 1 dl 1424843471 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843466/real 1424843466]  req@ffff880797646cc0 x1494056660107300/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843471 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      LDISKFS-fs (sdb2): mounted filesystem with ordered data mode. quota=on. Opts: 
      LDISKFS-fs (sdb2): mounted filesystem with ordered data mode. quota=on. Opts: 
      LustreError: 140-5: Server lustre-MDT0001 requested index 1, but that index is already in use. Use --writeconf to force
      LustreError: 10216:0:(mgs_handler.c:439:mgs_target_reg()) Failed to write lustre-MDT0001 log (-98)
      LustreError: 15f-b: lustre-MDT0001: cannot register this server with the MGS: rc = -98. Is the MGS running?
      LustreError: 10283:0:(obd_mount_server.c:1783:server_fill_super()) Unable to start targets: -98
      LustreError: 10283:0:(obd_mount_server.c:1498:server_put_super()) no obd lustre-MDT0001
      LustreError: 10283:0:(obd_mount_server.c:137:server_deregister_mount()) lustre-MDT0001 not registered
      Lustre: server umount lustre-MDT0001 complete
      LustreError: 10283:0:(obd_mount.c:1339:lustre_fill_super()) Unable to mount  (-98)
      Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843491/real 1424843491]  req@ffff88079787c0c0 x1494056660107440/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843501 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843516/real 1424843516]  req@ffff88079787c6c0 x1494056660107452/t0(0) o8->lustre-OST0000-osc-MDT0000@10.2.4.56@tcp:28/4 lens 400/544 e 0 to 1 dl 1424843531 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1424843541/real 1424843541]  req@ffff88079787c6c0 x1494056660107472/t0(0) o38->lustre-MDT0001-osp-MDT0000@10.2.4.56@tcp:24/4 lens 400/544 e 0 to 1 dl 1424843561 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Lustre: 10145:0:(client.c:1939:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_connect to node 10.2.4.56@tcp failed: rc = -16
      Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects
      Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.
      [root@onyx-25 ~]# mount
      /dev/sda1 on / type ext3 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
      nfsd on /proc/fs/nfsd type nfsd (rw)
      /dev/sdb1 on /mnt/mds1 type lustre (rw,acl,user_xattr)
      [root@onyx-25 ~]# dl
      -bash: dl: command not found
      [root@onyx-25 ~]# lctl dl
        0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 10
        1 UP mgs MGS MGS 9
        2 UP mgc MGC10.2.4.47@tcp 1dab9afc-c4b3-8c46-e219-e7a04558654f 5
        3 UP mds MDS MDS_uuid 3
        4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
        5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 13
        6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
        7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
        8 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
        9 UP osp lustre-MDT0001-osp-MDT0000 lustre-MDT0000-mdtlov_UUID 5
       10 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
      [root@onyx-25 ~]# df
      Filesystem     1K-blocks    Used Available Use% Mounted on
      /dev/sda1       20642428 1916464  17677388  10% /
      tmpfs           16393952       0  16393952   0% /dev/shm
      /dev/sdb1        7498624  444148   6542260   7% /mnt/mds1
      [root@onyx-25 ~]# 
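
For anyone reading the console output above, the numeric return codes are negated Linux errno values. The sketch below is only an illustration, not part of Lustre: `decode_rc`, `RC_PATTERN` and `LINUX_ERRNO` are hypothetical helper names, and the errno table is hardcoded with the two Linux values that appear in this log (16 and 98) so the result does not depend on the machine running it. It shows that -98 maps to EADDRINUSE, which is consistent with the "index is already in use" message, and that -16 maps to EBUSY.

```python
import re

# Linux errno numbers for the two codes seen in this log. Hardcoded on purpose:
# errno numbering differs between operating systems.
LINUX_ERRNO = {16: "EBUSY", 98: "EADDRINUSE"}

# Matches "rc = -98" anywhere in a line, or a trailing "(-98)".
RC_PATTERN = re.compile(r"rc = (-\d+)|\((-\d+)\)\s*$")


def decode_rc(line):
    """Return (rc, symbolic errno name) for a console line, or None if no code is found."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    rc = int(match.group(1) or match.group(2))
    return rc, LINUX_ERRNO.get(-rc, "unknown")


# Lines copied from the console output in this ticket.
sample = [
    "LustreError: 15f-b: lustre-MDT0001: cannot register this server with the MGS: rc = -98. Is the MGS running?",
    "LustreError: 10283:0:(obd_mount.c:1339:lustre_fill_super()) Unable to mount  (-98)",
    "LustreError: 11-0: lustre-OST0000-osc-MDT0000: operation ost_connect to node 10.2.4.56@tcp failed: rc = -16",
]

for line in sample:
    print(decode_rc(line))
```

Lines that report the code in another form (for example "Unable to start targets: -98") are not matched by this pattern and return None.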
      

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Sarah Liu (sarah)