
LU-4263: zfs-based OST responds to 'lctl create/destroy' from an obdecho client with an error

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Component/s: None
    • Environment: ZFS-based OST
    • Severity: 3
    • Rank: 11705

    Description

      Synopsis: zfs-based OST responds to 'lctl create/destroy' from an
      obdecho client with an error, and obdfilter-survey always hits that
      error.

      The following details are in two parts. The first part lays out in
      concrete detail how I built the ZFS-based Lustre instance on which
      this experiment rests. The second part goes through the specific steps
      to produce the error. The steps in the second part are, so far as I
      can tell, exactly those that obdfilter-survey would take given the
      command line (and results):

      [root@oss01 andrew]# nobjhi=2 thrhi=2 size=1024 case=disk /usr/bin/obdfilter-survey
      Fri Nov 15 14:33:43 PST 2013 Obdfilter-survey for case=disk from oss01
      ost 1 sz 1048576K rsz 1024K obj 1 thr 1 write 853.95 SHORT rewrite 485.57 [ 382.97, 382.97] read 825.94 SHORT
      ost 1 sz 1048576K rsz 1024K obj 1 thr 2 ERROR: 1 != 0
      create: 1 objects
      error: create: #1 - File exists
      created object #s on localhost:zlustre-OST0000_ecc not contiguous

      ----Part I----------------------------------------------------------

      The OpenSFS cluster hosted at Indiana University has 24 "client" nodes
      and 8 "server" nodes. My work has been on
      c[21-24],mds[03-04],oss[01-02], but these details are really focused
      on just oss01. I will give the details of the construction of the MGS
      and the (separate) MDS, but feel free to skip them. I do not think
      they are directly relevant.

      --MGS---------
      [root@mds04 andrew]# modprobe lustre
      [root@mds04 andrew]# zpool create -f zpool-mgt0 mirror /dev/sd[ab] mirror /dev/sd[st]
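      # zpool create auto-mounts the new pool at /zpool-mgt0 by default, hence the umount below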
      [root@mds04 andrew]# umount zpool-mgt0
      [root@mds04 andrew]# mkfs.lustre --mgs --fsname zlustre --backfstype=zfs --reformat zpool-mgt0/mgt0

      Permanent disk data:
      Target: MGS
      Index: unassigned
      Lustre FS: zlustre
      Mount type: zfs
      Flags: 0x64
      (MGS first_time update )
      Persistent mount opts:
      Parameters:

      mkfs_cmd = zfs create -o canmount=off -o xattr=sa zpool-mgt0/mgt0
      Writing zpool-mgt0/mgt0 properties
      lustre:version=1
      lustre:flags=100
      lustre:index=65535
      lustre:fsname=zlustre
      lustre:svname=MGS
      [root@mds04 andrew]# mkdir /mgt0
      [root@mds04 andrew]# mount -t lustre zpool-mgt0/mgt0 /mgt0
      --MGS---------
      --MDS---------
      [root@mds03 andrew]# modprobe lustre
      [root@mds03 andrew]# zpool create -f zpool-mdt0 mirror /dev/sd[gh] mirror /dev/sd[mn]
      [root@mds03 andrew]# mkfs.lustre --mdt --index=0 --fsname zlustre --mgsnid=192.168.2.128@o2ib --backfstype=zfs --reformat zpool-mdt0/mdt0

      Permanent disk data:
      Target: zlustre:MDT0000
      Index: 0
      Lustre FS: zlustre
      Mount type: zfs
      Flags: 0x61
      (MDT first_time update )
      Persistent mount opts:
      Parameters: mgsnode=192.168.2.128@o2ib

      mkfs_cmd = zfs create -o canmount=off -o xattr=sa zpool-mdt0/mdt0
      Writing zpool-mdt0/mdt0 properties
      lustre:version=1
      lustre:flags=97
      lustre:index=0
      lustre:fsname=zlustre
      lustre:svname=zlustre:MDT0000
      lustre:mgsnode=192.168.2.128@o2ib
      [root@mds03 andrew]# mkdir /mdt0
      [root@mds03 andrew]# mount -t lustre zpool-mdt0/mdt0 /mdt0
      --MDS---------

      And here is the OSS where the actual experiment takes place:
      --OSS---------
      [root@oss01 andrew]# modprobe lustre
      [root@oss01 andrew]# zpool create -f zpool-ost0 raidz2 /dev/sd[rstuvwx]
      [root@oss01 andrew]# mkfs.lustre --ost --index=0 --fsname zlustre --mgsnid=192.168.2.128@o2ib --backfstype=zfs --reformat zpool-ost0/ost0

      Permanent disk data:
      Target: zlustre:OST0000
      Index: 0
      Lustre FS: zlustre
      Mount type: zfs
      Flags: 0x62
      (OST first_time update )
      Persistent mount opts:
      Parameters: mgsnode=192.168.2.128@o2ib

      mkfs_cmd = zfs create -o canmount=off -o xattr=sa zpool-ost0/ost0
      Writing zpool-ost0/ost0 properties
      lustre:version=1
      lustre:flags=98
      lustre:index=0
      lustre:fsname=zlustre
      lustre:svname=zlustre:OST0000
      lustre:mgsnode=192.168.2.128@o2ib
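      # the pool was auto-mounted at /zpool-ost0 by zpool create; unmount it before mounting as Lustre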
      [root@oss01 andrew]# umount /zpool-ost0/
      [root@oss01 andrew]# mkdir /ost0
      [root@oss01 andrew]# mount -t lustre zpool-ost0/ost0 /ost0
      [root@oss01 andrew]# nobjhi=2 thrhi=2 size=1024 case=disk /usr/bin/obdfilter-survey
      Thu Nov 14 15:11:28 PST 2013 Obdfilter-survey for case=disk from oss01
      ost 1 sz 1048576K rsz 1024K obj 1 thr 1 write 805.38 SHORT rewrite 450.65 [ 453.97, 453.97] read 757.87 SHORT
      ost 1 sz 1048576K rsz 1024K obj 1 thr 2 ERROR: 1 != 0
      create: 1 objects
      error: create: #1 - File exists
      created object #s on localhost:zlustre-OST0000_ecc not contiguous
      --OSS---------

      N.B. I also constructed two OSTs on oss02, but they play no role in
      this experiment.

      Having built the above file system, I mounted it on two of the clients
      and ran a few rudimentary 'dd' file creates and 'cp' file copies,
      verifying that the file system itself appears to be working correctly.
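
      The checks were roughly of this shape (the mount point and sizes here
      are illustrative rather than a record of the exact commands):

      mkdir -p /mnt/zlustre
      mount -t lustre 192.168.2.128@o2ib:/zlustre /mnt/zlustre
      dd if=/dev/zero of=/mnt/zlustre/ddtest bs=1M count=100   # simple file create
      cp /mnt/zlustre/ddtest /mnt/zlustre/ddtest.copy          # simple file copy
      cmp /mnt/zlustre/ddtest /mnt/zlustre/ddtest.copy && echo OK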

      Once a clean new instance of the file system had been recreated, I ran
      the obdfilter-survey shown at the top of this note, with the results
      reported there. Cliff White has confirmed that this error also occurs
      on his platform, Hyperion. After investigating how obdfilter-survey
      actually interacts with 'lctl' and the 'obdecho' client, I abstracted
      the core details and recreated the error (again with a clean new file
      system) with a minimum of distraction. Those details are in Part II,
      along with a brief note about the error that appears on the console.

      ----Part I----------------------------------------------------------

      ----Part II---------------------------------------------------------

      [root@oss01 andrew]# modprobe obdecho
      [root@oss01 andrew]# lctl dl
      0 UP osd-zfs zlustre-OST0000-osd zlustre-OST0000-osd_UUID 5
      1 UP mgc MGC192.168.2.128@o2ib 46fb8d7b-7224-7952-9c9f-43af71bdf872 5
      2 UP ost OSS OSS_uuid 3
      3 UP obdfilter zlustre-OST0000 zlustre-OST0000_UUID 5
      4 UP lwp zlustre-MDT0000-lwp-OST0000 zlustre-MDT0000-lwp-OST0000_UUID 5
      [root@oss01 andrew]# lctl
      lctl > attach echo_client zlustre-OST0000_ecc zlustre-OST0000_ecc_UUID
      lctl > setup zlustre-OST0000
      lctl > dl
      0 UP osd-zfs zlustre-OST0000-osd zlustre-OST0000-osd_UUID 5
      1 UP mgc MGC192.168.2.128@o2ib 46fb8d7b-7224-7952-9c9f-43af71bdf872 5
      2 UP ost OSS OSS_uuid 3
      3 UP obdfilter zlustre-OST0000 zlustre-OST0000_UUID 6
      4 UP lwp zlustre-MDT0000-lwp-OST0000 zlustre-MDT0000-lwp-OST0000_UUID 5
      5 UP echo_client zlustre-OST0000_ecc zlustre-OST0000_ecc_UUID 3
      lctl > quit
      [root@oss01 andrew]# lctl --device 5 create 1
      create: 1 objects
      create: #1 is object id 0x2
      [root@oss01 andrew]# lctl
      lctl > --threads 1 -1 5 test_brw 1024 wx q 256 1t2 p256
      Print status every 1 seconds
      --threads: starting 1 threads on device 5 running test_brw 1024 wx q 256 1t2 p256
      Total: total 1024 threads 1 sec 1.196911 855.535625/second
      lctl > --threads 1 -1 5 test_brw 1024 wx q 256 1t2 p256
      Print status every 1 seconds
      --threads: starting 1 threads on device 5 running test_brw 1024 wx q 256 1t2 p256
      Total: total 1024 threads 1 sec 1.042479 982.273983/second
      lctl > --threads 1 -1 5 test_brw 1024 rx q 256 1t2 p256
      Print status every 1 seconds
      --threads: starting 1 threads on device 5 running test_brw 1024 rx q 256 1t2 p256
      Total: total 1024 threads 1 sec 1.110232 922.329747/second
      lctl > quit
      [root@oss01 andrew]# lctl --device 5 destroy 0x2 1
      destroy: 1 objects
      destroy: #1 is object id 0x2
      [root@oss01 andrew]# lctl --device 5 create 1
      create: 1 objects
      error: create: #1 - No such file or directory
      [root@oss01 andrew]# lctl
      lctl > cfg zlustre-OST0000_ecc
      lctl > cleanup
      lctl > detach
      lctl > quit
      ----------------------------------------------------------------------
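
      For convenience, the session above can be condensed into one script.
      This is a sketch rather than a verified reproducer: it assumes the
      echo client comes up as device 5 and that the first create returns
      object id 0x2, as in the transcript, and it skips the intervening
      test_brw runs (I have not checked whether they are needed to trigger
      the failure):

      modprobe obdecho
      lctl <<EOF
      attach echo_client zlustre-OST0000_ecc zlustre-OST0000_ecc_UUID
      setup zlustre-OST0000
      EOF
      lctl --device 5 create 1          # succeeds: object id 0x2
      lctl --device 5 destroy 0x2 1     # destroy that object
      lctl --device 5 create 1          # fails: No such file or directory
      # clean up the echo client
      lctl <<EOF
      cfg zlustre-OST0000_ecc
      cleanup
      detach
      EOF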

      At the point you load obdecho there is one message on the console.
      After that, the console stays silent until you hit the object-creation
      error:

      <ConMan> Connection to console [oss01] opened.
      Lustre: Echo OBD driver; http://www.lustre.org/
      LustreError: 7139:0:(osd_handler.c:213:osd_trans_start()) zlustre-OST0000: can't assign tx: rc = -2
      LustreError: 7139:0:(ofd_obd.c:1356:ofd_create()) zlustre-OST0000: unable to precreate: rc = -2
      LustreError: 7139:0:(echo_client.c:2310:echo_create_object()) Cannot create objects: rc = -2
      LustreError: 7139:0:(echo_client.c:2334:echo_create_object()) create object failed with: rc = -2
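
      (For reference, rc = -2 is -ENOENT, which matches the "No such file
      or directory" reported by 'lctl create' in the session above.)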

      ----Part II---------------------------------------------------------

      If I mount the corresponding ZFS zpool as just a plain old ZFS file
      system, I do see a directory for .../O/2, but I do not know enough
      about object handling in Lustre to verify whether it "looks right".
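
      For anyone wanting to repeat that inspection, something along these
      lines should work as a sketch (the mount point /mnt/ost0-debug is
      arbitrary, the OST must not be mounted as Lustre at the time, and the
      properties should be reverted afterwards):

      umount /ost0                                  # stop serving the OST
      zfs set canmount=on zpool-ost0/ost0
      zfs set mountpoint=/mnt/ost0-debug zpool-ost0/ost0
      zfs mount zpool-ost0/ost0
      ls -lR /mnt/ost0-debug/O                      # object directories, e.g. O/2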

      Andrew Uselton
      2013-11-15

        Activity

          [LU-4263] zfs-based OST responds to 'lctl create/destroy' from an obdecho client with an error
          yujian Jian Yu added a comment -

          Patch http://review.whamcloud.com/8301 was cherry-picked to Lustre b2_5 branch for 2.5.1.

          pjones Peter Jones added a comment -

          Landed for 2.6

          bogl Bob Glossman (Inactive) added a comment -

          in b2_5: http://review.whamcloud.com/8893

          uselton Andrew Uselton (Inactive) added a comment -

          I can confirm that the patch does fix this, at least for a very small test instance. Thanks so much Li Wei.

          Details:
          BUILD=lustre-reviews
          B_NUM=19549
          PACKAGES="expect,lsof,curl,gcc,make,cvs,bc,byacc,posix,pdsh"
          loadjenkinsbuild -n mds03,mds04,oss01,oss02 -j $BUILD -b $B_NUM -t server -d el6 -a x86_64 \
          -i inkernel --profile test --packages="${PACKAGES}" --reboot \
          --powerup --nocheckreserved
          URL=http://archive.zfsonlinux.org/epel/zfs-release-1-3.el6.noarch.rpm
          yum localinstall -y --nogpgcheck $URL
          yum install -y zfs
          yum install -y lustre-osd-zfs

          [create file system as detailed in original post, then run the obdfilter-survey with a very small set of tests]

          [root@oss01 andrew]# nobjhi=2 thrhi=2 size=1024 case=disk /usr/bin/obdfilter-survey
          Mon Nov 18 15:09:42 PST 2013 Obdfilter-survey for case=disk from oss01
          ost 1 sz 1048576K rsz 1024K obj 1 thr 1 write 746.59 SHORT rewrite 571.87 SHORT read 798.46 SHORT
          ost 1 sz 1048576K rsz 1024K obj 1 thr 2 write 1113.45 SHORT rewrite 864.40 SHORT read 935.57 SHORT
          ost 1 sz 1048576K rsz 1024K obj 2 thr 2 write 1089.11 SHORT rewrite 401.32 [ 397.97, 397.97] read 1229.06 SHORT
          done!

          And we have a success
          -Andrew

          liwei Li Wei (Inactive) added a comment - - edited

          No problem. I'm afraid the patch has to be ported to the branch you are using, pushed to Gerrit so that it gets built, and provisioned to your cluster using loadjenkinsbuild. I have pushed one for the latest lustre-release.git master branch: http://review.whamcloud.com/8301. If that's suitable for your needs, just wait for "Jenkins" to complete the build and run loadjenkinsbuild.

          uselton Andrew Uselton (Inactive) added a comment -

          Thanks Li Wei. It's good to know that this is a known issue. I am very new to the process, so can I trouble you for a newbie question? If I want to test this patch on the OpenSFS cluster that I am using as a sandbox, is there some simple rune to give "loadjenkinsbuild" so that I get it, or will I need to engage in patch-and-build manually?
          -Andrew

          liwei Li Wei (Inactive) added a comment -

          Just in case you missed my comment on Skype yesterday: http://review.whamcloud.com/#/c/7508/ may be helpful.


          People

            Assignee: liwei Li Wei (Inactive)
            Reporter: uselton Andrew Uselton (Inactive)
            Votes: 0
            Watchers: 7
