
[LU-657] recovery-mds-scale (FLAVOR=MDS): client load failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.1.0, Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.1, Lustre 2.1.4
    • 3
    • 4239

    Description

      After running for 8 hours (the MDS failed over 51 times), the recovery-mds-scale test failed as follows:

      Starting mds1: -o user_xattr,acl  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds1
      fat-amd-1-ib: debug=0x33f0404
      fat-amd-1-ib: subsystem_debug=0xffb7e3ff
      fat-amd-1-ib: debug_mb=48
      Started lustre-MDT0000
      ==== Checking the clients loads AFTER  failover -- failure NOT OK
      Client load failed on node client-12-ib, rc=1
      Client load failed during failover. Exiting
      Found the END_RUN_FILE file: /home/yujian/test_logs/end_run_file
      client-12-ib
      client-2-ib
      Client load failed on node client-12-ib
      
      client client-12-ib load stdout and debug files :
                    /tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib
                    /tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib.debug
      2011-09-01 08:15:46 Terminating clients loads ...
      Duration:                43200
      Server failover period: 600 seconds
      Exited after:           30428 seconds
      Number of failovers before exit:
      mds1: 51 times
      ost1: 0 times
      ost2: 0 times
      ost3: 0 times
      ost4: 0 times
      ost5: 0 times
      ost6: 0 times
      Status: FAIL: rc=1
      

      /tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib.debug:

      2011-09-01 08:07:51: dd succeeded
      + cd /tmp
      + rm -rf /mnt/lustre/d0.dd-client-12-ib
      ++ date '+%F %H:%M:%S'
      + echoerr '2011-09-01 08:07:53: dd run finished'
      + echo '2011-09-01 08:07:53: dd run finished'
      2011-09-01 08:07:53: dd run finished
      + '[' '!' -e /home/yujian/test_logs/end_run_file ']'
      + true
      ++ date '+%F %H:%M:%S'
      + echoerr '2011-09-01 08:07:53: dd run starting'
      + echo '2011-09-01 08:07:53: dd run starting'
      2011-09-01 08:07:53: dd run starting
      + mkdir -p /mnt/lustre/d0.dd-client-12-ib
      + cd /mnt/lustre/d0.dd-client-12-ib
      + load_pid=26714
      + wait 26714
      + dd bs=4k count=1000000 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-client-12-ib/dd-file
      dd: writing `/mnt/lustre/d0.dd-client-12-ib/dd-file': No space left on device
      805559+0 records in
      805558+0 records out
      + '[' 1 -eq 0 ']'
      ++ date '+%F %H:%M:%S'
      + echoerr '2011-09-01 08:09:10: dd failed'
      + echo '2011-09-01 08:09:10: dd failed'
      2011-09-01 08:09:10: dd failed
      

      Syslog on client-12-ib showed:

      Sep  1 08:09:10 client-12 kernel: Lustre: DEBUG MARKER: Checking clients are in FULL state before doing next failover
      Sep  1 08:09:10 client-12 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.4.135@o2ib. The ost_write operation failed with -28
      Sep  1 08:09:10 client-12 kernel: LustreError: 26714:0:(vvp_io.c:1001:vvp_io_commit_write()) Write page 805558 of inode ffff880312852a38 failed -28
      

      The dd error and the syslog on client-12-ib showed "No space left on device" from the OSS node fat-amd-4 (192.168.4.135); however, df on that node showed:

      [root@fat-amd-4-ib ~]# lctl dl
        0 UP mgc MGC192.168.4.132@o2ib f176795b-1295-66f4-d018-e6c09ba5b112 5
        1 UP ost OSS OSS_uuid 3
        2 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 21
        3 UP obdfilter lustre-OST0003 lustre-OST0003_UUID 21
        4 UP obdfilter lustre-OST0005 lustre-OST0005_UUID 21
      
      [root@fat-amd-4-ib ~]# df
      Filesystem           1K-blocks      Used Available Use% Mounted on
      /dev/sda1             20642428   2223180  17370672  12% /
      tmpfs                  8165072         0   8165072   0% /dev/shm
      /dev/sdk              15939896   6847620   8292276  46% /mnt/ost2
      /dev/sdg              15939896    460468  14679428   4% /mnt/ost4
      /dev/sdh              15939896    463104  14676792   4% /mnt/ost6
      

      "lfs df" and "lfs df -i" on the client nodes also showed that there were free enough spaces and inodes on the OSTs. What's more, for each load (dd, tar, dbench, iozone) running on different client node, the load dir would always be removed after a successful run and before a new run (please refer to the recovery-mds-scale.log_run_

      {dd,tar,dbench,iozone}

      .sh-*.debug files), so there should not be ENOSPC error.
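
      For reference, the space/inode check mentioned above can be repeated on any client with a couple of commands (a minimal sketch; the mount point is the one used in this run):

              # Per-OST/MDT free space and free inodes, as seen from a Lustre client.
              lfs df /mnt/lustre       # KB total/used/available per target
              lfs df -i /mnt/lustre    # inodes total/used/free per target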

      Syslog on fat-amd-4-ib showed:

      Sep  1 08:00:05 fat-amd-4-ib kernel: Pid: 4286, comm: ll_ost_io_127
      Sep  1 08:00:05 fat-amd-4-ib kernel:
      Sep  1 08:00:05 fat-amd-4-ib kernel: Call Trace:
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff81247e8f>] ? __generic_unplug_device+0x1f/0x40
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff814dc995>] schedule_timeout+0x215/0x2e0
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff814dc613>] wait_for_common+0x123/0x180
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff81249a5d>] ? submit_bio+0x8d/0x120
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff814dc72d>] wait_for_completion+0x1d/0x20
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff8124cedd>] __blkdev_issue_flush+0xad/0xe0
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff8124cf26>] blkdev_issue_flush+0x16/0x20
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa041f8ab>] ldiskfs_sync_file+0x17b/0x250 [ldiskfs]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa0a15865>] filter_sync+0x285/0x3e0 [obdfilter]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff8115a340>] ? cache_alloc_refill+0x1c0/0x240
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa04fe98c>] ? lprocfs_counter_add+0x12c/0x196 [lvfs]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa09cfe0c>] ost_blocking_ast+0x58c/0xe40 [ost]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa064db73>] ldlm_cancel_callback+0x63/0xf0 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa064dc59>] ldlm_lock_cancel+0x59/0x190 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa066fe36>] ldlm_request_cancel+0x256/0x420 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa09d9720>] ost_handle+0x3d60/0x4b90 [ost]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa049b6b1>] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa0694104>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa069684c>] ? lustre_msg_get_status+0x3c/0xa0 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa06a4c7e>] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa06a40f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa06a40f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffffa06a40f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
      Sep  1 08:00:05 fat-amd-4-ib kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
      Sep  1 08:00:05 fat-amd-4-ib kernel:
      Sep  1 08:00:05 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889205.4286
      Sep  1 08:00:05 fat-amd-4-ib kernel: Lustre: Service thread pid 4210 was inactive for 156.04s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      Sep  1 08:00:05 fat-amd-4-ib kernel: Lustre: Skipped 2 previous similar messages
      Sep  1 08:00:05 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889205.4270
      Sep  1 08:00:05 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889205.4269
      Sep  1 08:00:06 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889206.4237
      Sep  1 08:00:06 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889206.4267
      Sep  1 08:00:06 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889206.4283
      Sep  1 08:00:12 fat-amd-4-ib kernel: Lustre: Service thread pid 4218 was inactive for 156.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      Sep  1 08:00:12 fat-amd-4-ib kernel: Lustre: Skipped 12 previous similar messages
      Sep  1 08:00:12 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889212.4218
      Sep  1 08:00:15 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889215.4285
      Sep  1 08:00:18 fat-amd-4-ib kernel: Lustre: Service thread pid 4201 was inactive for 156.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      Sep  1 08:00:18 fat-amd-4-ib kernel: Lustre: Skipped 2 previous similar messages
      Sep  1 08:00:18 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889218.4201
      Sep  1 08:00:25 fat-amd-4-ib kernel: Lustre: Service thread pid 4204 completed after 177.20s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Sep  1 08:00:25 fat-amd-4-ib kernel: Lustre: Skipped 27 previous similar messages
      Sep  1 08:00:35 fat-amd-4-ib kernel: Lustre: Service thread pid 4249 was inactive for 156.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
      Sep  1 08:00:35 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889235.4249
      Sep  1 08:00:36 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889236.4275
      Sep  1 08:00:43 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889243.4222
      Sep  1 08:00:48 fat-amd-4-ib kernel: Lustre: Service thread pid 4226 completed after 200.23s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      

      Maloo report: https://maloo.whamcloud.com/test_sets/b4bac0f4-d523-11e0-8d02-52540025f9af

      Since the log files are all large, I could not upload them to this ticket. Please find /scratch/logs/2.1.0/recovery-mds-scale.1314890151.log.tar.bz2 on the node brent.whamcloud.com.

          Activity


            hongchao.zhang Hongchao Zhang added a comment -

            oh, thanks!

            OSTSIZE is 2G, so it is the same situation of insufficient disk space on the OST.
            yujian Jian Yu added a comment -

            what is the OSTSIZE for this test? the dd stopped at about 1.7G.

            From the above Maloo report, we can enter "go to session" and find the autotest configuration info in lustre-initialization-1:

            export OSTSIZE=2097152
            

            hongchao.zhang Hongchao Zhang added a comment -

            what is the OSTSIZE for this test? the dd stopped at about 1.7G.
            yujian Jian Yu added a comment -

            Lustre Version: v2_3_0_RC1
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/24
            Distro/Arch: RHEL5.8/x86_64(client), RHEL6.3/x86_64(server)

            The recovery-mds-scale test failed with the out-of-space issue again:
            https://maloo.whamcloud.com/test_sets/50d94020-068b-11e2-9e80-52540035b04c

            yujian Jian Yu added a comment -

            I just updated http://review.whamcloud.com/#change,4002 with "ostsizegb=6" to see whether the recovery-mds-scale test still hits the out-of-space error.

            Hi Chris,
            Could you please check whether the "ostsizegb" test parameter works? In the latest run for the above patch, although I set "ostsizegb=6", I found that "OSTSIZE=4089446" was still used.

            pjones Peter Jones added a comment -

            Dropping priority as this is a test-only issue. Once the test correction is finalized, we should land it on master, and we may also land it on b2_3 if we have another RC.

            yujian Jian Yu added a comment -

            I went and found the original logs and the values are:

            MDSSIZE=2097152
            OSTSIZE=4089446

            I just checked the run_dd.sh script and found that it always creates and deletes a 4000000 KB file:

            dd bs=4k count=1000000 status=noxfer if=/dev/zero of=$TESTDIR/dd-file 1>$LOG &
            

            An OSTSIZE of 4089446 KB is not enough for a 4000000 KB file with stripe count 0 (i.e. the default single-stripe layout).
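
            For reference, a rough back-of-the-envelope check of that headroom (the numbers are taken from the values above; attributing the leftover to journal/grant overhead is an assumption, not something measured here):

                # run_dd.sh writes bs=4k count=1000000 -> a 4000000 KB file on a single stripe.
                file_kb=$((1000000 * 4))      # 4000000 KB
                ost_kb=4089446                # OSTSIZE from the autotest config, in KB
                echo $((ost_kb - file_kb))    # 89446 KB, i.e. under 90 MB of slack
                # That slack is easily consumed by the ldiskfs journal, reserved blocks and
                # per-client grant space, so dd running out of space near the end is expected.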

            I just updated http://review.whamcloud.com/#change,4002 with "ostsizegb=6" to see whether the recovery-mds-scale test still hits the out-of-space error.

            hongchao.zhang Hongchao Zhang added a comment -

            the patch is tracked at http://review.whamcloud.com/#change,4064

            hongchao.zhang Hongchao Zhang added a comment -

            This explains the issue: the default stripe count is 1 and the OSTSIZE is 4G, so the -ENOSPC occurs just as the write reaches 3.5G,
            with the remaining disk space at 0.3G (free: 246771712 avail: 37392384 grant 37253120 left: 4096 pending: 0).

            So this is the normal case, and it can be fixed by modifying recovery-mds-scale to increase the default stripe count or enlarge the OST size.
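
            For illustration only (not the actual patch in change 4064), the two remedies above could look roughly like this:

                # Option 1: stripe the load directory across all OSTs so a single OST
                # no longer has to hold the whole dd file (run on a client):
                lfs setstripe -c -1 /mnt/lustre/d0.dd-client-12-ib

                # Option 2: format larger OSTs for the test run, e.g. 6 GB each
                # (OSTSIZE is given in KB by the test framework):
                export OSTSIZE=$((6 * 1024 * 1024))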


            chris Chris Gearing (Inactive) added a comment -

            We really need to make this info easier to access; we have a ticket about this.

            I went and found the original logs and the values are:

            MDSSIZE=2097152
            OSTSIZE=4089446

            I don't specify the stripe count.

            config was this:

            09:00:57:cat /root/autotest_config.sh
            09:00:57:#!/bin/bash
            09:00:57:#Auto Generated By Whamcloud Autotest
            09:00:57:#Key Exports
            09:00:57:export mgs_HOST=fat-intel-3vm3
            09:00:58:export mds_HOST=fat-intel-3vm3
            09:00:58:export MGSDEV=/dev/lvm-MDS/P1
            09:00:58:export MDSDEV=/dev/lvm-MDS/P1
            09:00:58:export mds1_HOST=fat-intel-3vm3
            09:00:58:export MDSDEV1=/dev/lvm-MDS/P1
            09:00:58:export MDSCOUNT=1
            09:00:58:export MDSSIZE=2097152
            09:00:58:export MGSSIZE=2097152
            09:00:58:export MDSFSTYPE=ldiskfs
            09:00:58:export MGSFSTYPE=ldiskfs
            09:00:58:export mdsfailover_HOST=fat-intel-3vm7
            09:00:58:export mds1failover_HOST=fat-intel-3vm7
            09:00:58:export MGSNID=fat-intel-3vm3:fat-intel-3vm7
            09:00:58:export FAILURE_MODE=HARD
            09:00:58:export POWER_DOWN="pm -h powerman --off"
            09:00:58:export POWER_UP="pm -h powerman --on"
            09:00:58:export ost_HOST=fat-intel-3vm4
            09:00:58:export ostfailover_HOST=fat-intel-3vm8
            09:00:58:export ost1_HOST=fat-intel-3vm4
            09:00:58:export OSTDEV1=/dev/lvm-OSS/P1
            09:00:58:export ost1failover_HOST=fat-intel-3vm8
            09:00:58:export ost2_HOST=fat-intel-3vm4
            09:00:58:export OSTDEV2=/dev/lvm-OSS/P2
            09:00:58:export ost2failover_HOST=fat-intel-3vm8
            09:00:58:export ost3_HOST=fat-intel-3vm4
            09:00:58:export OSTDEV3=/dev/lvm-OSS/P3
            09:00:58:export ost3failover_HOST=fat-intel-3vm8
            09:00:58:export ost4_HOST=fat-intel-3vm4
            09:00:58:export OSTDEV4=/dev/lvm-OSS/P4
            09:00:58:export ost4failover_HOST=fat-intel-3vm8
            09:00:58:export ost5_HOST=fat-intel-3vm4
            09:00:58:export OSTDEV5=/dev/lvm-OSS/P5
            09:00:58:export ost5failover_HOST=fat-intel-3vm8
            09:00:58:export ost6_HOST=fat-intel-3vm4
            09:00:58:export OSTDEV6=/dev/lvm-OSS/P6
            09:00:58:export ost6failover_HOST=fat-intel-3vm8
            09:00:58:export ost7_HOST=fat-intel-3vm4
            09:00:58:export OSTDEV7=/dev/lvm-OSS/P7
            09:00:58:export ost7failover_HOST=fat-intel-3vm8
            09:00:58:# some setup for conf-sanity test 24a, 24b, 33a
            09:00:58:export fs2mds_DEV=/dev/lvm-MDS/S1
            09:00:58:export fs2ost_DEV=/dev/lvm-OSS/S1
            09:00:58:export fs3ost_DEV=/dev/lvm-OSS/S2
            09:00:58:export RCLIENTS="fat-intel-3vm6 fat-intel-3vm5"
            09:00:58:export OSTCOUNT=7
            09:00:58:export NETTYPE=tcp
            09:00:59:export OSTSIZE=4089446
            09:00:59:export OSTFSTYPE=ldiskfs
            09:00:59:export FSTYPE=ldiskfs
            09:00:59:export SHARED_DIRECTORY=/home/autotest/.autotest/shared_dir/2012-09-17/080224-7f44ee54dbb8
            09:00:59:export SLOW=yes
            09:00:59:VERBOSE=true
            09:00:59:
            09:00:59:# Entries above here come are created by configurecluster.rb
            09:00:59:# Entries below here come from mecturk.h
            09:00:59:FSNAME=lustre
            09:00:59:
            09:00:59:TMP=${TMP:-/tmp}
            09:00:59:
            09:00:59:DAEMONSIZE=${DAEMONSIZE:-500}
            09:00:59:
            09:00:59:MDSOPT=${MDSOPT:-""}
            09:00:59:MGSOPT=${MGSOPT:-""}
            09:00:59:
            09:00:59:# sgpdd-survey requires these to be set. They apprarently have no side affect.
            09:00:59:SGPDD_YES=true
            09:00:59:REFORMAT=true
            09:00:59:
            09:00:59:# some bits for liblustre tcp connecttions
            09:00:59:export LNET_ACCEPT_PORT=7988
            09:00:59:export ACCEPTOR_PORT=7988
            09:00:59:
            09:00:59:OSTOPT=${OSTOPT:-""}
            09:00:59:
            09:00:59:STRIPE_BYTES=${STRIPE_BYTES:-1048576}
            09:00:59:STRIPES_PER_OBJ=${STRIPES_PER_OBJ:-0}
            09:00:59:SINGLEMDS=${SINGLEMDS:-"mds1"}
            09:00:59:TIMEOUT=${TIMEOUT:-20}
            09:00:59:PTLDEBUG=${PTLDEBUG:-0x33f0404}
            09:00:59:DEBUG_SIZE=${DEBUG_SIZE:-32}
            09:00:59:SUBSYSTEM=${SUBSYSTEM:- 0xffb7e3ff}
            09:00:59:
            09:00:59:MKFSOPT=""
            09:00:59:MOUNTOPT=""
            09:00:59:[ "x$MDSJOURNALSIZE" != "x" ] &&
            09:00:59: MKFSOPT=$MKFSOPT" -J size=$MDSJOURNALSIZE"
            09:00:59:[ "x$MDSISIZE" != "x" ] &&
            09:00:59: MKFSOPT=$MKFSOPT" -i $MDSISIZE"
            09:00:59:[ "x$MKFSOPT" != "x" ] &&
            09:00:59: MKFSOPT="--mkfsoptions=\\\"$MKFSOPT\\\""
            09:00:59:[ "x$MDSCAPA" != "x" ] &&
            09:00:59: MKFSOPT="--param mdt.capa=$MDSCAPA"
            09:00:59:[ "$MDSFSTYPE" = "ldiskfs" ] &&
            09:00:59: MDSOPT=$MDSOPT" --mountfsoptions=errors=remount-ro,iopen_nopriv,user_xattr,acl"
            09:00:59:[ "x$mdsfailover_HOST" != "x" ] &&
            09:00:59: MDSOPT=$MDSOPT" --failnode=`h2$NETTYPE $mdsfailover_HOST`"
            09:00:59:[ "x$STRIPE_BYTES" != "x" ] &&
            09:00:59: MOUNTOPT=$MOUNTOPT" --param lov.stripesize=$STRIPE_BYTES"
            09:00:59:[ "x$STRIPES_PER_OBJ" != "x" ] &&
            09:00:59: MOUNTOPT=$MOUNTOPT" --param lov.stripecount=$STRIPES_PER_OBJ"
            09:00:59:[ "x$L_GETIDENTITY" != "x" ] &&
            09:00:59: MOUNTOPT=$MOUNTOPT" --param mdt.identity_upcall=$L_GETIDENTITY"
            09:00:59:
            09:00:59:MDS_MKFS_OPTS="--mdt --fsname=$FSNAME $MKFSOPT $MDSOPT"
            09:00:59:[ "$MDSFSTYPE" = "ldiskfs" ] &&
            09:00:59: MDS_MKFS_OPTS=$MDS_MKFS_OPTS" --param sys.timeout=$TIMEOUT --device-size=$MDSSIZE"
            09:00:59:[ "$MDSFSTYPE" = "zfs" ] &&
            09:00:59: MDS_MKFS_OPTS=$MDS_MKFS_OPTS" --vdev-size=$MDSSIZE"
            09:00:59:
            09:00:59:if combined_mgs_mds ; then
            09:00:59: [ "$MDSCOUNT" = "1" ] &&
            09:00:59: MDS_MKFS_OPTS="--mgs $MDS_MKFS_OPTS"
            09:00:59:else
            09:00:59: MDS_MKFS_OPTS="--mgsnode=$MGSNID $MDS_MKFS_OPTS"
            09:00:59: [ "$MGSFSTYPE" = "ldiskfs" ] &&
            09:00:59: MGS_MKFS_OPTS="--mgs --device-size=$MGSSIZE"
            09:00:59: [ "$MGSFSTYPE" = "zfs" ] &&
            09:00:59: MGS_MKFS_OPTS="--mgs --vdev-size=$MGSSIZE"
            09:00:59:fi
            09:00:59:
            09:00:59:if [ "$MDSDEV1" != "$MGSDEV" ]; then
            09:00:59: if [ "$MGSFSTYPE" == "ldiskfs" ]; then
            09:00:59: MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-"-o loop"}
            09:00:59: else
            09:00:59: MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-""}
            09:00:59: fi
            09:00:59:else
            09:00:59: MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-$MDS_MOUNT_OPTS}
            09:00:59:fi
            09:00:59:
            09:00:59:MKFSOPT=""
            09:00:59:MOUNTOPT=""
            09:00:59:[ "x$OSTJOURNALSIZE" != "x" ] &&
            09:00:59: MKFSOPT=$MKFSOPT" -J size=$OSTJOURNALSIZE"
            09:00:59:[ "x$MKFSOPT" != "x" ] &&
            09:00:59: MKFSOPT="--mkfsoptions=\\\"$MKFSOPT\\\""
            09:00:59:[ "x$OSSCAPA" != "x" ] &&
            09:00:59: MKFSOPT="--param ost.capa=$OSSCAPA"
            09:00:59:[ "x$ostfailover_HOST" != "x" ] &&
            09:00:59: OSTOPT=$OSTOPT" --failnode=`h2$NETTYPE $ostfailover_HOST`"
            09:00:59:
            09:00:59:OST_MKFS_OPTS="--ost --fsname=$FSNAME --mgsnode=$MGSNID $MKFSOPT $OSTOPT"
            09:00:59:[ "$OSTFSTYPE" = "ldiskfs" ] &&
            09:00:59: OST_MKFS_OPTS=$OST_MKFS_OPTS" --param sys.timeout=$TIMEOUT --device-size=$OSTSIZE"
            09:00:59:[ "$OSTFSTYPE" = "zfs" ] &&
            09:00:59: OST_MKFS_OPTS=$OST_MKFS_OPTS" --vdev-size=$OSTSIZE"
            09:01:00:
            09:01:00:MDS_MOUNT_OPTS=${MDS_MOUNT_OPTS:-"-o user_xattr,acl"}
            09:01:00:OST_MOUNT_OPTS=${OST_MOUNT_OPTS:-""}
            09:01:00:
            09:01:00:# TT-430
            09:01:00:SERVER_FAILOVER_PERIOD=$((60 * 15))
            09:01:00:
            09:01:00:#RUNAS_ID=840000017
            09:01:00:#client
            09:01:00:MOUNT=${MOUNT:-/mnt/${FSNAME}}
            09:01:00:MOUNT1=${MOUNT1:-$MOUNT}
            09:01:00:MOUNT2=${MOUNT2:-${MOUNT}2}
            09:01:00:MOUNTOPT=${MOUNTOPT:-"-o user_xattr,acl,flock"}
            09:01:00:[ "x$RMTCLIENT" != "x" ] &&
            09:01:00: MOUNTOPT=$MOUNTOPT",remote_client"
            09:01:00:DIR=${DIR:-$MOUNT}
            09:01:00:DIR1=${DIR:-$MOUNT1}
            09:01:00:DIR2=${DIR2:-$MOUNT2}
            09:01:00:
            09:01:00:if [ $UID -ne 0 ]; then
            09:01:00: log "running as non-root uid $UID"
            09:01:00: RUNAS_ID="$UID"
            09:01:00: RUNAS_GID=`id -g $USER`
            09:01:00: RUNAS=""
            09:01:00:else
            09:01:00: RUNAS_ID=${RUNAS_ID:-500}
            09:01:00: RUNAS_GID=${RUNAS_GID:-$RUNAS_ID}
            09:01:00: RUNAS=${RUNAS:-"runas -u $RUNAS_ID"}
            09:01:00:fi
            09:01:00:
            09:01:00:PDSH="pdsh -t 120 -S -Rrsh -w"
            09:01:00:export RSYNC_RSH=rsh
            09:01:00:FAILURE_MODE=${FAILURE_MODE:-SOFT} # or HARD
            09:01:00:POWER_DOWN=${POWER_DOWN:-"powerman --off"}
            09:01:00:POWER_UP=${POWER_UP:-"powerman --on"}
            09:01:00:SLOW=${SLOW:-no}
            09:01:00:FAIL_ON_ERROR=${FAIL_ON_ERROR:-true}
            09:01:00:
            09:01:01:# error: conf_param: No such device" issue in every test suite logs
            09:01:01:# sanity-quota test_32 hash_lqs_cur_bits isn't set properly
            09:01:01:QUOTA_TYPE=${QUOTA_TYPE:-"ug3"}
            09:01:01:QUOTA_USERS=${QUOTA_USERS:-"quota_usr quota_2usr sanityusr sanityusr1"}
            09:01:01:LQUOTAOPTS=${LQUOTAOPTS:-"hash_lqs_cur_bits=3"}
            09:01:01:
            09:01:01:# SKIP: parallel-scale test_compilebench compilebench not found
            09:01:01:# SKIP: parallel-scale test_connectathon connectathon dir not found
            09:01:01:# ------
            09:01:01:cbench_DIR=/usr/bin
            09:01:01:cnt_DIR=/opt/connectathon
            09:01:01:
            09:01:01:MPIRUN=$(which mpirun 2>/dev/null) || true
            09:01:01:MPIRUN_OPTIONS="-mca boot ssh"
            09:01:01:MPI_USER=${MPI_USER:-mpiuser}
            09:01:01:SINGLECLIENT=$(hostname)
            09:01:01:#cbench_DIR=/data/src/benchmarks/compilebench.hg
            09:01:01:#cnt_DIR=/data/src/benchmarks/cthon04
            09:01:01:
            09:01:01:# For multiple clients testing, we need use the cfg/ncli.sh config file, and
            09:01:01:# only need specify the "RCLIENTS" variable. The "CLIENTS" and "CLIENTCOUNT"
            09:01:01:# variables are defined in init_clients_lists(), which is called from cfg/ncli.sh.
            09:01:01:# So, if we add the contents of cfg/ncli.sh into autotest_config.sh, we would not
            09:01:01:# need specify "CLIENTS" and "CLIENTCOUNT", and the above two issues (#3 and #4) would also be fixed.
            09:01:01:# Start of contents of cfg/ncli.sh
            09:01:01:CLIENT1=${CLIENT1:-`hostname`}
            09:01:01:SINGLECLIENT=$CLIENT1
            09:01:01:RCLIENTS=${RCLIENTS:-""}
            09:01:01:
            09:01:01:init_clients_lists
            09:01:01:
            09:01:01:[ -n "$RCLIENTS" -a "$PDSH" = "no_dsh" ] && \
            09:01:01: error "tests for remote clients $RCLIENTS needs pdsh != do_dsh " || true
            09:01:01:
            09:01:01:[ -n "$FUNCTIONS" ] && . $FUNCTIONS || true
            09:01:01:
            09:01:01:# for recovery scale tests
            09:01:01:# default boulder cluster iozone location
            09:01:01:export PATH=/opt/iozone/bin:$PATH
            09:01:01:
            09:01:01:LOADS=${LOADS:-"dd tar dbench iozone"}
            09:01:01:for i in $LOADS; do
            09:01:01: [ -f $LUSTRE/tests/run_${i}.sh ] || \
            09:01:01: error "incorrect load: $i"
            09:01:01:done
            09:01:01:CLIENT_LOADS=($LOADS)
            09:01:01:# End of contents of cfg/ncli.sh
            09:01:01:


            hongchao.zhang Hongchao Zhang added a comment -

            Thanks Chris, Yujian, for the recovery-mds-scale test run!

            From the output, the OST really is out of space:
            https://maloo.whamcloud.com/test_logs/9a8991da-00e5-11e2-860a-52540035b04c
            ...
            LustreError: 9656:0:(filter_io.c:787:filter_preprw_write()) retry after commit pending journals
            LustreError: 9656:0:(filter_io.c:820:filter_preprw_write()) lustre-OST0001: cli 70f1591f-2616-ae38-9797-12038e2d355c/ffff880037547c00 free: 246771712 avail: 37392384 grant 37253120 left: 4096 pending: 0 osfs_age 4295175522, current 4295175522
            ...

            Hi Chris, what is the disk size of "/dev/dm-1" on fat-intel-3vm4 (the OSS node)? Is it 2G? (I just tried to check it, but the node had started running another test.)
            Also, 'dd' encountered -ENOSPC (-28) at about 3.5G (897749*4096), so is the stripe_count 2?
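
            For what it's worth, the device-size question could be answered directly on the OSS with standard tools (a sketch; the device name is the one asked about above):

                # Size of the OST block device, in bytes and in human-readable form.
                blockdev --getsize64 /dev/dm-1
                lsblk -o NAME,SIZE /dev/dm-1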


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: