<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:09:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
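As a hypothetical sketch of the above (the /si/jira.issueviews:issue-xml/... path is an assumed JIRA URL pattern, not stated in this file), the XML view of this issue could be requested with only those two fields:

```shell
# Sketch: build the XML-view URL for this issue restricted to two fields.
# The issueviews path below is an assumed JIRA URL pattern.
base='https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-657/LU-657.xml'
# '&' is literal inside double quotes, so no escaping is needed here.
echo "${base}?field=key&field=summary"
```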
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-657] recovery-mds-scale (FLAVOR=MDS): client load failed</title>
                <link>https://jira.whamcloud.com/browse/LU-657</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After running for 8 hours (MDS failed over 51 times), recovery-mds-scale test failed as follows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Starting mds1: -o user_xattr,acl  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds1
fat-amd-1-ib: debug=0x33f0404
fat-amd-1-ib: subsystem_debug=0xffb7e3ff
fat-amd-1-ib: debug_mb=48
Started lustre-MDT0000
==== Checking the clients loads AFTER  failover -- failure NOT OK
Client load failed on node client-12-ib, rc=1
Client load failed during failover. Exiting
Found the END_RUN_FILE file: /home/yujian/test_logs/end_run_file
client-12-ib
client-2-ib
Client load failed on node client-12-ib

client client-12-ib load stdout and debug files :
              /tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib
              /tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib.debug
2011-09-01 08:15:46 Terminating clients loads ...
Duration:                43200
Server failover period: 600 seconds
Exited after:           30428 seconds
Number of failovers before exit:
mds1: 51 times
ost1: 0 times
ost2: 0 times
ost3: 0 times
ost4: 0 times
ost5: 0 times
ost6: 0 times
Status: FAIL: rc=1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;/tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib.debug:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2011-09-01 08:07:51: dd succeeded
+ cd /tmp
+ rm -rf /mnt/lustre/d0.dd-client-12-ib
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-01 08:07:53: dd run finished&apos;
+ echo &apos;2011-09-01 08:07:53: dd run finished&apos;
2011-09-01 08:07:53: dd run finished
+ &apos;[&apos; &apos;!&apos; -e /home/yujian/test_logs/end_run_file &apos;]&apos;
+ true
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-01 08:07:53: dd run starting&apos;
+ echo &apos;2011-09-01 08:07:53: dd run starting&apos;
2011-09-01 08:07:53: dd run starting
+ mkdir -p /mnt/lustre/d0.dd-client-12-ib
+ cd /mnt/lustre/d0.dd-client-12-ib
+ load_pid=26714
+ wait 26714
+ dd bs=4k count=1000000 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-client-12-ib/dd-file
dd: writing `/mnt/lustre/d0.dd-client-12-ib/dd-file&apos;: No space left on device
805559+0 records in
805558+0 records out
+ &apos;[&apos; 1 -eq 0 &apos;]&apos;
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-01 08:09:10: dd failed&apos;
+ echo &apos;2011-09-01 08:09:10: dd failed&apos;
2011-09-01 08:09:10: dd failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on client-12-ib showed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep  1 08:09:10 client-12 kernel: Lustre: DEBUG MARKER: Checking clients are in FULL state before doing next failover
Sep  1 08:09:10 client-12 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.4.135@o2ib. The ost_write operation failed with -28
Sep  1 08:09:10 client-12 kernel: LustreError: 26714:0:(vvp_io.c:1001:vvp_io_commit_write()) Write page 805558 of inode ffff880312852a38 failed -28
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The dd error and the syslog on client-12-ib showed a &quot;No space left on device&quot; error from the OSS node fat-amd-4 (192.168.4.135); however, df on that node showed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fat-amd-4-ib ~]# lctl dl
  0 UP mgc MGC192.168.4.132@o2ib f176795b-1295-66f4-d018-e6c09ba5b112 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 21
  3 UP obdfilter lustre-OST0003 lustre-OST0003_UUID 21
  4 UP obdfilter lustre-OST0005 lustre-OST0005_UUID 21

[root@fat-amd-4-ib ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             20642428   2223180  17370672  12% /
tmpfs                  8165072         0   8165072   0% /dev/shm
/dev/sdk              15939896   6847620   8292276  46% /mnt/ost2
/dev/sdg              15939896    460468  14679428   4% /mnt/ost4
/dev/sdh              15939896    463104  14676792   4% /mnt/ost6
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&quot;lfs df&quot; and &quot;lfs df -i&quot; on the client nodes also showed that there was enough free space and there were enough free inodes on the OSTs. What&apos;s more, for each load (dd, tar, dbench, iozone) running on a different client node, the load directory was always removed after a successful run and before a new run (please refer to the recovery-mds-scale.log_run_{dd,tar,dbench,iozone}.sh-*.debug files), so there should not be an ENOSPC error.&lt;/p&gt;

&lt;p&gt;Syslog on fat-amd-4-ib showed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep  1 08:00:05 fat-amd-4-ib kernel: Pid: 4286, comm: ll_ost_io_127
Sep  1 08:00:05 fat-amd-4-ib kernel:
Sep  1 08:00:05 fat-amd-4-ib kernel: Call Trace:
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff81247e8f&amp;gt;] ? __generic_unplug_device+0x1f/0x40
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff814dc995&amp;gt;] schedule_timeout+0x215/0x2e0
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff814dc613&amp;gt;] wait_for_common+0x123/0x180
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff8105dc60&amp;gt;] ? default_wake_function+0x0/0x20
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff81249a5d&amp;gt;] ? submit_bio+0x8d/0x120
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff814dc72d&amp;gt;] wait_for_completion+0x1d/0x20
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff8124cedd&amp;gt;] __blkdev_issue_flush+0xad/0xe0
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff8124cf26&amp;gt;] blkdev_issue_flush+0x16/0x20
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa041f8ab&amp;gt;] ldiskfs_sync_file+0x17b/0x250 [ldiskfs]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa0a15865&amp;gt;] filter_sync+0x285/0x3e0 [obdfilter]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff8115a340&amp;gt;] ? cache_alloc_refill+0x1c0/0x240
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa04fe98c&amp;gt;] ? lprocfs_counter_add+0x12c/0x196 [lvfs]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa09cfe0c&amp;gt;] ost_blocking_ast+0x58c/0xe40 [ost]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa064db73&amp;gt;] ldlm_cancel_callback+0x63/0xf0 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa064dc59&amp;gt;] ldlm_lock_cancel+0x59/0x190 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa066fe36&amp;gt;] ldlm_request_cancel+0x256/0x420 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa09d9720&amp;gt;] ost_handle+0x3d60/0x4b90 [ost]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa049b6b1&amp;gt;] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa0694104&amp;gt;] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa069684c&amp;gt;] ? lustre_msg_get_status+0x3c/0xa0 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa06a4c7e&amp;gt;] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa06a40f0&amp;gt;] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff8100c1ca&amp;gt;] child_rip+0xa/0x20
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa06a40f0&amp;gt;] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffffa06a40f0&amp;gt;] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
Sep  1 08:00:05 fat-amd-4-ib kernel: [&amp;lt;ffffffff8100c1c0&amp;gt;] ? child_rip+0x0/0x20
Sep  1 08:00:05 fat-amd-4-ib kernel:
Sep  1 08:00:05 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889205.4286
Sep  1 08:00:05 fat-amd-4-ib kernel: Lustre: Service thread pid 4210 was inactive for 156.04s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Sep  1 08:00:05 fat-amd-4-ib kernel: Lustre: Skipped 2 previous similar messages
Sep  1 08:00:05 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889205.4270
Sep  1 08:00:05 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889205.4269
Sep  1 08:00:06 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889206.4237
Sep  1 08:00:06 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889206.4267
Sep  1 08:00:06 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889206.4283
Sep  1 08:00:12 fat-amd-4-ib kernel: Lustre: Service thread pid 4218 was inactive for 156.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Sep  1 08:00:12 fat-amd-4-ib kernel: Lustre: Skipped 12 previous similar messages
Sep  1 08:00:12 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889212.4218
Sep  1 08:00:15 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889215.4285
Sep  1 08:00:18 fat-amd-4-ib kernel: Lustre: Service thread pid 4201 was inactive for 156.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Sep  1 08:00:18 fat-amd-4-ib kernel: Lustre: Skipped 2 previous similar messages
Sep  1 08:00:18 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889218.4201
Sep  1 08:00:25 fat-amd-4-ib kernel: Lustre: Service thread pid 4204 completed after 177.20s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Sep  1 08:00:25 fat-amd-4-ib kernel: Lustre: Skipped 27 previous similar messages
Sep  1 08:00:35 fat-amd-4-ib kernel: Lustre: Service thread pid 4249 was inactive for 156.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Sep  1 08:00:35 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889235.4249
Sep  1 08:00:36 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889236.4275
Sep  1 08:00:43 fat-amd-4-ib kernel: LustreError: dumping log to /tmp/lustre-log.1314889243.4222
Sep  1 08:00:48 fat-amd-4-ib kernel: Lustre: Service thread pid 4226 completed after 200.23s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Maloo report: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/b4bac0f4-d523-11e0-8d02-52540025f9af&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/b4bac0f4-d523-11e0-8d02-52540025f9af&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the log files are large, I could not upload them to this ticket. Please find /scratch/logs/2.1.0/recovery-mds-scale.1314890151.log.tar.bz2 on node brent.whamcloud.com.&lt;/p&gt;</description>
                <environment>&lt;br/&gt;
Lustre Tag: v2_1_0_0_RC1&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/274/&quot;&gt;http://newbuild.whamcloud.com/job/lustre-master/274/&lt;/a&gt;&lt;br/&gt;
e2fsprogs Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/e2fsprogs-master/42/&quot;&gt;http://newbuild.whamcloud.com/job/e2fsprogs-master/42/&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL6/x86_64(in-kernel OFED, kernel version: 2.6.32-131.6.1.el6.x86_64)&lt;br/&gt;
ENABLE_QUOTA=yes&lt;br/&gt;
FAILURE_MODE=HARD&lt;br/&gt;
FLAVOR=MDS&lt;br/&gt;
&lt;br/&gt;
MGS/MDS Nodes: fat-amd-1-ib(active), fat-amd-2-ib(passive)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;\  /&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;1 combined MGS/MDT&lt;br/&gt;
&lt;br/&gt;
OSS Nodes: fat-amd-3-ib(active), fat-amd-4-ib(active)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;\  /&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;OST1 (lustre-OST0000, active in fat-amd-3-ib)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;OST2 (lustre-OST0001, active in fat-amd-4-ib)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;OST3 (lustre-OST0002, active in fat-amd-3-ib)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;OST4 (lustre-OST0003, active in fat-amd-4-ib)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;OST5 (lustre-OST0004, active in fat-amd-3-ib)&lt;br/&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;OST6 (lustre-OST0005, active in fat-amd-4-ib)&lt;br/&gt;
&lt;br/&gt;
Client Nodes:  client-[1,2,4,5,12,13,15],fat-intel-4&lt;br/&gt;
&lt;br/&gt;
Network Addresses:&lt;br/&gt;
fat-amd-1-ib: 192.168.4.132&lt;br/&gt;
fat-amd-2-ib: 192.168.4.133&lt;br/&gt;
fat-amd-3-ib: 192.168.4.134&lt;br/&gt;
fat-amd-4-ib: 192.168.4.135&lt;br/&gt;
client-1-ib: 192.168.4.1&lt;br/&gt;
client-2-ib: 192.168.4.2&lt;br/&gt;
client-4-ib: 192.168.4.4&lt;br/&gt;
client-5-ib: 192.168.4.5&lt;br/&gt;
client-12-ib: 192.168.4.12&lt;br/&gt;
client-13-ib: 192.168.4.13&lt;br/&gt;
client-15-ib: 192.168.4.15&lt;br/&gt;
fat-intel-4-ib: 192.168.4.131&lt;br/&gt;
</environment>
        <key id="11650">LU-657</key>
            <summary>recovery-mds-scale (FLAVOR=MDS): client load failed</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="yujian">Jian Yu</reporter>
                        <labels>
                            <label>MB</label>
                    </labels>
                <created>Fri, 2 Sep 2011 04:01:39 +0000</created>
                <updated>Tue, 9 Apr 2013 04:08:34 +0000</updated>
                            <resolved>Tue, 15 Jan 2013 13:48:22 +0000</resolved>
                                    <version>Lustre 2.1.0</version>
                    <version>Lustre 2.3.0</version>
                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.1.1</version>
                    <version>Lustre 2.1.4</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="20167" author="yujian" created="Tue, 13 Sep 2011 08:40:35 +0000"  >&lt;p&gt;Lustre Branch: master&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-reviews/2161/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-reviews/2161/&lt;/a&gt;&lt;br/&gt;
e2fsprogs Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/e2fsprogs-master/54/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://newbuild.whamcloud.com/job/e2fsprogs-master/54/&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL6/x86_64(in-kernel OFED, kernel version: 2.6.32-131.6.1.el6.x86_64)&lt;br/&gt;
ENABLE_QUOTA=yes&lt;br/&gt;
FAILURE_MODE=HARD&lt;br/&gt;
FLAVOR=OSS&lt;/p&gt;

&lt;p&gt;Lustre configuration:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;MGS/MDS Node: fat-amd-1-ib

OSS Nodes: fat-amd-3-ib(active), fat-amd-4-ib(active)
                               \ /
                               OST1 (active in fat-amd-3-ib)
                               OST2 (active in fat-amd-4-ib)
                               OST3 (active in fat-amd-3-ib)
                               OST4 (active in fat-amd-4-ib)
                               OST5 (active in fat-amd-3-ib)
                               OST6 (active in fat-amd-4-ib)
           fat-amd-2-ib(OST7)

Client Nodes: client-[1,2,4,5,12,13,15],fat-intel-4 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After the OSS failed over 9 times, recovery-mds-scale (FLAVOR=OSS) failed as follows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;==== Checking the clients loads AFTER  failover -- failure NOT OK
Client load failed on node client-12-ib, rc=1
Client load failed during failover. Exiting
Found the END_RUN_FILE file: /home/yujian/test_logs/end_run_file
client-12-ib
Client load failed on node client-12-ib

client client-12-ib load stdout and debug files :
              /tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib
              /tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib.debug
2011-09-13 03:17:20 Terminating clients loads ...
Duration:                7200
Server failover period: 600 seconds
Exited after:           6452 seconds
Number of failovers before exit:
mds1: 0 times
ost1: 2 times
ost2: 2 times
ost3: 0 times
ost4: 1 times
ost5: 0 times
ost6: 2 times
ost7: 2 times
Status: FAIL: rc=1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;/tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib.debug:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2011-09-13 03:12:09: dd succeeded
+ cd /tmp
+ rm -rf /mnt/lustre/d0.dd-client-12-ib
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-13 03:12:13: dd run finished&apos;
+ echo &apos;2011-09-13 03:12:13: dd run finished&apos;
2011-09-13 03:12:13: dd run finished
+ &apos;[&apos; &apos;!&apos; -e /home/yujian/test_logs/end_run_file &apos;]&apos;
+ true
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-13 03:12:13: dd run starting&apos;
+ echo &apos;2011-09-13 03:12:13: dd run starting&apos;
2011-09-13 03:12:13: dd run starting
+ mkdir -p /mnt/lustre/d0.dd-client-12-ib
+ cd /mnt/lustre/d0.dd-client-12-ib
+ load_pid=8828
+ wait 8828
+ dd bs=4k count=1000000 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-client-12-ib/dd-file
dd: writing `/mnt/lustre/d0.dd-client-12-ib/dd-file&apos;: No space left on device
654964+0 records in
654963+0 records out
+ &apos;[&apos; 1 -eq 0 &apos;]&apos;
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-13 03:13:21: dd failed&apos;
+ echo &apos;2011-09-13 03:13:21: dd failed&apos;
2011-09-13 03:13:21: dd failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on client-12-ib showed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep 13 03:13:12 client-12 kernel: Lustre: 2186:0:(client.c:1773:ptlrpc_expire_one_request()) @@@ Request  sent has failed due to network error: [sent 1315908792/real 1315908792]  req@ffff8800ae432800 x1379823408037596/t0(0) o8-&amp;gt;lustre-OST0002-osc-ffff880315e7c400@192.168.4.134@o2ib:28/4 lens 368/512 e 0 to 1 dl 1315908818 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Sep 13 03:13:12 client-12 kernel: Lustre: 2186:0:(client.c:1773:ptlrpc_expire_one_request()) Skipped 135 previous similar messages
Sep 13 03:13:21 client-12 kernel: LustreError: 8828:0:(vvp_io.c:1001:vvp_io_commit_write()) Write page 654963 of inode ffff88028e371bf8 failed -28
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client-12-ib ~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID         7.2G      435.4M        6.2G   6% /mnt/lustre[MDT:0]
lustre-OST0000_UUID        15.2G      440.1M       14.0G   3% /mnt/lustre[OST:0]
lustre-OST0001_UUID        15.2G      440.1M       14.0G   3% /mnt/lustre[OST:1]
lustre-OST0002_UUID        15.2G      440.1M       14.0G   3% /mnt/lustre[OST:2]
lustre-OST0003_UUID        15.2G      440.1M       14.0G   3% /mnt/lustre[OST:3]
lustre-OST0004_UUID        15.2G      440.1M       14.0G   3% /mnt/lustre[OST:4]
lustre-OST0005_UUID        15.2G      440.1M       14.0G   3% /mnt/lustre[OST:5]
lustre-OST0006_UUID        15.2G        6.6G        7.9G  45% /mnt/lustre[OST:6]

filesystem summary:       106.4G        9.1G       91.9G   9% /mnt/lustre

[root@client-12-ib ~]# lfs df -i
UUID                      Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID      5000040          63     4999977   0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       236160          89      236071   0% /mnt/lustre[OST:0]
lustre-OST0001_UUID       236160          94      236066   0% /mnt/lustre[OST:1]
lustre-OST0002_UUID       236160          94      236066   0% /mnt/lustre[OST:2]
lustre-OST0003_UUID       236160          88      236072   0% /mnt/lustre[OST:3]
lustre-OST0004_UUID       236160          87      236073   0% /mnt/lustre[OST:4]
lustre-OST0005_UUID       236160          89      236071   0% /mnt/lustre[OST:5]
lustre-OST0006_UUID       236160         100      236060   0% /mnt/lustre[OST:6]

filesystem summary:      5000040          63     4999977   0% /mnt/lustre

[root@fat-amd-2-ib ~]# lctl dl
  0 UP mgc MGC192.168.4.132@o2ib 87b4991e-f76f-5234-3bfb-544ca6cfbb6f 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre-OST0006 lustre-OST0006_UUID 19

[root@fat-amd-2-ib ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              20G  1.8G   17G  10% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sde               16G  6.6G  7.9G  46% /mnt/ost7

[root@fat-amd-2-ib ~]# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            1310720   42667 1268053    4% /
tmpfs                2041268       1 2041267    1% /dev/shm
/dev/sde              236160     100  236060    1% /mnt/ost7
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on fat-amd-2-ib showed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep 13 03:12:59 fat-amd-2-ib kernel: INFO: task jbd2/sde-8:2487 blocked for more than 120 seconds.
Sep 13 03:12:59 fat-amd-2-ib kernel: &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
Sep 13 03:12:59 fat-amd-2-ib kernel: jbd2/sde-8    D 0000000000000004     0  2487      2 0x00000080
Sep 13 03:12:59 fat-amd-2-ib kernel: ffff88011a5a3c10 0000000000000046 0000000000000000 ffff88011ac4cc70
Sep 13 03:12:59 fat-amd-2-ib kernel: ffff88011a5a3b80 ffffffff81012979 ffff88011a5a3bc0 ffffffff81098d09
Sep 13 03:12:59 fat-amd-2-ib kernel: ffff880117e00678 ffff88011a5a3fd8 000000000000f598 ffff880117e00678
Sep 13 03:12:59 fat-amd-2-ib kernel: Call Trace:
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff81012979&amp;gt;] ? read_tsc+0x9/0x20
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff81098d09&amp;gt;] ? ktime_get_ts+0xa9/0xe0
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff81098d09&amp;gt;] ? ktime_get_ts+0xa9/0xe0
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff811a3de0&amp;gt;] ? sync_buffer+0x0/0x50
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff814dc403&amp;gt;] io_schedule+0x73/0xc0
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff811a3e20&amp;gt;] sync_buffer+0x40/0x50
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff814dcc6f&amp;gt;] __wait_on_bit+0x5f/0x90
Sep 13 03:12:59 fat-amd-2-ib kernel: [&amp;lt;ffffffff811a3de0&amp;gt;] ? sync_buffer+0x0/0x50
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff814dcd18&amp;gt;] out_of_line_wait_on_bit+0x78/0x90
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8108e1a0&amp;gt;] ? wake_bit_function+0x0/0x50
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff811a3dd6&amp;gt;] __wait_on_buffer+0x26/0x30
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa0922971&amp;gt;] jbd2_journal_commit_transaction+0x11c1/0x1530 [jbd2]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff810096e0&amp;gt;] ? __switch_to+0xd0/0x320
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8107a17b&amp;gt;] ? try_to_del_timer_sync+0x7b/0xe0
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa0927b48&amp;gt;] kjournald2+0xb8/0x220 [jbd2]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8108e160&amp;gt;] ? autoremove_wake_function+0x0/0x40
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa0927a90&amp;gt;] ? kjournald2+0x0/0x220 [jbd2]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8108ddf6&amp;gt;] kthread+0x96/0xa0
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8100c1ca&amp;gt;] child_rip+0xa/0x20
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8108dd60&amp;gt;] ? kthread+0x0/0xa0
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8100c1c0&amp;gt;] ? child_rip+0x0/0x20
Sep 13 03:13:00 fat-amd-2-ib kernel: INFO: task ll_ost_io_89:2736 blocked for more than 120 seconds.
Sep 13 03:13:00 fat-amd-2-ib kernel: &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
Sep 13 03:13:00 fat-amd-2-ib kernel: ll_ost_io_89  D 0000000000000004     0  2736      2 0x00000080
Sep 13 03:13:00 fat-amd-2-ib kernel: ffff88041158b790 0000000000000046 0000000000000000 ffff880217fa8aa0
Sep 13 03:13:00 fat-amd-2-ib kernel: ffff88041158b710 ffffffff81247e8f 0000000000000001 0000000100237e3a
Sep 13 03:13:00 fat-amd-2-ib kernel: ffff880411589a78 ffff88041158bfd8 000000000000f598 ffff880411589a78
Sep 13 03:13:00 fat-amd-2-ib kernel: Call Trace:
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff81247e8f&amp;gt;] ? __generic_unplug_device+0x1f/0x40
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff814dc995&amp;gt;] schedule_timeout+0x215/0x2e0
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff814dc613&amp;gt;] wait_for_common+0x123/0x180
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8105dc60&amp;gt;] ? default_wake_function+0x0/0x20
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff81249a5d&amp;gt;] ? submit_bio+0x8d/0x120
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff814dc72d&amp;gt;] wait_for_completion+0x1d/0x20
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8124cedd&amp;gt;] __blkdev_issue_flush+0xad/0xe0
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8124cf26&amp;gt;] blkdev_issue_flush+0x16/0x20
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa094a8ab&amp;gt;] ldiskfs_sync_file+0x17b/0x250 [ldiskfs]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa0a1c8f5&amp;gt;] filter_sync+0x285/0x3e0 [obdfilter]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa05b56d6&amp;gt;] ? _ldlm_lock_debug+0x446/0x680 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa046998c&amp;gt;] ? lprocfs_counter_add+0x12c/0x196 [lvfs]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa09d6e0c&amp;gt;] ost_blocking_ast+0x58c/0xe40 [ost]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa05b8b73&amp;gt;] ldlm_cancel_callback+0x63/0xf0 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa05b8c59&amp;gt;] ldlm_lock_cancel+0x59/0x190 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa05dae36&amp;gt;] ldlm_request_cancel+0x256/0x420 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa09e0720&amp;gt;] ost_handle+0x3d60/0x4b90 [ost]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa04066b1&amp;gt;] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa05fee94&amp;gt;] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa06015fc&amp;gt;] ? lustre_msg_get_status+0x3c/0xa0 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa060fb4e&amp;gt;] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa060efc0&amp;gt;] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8100c1ca&amp;gt;] child_rip+0xa/0x20
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa060efc0&amp;gt;] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffffa060efc0&amp;gt;] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
Sep 13 03:13:00 fat-amd-2-ib kernel: [&amp;lt;ffffffff8100c1c0&amp;gt;] ? child_rip+0x0/0x20
&amp;lt;~snip~&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Maloo report: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/64ba1fae-ddfd-11e0-9909-52540025f9af&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/64ba1fae-ddfd-11e0-9909-52540025f9af&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please find /scratch/logs/2.1.0/recovery-oss-scale.1315909046.log.tar.bz2 on node brent.whamcloud.com for more logs.&lt;/p&gt;</comment>
                            <comment id="20347" author="yujian" created="Mon, 19 Sep 2011 22:39:09 +0000"  >&lt;p&gt;Lustre Tag: v2_1_0_0_RC2&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/283/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://newbuild.whamcloud.com/job/lustre-master/283/&lt;/a&gt;&lt;br/&gt;
e2fsprogs Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/e2fsprogs-master/54/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://newbuild.whamcloud.com/job/e2fsprogs-master/54/&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL6/x86_64(in-kernel OFED, kernel version: 2.6.32-131.6.1.el6.x86_64)&lt;br/&gt;
ENABLE_QUOTA=yes&lt;br/&gt;
FAILURE_MODE=HARD&lt;br/&gt;
FLAVOR=MDS&lt;/p&gt;

&lt;p&gt;After running for about 7 hours (the MDS failed over 29 times), the recovery-mds-scale test failed with the same issue:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/b3f0c344-e32e-11e0-9909-52540025f9af&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/b3f0c344-e32e-11e0-9909-52540025f9af&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;/tmp/recovery-mds-scale.log_run_dd.sh-client-12-ib.debug:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2011-09-19 10:16:49: dd succeeded
+ cd /tmp
+ rm -rf /mnt/lustre/d0.dd-client-12-ib
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-19 10:16:51: dd run finished&apos;
+ echo &apos;2011-09-19 10:16:51: dd run finished&apos;
2011-09-19 10:16:51: dd run finished
+ &apos;[&apos; &apos;!&apos; -e /home/yujian/test_logs/end_run_file &apos;]&apos;
+ true
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-19 10:16:51: dd run starting&apos;
+ echo &apos;2011-09-19 10:16:51: dd run starting&apos;
2011-09-19 10:16:51: dd run starting
+ mkdir -p /mnt/lustre/d0.dd-client-12-ib
+ cd /mnt/lustre/d0.dd-client-12-ib
+ load_pid=17460
+ wait 17460
+ dd bs=4k count=1000000 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-client-12-ib/dd-file
dd: writing `/mnt/lustre/d0.dd-client-12-ib/dd-file&apos;: No space left on device
589960+0 records in
589959+0 records out
+ &apos;[&apos; 1 -eq 0 &apos;]&apos;
++ date &apos;+%F %H:%M:%S&apos;
+ echoerr &apos;2011-09-19 10:24:03: dd failed&apos;
+ echo &apos;2011-09-19 10:24:03: dd failed&apos;
2011-09-19 10:24:03: dd failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syslog on the client node client-12-ib:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Sep 19 10:17:48 client-12 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.4.134@o2ib. The ost_write operation failed with -28
Sep 19 10:17:48 client-12 kernel: LustreError: 17460:0:(vvp_io.c:1001:vvp_io_commit_write()) Write page 589959 of inode ffff88029f2b7cb8 failed -28
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Please refer to the attached recovery-mds-scale.1316453521.log.tar.bz2 for more logs.&lt;/p&gt;</comment>
                            <comment id="28487" author="yujian" created="Mon, 13 Feb 2012 07:29:23 +0000"  >&lt;p&gt;Lustre Tag: v2_1_1_0_RC2&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_1/41/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_1/41/&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-220.el6)&lt;br/&gt;
Network: TCP (1GigE)&lt;br/&gt;
FAILURE_MODE=HARD&lt;br/&gt;
FLAVOR=MDS&lt;/p&gt;

&lt;p&gt;The same issue occurred: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/8384ecca-55b2-11e1-9aa8-5254004bbbd3&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/8384ecca-55b2-11e1-9aa8-5254004bbbd3&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="40480" author="yujian" created="Wed, 13 Jun 2012 05:55:57 +0000"  >&lt;p&gt;Another instance: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/92648b78-b3d7-11e1-8808-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/92648b78-b3d7-11e1-8808-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="40487" author="yujian" created="Wed, 13 Jun 2012 07:12:20 +0000"  >&lt;p&gt;Another instance: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/f09a5f58-b3bc-11e1-a2dd-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/f09a5f58-b3bc-11e1-a2dd-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="41963" author="pjones" created="Wed, 18 Jul 2012 07:54:54 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="42115" author="hongchao.zhang" created="Mon, 23 Jul 2012 05:16:01 +0000"  >&lt;p&gt;There was no obvious error in the logs, and obd_osfs was updated by fsfilt_statfs before the -ENOSPC.&lt;br/&gt;
I checked the recent occurrences of this bug in Maloo, but the log of the corresponding OST had been overwritten&lt;br/&gt;
by the extra logs produced after the -ENOSPC error was triggered on the OST.&lt;/p&gt;

&lt;p&gt;The OST could actually have been full at the time if the loads running on the clients used the same OST,&lt;br/&gt;
so I will create a debug patch to produce more logs to check whether that is the case.&lt;/p&gt;</comment>
                            <comment id="42123" author="hongchao.zhang" created="Mon, 23 Jul 2012 08:12:50 +0000"  >&lt;p&gt;the possible patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/#change,3446&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3446&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="42305" author="hongchao.zhang" created="Thu, 26 Jul 2012 06:53:52 +0000"  >&lt;p&gt;This issue occurs after the client receives OBD_FL_NO_USRQUOTA or OBD_FL_NO_GRPQUOTA. Is recovery-mds-scale run&lt;br/&gt;
under a user ID with a quota limit?&lt;/p&gt;</comment>
                            <comment id="42543" author="yujian" created="Wed, 1 Aug 2012 08:45:43 +0000"  >&lt;blockquote&gt;&lt;p&gt;This issue occurs after the client receives OBD_FL_NO_USRQUOTA or OBD_FL_NO_GRPQUOTA. Is recovery-mds-scale run under a user ID with a quota limit?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The test was run by root user.&lt;/p&gt;</comment>
                            <comment id="42942" author="pjones" created="Thu, 9 Aug 2012 10:42:27 +0000"  >&lt;p&gt;The proposed patch has been landed for 2.3. Closing this ticket for now but we should reopen it if it transpires that the issue fixed was not the one reported and this problem still occurs on the next tag.&lt;/p&gt;</comment>
                            <comment id="43314" author="bzzz" created="Thu, 16 Aug 2012 03:42:13 +0000"  >&lt;p&gt;just got this on the master:&lt;/p&gt;


&lt;p&gt;Lustre: DEBUG MARKER: == sanity test 49: Change max_pages_per_rpc won&apos;t break osc extent =================================== 07:44:43 (1345088683)&lt;br/&gt;
format at filter_io.c:786:filter_preprw_write doesn&apos;t end in newline&lt;br/&gt;
LustreError: 15266:0:(ost_handler.c:2240:ost_handle()) ASSERTION( get_current()-&amp;gt;journal_info == ((void *)0) ) failed: &lt;br/&gt;
LustreError: 15266:0:(ost_handler.c:2240:ost_handle()) LBUG&lt;/p&gt;


&lt;p&gt;filter_io.c:786: 		CDEBUG(D_INODE, &quot;retry after commit pending journals&quot;);&lt;/p&gt;
</comment>
                            <comment id="43316" author="hongchao.zhang" created="Thu, 16 Aug 2012 04:19:44 +0000"  >&lt;p&gt;there is a bug in the merged patch&lt;/p&gt;</comment>
                            <comment id="43317" author="hongchao.zhang" created="Thu, 16 Aug 2012 04:27:17 +0000"  >&lt;p&gt;Thanks bzzz!&lt;/p&gt;

&lt;p&gt;sorry, there is indeed a bug in the merged patch, the fix patch will be pushed soon.&lt;/p&gt;</comment>
                            <comment id="43321" author="hongchao.zhang" created="Thu, 16 Aug 2012 05:38:31 +0000"  >&lt;p&gt;the fix patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/#change,3692&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3692&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="43483" author="yujian" created="Mon, 20 Aug 2012 04:38:31 +0000"  >&lt;p&gt;Lustre Branch: b2_1&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_1/117&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_1/117&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another failure instance: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/6533657c-ea21-11e1-b794-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/6533657c-ea21-11e1-b794-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="43713" author="pjones" created="Thu, 23 Aug 2012 16:51:27 +0000"  >&lt;p&gt;Landed for 2.3 and 2.4&lt;/p&gt;</comment>
                            <comment id="44198" author="yujian" created="Wed, 5 Sep 2012 11:03:10 +0000"  >&lt;p&gt;Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_3/12&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_3/12&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The issue still exists:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/5daabc00-f760-11e1-8b95-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/5daabc00-f760-11e1-8b95-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="44270" author="hongchao.zhang" created="Thu, 6 Sep 2012 06:45:27 +0000"  >&lt;p&gt;Hi YuJian,&lt;br/&gt;
could you please help reproduce this issue with D_QUOTA added to PTLDEBUG? It is suspected to be related to quota. Thanks!&lt;/p&gt;</comment>
                            <comment id="44280" author="yujian" created="Thu, 6 Sep 2012 09:26:45 +0000"  >&lt;blockquote&gt;&lt;p&gt;could you please help reproduce this issue with D_QUOTA added to PTLDEBUG? It is suspected to be related to quota. Thanks!&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Hi Chris,&lt;/p&gt;

&lt;p&gt;Could you please set the following PTLDEBUG value in autotest_config.sh to run the hard failover tests?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;PTLDEBUG=&quot;vfstrace rpctrace dlmtrace neterror ha config ioctl super quota&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I could not reproduce this issue in manual test runs.&lt;/p&gt;</comment>
                            <comment id="44293" author="chris" created="Thu, 6 Sep 2012 11:24:03 +0000"  >&lt;p&gt;Added to b2_3 failover tests&lt;/p&gt;</comment>
                            <comment id="44453" author="yujian" created="Sun, 9 Sep 2012 21:19:01 +0000"  >&lt;blockquote&gt;&lt;p&gt;Added to b2_3 failover tests&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Thanks Chris.&lt;/p&gt;

&lt;p&gt;Hi Hongchao,&lt;br/&gt;
The issue was reproduced in an autotest run:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/7eca53cc-f92d-11e1-a1b8-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/7eca53cc-f92d-11e1-a1b8-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="44574" author="hongchao.zhang" created="Tue, 11 Sep 2012 06:20:28 +0000"  >&lt;p&gt;An autotest run has been queued to verify whether this issue is fixed&lt;br/&gt;
now that the patch (&lt;a href=&quot;http://review.whamcloud.com/#change,3913&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3913&lt;/a&gt;), which fixed a bug in the previous patch on this ticket, has been merged.&lt;/p&gt;</comment>
                            <comment id="44762" author="yujian" created="Thu, 13 Sep 2012 03:30:30 +0000"  >&lt;p&gt;Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_3/17&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_3/17&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/22a91122-fd70-11e1-afe5-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/22a91122-fd70-11e1-afe5-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/28a35b3a-fd72-11e1-a1b4-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/28a35b3a-fd72-11e1-a1b4-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The issue still occurred.&lt;/p&gt;</comment>
                            <comment id="44777" author="hongchao.zhang" created="Thu, 13 Sep 2012 07:44:19 +0000"  >&lt;p&gt;The debug patch is at &lt;a href=&quot;http://review.whamcloud.com/#change,3979&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3979&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hi Chris,&lt;br/&gt;
Could you please queue an autotest run (only the b2_3 failover tests) against the build with this debug patch? Thanks!&lt;/p&gt;</comment>
                            <comment id="44787" author="chris" created="Thu, 13 Sep 2012 08:57:37 +0000"  >&lt;p&gt;You are able to do this using the Test-Requirements line described on the wiki.&lt;/p&gt;

&lt;p&gt;You will need something like this:&lt;/p&gt;

&lt;p&gt;Test-Requirements: fortestonly envdefinitions=&quot;SLOW=yes&quot; clientcount=4 osscount=2 mdscount=2 austeroptions=-R failover=true useiscsi=true testgroup=failover&lt;/p&gt;

&lt;p&gt;I should probably introduce a class that allows something like that to be done more easily; this example is on the wiki page.&lt;/p&gt;</comment>
                            <comment id="44796" author="yujian" created="Thu, 13 Sep 2012 10:31:32 +0000"  >&lt;p&gt;Hi Chris and Hongchao,&lt;br/&gt;
Please also add testlist=recovery-mds-scale so that only this test from the failover group is run; otherwise it will take a very long time to finish the whole group.&lt;/p&gt;</comment>
                            <comment id="44942" author="yujian" created="Sat, 15 Sep 2012 06:04:59 +0000"  >&lt;p&gt;Hi Hongchao,&lt;br/&gt;
I applied the debug patch on b2_3 branch: &lt;a href=&quot;http://review.whamcloud.com/#change,4002&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4002&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="44972" author="yujian" created="Sun, 16 Sep 2012 21:18:13 +0000"  >&lt;blockquote&gt;&lt;p&gt;I applied the debug patch on b2_3 branch: &lt;a href=&quot;http://review.whamcloud.com/#change,4002&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4002&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Chris, could you please take a look at the above change? Somehow the recovery-mds-scale test was not run.&lt;/p&gt;</comment>
                            <comment id="45036" author="chris" created="Mon, 17 Sep 2012 10:27:04 +0000"  >&lt;p&gt;There is a bug in autotest, which I have tried to fix on the fly. You might want to try it again.&lt;/p&gt;

&lt;p&gt;The bug was that austeroptions was defined as nil and so had no type, meaning the parser did not know what to do with the option.&lt;/p&gt;

&lt;p&gt;I fixed this by defaulting austeroptions to &quot;-v&quot;, so it now has a type.&lt;/p&gt;</comment>
                            <comment id="45038" author="chris" created="Mon, 17 Sep 2012 10:35:18 +0000"  >&lt;p&gt;I&apos;ve restarted this but don&apos;t expect it to work; I&apos;ve not actually been successful with the long set of Test-Requirements. In fact, it&apos;s possible I&apos;ve broken stuff by trying to get it to work.&lt;/p&gt;</comment>
                            <comment id="45101" author="hongchao.zhang" created="Mon, 17 Sep 2012 21:35:18 +0000"  >&lt;p&gt;The recovery-mds-scale test ran. Thanks Chris and Yujian!&lt;/p&gt;

&lt;p&gt;From the output, the OST really does have no space:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_logs/9a8991da-00e5-11e2-860a-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_logs/9a8991da-00e5-11e2-860a-52540035b04c&lt;/a&gt;&lt;br/&gt;
...&lt;br/&gt;
LustreError: 9656:0:(filter_io.c:787:filter_preprw_write()) retry after commit pending journals&lt;br/&gt;
LustreError: 9656:0:(filter_io.c:820:filter_preprw_write()) lustre-OST0001: cli 70f1591f-2616-ae38-9797-12038e2d355c/ffff880037547c00 free: 246771712 avail: 37392384 grant 37253120 left: 4096 pending: 0 osfs_age 4295175522, current 4295175522&lt;br/&gt;
...&lt;/p&gt;

&lt;p&gt;Hi Chris, what is the disk size of &quot;/dev/dm-1&quot; on fat-intel-3vm4 (the OSS node)? Is it 2G? (I just checked, but it had started to&lt;br/&gt;
run another test.) Also, &apos;dd&apos; encountered -ENOSPC (-28) at about 3.5G (897749*4096 bytes); is the stripe_count 2?&lt;/p&gt;</comment>
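As a quick back-of-envelope check of the sizes being discussed (a sketch only: the dd parameters and block count are quoted from the logs in this ticket, the OSTSIZE figure is the one from the autotest config quoted below, and the stripe-count question is left open):

```python
# Back-of-envelope check of the ENOSPC arithmetic in this ticket.
# All figures are quoted from the logs/config; nothing is measured live.

KB = 1024
GiB = 1024 ** 3

dd_target = 1_000_000 * 4 * KB   # dd bs=4k count=1000000
written   = 897_749 * 4096       # blocks written before the -28 (ENOSPC)
ost_size  = 4_089_446 * KB       # OSTSIZE from autotest_config.sh (in KB)

print(f"dd target : {dd_target / GiB:.2f} GiB")   # ~3.81 GiB
print(f"written   : {written / GiB:.2f} GiB")     # ~3.42 GiB
print(f"one OST   : {ost_size / GiB:.2f} GiB")    # ~3.90 GiB

# With stripe_count=1 the whole file lands on a single OST.
headroom = ost_size - dd_target
print(f"headroom  : {headroom / GiB:.2f} GiB")
```

With a single stripe the ~3.81 GiB target leaves well under 0.1 GiB of raw headroom on a ~3.90 GiB OST before journal, metadata, and outstanding grant space are subtracted, so an -ENOSPC somewhere around 3.4 GiB written seems plausible even without a second stripe.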
                            <comment id="45260" author="chris" created="Thu, 20 Sep 2012 09:54:02 +0000"  >&lt;p&gt;We really need to make this info easier to access; we have a ticket about this.&lt;/p&gt;

&lt;p&gt;I went and found the original logs; the values are:&lt;/p&gt;

&lt;p&gt;MDSSIZE=2097152&lt;br/&gt;
OSTSIZE=4089446&lt;/p&gt;

&lt;p&gt;I don&apos;t specify the stripe count.&lt;/p&gt;

&lt;p&gt;The config was this:&lt;/p&gt;

&lt;p&gt;09:00:57:cat /root/autotest_config.sh&lt;br/&gt;
09:00:57:#!/bin/bash&lt;br/&gt;
09:00:57:#Auto Generated By Whamcloud Autotest&lt;br/&gt;
09:00:57:#Key Exports&lt;br/&gt;
09:00:57:export mgs_HOST=fat-intel-3vm3&lt;br/&gt;
09:00:58:export mds_HOST=fat-intel-3vm3&lt;br/&gt;
09:00:58:export MGSDEV=/dev/lvm-MDS/P1&lt;br/&gt;
09:00:58:export MDSDEV=/dev/lvm-MDS/P1&lt;br/&gt;
09:00:58:export mds1_HOST=fat-intel-3vm3&lt;br/&gt;
09:00:58:export MDSDEV1=/dev/lvm-MDS/P1&lt;br/&gt;
09:00:58:export MDSCOUNT=1&lt;br/&gt;
09:00:58:export MDSSIZE=2097152&lt;br/&gt;
09:00:58:export MGSSIZE=2097152&lt;br/&gt;
09:00:58:export MDSFSTYPE=ldiskfs&lt;br/&gt;
09:00:58:export MGSFSTYPE=ldiskfs&lt;br/&gt;
09:00:58:export mdsfailover_HOST=fat-intel-3vm7&lt;br/&gt;
09:00:58:export mds1failover_HOST=fat-intel-3vm7&lt;br/&gt;
09:00:58:export MGSNID=fat-intel-3vm3:fat-intel-3vm7&lt;br/&gt;
09:00:58:export FAILURE_MODE=HARD&lt;br/&gt;
09:00:58:export POWER_DOWN=&quot;pm -h powerman --off&quot;&lt;br/&gt;
09:00:58:export POWER_UP=&quot;pm -h powerman --on&quot;&lt;br/&gt;
09:00:58:export ost_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export ostfailover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:export ost1_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export OSTDEV1=/dev/lvm-OSS/P1&lt;br/&gt;
09:00:58:export ost1failover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:export ost2_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export OSTDEV2=/dev/lvm-OSS/P2&lt;br/&gt;
09:00:58:export ost2failover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:export ost3_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export OSTDEV3=/dev/lvm-OSS/P3&lt;br/&gt;
09:00:58:export ost3failover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:export ost4_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export OSTDEV4=/dev/lvm-OSS/P4&lt;br/&gt;
09:00:58:export ost4failover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:export ost5_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export OSTDEV5=/dev/lvm-OSS/P5&lt;br/&gt;
09:00:58:export ost5failover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:export ost6_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export OSTDEV6=/dev/lvm-OSS/P6&lt;br/&gt;
09:00:58:export ost6failover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:export ost7_HOST=fat-intel-3vm4&lt;br/&gt;
09:00:58:export OSTDEV7=/dev/lvm-OSS/P7&lt;br/&gt;
09:00:58:export ost7failover_HOST=fat-intel-3vm8&lt;br/&gt;
09:00:58:# some setup for conf-sanity test 24a, 24b, 33a&lt;br/&gt;
09:00:58:export fs2mds_DEV=/dev/lvm-MDS/S1&lt;br/&gt;
09:00:58:export fs2ost_DEV=/dev/lvm-OSS/S1&lt;br/&gt;
09:00:58:export fs3ost_DEV=/dev/lvm-OSS/S2&lt;br/&gt;
09:00:58:export RCLIENTS=&quot;fat-intel-3vm6 fat-intel-3vm5&quot;&lt;br/&gt;
09:00:58:export OSTCOUNT=7&lt;br/&gt;
09:00:58:export NETTYPE=tcp&lt;br/&gt;
09:00:59:export OSTSIZE=4089446&lt;br/&gt;
09:00:59:export OSTFSTYPE=ldiskfs&lt;br/&gt;
09:00:59:export FSTYPE=ldiskfs&lt;br/&gt;
09:00:59:export SHARED_DIRECTORY=/home/autotest/.autotest/shared_dir/2012-09-17/080224-7f44ee54dbb8&lt;br/&gt;
09:00:59:export SLOW=yes&lt;br/&gt;
09:00:59:VERBOSE=true&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:# Entries above here come are created by configurecluster.rb&lt;br/&gt;
09:00:59:# Entries below here come from mecturk.h&lt;br/&gt;
09:00:59:FSNAME=lustre&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:TMP=${TMP:-/tmp}&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:DAEMONSIZE=${DAEMONSIZE:-500}&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:MDSOPT=${MDSOPT:-&quot;&quot;}&lt;br/&gt;
09:00:59:MGSOPT=${MGSOPT:-&quot;&quot;}&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:# sgpdd-survey requires these to be set. They apprarently have no side affect.&lt;br/&gt;
09:00:59:SGPDD_YES=true&lt;br/&gt;
09:00:59:REFORMAT=true&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:# some bits for liblustre tcp connecttions&lt;br/&gt;
09:00:59:export LNET_ACCEPT_PORT=7988&lt;br/&gt;
09:00:59:export ACCEPTOR_PORT=7988&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:OSTOPT=${OSTOPT:-&quot;&quot;}&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:STRIPE_BYTES=${STRIPE_BYTES:-1048576}&lt;br/&gt;
09:00:59:STRIPES_PER_OBJ=${STRIPES_PER_OBJ:-0}&lt;br/&gt;
09:00:59:SINGLEMDS=${SINGLEMDS:-&quot;mds1&quot;}&lt;br/&gt;
09:00:59:TIMEOUT=${TIMEOUT:-20}&lt;br/&gt;
09:00:59:PTLDEBUG=${PTLDEBUG:-0x33f0404}&lt;br/&gt;
09:00:59:DEBUG_SIZE=${DEBUG_SIZE:-32}&lt;br/&gt;
09:00:59:SUBSYSTEM=${SUBSYSTEM:- 0xffb7e3ff}&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:MKFSOPT=&quot;&quot;&lt;br/&gt;
09:00:59:MOUNTOPT=&quot;&quot;&lt;br/&gt;
09:00:59:[ &quot;x$MDSJOURNALSIZE&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MKFSOPT=$MKFSOPT&quot; -J size=$MDSJOURNALSIZE&quot;&lt;br/&gt;
09:00:59:[ &quot;x$MDSISIZE&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MKFSOPT=$MKFSOPT&quot; -i $MDSISIZE&quot;&lt;br/&gt;
09:00:59:[ &quot;x$MKFSOPT&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MKFSOPT=&quot;--mkfsoptions=\\\&quot;$MKFSOPT\\\&quot;&quot;&lt;br/&gt;
09:00:59:[ &quot;x$MDSCAPA&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MKFSOPT=&quot;--param mdt.capa=$MDSCAPA&quot;&lt;br/&gt;
09:00:59:[ &quot;$MDSFSTYPE&quot; = &quot;ldiskfs&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MDSOPT=$MDSOPT&quot; --mountfsoptions=errors=remount-ro,iopen_nopriv,user_xattr,acl&quot;&lt;br/&gt;
09:00:59:[ &quot;x$mdsfailover_HOST&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MDSOPT=$MDSOPT&quot; --failnode=`h2$NETTYPE $mdsfailover_HOST`&quot;&lt;br/&gt;
09:00:59:[ &quot;x$STRIPE_BYTES&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MOUNTOPT=$MOUNTOPT&quot; --param lov.stripesize=$STRIPE_BYTES&quot;&lt;br/&gt;
09:00:59:[ &quot;x$STRIPES_PER_OBJ&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MOUNTOPT=$MOUNTOPT&quot; --param lov.stripecount=$STRIPES_PER_OBJ&quot;&lt;br/&gt;
09:00:59:[ &quot;x$L_GETIDENTITY&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MOUNTOPT=$MOUNTOPT&quot; --param mdt.identity_upcall=$L_GETIDENTITY&quot;&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:MDS_MKFS_OPTS=&quot;--mdt --fsname=$FSNAME $MKFSOPT $MDSOPT&quot;&lt;br/&gt;
09:00:59:[ &quot;$MDSFSTYPE&quot; = &quot;ldiskfs&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MDS_MKFS_OPTS=$MDS_MKFS_OPTS&quot; --param sys.timeout=$TIMEOUT --device-size=$MDSSIZE&quot;&lt;br/&gt;
09:00:59:[ &quot;$MDSFSTYPE&quot; = &quot;zfs&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MDS_MKFS_OPTS=$MDS_MKFS_OPTS&quot; --vdev-size=$MDSSIZE&quot;&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:if combined_mgs_mds ; then&lt;br/&gt;
09:00:59:    [ &quot;$MDSCOUNT&quot; = &quot;1&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:        MDS_MKFS_OPTS=&quot;--mgs $MDS_MKFS_OPTS&quot;&lt;br/&gt;
09:00:59:else&lt;br/&gt;
09:00:59:    MDS_MKFS_OPTS=&quot;--mgsnode=$MGSNID $MDS_MKFS_OPTS&quot;&lt;br/&gt;
09:00:59:    [ &quot;$MGSFSTYPE&quot; = &quot;ldiskfs&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:        MGS_MKFS_OPTS=&quot;--mgs --device-size=$MGSSIZE&quot;&lt;br/&gt;
09:00:59:    [ &quot;$MGSFSTYPE&quot; = &quot;zfs&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:        MGS_MKFS_OPTS=&quot;--mgs --vdev-size=$MGSSIZE&quot;&lt;br/&gt;
09:00:59:fi&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:if [ &quot;$MDSDEV1&quot; != &quot;$MGSDEV&quot; ]; then&lt;br/&gt;
09:00:59:    if [ &quot;$MGSFSTYPE&quot; == &quot;ldiskfs&quot; ]; then&lt;br/&gt;
09:00:59:        MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-&quot;-o loop&quot;}&lt;br/&gt;
09:00:59:    else&lt;br/&gt;
09:00:59:        MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-&quot;&quot;}&lt;br/&gt;
09:00:59:    fi&lt;br/&gt;
09:00:59:else&lt;br/&gt;
09:00:59:    MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-$MDS_MOUNT_OPTS}&lt;br/&gt;
09:00:59:fi&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:MKFSOPT=&quot;&quot;&lt;br/&gt;
09:00:59:MOUNTOPT=&quot;&quot;&lt;br/&gt;
09:00:59:[ &quot;x$OSTJOURNALSIZE&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MKFSOPT=$MKFSOPT&quot; -J size=$OSTJOURNALSIZE&quot;&lt;br/&gt;
09:00:59:[ &quot;x$MKFSOPT&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MKFSOPT=&quot;--mkfsoptions=\\\&quot;$MKFSOPT\\\&quot;&quot;&lt;br/&gt;
09:00:59:[ &quot;x$OSSCAPA&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    MKFSOPT=&quot;--param ost.capa=$OSSCAPA&quot;&lt;br/&gt;
09:00:59:[ &quot;x$ostfailover_HOST&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    OSTOPT=$OSTOPT&quot; --failnode=`h2$NETTYPE $ostfailover_HOST`&quot;&lt;br/&gt;
09:00:59:&lt;br/&gt;
09:00:59:OST_MKFS_OPTS=&quot;--ost --fsname=$FSNAME --mgsnode=$MGSNID $MKFSOPT $OSTOPT&quot;&lt;br/&gt;
09:00:59:[ &quot;$OSTFSTYPE&quot; = &quot;ldiskfs&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    OST_MKFS_OPTS=$OST_MKFS_OPTS&quot; --param sys.timeout=$TIMEOUT --device-size=$OSTSIZE&quot;&lt;br/&gt;
09:00:59:[ &quot;$OSTFSTYPE&quot; = &quot;zfs&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:00:59:    OST_MKFS_OPTS=$OST_MKFS_OPTS&quot; --vdev-size=$OSTSIZE&quot;&lt;br/&gt;
09:01:00:&lt;br/&gt;
09:01:00:MDS_MOUNT_OPTS=${MDS_MOUNT_OPTS:-&quot;-o user_xattr,acl&quot;}&lt;br/&gt;
09:01:00:OST_MOUNT_OPTS=${OST_MOUNT_OPTS:-&quot;&quot;}&lt;br/&gt;
09:01:00:&lt;br/&gt;
09:01:00:# TT-430&lt;br/&gt;
09:01:00:SERVER_FAILOVER_PERIOD=$((60 * 15))&lt;br/&gt;
09:01:00:&lt;br/&gt;
09:01:00:#RUNAS_ID=840000017&lt;br/&gt;
09:01:00:#client&lt;br/&gt;
09:01:00:MOUNT=${MOUNT:-/mnt/${FSNAME}}&lt;br/&gt;
09:01:00:MOUNT1=${MOUNT1:-$MOUNT}&lt;br/&gt;
09:01:00:MOUNT2=${MOUNT2:-${MOUNT}2}&lt;br/&gt;
09:01:00:MOUNTOPT=${MOUNTOPT:-&quot;-o user_xattr,acl,flock&quot;}&lt;br/&gt;
09:01:00:[ &quot;x$RMTCLIENT&quot; != &quot;x&quot; ] &amp;amp;&amp;amp;&lt;br/&gt;
09:01:00:        MOUNTOPT=$MOUNTOPT&quot;,remote_client&quot;&lt;br/&gt;
09:01:00:DIR=${DIR:-$MOUNT}&lt;br/&gt;
09:01:00:DIR1=${DIR:-$MOUNT1}&lt;br/&gt;
09:01:00:DIR2=${DIR2:-$MOUNT2}&lt;br/&gt;
09:01:00:&lt;br/&gt;
09:01:00:if [ $UID -ne 0 ]; then&lt;br/&gt;
09:01:00:        log &quot;running as non-root uid $UID&quot;&lt;br/&gt;
09:01:00:        RUNAS_ID=&quot;$UID&quot;&lt;br/&gt;
09:01:00:        RUNAS_GID=`id -g $USER`&lt;br/&gt;
09:01:00:        RUNAS=&quot;&quot;&lt;br/&gt;
09:01:00:else&lt;br/&gt;
09:01:00:        RUNAS_ID=${RUNAS_ID:-500}&lt;br/&gt;
09:01:00:        RUNAS_GID=${RUNAS_GID:-$RUNAS_ID}&lt;br/&gt;
09:01:00:        RUNAS=${RUNAS:-&quot;runas -u $RUNAS_ID&quot;}&lt;br/&gt;
09:01:00:fi&lt;br/&gt;
09:01:00:&lt;br/&gt;
09:01:00:PDSH=&quot;pdsh -t 120 -S -Rrsh -w&quot;&lt;br/&gt;
09:01:00:export RSYNC_RSH=rsh&lt;br/&gt;
09:01:00:FAILURE_MODE=${FAILURE_MODE:-SOFT} # or HARD&lt;br/&gt;
09:01:00:POWER_DOWN=${POWER_DOWN:-&quot;powerman --off&quot;}&lt;br/&gt;
09:01:00:POWER_UP=${POWER_UP:-&quot;powerman --on&quot;}&lt;br/&gt;
09:01:00:SLOW=${SLOW:-no}&lt;br/&gt;
09:01:00:FAIL_ON_ERROR=${FAIL_ON_ERROR:-true}&lt;br/&gt;
09:01:00:&lt;br/&gt;
09:01:01:# &quot;error: conf_param: No such device&quot; issue in every test suite logs&lt;br/&gt;
09:01:01:# sanity-quota test_32 hash_lqs_cur_bits isn&apos;t set properly&lt;br/&gt;
09:01:01:QUOTA_TYPE=${QUOTA_TYPE:-&quot;ug3&quot;}&lt;br/&gt;
09:01:01:QUOTA_USERS=${QUOTA_USERS:-&quot;quota_usr quota_2usr sanityusr sanityusr1&quot;}&lt;br/&gt;
09:01:01:LQUOTAOPTS=${LQUOTAOPTS:-&quot;hash_lqs_cur_bits=3&quot;}&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:# SKIP: parallel-scale test_compilebench compilebench not found&lt;br/&gt;
09:01:01:# SKIP: parallel-scale test_connectathon connectathon dir not found&lt;br/&gt;
09:01:01:# ------&lt;br/&gt;
09:01:01:cbench_DIR=/usr/bin&lt;br/&gt;
09:01:01:cnt_DIR=/opt/connectathon&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:MPIRUN=$(which mpirun 2&amp;gt;/dev/null) || true&lt;br/&gt;
09:01:01:MPIRUN_OPTIONS=&quot;-mca boot ssh&quot;&lt;br/&gt;
09:01:01:MPI_USER=${MPI_USER:-mpiuser}&lt;br/&gt;
09:01:01:SINGLECLIENT=$(hostname)&lt;br/&gt;
09:01:01:#cbench_DIR=/data/src/benchmarks/compilebench.hg&lt;br/&gt;
09:01:01:#cnt_DIR=/data/src/benchmarks/cthon04&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:# For multiple clients testing, we need use the cfg/ncli.sh config file, and&lt;br/&gt;
09:01:01:# only need specify the &quot;RCLIENTS&quot; variable. The &quot;CLIENTS&quot; and &quot;CLIENTCOUNT&quot;&lt;br/&gt;
09:01:01:# variables are defined in init_clients_lists(), which is called from cfg/ncli.sh.&lt;br/&gt;
09:01:01:# So, if we add the contents of cfg/ncli.sh into autotest_config.sh, we would not&lt;br/&gt;
09:01:01:# need specify &quot;CLIENTS&quot; and &quot;CLIENTCOUNT&quot;, and the above two issues (#3 and #4) would also be fixed.&lt;br/&gt;
09:01:01:# Start of contents of cfg/ncli.sh&lt;br/&gt;
09:01:01:CLIENT1=${CLIENT1:-`hostname`}&lt;br/&gt;
09:01:01:SINGLECLIENT=$CLIENT1&lt;br/&gt;
09:01:01:RCLIENTS=${RCLIENTS:-&quot;&quot;}&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:init_clients_lists&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:[ -n &quot;$RCLIENTS&quot; -a &quot;$PDSH&quot; = &quot;no_dsh&quot; ] &amp;amp;&amp;amp; \&lt;br/&gt;
09:01:01:                error &quot;tests for remote clients $RCLIENTS needs pdsh != do_dsh &quot; || true&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:[ -n &quot;$FUNCTIONS&quot; ] &amp;amp;&amp;amp; . $FUNCTIONS || true&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:# for recovery scale tests&lt;br/&gt;
09:01:01:# default boulder cluster iozone location&lt;br/&gt;
09:01:01:export PATH=/opt/iozone/bin:$PATH&lt;br/&gt;
09:01:01:&lt;br/&gt;
09:01:01:LOADS=${LOADS:-&quot;dd tar dbench iozone&quot;}&lt;br/&gt;
09:01:01:for i in $LOADS; do&lt;br/&gt;
09:01:01:    [ -f $LUSTRE/tests/run_${i}.sh ] || \&lt;br/&gt;
09:01:01:        error &quot;incorrect load: $i&quot;&lt;br/&gt;
09:01:01:done&lt;br/&gt;
09:01:01:CLIENT_LOADS=($LOADS)&lt;br/&gt;
09:01:01:# End of contents of cfg/ncli.sh&lt;br/&gt;
09:01:01:&lt;/p&gt;</comment>
                            <comment id="45316" author="hongchao.zhang" created="Fri, 21 Sep 2012 00:21:11 +0000"  >&lt;p&gt;This explains the issue: the default stripe count is 1 and the OSTSIZE is 4G, so -ENOSPC occurs just as the write reaches 3.5G,&lt;br/&gt;
with only 0.3G of disk space remaining (free: 246771712 avail: 37392384 grant 37253120 left: 4096 pending: 0).&lt;/p&gt;
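&lt;p&gt;A hedged sketch of the two adjustments (widening the stripe count or enlarging the OST); the directory path and sizes below are illustrative, not the actual test code:&lt;/p&gt;

```shell
# Hedged sketch; TESTDIR and the sizes are illustrative, not the
# actual recovery-mds-scale code.
TESTDIR=${TESTDIR:-/mnt/lustre/d0.dd}   # hypothetical test directory

# Option 1: stripe the dd file across all OSTs (-c -1) so no single
# 4G OST has to hold the whole file.
if command -v lfs; then
    lfs setstripe -c -1 "$TESTDIR"
fi

# Option 2: enlarge each OST at format time; OSTSIZE is in KB, so
# this asks for 6 GiB instead of the 4 GiB that ran out.
OSTSIZE=$((6 * 1024 * 1024))
export OSTSIZE
echo "OSTSIZE=$OSTSIZE"
```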

&lt;p&gt;So this is expected behavior, and it can be fixed by modifying recovery-mds-scale to increase the default stripe count or enlarge the OST size.&lt;/p&gt;</comment>
                            <comment id="45317" author="hongchao.zhang" created="Fri, 21 Sep 2012 00:49:41 +0000"  >&lt;p&gt;the patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/#change,4064&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4064&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="45318" author="yujian" created="Fri, 21 Sep 2012 00:51:51 +0000"  >&lt;blockquote&gt;
&lt;p&gt;I went and found the original logs and the values are:&lt;/p&gt;

&lt;p&gt;MDSSIZE=2097152&lt;br/&gt;
OSTSIZE=4089446&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I just checked the run_dd.sh script and found that it always creates/deletes a 4000000k file:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;dd bs=4k count=1000000 status=noxfer if=/dev/zero of=$TESTDIR/dd-file 1&amp;gt;$LOG &amp;amp;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;An OSTSIZE of 4089446k is not enough for a 4000000k file with stripe count 0 (a single stripe on one OST).&lt;/p&gt;
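&lt;p&gt;A hedged sketch (not the actual run_dd.sh code; the variable names are illustrative) of sizing the dd file from the space actually available instead of a fixed 4000000k:&lt;/p&gt;

```shell
# Hedged sketch: derive the dd block count from the free space on the
# filesystem backing TESTDIR, instead of a hard-coded count=1000000.
TESTDIR=${TESTDIR:-.}
BLKSZ_KB=4

# Available 1k-blocks: 4th field of the 2nd line of POSIX "df -P -k".
FREE_KB=$(df -P -k "$TESTDIR" | awk 'NR==2 {print $4}')

# Keep 20% headroom so grant and metadata overhead do not push the
# write to -ENOSPC near the end.
COUNT=$(( FREE_KB * 8 / 10 / BLKSZ_KB ))

echo "dd bs=${BLKSZ_KB}k count=$COUNT if=/dev/zero of=$TESTDIR/dd-file"
```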

&lt;p&gt;I just updated &lt;a href=&quot;http://review.whamcloud.com/#change,4002&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4002&lt;/a&gt; with &quot;ostsizegb=6&quot; to see whether the recovery-mds-scale test will hit the out of space error or not.&lt;/p&gt;</comment>
                            <comment id="45380" author="pjones" created="Sat, 22 Sep 2012 01:11:36 +0000"  >&lt;p&gt;Dropping priority as this is a test-only issue. Once the test correction is finalized we should land it on master, and we may also land it on b2_3 if we have another RC.&lt;/p&gt;</comment>
                            <comment id="45393" author="yujian" created="Sat, 22 Sep 2012 08:34:31 +0000"  >&lt;blockquote&gt;&lt;p&gt;I just updated &lt;a href=&quot;http://review.whamcloud.com/#change,4002&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4002&lt;/a&gt; with &quot;ostsizegb=6&quot; to see whether the recovery-mds-scale test will hit the out of space error or not.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Hi Chris,&lt;br/&gt;
Could you please check whether the &quot;ostsizegb&quot; test parameter works? In the latest run for the above patch, although I set &quot;ostsizegb=6&quot;, &quot;OSTSIZE=4089446&quot; was still used.&lt;/p&gt;</comment>
                            <comment id="45499" author="yujian" created="Tue, 25 Sep 2012 04:54:04 +0000"  >&lt;p&gt;Lustre Version: v2_3_0_RC1&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_3/24&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_3/24&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL5.8/x86_64(client), RHEL6.3/x86_64(server)&lt;/p&gt;

&lt;p&gt;recovery-mds-scale test failed with out of space issue again:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/50d94020-068b-11e2-9e80-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/50d94020-068b-11e2-9e80-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="45501" author="hongchao.zhang" created="Tue, 25 Sep 2012 05:52:05 +0000"  >&lt;p&gt;what is the OSTSIZE for this test? the dd stopped at about 1.7G.&lt;/p&gt;</comment>
                            <comment id="45502" author="yujian" created="Tue, 25 Sep 2012 06:15:47 +0000"  >&lt;blockquote&gt;&lt;p&gt;what is the OSTSIZE for this test? the dd stopped at about 1.7G.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;From the above Maloo report, we can enter &quot;go to session&quot; and find the autotest configuration info in lustre-initialization-1:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;export OSTSIZE=2097152
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="45503" author="hongchao.zhang" created="Tue, 25 Sep 2012 06:23:24 +0000"  >&lt;p&gt;oh, thanks!&lt;/p&gt;

&lt;p&gt;OSTSIZE is 2G, so it is the same situation: insufficient disk space on the OST.&lt;/p&gt;</comment>
                            <comment id="47432" author="yujian" created="Tue, 6 Nov 2012 03:11:39 +0000"  >&lt;p&gt;Marking this ticket as a blocker since it is blocking the recovery-*-scale tests.&lt;/p&gt;</comment>
                            <comment id="47433" author="hongchao.zhang" created="Tue, 6 Nov 2012 05:57:28 +0000"  >&lt;p&gt;How about adapting the data size in &apos;dd&apos; to the actual free space in Lustre? I will try to create a patch for it.&lt;/p&gt;</comment>
                            <comment id="47713" author="hongchao.zhang" created="Mon, 12 Nov 2012 22:46:40 +0000"  >&lt;p&gt;Status update&lt;/p&gt;

&lt;p&gt;This should be a TT ticket. The patch is being created and tested; I will attach it soon.&lt;/p&gt;</comment>
                            <comment id="47930" author="hongchao.zhang" created="Fri, 16 Nov 2012 09:18:53 +0000"  >&lt;p&gt;the patch is tracked at &lt;a href=&quot;http://review.whamcloud.com/#change,4599&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4599&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="48974" author="hongchao.zhang" created="Mon, 10 Dec 2012 09:17:40 +0000"  >&lt;p&gt;While testing the patch in a loop (with small OSTs), I found it fails with -ENOSPC after several loops, and that seems to be related to grant space...&lt;br/&gt;
I will spend some more time tracing where the problem is.&lt;/p&gt;

&lt;p&gt;By the way, that is not the same issue as this ticket.&lt;/p&gt;</comment>
                            <comment id="49282" author="hongchao.zhang" created="Sun, 16 Dec 2012 22:19:28 +0000"  >&lt;p&gt;There is a bug in osd_trans_stop, which uses thandle-&amp;gt;th_sync after &quot;dt_txn_hook_stop&quot;, which could change the value:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;int ofd_txn_stop_cb(const struct lu_env *env, struct thandle *txn,
                    void *cookie)
{
        ...

        /* if can&apos;t add callback, do sync write */
        txn-&amp;gt;th_sync = !!tgt_last_commit_cb_add(txn, &amp;amp;ofd-&amp;gt;ofd_lut,
                                                info-&amp;gt;fti_exp,
                                                info-&amp;gt;fti_transno);

        return ofd_last_rcvd_update(info, txn);
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then the transactions won&apos;t be committed, which caused the issue (the client can&apos;t use just-deleted disk space, &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-456&quot; title=&quot;statfs reports truncated blocks as freed while they are not&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-456&quot;&gt;&lt;del&gt;LU-456&lt;/del&gt;&lt;/a&gt;)&lt;/p&gt;</comment>
                            <comment id="49316" author="tappro" created="Mon, 17 Dec 2012 12:10:10 +0000"  >&lt;p&gt;The transaction will be committed in any case, or did you mean it &apos;won&apos;t be forced to commit&apos;?&lt;br/&gt;
This is not a bug in osd_trans_stop; th_sync must be used exactly after the hook_stop call precisely because a hook may set the &apos;sync&apos; flag. The bug is in ofd_txn_stop_cb(), which shouldn&apos;t overwrite th_sync but should set it with an &apos;OR&apos; operation: txn-&amp;gt;th_sync |= !!tgt_last_commit_cb_add(...).&lt;/p&gt;</comment>
                            <comment id="49477" author="hongchao.zhang" created="Thu, 20 Dec 2012 03:52:13 +0000"  >&lt;p&gt;Yes, thanks, the patch has been updated accordingly.&lt;/p&gt;</comment>
                            <comment id="50492" author="jlevi" created="Tue, 15 Jan 2013 13:48:22 +0000"  >&lt;p&gt;Patch landed. Closing ticket. Please reopen if more work is needed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="15676">LU-1824</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="10492" name="recovery-mds-scale.1316453521.log.tar.bz2" size="3196214" author="yujian" created="Mon, 19 Sep 2011 22:40:22 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv4cv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4239</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>