<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:42:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
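For example, the XML view of this issue could be requested with a URL like the following
(illustrative; the standard JIRA issue-xml path is shown, but the exact shape may vary by configuration):
https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-11263/LU-11263.xml?field=key&field=summary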
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11263] sanity-hsm test 300 fails with &apos;hsm_control state is not &apos;enabled&apos; on mds1&apos;</title>
                <link>https://jira.whamcloud.com/browse/LU-11263</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;sanity-hsm test_300 fails. From the suite_log for &lt;a href=&quot;https://testing.whamcloud.com/test_sets/cb44d736-a0de-11e8-8ee3-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/cb44d736-a0de-11e8-8ee3-52540065bddc&lt;/a&gt;, it looks like vm4 is not in the &#8216;enabled&#8217; state.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;CMD: trevis-11vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
Update not seen after 20s: wanted &apos;enabled&apos; got &apos;stopped&apos;
 sanity-hsm test_300: @@@@@@ FAIL: hsm_control state is not &apos;enabled&apos; on mds1 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
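&lt;p&gt;That message comes from a poll-until-timeout check on the parameter; a minimal bash sketch of the idea (the function and its name here are illustrative, not the actual test-framework helper):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# Poll a parameter on a node until it holds the wanted value or we time out.
wait_param() {
    local node=$1 param=$2 wanted=$3 maxtime=${4:-20}
    local got elapsed=0
    while [ $elapsed -lt $maxtime ]; do
        got=$(ssh $node lctl get_param -n $param 2&amp;gt;/dev/null)
        if [ &quot;$got&quot; = &quot;$wanted&quot; ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo &quot;Update not seen after ${maxtime}s: wanted &apos;$wanted&apos; got &apos;$got&apos;&quot;
    return 1
}

# e.g. wait_param trevis-11vm4 mdt.lustre-MDT0000.hsm_control enabled 20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;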

&lt;p&gt;In the console logs for MDS1 and MDS3, vm4, we see some errors after the first unmount.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[23667.706326] Lustre: DEBUG MARKER: umount /mnt/lustre-mds1
[23667.728001] Lustre: Failing over lustre-MDT0000
[23668.148236] Lustre: server umount lustre-MDT0000 complete
[23668.254268] Lustre: DEBUG MARKER: lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp;
[23668.254268] lctl dl | grep &apos; ST &apos; || true
[23668.528308] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.4.122@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[23668.528313] LustreError: Skipped 46 previous similar messages
[23669.539537] LustreError: 11-0: lustre-MDT0000-osp-MDT0002: operation obd_ping to node 0@lo failed: rc = -107
[23669.539542] LustreError: Skipped 1 previous similar message
[23669.539547] Lustre: lustre-MDT0000-osp-MDT0002: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[23669.539548] Lustre: Skipped 1 previous similar message
[23678.359170] Lustre: DEBUG MARKER: hostname
[23678.465628] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
[23678.505905] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1
[23678.543360] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
[23678.613251] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[23678.631296] LustreError: 166-1: MGC10.9.4.123@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
[23678.631415] LustreError: 26046:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
[23678.631485] Lustre: MGS: Connection restored to 10.9.4.123@tcp (at 0@lo)
[23678.631486] Lustre: Skipped 27 previous similar messages
[23678.631517] Lustre: Evicted from MGS (at MGC10.9.4.123@tcp_0) after server handle changed from 0x842e27191f8a85d1 to 0x842e27191f8ace0f
[23678.730587] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[23678.779913] Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect
[23678.812256] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
[23678.850860] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/utils:/opt/iozone/bin:/usr/lib64/lustre/test
[23679.059833] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 7 clients reconnect
[23679.110502] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23679.152771] Lustre: DEBUG MARKER: trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23679.184653] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
[23679.223601] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 				2&amp;gt;/dev/null | grep -E &apos;:[a-zA-Z]{3}[0-9]{4}&apos;
[23679.261448] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 				2&amp;gt;/dev/null | grep -E &apos;:[a-zA-Z]{3}[0-9]{4}&apos;
[23679.304631] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2&amp;gt;/dev/null
[23679.637253] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.641791] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.696035] Lustre: DEBUG MARKER: trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.698094] Lustre: DEBUG MARKER: trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23683.686751] LustreError: 167-0: lustre-MDT0000-lwp-MDT0002: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[23683.789089] Lustre: lustre-MDT0000: Recovery over after 0:05, of 7 clients 7 recovered and 0 were evicted.
[23684.820279] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23684.832756] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23684.876149] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23684.884301] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23685.942378] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23685.977910] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23686.095857] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0002.hsm_control
[23686.132021] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0002.hsm_control
[23686.252173] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -P mdt.lustre-MDT0000.hsm_control=enabled
[23686.270502] Lustre: Setting parameter lustre-MDT0000.mdt.lustre-MDT0000.hsm_control in log params
[23686.270512] Lustre: Skipped 3 previous similar messages
[23686.299219] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -P mdt.lustre-MDT0001.hsm_control=enabled
[23686.339891] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -P mdt.lustre-MDT0002.hsm_control=enabled
[23686.380162] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -P mdt.lustre-MDT0003.hsm_control=enabled
[23686.850519] LustreError: 17275:0:(mdt_coordinator.c:1005:mdt_hsm_cdt_start()) lustre-MDT0000: Coordinator already started or stopping
[23706.425954] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23706.466717] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23706.590506] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0002.hsm_control
[23706.631145] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0002.hsm_control
[23706.761098] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1&apos; &apos; /proc/mounts || true
[23706.798299] Lustre: DEBUG MARKER: umount /mnt/lustre-mds1
[23707.172546] Lustre: DEBUG MARKER: lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp;
[23707.172546] lctl dl | grep &apos; ST &apos; || true
[23717.088225] Lustre: 2937:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1534314513/real 1534314513]  req@ffff88004da74c40 x1608820623035712/t0(0) o400-&amp;gt;MGC10.9.4.123@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1534314520 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[23717.088234] Lustre: 2937:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[23717.280798] Lustre: DEBUG MARKER: hostname
[23717.389356] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
[23717.431911] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1
[23717.468621] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
[23717.523820] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[23717.595775] Lustre: MGS: Connection restored to 10.9.4.124@tcp (at 10.9.4.124@tcp)
[23717.595781] Lustre: Skipped 27 previous similar messages
[23723.327601] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
[23723.365896] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/utils:/opt/iozone/bin:/usr/lib64/lustre/test
[23723.621406] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23723.665936] Lustre: DEBUG MARKER: trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23723.695487] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
[23723.732499] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 				2&amp;gt;/dev/null | grep -E &apos;:[a-zA-Z]{3}[0-9]{4}&apos;
[23723.766970] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 				2&amp;gt;/dev/null | grep -E &apos;:[a-zA-Z]{3}[0-9]{4}&apos;
[23723.807809] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2&amp;gt;/dev/null
[23724.110534] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23724.115679] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23724.163653] Lustre: DEBUG MARKER: trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23724.168911] Lustre: DEBUG MARKER: trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23728.297558] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23728.298167] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23728.354498] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23728.355846] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23729.415037] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23729.451571] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23730.491135] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23731.531240] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23732.570962] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23733.609891] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23734.648627] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23735.686065] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23736.723799] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23737.762782] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23738.801398] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23739.839530] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23740.878463] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23741.917538] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23742.956405] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23743.994575] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23745.034297] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23746.074069] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23747.111516] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23748.149274] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23749.187205] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23750.223793] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm_control
[23750.331633] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-hsm test_300: @@@@@@ FAIL: hsm_control state is not \&apos;enabled\&apos; on mds1 
[23750.375715] Lustre: DEBUG MARKER: sanity-hsm test_300: @@@@@@ FAIL: hsm_control state is not &apos;enabled&apos; on mds1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
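
&lt;p&gt;For context, &lt;tt&gt;lctl set_param -P&lt;/tt&gt; records the parameter in the MGS &#8216;params&#8217; configuration log (the &#8216;Setting parameter &#8230; in log params&#8217; line above) so that it is reapplied when a target mounts. The sequence that fails here reduces to the following sketch (device and mount paths as in this run):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# Persistently enable the HSM coordinator; -P stores the setting in the &apos;params&apos; llog on the MGS.
lctl set_param -P mdt.lustre-MDT0000.hsm_control=enabled

# Fail over the MDT: unmount, then mount again; the params llog should be replayed during mount.
umount /mnt/lustre-mds1
mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1

# Expected &apos;enabled&apos; afterwards, but this run keeps returning &apos;stopped&apos; until the 20s timeout.
lctl get_param -n mdt.lustre-MDT0000.hsm_control
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;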

&lt;p&gt;Similarly, we see some errors on MDS2 and MDS4, vm5.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[23670.655388] LustreError: 11-0: lustre-MDT0000-osp-MDT0001: operation obd_ping to node 10.9.4.123@tcp failed: rc = -107
[23670.655392] LustreError: Skipped 3 previous similar messages
[23670.655401] Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 10.9.4.123@tcp) was lost; in progress operations using this service will wait for recovery to complete
[23670.655402] Lustre: Skipped 3 previous similar messages
[23678.733466] Lustre: lustre-MDT0001: Connection restored to 10.9.4.123@tcp (at 10.9.4.123@tcp)
[23678.733473] Lustre: Skipped 6 previous similar messages
[23679.171633] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23679.211468] Lustre: DEBUG MARKER: trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23679.702724] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.703291] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.751953] Lustre: DEBUG MARKER: trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.753448] Lustre: DEBUG MARKER: trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23680.656535] LustreError: 167-0: lustre-MDT0000-lwp-MDT0001: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[23680.656543] LustreError: Skipped 1 previous similar message
[23680.656598] LustreError: 166-1: MGC10.9.4.123@tcp: Connection to MGS (at 10.9.4.123@tcp) was lost; in progress operations using this service will fail
[23680.660167] LustreError: 16750:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
[23680.660984] Lustre: Evicted from MGS (at MGC10.9.4.123@tcp_0) after server handle changed from 0x842e27191f8a85c3 to 0x842e27191f8ad142
[23684.885402] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23684.891508] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23684.937278] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23684.946568] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 5 sec
[23686.078066] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control
[23686.114999] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control
[23686.231366] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control
[23686.270114] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control
[23689.384702] LustreError: 8711:0:(mdt_coordinator.c:1005:mdt_hsm_cdt_start()) lustre-MDT0001: Coordinator already started or stopping
[23689.384707] LustreError: 8711:0:(mdt_coordinator.c:1005:mdt_hsm_cdt_start()) Skipped 1 previous similar message
[23706.570364] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control
[23706.607395] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0001.hsm_control
[23706.731616] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control
[23706.766678] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0003.hsm_control
[23717.651504] Lustre: 2894:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1534314515/real 1534314515]  req@ffff88005b13f040 x1608820615336448/t0(0) o400-&amp;gt;MGC10.9.4.123@tcp@10.9.4.123@tcp:26/25 lens 224/224 e 0 to 1 dl 1534314522 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[23717.656643] Lustre: MGC10.9.4.123@tcp: Connection restored to MGC10.9.4.123@tcp_0 (at 10.9.4.123@tcp)
[23717.656646] Lustre: Skipped 6 previous similar messages
[23723.678167] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23723.720086] Lustre: DEBUG MARKER: trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23724.169208] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23724.181342] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23724.226365] Lustre: DEBUG MARKER: trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23724.233859] Lustre: DEBUG MARKER: trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23728.355778] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23728.361315] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23728.409092] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23728.415257] Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
[23750.389384] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-hsm test_300: @@@@@@ FAIL: hsm_control state is not \&apos;enabled\&apos; on mds1 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The OSS also has some connection issues.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[23666.783662] Lustre: DEBUG MARKER: == sanity-hsm test 300: On disk coordinator state kept between MDT umount/mount ====================== 23:27:51 (1534314471)
[23668.572258] LustreError: 11-0: lustre-MDT0000-lwp-OST0000: operation obd_ping to node 10.9.4.123@tcp failed: rc = -107
[23668.572275] LustreError: Skipped 7 previous similar messages
[23668.572283] Lustre: lustre-MDT0000-lwp-OST0000: Connection to lustre-MDT0000 (at 10.9.4.123@tcp) was lost; in progress operations using this service will wait for recovery to complete
[23668.572284] Lustre: Skipped 7 previous similar messages
[23678.727567] Lustre: lustre-OST0000: Connection restored to 10.9.4.123@tcp (at 10.9.4.123@tcp)
[23678.727571] Lustre: Skipped 8 previous similar messages
[23679.158294] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23679.203043] Lustre: DEBUG MARKER: trevis-11vm4: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[23679.688823] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.695586] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.741636] Lustre: DEBUG MARKER: trevis-11vm2: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23679.741868] Lustre: DEBUG MARKER: trevis-11vm1: executing wait_import_state_mount FULL mdc.lustre-MDT0000-mdc-*.mds_server_uuid
[23680.570151] Lustre: 3979:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1534314478/real 1534314478]  req@ffff8800739c5c80 x1608820631664480/t0(0) o400-&amp;gt;MGC10.9.4.123@tcp@10.9.4.123@tcp:26/25 lens 224/224 e 0 to 1 dl 1534314485 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[23680.570154] Lustre: 3979:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[23680.570167] LustreError: 166-1: MGC10.9.4.123@tcp: Connection to MGS (at 10.9.4.123@tcp) was lost; in progress operations using this service will fail
[23680.570268] LustreError: 16611:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
[23680.570269] LustreError: 16611:0:(mgc_request.c:603:do_requeue()) Skipped 1 previous similar message
[23680.573305] Lustre: Evicted from MGS (at 10.9.4.123@tcp) after server handle changed from 0x842e27191f8a8935 to 0x842e27191f8ad085
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We&#8217;ve seen two instances of this failure:&lt;br/&gt;
2018-08-04 2.10.5 RC1 CentOS7 DNE/ZFS&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/797d69de-97a8-11e8-a9f7-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/797d69de-97a8-11e8-a9f7-52540065bddc&lt;/a&gt;&lt;br/&gt;
2018-08-15 2.10.5 RC2 SLES12 SP3 DNE/ldiskfs &lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/cb44d736-a0de-11e8-8ee3-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/cb44d736-a0de-11e8-8ee3-52540065bddc&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="52999">LU-11263</key>
            <summary>sanity-hsm test 300 fails with &apos;hsm_control state is not &apos;enabled&apos; on mds1&apos;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                            <label>DNE</label>
                    </labels>
                <created>Thu, 16 Aug 2018 22:31:33 +0000</created>
                <updated>Fri, 12 Aug 2022 22:03:54 +0000</updated>
                            <resolved>Fri, 12 Aug 2022 22:03:54 +0000</resolved>
                                    <version>Lustre 2.10.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                <comments>
                            <comment id="232131" author="jhammond" created="Fri, 17 Aug 2018 15:36:26 +0000"  >&lt;p&gt;This appears to have a similar root cause as &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11058&quot; title=&quot;sanity test_77k: checksum(1) != 0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11058&quot;&gt;LU-11058&lt;/a&gt;. The params log is not being processed completely duing the mount of &lt;tt&gt;lustre-MDT0000&lt;/tt&gt;. This is due to the params config log instance being shared globally. During mount of &lt;tt&gt;lustre-MDT0000&lt;/tt&gt;. the second &lt;tt&gt;fail $SINGLEMDS&lt;/tt&gt; we see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;10000000:01000000:0.0:1534314533.471712:0:26046:0:(mgc_request.c:2076:mgc_process_log()) Process log params:ffff880053253800 from 25

        CDEBUG(D_MGC, &quot;Process log %s:%p from %d\n&quot;, cld-&amp;gt;cld_logname,
               cld-&amp;gt;cld_cfg.cfg_instance, cld-&amp;gt;cld_cfg.cfg_last_idx + 1);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;tt&gt;cfg_last_idx&lt;/tt&gt; is set at the end of &lt;tt&gt;class_config_parse_llog()&lt;/tt&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;int class_config_parse_llog(const struct lu_env *env, struct llog_ctxt *ctxt,
                            char *name, struct config_llog_instance *cfg)

	CDEBUG(D_CONFIG, &quot;Processed log %s gen %d-%d (rc=%d)\n&quot;, name,
	       cd.lpcd_first_idx + 1, cd.lpcd_last_idx, rc);
	if (cfg)
                cfg-&amp;gt;cfg_last_idx = cd.lpcd_last_idx;
&lt;/pre&gt;
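&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So when the shared instance is reused, processing resumes from &lt;tt&gt;cfg_last_idx + 1&lt;/tt&gt; (&quot;from 25&quot; above) and a freshly mounted target never replays the earlier records, including the &lt;tt&gt;hsm_control&lt;/tt&gt; setting. To check which records the shared &lt;tt&gt;params&lt;/tt&gt; log actually holds, something like this on the MGS node should work (assuming the &lt;tt&gt;llog_print&lt;/tt&gt; subcommand is available in this lctl; output format varies by version):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# List the configuration llogs known to the MGS, then dump the shared params log.
lctl --device MGS llog_catlist
lctl --device MGS llog_print params
&lt;/pre&gt;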
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="234381" author="simmonsja" created="Thu, 4 Oct 2018 18:57:06 +0000"  >&lt;p&gt;I tried patch &lt;a href=&quot;https://review.whamcloud.com/#/c/33190&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/33190&lt;/a&gt;&#160;with sanity 77k and it fixed the issues. It might resolve this failure as well.&lt;/p&gt;</comment>
                            <comment id="234769" author="simmonsja" created="Thu, 11 Oct 2018 14:46:51 +0000"  >&lt;p&gt;Is this still true?&lt;/p&gt;</comment>
                            <comment id="244664" author="jamesanunez" created="Tue, 26 Mar 2019 14:49:01 +0000"  >&lt;p&gt;We are still seeing this test fail occasionally with this error. For example, here is a recent master failure  &lt;a href=&quot;https://testing.whamcloud.com/test_sets/a9ad9054-42af-11e9-8e92-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/a9ad9054-42af-11e9-8e92-52540065bddc&lt;/a&gt; .&lt;/p&gt;</comment>
                            <comment id="247739" author="hornc" created="Sun, 26 May 2019 17:53:18 +0000"  >&lt;p&gt;+1 on master &lt;a href=&quot;https://testing.whamcloud.com/test_sets/d3b0ab58-7dd2-11e9-b8e0-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/d3b0ab58-7dd2-11e9-b8e0-52540065bddc&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="50492">LU-10595</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="52401">LU-11058</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i000w7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>