Details
Type: Bug
Resolution: Fixed
Priority: Minor
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.4
Severity: 3
Description
conf-sanity test_32c is reported as failing with ‘test_32c failed with 1’, but the actual failure recorded in the suite_log is:
/tmp/t32/mnt/lustre
/usr/lib64/lustre/tests
tar: ./striped_dir: file changed as we read it
tar: The following options were used after any non-optional arguments in archive create or update mode. These options are positional and affect only arguments that follow them. Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors
/usr/lib64/lustre/tests
mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
= /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
= /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()
So far, conf-sanity test_32c has failed with this error only in RHEL 8 testing.
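The tar and mv messages in the suite_log output above can be reproduced on ordinary local directories. The following is a minimal sketch under stated assumptions: it uses a temporary directory, not a Lustre mount, and the directory layout is hypothetical. It shows two behaviors. Newer GNU tar treats --exclude as positional, so an exclude given after the "." argument is ignored (with the "has no effect" warning seen above), while the same exclude given before "." works. Separately, mv cannot replace an existing non-empty directory, so the move fails with "Directory not empty".

```shell
#!/usr/bin/env bash
# Local sketch on plain directories (not Lustre); all paths are hypothetical.
set -u
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# --- tar: --exclude is positional in newer GNU tar ---
mkdir -p "$work/src/keep" "$work/src/striped_dir"
touch "$work/src/keep/a" "$work/src/striped_dir/b"

# Exclude given AFTER '.': newer GNU tar warns "--exclude ... has no effect"
# and archives ./striped_dir anyway (the warning is discarded here).
late_list=$(cd "$work/src" && tar cf - . --exclude ./striped_dir 2>/dev/null | tar tf -)

# Exclude given BEFORE '.': ./striped_dir is left out of the archive.
early_list=$(cd "$work/src" && tar cf - --exclude ./striped_dir . | tar tf -)

printf 'exclude after ".":\n%s\n\n' "$late_list"
printf 'exclude before ".":\n%s\n\n' "$early_list"

# --- mv: the destination directory already exists and is not empty ---
mkdir -p "$work/mnt/remote_dir" "$work/mnt/striped_dir/remote_dir"
touch "$work/mnt/remote_dir/x" "$work/mnt/striped_dir/remote_dir/y"

# rename(2) refuses to replace a non-empty directory (ENOTEMPTY on Linux),
# which GNU mv reports as "Directory not empty"; the source stays in place.
mv_err=$(mv "$work/mnt/remote_dir" "$work/mnt/striped_dir/" 2>&1)
mv_status=$?
printf 'mv exit status %s: %s\n' "$mv_status" "$mv_err"
```

Placing every --exclude before the directory argument is what the tar warning itself asks for ("Please, rearrange them properly"); whether that is the whole fix for this ticket is not established here.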
When this test fails on ARM, there is an additional error from the test script itself:
/usr/lib64/lustre/tests
/usr/lib64/lustre/tests/conf-sanity.sh: line 2084: [: !=: unary operator expected
/tmp/t32/mnt/lustre
/usr/lib64/lustre/tests
tar: ./striped_dir: file changed as we read it
tar: The following options were used after any non-optional arguments in archive create or update mode. These options are positional and affect only arguments that follow them. Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors
/usr/lib64/lustre/tests
mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
= /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
= /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()
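The ARM-only message "[: !=: unary operator expected" is what bash prints when a [ ... ] test is left with an operator but no left operand, most often because an unquoted variable expanded to an empty string. The sketch below is a hypothetical illustration of that pattern (the variable name value is made up); it is not a copy of conf-sanity.sh line 2084, whose contents are not shown in this ticket.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; this is NOT the code at conf-sanity.sh line 2084.
value=""    # e.g. a variable left empty because a command printed nothing

# Unquoted: the empty expansion disappears, so '[' receives only
# "!=" "expected" and fails with "[: !=: unary operator expected" (status 2).
unquoted_err=$( { [ $value != expected ]; } 2>&1 )
unquoted_status=$?

# Quoted: '[' receives "" "!=" "expected" and compares the strings normally.
if [ "$value" != expected ]; then
    quoted_result=differs
else
    quoted_result=same
fi

printf 'unquoted: status %s, %s\n' "$unquoted_status" "$unquoted_err"
printf 'quoted:   %s\n' "$quoted_result"
```

Quoting the expansion, or using bash's [[ ... ]] (which does not word-split its operands), keeps the empty operand in place so the comparison evaluates instead of erroring.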
In the failure at https://testing.whamcloud.com/test_sets/7dfd32bc-d764-11e9-a25b-52540065bddc, there are LNet errors on MDS1/3 (vm2):
[57884.609121] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.5.2@tcp
[57885.165309] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.5.2@tcp
[57901.160460] LNetError: 5452:0:(lib-move.c:3044:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-1.2.3.4@tcp: -125
[57901.163143] Lustre: t32fs-OST0000: Connection restored to 524f3afa-6d00-4 (at 10.9.5.2@tcp)
[57901.164531] Lustre: Skipped 22 previous similar messages
[57901.169993] Lustre: t32fs-OST0000: deleting orphan objects from 0x200000400:634 to 0x200000400:705
[57901.450090] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
[57921.147858] Lustre: t32fs-OST0000: deleting orphan objects from 0x0:642 to 0x0:705
[57921.435042] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
[57921.998506] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
[57922.561527] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
[57923.141130] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop
[57929.712016] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.failover.node=10.9.5.2@tcp
[57930.302577] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.mdc.max_rpcs_in_flight=9
[57930.872100] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.lov.stripesize=4M
[57941.162535] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
[57949.354803] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug=-1
[57949.932241] Lustre: DEBUG MARKER: test -f /tmp/t32/list
[57950.542349] Lustre: DEBUG MARKER: test -f /tmp/t32/list2
[57951.112651] Lustre: DEBUG MARKER: cat /tmp/t32/list2
[57951.164524] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
[57966.150469] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.2@tcp added to recovery queue. Health = 900
[57966.152375] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
[58005.676808] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
[58006.101254] Lustre: DEBUG MARKER: conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
The client 2 (vm5) console log also shows LNet errors:
[57951.613668] Lustre: DEBUG MARKER: == conf-sanity test 32c: dne upgrade test ============================================================ 16:09:03 (1568390943)
[58045.604092] Lustre: Mounted t32fs-client
[58047.520140] LNetError: 11379:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 1.2.3.4@tcp added to recovery queue. Health = 0
[58063.904037] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.4.111@tcp added to recovery queue. Health = 900
[58063.905520] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
[58102.192325] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
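To compare the LNet recovery-queue messages across the MDS and client console logs, a small filter can list each local NI ("ni") or peer NI ("lpni") together with the health value it was queued with (1000 is the maximum LNet health value). The sketch below is a hypothetical helper (the function name lnet_recovery_summary is made up and is not part of the Lustre test framework); it assumes one log message per line, as in the raw console logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Lustre test framework.
# Prints "<ni|lpni> <nid> <health>" for every LNet recovery-queue message.
lnet_recovery_summary() {
    grep -oE '(lpni|ni) [^ ]+ added to recovery queue\. Health = [0-9]+' "$1" |
        awk '{ print $1, $2, $NF }'
}

# Usage with two lines copied from the MDS1/3 console log in this ticket.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
[57941.162535] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
[57966.150469] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.2@tcp added to recovery queue. Health = 900
EOF

summary=$(lnet_recovery_summary "$log")
printf '%s\n' "$summary"
```

Running the same filter over each node's console log gives one line per NI or peer NI, which makes it easier to see, for example, that both the MDS and the client put their own NI on the recovery queue at health 900.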
Here are links to test logs for a few recent failures:
https://testing.whamcloud.com/test_sets/adbaf1b0-d7aa-11e9-9fc9-52540065bddc
https://testing.whamcloud.com/test_sets/526d634c-d59d-11e9-90ad-52540065bddc
https://testing.whamcloud.com/test_sets/f5da4c64-d2c0-11e9-97d5-52540065bddc
https://testing.whamcloud.com/test_sets/afe82c76-d315-11e9-9fc9-52540065bddc