[LU-12775] conf-sanity test 32c fails with ‘mv remote dir failed’ Created: 17/Sep/19 Updated: 06/Apr/20 Resolved: 01/Mar/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.4 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.5 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | arm, rhel8 | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
conf-sanity test_32c reports ‘test_32c failed with 1’, but the actual failure in the suite_log is:

/tmp/t32/mnt/lustre /usr/lib64/lustre/tests
tar: ./striped_dir: file changed as we read it
tar: The following options were used after any non-optional arguments in archive create or update mode. These options are positional and affect only arguments that follow them. Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors
/usr/lib64/lustre/tests
mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
 conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
 Trace dump:
 = /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
 = /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
 = /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()

So far, conf-sanity test 32c fails with this error for RHEL 8 testing only.

When this test fails for ARM, we see an additional error coming from the test script:

/usr/lib64/lustre/tests
/usr/lib64/lustre/tests/conf-sanity.sh: line 2084: [: !=: unary operator expected
/tmp/t32/mnt/lustre /usr/lib64/lustre/tests
tar: ./striped_dir: file changed as we read it
tar: The following options were used after any non-optional arguments in archive create or update mode. These options are positional and affect only arguments that follow them. Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors
/usr/lib64/lustre/tests
mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
 conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
 Trace dump:
 = /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
 = /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
 = /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()

Looking at the failure at https://testing.whamcloud.com/test_sets/7dfd32bc-d764-11e9-a25b-52540065bddc, we see LNet errors on the MDS1/3 (vm2):

[57884.609121] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.5.2@tcp
[57885.165309] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.5.2@tcp
[57901.160460] LNetError: 5452:0:(lib-move.c:3044:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-1.2.3.4@tcp: -125
[57901.163143] Lustre: t32fs-OST0000: Connection restored to 524f3afa-6d00-4 (at 10.9.5.2@tcp)
[57901.164531] Lustre: Skipped 22 previous similar messages
[57901.169993] Lustre: t32fs-OST0000: deleting orphan objects from 0x200000400:634 to 0x200000400:705
[57901.450090] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
[57921.147858] Lustre: t32fs-OST0000: deleting orphan objects from 0x0:642 to 0x0:705
[57921.435042] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
[57921.998506] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
[57922.561527] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
[57923.141130] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop
[57929.712016] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.failover.node=10.9.5.2@tcp
[57930.302577] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.mdc.max_rpcs_in_flight=9
[57930.872100] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.lov.stripesize=4M
[57941.162535] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
[57949.354803] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug=-1
[57949.932241] Lustre: DEBUG MARKER: test -f /tmp/t32/list
[57950.542349] Lustre: DEBUG MARKER: test -f /tmp/t32/list2
[57951.112651] Lustre: DEBUG MARKER: cat /tmp/t32/list2
[57951.164524] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
[57966.150469] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.2@tcp added to recovery queue. Health = 900
[57966.152375] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
[58005.676808] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
[58006.101254] Lustre: DEBUG MARKER: conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed

In the client 2 (vm5) console log we see LNet errors:

[57951.613668] Lustre: DEBUG MARKER: == conf-sanity test 32c: dne upgrade test ============================================================ 16:09:03 (1568390943)
[58045.604092] Lustre: Mounted t32fs-client
[58047.520140] LNetError: 11379:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 1.2.3.4@tcp added to recovery queue. Health = 0
[58063.904037] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.4.111@tcp added to recovery queue. Health = 900
[58063.905520] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
[58102.192325] Lustre: DEBUG MARKER: /usr/sbin/lctl mark conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed

Here are a few links to test logs for recent failures |
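The ARM-only message "[: !=: unary operator expected" is what the shell prints when the left-hand operand of a `[ ... != ... ]` test expands to nothing. The ticket does not show the contents of conf-sanity.sh line 2084, so the following is only a minimal sketch of that general pattern; the variable name `value` and the string `expected` are hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-in for whatever expands to an empty string
# at conf-sanity.sh line 2084 (the real expression is not shown in the ticket).
value=""

# Unquoted: the empty expansion disappears, so the shell evaluates
# `[ != expected ]` and reports an operator error with a non-zero status.
unquoted_status=0
[ $value != expected ] 2>/dev/null || unquoted_status=$?

# Quoted: the empty string is kept as an operand, so the comparison
# is well-formed and succeeds because "" differs from "expected".
quoted_status=0
[ "$value" != expected ] || quoted_status=$?

echo "unquoted status: $unquoted_status, quoted status: $quoted_status"
```

Quoting the expansion removes the syntax error, but it would not explain why the value is empty on ARM; that would still need to be checked separately.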
| Comments |
| Comment by Peter Jones [ 18/Sep/19 ] |
|
Lai, could you please investigate? Thanks, Peter |
| Comment by Lai Siyao [ 09/Oct/19 ] |
tar: The following options were used after any non-optional arguments in archive create or update mode. These options are positional and affect only arguments that follow them. Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors

This is because 'tar' in RHEL8 is stricter about option order. That can be fixed, but the test still fails after the fix; I'm looking into it. |
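The option-order problem described in this comment can be sketched as follows. This is an illustration only, not necessarily how the landed patch changed conf-sanity.sh: the scratch directory layout, the `keep` directory, and the archive name `out.tar` are assumptions, while the three `--exclude` patterns are taken from the log. Placing the `--exclude` options before the `.` member argument means they apply to it regardless of how strictly the installed tar treats positional options.

```shell
#!/bin/sh
set -e

# Scratch tree using the directory names from the log; "keep" is hypothetical.
work=$(mktemp -d)
mkdir -p "$work/src/striped_dir" "$work/src/striped_dir_old" \
         "$work/src/remote_dir" "$work/src/keep"
echo data > "$work/src/keep/file"
echo skip > "$work/src/striped_dir/file"

# The --exclude options precede the "." argument they are meant to filter.
(cd "$work/src" && tar cf "$work/out.tar" \
    --exclude='./striped_dir' \
    --exclude='./striped_dir_old' \
    --exclude='./remote_dir' \
    .)

# List the archive members to confirm the excluded directories are absent.
listing=$(tar tf "$work/out.tar")
echo "$listing"

rm -rf "$work"
```

The archive is written outside the directory being archived so that tar does not try to include its own output.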
| Comment by Gerrit Updater [ 03/Dec/19 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36907 |
| Comment by Jian Yu [ 29/Jan/20 ] |
|
The failure also occurred on the Lustre b2_12 branch with a RHEL 8.1 client: |
| Comment by Gerrit Updater [ 01/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36907/ |
| Comment by Peter Jones [ 01/Mar/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 02/Mar/20 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37772 |
| Comment by Gerrit Updater [ 06/Apr/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37772/ |