[LU-12775] conf-sanity test 32c fails with ‘mv remote dir failed’ Created: 17/Sep/19  Updated: 06/Apr/20  Resolved: 01/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.4
Fix Version/s: Lustre 2.14.0, Lustre 2.12.5

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: arm, rhel8

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

conf-sanity test_32c is reported as failing with ‘test_32c failed with 1’, but the actual failure, found in the suite_log, is:

/tmp/t32/mnt/lustre /usr/lib64/lustre/tests
tar: ./striped_dir: file changed as we read it
tar: The following options were used after any non-optional arguments in archive create or update mode.  These options are positional and affect only arguments that follow them.  Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors
/usr/lib64/lustre/tests
mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
 conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()

So far, conf-sanity test 32c has failed with this error only in RHEL 8 testing.
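The final `mv` error is the generic result of asking `mv` to move a directory onto an existing, non-empty directory of the same name: rename(2) refuses to replace a non-empty directory (on Linux this typically surfaces as "Directory not empty"). Below is a minimal local sketch of that condition. It assumes `striped_dir/remote_dir` already exists and has content. The scratch directories and the `leftover` file are hypothetical stand-ins, and the sketch does not explain why the test left that directory behind.

```shell
#!/bin/bash
# Hypothetical reproduction of "mv: cannot move ... Directory not empty".
# Directory names mirror the log; the setup itself is illustrative only.
top=$(mktemp -d)
mkdir -p "$top/remote_dir" "$top/striped_dir/remote_dir"
# A leftover file makes the destination striped_dir/remote_dir non-empty.
touch "$top/striped_dir/remote_dir/leftover"

# Moving remote_dir into striped_dir targets striped_dir/remote_dir, which
# already exists and is not empty, so the rename fails and mv exits non-zero.
mv "$top/remote_dir" "$top/striped_dir/"
mv_status=$?
echo "mv exit status: $mv_status"
```

The source directory is left in place when the move fails, so the test script sees a non-zero status from `mv` and reports "mv remote dir failed".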

When this test fails on ARM, there is an additional error from the test script itself:

/usr/lib64/lustre/tests
/usr/lib64/lustre/tests/conf-sanity.sh: line 2084: [: !=: unary operator expected
/tmp/t32/mnt/lustre /usr/lib64/lustre/tests
tar: ./striped_dir: file changed as we read it
tar: The following options were used after any non-optional arguments in archive create or update mode.  These options are positional and affect only arguments that follow them.  Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors
/usr/lib64/lustre/tests
mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
 conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()
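The ARM-only message `[: !=: unary operator expected` is what bash prints when an unquoted variable inside a `[ ... ]` test expands to nothing, leaving `[` with only `!=` and one operand. The contents of conf-sanity.sh line 2084 are not shown in this ticket, so the variable name below is hypothetical. This is a sketch of the error class, not of the actual script line.

```shell
#!/bin/bash
# Hypothetical stand-in for whatever variable conf-sanity.sh:2084 compares.
mds_version=""

# Unquoted: the empty variable vanishes, so [ sees only "!=" "2.13",
# prints "[: !=: unary operator expected", and returns status 2.
[ $mds_version != "2.13" ] 2>/dev/null
unquoted_status=$?

# Quoted: [ sees an empty-string operand and performs a normal comparison.
[ "$mds_version" != "2.13" ]
quoted_status=$?

echo "unquoted=$unquoted_status quoted=$quoted_status"
```

Running it prints `unquoted=2 quoted=0`. Quoting the expansion keeps the test well-formed even when the value is empty.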

Looking at the failure at https://testing.whamcloud.com/test_sets/7dfd32bc-d764-11e9-a25b-52540065bddc, there are LNet errors in the MDS1/3 (vm2) console log:

[57884.609121] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.5.2@tcp
[57885.165309] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.5.2@tcp
[57901.160460] LNetError: 5452:0:(lib-move.c:3044:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-1.2.3.4@tcp: -125
[57901.163143] Lustre: t32fs-OST0000: Connection restored to 524f3afa-6d00-4 (at 10.9.5.2@tcp)
[57901.164531] Lustre: Skipped 22 previous similar messages
[57901.169993] Lustre: t32fs-OST0000: deleting orphan objects from 0x200000400:634 to 0x200000400:705
[57901.450090] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
[57921.147858] Lustre: t32fs-OST0000: deleting orphan objects from 0x0:642 to 0x0:705
[57921.435042] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
[57921.998506] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
[57922.561527] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
[57923.141130] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop
[57929.712016] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.failover.node=10.9.5.2@tcp
[57930.302577] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.mdc.max_rpcs_in_flight=9
[57930.872100] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.lov.stripesize=4M
[57941.162535] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
[57949.354803] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug=-1
[57949.932241] Lustre: DEBUG MARKER: test -f /tmp/t32/list
[57950.542349] Lustre: DEBUG MARKER: test -f /tmp/t32/list2
[57951.112651] Lustre: DEBUG MARKER: cat /tmp/t32/list2
[57951.164524] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
[57966.150469] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.2@tcp added to recovery queue. Health = 900
[57966.152375] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
[58005.676808] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 
[58006.101254] Lustre: DEBUG MARKER: conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed

The client 2 (vm5) console log also shows LNet errors:

[57951.613668] Lustre: DEBUG MARKER: == conf-sanity test 32c: dne upgrade test ============================================================ 16:09:03 (1568390943)
[58045.604092] Lustre: Mounted t32fs-client
[58047.520140] LNetError: 11379:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 1.2.3.4@tcp added to recovery queue. Health = 0
[58063.904037] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.4.111@tcp added to recovery queue. Health = 900
[58063.905520] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
[58102.192325] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 

Here are a few links to test logs for recent failures:
https://testing.whamcloud.com/test_sets/adbaf1b0-d7aa-11e9-9fc9-52540065bddc
https://testing.whamcloud.com/test_sets/526d634c-d59d-11e9-90ad-52540065bddc
https://testing.whamcloud.com/test_sets/f5da4c64-d2c0-11e9-97d5-52540065bddc
https://testing.whamcloud.com/test_sets/afe82c76-d315-11e9-9fc9-52540065bddc



 Comments   
Comment by Peter Jones [ 18/Sep/19 ]

Lai

Could you please investigate?

Thanks

Peter

Comment by Lai Siyao [ 09/Oct/19 ]
tar: The following options were used after any non-optional arguments in archive create or update mode.  These options are positional and affect only arguments that follow them.  Please, rearrange them properly.
tar: --exclude './striped_dir' has no effect
tar: --exclude './striped_dir_old' has no effect
tar: --exclude './remote_dir' has no effect
tar: Exiting with failure status due to previous errors 

This is because 'tar' in RHEL 8 is stricter about option order. That can be fixed, but the test still fails after the fix, so I'm still looking into it.
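Newer GNU tar, such as the version in RHEL 8, treats `--exclude` as a positional option that only applies to file arguments that follow it. That is what the "has no effect" warnings above report: the exclusions came after the `.` member argument. Here is a minimal sketch of the reordering under that assumption. The scratch directories are hypothetical stand-ins for the t32 image, and this is not the actual diff from patch 36907.

```shell
#!/bin/bash
# Hypothetical scratch tree standing in for the t32 Lustre mount.
src=$(mktemp -d)
mkdir -p "$src/keep" "$src/striped_dir"
touch "$src/keep/a" "$src/striped_dir/b"

# Problematic order (commented out): --exclude follows the "." argument,
# so newer GNU tar warns "--exclude './striped_dir' has no effect".
# tar -cf "$src.tar" -C "$src" . --exclude ./striped_dir

# Reordered: exclusions first, then -C and the member list they apply to.
tar -cf "$src.tar" --exclude ./striped_dir -C "$src" .

# List the archive; ./striped_dir and its contents should be absent.
tar -tf "$src.tar"
```

Putting every `--exclude` before the member list works on both older and newer GNU tar, so a single command line serves all distributions.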

Comment by Gerrit Updater [ 03/Dec/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36907
Subject: LU-12775 test: reorder 'tar' command options
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 188bd643f1f3ea76a743745280e28a87c424cefc

Comment by Jian Yu [ 29/Jan/20 ]

The failure also occurred on Lustre b2_12 branch with RHEL 8.1 client:
https://testing.whamcloud.com/test_sets/f7b90e8a-3ecc-11ea-b3fe-52540065bddc

Comment by Gerrit Updater [ 01/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36907/
Subject: LU-12775 test: reorder 'tar' command options
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f3e101a36310c0c2b9d516c09ec0166eb24524d2

Comment by Peter Jones [ 01/Mar/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 02/Mar/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37772
Subject: LU-12775 test: reorder 'tar' command options
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 99910e1d21a2a00306bf60773d01ce06030072e8

Comment by Gerrit Updater [ 06/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37772/
Subject: LU-12775 test: reorder 'tar' command options
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 8b82f77084652f581f48c98dd934e3ba2a8dae89

Generated at Sat Feb 10 02:55:32 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.