LU-12775

conf-sanity test 32c fails with ‘mv remote dir failed’

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.4
    • Fix Version/s: Lustre 2.14.0
    • Labels:
    • Severity:
      3

      Description

      conf-sanity test_32c is reported as failing with ‘test_32c failed with 1’, but the suite_log shows the underlying failure:

      /tmp/t32/mnt/lustre /usr/lib64/lustre/tests
      tar: ./striped_dir: file changed as we read it
      tar: The following options were used after any non-optional arguments in archive create or update mode.  These options are positional and affect only arguments that follow them.  Please, rearrange them properly.
      tar: --exclude './striped_dir' has no effect
      tar: --exclude './striped_dir_old' has no effect
      tar: --exclude './remote_dir' has no effect
      tar: Exiting with failure status due to previous errors
      /usr/lib64/lustre/tests
      mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
       conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
        = /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
        = /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()
      

      So far, conf-sanity test 32c has failed with this error only in RHEL 8 testing.
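The tar warnings in the log above say that the `--exclude` options had no effect because they came after the non-option arguments. The following is a minimal sketch of the ordering that the warning asks for, assuming GNU tar; the scratch directory and the `keep_dir` name are hypothetical and only the three excluded directory names come from the log. It is not the actual command used in conf-sanity.sh, which is not shown in this ticket.

```shell
#!/bin/bash
# Sketch only: a scratch tree standing in for the mounted file system.
# keep_dir is a hypothetical name; the excluded names come from the log.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/striped_dir" "$workdir/src/striped_dir_old" \
         "$workdir/src/remote_dir" "$workdir/src/keep_dir"
touch "$workdir/src/keep_dir/file"

# The --exclude options are placed before the "." argument, so they
# apply to it instead of being reported as having no effect.
(cd "$workdir/src" && tar cf "$workdir/out.tar" \
    --exclude='./striped_dir' \
    --exclude='./striped_dir_old' \
    --exclude='./remote_dir' \
    .)

# List the archive members to check what was actually stored.
listing=$(tar tf "$workdir/out.tar")
echo "$listing"
rm -rf "$workdir"
```

With this ordering, the listing should contain `keep_dir` and its file but none of the three excluded directories.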

      When this test fails on ARM, we see an additional error from the test script itself:

      /usr/lib64/lustre/tests
      /usr/lib64/lustre/tests/conf-sanity.sh: line 2084: [: !=: unary operator expected
      /tmp/t32/mnt/lustre /usr/lib64/lustre/tests
      tar: ./striped_dir: file changed as we read it
      tar: The following options were used after any non-optional arguments in archive create or update mode.  These options are positional and affect only arguments that follow them.  Please, rearrange them properly.
      tar: --exclude './striped_dir' has no effect
      tar: --exclude './striped_dir_old' has no effect
      tar: --exclude './remote_dir' has no effect
      tar: Exiting with failure status due to previous errors
      /usr/lib64/lustre/tests
      mv: cannot move '/tmp/t32/mnt/lustre/remote_dir' to '/tmp/t32/mnt/lustre/striped_dir/remote_dir': Directory not empty
       conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6103:error_noexit()
        = /usr/lib64/lustre/tests/conf-sanity.sh:2162:t32_test()
        = /usr/lib64/lustre/tests/conf-sanity.sh:2422:test_32c()
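The `[: !=: unary operator expected` message in the ARM log is what bash prints when an unquoted variable in a `[ ... ]` test expands to nothing. The expression at line 2084 of conf-sanity.sh is not shown in this ticket, so the sketch below uses a hypothetical variable name (`img_arch`) and comparison value purely to illustrate the failure mode and the usual quoting fix.

```shell
#!/bin/bash
# Hypothetical variable: the real expression at conf-sanity.sh line 2084
# is not shown in this ticket.
img_arch=""

# Unquoted: the empty variable vanishes, leaving `[ != x86_64 ]`,
# which `[` rejects. The error text is captured instead of printed.
unquoted_err=$( { [ $img_arch != "x86_64" ]; } 2>&1 ) || true

# Quoted: expands to `[ "" != x86_64 ]`, a well-formed comparison.
if [ "$img_arch" != "x86_64" ]; then
    quoted_result="differs"
else
    quoted_result="same"
fi

echo "unquoted: $unquoted_err"
echo "quoted: $quoted_result"
```

Quoting the expansion keeps the test well-formed even when the variable is empty or unset, which is why it is the usual remedy for this class of error.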
      

      Looking at the failure at https://testing.whamcloud.com/test_sets/7dfd32bc-d764-11e9-a25b-52540065bddc, we see LNet errors on MDS1/3 (vm2):

      [57884.609121] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.5.2@tcp
      [57885.165309] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.5.2@tcp
      [57901.160460] LNetError: 5452:0:(lib-move.c:3044:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-1.2.3.4@tcp: -125
      [57901.163143] Lustre: t32fs-OST0000: Connection restored to 524f3afa-6d00-4 (at 10.9.5.2@tcp)
      [57901.164531] Lustre: Skipped 22 previous similar messages
      [57901.169993] Lustre: t32fs-OST0000: deleting orphan objects from 0x200000400:634 to 0x200000400:705
      [57901.450090] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
      [57921.147858] Lustre: t32fs-OST0000: deleting orphan objects from 0x0:642 to 0x0:705
      [57921.435042] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
      [57921.998506] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
      [57922.561527] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
      [57923.141130] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop
      [57929.712016] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.failover.node=10.9.5.2@tcp
      [57930.302577] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.mdc.max_rpcs_in_flight=9
      [57930.872100] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0001.lov.stripesize=4M
      [57941.162535] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
      [57949.354803] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param debug=-1
      [57949.932241] Lustre: DEBUG MARKER: test -f /tmp/t32/list
      [57950.542349] Lustre: DEBUG MARKER: test -f /tmp/t32/list2
      [57951.112651] Lustre: DEBUG MARKER: cat /tmp/t32/list2
      [57951.164524] LNetError: 5452:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.10.4.134@tcp added to recovery queue. Health = 0
      [57966.150469] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.2@tcp added to recovery queue. Health = 900
      [57966.152375] LNetError: 5447:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
      [58005.676808] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 
      [58006.101254] Lustre: DEBUG MARKER: conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed
      

      In the client 2 (vm5) console log, we also see LNet errors:

      [57951.613668] Lustre: DEBUG MARKER: == conf-sanity test 32c: dne upgrade test ============================================================ 16:09:03 (1568390943)
      [58045.604092] Lustre: Mounted t32fs-client
      [58047.520140] LNetError: 11379:0:(peer.c:3713:lnet_peer_ni_add_to_recoveryq_locked()) lpni 1.2.3.4@tcp added to recovery queue. Health = 0
      [58063.904037] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.4.111@tcp added to recovery queue. Health = 900
      [58063.905520] LNetError: 11374:0:(lib-msg.c:481:lnet_handle_local_failure()) Skipped 1 previous similar message
      [58102.192325] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_32c: @@@@@@ FAIL: mv remote dir failed 
      

      Here are a few links to test logs for recent failures:
      https://testing.whamcloud.com/test_sets/adbaf1b0-d7aa-11e9-9fc9-52540065bddc
      https://testing.whamcloud.com/test_sets/526d634c-d59d-11e9-90ad-52540065bddc
      https://testing.whamcloud.com/test_sets/f5da4c64-d2c0-11e9-97d5-52540065bddc
      https://testing.whamcloud.com/test_sets/afe82c76-d315-11e9-9fc9-52540065bddc

              People

              • Assignee:
                laisiyao Lai Siyao
                Reporter:
                jamesanunez James Nunez