[LU-8544] recovery-double-scale test_pairwise_fail: start client on trevis-54vm5 failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0
    • Severity: 3

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/72a0fd32-6033-11e6-aa74-5254006e85c2.

      The sub-test test_pairwise_fail failed with the following error:

      start client on trevis-54vm5 failed
      

      test logs:

      CMD: trevis-54vm5 test -f /tmp/client-load.pid &&
              { kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; }
      + pm -h powerman --off trevis-54vm5
      Command completed successfully
      + pm -h powerman --on trevis-54vm5
      Command completed successfully
      14:43:16 (1470926596) waiting for trevis-54vm5 network 900 secs ...
      waiting ping -c 1 -w 3 trevis-54vm5, 895 secs left ...
      waiting ping -c 1 -w 3 trevis-54vm5, 890 secs left ...
      waiting ping -c 1 -w 3 trevis-54vm5, 885 secs left ...
      waiting ping -c 1 -w 3 trevis-54vm5, 880 secs left ...
      14:43:48 (1470926628) network interface is UP
      CMD: trevis-54vm5 hostname
      pdsh@trevis-54vm1: trevis-54vm5: mcmd: connect failed: Connection refused
      CMD: trevis-54vm5 hostname
      Reintegrating trevis-54vm5
      Starting client: trevis-54vm5:  -o user_xattr,flock trevis-54vm7:trevis-54vm3:/lustre /mnt/lustre
      CMD: trevis-54vm5 mkdir -p /mnt/lustre
      CMD: trevis-54vm5 mount -t lustre -o user_xattr,flock trevis-54vm7:trevis-54vm3:/lustre /mnt/lustre
      CMD: trevis-54vm5 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all -lnet -lnd -pinger\" 4 
      trevis-54vm5: stat: cannot read file system information for ‘/mnt/lustre’: Input/output error
       recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: start client on trevis-54vm5 failed
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4804:error()
        = /usr/lib64/lustre/tests/recovery-double-scale.sh:72:reboot_recover_node()
        = /usr/lib64/lustre/tests/recovery-double-scale.sh:160:failover_pair()
        = /usr/lib64/lustre/tests/recovery-double-scale.sh:251:test_pairwise_fail()
        = /usr/lib64/lustre/tests/test-framework.sh:5068:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5107:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4954:run_test()
        = /usr/lib64/lustre/tests/recovery-double-scale.sh:303:main()
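
      For reference, a minimal set of checks for a client in this state (not taken from the ticket logs; it assumes a standard Lustre client with the filesystem mounted at /mnt/lustre) would be:

      # List Lustre mounts on the node and retry the statfs-style check that failed above.
      mount -t lustre
      stat -f /mnt/lustre
      # Per-target view; shows which MDTs/OSTs the client still sees as inactive.
      lfs df -h /mnt/lustre
      # Tunable discussed in the comments below: with lazystatfs enabled, statfs
      # does not wait for inactive OSTs.
      lctl get_param llite.*.lazystatfs
      # Per-OSC import state; "FULL" once the connection has recovered.
      lctl get_param osc.*.import | grep state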
      

          Activity


            hongchao.zhang Hongchao Zhang added a comment -

            Hi Bruno,
            I will check it and update its status once I find something.
            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22459/
            Subject: LU-8544 test: using lfs df in client_up
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 72ec6eb3c74c85f54277aadfd9b83167ea8e81ec

            gerrit Gerrit Updater added a comment -

            Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/22459
            Subject: LU-8544 test: using lfs df in client_up
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7438b526bb79b5138fb51cd1ed58eadc1bbeab26

            hongchao.zhang Hongchao Zhang added a comment -

            The problem could be related to LU-7759: the default LL_SBI_LAZYSTATFS causes the "df" check to fail instead of waiting for recovery to finish.

            00000080:00000004:0.0:1472256895.043480:0:3882:0:(obd_class.h:1166:obd_statfs_async()) lustre-clilov-ffff880037e6c000: osfs ffff88007aa25210 age 4294647497, max_age 4294916319
            00020000:00080000:0.0:1472256895.043483:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 0 inactive
            00020000:00080000:0.0:1472256895.043484:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 1 inactive
            00020000:00080000:0.0:1472256895.043485:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 2 inactive
            00020000:00080000:0.0:1472256895.043485:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 3 inactive
            00020000:00080000:0.0:1472256895.043485:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 4 inactive
            00020000:00080000:0.0:1472256895.043486:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 5 inactive
            00020000:00080000:0.0:1472256895.043486:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 6 inactive
            00000080:00020000:0.0:1472256895.043488:0:3882:0:(llite_lib.c:1890:ll_statfs_internal()) obd_statfs fails: rc = -5
            
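            As a rough illustration of the "using lfs df in client_up" idea from patch http://review.whamcloud.com/22459 (this is only a sketch; the function name, ssh transport, and retry count below are hypothetical and not the landed change), a client readiness check built on lfs df could look like:

            # Hypothetical sketch, not the landed patch: poll the client mount with
            # "lfs df" until it succeeds, rather than relying on a plain "df"/statfs
            # that returns EIO (rc = -5 above) while OSTs are still inactive.
            wait_client_up() {
                local node=$1 mnt=$2 tries=${3:-30}
                local i
                for ((i = 0; i < tries; i++)); do
                    # Treat "lfs df" exiting 0 as "the client can reach its targets again".
                    if ssh "$node" "lfs df $mnt" >/dev/null 2>&1; then
                        return 0
                    fi
                    sleep 2
                done
                return 1
            }

            For example, wait_client_up trevis-54vm5 /mnt/lustre would poll the client node from the test master for up to about a minute.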

            jamesanunez James Nunez (Inactive) added a comment -

            Looking at test failures in Maloo, I see that this test started failing with this error message on 2016-07-01. I see failures on both onyx and trevis.
            pjones Peter Jones added a comment -

            Hongchao

            Could you please advise on this one?

            Thanks

            Peter


            standan Saurabh Tandan (Inactive) added a comment -

            This issue was first seen for master on 2016-07-08 for build #3405, Tag 2.8.55, Lustre version 2.8.55.27.geb2657a:
            https://testing.hpdd.intel.com/test_sets/e5a03c8e-4568-11e6-80b9-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment -

            This issue has been seen around 40 times in the past 30 days overall.

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9
