
recovery-random-scale test_fail_client_mds: test_fail_client_mds returned 4

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Lustre 2.10.3
    • None
    • Failover
      Client/Server: 2.10.3 RC1
      b2_10, build 68
    • 3

    Description

      recovery-random-scale test_fail_client_mds - test_fail_client_mds returned 4
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run:
      https://testing.hpdd.intel.com/test_sets/b02af984-f65d-11e7-94c7-52540065bddc

      test_fail_client_mds failed with the following error:

      test_fail_client_mds returned 4
      

      Test logs:

      ==== Checking the clients loads BEFORE failover -- failure NOT OK              ELAPSED=3962 DURATION=86400 PERIOD=1200
      10:34:00 (1515580440) waiting for onyx-41vm3 network 5 secs ...
      10:34:00 (1515580440) network interface is UP
      CMD: onyx-41vm3 rc=0;
      			val=\$(/usr/sbin/lctl get_param -n catastrophe 2>&1);
      			if [[ \$? -eq 0 && \$val -ne 0 ]]; then
      				echo \$(hostname -s): \$val;
      				rc=\$val;
      			fi;
      			exit \$rc
      CMD: onyx-41vm3 ps auxwww | grep -v grep | grep -q run_dd.sh
      Client load failed on node onyx-41vm3, rc=1
      2018-01-10 10:34:31 Terminating clients loads ...
      Duration:               86400
      Server failover period: 1200 seconds
      Exited after:           3962 seconds
      Number of failovers before exit:
      mds1 failed over 4 times
      Status: FAIL: rc=4
      CMD: onyx-41vm3,onyx-41vm4 test -f /tmp/client-load.pid &&
              { kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; }
      onyx-41vm3: sh: line 1: kill: (8054) - No such process
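The cleanup command above fails on onyx-41vm3 with "No such process", presumably because the client load had already exited (rc=1) while /tmp/client-load.pid was still present. A minimal sketch of a cleanup step that tolerates a stale pid file follows. `stop_client_load` is a hypothetical helper name used only for illustration; it is not a function from the Lustre test framework, and the default path is simply the one shown in the log.

```shell
# Hypothetical helper, not part of the Lustre test framework.
# Sends TERM to the process recorded in a pid file, but only if that
# process is still alive, then removes the pid file either way.
stop_client_load() {
    local pidfile=${1:-/tmp/client-load.pid}
    local pid

    # Nothing to do when no pid file was left behind.
    [ -f "$pidfile" ] || return 0

    pid=$(cat "$pidfile")

    # kill -0 delivers no signal; it only checks that the PID exists
    # and that the caller is allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        # The process may still exit between the check and the signal,
        # so a failure here is ignored.
        kill -s TERM "$pid" 2>/dev/null || true
    else
        echo "client load (pid ${pid:-unknown}) already exited" >&2
    fi

    rm -f "$pidfile"
}
```

Called as `stop_client_load /tmp/client-load.pid` on each client, this would report the stale pid instead of printing a kill error. It does not change the test result, which is decided earlier by the failed load check.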
      

      run_tar_debug.onyx-41vm4.log

      tar: etc/ssl: Cannot stat: No such file or directory
      tar: etc/systemd/system/getty.target.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system/sockets.target.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system/multi-user.target.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system/sysinit.target.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system/dev-virtio\\x2dports-org.qemu.guest_agent.0.device.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system/remote-fs.target.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system/basic.target.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system/default.target.wants: Cannot stat: No such file or directory
      tar: etc/systemd/system: Cannot stat: No such file or directory
      tar: etc/systemd: Cannot stat: No such file or directory
      tar: etc/rc.d/rc1.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc3.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc2.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc4.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc0.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc5.d: Cannot stat: No such file or directory
      tar: etc/rc.d/rc6.d: Cannot stat: No such file or directory
      tar: etc/rc.d: Cannot stat: No such file or directory
      tar: etc/alternatives: Cannot stat: No such file or directory
      tar: Exiting with failure status due to previous errors
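For triage, the tar errors above can be grouped by directory to show which parts of the tree were affected. The sketch below is only an illustration: `summarize_tar_errors` is a hypothetical helper name, and the default log file name is the one quoted in this report.

```shell
# Hypothetical triage helper, not part of the Lustre test suite.
# Counts tar "Cannot stat" errors per two-level path prefix
# (for example etc/rc.d) and prints the most frequent prefixes first.
summarize_tar_errors() {
    local log=${1:-run_tar_debug.onyx-41vm4.log}

    grep 'Cannot stat: No such file or directory' "$log" |
        # Keep only the path: drop the "tar: " prefix and the error text.
        sed -e 's/^ *tar: //' -e 's/: Cannot stat.*//' |
        # Group by the first two path components.
        awk -F/ '{ key = (NF > 1) ? $1 "/" $2 : $1; count[key]++ }
                 END { for (k in count) print count[k], k }' |
        sort -k1,1nr -k2
}
```

Each output line is a count followed by a path prefix, so the same helper can be run on logs from different runs and the results compared.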
      

          Activity

            [LU-10522] recovery-random-scale test_fail_client_mds: test_fail_client_mds returned 4
            pjones Peter Jones added a comment -

            There was a typo in the Jira reference, so this was closed in error, but maybe that is OK because it is an old failure that has not been reported for a long time...

            pjones Peter Jones added a comment -

            Merged for 2.16


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55226/
            Subject: LU-10522 utils: new --mindepth for lfs find
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 919f2c921866b8ab1f9cad3fe8d86eee7924e85d

            gerrit Gerrit Updater added the comment above.

            "Maximilian Dilger <mdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55226
            Subject: LU-10522 utils: new --mindepth for lfs find
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cd30a9d9399154c88f7115ec9ffd9e26bfeaa66c

            gerrit Gerrit Updater added the comment above.

            Found a similar issue with recovery-random-scale test set on 2.12.8: https://testing.whamcloud.com/test_sets/e735d4c7-0211-48dc-82e6-a6ba45ceb281 

            The return code differs, though, because it is a different test:

            ...
            Starting client: onyx-112vm10:  -o user_xattr,flock onyx-70vm3:onyx-70vm4:/lustre /mnt/lustre
            CMD: onyx-112vm10 mkdir -p /mnt/lustre
            CMD: onyx-112vm10 mount -t lustre -o user_xattr,flock onyx-70vm3:onyx-70vm4:/lustre /mnt/lustre
            onyx-112vm10: mount.lustre: according to /etc/mtab onyx-70vm3:onyx-70vm4:/lustre is already mounted on /mnt/lustre
            2021-11-20 21:05:50 Terminating clients loads ...
            Duration:               86400
            Server failover period: 1200 seconds
            Exited after:           65095 seconds
            Number of failovers before exit:
            mds1 failed over 55 times
            Status: FAIL: rc=1
            CMD: onyx-112vm10,onyx-112vm9 test -f /tmp/client-load.pid &&
                    { kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; } 
            ...
            tar: etc/pki/tls/certs: Cannot stat: No such file or directory
            tar: etc/pki/tls: Cannot stat: No such file or directory
            tar: etc/pki/java: Cannot stat: No such file or directory
            tar: etc/pki/ca-trust/source: Cannot stat: No such file or directory
            tar: etc/pki/ca-trust: Cannot stat: No such file or directory
            tar: etc/pki: Cannot stat: No such file or directory
            tar: etc/ssl: Cannot stat: No such file or directory
            tar: etc/pam.d: Cannot stat: No such file or directory
            tar: etc/rc.d/rc0.d: Cannot stat: No such file or directory
            tar: etc/rc.d/rc6.d: Cannot stat: No such file or directory
            tar: etc/rc.d/rc1.d: Cannot stat: No such file or directory
            tar: etc/rc.d/rc4.d: Cannot stat: No such file or directory
            tar: etc/rc.d/rc5.d: Cannot stat: No such file or directory
            tar: etc/rc.d/rc3.d: Cannot stat: No such file or directory
            tar: etc/rc.d/rc2.d: Cannot stat: No such file or directory
            tar: etc/rc.d: Cannot stat: No such file or directory
            tar: etc/sysconfig/network-scripts: Cannot stat: No such file or directory
            tar: etc/sysconfig: Cannot stat: No such file or directory
            tar: etc/profile.d: Cannot stat: No such file or directory
            tar: etc/sysctl.d: Cannot stat: No such file or directory
            tar: Exiting with failure status due to previous errors 
            anikitenko Alena Nikitenko (Inactive) added the comment above.

            People

              mdilger Max Dilger
              maloo Maloo
