[LU-11171] parallel-scale-nfs* running racer against wrong directory Created: 24/Jul/18  Updated: 09/Aug/18  Resolved: 09/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: tests

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for James Nunez <james.a.nunez@intel.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c386aaae-8b55-11e8-9028-52540065bddc

Since the patch for LU-11045, "test: use provided directory in racer/racer.sh" (commit d6a3908fee28), landed, the parallel-scale-nfsv3 and parallel-scale-nfsv4 test suites have been failing. racer uses its first argument as the directory to run in, but the racer_on_nfs() routine in parallel-scale-nfs passes $CLIENTS rather than a directory:

test_racer_on_nfs() {
        $racer $CLIENTS
}
run_test racer_on_nfs "racer on NFS client"

We need to call racer with $TESTDIR.
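
As a rough illustration of why the argument matters, the sketch below uses a hypothetical stand-in function, fake_racer, in place of the real racer.sh. The TESTDIR and CLIENTS values are made up for the example, and this is not the patch that landed; it only shows that whatever is passed first is treated as the run directory.

```shell
# Hypothetical stand-in for racer.sh: like the real script after LU-11045,
# it treats its first argument as the directory to run in.
fake_racer() {
	echo "racer would run in: $1"
}

# Assumed values, for illustration only.
TESTDIR="/mnt/lustre/d0.parallel-scale-nfs"
CLIENTS="client1,client2"
racer=fake_racer

# Call as currently written: the client list is taken as the directory.
buggy_output=$($racer $CLIENTS)

# Intended call: the test directory is the first argument.
fixed_output=$($racer "$TESTDIR")

echo "$buggy_output"
echo "$fixed_output"
```

With the current call, the stand-in reports the client list as its directory; with $TESTDIR it reports the test directory.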

The parallel-scale-nfsv3 and parallel-scale-nfsv4 test suites started failing on 2018-07-19. No individual test fails, but the whole test suite is marked as failed. I think racer_on_nfs() calling racer without a directory is what causes these failures.

In the suite_log for https://testing.whamcloud.com/test_sets/75546c6e-8dd4-11e8-8ee3-52540065bddc, the only signs of a failure are:

layout: raid0 raid0 pfl pfl pfl flr flr flr
layout: raid0 raid0 pfl pfl pfl flr flr flr
layout: raid0 raid0 pfl pfl pfl flr flr flr
 racer cleanup
./file_create.sh: line 1: kill: (3030) - No such process
./file_create.sh: line 1: kill: (3044) - No such process
./file_create.sh: line 1: kill: (3045) - No such process
  Trace dump:
  = ./file_create.sh:1:main()
parallel-scale-nfsv4: FAIL: test-framework exiting on error
  Trace dump:
  = ./file_create.sh:1:main()
parallel-scale-nfsv4: FAIL: test-framework exiting on error
  Trace dump:
  = ./file_create.sh:1:main()
parallel-scale-nfsv4: FAIL: test-framework exiting on error

In the output for racer_on_nfs(), we also see that racer is running on the wrong file system, the root file system on /dev/vda1:

there should be NO racer processes:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/vda1       20511312 2448832  16997520  13% /
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
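
A guard along the following lines could catch this kind of mistake early. It is only a sketch: mount_point_of and assert_not_on_rootfs are hypothetical helpers, not part of the Lustre test framework, and treating the root mount point as "wrong" is an assumption based on the df output above.

```shell
# Hypothetical helper: print the mount point that holds a given directory.
# df -P prints one header line and one data line; the last field of the
# data line is the mount point.
mount_point_of() {
	df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Hypothetical guard: fail if the directory lives on the root file system,
# as it does in the failing runs shown above.
assert_not_on_rootfs() {
	if [ "$(mount_point_of "$1")" = "/" ]; then
		echo "refusing to run racer: $1 is on the root file system" >&2
		return 1
	fi
}
```

A test could call assert_not_on_rootfs "$TESTDIR" before starting racer and skip or fail if it returns non-zero.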

Logs for more of these failures are at
https://testing.whamcloud.com/test_sets/5b0dee6e-8b6f-11e8-b0aa-52540065bddc
https://testing.whamcloud.com/test_sets/8d0547e8-8d61-11e8-87f3-52540065bddc



 Comments   
Comment by Gerrit Updater [ 25/Jul/18 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32880
Subject: LU-11171 tests: set parameters for racer_on_nfs
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f6d191751b79369202e268f537a25b453ab8775d

Comment by Gerrit Updater [ 09/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32880/
Subject: LU-11171 tests: set parameters for racer_on_nfs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: be189ac74726415c0859c3001e3f1cfde7542864

Comment by James Nunez (Inactive) [ 09/Aug/18 ]

Landed for Lustre 2.12
