[LU-8053] iokit-plot-sgpdd incorrectly parses sgpdd-survey log lines with failed tests Created: 21/Apr/16  Updated: 17/Mar/20  Resolved: 17/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andrew Uselton (Inactive) Assignee: Emoly Liu
Resolution: Low Priority Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When one runs sgpdd-survey, the default behavior is to run with an increasing 'thr' thread count up to 4096 threads. When the thread count reaches 256, the underlying sgp_dd consistently fails in some or all of its multiple instances (and multiple pthreads per instance). The failure is an ENOMEM returned to sgp_dd from its 'write()' system call to the SCSI device '/dev/sg*'. sgpdd-survey notes the failure and reports the number of failed threads in its output.

The problem arises when iokit-plot-sgpdd parses lines like the following with the word 'failed' in them:
...
dev 1 sz 8388608K rsz 1024K crg 16 thr 128 write 1078.83 [ 67.54, 68.02] read 898.01 [ 56.21, 75.40]
dev 1 sz 8388608K rsz 1024K crg 16 thr 256 write 16 failed read 13 failed
...
For a complete log of this kind, see https://wiki.hpdd.intel.com/display/~uselton/sgpdd-survey. iokit-plot-sgpdd does look for such lines, but it does not process them correctly. Instead of ignoring that data, it reformats the line so that the failed thread counts ("write 16 failed", "read 13 failed") look like regular data rate entries for the given experiments. This results in bogus data files for the gnuplot command. For an example of the incorrect graph and what it should look like, see the above wiki entry.
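
The intended handling can be illustrated with a minimal sketch. This is not the iokit-plot-sgpdd code and not the modified script mentioned below; it is a standalone Python illustration, and the names parse_line, LINE_RE and PHASE_RE are hypothetical. It assumes the field layout of the two sample lines above and treats a phase marked 'failed' as missing data rather than as a rate.

```python
import re

# Assumed layout of one sgpdd-survey result line, based only on the
# two sample lines quoted in this report.
LINE_RE = re.compile(
    r"dev\s+(?P<dev>\d+)\s+sz\s+(?P<sz>\d+)K\s+rsz\s+(?P<rsz>\d+)K\s+"
    r"crg\s+(?P<crg>\d+)\s+thr\s+(?P<thr>\d+)\s+(?P<rest>.*)"
)

# A phase is either "<op> <rate> [ min, max]" or "<op> <count> failed".
PHASE_RE = re.compile(
    r"(?P<op>write|read)\s+(?P<num>\d+(?:\.\d+)?)\s*"
    r"(?P<tail>failed|\[[^\]]*\])?"
)


def parse_line(line):
    """Return a dict of fields, or None if the line is not a result line.

    The 'write' and 'read' entries are floats (data rates), or None when
    the phase is reported as failed.
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = {k: int(m.group(k)) for k in ("dev", "sz", "rsz", "crg", "thr")}
    rec["write"] = None
    rec["read"] = None
    for p in PHASE_RE.finditer(m.group("rest")):
        if p.group("tail") == "failed":
            # The number here is a failed-thread count, not a data rate,
            # so leave the entry as None instead of plotting it.
            continue
        rec[p.group("op")] = float(p.group("num"))
    return rec
```

A plotting front end built on this sketch would skip any None entry when writing the gnuplot data files, so the 'thr 256' line above would contribute no write or read point.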

I have a rudimentary modification that fixes the issue and am using it in production on the spirit cluster. A better solution is probably needed, rather than my minimal hack.
spirit cluster: /scratch/perf-base-tests/sgpdd/bin/iokit-plot-sgpdd-mod

