[LUDOC-175] Need Guidance on Selecting sgpdd-survey Parameters Created: 22/Aug/13  Updated: 25/Oct/17  Resolved: 25/Oct/17

Status: Resolved
Project: Lustre Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Dan Ferber (Inactive) Assignee: Dan Cobb (Inactive)
Resolution: Low Priority Votes: 0
Labels: None

Rank (Obsolete): 9877

 Description   

It would help customers and partners if the documentation added information in the following areas:

1. More info on the parameters
2. Suggestions around selecting the range of the parameters

Thanks



 Comments   
Comment by Dan Ferber (Inactive) [ 22/Aug/13 ]

(from Zhiqi Tao)

These failures are normal. There should be an sgpdd_survey_[date]@[time].detail file in the working directory that shows the failures are due to running out of memory:

"sg starting in command at "sgp_dd.c":827: Cannot allocate memory"

BTW, I normally put sgpdd-survey into a script and run from there.

Here is what I normally use.

$ cat run_sgpdd-survey.sh
#!/bin/bash

# Device list, for example "/dev/raw/raw1 /dev/raw/raw2 /dev/raw/raw3"
DEVICE="regal-oss01:/dev/sdh regal-oss01:/dev/sdi regal-oss01:/dev/sdj regal-oss01:/dev/sdk regal-oss01:/dev/sdl regal-oss01:/dev/sdm regal-oss00:/dev/sdh regal-oss00:/dev/sdi regal-oss00:/dev/sdj regal-oss00:/dev/sdk regal-oss00:/dev/sdl regal-oss00:/dev/sdm"

# The directory for the test output
OUTDIR="/root/test_results/sgpdd_survey/"

# The test dataset size (MB) for each LUN. The total dataset size must be
# larger than twice the RAM size in order to avoid caching.
# The calculation of the size:
#   (RAM size * 2) / the number of LUNs
# For example, if the server RAM is 24 GB and there are 5 LUNs,
# the size should be (24 GB * 2) / 5 ~= 10 GB = 10240 MB
# However, it is a good idea to run a very short trial run to make sure the
# test configuration works properly before scheduling a complete test.
# SIZE=100 is a good number for the trial run.
# SIZE=100
SIZE=23000

# Concurrent regions
# CRGLO=1 - start from 1 concurrent region
# CRGHI=256 - finish at this number of concurrent regions
# Before starting a complete run, use the following settings for a quick trial run.
CRGLO=1
CRGHI=256

# Thread count
# THRLO=1
# THRHI=4096
# Before starting a complete run, use the following settings for a quick trial run.
THRLO=1
THRHI=4096

mkdir -p ${OUTDIR}
ssh regal-oss00 mkdir -p ${OUTDIR}
echo crglo=$CRGLO crghi=$CRGHI thrlo=$THRLO thrhi=$THRHI size=$SIZE rslt_loc=${OUTDIR} scsidevs="$DEVICE" ./sgpdd-survey
crglo=$CRGLO crghi=$CRGHI thrlo=$THRLO thrhi=$THRHI size=$SIZE rslt_loc=${OUTDIR} scsidevs="$DEVICE" ./sgpdd-survey
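
The size rule in the script comments (twice the RAM, divided across the LUNs) can be worked out with a few lines of shell. This is only a sketch: the function name per_lun_size_mb is hypothetical and not part of sgpdd-survey, and it rounds up so that the total never falls below twice the RAM size. The script comments round the same example more generously, to 10240 MB.

```shell
#!/bin/bash
# Hypothetical helper, not part of sgpdd-survey.
# Arguments: RAM size in MB, number of LUNs.
# Prints the per-LUN dataset size in MB, rounded up so that
# (size * LUNs) is at least twice the RAM size.
per_lun_size_mb() {
    local ram_mb=$1
    local num_luns=$2
    echo $(( (ram_mb * 2 + num_luns - 1) / num_luns ))
}

# Example from the script comments: 24 GB of RAM (24576 MB) and 5 LUNs.
per_lun_size_mb 24576 5
```

The printed value can then be assigned to SIZE for the complete run, while SIZE=100 remains the suggestion for the short trial run.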

I used the enhanced sgpdd-survey from https://jira.hpdd.intel.com/browse/LU-2043. According to the ticket, it should be in 2.4 now.
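
To check whether the failures in a given run come from the out-of-memory condition described above, the .detail files can be searched for the error text. The following is a minimal sketch: the function name find_oom_details is hypothetical, and it assumes the .detail files follow the sgpdd_survey_[date]@[time].detail naming and sit in the directory passed as the first argument.

```shell
#!/bin/bash
# Hypothetical helper, not part of sgpdd-survey.
# Lists the .detail files in a directory (default: current directory)
# that contain the "Cannot allocate memory" message.
find_oom_details() {
    local dir=${1:-.}
    grep -l "Cannot allocate memory" "$dir"/sgpdd_survey_*.detail 2>/dev/null
}

# Example: search the current working directory.
find_oom_details . || echo "no out-of-memory failures found"
```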

Comment by Dan Ferber (Inactive) [ 22/Aug/13 ]

I received the same error... The root cause is the sg device. If you run sgp_dd against the sd device, you don't receive any errors, but if you run it against the sg device (as sgpdd-survey does), you receive the memory error. My workaround (per Zhiqi's suggestion) was to use raw devices.

Gabriele Paciucci
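
The raw-device workaround mentioned above can be sketched as follows. This is an illustration only: it assumes the legacy raw(8) utility and /dev/raw/rawN nodes are present (they may not be available on newer systems), the function name print_raw_bindings is hypothetical, and the device names are taken from the example script. The sketch only prints the binding commands so they can be reviewed before being run as root.

```shell
#!/bin/bash
# Hypothetical helper, not part of sgpdd-survey.
# Prints (does not execute) one raw(8) binding command per block device,
# numbering the raw devices from /dev/raw/raw1 upward.
print_raw_bindings() {
    local n=1
    local dev
    for dev in "$@"; do
        echo "raw /dev/raw/raw${n} ${dev}"
        n=$((n + 1))
    done
}

# Example: three of the block devices from the script above.
print_raw_bindings /dev/sdh /dev/sdi /dev/sdj
```

After the bindings are in place, the resulting /dev/raw/rawN names would go into the device list, as in the example in the script's first comment.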
