[LU-10710] parallel-scale test write_disjoint hung Created: 24/Feb/18  Updated: 25/May/18  Resolved: 09/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Jian Yu Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The parallel-scale test write_disjoint hung on the master branch with the following output:

/usr/lib64/lustre/tests/write_disjoint: option requires an argument -- 'm'
random seed: 1519386210
loop 0: chunk_size 13798672
loop 1000: chunk_size 12937343

Maloo reports:
https://testing.hpdd.intel.com/test_sets/83f68bf6-18dd-11e8-bd00-52540065bddc
https://testing.hpdd.intel.com/test_sets/91293d5e-18d4-11e8-bd00-52540065bddc

There are no obvious error messages in the console or syslog logs, and the stack trace logs were not gathered (ATM-828).



 Comments   
Comment by James Nunez (Inactive) [ 14/Mar/18 ]

This issue was introduced by the patch for LU-9409 at https://review.whamcloud.com/27903. The patch adds the -m flag and its parameter to the write_disjoint command line built in run_write_disjoint():

826
827         local cmd="$WRITE_DISJOINT -f $testdir/file -n $wdisjoint_REP -m "\
828                           "$chunk_size_limit"
829
830         echo "+ $cmd"

 

From the suite_log, we can see that the value of the -m flag never makes it to the write_disjoint command. Instead, it is passed to the local builtin as a separate argument, which bash rejects as an invalid identifier:

== parallel-scale test write_disjoint: write_disjoint ================================================ 11:43:29 (1519386209)
OPTIONS:
WRITE_DISJOINT=/usr/lib64/lustre/tests/write_disjoint
clients=trevis-3vm1.trevis.hpdd.intel.com,trevis-3vm2
wdisjoint_THREADS=4
wdisjoint_REP=10000
MACHINEFILE=/tmp/parallel-scale.machines
trevis-3vm1.trevis.hpdd.intel.com
trevis-3vm2
/usr/lib64/lustre/tests/functions.sh: line 828: local: `123456': not a valid identifier
+ /usr/lib64/lustre/tests/write_disjoint -f /mnt/lustre/d0.write_disjoint/file -n 10000 -m

 

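The problem is the line continuation at the end of line 827. Once bash joins lines 827 and 828, the tabs/spaces at the start of line 828 separate the two quoted strings, so "$chunk_size_limit" becomes a separate argument to local instead of part of the value assigned to cmd. To fix this, we just need to remove the closing quotation mark on line 827 and the opening quotation mark on line 828.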
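
The behaviour can be reproduced outside the test framework. The following is a minimal sketch, not the patch itself: the variable values are hard-coded stand-ins taken from the suite_log above, and the function names build_cmd_buggy and build_cmd_fixed are hypothetical and do not exist in functions.sh.

```shell
#!/bin/bash
# Stand-ins for values that the test framework normally sets at run time
# (assumed here; copied from the suite_log for illustration only).
WRITE_DISJOINT=/usr/lib64/lustre/tests/write_disjoint
testdir=/mnt/lustre/d0.write_disjoint
wdisjoint_REP=10000
chunk_size_limit=123456

# Buggy form: the backslash-newline is removed, but the indentation on the
# second line still separates the two quoted strings into two words. The
# local builtin therefore sees cmd="... -m " plus a second argument, 123456,
# which it rejects as a variable name.
build_cmd_buggy() {
    local cmd="$WRITE_DISJOINT -f $testdir/file -n $wdisjoint_REP -m "\
        "$chunk_size_limit"
    echo "$cmd"
}

# One possible fix: keep the whole command inside a single pair of quotes.
# The backslash-newline is still removed inside double quotes, and the
# indentation just becomes extra whitespace inside the string.
build_cmd_fixed() {
    local cmd="$WRITE_DISJOINT -f $testdir/file -n $wdisjoint_REP -m \
        $chunk_size_limit"
    echo "$cmd"
}

# stderr of the buggy variant is discarded so only the built command is kept.
buggy=$(build_cmd_buggy 2>/dev/null)
fixed=$(build_cmd_fixed)
echo "buggy: $buggy"
echo "fixed: $fixed"
```

Under bash, the buggy variant should print a command that ends at the -m flag, matching the "+ ..." line in the suite_log, while the fixed variant ends with the chunk size limit. The extra whitespace inside the fixed string is harmless as long as the command is later expanded unquoted, which collapses it during word splitting.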

 

I’ll upload a patch for this.

Comment by Gerrit Updater [ 14/Mar/18 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/31645
Subject: LU-10710 tests: fix run_write_disjoint line continuation
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6ff2c1d3b134be39c1c63511bc3a7b06bc41214e

Comment by Gerrit Updater [ 09/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31645/
Subject: LU-10710 tests: fix run_write_disjoint line continuation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a033bd837329d1eb98d1dd71f4491f1af56a27f0

Comment by James Nunez (Inactive) [ 09/Apr/18 ]

Patch landed to master (future 2.12)

Generated at Sat Feb 10 02:37:31 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.