[LU-16861] Janitor Testing Fails to copy latest obdfilter-survey (Uses old obdfilter-survey) Created: 01/Jun/23 Updated: 18/Jan/24 Resolved: 18/Jan/24 |
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Arshad Hussain | Assignee: | Arshad Hussain |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Client: 4.18.0-372.9.1.el8(8.5) |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
Testing/Fixing: here are a few relevant logs. Let me know if more information is required. All logs are under https://review.whamcloud.com/c/fs/lustre-release/+/51035

CASE 1: This is modified to use a specific path for obdfilter-survey so that janitor passes.

This is the obdfilter-survey that janitor always uses:

    ls -ali /usr/bin/obdfilter-survey
    920857 -rwxr-xr-x. 1 root root 16279 Jun 4 2016 /usr/bin/obdfilter-survey

This is the obdfilter-survey it is supposed to use (it contains changes, which Maloo correctly picks up). Note the different file size and date stamp of the two:

    ls -ali /home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey
    total 42
    ...
    2278 -rwxr-xr-x 1 green green 15632 May 31 07:47 obdfilter-survey
    ...

test_1a in obdfilter-survey.sh was modified to use a specific OBDSURVEY instead of the generic system-path version (/usr/bin/obdfilter-survey). With this change the test passes janitor; without it, the test fails:

    export PATH=$PATH:/home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey
    OBDSURVEY=/home/green/git/lustre-release/lustre/../lustre-iokit/obdfilter-survey/obdfilter-survey
    obdflter_survey_run disk

CASE 2: The failing case. This uses the old /usr/bin/obdfilter-survey, so the new changes are not reflected:
    <snip>
    + eval NETTYPE=tcp thrlo=2 nobjhi=1 thrhi=4 size=1024 case=disk rslt_loc=/tmp 'targets="192.168.203.104:lustre-OST0000' '192.168.203.104:lustre-OST0001"' /usr/bin/obdfilter-survey
    ++ NETTYPE=tcp
    ++ thrlo=2
    ++ nobjhi=1
    ++ thrhi=4
    ++ size=1024
    ++ case=disk
    ++ rslt_loc=/tmp
    ++ targets='192.168.203.104:lustre-OST0000 192.168.203.104:lustre-OST0001'
    ++ /usr/bin/obdfilter-survey
    Warning: Permanently added '192.168.203.104' (ECDSA) to the list of known hosts.
    bash: lctl: command not found
    /usr/bin/obdfilter-survey: line 242: ( << 16) | ( << 8) | : syntax error: operand expected (error token is "<< 16) | ( << 8) | ")
    /usr/bin/obdfilter-survey: line 254: [: -lt: unary operator expected
    bash: lctl: command not found
    bash: lctl: command not found
    OST lustre-OST0000 not setup
    <snip>
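In CASE 1 the in-tree directory is appended to PATH (`PATH=$PATH:...`). Because the shell searches PATH left to right, an appended directory cannot shadow a same-named script that already lives in /usr/bin, which is presumably why the explicit OBDSURVEY path is what actually made the test pick up the in-tree copy. The sketch below demonstrates that lookup order with two throwaway directories; the directory and function names are hypothetical stand-ins, not Lustre test-framework code.

```shell
#!/usr/bin/env bash
# Sketch: PATH is searched left to right, so appending a directory does not
# shadow a same-named command found earlier. The directories are temporary,
# hypothetical stand-ins for /usr/bin and lustre-iokit/obdfilter-survey.

# Create an executable "obdfilter-survey" placeholder in directory $1.
make_fake_tool() {
    mkdir -p "$1"
    printf '#!/bin/sh\necho "%s"\n' "$1" > "$1/obdfilter-survey"
    chmod +x "$1/obdfilter-survey"
}

# Print the path the shell resolves for obdfilter-survey under PATH=$1.
# The subshell keeps the caller's PATH (and command hash table) untouched.
resolve_with_path() {
    ( PATH=$1; command -v obdfilter-survey )
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
system_dir="$tmp/usr-bin"        # stands in for /usr/bin
tree_dir="$tmp/lustre-iokit"     # stands in for the in-tree copy
make_fake_tool "$system_dir"
make_fake_tool "$tree_dir"

appended=$(resolve_with_path "$system_dir:$tree_dir")   # like PATH=$PATH:<tree>
prepended=$(resolve_with_path "$tree_dir:$system_dir")  # like PATH=<tree>:$PATH
echo "appended:  $appended"
echo "prepended: $prepended"
```

Only the prepended order (or naming the script explicitly, as CASE 1 does with OBDSURVEY) makes the in-tree copy win.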
| Comments |
| Comment by Andreas Dilger [ 09/Jan/24 ] |
This is failing 100% of runs on master. It looks like something is wrong with the quoting of the targets (note the extra double quotes before each of the targets):

    + NETTYPE=tcp thrlo=2 nobjhi=1 thrhi=4 size=1024 case=disk rslt_loc=/tmp targets=""10.240.26.105:lustre-OST0000 "10.240.26.105:lustre-OST0001 "10.240.26.105:lustre-OST0002 "10.240.26.105:lustre-OST0003 "10.240.26.105:lustre-OST0004 "10.240.26.105:lustre-OST0005 "10.240.26.105:lustre-OST0006 "10.240.26.105:lustre-OST0007" /usr/bin/obdfilter-survey
    /usr/lib64/lustre/tests/obdfilter-survey.sh: line 77: 10.240.26.105:lustre-OST0001 10.240.26.105:lustre-OST0002: command not found
    cat: '/tmp/obdfilter_survey*': No such file or directory
    obdfilter-survey test_1a: @@@@@@ FAIL: /usr/bin/obdfilter-survey failed: 127
    Trace dump:
    = /usr/lib64/lustre/tests/test-framework.sh:6947:error()
    = /usr/lib64/lustre/tests/obdfilter-survey.sh:81:obdflter_survey_run()
    = /usr/lib64/lustre/tests/obdfilter-survey.sh:85:test_1a()
    = /usr/lib64/lustre/tests/test-framework.sh:7287:run_one()
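The `command not found` error follows from how the shell re-parses the line: with a stray `"` in front of every target, the quotes pair up across neighbouring targets, so `targets=` receives only the first OST, and a word made of two targets (OST0001 and OST0002 in the trace) becomes the command name, giving exit status 127. A minimal reproduction, assuming the test assembles a `targets="..." <program>` string and evals it (as the `+ eval ... 'targets="...` trace in CASE 2 suggests); the helper names are hypothetical and `true` stands in for /usr/bin/obdfilter-survey.

```shell
#!/usr/bin/env bash
# Sketch: a stray leading double quote on each target breaks an eval'ed
# 'targets="..." program' command line. Target names come from the log;
# helper names are hypothetical, and 'true' stands in for obdfilter-survey.

# Join targets with spaces, prefixing each one with $1 (use '"' to mimic
# the buggy extraction, or '' for clean targets).
join_targets() {
    local prefix=$1 out="" t
    shift
    for t in "$@"; do
        out+="${prefix}${t} "
    done
    printf '%s' "${out% }"
}

# Build the command line in the shape the trace shows and eval it in a
# subshell; the return value is eval's exit status (127 = command not found).
run_survey_cmd() {
    local cmd="targets=\"$1\" true"
    ( eval "$cmd" ) 2>/dev/null
}

osts=(10.240.26.105:lustre-OST0000 10.240.26.105:lustre-OST0001
      10.240.26.105:lustre-OST0002 10.240.26.105:lustre-OST0003)

run_survey_cmd "$(join_targets '' "${osts[@]}")"
clean_rc=$?
run_survey_cmd "$(join_targets '"' "${osts[@]}")"
buggy_rc=$?
echo "clean targets:  exit $clean_rc"   # 'true' ran normally
echo "quoted targets: exit $buggy_rc"   # eval tried to run "OST0001 OST0002" as a command
```

An even number of targets is used so the stray quotes balance, matching the eight-target log; an odd count would instead produce an unterminated-quote syntax error.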
| Comment by Arshad Hussain [ 09/Jan/24 ] |
I am checking.
| Comment by Arshad Hussain [ 09/Jan/24 ] |
Looks like the failing line is:

I am working on the patch.
| Comment by Gerrit Updater [ 09/Jan/24 ] |
"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53620
| Comment by Arshad Hussain [ 09/Jan/24 ] |
Andreas, thanks for the hint/pointer. When picking up the NID, the quotes were being included as well, which caused the failure.
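The fix direction described here is to make sure each host/NID value carries no quote characters before it is joined into `targets`. A minimal sketch, assuming each extracted host arrives with a leading `"` as the log implies; `strip_quotes`, `raw_hosts` and the loop are hypothetical and are not the code from the landed patch (https://review.whamcloud.com/c/fs/lustre-release/+/53620).

```shell
#!/usr/bin/env bash
# Sketch only: drop stray double quotes from each extracted host before
# building the targets list. Input values are hypothetical; this is not
# the landed patch.

# Print $1 with every double-quote character removed.
strip_quotes() {
    local v=${1//\"/}
    printf '%s' "$v"
}

raw_hosts=('"10.240.26.105' '"10.240.26.105')  # hosts with a stray leading quote
osts=(lustre-OST0000 lustre-OST0001)

targets=""
for i in "${!osts[@]}"; do
    targets+="$(strip_quotes "${raw_hosts[$i]}"):${osts[$i]} "
done
targets=${targets% }   # drop the trailing space
echo "targets=\"$targets\""
```

With the quotes removed, an eval'ed `targets="..."` assignment receives the whole space-separated list as a single value.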
| Comment by Gerrit Updater [ 18/Jan/24 ] |
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53620/
| Comment by Peter Jones [ 18/Jan/24 ] |
Landed for 2.16