[LU-16827] obdfilter-survey: /usr/bin/obdfilter-survey failed: 2 Created: 12/May/23  Updated: 29/Nov/23  Resolved: 29/Nov/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: Arshad Hussain
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Gantt End to Start
has to be done before LU-16834 obdfilter-survey throws "error: attac... Resolved
has to be done after LU-16861 Janitor Testing Fails to copy latest ... Resolved
Related
is related to LU-13340 add LCFG_ADD_UUIDv6 and related commands Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Chris Horn <chris.horn@hpe.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/cf8d9cb0-9efd-4978-b9c4-f32417b78f26

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/94764 - 4.18.0-425.10.1.el8_7.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/94764 - 4.18.0-425.10.1.el8_lustre.x86_64

== obdfilter-survey test 1a: Object Storage Targets survey ========================================================== 21:29:16 (1683754156)
+ NETTYPE=tcp thrlo=2 nobjhi=1 thrhi=4 size=1024 case=disk rslt_loc=/tmp targets="lustre-OST0000 lustre-OST0001 lustre-OST0002 lustre-OST0003 lustre-OST0004 lustre-OST0005 lustre-OST0006" /usr/bin/obdfilter-survey
OST lustre-OST0000 not setup
/usr/bin/iokit-libecho: line 221: 420704 Killed                  remote_shell $host "vmstat 5 >> $host_vmstatf" &> /dev/null
program exited with error 


 Comments   
Comment by Gerrit Updater [ 17/May/23 ]

"Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51035
Subject: LU-16827 obdfilter: Test patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 35bbcad1b9e68d5772107a034974878e3ab8cbc5

Comment by Arshad Hussain [ 18/May/23 ]

The failure is due to the echo_client not being set up: lctl dl does not have an entry like the one below.

28 UP echo_client lustreT-OST0000_ecc lustreT-OST0000_ecc_UUID 2

get_ec_devno() looks up the above entry and returns false/empty, so cleanup() is called with a non-zero value.
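
To illustrate the kind of lookup described above, here is a minimal sketch. It is not the real get_ec_devno() from iokit-libecho: the helper name find_echo_client_devno is hypothetical, and the only sample input is the single device line quoted in this comment. It assumes the device list has the column layout shown in that line (device number, state, type, name, UUID, refcount).

```shell
#!/bin/bash
# Hypothetical helper (NOT the real get_ec_devno() from iokit-libecho).
# Reads "lctl dl"-style text on stdin and prints the device number of the
# echo_client device named "$1". Prints nothing if no such entry exists.
find_echo_client_devno() {
    local name="$1"
    # Assumed column layout: devno state type name uuid refcount
    awk -v name="$name" '$3 == "echo_client" && $4 == name { print $1; exit }'
}

# Sample input: the single entry quoted in the comment above.
sample_dl=' 28 UP echo_client lustreT-OST0000_ecc lustreT-OST0000_ecc_UUID 2'

devno=$(printf '%s\n' "$sample_dl" | find_echo_client_devno lustreT-OST0000_ecc)
if [ -z "$devno" ]; then
    # An empty result corresponds to the "not setup" failure path.
    echo "echo_client not set up" >&2
else
    echo "echo_client devno: $devno"
fi
```

If the device list contains no matching echo_client line, the helper prints nothing, and a caller that treats an empty result as "not set up" would take the error path, which matches the failure described in this comment.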

Comment by Andreas Dilger [ 21/Aug/23 ]

It looks like obdfilter-survey is now failing in 100% of test runs, and that this was caused by the landing of patch https://review.whamcloud.com/50096 "LU-13340 lustre: Support large nids in LCFG_ADD_UUID". The test results show that patch 50096 and 3 others based on it at the time (50099, 50102, 50103) all failed on 2023-05-02 before they landed, with more failures on 2023-06-07 and 2023-06-22 when the patch was refreshed. The test then started failing solidly when the patch landed on 2023-07-08.

I haven't looked into why this patch caused the failures. Based on Arshad's comments above, it may be related to a change in the output format of "lctl dl", but I'm not sure.

Comment by Arshad Hussain [ 22/Aug/23 ]

Andreas, I have the fix (I think). It needs a cleanup, which I will get done: https://review.whamcloud.com/c/fs/lustre-release/+/51035

The reason for not getting a +1 from Maloo seems to be https://jira.whamcloud.com/browse/LU-16861. So, in addition to the above patch, LU-16861 must be fixed first, in my opinion.

This seems to have been failing since May 2023. I first encountered it in https://testing.whamcloud.com/test_sessions/79ec0895-a03c-4828-8dc8-c48b959e0990

Comment by Arshad Hussain [ 22/Aug/23 ]
Andreas, I have the fix (I think). It needs a cleanup, which I will get done: https://review.whamcloud.com/c/fs/lustre-release/+/51035

Done.

Comment by Gerrit Updater [ 31/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51035/
Subject: LU-16827 obdfilter: Fix obdfilter-survery/1a
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 91a3b286ba57bb491b5c17600d7cec9e516a428f

Comment by Peter Jones [ 31/Aug/23 ]

Looks to be landed for 2.16

Comment by Gerrit Updater [ 10/Nov/23 ]

"Vitaliy Kuznetsov <vkuznetsov@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53083
Subject: LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 753faeec133f0dbb63bb990432a7d80544e21f3e

Comment by Gerrit Updater [ 29/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53083/
Subject: LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6b514c0bd0e3f83cfb9f547148b6fc183db7c4c9

Comment by Peter Jones [ 29/Nov/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:30:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.