[LU-15338] sanity test_205a: No jobstats for id.205a.dd.320 found on ost1::*.lustre-OST0000.job_stats Created: 07/Dec/21  Updated: 07/Sep/23  Resolved: 23/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/32ad6edf-a1b8-46ac-91ff-92eecbe0efe3

test_205a failed with the following error:

 sanity test_205a: @@@@@@ FAIL: No jobstats for id.205a.dd.320 found on ost1::*.lustre-OST0000.job_stats 

However, the dumped jobstats do contain a result for id.205a.dd.320. They also contain a result from the previous "dd" write operation, whose job ID id.205a.dd.32075 is a longer string that starts with the same characters. As a result, the "grep -c" in the test matches two jobids instead of one, and the test fails:

- job_id:          id.205a.dd.32075
  snapshot_time:   1615208459
  write_bytes:     { samples:           1, unit: bytes, min: 1048576, max: 1048576, sum:         1048576, sumsq:      1099511627776 }
  write:           { samples:           1, unit: usecs, min:     141, max:     141, sum:             141, sumsq:              19881 }
  punch:           { samples:           1, unit: usecs, min:      31, max:      31, sum:              31, sumsq:                961 }
  sync:            { samples:           1, unit: usecs, min:   42111, max:   42111, sum:           42111, sumsq:         1773336321 }
- job_id:          id.205a.dd.320
  snapshot_time:   1615208461
  read_bytes:      { samples:           1, unit: bytes, min: 1048576, max: 1048576, sum:         1048576, sumsq:      1099511627776 }
  read:            { samples:           1, unit: usecs, min:      53, max:      53, sum:              53, sumsq:               2809 }

I don't know the exact probability of this happening, maybe around 4/32768 = 1/8192, since any 5-digit random number for the first "dd" has 4-, 3-, 2-, and 1-digit leading substrings that would also match if the second "dd" happened to get one of them. In any case, full-string matching should fix this problem.
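
The difference between substring and full-string matching can be reproduced outside the test framework. The following is only a minimal sketch: the sample lines are copied from the dump above, the variable names are made up for illustration, and the awk field comparison is just one possible way to match the whole job ID, not necessarily what the landed patch does.

```shell
#!/bin/bash
# Sample job_stats lines copied from the dump in this report
# (only the job_id and snapshot_time lines are kept for brevity).
stats='- job_id:          id.205a.dd.32075
  snapshot_time:   1615208459
- job_id:          id.205a.dd.320
  snapshot_time:   1615208461'

# Job ID the test is looking for.
jobid="id.205a.dd.320"

# Substring match: "id.205a.dd.320" is also found inside
# "id.205a.dd.32075", so two lines are counted.
substring_count=$(printf '%s\n' "$stats" | grep -c "$jobid")

# Full-string match (illustrative): compare the whole third field of
# each "job_id:" line against the job ID as a literal string.
exact_count=$(printf '%s\n' "$stats" |
	awk -v id="$jobid" '$2 == "job_id:" && $3 == id { n++ } END { print n + 0 }')

echo "substring matches: $substring_count"   # prints 2
echo "exact matches:     $exact_count"       # prints 1
```

Note that with plain grep the dots in the job ID are regular-expression wildcards, so comparing the field as a literal string also avoids that source of false matches.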

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_205a - No jobstats for id.205a.dd.4 found on ost1::*.lustre-OST0000.job_stats



 Comments   
Comment by Gerrit Updater [ 07/Dec/21 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45774
Subject: LU-15338 tests: check whole jobid in sanity test_205a
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 477615cd4e0778cd7d8fcc9e5beed755eba0eac4

Comment by Gerrit Updater [ 23/Dec/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45774/
Subject: LU-15338 tests: check whole jobid in sanity 205a
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1ee894a4355ecec869754c0b6c566c0e187e27a7

Comment by Peter Jones [ 23/Dec/21 ]

Landed for 2.15
