[LU-5862] Jobstats tracking in changelogs doesn't support jobid_var=disable Created: 04/Nov/14  Updated: 26/Feb/15  Resolved: 01/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Critical
Reporter: Michael MacDonald (Inactive) Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: HB, patch
Environment:

Seen on master as of b6a3222


Attachments: File LU-5862.sh    
Issue Links:
Related
is related to LU-1996 Fine-grained job activity tracking us... Resolved
is related to LU-5907 Interop 2.6.0<->master lustre-rsync-t... Resolved
is related to LU-5899 Interop 2.6.0<->2.7 sanity test_160a:... Resolved
Severity: 3
Rank (Obsolete): 16403

 Description   

The work done for LU-1996 seems to have introduced an odd bug in changelogs when jobid_var=disable. I don't know enough about this area to speculate on its severity, but it seems pretty serious to me. The upshot is that changelog output doesn't make sense with the default jobid_var setting.

Please see the attached reproducer script for details.



 Comments   
Comment by Michael MacDonald (Inactive) [ 04/Nov/14 ]

Running the attached reproducer results in output similar to the following:

+ lctl set_param jobid_var=disable
jobid_var=disable
+ date
+ rm /tmp/LU-5862/client/foo
+ lfs changelog LU-5862-MDT0000
1 01CREAT 21:39:26.463022185 2014.11.04 0x0 t=[0x200000400:0x1:0x0] j=foo p=[0x200000007:0x1:0x0]
2 06UNLNK 21:39:26.468021603 2014.11.04 0x1 t=[0x200000400:0x1:0x0] j=foo p=[0x200000007:0x1:0x0]
+ lctl set_param jobid_var=procname_uid
jobid_var=procname_uid
+ sleep 1
+ lfs changelog_clear LU-5862-MDT0000 cl1 0
+ date
+ rm /tmp/LU-5862/client/bar
+ lfs changelog LU-5862-MDT0000
3 01CREAT 21:39:27.478020701 2014.11.04 0x0 t=[0x200000400:0x2:0x0] j=LU-5862.sh.0 p=[0x200000007:0x1:0x0] bar
4 06UNLNK 21:39:27.481020460 2014.11.04 0x1 t=[0x200000400:0x2:0x0] j=rm.0 p=[0x200000007:0x1:0x0] bar

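Note that in the first file's entries (records 1 and 2, written with jobid_var=disable), the j= (jobid) field contains the filename and no filename follows p=. This is the bug. The second file's entries (records 3 and 4, written with jobid_var=procname_uid) have a correct j= field and the filename in its proper place at the end of the line.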

Comment by Henri Doreau (Inactive) [ 05/Nov/14 ]

Looks like you're right: this regression was introduced by LU-1996. Are you already on it, or shall I work on a patch?

Comment by Henri Doreau (Inactive) [ 05/Nov/14 ]

http://review.whamcloud.com/12574

Comment by Michael MacDonald (Inactive) [ 05/Nov/14 ]

Hi. I haven't started working on a patch. It would probably be faster for you to create one, since you're familiar with the code already. While you're in there, would you also please take a look at LU-5859 to see if it's related? It may not be – I haven't done a bisect to figure out when that one was introduced. It seems plausible that they could be related, though.

Comment by Peter Jones [ 05/Nov/14 ]

James

Could you please take care of this patch?

Thanks

Peter

Comment by Gerrit Updater [ 24/Nov/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12574/
Subject: LU-5862 changelog: Proper record remapping
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a98690be9057f409b7840dc15c75f28d2c861c92

Comment by James Nunez (Inactive) [ 01/Dec/14 ]

Landed to master (pre-2.7).
