
Jobstats tracking in changelogs doesn't support jobid_var=disable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.7.0
    • Lustre 2.7.0
    • Seen on master as of b6a3222
    • 3
    • 16403

    Description

      The work done for LU-1996 seems to have introduced a bug in changelogs when jobid_var=disable. I don't know this area well enough to judge severity, but it looks serious to me: changelogs don't make sense with the default jobid_var setting.

      Please see the attached reproducer script for details.

          Activity

            jamesanunez James Nunez (Inactive) added a comment -

            Landed to master (pre-2.7).

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12574/
            Subject: LU-5862 changelog: Proper record remapping
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a98690be9057f409b7840dc15c75f28d2c861c92

            pjones Peter Jones added a comment -

            James

            Could you please take care of this patch?

            Thanks

            Peter

            mjmac Michael MacDonald (Inactive) added a comment -

            Hi. I haven't started working on a patch. It would probably be faster for you to create one, since you're familiar with the code already. While you're in there, would you also please take a look at LU-5859 to see if it's related? It may not be – I haven't done a bisect to figure out when that one was introduced. It seems plausible that they could be related, though.

            hdoreau Henri Doreau (Inactive) added a comment -

            Looks like you're right, this regression was introduced by LU-1996. Are you already on it or shall I work on a patch?

            mjmac Michael MacDonald (Inactive) added a comment -

            Running the attached reproducer results in output similar to the following:

            + lctl set_param jobid_var=disable
            jobid_var=disable
            + date
            + rm /tmp/LU-5862/client/foo
            + lfs changelog LU-5862-MDT0000
            1 01CREAT 21:39:26.463022185 2014.11.04 0x0 t=[0x200000400:0x1:0x0] j=foo p=[0x200000007:0x1:0x0]
            2 06UNLNK 21:39:26.468021603 2014.11.04 0x1 t=[0x200000400:0x1:0x0] j=foo p=[0x200000007:0x1:0x0]
            + lctl set_param jobid_var=procname_uid
            jobid_var=procname_uid
            + sleep 1
            + lfs changelog_clear LU-5862-MDT0000 cl1 0
            + date
            + rm /tmp/LU-5862/client/bar
            + lfs changelog LU-5862-MDT0000
            3 01CREAT 21:39:27.478020701 2014.11.04 0x0 t=[0x200000400:0x2:0x0] j=LU-5862.sh.0 p=[0x200000007:0x1:0x0] bar
            4 06UNLNK 21:39:27.481020460 2014.11.04 0x1 t=[0x200000400:0x2:0x0] j=rm.0 p=[0x200000007:0x1:0x0] bar

            Note that the j= (jobid) field in the first file's entries contains the filename (a bug), whereas the second file's entries have the correct j= field as well as the filename in the right place.
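            To spot the symptom described above in a longer changelog dump, a small checker along these lines might help. It is only a sketch: the field layout is inferred from the four sample records in this ticket rather than from any documented format, and the names parse_record, looks_misplaced and suspect_indices are hypothetical helpers, not part of Lustre. The heuristic flags any record that has a j= value and a parent FID but no trailing file name.

```python
import re
from typing import List, NamedTuple, Optional

# Layout assumed from the sample records in this ticket:
#   <index> <type> <time> <date> <flags> t=[fid] j=<jobid> p=[fid] <name>
RECORD_RE = re.compile(
    r"^(?P<index>\d+)\s+"
    r"(?P<type>\d{2}[A-Z0-9_]+)\s+"
    r"(?P<time>\S+)\s+(?P<date>\S+)\s+"
    r"(?P<flags>0x[0-9a-fA-F]+)\s+"
    r"t=(?P<target>\[[^\]]+\])"
    r"(?:\s+j=(?P<jobid>\S+))?"
    r"(?:\s+p=(?P<parent>\[[^\]]+\]))?"
    r"(?:\s+(?P<name>.+))?$"
)


class ChangelogRecord(NamedTuple):
    index: int
    rec_type: str
    jobid: Optional[str]
    parent: Optional[str]
    name: Optional[str]


def parse_record(line: str) -> Optional[ChangelogRecord]:
    """Parse one line of changelog output; return None for non-record lines."""
    m = RECORD_RE.match(line.strip())
    if m is None:
        return None
    return ChangelogRecord(
        int(m["index"]), m["type"], m["jobid"], m["parent"], m["name"]
    )


def looks_misplaced(rec: ChangelogRecord) -> bool:
    """Heuristic: a j= value and a parent FID are present, but no file name."""
    return rec.jobid is not None and rec.parent is not None and rec.name is None


def suspect_indices(text: str) -> List[int]:
    """Return the indices of records that match the heuristic."""
    records = (parse_record(line) for line in text.splitlines())
    return [r.index for r in records if r is not None and looks_misplaced(r)]


# Record lines copied from the reproducer output in this ticket.
SAMPLE = """\
+ lfs changelog LU-5862-MDT0000
1 01CREAT 21:39:26.463022185 2014.11.04 0x0 t=[0x200000400:0x1:0x0] j=foo p=[0x200000007:0x1:0x0]
2 06UNLNK 21:39:26.468021603 2014.11.04 0x1 t=[0x200000400:0x1:0x0] j=foo p=[0x200000007:0x1:0x0]
3 01CREAT 21:39:27.478020701 2014.11.04 0x0 t=[0x200000400:0x2:0x0] j=LU-5862.sh.0 p=[0x200000007:0x1:0x0] bar
4 06UNLNK 21:39:27.481020460 2014.11.04 0x1 t=[0x200000400:0x2:0x0] j=rm.0 p=[0x200000007:0x1:0x0] bar
"""

if __name__ == "__main__":
    print(suspect_indices(SAMPLE))  # prints [1, 2]
```

            On the sample above, records 1 and 2 (written with jobid_var=disable) are flagged, while records 3 and 4 are not. The check cannot tell a misplaced name from a record type that legitimately has no name, so treat a hit as a hint to inspect the record rather than as proof of the bug.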

            People

              Assignee: jamesanunez James Nunez (Inactive)
              Reporter: mjmac Michael MacDonald (Inactive)