Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0, Lustre 2.5.0
    • None
    • clang 3.4
    • 3
    • 10815

    Description

      While porting lustre to macos i found a some strange behavior - likely wrong type used for variable.

      osc_request.c:2503:41: error: implicit conversion from 'unsigned long long' to 'int' changes value from 2199023255552 to 0 [-Werror,-Wconstant-conversion]
              int match_lvb = (agl != 0 ? 0 : LDLM_FL_LVB_READY);
                  ~~~~~~~~~                   ^~~~~~~~~~~~~~~~~
      /Users/shadow/work/lustre/work/WC-review/mac-os/lustre/include/lustre_dlm_flags.h:244:41: note: expanded from macro 'LDLM_FL_LVB_READY'
      #define LDLM_FL_LVB_READY               0x0000020000000000ULL // bit  41
      
      

      so if we assign 64bit data to 32bit variable - we will be have zero always and that logic newer work.

      Attachments

        Activity

          [LU-4023] wrong type used for ldlm flag
          jlevi Jodi Levi (Inactive) made changes -
          Fix Version/s New: Lustre 2.6.0 [ 10595 ]
          pjones Peter Jones made changes -
          Labels Original: mq413
          adilger Andreas Dilger made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          I don't think it is a big deal either way, but the reason I point at the b58a810 commit ("LU-1684 ldlm: move ldlm flags not sent through wire to upper 32bits") is because it moved the LDLM_FL_LVB_READY value from fitting in an int to not fitting in an int, which is what affected osc_enqueue_base():

          -#define LDLM_FL_LVB_READY      0x100000
          +#define LDLM_FL_LVB_READY      0x20000000000ULL
          

          It still isn't clear what effect this had (if any?) on the behaviour of osc_enqueue_base(). I might guess there was some kind of race that might lead "ls -l" and statahead requests from returning data from OST extent locks that hadn't actually been granted yet (because ldlm_lock_match() was not sleeping until LVB_READY was set on the lock)? In any case, I'm glad that this problem was found and fixed.

          Shadow, if you think there is still a problem in osc_enqueue_base->ldlm_lock_match() because of this change back to the pre-2.3.64 behaviour (which has existed since v2_1_53_0-40-g6f5813d then please file a separate bug with details on how the current code is broken. I agree that it is possible that something broke by fixing osc_enqueue_base() to pass LVB_READY down to ldlm_lock_match(), but a quick look doesn't show anything obvious.

          adilger Andreas Dilger added a comment - I don't think it is a big deal either way, but the reason I point at the b58a810 commit (" LU-1684 ldlm: move ldlm flags not sent through wire to upper 32bits ") is because it moved the LDLM_FL_LVB_READY value from fitting in an int to not fitting in an int, which is what affected osc_enqueue_base(): -#define LDLM_FL_LVB_READY 0x100000 +#define LDLM_FL_LVB_READY 0x20000000000ULL It still isn't clear what effect this had (if any?) on the behaviour of osc_enqueue_base(). I might guess there was some kind of race that might lead "ls -l" and statahead requests from returning data from OST extent locks that hadn't actually been granted yet (because ldlm_lock_match() was not sleeping until LVB_READY was set on the lock)? In any case, I'm glad that this problem was found and fixed. Shadow, if you think there is still a problem in osc_enqueue_base->ldlm_lock_match() because of this change back to the pre-2.3.64 behaviour (which has existed since v2_1_53_0-40-g6f5813d then please file a separate bug with details on how the current code is broken. I agree that it is possible that something broke by fixing osc_enqueue_base() to pass LVB_READY down to ldlm_lock_match(), but a quick look doesn't show anything obvious.
          adilger Andreas Dilger made changes -
          Fix Version/s New: Lustre 2.5.0 [ 10295 ]
          Fix Version/s Original: Lustre 2.5.1 [ 10608 ]
          adilger Andreas Dilger made changes -
          Labels New: mq413

          Andreas Dilger added a comment - 01/Oct/13 4:28 PM
          > This code has been broken since Vitaly landed the patch b58a810 in 2.3.54.
          > It isn't clear why we didn't notice this failure,

          probably just because it WAS NOT broken since Vitaly landed that patch and the failure was introduced later by Intel

          vitaly_fertman Vitaly Fertman added a comment - Andreas Dilger added a comment - 01/Oct/13 4:28 PM > This code has been broken since Vitaly landed the patch b58a810 in 2.3.54. > It isn't clear why we didn't notice this failure, probably just because it WAS NOT broken since Vitaly landed that patch and the failure was introduced later by Intel

          I think lack a response to inspector question - it's not a good style. you have a different opinion? if yes - please elaborate why Intel developers allowed to do it, but you strongly agains it from other developers.

          In my opinion (I can't speak for Dmitry) your question was more of a feature request, and not an objection to the patch. I agree that replying to inspector comments is important for everyone, regardless of where they are working. Maybe at another time this patch would have taken longer to be inspected and landed, but we are trying to finish all of the blocker bugs with enough time to test the code enough before release. I also agree that it makes sense for you to file a new bug with your feature request and attach the patch to that bug, so that this one can be closed for 2.5.0.

          i have plan to test patch with clang to see none breakage introduced, but none interested with verification i see.

          Nowhere in your comments did you mention testing the patch with clang, and I can't read your mind. I'm definitely interested in static code analysis (see LU-871 and LU-2675 regarding clang, and many patches from Bull landed under LU-2753 from Coverity). I will be happy to land any patches you submit for defects you find with clang or sparse or other static analysis tools.

          i know about -1 in gerrit, but i don't think i need set to -1 if i have any questions to the author, but now i will set -1 for any patch with any questions and clear only when finished a review - it's easy when see - none interested with review results.

          This is something that happens fairly regularly with our own inspectors - someone marks a patch -1 if they think there is an issue, and the change it to +1 if their issue is explained. It would be even better if some code is unclear to add a comment in the code to explain it rather than just a note in Gerrit. I think it is untrue to say that nobody is interested with review results just because of this minor issue. I would be happy to have more review from Xyratex developers, even on patches that are not their own. Reviewing patches is as much work as developing them in the first place, and if more people review a patch we can hope that they will find more bugs.

          that patch is not a small change as you think. previously we have drop working any logic related to the alg as match_lvb always zero in past. did you inspect all related code - how it work with LDLM_FL_LVB_READY passed in down call? so that patch may change flag to correct mask but introduce a new bugs due LDLM_FL_LVB_READY will passed to argument of ldlm_lock_match?

          I agree, though the code was (presumably) working before Vitaly's patch http://review.whamcloud.com/3494 broke it and we can hope it is working again. I'm happy to have your analysis of what was broken by that patch, and a new test that exercises this code better.

          To be honest, it makes me wonder what was broken/fixed with this change? Something related to async locking for statahead, but I wonder how it would be visible to userspace? Would it just mean that statahead needs to enqueue the lock again? Is a race when accessing lock LVB data? Is this code doing nothing useful and can be deleted?

          adilger Andreas Dilger added a comment - I think lack a response to inspector question - it's not a good style. you have a different opinion? if yes - please elaborate why Intel developers allowed to do it, but you strongly agains it from other developers. In my opinion (I can't speak for Dmitry) your question was more of a feature request, and not an objection to the patch. I agree that replying to inspector comments is important for everyone, regardless of where they are working. Maybe at another time this patch would have taken longer to be inspected and landed, but we are trying to finish all of the blocker bugs with enough time to test the code enough before release. I also agree that it makes sense for you to file a new bug with your feature request and attach the patch to that bug, so that this one can be closed for 2.5.0. i have plan to test patch with clang to see none breakage introduced, but none interested with verification i see. Nowhere in your comments did you mention testing the patch with clang, and I can't read your mind. I'm definitely interested in static code analysis (see LU-871 and LU-2675 regarding clang, and many patches from Bull landed under LU-2753 from Coverity). I will be happy to land any patches you submit for defects you find with clang or sparse or other static analysis tools. i know about -1 in gerrit, but i don't think i need set to -1 if i have any questions to the author, but now i will set -1 for any patch with any questions and clear only when finished a review - it's easy when see - none interested with review results. This is something that happens fairly regularly with our own inspectors - someone marks a patch -1 if they think there is an issue, and the change it to +1 if their issue is explained. It would be even better if some code is unclear to add a comment in the code to explain it rather than just a note in Gerrit. I think it is untrue to say that nobody is interested with review results just because of this minor issue. I would be happy to have more review from Xyratex developers, even on patches that are not their own. Reviewing patches is as much work as developing them in the first place, and if more people review a patch we can hope that they will find more bugs. that patch is not a small change as you think. previously we have drop working any logic related to the alg as match_lvb always zero in past. did you inspect all related code - how it work with LDLM_FL_LVB_READY passed in down call? so that patch may change flag to correct mask but introduce a new bugs due LDLM_FL_LVB_READY will passed to argument of ldlm_lock_match? I agree, though the code was (presumably) working before Vitaly's patch http://review.whamcloud.com/3494 broke it and we can hope it is working again. I'm happy to have your analysis of what was broken by that patch, and a new test that exercises this code better. To be honest, it makes me wonder what was broken/fixed with this change? Something related to async locking for statahead, but I wonder how it would be visible to userspace? Would it just mean that statahead needs to enqueue the lock again? Is a race when accessing lock LVB data? Is this code doing nothing useful and can be deleted?

          Andreas,

          first, i think lack a response to inspector question - it's not a good style. you have a different opinion? if yes - please elaborate why Intel developers allowed to do it, but you strongly agains it from other developers.
          If Dmitry was post a reply and create a new ticket with that cleanup it's make different response from my side.
          I understand about code freeze, but i don't have guaranteed we don't have other places with same bugs, as i don't able to build kernel part with clang now. It's main objection if we have such bug in two releases - we don't think it's big regression in past. so have some changes in that area?

          second - i have plan to test patch with clang to see none breakage introduced, but none interested with verification i see.

          third - i know about -1 in gerrit, but i don't think i need set to -1 if i have any questions to the author, but now i will set -1 for any patch with any questions and clear only when finished a review - it's easy when see - none interested with review results.

          as about patch - that patch is not a small change as you think. previously we have drop working any logic related to the alg as match_lvb always zero in past. did you inspect all related code - how it work with LDLM_FL_LVB_READY passed in down call? so that patch may change flag to correct mask but introduce a new bugs due LDLM_FL_LVB_READY will passed to argument of ldlm_lock_match?

          shadow Alexey Lyashkov added a comment - Andreas, first, i think lack a response to inspector question - it's not a good style. you have a different opinion? if yes - please elaborate why Intel developers allowed to do it, but you strongly agains it from other developers. If Dmitry was post a reply and create a new ticket with that cleanup it's make different response from my side. I understand about code freeze, but i don't have guaranteed we don't have other places with same bugs, as i don't able to build kernel part with clang now. It's main objection if we have such bug in two releases - we don't think it's big regression in past. so have some changes in that area? second - i have plan to test patch with clang to see none breakage introduced, but none interested with verification i see. third - i know about -1 in gerrit, but i don't think i need set to -1 if i have any questions to the author, but now i will set -1 for any patch with any questions and clear only when finished a review - it's easy when see - none interested with review results. as about patch - that patch is not a small change as you think. previously we have drop working any logic related to the alg as match_lvb always zero in past. did you inspect all related code - how it work with LDLM_FL_LVB_READY passed in down call? so that patch may change flag to correct mask but introduce a new bugs due LDLM_FL_LVB_READY will passed to argument of ldlm_lock_match?
          adilger Andreas Dilger made changes -
          Summary Original: wrong type used New: wrong type used for ldlm flag

          People

            dmiter Dmitry Eremin (Inactive)
            shadow Alexey Lyashkov
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: