Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12362

kernel warning 'do not call blocking ops when !TASK_RUNNING ' in ptlrpcd

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Upstream, Lustre 2.15.0
    • Lustre 2.12.0
    • ARM client running TOSS kernel 4.14.0-115.6.1.1chaos.ch6a.aarch64
    • 3
    • 9223372036854775807

    Description

      kernel: do not call blocking ops when !TASK_RUNNING; state=1 set at <ffff000002a473d0>] ptlrpcd+0x410/0x5c0 [ptlrpc]
      kernel: ------------[ cut here ]------------
      kernel: WARNING: CPU: 35 PID: 14128 at kernel/sched/core.c:5988 __might_sleep+0x84/0x8c 

      Attachments

        Issue Links

          Activity

            [LU-12362] kernel warning 'do not call blocking ops when !TASK_RUNNING ' in ptlrpcd

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45069/
            Subject: LU-12362 ptlrpc: use wait_woken() in ptlrpcd()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 885b494632ca16d95fd09685a571b76d80d09414

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45069/ Subject: LU-12362 ptlrpc: use wait_woken() in ptlrpcd() Project: fs/lustre-release Branch: master Current Patch Set: Commit: 885b494632ca16d95fd09685a571b76d80d09414
            xinliang Xinliang Liu added a comment - - edited

            Thanks Neil's patch, "wait_woken"  should fix such using sleeping locking in sleeping(a.k.a nested sleeping primitives) context issues. See more details here: https://lwn.net/Articles/628628/. I wonder if there is other wait event place in Lustre need to be fix too, as only opening the kernel debug option (CONFIG_DEBUG_ATOMIC_SLEEP) this warning will print.

            xinliang Xinliang Liu added a comment - - edited Thanks Neil's patch, "wait_woken"  should fix such using sleeping locking in sleeping(a.k.a nested sleeping primitives) context issues. See more details here: https://lwn.net/Articles/628628/.  I wonder if there is other wait event place in Lustre need to be fix too, as only opening the kernel debug option (CONFIG_DEBUG_ATOMIC_SLEEP) this warning will print.

            "Neil Brown <neilb@suse.de>" uploaded a new patch: https://review.whamcloud.com/45069
            Subject: LU-12362 ptlrpc: use wait_woken() in ptlrpcd()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 296a0dd12f4e5c7f0615d8cd00a983ecf0b59ab3

            gerrit Gerrit Updater added a comment - "Neil Brown <neilb@suse.de>" uploaded a new patch: https://review.whamcloud.com/45069 Subject: LU-12362 ptlrpc: use wait_woken() in ptlrpcd() Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 296a0dd12f4e5c7f0615d8cd00a983ecf0b59ab3
            neilb Neil Brown added a comment -

            It turns out that this is really easy to fix.  Since Linux 3.19 we have had a "wait_woken" function.

            This effective implements that "change every place that calls wake_up in ->set_wait to also set a flag." option that I have above.

            The "wake_up" is made to call woken_wake_function which sets WQ_FLAG_WOKEN, and wait_woken() waits for that flag to be set, then clears it.

            The only difficult part of adding that functionality for older kernels (do we still support kernels before 3.19??).  Apart from that, the patch is "1 file changed, 19 insertions, 8 deletions"

             

            neilb Neil Brown added a comment - It turns out that this is really easy to fix.  Since Linux 3.19 we have had a "wait_woken" function. This effective implements that "change every place that calls wake_up in ->set_wait to also set a flag." option that I have above. The "wake_up" is made to call woken_wake_function which sets WQ_FLAG_WOKEN, and wait_woken() waits for that flag to be set, then clears it. The only difficult part of adding that functionality for older kernels (do we still support kernels before 3.19??).  Apart from that, the patch is "1 file changed, 19 insertions , 8 deletions "  

            Thanks for weighing in, Neil.

            I think either is fine, my glance through the code suggested that the 'condition' being checked is kind of complicated.  Perhaps there is some simplification I missed, but it seemed like there were various ways/reasons it would decide it had nothing to do.

            pfarrell Patrick Farrell (Inactive) added a comment - Thanks for weighing in, Neil. I think either is fine, my glance through the code suggested that the 'condition' being checked is kind of complicated.  Perhaps there is some simplification I missed, but it seemed like there were various ways/reasons it would decide it had nothing to do.
            neilb Neil Brown added a comment -

            That code is rather convoluted...

            The possible problem here is that the wait loop could spin rather than waiting.  If the 'condition' ever comes close to sleeping, it will reset the task state to RUNNING and the subsequent schedule() call will do nothing, causing l_wait_event() to loop around and call the condition() function again.

            If the condition() does this every time through, then the wait loop becomes a busy-wait.  If it only does it occasionally, then there is no real problem.

            I can see two approaches to "fixing" it.

            One is to change every place that calls wake_up in ->set_wait to also set a flag.  Then ptlrpcd() waits for that flag to be set, then calls ptlrpcd_check().

            The other is to factor out the checks in ptlrpcd_check() from the work.  Then do the work before calling l_wait_event(), then just do the checks inside l_wait_event().

            I couldn't suggest which is better without first following through all the code and working out that it is really doing.

             

            neilb Neil Brown added a comment - That code is rather convoluted... The possible problem here is that the wait loop could spin rather than waiting.  If the 'condition' ever comes close to sleeping, it will reset the task state to RUNNING and the subsequent schedule() call will do nothing, causing l_wait_event() to loop around and call the condition() function again. If the condition() does this every time through, then the wait loop becomes a busy-wait.  If it only does it occasionally, then there is no real problem. I can see two approaches to "fixing" it. One is to change every place that calls wake_up in ->set_wait to also set a flag.  Then ptlrpcd() waits for that flag to be set, then calls ptlrpcd_check(). The other is to factor out the checks in ptlrpcd_check() from the work.  Then do the work before calling l_wait_event(), then just do the checks inside l_wait_event(). I couldn't suggest which is better without first following through all the code and working out that it is really doing.  

            So the problem is we're calling ptlrpcd_check() as the condition, and it sends RPCs, etc, rather than just checking a condition.

            BTW, in case anyone else was going to check, Neil didn't fix this for us already (: ) ):
            http://lists.lustre.org/pipermail/lustre-devel-lustre.org/2018-February/006500.html

            pfarrell Patrick Farrell (Inactive) added a comment - So the problem is we're calling ptlrpcd_check() as the condition, and it sends RPCs, etc, rather than just checking a condition. BTW, in case anyone else was going to check, Neil didn't fix this for us already (: ) ): http://lists.lustre.org/pipermail/lustre-devel-lustre.org/2018-February/006500.html

            Yes, that's the main thing, and that's a shame that it's not in RHEL 7.

            Oh, this is interesting - Actually, wait_event() has the same behavior.  It calls condition() after setting task_state to whatever has been requested.  So, actually, I think the problem might be with the nature of our condition().  So in fact we'd still have this problem even with Neil's patch.  My bad.

            In fact, now that I look at it, our condition here is pretty weird.  It's using condition() to do the main work of the loop.

            Having sampled about 20-30 of the uses of l_wait_event in Lustre, this is the only one I can find where condition() is anything more than trivial.  Most of them are a single read of some variable, others are trivial state checking functions.

            So I'd say we're pretty clearly misusing condition() here, and we need to move the work of this function out of the condition().

            pfarrell Patrick Farrell (Inactive) added a comment - Yes, that's the main thing, and that's a shame that it's not in RHEL 7. Oh, this is interesting - Actually, wait_event() has the same behavior.  It calls condition() after setting task_state to whatever has been requested.  So, actually, I think the problem might be with the nature of our condition().  So in fact we'd still have this problem even with Neil's patch.  My bad. In fact, now that I look at it, our condition here is pretty weird.  It's using condition() to do the main work of the loop. Having sampled about 20-30 of the uses of l_wait_event in Lustre, this is the only one I can find where condition() is anything more than trivial.  Most of them are a single read of some variable, others are trivial state checking functions. So I'd say we're pretty clearly misusing condition() here, and we need to move the work of this function out of the condition().

            I think it's too early for us to be able to give up RHEL7... One option would be to map wait_event_idle_*() to an equivalent l_wait_event() implementation so that we can migrate the code over to kernel style without dropping support. I suspect that would likely need several patches to change from setting the "style" of wait in LWI to being arguments to differently-named functions.

            adilger Andreas Dilger added a comment - I think it's too early for us to be able to give up RHEL7... One option would be to map wait_event_idle_*() to an equivalent l_wait_event() implementation so that we can migrate the code over to kernel style without dropping support. I suspect that would likely need several patches to change from setting the "style" of wait in LWI to being arguments to differently-named functions.

            To my understanding the key to using wait_event_idle*() that landed upstream in 4.17 is that it used the TASK_IDLE state. That was introduced with kernel commit 

            80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf

             

            Looking at the RHEL7 kernel code I don't see that handling  It was introduce in kernel version 4.1-rc4. So pretty much its supported every where but RHEL7. 

            simmonsja James A Simmons added a comment - To my understanding the key to using wait_event_idle*() that landed upstream in 4.17 is that it used the TASK_IDLE state. That was introduced with kernel commit  80ed87c8a9ca0cad7ca66cf3bbdfb17559a66dcf   Looking at the RHEL7 kernel code I don't see that handling  It was introduce in kernel version 4.1-rc4. So pretty much its supported every where but RHEL7. 

            People

              simmonsja James A Simmons
              ruth.klundt@gmail.com Ruth Klundt (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: