High kworker CPU usage (osc_grant_work_handler) on IDLE connections

    Description

      We discovered that on our systems with Lustre mounted, a kworker thread uses a significant amount of CPU.
      On an idle system, perf top shows:

        39.44%  [kernel]                  [k] osc_should_shrink_grant
        12.14%  [kernel]                  [k] osc_grant_work_handler
         2.81%  [kernel]                  [k] process_one_work
         2.64%  [kernel]                  [k] __queue_work
         2.56%  [kernel]                  [k] read_tsc
      

      We currently have grant_shrink=0 on this system.

      Simply running du -hs /fs makes the problem go away for some time.
      Unmounting the filesystem also makes the problem go away.
      This is a CentOS 7.6 system with Lustre 2.12.0.

          Activity

            [LU-12651] High kworker CPU usage (osc_grant_work_handler) on IDLE connections

            Jacek Tomaka (Inactive) added a comment:

            Hi Alexander,
            Thanks for looking into it. Would you be so kind as to provide a patch for 2.12.3 as well?
            Regards,
            Jacek Tomaka
            Alexander Zarochentsev added a comment (edited):

            My experiments with 2.12-based Lustre and grant_shrink=0:

            Without the fix, a kworker thread starts to consume 100% of a CPU 20 minutes after Lustre mount time (the default grant shrinking interval):

            top - 00:03:08 up 2 days, 11:32,  3 users,  load average: 2.95, 2.47, 2.22
            Tasks: 258 total,   3 running, 255 sleeping,   0 stopped,   0 zombie
            %Cpu(s):  0.0 us, 25.0 sy,  0.0 ni, 75.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
            KiB Mem :  2914024 total,  1138684 free,   544988 used,  1230352 buff/cache
            KiB Swap:  2113532 total,  2113532 free,        0 used.  2190536 avail Mem 
            
              PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                        
            21631 root      20   0       0      0      0 R 100.0  0.0   3:03.08 kworker/3:2                                                                                    
                1 root      20   0  191032   3912   2584 S   0.0  0.1   0:06.70 systemd                                                                                        
                2 root      20   0       0      0      0 S   0.0  0.0   0:00.06 kthreadd                                                                                       
                3 root      20   0       0      0      0 S   0.0  0.0   0:01.06 ksoftirqd/0                                                                                    
                5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                                                                   
            

            With the fix, 22 minutes after start, the system is idle:

            top - 00:32:05 up 2 days, 12:01,  3 users,  load average: 2.00, 2.01, 2.06
            Tasks: 261 total,   2 running, 259 sleeping,   0 stopped,   0 zombie
            %Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
            KiB Mem :  2914024 total,  1133004 free,   549940 used,  1231080 buff/cache
            KiB Swap:  2113532 total,  2113532 free,        0 used.  2185136 avail Mem 
            
              PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                        
              367 root      20   0  162180   2456   1584 R   0.3  0.1   0:00.03 top                                                                                            
                1 root      20   0  191032   3912   2584 S   0.0  0.1   0:06.85 systemd                                                                                        
                2 root      20   0       0      0      0 S   0.0  0.0   0:00.07 kthreadd                                                                                       
                3 root      20   0       0      0      0 S   0.0  0.0   0:01.10 ksoftirqd/0                                                                                    
                5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                                                                   
                7 root      rt   0       0      0      0 S   0.0  0.0   0:00.61 migration/0                                                                                    
                8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh                                                                                         
            
            Alexander Zarochentsev added a comment:

            Jacek Tomaka, can you try https://review.whamcloud.com/37429?

            Gerrit Updater added a comment:

            Alexander Zarochentsev (c17826@cray.com) uploaded a new patch: https://review.whamcloud.com/37429
            Subject: LU-12651 osc: always call update_next_shrink
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2986155c51914c5a63f6c351908c9a49dbe5042f
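
The symptom reported in this ticket (a kworker spinning once the 20-minute interval has elapsed with grant_shrink=0) and the patch subject ("always call update_next_shrink") are consistent with a periodic handler whose deadline is never refreshed. The following user-space sketch models that pattern only; it is not the Lustre source. The struct, the function names, the `always_update` switch, and the assumption that a stale deadline leads to an immediate requeue are all hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 20-minute default interval (seconds). */
#define MODEL_SHRINK_INTERVAL 1200

/* Hypothetical per-client state; NOT the real Lustre client structure. */
struct model_client {
	int64_t next_shrink;  /* absolute time (s) of the next shrink check */
	bool shrink_enabled;  /* models grant_shrink=1 (true) or =0 (false) */
};

/*
 * One pass of the modelled periodic handler at time `now`.
 * Returns the delay in seconds before the next pass; 0 means the
 * handler is requeued immediately.
 * always_update == false: the deadline is refreshed only when shrinking
 *                         is enabled (the assumed buggy behaviour).
 * always_update == true:  the deadline is refreshed unconditionally.
 */
static int64_t model_handler_pass(struct model_client *clients, size_t n,
				  int64_t now, bool always_update)
{
	int64_t earliest = now + MODEL_SHRINK_INTERVAL;

	assert(clients != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct model_client *c = &clients[i];

		/* Deadline reached: decide whether to push it forward. */
		if (now >= c->next_shrink &&
		    (c->shrink_enabled || always_update))
			c->next_shrink = now + MODEL_SHRINK_INTERVAL;

		if (c->next_shrink < earliest)
			earliest = c->next_shrink;
	}

	/* A deadline that is still in the past yields a zero delay. */
	return earliest > now ? earliest - now : 0;
}

/*
 * Simulate one client with shrinking disabled and count handler passes
 * until `window` seconds have elapsed or `max_passes` is reached.
 */
static long model_count_passes(bool always_update, int64_t window,
			       long max_passes)
{
	struct model_client c = {
		.next_shrink = MODEL_SHRINK_INTERVAL,
		.shrink_enabled = false,
	};
	int64_t now = 0;
	long passes = 0;

	while (now < window && passes < max_passes) {
		now += model_handler_pass(&c, 1, now, always_update);
		passes++;
	}
	return passes;
}
```

In this model, `model_count_passes(false, 3600, 1000)` hits the pass limit because simulated time stops advancing once the first deadline expires, while `model_count_passes(true, 3600, 1000)` needs only one pass per interval. Whether the real handler behaves this way should be checked against the patch itself.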

            Jacek Tomaka (Inactive) added a comment:

            Any news on this ticket?
            Jacek Tomaka (Inactive) added a comment (edited):

            This is most likely a regression from LU-8708.

            People

              zam Alexander Zarochentsev
              Tomaka Jacek Tomaka (Inactive)