Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.6.0
    • Lustre 2.4.0
    • 2_3_49_92_1-llnl
    • 3
    • 3127

    Description

      We've been seeing strange caching behavior on our PPC IO nodes, eventually resulting in OOM events. This is particularly harmful for us because there are critical system components running in user space on these nodes, forcing us to run with "panic_on_oom" enabled.

      We see a large number of "Active File" pages, as reported by /proc/vmstat and /proc/meminfo, which spikes during Lustre IOR jobs. For the test I am running that is unusual: since I'm not running any executables out of Lustre, only "inactive" IOR data should be accumulating in the page cache as a result of the Lustre IO. The really strange thing is that, prior to testing the Orion rebased code, "Active File" would sometimes stay low (in the hundreds of megabytes) and sometimes grow very large (around 5 GB). It's hard to tell whether that variation still exists in the rebased code because the OOM events are hitting more frequently, basically every time I run an IOR.
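
      As a rough illustration, a userspace snippet like the one below is enough to watch those two counters between IOR runs; it simply prints the "Active(file)" and "Inactive(file)" lines from /proc/meminfo (the field names used on these RHEL 6 era kernels):

      /* Print the two page-cache LRU counters that spike during the IOR runs. */
      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
          FILE *fp = fopen("/proc/meminfo", "r");
          char line[128];

          if (!fp) {
              perror("fopen /proc/meminfo");
              return 1;
          }

          while (fgets(line, sizeof(line), fp)) {
              if (!strncmp(line, "Active(file):", 13) ||
                  !strncmp(line, "Inactive(file):", 15))
                  fputs(line, stdout);
          }

          fclose(fp);
          return 0;
      }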

      We also see a large number of "Inactive File" pages, which we believe should be limited by the patch we carry from LU-744, but that doesn't seem to be the case:

      commit 98400981e6d6e5707233be2c090e4227a77e2c46
      Author: Jinshan Xiong <jinshan.xiong@whamcloud.com>
      Date:   Tue May 15 20:11:37 2012 -0700
      
          LU-744 osc: add lru pages management - new RPC 
          
          Add a cache management at osc layer, this way we can control how much
          memory can be used to cache lustre pages and avoid complex solution
          as what we did in b1_8.
          
          In this patch, admins can set how much memory will be used for caching
          lustre pages per file system. A self-adapative algorithm is used to
          balance those budget among OSCs.
          
          Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
          Change-Id: I76c840aef5ca9a3a4619f06fcaee7de7f95b05f5
          Revision-Id: 21
      
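      The commit message describes balancing a per-filesystem budget among the OSCs. Purely as an illustration of that idea (the function name and the proportional split below are made up, not the actual LU-744 algorithm), a naive version of such a split could look like:

      /* Hypothetical sketch of a proportional per-OSC cache budget split.
       * The real LU-744 code uses a self-adaptive algorithm in the osc
       * layer; nothing here is taken from it. */
      #include <stdio.h>

      static unsigned long osc_share(unsigned long fs_budget_pages,
                                     unsigned long osc_cached_pages,
                                     unsigned long fs_cached_pages)
      {
          if (fs_cached_pages == 0)
              return fs_budget_pages;     /* nothing cached yet, no split */

          /* Give each OSC a share proportional to its recent usage. */
          return fs_budget_pages * osc_cached_pages / fs_cached_pages;
      }

      int main(void)
      {
          unsigned long budget = 1048576;     /* e.g. 4 GiB of 4 KiB pages */

          printf("OSC share: %lu pages\n", osc_share(budget, 1000, 4000));
          return 0;
      }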

      From what I can tell, Lustre is trying to limit the cache to the value we set, 4 GB. When I dump the Lustre page cache I see roughly 4 GB worth of pages, but the number of pages listed does not reflect the values seen in vmstat and meminfo.
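
      Note that if the dump reports raw page counts, they only line up with the kB figures in vmstat/meminfo after multiplying by the node's page size; a trivial helper for that conversion (hypothetical, just to show the arithmetic):

      /* Convert a raw page count into kB for comparison with /proc/meminfo. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
          unsigned long npages = argc > 1 ? strtoul(argv[1], NULL, 0) : 1048576;
          long psize = sysconf(_SC_PAGESIZE);

          printf("%lu pages * %ld bytes/page = %lu kB\n",
                 npages, psize, npages * (unsigned long)psize / 1024);
          return 0;
      }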

      So I have a few questions I'd like answered:

       1. Why are Lustre pages being marked as "referenced" and moved to the
          active list in the first place? Without any executables running
          out of Lustre I would not expect this to happen (see the sketch
          after this list).

       2. Why are more "Inactive File" pages accumulating on the system past
          the 4 GB limit we are trying to set within Lustre?

       3. Why are these "Inactive File" pages unable to be reclaimed when we
          hit a low-memory situation, ultimately resulting in an out-of-memory
          event and panic_on_oom triggering? This _might_ be related to (1)
          above.
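
      Regarding question (1), the sketch below is a userspace model (not kernel code) of the two-touch promotion that mark_page_accessed() does on these 2.6.32-era kernels: the first access sets PG_referenced, the second moves the page onto the active LRU. Any read path that touches a cached page twice will therefore inflate "Active File" even with no executables involved; whether and where the Lustre read path triggers this is exactly what I'm asking.

      /* Userspace model of mark_page_accessed()'s two-touch activation.
       * First touch: set the referenced flag. Second touch: promote the
       * page from the inactive to the active LRU and clear the flag. */
      #include <stdbool.h>
      #include <stdio.h>

      struct model_page {
          bool referenced;      /* stands in for PG_referenced */
          bool active;          /* on the active LRU? */
      };

      static void mark_accessed(struct model_page *p)
      {
          if (!p->active && p->referenced) {
              p->active = true;         /* activate_page() */
              p->referenced = false;
          } else if (!p->referenced) {
              p->referenced = true;     /* SetPageReferenced() */
          }
      }

      int main(void)
      {
          struct model_page p = { false, false };

          mark_accessed(&p);    /* 1st touch: referenced only */
          mark_accessed(&p);    /* 2nd touch: promoted to active */
          printf("active=%d referenced=%d\n", p.active, p.referenced);
          return 0;
      }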

      I added a systemtap script to disable the panic_on_oom flag and dump the Lustre page cache, /proc/vmstat, and /proc/meminfo to try to gain some insight into the problem. I'll upload those files as attachments in case they prove useful.

      Attachments

        Issue Links

          Activity

            [LU-2139] Tracking unstable pages
            yujian Jian Yu added a comment -

            Here are the back-ported patches for the Lustre b2_5 branch:
            http://review.whamcloud.com/12604 (from http://review.whamcloud.com/6284)
            http://review.whamcloud.com/12605 (from http://review.whamcloud.com/4374)
            http://review.whamcloud.com/12606 (from http://review.whamcloud.com/4375)
            http://review.whamcloud.com/12612 (from http://review.whamcloud.com/5935)
            pjones Peter Jones added a comment -

            All patches now landed to master

            morrone Christopher Morrone (Inactive) added a comment - Remaining: http://review.whamcloud.com/5935
            pjones Peter Jones added a comment -

            Prakash

            Thanks for the refresh. Yes, we will review these patches shortly

            Peter


            prakash Prakash Surya (Inactive) added a comment -

            I've just refreshed these three patches onto HEAD of master:

            1) http://review.whamcloud.com/6284
            2) http://review.whamcloud.com/4374
            3) http://review.whamcloud.com/4375

            It would be nice to get some feedback on them; we've been running with previous versions of these patches on Sequoia/Grove for nearly a year now.

            pjones Peter Jones added a comment -

            This remains a support priority to get these patches refreshed and landed, but the patches are not presently ready for consideration for inclusion in 2.5.0.

            pjones Peter Jones added a comment -

            Lai

            Could you please take care of refreshing Jinshan's patches so they are suitable for landing to master?

            Thanks

            Peter

            green Oleg Drokin added a comment - edited

            Hit an assertion introduced by the first patch in the series:
            "LustreError: 17404:0:(osc_cache.c:1774:osc_dec_unstable_pages()) ASSERTION( atomic_read(&cli->cl_cache->ccc_unstable_nr) >= 0 ) failed"

            See LU-3274 for more details.

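            For context, an assertion of that form catches the classic accounting failure for unstable pages: a decrement in osc_dec_unstable_pages() without a matching increment drives the counter negative. The snippet below is only a userspace model of that failure pattern (hypothetical names, not the actual LU-3274 root cause):

            /* Model of an unstable-page counter going negative when a page
             * is "committed" without ever having been counted as unstable.
             * The check mirrors the ccc_unstable_nr >= 0 assertion. */
            #include <assert.h>
            #include <stdio.h>

            static int unstable_nr;                 /* stands in for ccc_unstable_nr */

            static void page_becomes_unstable(void) { unstable_nr++; }

            static void page_committed(void)
            {
                unstable_nr--;
                assert(unstable_nr >= 0);           /* the LBUG-style check */
            }

            int main(void)
            {
                page_becomes_unstable();
                page_committed();                   /* balanced: fine */
                page_committed();                   /* unbalanced: assertion fires */
                printf("not reached\n");
                return 0;
            }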

            prakash Prakash Surya (Inactive) added a comment -

            efocht Please make sure you are using patch-set 30 of http://review.whamcloud.com/4245. The earlier patch-sets had deficiencies in them, and patch-set 29 specifically had a bug in it causing umounts to hang in ll_put_super (which looks like the problem you are having).

            efocht Erich Focht added a comment -

            Applied the patch stack (4245, 4374, 4375) to a 2.6.32-358.0.1.el6.x86_64 kernel. Getting soft lockups when unmounting. The servers run Lustre 2.1.5.

            Pid: 3792, comm: umount Not tainted 2.6.32-358.0.1.el6.x86_64 #1 Supermicro X9DRT/X9DRT
            RIP: 0010:[<ffffffffa0b65e2c>]  [<ffffffffa0b65e2c>] ll_put_super+0x10c/0x510 [lustre]
            RSP: 0018:ffff881016853d28  EFLAGS: 00000246
            RAX: 0000000000000000 RBX: ffff881016853e28 RCX: ffff881027b6c110
            RDX: ffff88102d7d1540 RSI: 000000000000005a RDI: ffff88104a73fc00
            RBP: ffffffff8100bb8e R08: 0000000000000000 R09: ffff881049e980c0
            R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
            R13: ffff881016853d98 R14: ffff88106ebfba00 R15: ffff881027afa044
            FS:  00002ab82d4c7740(0000) GS:ffff88089c520000(0000) knlGS:0000000000000000
            CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
            CR2: 00002ab82d167360 CR3: 000000102d0ca000 CR4: 00000000000407e0
            DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            Process umount (pid: 3792, threadinfo ffff881016852000, task ffff88102d7d1540)
            Stack:
             ffffffff81fcb700 01ff88086e17b148 ffff88102c932400 ffff88102c932000
            <d> ffff88102d7d1540 ffff88106ebfba00 ffff88086e17b350 ffff88086e17b138
            <d> ffff881016853d88 ffffffff8119cccf ffff88086e17b138 ffff88086e17b138
            Call Trace:
             [<ffffffff8119cccf>] ? destroy_inode+0x2f/0x60
             [<ffffffff8119d19c>] ? dispose_list+0xfc/0x120
             [<ffffffff8119d596>] ? invalidate_inodes+0xf6/0x190
             [<ffffffff8118334b>] ? generic_shutdown_super+0x5b/0xe0
             [<ffffffff81183436>] ? kill_anon_super+0x16/0x60
             [<ffffffffa06e82ea>] ? lustre_kill_super+0x4a/0x60 [obdclass]
             [<ffffffff81183bd7>] ? deactivate_super+0x57/0x80
             [<ffffffff811a1c4f>] ? mntput_no_expire+0xbf/0x110
             [<ffffffff811a26bb>] ? sys_umount+0x7b/0x3a0
             [<ffffffff810863b1>] ? sigprocmask+0x71/0x110
             [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
            Code: c0 4c 89 a5 10 ff ff ff 4c 8d b5 40 ff ff ff 48 89 95 20 ff ff ff 49 89 cd 49 89 d4 0f 85 ad 00 00 00 45 85 ff 0f 85 a4 00 00 00 <8b> 83 0c 01 00 00 85 c0 74 f6 4c 89 f7 e8 72 f3 a4 ff 4c 89 f6
            Call Trace:
             [<ffffffffa0b65dc7>] ? ll_put_super+0xa7/0x510 [lustre]
             [<ffffffff8119cccf>] ? destroy_inode+0x2f/0x60
             [<ffffffff8119d19c>] ? dispose_list+0xfc/0x120
             [<ffffffff8119d596>] ? invalidate_inodes+0xf6/0x190
             [<ffffffff8118334b>] ? generic_shutdown_super+0x5b/0xe0
             [<ffffffff81183436>] ? kill_anon_super+0x16/0x60
             [<ffffffffa06e82ea>] ? lustre_kill_super+0x4a/0x60 [obdclass]
             [<ffffffff81183bd7>] ? deactivate_super+0x57/0x80
             [<ffffffff811a1c4f>] ? mntput_no_expire+0xbf/0x110
             [<ffffffff811a26bb>] ? sys_umount+0x7b/0x3a0
             [<ffffffff810863b1>] ? sigprocmask+0x71/0x110
             [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
            BUG: soft lockup - CPU#25 stuck for 67s! [umount:3792]
            

            People

              Assignee: laisiyao Lai Siyao
              Reporter: prakash Prakash Surya (Inactive)
              Votes: 0
              Watchers: 19
