[LU-9458] LustreError: 12764:0:(sec_bulk.c:188:enc_pools_release_free_pages()) ASSERTION( npages <= page_pools.epp_free_pages ) failed:

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • 3

    Description

      On mount:

      [248586.498989] Lustre: Lustre: Build Version: 2.9.56_82_g0eff453
      [248586.559319] LNet: Added LNI 192.168.122.121@tcp [8/256/0/180]
      [248586.559412] LNet: Accept secure, port 988
      [248586.741706] Lustre: Echo OBD driver; http://www.lustre.org/
      [248587.766363] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      [248588.072575] LDISKFS-fs (loop0): file extents enabled, maximum tree depth=5
      [248588.074580] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      [248588.446477] LDISKFS-fs (loop0): file extents enabled, maximum tree depth=5
      [248588.448298] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      [248589.577659] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      [248589.597799] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      [248589.620201] Lustre: MGS: Connection restored to 79c760f4-6c2d-891b-1d76-5801427fe973 (at 0@lo)
      [248589.641150] LustreError: 12764:0:(sec_bulk.c:188:enc_pools_release_free_pages()) ASSERTION( npages <= page_pools.epp_free_pages ) failed:
      [248589.654676] LustreError: 12764:0:(sec_bulk.c:188:enc_pools_release_free_pages()) LBUG
      [248589.663682] Pid: 12764, comm: mount.lustre
      [248589.663684]
                      Call Trace:
      [248589.663709]  [<ffffffffa05a47ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [248589.663716]  [<ffffffffa05a487c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [248589.663780]  [<ffffffffa092a9b7>] enc_pools_shrink+0x5e7/0x680 [ptlrpc]
      [248589.663787]  [<ffffffff811942b3>] shrink_slab+0x163/0x330
      [248589.663791]  [<ffffffff810c1b25>] ? check_preempt_curr+0x85/0xa0
      [248589.663794]  [<ffffffff810c1b59>] ? ttwu_do_wakeup+0x19/0xd0
      [248589.663798]  [<ffffffff811975b2>] do_try_to_free_pages+0x3c2/0x4e0
      [248589.663801]  [<ffffffff811977cc>] try_to_free_pages+0xfc/0x180
      [248589.663807]  [<ffffffff81682074>] __alloc_pages_slowpath+0x458/0x725
      [248589.663810]  [<ffffffff8118b155>] __alloc_pages_nodemask+0x405/0x420
      [248589.663814]  [<ffffffff81683262>] kmalloc_large_node+0x60/0x8d
      [248589.663819]  [<ffffffff811dd5e7>] __kmalloc_node+0x247/0x2b0
      [248589.663856]  [<ffffffffa06b0599>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [248589.663897]  [<ffffffffa0902c15>] ptlrpc_alloc_rqbd+0xd5/0x580 [ptlrpc]
      [248589.663936]  [<ffffffffa090319d>] ptlrpc_grow_req_bufs+0xdd/0x280 [ptlrpc]
      [248589.663979]  [<ffffffffa0903654>] ptlrpc_service_part_init+0x314/0x680 [ptlrpc]
      [248589.664037]  [<ffffffffa0908907>] ptlrpc_register_service+0x337/0xe60 [ptlrpc]
      [248589.664063]  [<ffffffffa0f221dc>] mds_start_ptlrpc_service+0x41c/0xbb0 [mdt]
      [248589.664077]  [<ffffffffa0f22a84>] mds_device_alloc+0x114/0x290 [mdt]
      [248589.664106]  [<ffffffffa06b9944>] obd_setup+0x114/0x2a0 [obdclass]
      [248589.664130]  [<ffffffffa06bcdc4>] class_setup+0x2f4/0x8d0 [obdclass]
      [248589.664153]  [<ffffffffa06c0f92>] class_process_config+0x1d12/0x2b80 [obdclass]
      [248589.664176]  [<ffffffffa06b0599>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [248589.664199]  [<ffffffffa06ca1d9>] do_lcfg+0x159/0x5d0 [obdclass]
      [248589.664221]  [<ffffffffa06caf98>] lustre_start_simple+0x88/0x210 [obdclass]
      [248589.664250]  [<ffffffffa06f3542>] server_start_targets+0xac2/0x2a70 [obdclass]
      [248589.664291]  [<ffffffffa090def8>] ? ptlrpc_pinger_wake_up+0x28/0x30 [ptlrpc]
      [248589.664330]  [<ffffffffa090e0a7>] ? ptlrpc_pinger_add_import+0x1a7/0x1e0 [ptlrpc]
      [248589.664353]  [<ffffffffa06b06c1>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
      [248589.664375]  [<ffffffffa06cb32d>] ? lustre_start_mgc+0x20d/0x2780 [obdclass]
      [248589.664399]  [<ffffffffa06f657d>] server_fill_super+0x108d/0x184a [obdclass]
      [248589.664421]  [<ffffffffa06ce348>] lustre_fill_super+0x328/0x950 [obdclass]
      [248589.664442]  [<ffffffffa06ce020>] ? lustre_fill_super+0x0/0x950 [obdclass]
      [248589.664447]  [<ffffffff81201b1d>] mount_nodev+0x4d/0xb0
      [248589.664488]  [<ffffffffa06c5af8>] lustre_mount+0x38/0x60 [obdclass]
      [248589.664492]  [<ffffffff812024c9>] mount_fs+0x39/0x1b0
      [248589.664497]  [<ffffffff8121e25f>] vfs_kern_mount+0x5f/0xf0
      [248589.664500]  [<ffffffff812207be>] do_mount+0x24e/0xaa0
      [248589.664504]  [<ffffffff81185a7e>] ? __get_free_pages+0xe/0x50
      [248589.664507]  [<ffffffff812210a6>] SyS_mount+0x96/0xf0
      [248589.664512]  [<ffffffff81696d49>] system_call_fastpath+0x16/0x1b
      [248589.664514]
      [248589.664516] Kernel panic - not syncing: LBUG
      

      I am suspicious of the following change from LU-3308:

      @@ -242,7 +242,7 @@ static unsigned long enc_pools_shrink_count(struct shrinker *s,
              }
       
              LASSERT(page_pools.epp_idle_idx <= IDLE_IDX_MAX);
      -       return max((int)page_pools.epp_free_pages - PTLRPC_MAX_BRW_PAGES, 0) *
      +       return max(page_pools.epp_free_pages - PTLRPC_MAX_BRW_PAGES, 0UL) *
                      (IDLE_IDX_MAX - page_pools.epp_idle_idx) / IDLE_IDX_MAX;
       }
      

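      page_pools.epp_free_pages - PTLRPC_MAX_BRW_PAGES has type unsigned long, and for any unsigned long val, max(val, 0UL) is always equal to val. So when epp_free_pages is less than PTLRPC_MAX_BRW_PAGES, the subtraction wraps around to a very large positive value instead of being clamped to 0.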
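
To make the wraparound concrete, here is a minimal standalone sketch. It is not Lustre code: `DEMO_MAX_BRW_PAGES`, `max_ul()`, `excess_unsigned()` and `excess_signed()` are hypothetical names invented for illustration, and 256 is an arbitrary stand-in for PTLRPC_MAX_BRW_PAGES. It only compares the unsigned subtraction from the diff with a signed subtraction that is clamped at zero.

```c
/* Hypothetical stand-in for PTLRPC_MAX_BRW_PAGES; the value is arbitrary. */
#define DEMO_MAX_BRW_PAGES 256UL

/* What max(x, 0UL) evaluates to when both operands are unsigned long. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Unsigned form, as in the "+" line of the diff: the subtraction is done
 * modulo 2^N, so the result is never below zero and the max() is a no-op.
 */
static unsigned long excess_unsigned(unsigned long free_pages)
{
	return max_ul(free_pages - DEMO_MAX_BRW_PAGES, 0UL);
}

/*
 * Signed form: the difference can go negative, so clamping at zero
 * actually takes effect. Assumes the values fit in a long.
 */
static unsigned long excess_signed(unsigned long free_pages)
{
	long diff = (long)free_pages - (long)DEMO_MAX_BRW_PAGES;

	return diff > 0 ? (unsigned long)diff : 0UL;
}
```

With 100 free pages, `excess_signed(100UL)` is 0 while `excess_unsigned(100UL)` is a value far larger than the number of free pages, which is consistent with the shrinker later asking to release more pages than the pool holds.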

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.10

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27016/
            Subject: LU-9458 ptlrpc: handle case of epp_free_pages <= PTLRPC_MAX_BRW_PAGES
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2da01a984786ef0bc2700530a6f29ea9063362e9

            bogl Bob Glossman (Inactive) added a comment - edited

            I have seen and reproduced this failure on both el7.3 and sles12sp2.
            I have also confirmed that the proposed fix makes the issue go away in both environments.

            gerrit Gerrit Updater added a comment -

            Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/27016
            Subject: LU-9458 ptlrpc: handle case of epp_free_pages <= PTLRPC_MAX_BRW_PAGES
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6a81c23cd7bc70ae2c8c292c98e6a6b13dc929a2

            jhammond John Hammond added a comment -

            And in enc_pools_shrink_scan() we should handle the case that page_pools.epp_free_pages is less than PTLRPC_MAX_BRW_PAGES:

                    sc->nr_to_scan = min_t(unsigned long, sc->nr_to_scan,
                                          page_pools.epp_free_pages - PTLRPC_MAX_BRW_PAGES);
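
One way to handle that case, sketched here as a standalone helper and not as the patch that landed, is to test for epp_free_pages <= PTLRPC_MAX_BRW_PAGES before subtracting. `DEMO_MAX_BRW_PAGES`, `min_ul()` and `clamp_nr_to_scan()` are hypothetical names used only for this illustration, and 256 is an arbitrary stand-in value.

```c
/* Hypothetical stand-in for PTLRPC_MAX_BRW_PAGES; the value is arbitrary. */
#define DEMO_MAX_BRW_PAGES 256UL

/* Unsigned minimum, playing the role of min_t(unsigned long, a, b). */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Limit a scan request so that DEMO_MAX_BRW_PAGES pages stay in the pool.
 * The comparison runs before the subtraction, so the unsigned difference
 * is only computed when it cannot wrap around.
 */
static unsigned long clamp_nr_to_scan(unsigned long nr_to_scan,
				      unsigned long free_pages)
{
	if (free_pages <= DEMO_MAX_BRW_PAGES)
		return 0UL;

	return min_ul(nr_to_scan, free_pages - DEMO_MAX_BRW_PAGES);
}
```

With this guard, a request to scan 128 pages while only 100 are free yields 0, and the same request with 300 free pages yields 44, so the result never exceeds the number of pages above the reserve.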
            

            People

              bogl Bob Glossman (Inactive)
              jhammond John Hammond