  Lustre / LU-8435

LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Environment: Bull Lustre distribution based on Lustre 2.7.2
    • Severity: 3

    Description

      In the last month, one of our customers has hit a crash more than 100 times with the following signature:

      [506626.555125] SLUB: Unable to allocate memory on node -1 (gfp=0x80c0)
      [506626.562216]   cache: kvm_mmu_page_header(22:step_batch), object size: 168, buffer size: 168, default order: 1, min order: 0
      [506626.574729]   node 0: slabs: 0, objs: 0, free: 0
      [506626.579974]   node 1: slabs: 0, objs: 0, free: 0
      [506626.585219]   node 2: slabs: 60, objs: 2880, free: 0
      [506626.590852]   node 3: slabs: 0, objs: 0, free: 0
      [506626.596112] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) failed: cp_state:0, cmd:1
      [506626.612512] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) LBUG
      [506626.620186] Pid: 41604, comm: cat
      [506626.623978]
                      Call Trace:
      [506626.628573]  [<ffffffffa05eb853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [506626.636448]  [<ffffffffa05ebdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
      [506626.643456]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.651526]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.659108]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.666037]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.673531]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.680450]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.687961]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.694964]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.702078]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.709674]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.716785]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.723797]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.731477]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.738962]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.745962]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.751700]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.757345]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      [506626.764138]
      [506626.765990] Kernel panic - not syncing: LBUG
      [506626.770850] CPU: 53 PID: 41604 Comm: cat Tainted: G           OE  ------------   3.10.0-327.22.2.el7.x86_64 #1
      [506626.782104] Hardware name: BULL bullx blade/CHPU, BIOS BIOSX07.037.01.003 10/23/2015
      [506626.790838]  ffffffffa0610ced 000000000f6a3070 ffff8817799eb8c0 ffffffff816360f4
      [506626.799228]  ffff8817799eb940 ffffffff8162f96a ffffffff00000008 ffff8817799eb950
      [506626.807618]  ffff8817799eb8f0 000000000f6a3070 ffffffffa0e01466 0000000000000246
      [506626.816005] Call Trace:
      [506626.818839]  [<ffffffff816360f4>] dump_stack+0x19/0x1b
      [506626.824668]  [<ffffffff8162f96a>] panic+0xd8/0x1e7
      [506626.830128]  [<ffffffffa05ebe5b>] lbug_with_loc+0xab/0xc0 [libcfs]
      [506626.837129]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.845192]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.852766]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.859702]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.867184]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.874099]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.881611]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.888609]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.895721]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.903322]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.910418]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.917420]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.925091]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.932575]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.939569]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.945300]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.950938]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      

      Since the customer site is a black site, we can't provide the crashdump, but we will happily provide any text output you would find useful.
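
      For reference, here is a tiny standalone sketch of what the failed assertion expresses, assuming the usual libcfs definition of equi() as logical equivalence; the two flags below illustrate the crashed case (cmd still OBD_BRW_READ while the page is no longer in CPS_PAGEIN) and are not Lustre code:

      /* Sketch only -- assumes libcfs defines equi(p1, p2) as (!!(p1) == !!(p2)). */
      #include <stdio.h>

      #define equi(p1, p2) (!!(p1) == !!(p2))

      int main(void)
      {
              int page_in_pagein = 0;   /* cp_state is no longer CPS_PAGEIN       */
              int cmd_is_read    = 1;   /* cmd:1 in the log, i.e. OBD_BRW_READ    */

              /* equi() requires both conditions to be true together or false
               * together; here only one holds, which is exactly the situation
               * in which osc_completion() hits the LBUG. */
              printf("assertion would %s\n",
                     equi(page_in_pagein, cmd_is_read) ? "hold" : "fail (LBUG)");
              return 0;
      }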

      Attachments

        1. crash_output.txt
          24 kB
        2. foreach_bt_merge.txt
          152 kB
        3. struct_analyze1.txt
          50 kB

          Activity

            bfaccini Bruno Faccini (Inactive) added a comment:
            Hello Sebastien,
            The patch has been back-ported and has landed on the b2_7_fe branch. Do you agree to close?

            bevans Ben Evans (Inactive) added a comment:
            I'm running a test on this now. Seems good, no crashes ... yet.

            adegremont Aurelien Degremont (Inactive) added a comment:
            My bad, I missed your last comment. That's the problem with working too often with 2.7.2 only: you sometimes miss what was added in master.

            I've pushed a reproducer for this in sanity. Do whatever you want with it.

            gerrit Gerrit Updater added a comment:
            Aurelien Degremont (aurelien.degremont@cea.fr) uploaded a new patch: http://review.whamcloud.com/21745
            Subject: LU-8435 tests: slab alloc error does not LBUG
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0e8a912a7dfc65d14d1f252efea043637620f518

            bfaccini Bruno Faccini (Inactive) added a comment:
            Well, in fact, I just found that this has already been fixed in master by Andreas' fix for LU-6215 (http://review.whamcloud.com/13956, commit eaba382df06cd6f570b16c5f759f612b057e5324). Thus I will simply back-port it to b2_7!

            bfaccini Bruno Faccini (Inactive) added a comment:
            And using list_for_each_entry_safe() + list_del() looks like a better option.
            This will be my first patch attempt to fix this; I will then run it against your reproducer to see how it works.

            adegremont Aurelien Degremont (Inactive) added a comment:
            I agree with this analysis. cp_state is CPS_PAGEIN in almost all our crashdumps, except for 1 or 2, but this could mean that the page was freed between the 2 iterations, IMO.

            list_for_each_entry() + list_del_init() seems like a really bad idea.

            bfaccini Bruno Faccini (Inactive) added a comment:
            Well, having a better look at the code involved in the error path, my guess is that the problem may be caused by the following piece:

            2621 int osc_queue_sync_pages(const struct lu_env *env, struct osc_object *obj,
            2622                          struct list_head *list, int cmd, int brw_flags)
            2623 {
            2624         struct client_obd     *cli = osc_cli(obj);
            2625         struct osc_extent     *ext;
            2626         struct osc_async_page *oap;
            2627         int     page_count = 0;
            2628         int     mppr       = cli->cl_max_pages_per_rpc;
            2629         pgoff_t start      = CL_PAGE_EOF;
            2630         pgoff_t end        = 0;
            2631         ENTRY;
            2632 
            2633         list_for_each_entry(oap, list, oap_pending_item) {
            2634                 pgoff_t index = osc_index(oap2osc(oap));
            2635                 if (index > end)
            2636                         end = index;
            2637                 if (index < start)
            2638                         start = index;
            2639                 ++page_count;
            2640                 mppr <<= (page_count > mppr);
            2641         }
            2642 
            2643         ext = osc_extent_alloc(obj);   <<<<<<<<< Memory allocation failure can occur here
            2644         if (ext == NULL) {
            2645                 list_for_each_entry(oap, list, oap_pending_item) {
            2646                         list_del_init(&oap->oap_pending_item);     <<<<<<<<<<<<
            2647                         osc_ap_completion(env, cli, oap, 0, -ENOMEM);
            2648                 }
            2649                 RETURN(-ENOMEM);
            2650         }
            

            The list_del_init() causes the current oap's next/prev oap_pending_item list pointers to point to itself, so it is selected again by list_for_each_entry() and passed to a 2nd call to osc_ap_completion() even though it has already been processed by the 1st call. In particular, its associated cl_page->cp_state has already been set to CPS_CACHED in osc_ap_completion()->osc_completion()->cl_page_completion(), hence the assert, the state no longer being CPS_PAGEIN.

            I am working on a patch to fix this and to be pushed soon.
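
            For illustration, here is a self-contained userspace sketch (not the actual patch): it re-implements just enough of the kernel list helpers to show why the list_for_each_entry() + list_del_init() combination above keeps re-selecting the same entry, whereas list_for_each_entry_safe() drains the list with one visit per page. struct toy_oap and the helper fill() are hypothetical stand-ins, not Lustre code.

            /* Userspace sketch of the iteration bug in the error path of
             * osc_queue_sync_pages(); NOT the actual Lustre patch. */
            #include <stdio.h>
            #include <stddef.h>

            struct list_head { struct list_head *next, *prev; };

            static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

            static void list_add_tail(struct list_head *e, struct list_head *head)
            {
                e->prev = head->prev;
                e->next = head;
                head->prev->next = e;
                head->prev = e;
            }

            static void list_del_init(struct list_head *e)
            {
                e->prev->next = e->next;
                e->next->prev = e->prev;
                e->next = e->prev = e;   /* entry now points to itself, as in the kernel */
            }

            #define list_entry(ptr, type, member) \
                ((type *)((char *)(ptr) - offsetof(type, member)))

            /* Unsafe: re-reads pos->member.next after the body may have re-linked pos. */
            #define list_for_each_entry(pos, head, member)                            \
                for (pos = list_entry((head)->next, __typeof__(*pos), member);        \
                     &pos->member != (head);                                          \
                     pos = list_entry(pos->member.next, __typeof__(*pos), member))

            /* Safe: caches the next element before the body runs. */
            #define list_for_each_entry_safe(pos, n, head, member)                    \
                for (pos = list_entry((head)->next, __typeof__(*pos), member),        \
                     n = list_entry(pos->member.next, __typeof__(*pos), member);      \
                     &pos->member != (head);                                          \
                     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

            /* Hypothetical stand-in for struct osc_async_page. */
            struct toy_oap { int index; struct list_head oap_pending_item; };

            static void fill(struct list_head *head, struct toy_oap *pages, int nr)
            {
                INIT_LIST_HEAD(head);
                for (int i = 0; i < nr; i++) {
                    pages[i].index = i;
                    list_add_tail(&pages[i].oap_pending_item, head);
                }
            }

            int main(void)
            {
                struct list_head head;
                struct toy_oap pages[3], *oap, *next;
                int visits = 0;

                /* Pattern from the quoted error path: after list_del_init() the entry
                 * is self-linked, so the loop selects the same page again and again
                 * (capped at 5 here; in osc_cache.c the 2nd pass finds a page already
                 * moved to CPS_CACHED and the assertion fires). */
                fill(&head, pages, 3);
                list_for_each_entry(oap, &head, oap_pending_item) {
                    list_del_init(&oap->oap_pending_item);
                    printf("unsafe: completing page %d\n", oap->index);
                    if (++visits >= 5) { puts("unsafe: stuck on the same entry"); break; }
                }

                /* Safe variant proposed above: each page is completed exactly once. */
                fill(&head, pages, 3);
                list_for_each_entry_safe(oap, next, &head, oap_pending_item) {
                    list_del_init(&oap->oap_pending_item);
                    printf("safe: completing page %d\n", oap->index);
                }
                return 0;
            }

            Compiled with gcc or clang (C99, __typeof__ extension), the unsafe loop prints the same page until the cap, while the safe loop visits pages 0, 1 and 2 once each.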


            adegremont Aurelien Degremont (Inactive) added a comment:
            ConstrainRAMSpace=yes is the line. That's the same issue as us. If you can afford to turn it off, this should work around the crash.

            bevans Ben Evans (Inactive) added a comment:
            The cgroup.conf looks like this:

            CgroupAutomount=yes
            CgroupMountpoint="/dev"
            ConstrainCores=yes
            ConstrainRAMSpace=yes
            TaskAffinity=no


            adegremont Aurelien Degremont (Inactive) added a comment:
            Could you verify that the memory cgroup is enabled in your SLURM environment? See /etc/slurm/cgroup.conf. SLURM tunes these files.

            https://github.com/SchedMD/slurm/blob/slurm-16.05/src/plugins/task/cgroup/task_cgroup_memory.c#L305

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: spiechurski Sebastien Piechurski
              Votes: 0
              Watchers: 16
