
"Bad page state" reported after unlink

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.1.0, Lustre 2.1.2, Lustre 1.8.6
    • None
    • Client: Lustre b1_8 Git 999530e, Linux 2.6.32.8
    • 3
    • 4860

    Description

      I have a reproducible test case for a page bug, which is reported clearly on a kernel with additional debugging enabled.

      $ dd if=/dev/zero of=/net/lustre/file bs=4096 count=1
      $ rm /net/lustre/file
      BUG: Bad page state in process rm pfn:21fe6a
      page:ffffea00076fa730 flags:800000000000000c count:0 mapcount:0 mapping:(null) index:1

      The bug occurs on unlink() of a file shortly after it was written to.

      If there is a delay of a few seconds before the rm, all is okay. Truncate works, but a subsequent unlink (rm) can still fail if it follows quickly enough.
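The timing dependence described above can be exercised with a small helper. This is only a sketch: the function name `write_then_unlink`, the test file name, and the optional delay argument are illustrative and not part of the original report. The directory is assumed to be a Lustre client mount such as /net/lustre.

```shell
#!/bin/sh
# Sketch of the write-then-unlink sequence from the report.
# Hypothetical helper: $1 is the target directory (assumed to be a Lustre
# client mount), $2 is an optional delay in seconds before the unlink.
write_then_unlink() {
    dir=$1
    delay=${2:-0}
    f="$dir/lu620-test.$$"

    # Write a single 4096-byte block, as in the original test case.
    dd if=/dev/zero of="$f" bs=4096 count=1 2>/dev/null || return 1

    # With no delay the unlink follows the write immediately, which is
    # the case reported to trigger "Bad page state".
    if [ "$delay" -gt 0 ]; then
        sleep "$delay"
    fi

    rm -f "$f"
}
```

Running `write_then_unlink /net/lustre` and then inspecting `dmesg` would correspond to the failing case; `write_then_unlink /net/lustre 5` would correspond to the delayed case that the report says is okay.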

      It appears that this bug could be the cause of some kind of misaccounting in the kernel's page cache, which causes lockups when the task is running in a cgroup. I originally raised this in a mailing list thread:

      http://lists.lustre.org/pipermail/lustre-devel/2011-July/003865.html
      http://lists.lustre.org/pipermail/lustre-devel/2011-August/003876.html

      Here's a full example, taken today with the attached kernel config. The process is not running in a cgroup, although the kernel is built with cgroup support.

      BUG: Bad page state in process rm pfn:77813
      page:ffffea0002914688 flags:400000000000000c count:0 mapcount:0 mapping:(null) index:0
      Pid: 1173, comm: rm Not tainted 2.6.32.28-ml #8
      Call Trace:
      [<ffffffff81094ab2>] bad_page+0xd2/0x130
      [<ffffffff810c4c39>] ? lookup_page_cgroup_used+0x9/0x20
      [<ffffffff810978ea>] free_hot_cold_page+0x6a/0x2d0
      [<ffffffff81097bab>] free_hot_page+0xb/0x10
      [<ffffffff8109a65a>] put_page+0xea/0x140
      [<ffffffffa04fc5c7>] ll_page_removal_cb+0x207/0x510 [lustre]
      [<ffffffffa041207b>] cache_remove_lock+0x1ab/0x29c [osc]
      [<ffffffffa03fafad>] osc_extent_blocking_cb+0x25d/0x2e0 [osc]
      [<ffffffff8137bcf6>] ? _spin_unlock+0x26/0x30
      [<ffffffffa02db058>] ? unlock_res_and_lock+0x58/0x100 [ptlrpc]
      [<ffffffffa02df630>] ldlm_cancel_callback+0x60/0xf0 [ptlrpc]
      [<ffffffffa02f877c>] ldlm_cli_cancel_local+0x6c/0x350 [ptlrpc]
      [<ffffffffa02fa960>] ldlm_cancel_list+0xf0/0x240 [ptlrpc]
      [<ffffffffa02fac67>] ldlm_cancel_resource_local+0x1b7/0x2d0 [ptlrpc]
      [<ffffffff81070f99>] ? is_module_address+0x9/0x20
      [<ffffffffa03fcb57>] osc_destroy+0x107/0x730 [osc]
      [<ffffffffa04b6a65>] ? lov_prep_destroy_set+0x285/0x970 [lov]
      [<ffffffffa04a07c8>] lov_destroy+0x568/0xf20 [lov]
      [<ffffffffa05355e3>] ll_objects_destroy+0x4e3/0x18c0 [lustre]
      [<ffffffffa046d099>] ? mdc_reint+0xd9/0x270 [mdc]
      [<ffffffffa0537098>] ll_unlink_generic+0x298/0x360 [lustre]
      [<ffffffff8137a65f>] ? __mutex_lock_common+0x27f/0x3b0
      [<ffffffff810d3c7e>] ? vfs_unlink+0x5e/0xd0
      [<ffffffffa01a65c9>] ? cfs_free+0x9/0x10 [libcfs]
      [<ffffffffa053716d>] ll_unlink+0xd/0x10 [lustre]
      [<ffffffff810d3cad>] vfs_unlink+0x8d/0xd0
      [<ffffffff810d6245>] ? lookup_hash+0x35/0x50
      [<ffffffff810d7613>] do_unlinkat+0x183/0x1c0
      [<ffffffff8137b828>] ? lockdep_sys_exit_thunk+0x35/0x67
      [<ffffffff8137b7b2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [<ffffffff810d77ad>] sys_unlinkat+0x1d/0x40
      [<ffffffff8100b3c2>] system_call_fastpath+0x16/0x1b

          Activity

            [LU-620] "Bad page state" reported after unlink
            mark Mark Hills added a comment -

            We are testing this with kernel 2.6.32-220.4.1.el6.x86_64 and Whamcloud b1_8 HEAD (Git 18aafe97).

            The patch does not fix the bug, seemingly because this kernel does not export any of

            truncate_complete_page
            remove_from_page_cache
            delete_from_page_cache

            For now, I need to continue to use my initial patch, which exports truncate_complete_page from the kernel.
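One way to check which of these symbols a given kernel build exports is to look them up in its Module.symvers file. The following is a sketch under assumptions: the helper name `is_exported` is hypothetical, and the file is assumed to use the usual tab-separated layout with the symbol name in the second field.

```shell
#!/bin/sh
# Sketch: report whether a symbol is listed in a Module.symvers file.
# Assumed line layout: CRC <TAB> symbol <TAB> module <TAB> export type.
# Hypothetical helper: $1 is the path to Module.symvers, $2 the symbol.
is_exported() {
    symvers=$1
    sym=$2
    # Exit status 0 if some line has the symbol as its second field,
    # 1 if none does.
    awk -v s="$sym" '$2 == s { found = 1 } END { exit !found }' "$symvers"
}

# Example use (the path is an assumption; it depends on the distribution):
#   for s in truncate_complete_page remove_from_page_cache delete_from_page_cache; do
#       is_exported "/lib/modules/$(uname -r)/build/Module.symvers" "$s" \
#           && echo "$s: exported" || echo "$s: not exported"
#   done
```

A symbol that is defined in the kernel but not exported will not be listed there, which would be consistent with the behaviour described in this comment.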

            bobijam Zhenyu Xu added a comment -

            b2_1 patch tracking at http://review.whamcloud.com/2230

            pjones Peter Jones added a comment -

            Bobi

            Could you please port this patch to b2_1?

            Thanks

            Peter

            pjones Peter Jones added a comment -

            Landed for 2.2

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,server,el5,ofa #163
            LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

            Result = SUCCESS
            Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
            Files :

            • lustre/include/linux/lustre_compat25.h
            • lustre/llite/rw.c
            • lustre/include/linux/lustre_patchless_compat.h
            • lustre/llite/file.c
            • lustre/autoconf/lustre-core.m4
            • lustre/llite/dir.c
            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,server,el5,inkernel #163
            LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

            Result = SUCCESS
            Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
            Files :

            • lustre/llite/file.c
            • lustre/llite/rw.c
            • lustre/llite/dir.c
            • lustre/autoconf/lustre-core.m4
            • lustre/include/linux/lustre_compat25.h
            • lustre/include/linux/lustre_patchless_compat.h
            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,client,el5,inkernel #163
            LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

            Result = SUCCESS
            Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
            Files :

            • lustre/llite/file.c
            • lustre/autoconf/lustre-core.m4
            • lustre/llite/rw.c
            • lustre/llite/dir.c
            • lustre/include/linux/lustre_compat25.h
            • lustre/include/linux/lustre_patchless_compat.h
            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,client,el5,ofa #163
            LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

            Result = SUCCESS
            Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
            Files :

            • lustre/include/linux/lustre_compat25.h
            • lustre/include/linux/lustre_patchless_compat.h
            • lustre/llite/rw.c
            • lustre/llite/dir.c
            • lustre/autoconf/lustre-core.m4
            • lustre/llite/file.c
            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,client,el6,inkernel #163
            LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

            Result = SUCCESS
            Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
            Files :

            • lustre/llite/rw.c
            • lustre/include/linux/lustre_compat25.h
            • lustre/llite/dir.c
            • lustre/llite/file.c
            • lustre/autoconf/lustre-core.m4
            • lustre/include/linux/lustre_patchless_compat.h
            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » x86_64,server,el5,ofa #163
            LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

            Result = SUCCESS
            Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
            Files :

            • lustre/llite/file.c
            • lustre/llite/rw.c
            • lustre/llite/dir.c
            • lustre/include/linux/lustre_compat25.h
            • lustre/include/linux/lustre_patchless_compat.h
            • lustre/autoconf/lustre-core.m4

            People

              bobijam Zhenyu Xu
              mark Mark Hills