[LU-620] "Bad page state" reported after unlink Created: 23/Aug/11  Updated: 09/May/12  Resolved: 16/Mar/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 2.1.2, Lustre 1.8.6
Fix Version/s: Lustre 2.2.0, Lustre 2.1.2, Lustre 1.8.8

Type: Bug Priority: Blocker
Reporter: Mark Hills Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

Client: Lustre b1_8 Git 999530e, Linux 2.6.32.8


Attachments: File config.gz    
Severity: 3
Rank (Obsolete): 4860

 Description   

I have a reproducible test case of a page bug, which is clearly reported on a kernel with additional debugging enabled.

$ dd if=/dev/zero of=/net/lustre/file bs=4096 count=1
$ rm /net/lustre/file
BUG: Bad page state in process rm pfn:21fe6a
page:ffffea00076fa730 flags:800000000000000c count:0 mapcount:0 mapping:(null) index:1

The bug occurs on unlink() of a file shortly after it was written to.

If there is a delay of a few seconds before the rm, all is okay. Truncate works, but a subsequent unlink (rm) can still fail if it follows quickly enough.
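To compare the immediate and delayed cases, the two commands above can be wrapped in a small helper. This is only a sketch: the function name repro_unlink and its arguments are hypothetical, and the directory passed in is assumed to be on a Lustre client mount such as /net/lustre.

```shell
# repro_unlink DIR [DELAY]
# Write one 4096-byte block to DIR/file, wait DELAY seconds (default 0),
# then unlink the file. The function name and arguments are hypothetical;
# DIR is assumed to live on a Lustre client mount.
repro_unlink() {
    dir=$1
    delay=${2:-0}
    # Same write as in the report; dd's transfer statistics are discarded.
    dd if=/dev/zero of="$dir/file" bs=4096 count=1 2>/dev/null || return 1
    sleep "$delay"
    rm "$dir/file"
}
```

Running repro_unlink /net/lustre 0 and then repro_unlink /net/lustre 5, checking the kernel log for "Bad page state" after each call, should show whether the delay makes a difference on a given client.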

It appears that this bug could be the cause of some kind of misaccounting in the kernel's page cache, which causes lockups when the task is running in a cgroup. Originally I brought this up in a mailing list thread:

http://lists.lustre.org/pipermail/lustre-devel/2011-July/003865.html
http://lists.lustre.org/pipermail/lustre-devel/2011-August/003876.html

Here's a full example, taken today with the attached kernel config. The process is not running in a cgroup, although the kernel is built with cgroup support.

BUG: Bad page state in process rm pfn:77813
page:ffffea0002914688 flags:400000000000000c count:0 mapcount:0 mapping:(null) index:0
Pid: 1173, comm: rm Not tainted 2.6.32.28-ml #8
Call Trace:
[<ffffffff81094ab2>] bad_page+0xd2/0x130
[<ffffffff810c4c39>] ? lookup_page_cgroup_used+0x9/0x20
[<ffffffff810978ea>] free_hot_cold_page+0x6a/0x2d0
[<ffffffff81097bab>] free_hot_page+0xb/0x10
[<ffffffff8109a65a>] put_page+0xea/0x140
[<ffffffffa04fc5c7>] ll_page_removal_cb+0x207/0x510 [lustre]
[<ffffffffa041207b>] cache_remove_lock+0x1ab/0x29c [osc]
[<ffffffffa03fafad>] osc_extent_blocking_cb+0x25d/0x2e0 [osc]
[<ffffffff8137bcf6>] ? _spin_unlock+0x26/0x30
[<ffffffffa02db058>] ? unlock_res_and_lock+0x58/0x100 [ptlrpc]
[<ffffffffa02df630>] ldlm_cancel_callback+0x60/0xf0 [ptlrpc]
[<ffffffffa02f877c>] ldlm_cli_cancel_local+0x6c/0x350 [ptlrpc]
[<ffffffffa02fa960>] ldlm_cancel_list+0xf0/0x240 [ptlrpc]
[<ffffffffa02fac67>] ldlm_cancel_resource_local+0x1b7/0x2d0 [ptlrpc]
[<ffffffff81070f99>] ? is_module_address+0x9/0x20
[<ffffffffa03fcb57>] osc_destroy+0x107/0x730 [osc]
[<ffffffffa04b6a65>] ? lov_prep_destroy_set+0x285/0x970 [lov]
[<ffffffffa04a07c8>] lov_destroy+0x568/0xf20 [lov]
[<ffffffffa05355e3>] ll_objects_destroy+0x4e3/0x18c0 [lustre]
[<ffffffffa046d099>] ? mdc_reint+0xd9/0x270 [mdc]
[<ffffffffa0537098>] ll_unlink_generic+0x298/0x360 [lustre]
[<ffffffff8137a65f>] ? __mutex_lock_common+0x27f/0x3b0
[<ffffffff810d3c7e>] ? vfs_unlink+0x5e/0xd0
[<ffffffffa01a65c9>] ? cfs_free+0x9/0x10 [libcfs]
[<ffffffffa053716d>] ll_unlink+0xd/0x10 [lustre]
[<ffffffff810d3cad>] vfs_unlink+0x8d/0xd0
[<ffffffff810d6245>] ? lookup_hash+0x35/0x50
[<ffffffff810d7613>] do_unlinkat+0x183/0x1c0
[<ffffffff8137b828>] ? lockdep_sys_exit_thunk+0x35/0x67
[<ffffffff8137b7b2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff810d77ad>] sys_unlinkat+0x1d/0x40
[<ffffffff8100b3c2>] system_call_fastpath+0x16/0x1b



 Comments   
Comment by Mark Hills [ 23/Aug/11 ]

I found the source of this problem: an out-of-date copied function in lustre_patchless_compat.h

truncate_complete_page needs to handle cgroup appropriately, and the copy with its own ll_remove_from_page_cache does not. A call to mem_cgroup_uncharge_cache_page is needed but it is not exported, nor does it seem easy or sensible to copy into the Lustre tree.

Looks like this has broken the back of the compatibility layer for truncate_complete_page? For now I have exported truncate_complete_page from the kernel; in an initial test this seemed to fix the problem, and cgroups began working reliably.

Comment by Peter Jones [ 20/Sep/11 ]

Bobijam

Could you please look into this report?

Thanks

Peter

Comment by Zhenyu Xu [ 21/Sep/11 ]

Mark,

It looks like the 2.6.32.8 kernel exports delete_from_page_cache in mm/filemap.c (Mark, can you confirm that as well?), which can do what ll_remove_from_page_cache and page_cache_release do.

For a new patchless client, we cannot patch the client kernel code to export truncate_complete_page, but we can leverage the already exported delete_from_page_cache to do the same job (uncharge the cgroup accounting for the page).

However, 2.6.38.8 exports another cgroup-aware function, remove_from_page_cache, which does what ll_remove_from_page_cache does, while the rhel6 kernel does not export this function. It seems the kernel has not settled down in this area, which makes patchless client support difficult.
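Since the set of exported functions differs between kernels, it may help to check a given kernel before building. The sketch below is one way to do that and is not part of the patch; the function name exported_symbols is hypothetical, and it assumes the kernel build tree provides a Module.symvers file whose second whitespace-separated field is the symbol name.

```shell
# exported_symbols SYMVERS SYMBOL...
# Print each SYMBOL that appears in the symbol column (field 2) of the
# given Module.symvers file. The function name is hypothetical, and the
# field layout is an assumption about the Module.symvers format.
exported_symbols() {
    symvers=$1
    shift
    for sym in "$@"; do
        # awk exits 0 only if some line has an exact match in field 2.
        awk -v s="$sym" '$2 == s { found = 1 } END { exit !found }' "$symvers" \
            && echo "$sym"
    done
    return 0
}
```

For example, exported_symbols /lib/modules/$(uname -r)/build/Module.symvers truncate_complete_page remove_from_page_cache delete_from_page_cache would list whichever of the three functions the running kernel exports, assuming the kernel build files are installed at that path. Empty output means none of them is available to a patchless client.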

Comment by Zhenyu Xu [ 21/Sep/11 ]

patch tracking at http://review.whamcloud.com/1399

Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/autoconf/lustre-core.m4
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/llite/dir.c
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/autoconf/lustre-core.m4
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/autoconf/lustre-core.m4
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/llite/dir.c
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_compat25.h
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » i686,server,el5,ofa #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/llite/dir.c
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/autoconf/lustre-core.m4
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/dir.c
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/llite/dir.c
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_compat25.h
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/dir.c
  • lustre/autoconf/lustre-core.m4
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/dir.c
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/include/linux/lustre_compat25.h
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/llite/dir.c
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » i686,client,el5,ofa #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 03/Nov/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #338
LU-620 llite: add delete_from_page_cache and remove_from_page_cache check (Revision 1515e409cc57af5eaef809eee6d8f8d6725d092b)

Result = SUCCESS
Oleg Drokin : 1515e409cc57af5eaef809eee6d8f8d6725d092b
Files :

  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Zhenyu Xu [ 04/Nov/11 ]

b1_8 patch tracking at http://review.whamcloud.com/1649

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/llite/dir.c
  • lustre/llite/rw.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/file.c
  • lustre/autoconf/lustre-core.m4
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/dir.c
  • lustre/llite/file.c
  • lustre/llite/rw.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/dir.c
  • lustre/llite/file.c
  • lustre/autoconf/lustre-core.m4
  • lustre/llite/rw.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/rw.c
  • lustre/llite/file.c
  • lustre/autoconf/lustre-core.m4
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/file.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/llite/rw.c
  • lustre/autoconf/lustre-core.m4
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/file.c
  • lustre/llite/rw.c
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/autoconf/lustre-core.m4
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/rw.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/dir.c
  • lustre/llite/file.c
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/llite/rw.c
  • lustre/llite/dir.c
  • lustre/autoconf/lustre-core.m4
  • lustre/llite/file.c
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/file.c
  • lustre/autoconf/lustre-core.m4
  • lustre/llite/rw.c
  • lustre/llite/dir.c
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/llite/file.c
  • lustre/llite/rw.c
  • lustre/llite/dir.c
  • lustre/autoconf/lustre-core.m4
  • lustre/include/linux/lustre_compat25.h
  • lustre/include/linux/lustre_patchless_compat.h
Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #163
LU-620 llite: add delete/remove_from_page_cache check (Revision cdf199679c1814902f181bec81a5bfe9902b8217)

Result = SUCCESS
Johann Lombardi : cdf199679c1814902f181bec81a5bfe9902b8217
Files :

  • lustre/include/linux/lustre_compat25.h
  • lustre/llite/rw.c
  • lustre/include/linux/lustre_patchless_compat.h
  • lustre/llite/file.c
  • lustre/autoconf/lustre-core.m4
  • lustre/llite/dir.c
Comment by Peter Jones [ 16/Jan/12 ]

Landed for 2.2

Comment by Peter Jones [ 29/Feb/12 ]

Bobi

Could you please port this patch to b2_1?

Thanks

Peter

Comment by Zhenyu Xu [ 29/Feb/12 ]

b2_1 patch tracking at http://review.whamcloud.com/2230

Comment by Mark Hills [ 11/Apr/12 ]

We are testing this with kernel 2.6.32-220.4.1.el6.x86_64 and Whamcloud b1_8 HEAD (Git 18aafe97).

The patch does not fix the bug, seemingly because this kernel does not export any of

truncate_complete_page
remove_from_page_cache
delete_from_page_cache

For now, I need to continue to use my initial patch, which exports truncate_complete_page from the kernel.
