[LU-844] LBUG: ASSERTION(iobuf->dr_npages < iobuf->dr_max_pages) when run obdfilter_survey using rsz >= 2M Created: 14/Nov/11  Updated: 04/Jan/12  Resolved: 04/Jan/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.2.0

Type: Bug Priority: Critical
Reporter: Sean Xu (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

On Linux 2.6.18-238.12.1


Attachments: lbug_report.log
Severity: 4
Epic: performance
Rank (Obsolete): 4783

 Description   

I ran obdfilter-survey (lustre-iokit-1.2-200709210921) with rsz >= 2M against a local disk using the command:

  # size=32768 case=disk sh obdfilter-survey

I hit an LBUG:

Nov 11 13:24:01 oss0 kernel: LustreError: 12359:0:(filter_io_26.c:293:filter_iobuf_add_page()) ASSERTION(iobuf->dr_npages < iobuf->dr_max_pages) failed
Nov 11 13:24:01 oss0 kernel: LustreError: 12359:0:(filter_io_26.c:293:filter_iobuf_add_page()) LBUG
Nov 11 13:24:01 oss0 kernel: Pid: 12359, comm: lctl
Nov 11 13:24:01 oss0 kernel:
Nov 11 13:24:01 oss0 kernel: Call Trace:
Nov 11 13:24:01 oss0 kernel: [<ffffffff887786a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88778bda>] lbug_with_loc+0x7a/0xd0 [libcfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88780ff0>] tracefile_init+0x0/0x110 [libcfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88b3b650>] ldiskfs_bmap+0x0/0xf0 [ldiskfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c0291b>] filter_iobuf_add_page+0x3b/0x70 [obdfilter]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c05580>] filter_commitrw_write+0x4e0/0x2be0 [obdfilter]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88bfe8ce>] filter_brw+0x52e/0x6e0 [obdfilter]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8000f41d>] __alloc_pages+0x78/0x308
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c2d462>] echo_client_brw_ioctl+0x1032/0x3050 [obdecho]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8003336f>] __tcp_push_pending_frames+0x766/0x840
Nov 11 13:24:01 oss0 kernel: [<ffffffff80022185>] tcp_transmit_skb+0x644/0x67c
Nov 11 13:24:01 oss0 kernel: [<ffffffff80064ae9>] _spin_lock_bh+0x9/0x14
Nov 11 13:24:01 oss0 kernel: [<ffffffff80030fdc>] release_sock+0x13/0xc1
Nov 11 13:24:01 oss0 kernel: [<ffffffff800ce73f>] zone_statistics+0x3e/0x6d
Nov 11 13:24:01 oss0 kernel: [<ffffffff8000abd2>] get_page_from_freelist+0x378/0x43a
Nov 11 13:24:01 oss0 kernel: [<ffffffff800ce73f>] zone_statistics+0x3e/0x6d
Nov 11 13:24:01 oss0 kernel: [<ffffffff800ce73f>] zone_statistics+0x3e/0x6d
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c30583>] echo_client_iocontrol+0x1103/0x2420 [obdecho]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8001aa2d>] vsnprintf+0x5df/0x627
Nov 11 13:24:01 oss0 kernel: [<ffffffff80001c20>] _stext+0xc20/0x1000
Nov 11 13:24:01 oss0 kernel: [<ffffffff800d3af5>] __vmalloc_area_node+0x12e/0x156
Nov 11 13:24:01 oss0 kernel: [<ffffffff88804f07>] obd_ioctl_getdata+0x5b7/0xec0 [obdclass]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88819cf0>] class_handle_ioctl+0x1de0/0x2190 [obdclass]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8000dad0>] permission+0x8d/0xc8
Nov 11 13:24:01 oss0 kernel: [<ffffffff801aedd8>] misc_open+0x16c/0x260
Nov 11 13:24:01 oss0 kernel: [<ffffffff888043cb>] obd_class_ioctl+0x19b/0x220 [obdclass]
Nov 11 13:24:01 oss0 kernel: [<ffffffff800423f3>] do_ioctl+0x21/0x6b
Nov 11 13:24:01 oss0 kernel: [<ffffffff80030401>] vfs_ioctl+0x457/0x4b9
Nov 11 13:24:01 oss0 kernel: [<ffffffff800b9630>] audit_syscall_entry+0x1a4/0x1cf
Nov 11 13:24:01 oss0 kernel: [<ffffffff8004c8fb>] sys_ioctl+0x59/0x78
Nov 11 13:24:01 oss0 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0

I tested with Lustre 1.8.5, 1.8.6-wc1, 1.8.7-wc1 and 2.1. All have exactly the same problem; 1.8.5 and 2.1 in particular also cause a kernel panic. The problem occurs on both physical and virtual machines. Whatever other parameters I used, the problem happened immediately once I set rsz above 1M when running obdfilter-survey.



 Comments   
Comment by Cliff White (Inactive) [ 15/Nov/11 ]

I will see if I can replicate this. Lustre does not use I/O larger than 1M, so the 2M size may be unsupported. obdfilter-survey is now part of the main Lustre package, and should be installed in /usr/bin/obdfilter-survey. Can you retry with that version, just for verification? We don't support the stand-alone version, and haven't since 2007 (hence the old date on the tarball).

Comment by Sean Xu (Inactive) [ 15/Nov/11 ]

Thanks for the information. I tested using /usr/bin/obdfilter-survey on Lustre 1.8.6-wc1 and 2.1. rsz=2M still triggered the same LBUG, and a kernel panic on Lustre 2.1.

Comment by Cliff White (Inactive) [ 15/Nov/11 ]

I can confirm that it LBUGs on a test machine here also. However, max Lustre IO size is 1M, so this is either a bug in test_brw, or perhaps a non-supported feature. I will see about escalation.
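
To illustrate why a 2M record could trip the assertion, here is a rough sketch of the page arithmetic. It assumes a 4 KiB page size and that the I/O buffer is sized for the 1M maximum mentioned above; the helper name `pages_for_kb` and both constants are illustrative and are not taken from the Lustre source.

```shell
#!/bin/bash
# Assumption: 4 KiB pages (typical on x86_64); not stated in this ticket.
PAGE_KB=4
# Assumption: the buffer holds enough pages for the 1M maximum I/O size.
MAX_IO_KB=1024

# Hypothetical helper: number of pages needed for a record of $1 KB,
# rounded up to a whole page.
pages_for_kb() {
    echo $(( ($1 + PAGE_KB - 1) / PAGE_KB ))
}

max_pages=$(pages_for_kb "$MAX_IO_KB")
for rsz in 1024 2048; do
    npages=$(pages_for_kb "$rsz")
    if [ "$npages" -gt "$max_pages" ]; then
        echo "rsz=${rsz}K needs $npages pages, buffer holds $max_pages: overflow"
    else
        echo "rsz=${rsz}K needs $npages pages, buffer holds $max_pages: fits"
    fi
done
```

Under these assumptions a 2048K record needs 512 pages against a 256-page buffer, which is consistent with the check `iobuf->dr_npages < iobuf->dr_max_pages` failing as pages are added.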

Comment by Peter Jones [ 16/Nov/11 ]

Bobi

Could you please help on this one?

Thanks

Peter

Comment by Zhenyu Xu [ 16/Nov/11 ]

Which environment variables have you set? In particular, what rszhi did you set?

Comment by Sean Xu (Inactive) [ 17/Nov/11 ]

I set rszlo=2048 and rszhi=2048 to run obdfilter-survey:

$ nobjhi=2 thrhi=2 rszlo=2048 rszhi=2048 size=10240 case=disk sh /usr/bin/obdfilter-survey

It triggered LBUG immediately.

Comment by Zhenyu Xu [ 17/Nov/11 ]

I think it's an obdfilter-survey issue. The "disk" case interacts with the filter module directly, and that module only supports 1M of I/O data, so please use an rszhi no larger than 1024KB with the "disk" test case. I'll come up with an obdfilter-survey patch to enforce the limit.
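
A guard of the kind described might look roughly like the following. This is only a sketch and not the patch that eventually landed: the function name `check_rszhi`, the `rszmax` variable, and the error text are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical limit for the "disk" case, in KB (1M, per the comment above).
rszmax=1024

# Hypothetical helper: return 0 if the requested record size is acceptable
# for the given test case, non-zero (with a message on stderr) otherwise.
check_rszhi() {
    local testcase=$1 rszhi=$2
    if [ "$testcase" = "disk" ] && [ "$rszhi" -gt "$rszmax" ]; then
        echo "rszhi=${rszhi}K exceeds the ${rszmax}K limit for case=disk" >&2
        return 1
    fi
    return 0
}

# Example: reject the failing configuration before any I/O is issued.
if check_rszhi disk 2048; then
    echo "would run survey"
else
    echo "refusing to run survey"
fi
```

Checking the value up front in the script means an oversized rszhi produces an error message in user space instead of reaching the filter module and triggering the LBUG.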

Comment by Sean Xu (Inactive) [ 18/Nov/11 ]

Alright, thanks.

Comment by Zhenyu Xu [ 25/Nov/11 ]

The patch is being tracked at http://review.whamcloud.com/1741

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Peter Jones [ 04/Jan/12 ]

Landed for 2.2

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,server,el5,ofa #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey

Comment by Build Master (Inactive) [ 04/Jan/12 ]

Integrated in lustre-master » i686,client,el5,ofa #404
LU-844 test: limit max IO data size for obdfilter test (Revision 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a)

Result = SUCCESS
Oleg Drokin : 78b38388cc3aa8f2f9ed367bc812165f24cbdb6a
Files :

  • lustre-iokit/obdfilter-survey/obdfilter-survey