Details
- Bug
- Resolution: Fixed
- Critical
- None
- None
- On linux 2.6.18-238.12.1
- 4
- 4783
Description
I ran obdfilter_survey (lustre-iokit-1.2-200709210921) with rsz>=2M against a local disk using the command:
- size=32768 case=disk sh obdfilter-survey
I hit an LBUG:
Nov 11 13:24:01 oss0 kernel: LustreError: 12359:0:(filter_io_26.c:293:filter_iobuf_add_page()) ASSERTION(iobuf->dr_npages < iobuf->dr_max_pages) failed
Nov 11 13:24:01 oss0 kernel: LustreError: 12359:0:(filter_io_26.c:293:filter_iobuf_add_page()) LBUG
Nov 11 13:24:01 oss0 kernel: Pid: 12359, comm: lctl
Nov 11 13:24:01 oss0 kernel:
Nov 11 13:24:01 oss0 kernel: Call Trace:
Nov 11 13:24:01 oss0 kernel: [<ffffffff887786a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88778bda>] lbug_with_loc+0x7a/0xd0 [libcfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88780ff0>] tracefile_init+0x0/0x110 [libcfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88b3b650>] ldiskfs_bmap+0x0/0xf0 [ldiskfs]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c0291b>] filter_iobuf_add_page+0x3b/0x70 [obdfilter]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c05580>] filter_commitrw_write+0x4e0/0x2be0 [obdfilter]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88bfe8ce>] filter_brw+0x52e/0x6e0 [obdfilter]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8000f41d>] __alloc_pages+0x78/0x308
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c2d462>] echo_client_brw_ioctl+0x1032/0x3050 [obdecho]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8003336f>] __tcp_push_pending_frames+0x766/0x840
Nov 11 13:24:01 oss0 kernel: [<ffffffff80022185>] tcp_transmit_skb+0x644/0x67c
Nov 11 13:24:01 oss0 kernel: [<ffffffff80064ae9>] _spin_lock_bh+0x9/0x14
Nov 11 13:24:01 oss0 kernel: [<ffffffff80030fdc>] release_sock+0x13/0xc1
Nov 11 13:24:01 oss0 kernel: [<ffffffff800ce73f>] zone_statistics+0x3e/0x6d
Nov 11 13:24:01 oss0 kernel: [<ffffffff8000abd2>] get_page_from_freelist+0x378/0x43a
Nov 11 13:24:01 oss0 kernel: [<ffffffff800ce73f>] zone_statistics+0x3e/0x6d
Nov 11 13:24:01 oss0 kernel: [<ffffffff800ce73f>] zone_statistics+0x3e/0x6d
Nov 11 13:24:01 oss0 kernel: [<ffffffff88c30583>] echo_client_iocontrol+0x1103/0x2420 [obdecho]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8001aa2d>] vsnprintf+0x5df/0x627
Nov 11 13:24:01 oss0 kernel: [<ffffffff80001c20>] _stext+0xc20/0x1000
Nov 11 13:24:01 oss0 kernel: [<ffffffff800d3af5>] __vmalloc_area_node+0x12e/0x156
Nov 11 13:24:01 oss0 kernel: [<ffffffff88804f07>] obd_ioctl_getdata+0x5b7/0xec0 [obdclass]
Nov 11 13:24:01 oss0 kernel: [<ffffffff88819cf0>] class_handle_ioctl+0x1de0/0x2190 [obdclass]
Nov 11 13:24:01 oss0 kernel: [<ffffffff8000dad0>] permission+0x8d/0xc8
Nov 11 13:24:01 oss0 kernel: [<ffffffff801aedd8>] misc_open+0x16c/0x260
Nov 11 13:24:01 oss0 kernel: [<ffffffff888043cb>] obd_class_ioctl+0x19b/0x220 [obdclass]
Nov 11 13:24:01 oss0 kernel: [<ffffffff800423f3>] do_ioctl+0x21/0x6b
Nov 11 13:24:01 oss0 kernel: [<ffffffff80030401>] vfs_ioctl+0x457/0x4b9
Nov 11 13:24:01 oss0 kernel: [<ffffffff800b9630>] audit_syscall_entry+0x1a4/0x1cf
Nov 11 13:24:01 oss0 kernel: [<ffffffff8004c8fb>] sys_ioctl+0x59/0x78
Nov 11 13:24:01 oss0 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
I tested with Lustre 1.8.5, 1.8.6-wc1, 1.8.7-wc1 and 2.1. All show exactly the same problem. In particular, 1.8.5 and 2.1 cause a kernel panic. The problem occurs on both physical and virtual machines. Regardless of the other parameters, as soon as I set rsz above 1M and run obdfilter_survey, the problem happens immediately.
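A rough model of why requests above 1M could trip the assertion may help when reading the trace. This is only a sketch under stated assumptions, not the Lustre code: it assumes 4 KiB pages and a page buffer with room for 256 pages (1 MiB), a capacity chosen to match the observation that failures start above 1M. The names `IoBuf`, `pages_needed` and `fits` are hypothetical. The check in `add_page` mirrors the condition in the failed assertion (`dr_npages < dr_max_pages`).

```python
PAGE_SIZE_KB = 4   # assumption: 4 KiB pages
MAX_PAGES = 256    # assumption: buffer sized for 1 MiB worth of 4 KiB pages


def pages_needed(rsz_kb: int) -> int:
    """Number of pages one I/O of rsz_kb KiB occupies, rounded up."""
    return -(-rsz_kb // PAGE_SIZE_KB)


class IoBuf:
    """Toy fixed-capacity page buffer; a stand-in, not the Lustre structure."""

    def __init__(self, max_pages: int = MAX_PAGES) -> None:
        self.max_pages = max_pages
        self.npages = 0

    def add_page(self) -> None:
        # Same invariant as the failed assertion: there must be a free slot
        # (npages < max_pages) before another page is added.
        if not self.npages < self.max_pages:
            raise OverflowError("npages < max_pages violated")
        self.npages += 1


def fits(rsz_kb: int) -> bool:
    """Return True if every page of one rsz_kb KiB I/O fits in a fresh buffer."""
    buf = IoBuf()
    try:
        for _ in range(pages_needed(rsz_kb)):
            buf.add_page()
    except OverflowError:
        return False
    return True


if __name__ == "__main__":
    for rsz_kb in (1024, 2048):
        print(rsz_kb, pages_needed(rsz_kb), fits(rsz_kb))
```

Under these assumptions a 1M request needs 256 pages and fits exactly, while a 2M request needs 512 pages and fails on the 257th page. The toy model returns an error at that point, where the kernel code in the trace asserts instead.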