[LU-10331] mds-survey test_1: mds-survey failed Created: 05/Dec/17 Updated: 13/Mar/18 Resolved: 17/Dec/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.3 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for nasf <fan.yong@intel.com>.
Please provide additional information about the failure here.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e5c93b90-d9cb-11e7-a066-52540065bddc.
Page allocation failed on the MDT:
[35343.175183] mdt_out00_003: page allocation failure: order:4, mode:0x10c050
[35343.176281] CPU: 1 PID: 28124 Comm: mdt_out00_003 Tainted: G OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
[35343.177744] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[35343.178422] 000000000010c050 00000000f24a572c ffff8800581b78a8 ffffffff816a3e2d
[35343.179454] ffff8800581b7938 ffffffff81188820 0000000000000000 ffff88007ffd5000
[35343.180519] 0000000000000004 000000000010c050 ffff8800581b7938 00000000f24a572c
[35343.181512] Call Trace:
[35343.181900] [<ffffffff816a3e2d>] dump_stack+0x19/0x1b
[35343.182627] [<ffffffff81188820>] warn_alloc_failed+0x110/0x180
[35343.183324] [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724
[35343.184139] [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
[35343.184959] [<ffffffff811d1078>] alloc_pages_current+0x98/0x110
[35343.185754] [<ffffffff8118761e>] __get_free_pages+0xe/0x40
[35343.186483] [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0
[35343.187299] [<ffffffff811e05c1>] __kmalloc+0x211/0x230
[35343.188097] [<ffffffffc07d6159>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[35343.189223] [<ffffffffc0a8ed7d>] out_handle+0xa5d/0x1920 [ptlrpc]
[35343.190073] [<ffffffffc0a1fed2>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
[35343.191040] [<ffffffffc0a88ca9>] ? tgt_request_preprocess.isra.28+0x299/0x7a0 [ptlrpc]
[35343.192078] [<ffffffffc0a89ad5>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[35343.193041] [<ffffffffc0a2ddee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[35343.194080] [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
[35343.194933] [<ffffffffc0a31592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[35343.195723] [<ffffffffc0a30b00>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
[35343.196727] [<ffffffff810b099f>] kthread+0xcf/0xe0
[35343.197367] [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[35343.198138] [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[35343.198875] [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40 |
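To put "order:4" in the trace in context, here is a minimal sketch of how an allocation size maps to a page order. It assumes 4 KiB pages (typical for the x86_64 node in the log); `SKETCH_PAGE_SIZE` and `sketch_get_order()` are illustrative names, not kernel or Lustre symbols, and the loop only mirrors the idea behind the kernel's `get_order()`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as is typical on x86_64. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: return the smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size, i.e. the number of
 * physically contiguous pages (2^order) a request of that
 * size would need from the page allocator.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Under that assumption, an order-4 request means 16 contiguous pages (a buffer between 32 KiB and 64 KiB), and a buffer near the 1000 * 1024 byte limit would need order 8 (256 contiguous pages), which is hard to satisfy once memory is fragmented.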
| Comments |
| Comment by John Hammond [ 07/Dec/17 ] |
|
We should be using OBD_ALLOC_LARGE() in out_handle(), since it will try to allocate anything up to OUT_MAXREQSIZE:
#define OUT_MAXREQSIZE (1000 * 1024)
for (i = 0; i < update_buf_count; i++, tmp++) {
	if (tmp->oub_size >= OUT_MAXREQSIZE)
		GOTO(out_free, rc = err_serious(-EPROTO));
	OBD_ALLOC(update_bufs[i], tmp->oub_size);
	if (update_bufs[i] == NULL)
		GOTO(out_free, rc = err_serious(-ENOMEM));
	desc->bd_frag_ops->add_iov_frag(desc, update_bufs[i],
					tmp->oub_size);
}
|
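The suggestion above amounts to a "try contiguous first, fall back otherwise" allocation pattern. The following is a userspace model of that idea only, assuming (as the comment implies) that OBD_ALLOC_LARGE() can satisfy large requests without needing physically contiguous pages. Every name here (`sketch_buf`, `sketch_alloc_large()`, the two simulated allocators, the 32 KiB cutoff) is hypothetical and is not taken from Lustre or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of which allocator served the request,
 * so the caller could later pick the matching free routine. */
struct sketch_buf {
	void *ptr;
	bool  from_fallback;
};

typedef void *(*sketch_alloc_fn)(size_t size);

/* Try the contiguous allocator first; on failure use the fallback. */
static struct sketch_buf sketch_alloc_large(size_t size,
					    sketch_alloc_fn contiguous,
					    sketch_alloc_fn fallback)
{
	struct sketch_buf buf = { NULL, false };

	assert(contiguous != NULL && fallback != NULL);
	buf.ptr = contiguous(size);
	if (buf.ptr == NULL) {
		buf.ptr = fallback(size);
		buf.from_fallback = (buf.ptr != NULL);
	}
	return buf;
}

/* Simulated contiguous allocator: refuses anything over 32 KiB
 * (an arbitrary cutoff) to imitate a fragmented system. */
static void *sketch_contig_alloc(size_t size)
{
	return size > 32 * 1024 ? NULL : calloc(1, size);
}

/* Simulated fallback allocator with no contiguity requirement. */
static void *sketch_fallback_alloc(size_t size)
{
	return calloc(1, size);
}
```

In this model a 4 KiB request is served by the contiguous path, while a request just under 1000 * 1024 bytes is served by the fallback instead of failing with -ENOMEM.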
| Comment by Gerrit Updater [ 08/Dec/17 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30455 |
| Comment by Gerrit Updater [ 17/Dec/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30455/ |
| Comment by Peter Jones [ 17/Dec/17 ] |
|
Landed for 2.11 |
| Comment by Jian Yu [ 01/Mar/18 ] |
|
The failure also occurred on the Lustre b2_10 branch: |
| Comment by Gerrit Updater [ 01/Mar/18 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: https://review.whamcloud.com/31474 |
| Comment by Gerrit Updater [ 13/Mar/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31474/ |