[LU-10331] mds-survey test_1: mds-survey failed Created: 05/Dec/17  Updated: 13/Mar/18  Resolved: 17/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.10.3
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Minor
Reporter: Maloo Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-10446 mds-survey test_1: mdt_out00_001: pag... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e5c93b90-d9cb-11e7-a066-52540065bddc.

Page allocation failed on the MDT:

[35343.175183] mdt_out00_003: page allocation failure: order:4, mode:0x10c050
[35343.176281] CPU: 1 PID: 28124 Comm: mdt_out00_003 Tainted: G           OE  ------------   3.10.0-693.5.2.el7_lustre.x86_64 #1
[35343.177744] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[35343.178422]  000000000010c050 00000000f24a572c ffff8800581b78a8 ffffffff816a3e2d
[35343.179454]  ffff8800581b7938 ffffffff81188820 0000000000000000 ffff88007ffd5000
[35343.180519]  0000000000000004 000000000010c050 ffff8800581b7938 00000000f24a572c
[35343.181512] Call Trace:
[35343.181900]  [<ffffffff816a3e2d>] dump_stack+0x19/0x1b
[35343.182627]  [<ffffffff81188820>] warn_alloc_failed+0x110/0x180
[35343.183324]  [<ffffffff8169fe2a>] __alloc_pages_slowpath+0x6b6/0x724
[35343.184139]  [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
[35343.184959]  [<ffffffff811d1078>] alloc_pages_current+0x98/0x110
[35343.185754]  [<ffffffff8118761e>] __get_free_pages+0xe/0x40
[35343.186483]  [<ffffffff811dca2e>] kmalloc_order_trace+0x2e/0xa0
[35343.187299]  [<ffffffff811e05c1>] __kmalloc+0x211/0x230
[35343.188097]  [<ffffffffc07d6159>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[35343.189223]  [<ffffffffc0a8ed7d>] out_handle+0xa5d/0x1920 [ptlrpc]
[35343.190073]  [<ffffffffc0a1fed2>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
[35343.191040]  [<ffffffffc0a88ca9>] ? tgt_request_preprocess.isra.28+0x299/0x7a0 [ptlrpc]
[35343.192078]  [<ffffffffc0a89ad5>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
[35343.193041]  [<ffffffffc0a2ddee>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[35343.194080]  [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
[35343.194933]  [<ffffffffc0a31592>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[35343.195723]  [<ffffffffc0a30b00>] ? ptlrpc_register_service+0xe80/0xe80 [ptlrpc]
[35343.196727]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[35343.197367]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[35343.198138]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[35343.198875]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40


 Comments   
Comment by John Hammond [ 07/Dec/17 ]

We should be using OBD_ALLOC_LARGE() in out_handle(), since it will try to allocate anything up to OUT_MAXREQSIZE:

#define OUT_MAXREQSIZE  (1000 * 1024)

                for (i = 0; i < update_buf_count; i++, tmp++) {
                        if (tmp->oub_size >= OUT_MAXREQSIZE)
                                GOTO(out_free, rc = err_serious(-EPROTO));

                        OBD_ALLOC(update_bufs[i], tmp->oub_size);
                        if (update_bufs[i] == NULL)
                                GOTO(out_free, rc = err_serious(-ENOMEM));

                        desc->bd_frag_ops->add_iov_frag(desc, update_bufs[i],
                                                        tmp->oub_size);
                }
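
To put the "order:4" in the trace next to OUT_MAXREQSIZE, here is a rough user-space sketch of the page-order arithmetic. It assumes 4 KiB pages (typical for x86_64, not stated in this ticket); PAGE_SIZE_ASSUMED and alloc_order() are hypothetical names for illustration, not Lustre or kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096UL      /* assumption: 4 KiB pages */
#define OUT_MAXREQSIZE (1000 * 1024)  /* value quoted in the comment above */

/*
 * Hypothetical helper: smallest order such that
 * (PAGE_SIZE_ASSUMED << order) >= size, i.e. how many physically
 * contiguous pages (2^order) a kmalloc-style allocation of this
 * size would need.
 */
static unsigned int alloc_order(size_t size)
{
        unsigned int order = 0;
        size_t span = PAGE_SIZE_ASSUMED;

        assert(size > 0);
        while (span < size) {
                span <<= 1;
                order++;
        }
        return order;
}
```

Under that assumption, alloc_order(OUT_MAXREQSIZE) is 8 (256 contiguous pages), and an order-4 request like the one that failed here corresponds to a buffer larger than 32 KiB and at most 64 KiB. Contiguous allocations of that size can fail on a fragmented system, which is presumably why an allocator that does not require physically contiguous pages for large sizes is preferable here.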
Comment by Gerrit Updater [ 08/Dec/17 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30455
Subject: LU-10331 out: use OBD_ALLOC_LARGE() for update buffers
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 29d4f6d094b9bea124523faf3b050c15fb3eff8d

Comment by Gerrit Updater [ 17/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30455/
Subject: LU-10331 out: use OBD_ALLOC_LARGE() for update buffers
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f8987f51967ff5be46fa1d232de21644af927037

Comment by Peter Jones [ 17/Dec/17 ]

Landed for 2.11

Comment by Jian Yu [ 01/Mar/18 ]

The failure also occurred on Lustre b2_10 branch:
https://testing.hpdd.intel.com/test_sets/145987f0-1cfa-11e8-a7cd-52540065bddc

Comment by Gerrit Updater [ 01/Mar/18 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: https://review.whamcloud.com/31474
Subject: LU-10331 out: use OBD_ALLOC_LARGE() for update buffers
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 8bde118e67994678f59a0375fd7d3d530c85632f

Comment by Gerrit Updater [ 13/Mar/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31474/
Subject: LU-10331 out: use OBD_ALLOC_LARGE() for update buffers
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 7c1373e769f300efcd35c16ce193df682ddfa52f

Generated at Sat Feb 10 02:34:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.