[7566373.794723] LNetError: 130593:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7566373.806293] LNetError: 130593:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7566447.980688] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds [7566447.990943] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 257 previous similar messages [7566495.301196] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7566495.313366] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7566507.268172] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7566507.276633] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7566507.285331] Lustre: Skipped 6 previous similar messages [7566674.990260] LNetError: 87044:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7566675.002427] LNetError: 87044:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages [7566776.993255] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583552799/real 1583552799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583552806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7566777.021029] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38268296 previous similar messages [7566786.993366] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7566787.004062] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38278205 previous similar messages [7566976.206861] LNetError: 87044:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7566976.218334] LNetError: 87044:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7567061.987420] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7567061.997683] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 286 previous similar messages [7567096.721833] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7567096.734052] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7567108.259766] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7567108.268241] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7567277.470935] LNetError: 87044:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7567277.483106] LNetError: 87044:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 143 previous similar messages [7567376.999903] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583553399/real 1583553399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583553406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7567377.027670] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38433115 previous similar messages [7567387.002485] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7567387.013186] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38431105 previous similar messages [7567435.181605] Lustre: fir-OST001f: haven't heard from client 3f9feeb7-0792-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c1372b68400, cur 1583553458 expire 1583553308 last 1583553231 [7567435.201694] Lustre: Skipped 5 previous similar messages [7567577.782418] LNetError: 104380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7567577.793979] LNetError: 104380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7567663.994041] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7567664.004300] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 241 previous similar messages [7567699.270477] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7567699.282664] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7567709.290308] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7567709.298767] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7567878.985549] LNetError: 104380:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7567878.997807] LNetError: 104380:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 135 previous similar messages [7567977.006489] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583553999/real 1583553999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583554006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7567977.034330] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 37749082 previous similar messages [7567987.008600] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7567987.019293] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 37712489 previous similar messages [7568180.212290] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7568180.223853] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7568271.000762] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7568271.011106] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 264 previous similar messages [7568300.678094] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7568300.690264] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7568310.280912] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7568310.289379] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7568310.301798] Lustre: Skipped 6 previous similar messages [7568481.486105] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7568481.498359] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages [7568577.013124] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583554599/real 1583554599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583554606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7568577.040918] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38055247 previous similar messages [7568587.015238] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7568587.025972] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38081882 previous similar messages [7568629.119406] LustreError: 8682:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST001f: cli d4be6328-2552-4 claims 4870144 GRANT, real grant 73728 [7568781.673336] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7568781.684944] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7568872.007389] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds [7568872.017651] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 192 previous similar messages [7568903.156761] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7568903.168933] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7568911.376559] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7568911.385058] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7569082.864003] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7569082.876296] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages [7569112.198047] Lustre: fir-OST0019: haven't heard from client 1ecf6944-6593-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c3d952b8000, cur 1583555135 expire 1583554985 last 1583554908 [7569112.223189] Lustre: Skipped 5 previous similar messages [7569177.019745] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583555199/real 1583555199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583555206 ref 2 fl Rpc:eX/2/ffffffff rc 0/-1 [7569177.047518] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38072374 previous similar messages [7569187.021862] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7569187.032559] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38063717 previous similar messages [7569384.026257] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7569384.037824] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7569476.014061] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7569476.024406] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 274 previous similar messages [7569505.494398] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7569505.506567] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7569512.471221] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7569512.479713] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7569685.252512] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7569685.264771] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages [7569777.026283] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583555799/real 1583555799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583555806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7569777.054079] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 37644228 previous similar messages [7569787.028385] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7569787.039083] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 37665797 previous similar messages [7569902.208084] Lustre: fir-OST001f: haven't heard from client fb88818e-b66b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4932268c00, cur 1583555925 expire 1583555775 last 1583555698 [7569902.228191] Lustre: Skipped 5 previous similar messages [7569986.465683] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7569986.477295] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7570096.020812] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds [7570096.031155] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 194 previous similar messages [7570106.936945] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7570106.949118] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7570113.565630] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7570113.574102] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7570113.582809] Lustre: Skipped 6 previous similar messages [7570286.726116] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7570286.738375] LNetError: 117619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 138 previous similar messages [7570377.032907] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583556399/real 1583556399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583556406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7570377.064738] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38341165 previous similar messages [7570387.035014] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7570387.045708] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38332694 previous similar messages [7570587.959439] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7570587.970999] LNetError: 117619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7570697.027449] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7570697.037798] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 297 previous similar messages [7570709.416607] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7570709.428819] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7570714.910592] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7570714.919059] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7570714.927757] Lustre: Skipped 6 previous similar messages [7570888.029574] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7570888.041761] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 155 previous similar messages [7570977.039539] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583556999/real 1583556999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583557006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7570977.067364] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38390891 previous similar messages [7570987.041662] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7570987.052359] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38402243 previous similar messages [7571190.389973] LNetError: 14558:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7571190.401445] LNetError: 14558:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7571307.034315] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7571307.044571] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 259 previous similar messages [7571311.906394] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7571311.918570] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7571315.882374] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7571315.890842] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7571490.610770] LNetError: 14558:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7571490.622970] LNetError: 14558:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages [7571577.046580] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583557599/real 1583557599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583557606 ref 2 fl Rpc:eX/2/ffffffff rc 0/-1 [7571577.074368] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38132939 previous similar messages [7571587.048701] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7571587.059393] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38132930 previous similar messages [7571597.440153] LustreError: 3107:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST001b: cli 9a1654e8-6409-4 claims 466944 GRANT, real grant 0 [7571791.787970] LNetError: 14558:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7571791.799470] LNetError: 14558:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7571913.274457] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7571913.286629] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7571915.041459] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7571915.051721] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 310 previous similar messages [7571916.849240] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7571916.857707] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7572092.974122] LNetError: 14558:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7572092.986294] LNetError: 14558:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 129 previous similar messages [7572177.053446] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583558199/real 1583558199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583558206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7572177.081217] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38111847 previous similar messages [7572187.055557] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7572187.066276] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38107823 previous similar messages [7572394.114037] LNetError: 14558:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7572394.125516] LNetError: 14558:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7572515.575396] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7572515.587568] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7572516.048366] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 1 seconds [7572516.058617] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 315 previous similar messages [7572517.816192] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7572517.824676] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7572559.020831] LustreError: 101264:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0021: cli 00c7f158-cc8b-4 claims 286720 GRANT, real grant 28672 [7572693.050519] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7572693.062692] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages [7572777.060482] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583558799/real 1583558799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583558806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7572777.088259] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38493417 previous similar messages [7572787.062591] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7572787.073322] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38481770 previous similar messages [7572996.339208] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7572996.350680] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7573116.833478] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7573116.845669] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7573118.783303] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7573118.791779] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7573122.055528] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7573122.065870] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 496 previous similar messages [7573297.561740] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7573297.573906] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 155 previous similar messages [7573377.067511] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583559399/real 1583559399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583559406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7573377.095283] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 39317731 previous similar messages [7573387.069623] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7573387.080348] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 39342783 previous similar messages [7573597.664214] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7573597.675811] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7573718.115554] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7573718.127734] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7573719.750274] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7573719.758763] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7573723.062588] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7573723.072931] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 415 previous similar messages [7573898.818754] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7573898.830924] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages [7573977.074619] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583559999/real 1583559999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583560006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7573977.102436] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38438157 previous similar messages [7573987.076738] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7573987.087431] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38408981 previous similar messages [7574199.911507] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7574199.923010] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7574320.717274] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7574320.725735] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7574321.356719] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7574321.368892] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7574328.069766] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds [7574328.080111] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 346 previous similar messages [7574501.030898] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7574501.043070] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages [7574577.081683] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583560599/real 1583560599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583560606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7574577.109477] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38173175 previous similar messages [7574587.083809] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7574587.094545] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38203379 previous similar messages [7574802.133276] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7574802.144781] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7574921.685780] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7574921.694295] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7574922.590776] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7574922.602978] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages [7574940.076960] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7574940.087221] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 386 previous similar messages [7575102.078874] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7575102.091046] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages [7575153.402304] LustreError: 90686:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST001f: cli d4be6328-2552-4 claims 7901184 GRANT, real grant 4870144 [7575177.088749] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583561199/real 1583561199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583561206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7575177.116523] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38953852 previous similar messages [7575187.090866] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7575187.101559] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38946355 previous similar messages [7575404.404555] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7575404.416033] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7575522.779460] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7575522.787928] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7575523.830865] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7575523.843041] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7575541.084026] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7575541.094281] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 421 previous similar messages [7575634.597996] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7575634.610490] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 108 previous similar messages [7575705.088685] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7575705.100861] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages [7575777.095708] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583561799/real 1583561799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583561806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7575777.123557] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 38076882 previous similar messages [7575787.097822] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7575787.108521] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 38024797 previous similar messages [7576005.675253] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7576005.686760] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7576123.874105] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7576123.882571] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7576128.134617] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7576128.146839] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7576142.090743] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds [7576142.101007] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 366 previous similar messages [7576306.092585] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7576306.104759] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages [7576377.102351] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583562399/real 1583562399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583562406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7576377.130124] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 37192275 previous similar messages [7576387.104457] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7576387.115157] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 37039324 previous similar messages [7576608.302345] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7576608.313894] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7576724.968757] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7576724.977308] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7576733.013250] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7576733.025430] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7576744.097333] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds [7576744.107589] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 337 previous similar messages [7576808.912072] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7576907.100073] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7576907.112245] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages [7576977.108738] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583562999/real 1583562999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583563006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7576977.136522] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20390964 previous similar messages [7576987.110830] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7576987.121529] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20201505 previous similar messages [7577209.762239] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7577209.773717] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7577325.935778] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7577325.944343] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7577334.527416] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7577334.539596] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7577345.103493] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds [7577345.113755] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 370 previous similar messages [7577509.105189] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7577509.117360] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages [7577577.114920] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583563599/real 1583563599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583563606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7577577.142698] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17601201 previous similar messages [7577587.116985] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7577587.127686] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17619176 previous similar messages [7577812.277331] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7577812.288812] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7577892.963313] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7577927.028996] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7577927.037483] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7577952.109690] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds [7577952.119947] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 421 previous similar messages [7578025.440799] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7578113.301318] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7578113.313487] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages [7578177.120982] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583564199/real 1583564199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583564206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7578177.148758] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16271501 previous similar messages [7578187.123064] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7578187.133757] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16269642 previous similar messages [7578232.977633] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7578232.989808] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7578322.322531] Lustre: fir-OST0021: haven't heard from client b3370a0f-7bce-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c3feda97c00, cur 1583564345 expire 1583564195 last 1583564118 [7578322.342598] Lustre: Skipped 5 previous similar messages [7578413.845528] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7578413.857008] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7578527.971862] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7578527.980337] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7578557.115802] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7578557.126060] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 417 previous similar messages [7578714.117358] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7578714.129531] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages [7578777.127038] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583564799/real 1583564799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583564806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7578777.154808] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13518004 previous similar messages [7578787.129080] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7578787.139773] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13500825 previous similar messages [7578836.200361] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7578836.212537] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7579016.221403] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7579016.232882] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7579128.960813] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7579128.969272] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7579164.121833] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7579164.132093] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 400 previous similar messages [7579316.839315] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7579316.851512] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages [7579326.846851] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7579377.132905] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583565399/real 1583565399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583565406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7579377.160673] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16381739 previous similar messages [7579387.134997] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7579387.145694] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16426943 previous similar messages [7579437.556539] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7579437.568708] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7579618.527317] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7579618.538909] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7579729.926948] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7579729.935425] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7579768.127932] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 4 seconds [7579768.138274] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 498 previous similar messages [7579917.117561] LNetError: 57632:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7579917.129741] LNetError: 57632:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages [7579977.139165] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583565999/real 1583565999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583566006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7579977.166946] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14980414 previous similar messages [7579987.141253] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7579987.151954] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14964677 previous similar messages [7580039.864910] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7580039.877110] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7580219.786860] LNetError: 87045:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7580219.798368] LNetError: 87045:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7580330.892261] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7580330.900733] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7580371.134361] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds [7580371.144710] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 467 previous similar messages [7580520.762066] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7580520.774238] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7580577.145528] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583566599/real 1583566599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583566606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7580577.173307] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15138157 previous similar messages [7580587.147639] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7580587.158334] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15116719 previous similar messages [7580643.320249] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7580643.332493] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7580822.406161] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7580822.417675] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7580931.859399] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7580931.867882] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7580973.140784] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7580973.151043] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 451 previous similar messages [7581010.220862] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7581122.112311] LNetError: 1155:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7581122.124398] LNetError: 1155:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 242 previous similar messages [7581177.151870] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583567199/real 1583567199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583567206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7581177.179635] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14820897 previous similar messages [7581187.153975] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7581187.164671] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14860906 previous similar messages [7581243.751628] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7581243.763824] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7581423.753480] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7581423.764964] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7581532.825005] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7581532.833878] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7581575.147094] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds [7581575.157353] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 441 previous similar messages [7581723.148681] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7581723.160854] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages [7581777.158250] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583567799/real 1583567799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583567806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7581777.186024] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14698939 previous similar messages [7581787.160360] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7581787.171055] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14624715 previous similar messages [7581849.847076] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7581849.859275] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages [7582025.832071] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7582025.843547] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7582110.500065] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7582133.793214] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7582133.801725] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7582177.153579] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7582177.163926] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 422 previous similar messages [7582327.182375] LNetError: 87045:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7582327.194555] LNetError: 87045:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages [7582377.164729] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583568399/real 1583568399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583568406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7582377.192496] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16095968 previous similar messages [7582387.166863] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7582387.177558] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16211136 previous similar messages [7582628.011577] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7582628.023059] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7582734.886983] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7582734.895444] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7582747.814759] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7582747.826967] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages [7582782.160111] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7582782.170370] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 432 previous similar messages [7582928.161686] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7582928.173856] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages [7582977.171200] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583568999/real 1583568999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583569006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7582977.198974] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14626396 previous similar messages [7582987.173304] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7582987.184006] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14506960 previous similar messages [7583230.059264] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7583230.070832] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7583335.854543] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7583335.863006] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7583352.684253] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7583352.696431] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7583383.166567] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7583383.176828] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 459 previous similar messages [7583531.580347] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7583531.592514] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages [7583577.177732] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583569599/real 1583569599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583569606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7583577.205509] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16684551 previous similar messages [7583587.179759] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7583587.190466] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16703001 previous similar messages [7583832.352717] LNetError: 87045:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7583832.364214] LNetError: 87045:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7583936.947101] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7583936.955576] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7583952.941759] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7583952.953981] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7583988.173103] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds [7583988.183366] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 401 previous similar messages [7584133.045863] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7584133.058074] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages [7584177.184141] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583570199/real 1583570199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583570206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7584177.211918] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16565230 previous similar messages [7584187.186245] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7584187.196937] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16558445 previous similar messages [7584291.605524] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7584434.499246] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7584434.510767] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7584537.915508] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7584537.924162] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7584550.459183] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7584555.201281] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7584555.213463] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7584597.179670] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 5 seconds [7584597.190020] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 434 previous similar messages [7584735.237946] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7584735.250151] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages [7584777.190661] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583570799/real 1583570799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583570806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7584777.218438] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16006273 previous similar messages [7584787.192970] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7584787.203665] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15965583 previous similar messages [7584967.468687] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7585036.414438] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7585036.425930] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7585138.985126] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7585138.993619] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7585157.026748] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7585157.038913] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7585203.186245] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7585203.196592] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 439 previous similar messages [7585337.144994] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7585337.157174] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages [7585377.197130] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583571399/real 1583571399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583571406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7585377.224908] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13511898 previous similar messages [7585387.199260] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7585387.209956] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13553084 previous similar messages [7585638.022239] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7585638.033732] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7585739.975616] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7585739.984077] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7585757.639442] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7585757.651625] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7585805.192783] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7585805.203044] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 458 previous similar messages [7585853.169455] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7585902.983677] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7585937.194233] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7585937.206411] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages [7585977.203904] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583571999/real 1583571999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583572006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7585977.231741] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15340036 previous similar messages [7585987.205954] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7585987.216898] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15276199 previous similar messages [7586240.497626] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7586240.509097] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7586340.943248] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7586340.951838] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7586361.375848] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7586361.388023] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7586408.199321] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7586408.209585] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 371 previous similar messages [7586539.200744] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7586539.212910] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages [7586577.210173] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583572599/real 1583572599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583572606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7586577.237948] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15791025 previous similar messages [7586587.212250] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7586587.222975] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15951678 previous similar messages [7586841.994191] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7586842.005703] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7586942.035822] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7586942.044294] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7586963.678391] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7586963.690606] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7587013.205895] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7587013.216154] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 486 previous similar messages [7587140.207277] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7587140.219443] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages [7587177.216663] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583573199/real 1583573199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583573206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7587177.244435] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16826306 previous similar messages [7587187.218774] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7587187.229466] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16777712 previous similar messages [7587444.280810] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7587444.292322] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7587542.979263] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7587542.987723] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7587563.888872] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7587563.901036] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7587619.212445] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 2 seconds [7587619.222795] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 460 previous similar messages [7587743.213776] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7587743.225949] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 228 previous similar messages [7587777.223137] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583573799/real 1583573799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583573806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7587777.250917] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17123156 previous similar messages [7587787.225246] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7587787.235945] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17127782 previous similar messages [7587979.850331] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7588046.489302] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7588046.500783] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7588143.968819] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7588143.977291] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7588167.108405] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7588167.120571] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7588224.218996] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7588224.229254] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages [7588347.069824] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7588347.081994] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages [7588377.229711] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583574399/real 1583574399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583574406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7588377.257485] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16850404 previous similar messages [7588387.231751] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7588387.242455] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16846744 previous similar messages [7588648.582678] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7588648.594162] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7588739.819462] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7588744.937274] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7588744.945746] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7588769.293932] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7588769.306115] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7588825.225506] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7588825.235768] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 420 previous similar messages [7588947.226847] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7588947.239031] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages [7588977.236152] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583574999/real 1583574999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583575006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7588977.263929] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15295938 previous similar messages [7588987.238280] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7588987.248975] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15287783 previous similar messages [7589250.739267] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7589250.750743] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7589346.030809] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7589346.039271] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7589374.287419] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7589374.299594] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7589428.231975] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7589428.242233] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 400 previous similar messages [7589548.233260] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7589548.245438] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 251 previous similar messages [7589555.146262] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7589577.242581] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583575599/real 1583575599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583575606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7589577.270355] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16063839 previous similar messages [7589587.244675] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7589587.255371] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16065943 previous similar messages [7589852.738519] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7589852.750104] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7589947.125278] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7589947.133795] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7589974.432901] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7589974.445074] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7590033.238527] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7590033.248872] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 464 previous similar messages [7590153.272661] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7590153.284837] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 245 previous similar messages [7590177.249061] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583576199/real 1583576199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583576206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7590177.276826] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15942160 previous similar messages [7590187.251179] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7590187.261871] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15952569 previous similar messages [7590453.891294] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7590453.902772] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7590548.219678] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7590548.228140] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7590575.430446] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7590575.442672] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7590638.245044] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 1 seconds [7590638.255303] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 502 previous similar messages [7590755.394384] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7590755.406556] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7590777.255551] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583576799/real 1583576799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583576806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7590777.283427] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15941135 previous similar messages [7590787.257673] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7590787.268374] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15875105 previous similar messages [7591055.874698] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7591055.886197] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7591149.186185] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7591149.194649] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7591178.689976] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7591178.702145] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7591243.251577] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 4 seconds [7591243.261917] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 535 previous similar messages [7591306.573690] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7591357.013949] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7591357.026117] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 257 previous similar messages [7591377.262046] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583577399/real 1583577399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583577406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7591377.289821] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15594866 previous similar messages [7591387.264111] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7591387.274805] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15632574 previous similar messages [7591387.469708] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7591658.231154] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7591658.242631] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7591750.152585] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7591750.161062] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7591779.839356] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7591779.851610] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7591848.258082] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7591848.268346] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 553 previous similar messages [7591957.259262] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7591957.271440] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 248 previous similar messages [7591977.268486] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583577999/real 1583577999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583578006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7591977.296264] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15528520 previous similar messages [7591987.276831] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7591987.287537] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15504046 previous similar messages [7592260.488602] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7592260.500079] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7592351.119160] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7592351.127655] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7592384.091886] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7592384.104108] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7592454.264638] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7592454.274900] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 495 previous similar messages [7592558.265778] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7592558.277962] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages [7592577.274976] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583578599/real 1583578599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583578606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7592577.302748] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15317657 previous similar messages [7592587.283053] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7592587.293750] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15339474 previous similar messages [7592862.691064] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7592862.702573] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7592952.086721] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7592952.095213] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7592984.400366] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7592984.412535] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7593057.271139] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds [7593057.281402] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages [7593163.418180] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7593163.430355] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages [7593177.281485] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583579199/real 1583579199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583579206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7593177.309285] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14467175 previous similar messages [7593187.289542] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7593187.300237] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14479290 previous similar messages [7593464.367874] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7593464.379447] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7593590.116933] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7593590.129154] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7593663.277694] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds [7593663.288035] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 329 previous similar messages [7593675.545283] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7593675.553790] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7593700.922346] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7593764.278766] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7593764.290952] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages [7593777.287910] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583579799/real 1583579799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583579806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7593777.315687] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14438236 previous similar messages [7593787.297216] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7593787.307915] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14402092 previous similar messages [7594066.547968] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7594066.559452] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7594268.284189] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7594268.294445] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 354 previous similar messages [7594277.580668] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7594277.589137] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7594367.345191] LNetError: 111738:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7594367.357452] LNetError: 111738:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages [7594377.294422] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583580399/real 1583580399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583580406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7594377.322195] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15572716 previous similar messages [7594387.304475] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7594387.315173] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15676715 previous similar messages [7594488.002610] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7594488.014785] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages [7594667.907448] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7594667.918933] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7594876.290758] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7594876.301016] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 416 previous similar messages [7594878.634459] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7594878.642929] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7594968.291756] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7594968.303935] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages [7594977.300882] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583580999/real 1583580999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583581006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7594977.328657] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15061100 previous similar messages [7594987.311135] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7594987.321839] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14965083 previous similar messages [7595090.377275] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7595090.389449] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7595270.435478] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7595270.446961] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7595480.426936] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7595480.435419] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7595482.297291] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds [7595482.307548] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 400 previous similar messages [7595570.936532] LNetError: 111738:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7595570.948796] LNetError: 111738:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages [7595577.307323] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583581599/real 1583581599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583581606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7595577.335107] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14424761 previous similar messages [7595587.317490] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7595587.328191] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14401026 previous similar messages [7595618.724319] LustreError: 83173:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST001d: cli 1ccff414-1582-4 claims 8421376 GRANT, real grant 0 [7595690.414593] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7595690.426784] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7595872.419781] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7595872.431349] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7596081.463342] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7596081.471809] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7596088.303823] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7596088.314079] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 445 previous similar messages [7596172.304737] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7596172.316910] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 237 previous similar messages [7596177.313771] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583582199/real 1583582199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583582206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7596177.341548] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13835543 previous similar messages [7596187.323880] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7596187.334576] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13864122 previous similar messages [7596293.942110] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7596293.954326] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7596474.249008] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7596474.260584] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7596683.439997] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7596683.448576] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7596693.310358] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds [7596693.320623] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 533 previous similar messages [7596775.722668] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7596775.734839] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages [7596777.320251] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583582799/real 1583582799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583582806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7596777.348022] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15016085 previous similar messages [7596782.974132] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7596787.330371] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7596787.341063] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15043806 previous similar messages [7596894.330597] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7596894.342781] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7596994.649527] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7597076.400453] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7597076.411934] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7597284.444554] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7597284.453118] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7597294.316840] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 2 seconds [7597294.327188] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 455 previous similar messages [7597376.317733] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7597376.329920] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages [7597377.326725] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583583399/real 1583583399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583583406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7597377.354504] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16344168 previous similar messages [7597387.336849] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7597387.347550] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16267281 previous similar messages [7597498.643103] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7597498.655281] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7597678.845158] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7597678.856647] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7597884.884046] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7597884.892597] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7597899.323395] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7597899.333650] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 514 previous similar messages [7597977.333267] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583583999/real 1583583999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583584006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7597977.361125] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14198055 previous similar messages [7597979.564332] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7597979.576518] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 231 previous similar messages [7597987.343404] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7597987.354102] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14200528 previous similar messages [7598100.136733] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7598100.148910] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7598280.219006] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7598280.230582] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7598486.456740] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7598486.465279] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7598505.330267] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7598505.340525] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 385 previous similar messages [7598577.340084] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583584599/real 1583584599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583584606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7598577.367865] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15288158 previous similar messages [7598581.331121] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7598581.343298] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages [7598587.350161] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7598587.360852] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15395525 previous similar messages [7598702.338497] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7598702.350678] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7598882.228432] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7598882.239917] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7599087.448325] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7599087.456799] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7599113.337106] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 5 seconds [7599113.347369] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 412 previous similar messages [7599177.346903] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583585199/real 1583585199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583585206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7599177.374680] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15141667 previous similar messages [7599183.270150] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7599183.282319] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages [7599187.356927] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7599187.367624] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15071198 previous similar messages [7599304.192275] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7599304.204448] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7599484.329527] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7599484.341094] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7599688.413966] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7599688.422563] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7599719.343838] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds [7599719.354098] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 459 previous similar messages [7599777.353479] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583585799/real 1583585799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583585806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7599777.381335] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15719024 previous similar messages [7599783.344550] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7599783.356727] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages [7599787.363741] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7599787.374442] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15756097 previous similar messages [7599906.485932] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7599906.498104] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7600086.386116] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7600086.397688] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7600289.381386] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7600289.389859] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7600328.350454] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 5 seconds [7600328.360800] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 462 previous similar messages [7600377.359955] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583586399/real 1583586399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583586406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7600377.387733] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15404172 previous similar messages [7600386.887171] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7600386.899347] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages [7600387.370067] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7600387.380769] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15365889 previous similar messages [7600509.518384] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7600509.530606] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7600688.397451] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7600688.408957] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7600890.348145] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7600890.356617] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7600933.356979] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7600933.367239] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 436 previous similar messages [7600977.366518] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583586999/real 1583586999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583587006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7600977.394293] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15846591 previous similar messages [7600987.357557] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7600987.369738] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages [7600987.380120] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7600987.390821] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15874587 previous similar messages [7601110.964963] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7601110.977141] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7601289.925743] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7601289.937238] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7601491.314647] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7601491.323293] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7601538.363539] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds [7601538.373801] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 463 previous similar messages [7601577.372997] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583587599/real 1583587599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583587606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7601577.400776] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13493640 previous similar messages [7601587.365105] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7601587.377276] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7601587.387880] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7601587.398595] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13437250 previous similar messages [7601591.510806] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7601715.628519] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7601715.640693] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages [7601737.538369] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7601892.465685] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7601892.477293] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7602092.258340] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7602092.266814] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7602149.370199] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7602149.380455] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 503 previous similar messages [7602177.379494] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583588199/real 1583588199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583588206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7602177.407305] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15854759 previous similar messages [7602187.393614] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7602187.404313] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15938467 previous similar messages [7602188.370629] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7602188.382804] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7602261.606051] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7602319.589073] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7602319.601256] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7602494.599129] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7602494.610758] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7602649.223574] LustreError: 120648:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST001d: cli 1ccff414-1582-4 claims 12607488 GRANT, real grant 8421376 [7602693.247667] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7602693.256141] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7602753.376801] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds [7602753.387057] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 480 previous similar messages [7602777.386077] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583588799/real 1583588799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583588806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7602777.414109] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17094726 previous similar messages [7602787.400206] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7602787.410902] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17109465 previous similar messages [7602793.377253] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7602793.389434] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages [7602919.783656] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7602919.795829] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7603096.497809] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7603096.509288] LNetError: 42286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7603294.214392] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7603294.222863] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7603358.383428] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 3 seconds [7603358.393773] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 499 previous similar messages [7603377.392626] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583589399/real 1583589399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583589406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7603377.420397] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16343773 previous similar messages [7603387.406763] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7603387.417465] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16339809 previous similar messages [7603397.860072] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7603397.872245] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 245 previous similar messages [7603521.523241] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7603521.535425] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7603698.348739] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7603698.360231] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7603895.158514] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7603895.167000] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7603959.390037] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7603959.400299] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 503 previous similar messages [7603977.399221] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583589999/real 1583589999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583590006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7603977.427002] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16734746 previous similar messages [7603987.413487] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7603987.424183] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16688965 previous similar messages [7603999.784637] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7603999.796816] LNetError: 42286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages [7604125.426877] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7604125.439057] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7604300.298039] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7604300.309566] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7604401.990812] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7604496.147126] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7604496.155622] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7604561.396679] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7604561.407023] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 509 previous similar messages [7604577.405985] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583590599/real 1583590599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583590606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7604577.433765] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16667583 previous similar messages [7604587.419975] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7604587.430674] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16656577 previous similar messages [7604600.397098] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7604600.409282] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 242 previous similar messages [7604902.503780] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7604902.515281] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7605023.161813] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7605023.173987] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages [7605097.113020] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7605097.121496] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7605163.403234] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds [7605163.413580] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 508 previous similar messages [7605177.412454] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583591199/real 1583591199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583591206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7605177.440232] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16795628 previous similar messages [7605187.426479] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7605187.437174] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16803115 previous similar messages [7605203.129893] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7605203.142082] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 238 previous similar messages [7605504.680801] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7605504.692280] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7605625.307125] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7605625.319299] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7605698.080472] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7605698.089017] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7605768.409678] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 5 seconds [7605768.420025] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 395 previous similar messages [7605777.418779] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583591799/real 1583591799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583591806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7605777.446561] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17707990 previous similar messages [7605787.432886] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7605787.443580] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17738848 previous similar messages [7605803.318078] LNetError: 35537:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7605803.330258] LNetError: 35537:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages [7606106.701341] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7606106.712825] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7606230.358611] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7606230.370793] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages [7606299.046981] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7606299.055454] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7606374.416135] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds [7606374.426480] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 438 previous similar messages [7606377.425169] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583592399/real 1583592399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583592406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7606377.452943] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16821120 previous similar messages [7606387.439260] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7606387.449960] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16803035 previous similar messages [7606404.416472] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7606404.428647] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 236 previous similar messages [7606708.314063] LNetError: 86516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7606708.325554] LNetError: 86516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7606830.928156] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 [7606830.940351] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7606900.012474] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7606900.020965] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7606977.431651] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583592999/real 1583592999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583593006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7606977.459417] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16800247 previous similar messages [7606987.422760] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7606987.433018] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 493 previous similar messages [7606987.445900] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7606987.456597] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16867313 previous similar messages [7607005.422961] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7607005.435143] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 236 previous similar messages [7607310.417331] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7607310.428905] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7607434.604683] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7607434.616867] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7607500.980083] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7607500.988559] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7607577.438214] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583593599/real 1583593599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583593606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7607577.465990] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16124492 previous similar messages [7607587.452311] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7607587.463007] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16062410 previous similar messages [7607588.429343] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7607588.439602] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 520 previous similar messages [7607609.429590] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7607609.441757] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 247 previous similar messages [7607912.892924] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7607912.904424] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7608101.946545] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7608101.955010] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7608177.444753] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583594199/real 1583594199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583594206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7608177.472528] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16745370 previous similar messages [7608187.458880] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7608187.469579] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16720124 previous similar messages [7608193.435925] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds [7608193.446267] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 503 previous similar messages [7608213.373315] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7608213.385498] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages [7608333.941500] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7608333.953669] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7608514.859378] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7608514.870863] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7608702.914058] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7608702.922527] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7608777.451282] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583594799/real 1583594799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583594806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7608777.479060] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17102793 previous similar messages [7608787.465380] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7608787.476076] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17116003 previous similar messages [7608798.442512] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 2 seconds [7608798.452854] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages [7608815.124917] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7608815.137138] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 237 previous similar messages [7608934.638104] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7608934.650274] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7609116.558160] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7609116.569638] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7609304.007642] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7609304.016120] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7609377.457814] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583595399/real 1583595399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583595406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7609377.485587] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17673852 previous similar messages [7609387.471928] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7609387.482622] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17678344 previous similar messages [7609403.449109] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7609403.459366] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 462 previous similar messages [7609415.449250] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7609415.461433] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages [7609538.385670] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7609538.397852] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7609718.339496] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7609718.350976] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7609904.951295] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7609904.959762] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7609977.464471] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583595999/real 1583595999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583596006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7609977.492604] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16789032 previous similar messages [7609987.478490] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7609987.489186] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16800705 previous similar messages [7610008.455718] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7610008.465981] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 521 previous similar messages [7610019.056979] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7610019.069154] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7610138.664222] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7610138.676389] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages [7610320.560247] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7610320.571820] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7610505.940791] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7610505.949270] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7610577.470966] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583596599/real 1583596599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583596606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7610577.498730] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19656852 previous similar messages [7610587.485076] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7610587.495766] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 19726100 previous similar messages [7610613.462378] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7610613.472639] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 513 previous similar messages [7610620.994552] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7610621.006746] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages [7610689.281976] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7610739.496813] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7610739.508981] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7610922.405968] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7610922.417449] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7611106.883383] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7611106.891851] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7611177.477534] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583597199/real 1583597199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583597206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7611177.505310] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 18838151 previous similar messages [7611187.491646] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7611187.502347] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 18793849 previous similar messages [7611218.468991] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7611218.479249] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 509 previous similar messages [7611222.469051] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7611222.481222] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages [7611344.267369] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7611344.279547] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7611524.984463] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7611524.995944] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7611579.867733] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7611707.873811] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7611707.882271] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7611777.484110] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583597799/real 1583597799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583597806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7611777.511887] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16937146 previous similar messages [7611787.498191] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7611787.508884] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16942813 previous similar messages [7611820.475563] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7611820.485818] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 548 previous similar messages [7611825.318916] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7611825.331086] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7611946.877049] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7611946.889218] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7612126.789125] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7612126.800771] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7612308.840403] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7612308.848893] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7612377.490646] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583598399/real 1583598399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583598406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7612377.518417] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 18811493 previous similar messages [7612387.504762] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7612387.515459] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 18838138 previous similar messages [7612423.482156] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7612423.492415] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 512 previous similar messages [7612427.147244] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7612427.159413] LNetError: 23905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages [7612548.767587] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7612548.779761] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7612577.405614] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7612728.711673] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7612728.723214] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7612909.783453] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7612909.791929] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7612975.235434] LustreError: 6891:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST001f: cli d4be6328-2552-4 claims 8626176 GRANT, real grant 7901184 [7612977.497195] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583598999/real 1583598999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583599006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7612977.524977] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 18489993 previous similar messages [7612987.511311] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7612987.522003] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 18389107 previous similar messages [7613028.488742] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7613028.499005] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 445 previous similar messages [7613028.508680] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7613028.520846] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages [7613150.755119] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7613150.767291] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7613330.701136] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7613330.712612] LNetError: 23905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7613510.773316] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7613510.781792] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7613577.503364] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583599599/real 1583599599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583599606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7613577.531132] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 18575054 previous similar messages [7613587.517420] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7613587.528119] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 18670834 previous similar messages [7613631.222278] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7613631.234451] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 228 previous similar messages [7613633.494928] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7613633.505269] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 435 previous similar messages [7613754.787273] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7613754.799442] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7613932.704285] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7613932.715784] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7614111.867609] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7614111.876069] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7614177.509689] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583600199/real 1583600199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583600206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7614177.537467] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16399341 previous similar messages [7614187.523764] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7614187.534464] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16387813 previous similar messages [7614233.009290] LNetError: 118266:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7614233.021555] LNetError: 118266:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages [7614239.501326] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 6 seconds [7614239.511584] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages [7614359.678787] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7614359.690974] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7614534.678743] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7614534.690316] LNetError: 111738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7614712.834522] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7614712.843000] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7614777.516007] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583600799/real 1583600799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583600806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7614777.543786] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19129428 previous similar messages [7614787.530096] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7614787.540795] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 19154962 previous similar messages [7614835.249866] LNetError: 5179:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7614835.261955] LNetError: 5179:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages [7614843.507694] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7614843.517948] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 413 previous similar messages [7614959.837924] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7614959.850159] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages [7615136.707921] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7615136.719402] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7615313.928134] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7615313.936776] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7615377.522235] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583601399/real 1583601399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583601406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7615377.550004] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20084936 previous similar messages [7615387.536329] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7615387.547025] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20064283 previous similar messages [7615437.173039] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7615437.185213] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages [7615448.513979] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7615448.524237] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 467 previous similar messages [7615560.690185] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7615560.702402] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7615738.486325] LNetError: 86516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7615738.497848] LNetError: 86516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7615914.893348] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7615914.901825] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7615977.528487] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583601999/real 1583601999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583602006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7615977.556256] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19572714 previous similar messages [7615987.542592] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7615987.553284] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 19566005 previous similar messages [7616037.520130] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7616037.532298] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 242 previous similar messages [7616053.520290] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7616053.530550] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 573 previous similar messages [7616165.417428] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7616165.429649] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7616309.674436] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7616340.123465] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7616340.134958] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7616515.860104] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7616515.868577] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7616577.534603] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583602599/real 1583602599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583602606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7616577.562377] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19570848 previous similar messages [7616587.548699] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7616587.559393] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 19541216 previous similar messages [7616641.425800] LNetError: 20115:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7616641.437969] LNetError: 20115:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 247 previous similar messages [7616654.526328] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7616654.536591] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 522 previous similar messages [7616768.006504] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7616768.018677] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7616942.826975] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7616942.838367] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7616978.673038] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7617116.826412] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7617116.834876] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7617177.540552] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583603199/real 1583603199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583603206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7617177.568317] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19806647 previous similar messages [7617187.554652] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7617187.565343] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 19865431 previous similar messages [7617222.181013] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7617241.532189] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7617241.544364] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 228 previous similar messages [7617258.532358] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds [7617258.542707] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 435 previous similar messages [7617288.062022] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7617368.924606] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7617368.936798] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7617544.851545] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7617544.862976] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7617717.791286] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7617717.799816] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7617777.546475] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583603799/real 1583603799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583603806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7617777.574243] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 18850394 previous similar messages [7617787.560572] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7617787.571269] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 18855922 previous similar messages [7617845.224400] LNetError: 5179:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7617845.236485] LNetError: 5179:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 244 previous similar messages [7617864.538360] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7617864.548621] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 509 previous similar messages [7617969.884436] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7617969.896605] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7618146.777356] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7618146.788757] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7618275.914903] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7618319.759337] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7618319.767804] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7618377.552752] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583604399/real 1583604399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583604406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7618377.580530] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19557247 previous similar messages [7618387.566774] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7618387.577469] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 19445107 previous similar messages [7618447.221623] LNetError: 5179:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7618447.233705] LNetError: 5179:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages [7618468.544572] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7618468.554828] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 500 previous similar messages [7618570.748672] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7618570.760885] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages [7618748.089837] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7618748.101378] LNetError: 5179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7618921.748660] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7618921.757216] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7618942.400956] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7618977.558938] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583604999/real 1583604999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583605006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7618977.586708] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 19009181 previous similar messages [7618987.573050] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7618987.583743] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 19118339 previous similar messages [7619047.550690] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7619047.562863] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 230 previous similar messages [7619074.550961] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds [7619074.561222] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 506 previous similar messages [7619146.258855] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7619171.133115] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7619171.145294] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7619350.978072] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7619350.989548] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7619523.739303] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7619523.747788] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7619577.565370] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583605599/real 1583605599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583605606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7619577.593136] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17298055 previous similar messages [7619587.579471] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7619587.590166] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17274620 previous similar messages [7619648.557138] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7619648.569305] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages [7619679.557467] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7619679.567723] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 496 previous similar messages [7619774.166577] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7619774.178743] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7619952.134860] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7619952.146345] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7620125.729891] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7620125.738348] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7620177.572010] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583606199/real 1583606199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583606206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7620177.599784] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17233079 previous similar messages [7620187.586121] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7620187.596815] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17200714 previous similar messages [7620249.563808] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7620249.575978] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages [7620280.564156] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7620280.574420] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 436 previous similar messages [7620374.589209] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7620374.601545] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7620422.632430] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7620554.638446] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7620554.649938] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7620691.277930] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7620727.720538] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7620727.729055] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7620777.578721] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583606799/real 1583606799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583606806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7620777.606513] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16742295 previous similar messages [7620787.592752] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7620787.603449] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16662798 previous similar messages [7620853.570433] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7620853.582600] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages [7620884.570784] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds [7620884.581044] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 365 previous similar messages [7620976.186800] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7620976.199021] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7621156.195714] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7621156.207186] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7621326.942238] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7621328.712283] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7621328.720750] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7621377.585184] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583607399/real 1583607399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583607406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7621377.612959] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16611023 previous similar messages [7621387.599338] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7621387.610049] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16671651 previous similar messages [7621457.946194] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7621457.958388] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages [7621493.577446] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7621493.587701] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 349 previous similar messages [7621580.589429] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7621580.601627] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7621648.479333] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7621758.723454] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7621758.734936] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7621929.653660] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7621929.662143] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7621977.591745] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583607999/real 1583607999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583608006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7621977.619513] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 18319720 previous similar messages [7621987.605852] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7621987.616548] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 18334607 previous similar messages [7622058.583645] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7622058.595821] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages [7622104.584142] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7622104.594402] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 340 previous similar messages [7622181.002990] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7622181.015168] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7622361.065892] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7622361.077374] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7622530.620581] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7622530.629112] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7622577.598656] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583608599/real 1583608599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583608606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7622577.626429] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13938125 previous similar messages [7622587.612404] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7622587.623105] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13967184 previous similar messages [7622661.843181] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7622661.855362] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages [7622716.590842] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds [7622716.601097] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 448 previous similar messages [7622718.613973] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7622786.655576] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7622786.667750] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages [7622962.620791] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7622962.632276] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7623131.611169] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7623131.619629] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7623177.605053] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583609199/real 1583609199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583609206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7623177.632831] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 17218000 previous similar messages [7623187.618949] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7623187.629644] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 17145147 previous similar messages [7623188.791819] Lustre: fir-OST001b: haven't heard from client c25a35c7-c47b-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c140b97a800, cur 1583609211 expire 1583609061 last 1583608984 [7623188.811890] Lustre: Skipped 5 previous similar messages [7623208.785315] Lustre: fir-OST001f: haven't heard from client c25a35c7-c47b-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c40752a1c00, cur 1583609231 expire 1583609081 last 1583609004 [7623263.475873] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7623263.488887] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages [7623324.597446] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7623324.607705] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 316 previous similar messages [7623388.227166] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7623388.239416] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7623564.209158] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7623564.220698] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7623732.577729] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7623732.586206] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7623732.595085] Lustre: Skipped 6 previous similar messages [7623769.545697] LustreError: 67869:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST001d: cli 1ccff414-1582-4 claims 16752640 GRANT, real grant 12607488 [7623777.611524] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583609799/real 1583609799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583609806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7623777.639300] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14552367 previous similar messages [7623787.625575] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7623787.636302] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14525129 previous similar messages [7623863.603421] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7623863.615593] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages [7623931.604168] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7623931.614431] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 395 previous similar messages [7623988.878853] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7623988.891060] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages [7624166.983855] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7624166.995335] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7624333.544370] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7624333.552834] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7624377.618320] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583610399/real 1583610399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583610406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7624377.646097] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13783070 previous similar messages [7624387.636262] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7624387.646957] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13768126 previous similar messages [7624463.935810] LNetError: 125876:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7624463.948068] LNetError: 125876:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages [7624533.610843] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7624533.621106] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 394 previous similar messages [7624590.700492] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7624590.712705] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7624768.570472] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7624768.581949] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7624809.873759] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7624934.512865] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7624934.521324] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7624977.624836] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583610999/real 1583610999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583611006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7624977.652664] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15421621 previous similar messages [7624987.643025] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7624987.653728] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15480376 previous similar messages [7625064.616628] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7625064.628802] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages [7625135.617401] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7625135.627661] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 422 previous similar messages [7625195.492091] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7625195.504294] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7625370.555610] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7625370.567091] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7625535.604520] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7625535.612978] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7625577.631188] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583611599/real 1583611599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583611606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7625577.658986] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15259986 previous similar messages [7625587.649307] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7625587.660007] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15281482 previous similar messages [7625671.208440] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7625671.220611] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages [7625743.624019] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7625743.634363] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 332 previous similar messages [7625795.809764] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7625795.821941] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7625973.043653] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7625973.055126] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7626107.892961] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7626136.572137] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7626136.580594] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7626177.637716] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583612199/real 1583612199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583612206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7626177.665480] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16331379 previous similar messages [7626187.655827] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7626187.666524] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16347176 previous similar messages [7626273.638952] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7626273.651144] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages [7626353.630656] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7626353.641002] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 423 previous similar messages [7626574.588270] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7626574.599751] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7626695.358458] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7626695.370625] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7626737.539582] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7626737.548112] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7626777.644331] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583612799/real 1583612799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583612806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7626777.672101] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15614456 previous similar messages [7626787.662646] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7626787.673376] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15593797 previous similar messages [7626875.166672] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7626875.178867] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages [7626886.738177] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7626955.637366] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 6 seconds [7626955.647626] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 356 previous similar messages [7627177.072006] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7627177.083492] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7627297.654151] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7627297.666347] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7627338.633422] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7627338.641920] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7627377.651047] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583613399/real 1583613399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583613406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7627377.678827] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14736387 previous similar messages [7627387.669122] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7627387.679819] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14624823 previous similar messages [7627477.705216] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7627477.717388] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages [7627556.643981] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds [7627556.654242] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 436 previous similar messages [7627650.994241] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7627778.633706] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7627778.645188] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7627893.636174] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7627899.296786] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7627899.308971] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7627939.728040] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7627939.736515] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7627977.657581] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583613999/real 1583613999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583614006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7627977.685360] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16114464 previous similar messages [7627987.675689] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7627987.686388] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16244011 previous similar messages [7628079.302074] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7628079.314259] LNetError: 79739:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages [7628160.650611] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7628160.660870] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 380 previous similar messages [7628380.868994] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7628380.880483] LNetError: 79739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7628501.486401] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7628501.498579] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7628540.822610] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7628540.831069] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7628577.664287] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583614599/real 1583614599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583614606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7628577.692070] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14974429 previous similar messages [7628587.682303] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7628587.692997] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14875165 previous similar messages [7628679.656294] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7628679.668460] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages [7628762.657190] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7628762.667542] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 362 previous similar messages [7628982.571634] LNetError: 55894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7628982.583112] LNetError: 55894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7629104.194896] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7629104.207072] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7629142.421999] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7629142.430480] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7629177.670677] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583615199/real 1583615199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583615206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7629177.698458] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15357985 previous similar messages [7629187.688785] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7629187.699486] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15458480 previous similar messages [7629284.057775] LNetError: 55894:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7629284.069953] LNetError: 55894:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages [7629365.663730] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds [7629365.673993] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 390 previous similar messages [7629584.562200] LNetError: 55894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7629584.573721] LNetError: 55894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7629705.289598] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7629705.301824] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7629743.395883] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7629743.404366] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7629777.677318] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583615799/real 1583615799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583615806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7629777.705086] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15668366 previous similar messages [7629787.695425] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7629787.706118] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15629224 previous similar messages [7629885.235783] LNetError: 55894:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7629885.247955] LNetError: 55894:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 224 previous similar messages [7629983.670620] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7629983.680881] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages [7630186.783998] LNetError: 55894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7630186.795485] LNetError: 55894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7630305.398204] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7630305.410373] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7630344.490463] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7630344.498938] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7630377.683998] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583616399/real 1583616399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583616406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7630377.711777] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16150324 previous similar messages [7630387.702265] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7630387.712956] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16169339 previous similar messages [7630485.676147] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7630485.688324] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 211 previous similar messages [7630593.677319] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7630593.687575] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 473 previous similar messages [7630789.039755] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7630789.051290] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7630909.667909] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7630909.680105] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7630911.979980] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7630945.457111] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7630945.465577] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7630977.690678] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583616999/real 1583616999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583617006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7630977.718457] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16500086 previous similar messages [7630987.708673] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7630987.719365] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16502602 previous similar messages [7631089.668857] LNetError: 87046:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7631089.681068] LNetError: 87046:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages [7631198.684007] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7631198.694262] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 424 previous similar messages [7631268.471099] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7631390.337321] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7631390.348838] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7631512.002649] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7631512.014825] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7631546.398730] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7631546.407722] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7631577.697271] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583617599/real 1583617599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583617606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7631577.725051] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16875931 previous similar messages [7631580.227286] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7631587.715384] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7631587.726078] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16791009 previous similar messages [7631689.689534] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7631689.701704] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages [7631799.690732] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7631799.700994] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 395 previous similar messages [7631992.557110] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7631992.568600] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7632114.192120] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7632114.204299] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7632147.390250] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7632147.398729] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7632177.703805] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583618199/real 1583618199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583618206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7632177.731584] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15241427 previous similar messages [7632187.721895] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7632187.732622] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15301622 previous similar messages [7632293.399260] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7632293.411430] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages [7632410.697340] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds [7632410.707688] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 412 previous similar messages [7632594.593616] LNetError: 87046:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7632594.605114] LNetError: 87046:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7632715.289754] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7632715.301935] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7632748.940679] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7632748.949158] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7632777.710291] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583618799/real 1583618799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583618806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7632777.738062] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15713934 previous similar messages [7632787.728401] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7632787.739097] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15732909 previous similar messages [7632895.327770] LNetError: 78212:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7632895.339944] LNetError: 78212:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages [7633016.703903] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds [7633016.714247] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 404 previous similar messages [7633196.862993] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7633196.874492] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7633317.482199] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7633317.494384] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages [7633349.964305] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7633349.972763] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7633377.716829] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583619399/real 1583619399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583619406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7633377.750904] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15066913 previous similar messages [7633387.734939] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7633387.745637] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15038524 previous similar messages [7633497.540176] LNetError: 78212:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7633497.552352] LNetError: 78212:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages [7633617.710467] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds [7633617.720730] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 430 previous similar messages [7633799.083397] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7633799.094876] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7633921.680818] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7633921.692998] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7633951.066456] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7633951.075032] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7633977.724249] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583619999/real 1583619999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583620006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7633977.752040] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15141686 previous similar messages [7633987.741515] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7633987.752211] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15099164 previous similar messages [7634097.715722] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7634097.727899] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 224 previous similar messages [7634221.717082] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 1 seconds [7634221.727432] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 468 previous similar messages [7634400.720163] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7634400.731646] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7634523.372418] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7634523.384664] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7634552.024226] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7634552.032700] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7634577.729998] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583620599/real 1583620599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583620606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7634577.757776] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15112841 previous similar messages [7634587.748125] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7634587.758826] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15074766 previous similar messages [7634698.722319] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7634698.734496] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages [7634822.723700] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7634822.733960] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 411 previous similar messages [7635003.123603] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7635003.135080] LNetError: 78212:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7635125.819994] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7635125.832215] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7635154.053488] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7635154.061946] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7635177.736547] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583621199/real 1583621199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583621206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7635177.764325] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14982950 previous similar messages [7635187.754625] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7635187.765318] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15148127 previous similar messages [7635217.630837] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7635249.618307] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7635303.728920] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7635303.741087] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages [7635424.730209] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds [7635424.740557] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 428 previous similar messages [7635604.386107] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7635604.397701] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7635726.267435] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7635726.279608] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7635755.110193] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7635755.118905] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7635777.742976] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583621799/real 1583621799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583621806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7635777.770741] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16286701 previous similar messages [7635787.761043] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7635787.771744] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16246832 previous similar messages [7635810.899412] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7635904.355321] LNetError: 85500:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7635904.367501] LNetError: 85500:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages [7636026.736604] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds [7636026.746862] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 488 previous similar messages [7636206.994714] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7636207.006539] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7636332.631942] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7636332.644132] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages [7636356.146714] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7636356.155178] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7636377.749398] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583622399/real 1583622399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583622406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7636377.777170] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15121847 previous similar messages [7636387.767480] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7636387.778178] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15120835 previous similar messages [7636505.741787] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7636505.753954] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages [7636629.743132] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds [7636629.753387] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 403 previous similar messages [7636809.142190] LNetError: 6316:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7636809.153603] LNetError: 6316:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7636957.170598] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7636957.179059] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7636977.756033] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583622999/real 1583622999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583623006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7636977.783816] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15742987 previous similar messages [7636987.773966] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7636987.784664] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15738257 previous similar messages [7637058.104239] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7637109.973424] LNetError: 6316:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7637109.985523] LNetError: 6316:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages [7637230.605641] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7637230.617818] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7637235.749573] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7637235.759832] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 339 previous similar messages [7637410.589375] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7637410.600876] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7637558.949403] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7637558.957873] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7637577.762394] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583623599/real 1583623599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583623606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7637577.790168] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15876615 previous similar messages [7637587.780377] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7637587.791069] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 15848873 previous similar messages [7637600.863246] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7637712.255688] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7637712.267907] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages [7637833.073156] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7637833.085329] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7637858.756279] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7637858.766539] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 304 previous similar messages [7638012.715064] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7638012.726541] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7638039.872725] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7638098.643147] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7638160.998720] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7638161.007199] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7638177.768774] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583624199/real 1583624199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583624206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7638177.796552] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13577240 previous similar messages [7638187.786753] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7638187.797450] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13520835 previous similar messages [7638313.705419] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7638313.717649] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages [7638435.273517] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7638435.285681] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7638459.762687] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 8 seconds [7638459.773036] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 185 previous similar messages [7638614.563483] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7638614.574964] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7638762.628146] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7638762.636625] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7638777.775071] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583624799/real 1583624799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583624806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7638777.802862] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14005838 previous similar messages [7638787.793183] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7638787.803879] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14067485 previous similar messages [7638913.767535] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7638913.779704] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages [7639037.218001] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7639037.230184] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7639063.769147] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7639063.779404] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 212 previous similar messages [7639216.787405] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7639216.798970] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7639363.571010] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7639363.579998] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7639377.781493] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583625399/real 1583625399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583625406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7639377.809445] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 15020766 previous similar messages [7639387.799607] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7639387.810304] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 14991916 previous similar messages [7639517.307519] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7639517.319695] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages [7639640.197330] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7639640.209612] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7639669.775626] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7639669.785887] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 133 previous similar messages [7639818.934490] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7639818.945972] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7639964.564042] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7639964.572530] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7639977.787933] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583625999/real 1583625999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583626006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7639977.815708] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13831003 previous similar messages [7639987.806174] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7639987.816888] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 13846445 previous similar messages [7640119.685713] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7640119.697883] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages [7640240.745833] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7640240.758009] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7640283.782365] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7640283.792619] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 148 previous similar messages [7640364.161375] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7640421.264888] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7640421.276364] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7640555.292345] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7640565.656404] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7640565.664908] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7640577.794455] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583626599/real 1583626599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583626606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7640577.822334] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 16464396 previous similar messages [7640587.812559] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7640587.823462] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 16506173 previous similar messages [7640721.601189] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7640721.613372] LNetError: 87043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 125 previous similar messages [7640845.536347] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7640845.548535] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7640899.788931] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7640899.799188] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 155 previous similar messages [7641022.716459] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7641022.727960] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7641166.752671] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7641166.761162] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7641177.800968] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583627199/real 1583627199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583627206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7641177.828745] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20125261 previous similar messages [7641187.819079] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7641187.829772] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20193310 previous similar messages [7641324.106539] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7641324.118726] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 142 previous similar messages [7641418.114918] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7641445.894964] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7641445.907145] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7641505.795585] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7641505.805845] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 171 previous similar messages [7641624.963131] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7641624.974699] LNetError: 87043:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7641767.845738] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7641767.854202] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7641777.807596] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583627799/real 1583627799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583627806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7641777.835635] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20661535 previous similar messages [7641787.825703] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7641787.836394] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20656669 previous similar messages [7641826.558885] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7641925.633502] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7641925.645753] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 135 previous similar messages [7642051.889676] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7642051.901889] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7642107.802274] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7642107.812537] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 146 previous similar messages [7642226.553780] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7642226.565260] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7642369.058340] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7642369.066834] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7642377.814274] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583628399/real 1583628399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583628406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7642377.842108] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21513991 previous similar messages [7642387.832354] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7642387.843053] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21479893 previous similar messages [7642528.247156] LNetError: 20115:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7642528.259364] LNetError: 20115:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages [7642652.490348] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7642652.502544] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7642708.808890] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7642708.819152] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 163 previous similar messages [7642829.045435] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7642829.056914] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7642970.369994] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7642970.378481] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7642977.820838] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583628999/real 1583628999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583629006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7642977.848608] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21377371 previous similar messages [7642987.838969] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7642987.849665] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21403851 previous similar messages [7643129.433687] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7643129.445872] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 148 previous similar messages [7643254.063935] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7643254.076118] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7643316.815550] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7643316.825897] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 279 previous similar messages [7643430.991009] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7643431.002505] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7643553.906790] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7643571.385483] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7643571.393997] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7643577.827449] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583629599/real 1583629599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583629606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7643577.855222] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21176143 previous similar messages [7643587.845559] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7643587.856256] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21163125 previous similar messages [7643729.808148] LNetError: 99450:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7643729.820326] LNetError: 99450:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages [7643855.828554] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7643855.840723] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7643918.822178] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7643918.832436] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 294 previous similar messages [7644032.664594] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7644032.676083] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7644172.328892] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7644172.337472] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7644177.834003] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583630199/real 1583630199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583630206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7644177.861769] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20702196 previous similar messages [7644187.852148] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7644187.862848] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20713450 previous similar messages [7644271.055374] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7644333.472856] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7644333.485047] LNetError: 31987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages [7644460.182114] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7644460.194318] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7644521.828762] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7644521.839020] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 319 previous similar messages [7644634.997146] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7644635.008635] LNetError: 20115:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7644773.320439] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7644773.328930] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7644777.840592] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583630799/real 1583630799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583630806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7644777.868368] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20865522 previous similar messages [7644787.858667] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7644787.869367] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20878979 previous similar messages [7644933.833275] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7644933.845451] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages [7645061.892732] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7645061.904936] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7645127.835410] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7645127.845663] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 361 previous similar messages [7645209.896190] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7645236.752816] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7645236.764345] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7645374.413516] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7645374.422066] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7645377.847204] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583631399/real 1583631399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583631406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7645377.874979] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20875349 previous similar messages [7645387.865276] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7645387.875973] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20857886 previous similar messages [7645535.839936] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7645535.852107] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages [7645729.842075] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7645729.852332] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 334 previous similar messages [7645768.817248] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7645838.459166] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7645838.470645] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7645955.951581] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7645955.963760] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages [7645975.381175] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7645975.389707] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7645977.853818] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583631999/real 1583631999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583632006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7645977.881587] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21045240 previous similar messages [7645987.871908] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7645987.882604] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21053256 previous similar messages [7646138.846586] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7646138.858760] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 211 previous similar messages [7646214.104397] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7646331.848702] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 1 seconds [7646331.859046] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 414 previous similar messages [7646353.569316] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7646440.773071] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7646440.784659] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7646560.220324] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7646560.232497] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7646576.475965] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7646576.484439] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7646577.860438] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583632599/real 1583632599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583632606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7646577.888207] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21237457 previous similar messages [7646587.878563] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7646587.889258] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21239888 previous similar messages [7646739.853315] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7646739.865484] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages [7646935.855513] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7646935.865769] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 395 previous similar messages [7646993.205064] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7647042.577296] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7647042.588999] LNetError: 31987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7647161.068059] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7647161.080229] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7647177.569768] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7647177.578569] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7647177.867218] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583633199/real 1583633199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583633206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7647177.894988] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20559961 previous similar messages [7647187.885316] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7647187.896011] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20555921 previous similar messages [7647341.860039] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7647341.872219] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages [7647545.862325] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds [7647545.872583] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 413 previous similar messages [7647645.043623] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7647645.055117] LNetError: 42281:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7647765.494988] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7647765.507165] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7647777.873980] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583633799/real 1583633799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583633806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7647777.901750] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21160141 previous similar messages [7647778.537502] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7647778.546009] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7647787.892073] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7647787.902765] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21169862 previous similar messages [7647946.235980] LNetError: 20115:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7647946.248194] LNetError: 20115:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages [7648146.869125] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds [7648146.879381] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 473 previous similar messages [7648247.358379] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7648247.369865] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7648365.823716] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7648365.835903] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7648370.406648] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7648377.880763] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583634399/real 1583634399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583634406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7648377.908544] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21194678 previous similar messages [7648379.631254] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7648379.639739] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7648387.898854] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7648387.909550] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21173490 previous similar messages [7648546.873657] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7648546.885828] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages [7648749.875935] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds [7648749.886198] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 447 previous similar messages [7648848.798180] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7648848.809679] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7648970.322475] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7648970.334665] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7648977.887472] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583634999/real 1583634999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583635006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7648977.915247] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21181755 previous similar messages [7648980.598050] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7648980.607033] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7648987.905575] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7648987.916313] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21208667 previous similar messages [7649148.880329] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7649148.892505] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages [7649352.882549] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds [7649352.892892] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 450 previous similar messages [7649451.137162] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7649451.148673] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7649570.590917] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7649570.603126] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7649577.893958] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583635599/real 1583635599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583635606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7649577.921730] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21335156 previous similar messages [7649581.564541] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7649581.573200] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7649587.912034] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7649587.922749] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21320544 previous similar messages [7649749.886718] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7649749.898899] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages [7649954.888787] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 5 seconds [7649954.899047] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 450 previous similar messages [7650052.513079] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7650052.524560] LNetError: 32348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7650171.013062] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7650171.025246] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7650177.900072] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583636199/real 1583636199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583636206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7650177.927845] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20816775 previous similar messages [7650182.658640] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7650182.667143] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7650187.918193] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7650187.928891] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20816867 previous similar messages [7650281.879185] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7650353.773208] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7650353.785387] LNetError: 42281:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages [7650555.894903] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds [7650555.905246] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 429 previous similar messages [7650612.600735] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7650639.299589] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7650654.903569] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7650654.915059] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7650721.996295] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7650775.405051] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7650775.417328] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7650777.906054] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583636799/real 1583636799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583636806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7650777.933866] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20709833 previous similar messages [7650783.624380] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7650783.633053] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7650787.924187] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7650787.934887] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20694058 previous similar messages [7650795.354862] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7650955.898827] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7650955.910999] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages [7651156.900820] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds [7651156.911168] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 393 previous similar messages [7651257.341287] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7651257.352877] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7651375.784119] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7651375.796298] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7651377.912065] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583637399/real 1583637399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583637406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7651377.939831] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20622937 previous similar messages [7651384.590449] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7651384.598945] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7651387.930174] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7651387.940867] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20636026 previous similar messages [7651555.905985] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7651555.918153] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages [7651759.907239] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds [7651759.917494] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 563 previous similar messages [7651858.471554] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7651858.483209] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7651975.917639] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7651975.929807] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 8 previous similar messages [7651977.918621] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583637999/real 1583637999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583638006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7651977.946393] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20708768 previous similar messages [7651985.555851] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7651985.564386] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7651987.936706] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7651987.947401] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20714626 previous similar messages [7652159.596811] LNetError: 61050:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7652159.608987] LNetError: 61050:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 255 previous similar messages [7652360.913766] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7652360.924018] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 548 previous similar messages [7652422.318197] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7652460.774353] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7652460.785831] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7652577.925150] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583638599/real 1583638599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583638606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7652577.952919] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21024269 previous similar messages [7652580.202327] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7652580.214514] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages [7652586.524389] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7652586.532853] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7652587.943285] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7652587.953981] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21031992 previous similar messages [7652759.890247] LNetError: 63355:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7652759.902438] LNetError: 63355:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages [7652895.821532] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7652961.920452] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7652961.930710] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 491 previous similar messages [7653063.095705] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7653063.107186] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7653177.931868] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583639199/real 1583639199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583639206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7653177.959639] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20863955 previous similar messages [7653180.555934] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7653180.568159] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7653187.530067] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7653187.538557] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7653187.949974] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7653187.960675] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20849883 previous similar messages [7653359.924877] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7653359.937058] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7653564.927143] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 8 seconds [7653564.937402] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 518 previous similar messages [7653664.510226] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7653664.521711] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7653777.938498] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583639799/real 1583639799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583639806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7653777.966268] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20805262 previous similar messages [7653786.026709] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7653786.038910] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7653787.956604] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7653787.967301] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20799068 previous similar messages [7653788.583790] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7653788.592267] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7653960.931545] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7653960.943733] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 228 previous similar messages [7654167.933850] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7654167.944197] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 444 previous similar messages [7654266.970350] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7654266.982010] LNetError: 61050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7654377.945173] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583640399/real 1583640399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583640406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7654377.972945] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20378189 previous similar messages [7654387.963312] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7654387.974002] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20397010 previous similar messages [7654389.528395] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7654389.536882] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7654391.510411] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7654391.522591] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7654564.938280] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7654564.950453] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages [7654768.940529] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 1 seconds [7654768.950784] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 470 previous similar messages [7654869.246556] LNetError: 107965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7654869.258177] LNetError: 107965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7654977.951823] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583640999/real 1583640999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583641006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7654977.979598] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20163084 previous similar messages [7654987.969915] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7654987.980610] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20161700 previous similar messages [7654990.518735] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7654990.527207] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7654992.722032] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7654992.734204] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7655121.051515] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7655167.944912] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7655167.957089] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 231 previous similar messages [7655369.946726] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds [7655369.957069] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 479 previous similar messages [7655470.517483] LNetError: 58838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7655470.528983] LNetError: 58838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7655577.957727] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583641599/real 1583641599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583641606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7655577.985509] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20264260 previous similar messages [7655587.975807] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7655587.986498] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20249849 previous similar messages [7655591.611898] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7655591.620362] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7655595.016893] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7655595.029075] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7655601.164309] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7655768.950686] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7655768.962863] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages [7655970.952762] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 1 seconds [7655970.963111] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 482 previous similar messages [7656072.781458] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7656072.793051] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7656177.963893] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583642199/real 1583642199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583642206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7656177.991665] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20257637 previous similar messages [7656187.981952] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7656187.992646] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20266456 previous similar messages [7656192.579149] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7656192.587747] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7656197.235069] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7656197.247248] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7656369.848837] LNetError: 125264:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7656369.861105] LNetError: 125264:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages [7656579.959073] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7656579.969330] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 463 previous similar messages [7656674.871103] LNetError: 58838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7656674.882600] LNetError: 58838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7656688.778803] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7656777.970285] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583642799/real 1583642799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583642806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7656777.998062] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20884394 previous similar messages [7656787.988427] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7656787.999120] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20878728 previous similar messages [7656793.673472] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7656793.682125] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7656800.327571] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 [7656800.339750] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7656969.963445] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7656969.975621] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages [7657189.966880] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7657189.977143] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 460 previous similar messages [7657277.060944] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7657277.072513] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7657377.976895] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583643399/real 1583643399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583643406 ref 2 fl Rpc:eX/2/ffffffff rc 0/-1 [7657378.004668] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20899345 previous similar messages [7657387.995031] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7657388.005730] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20907373 previous similar messages [7657394.767245] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7657394.775725] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7657575.970133] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7657575.982306] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages [7657695.554465] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7657695.566643] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 7 previous similar messages [7657790.972497] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 2 seconds [7657790.982756] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 576 previous similar messages [7657879.177551] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7657879.189180] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7657977.983562] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583643999/real 1583643999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583644006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7657978.011334] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21407073 previous similar messages [7657988.001671] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7657988.012367] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21475788 previous similar messages [7657995.862109] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7657995.870584] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7658179.976833] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7658179.989005] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 245 previous similar messages [7658300.620343] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7658300.632520] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7658391.980183] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds [7658391.990443] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 573 previous similar messages [7658481.258113] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7658481.269689] LNetError: 123486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7658577.990251] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583644599/real 1583644599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583644606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7658578.018021] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21522859 previous similar messages [7658588.008360] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7658588.019062] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21460741 previous similar messages [7658596.956843] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7658596.965318] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7658782.295955] LNetError: 123486:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7658782.308219] LNetError: 123486:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 238 previous similar messages [7658902.671974] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7658902.684183] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages [7658993.985883] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds [7658993.996144] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 601 previous similar messages [7659083.322003] LNetError: 59168:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7659083.333489] LNetError: 59168:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7659177.996927] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583645199/real 1583645199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583645206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7659178.024696] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21376713 previous similar messages [7659188.015067] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7659188.025765] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21366392 previous similar messages [7659198.052244] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7659198.060780] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7659382.990225] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7659383.002404] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 237 previous similar messages [7659506.879803] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7659506.891997] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7659594.992577] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds [7659595.002834] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 541 previous similar messages [7659685.537920] LNetError: 58838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7659685.549441] LNetError: 58838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7659778.003620] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583645799/real 1583645799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583645806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7659778.031389] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21051255 previous similar messages [7659788.021733] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7659788.032472] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21026314 previous similar messages [7659799.145193] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7659799.153728] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7659984.608988] LNetError: 20306:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7659984.621154] LNetError: 20306:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 241 previous similar messages [7660112.137464] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 [7660112.149667] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7660198.999274] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds [7660199.009617] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 561 previous similar messages [7660287.791686] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7660287.803233] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7660378.010102] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583646399/real 1583646399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583646406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7660378.037870] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20533792 previous similar messages [7660388.028203] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7660388.038905] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20559519 previous similar messages [7660400.113626] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7660400.122143] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7660584.823207] LNetError: 32309:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7660584.835433] LNetError: 32309:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 228 previous similar messages [7660805.005447] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds [7660805.015705] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 524 previous similar messages [7660839.132558] LNetError: 80403:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7660889.976845] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7660889.988338] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7660978.016263] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583646999/real 1583646999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583647006 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7660978.044037] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20796027 previous similar messages [7660988.034365] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7660988.045062] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20791806 previous similar messages [7661001.205875] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7661001.214458] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7661011.376739] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7661011.388920] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7661190.009444] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7661190.021624] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages [7661409.011710] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds [7661409.021969] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 630 previous similar messages [7661491.990011] LNetError: 86516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7661492.001489] LNetError: 86516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7661578.022472] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583647599/real 1583647599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583647606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7661578.050246] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21218565 previous similar messages [7661588.040539] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7661588.051240] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21226705 previous similar messages [7661603.173017] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7661603.181495] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7661612.379873] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7661612.392061] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages [7661791.015581] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7661791.027749] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages [7661911.208791] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7662015.018855] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7662015.029111] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 554 previous similar messages [7662093.972674] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7662093.984159] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7662178.028532] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583648199/real 1583648199] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583648206 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7662178.056298] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20893819 previous similar messages [7662188.046624] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7662188.057344] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20895297 previous similar messages [7662205.226957] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7662205.235473] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7662214.359987] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7662214.372196] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages [7662270.481510] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7662391.022751] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7662391.034934] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages [7662617.024245] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds [7662617.034504] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 580 previous similar messages [7662695.997049] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7662696.008530] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7662778.035053] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583648799/real 1583648799] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583648806 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7662778.062822] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21056487 previous similar messages [7662788.053175] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7662788.063875] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21053738 previous similar messages [7662806.280648] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7662806.289120] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7662815.377582] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 [7662815.389762] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7662916.759459] LNetError: 80404:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7662991.029458] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7662991.041632] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 253 previous similar messages [7663218.030955] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7663218.041216] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 617 previous similar messages [7663297.960257] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7663297.971740] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7663378.041749] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583649399/real 1583649399] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583649406 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7663378.069518] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21394994 previous similar messages [7663388.059845] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7663388.070545] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21404069 previous similar messages [7663408.248738] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7663408.257214] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7663421.351358] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7663421.363537] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 8 previous similar messages [7663546.472331] LNetError: 80402:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7663594.035131] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7663594.047300] LNetError: 80392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 245 previous similar messages [7663819.037618] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds [7663819.047878] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 618 previous similar messages [7663869.765196] LNetError: 80409:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.224@o2ib7 added to recovery queue. Health = 900 [7663869.778339] LNetError: 80409:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 2 previous similar messages [7663869.789366] LustreError: 2946:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(4194304) req@ffff9c11a9588850 x1659415032616320/t0(0) o3->1ccff414-1582-4@10.50.5.29@o2ib2:426/0 lens 488/440 e 0 to 0 dl 1583649921 ref 1 fl Interpret:/0/0 rc 0/0 [7663869.812316] LustreError: 2946:0:(ldlm_lib.c:3271:target_bulk_io()) Skipped 3 previous similar messages [7663869.822100] Lustre: fir-OST001d: Bulk IO read error with 1ccff414-1582-4 (at 10.50.5.29@o2ib2), client will retry: rc -110 [7663869.833329] Lustre: Skipped 3 previous similar messages [7663899.883683] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7663899.895169] LNetError: 42002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7663978.048351] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583649999/real 1583649999] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583650006 ref 2 fl Rpc:eX/2/ffffffff rc 0/-1 [7663978.076127] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 21840089 previous similar messages [7663988.066479] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7663988.077177] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 21832206 previous similar messages [7664010.239079] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7664010.247541] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7664023.339908] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 [7664023.352107] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages [7664076.236280] Lustre: fir-OST0023: haven't heard from client 1ccff414-1582-4 (at 10.50.5.29@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c3fedacb400, cur 1583650098 expire 1583649948 last 1583649871 [7664076.256343] Lustre: Skipped 4 previous similar messages [7664078.234762] Lustre: fir-OST0019: haven't heard from client 1ccff414-1582-4 (at 10.50.5.29@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4575786800, cur 1583650100 expire 1583649950 last 1583649873 [7664078.255135] Lustre: Skipped 1 previous similar message [7664080.238751] Lustre: fir-OST001f: haven't heard from client 1ccff414-1582-4 (at 10.50.5.29@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c3fcf7f7c00, cur 1583650102 expire 1583649952 last 1583649875 [7664083.229587] Lustre: fir-OST0021: haven't heard from client 1ccff414-1582-4 (at 10.50.5.29@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c3fedac9800, cur 1583650105 expire 1583649955 last 1583649878 [7664195.021798] LNetError: 94959:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.106@o2ib7 added to recovery queue. Health = 900 [7664195.033974] LNetError: 94959:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages [7664280.905605] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7664342.420486] LNetError: 80401:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) [7664425.044298] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds [7664425.054559] LNet: 80392:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 603 previous similar messages [7664502.327548] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error [7664502.339070] LNetError: 84226:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 55 previous similar messages [7664578.055010] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1583650599/real 1583650599] req@ffff9c2e05194380 x1652574281110560/t0(0) o106->fir-OST0019@10.9.0.63@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1583650606 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 [7664578.082806] Lustre: 124006:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 20170191 previous similar messages [7664588.073130] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.0.63@o2ib4 from <?> [7664588.083829] LNetError: 80427:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 20095893 previous similar messages [7664612.229893] Lustre: fir-OST0019: Client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) reconnecting [7664612.238464] Lustre: fir-OST0019: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) [7664612.247241] Lustre: Skipped 6 previous similar messages [7664627.821583] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.235@o2ib7: -125 [7664627.833810] LNetError: 80409:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages [7664653.864670] ll_ost_io03_079 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664653.873113] ll_ost_io03_079 cpuset=/ mems_allowed=3 [7664653.878176] CPU: 15 PID: 90700 Comm: ll_ost_io03_079 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664653.891553] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664653.899386] Call Trace: [7664653.902017] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664653.907335] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664653.912820] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664653.918655] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664653.924410] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664653.930421] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664653.936783] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664653.942882] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664653.948636] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664653.955170] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664653.961705] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664653.967894] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664653.973908] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664653.980020] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664653.986908] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664653.993066] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664654.000159] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664654.006729] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664654.013384] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664654.020741] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664654.027489] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664654.034836] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664654.042367] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664654.049294] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664654.056387] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664654.064149] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664654.071411] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664654.079284] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664654.086258] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664654.091703] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664654.098183] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664654.105756] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664654.110818] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664654.117094] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664654.123713] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664654.129985] Mem-Info: [7664654.132457] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33364 inactive_file:35739 isolated_file:1504 unevictable:9044 dirty:0 writeback:8 unstable:0 slab_reclaimable:824017 slab_unreclaimable:62296359 mapped:1720 shmem:0 pagetables:2953 bounce:0 free:590242 free_pcp:2 free_cma:0 [7664654.166731] Node 3 Normal free:525304kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:37912kB inactive_file:40156kB unevictable:840kB isolated(anon):0kB isolated(file):5632kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:28kB mapped:844kB shmem:0kB slab_reclaimable:854176kB slab_unreclaimable:62369264kB kernel_stack:4224kB pagetables:3252kB unstable:0kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:225984 all_unreclaimable? yes [7664654.213683] lowmem_reserve[]: 0 0 0 0 [7664654.217653] Node 3 Normal: 131743*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 526972kB [7664654.230062] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664654.238938] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664654.247552] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664654.256426] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664654.265043] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664654.273916] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664654.282529] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664654.291401] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664654.300009] 72622 total pagecache pages [7664654.304021] 0 pages in swap cache [7664654.307514] Swap cache stats: add 21120185, delete 21136157, find 4513346/7609731 [7664654.315163] Free swap = 2001536kB [7664654.318745] Total swap = 4194300kB [7664654.322325] 66993253 pages RAM [7664654.325556] 0 pages HighMem/MovableOnly [7664654.329569] 1101945 pages reserved [7664654.333158] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664654.341205] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664654.350165] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664654.358960] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664654.367551] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664654.375733] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664654.383999] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664654.392614] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664654.400875] [53099] 0 53099 6670 239 18 649 0 smartd [7664654.409053] [53101] 0 53101 1910 64 9 172 0 mdadm [7664654.417141] [53104] 0 53104 74785 315 85 275 0 sssd [7664654.425142] [53106] 0 53106 5514 191 15 219 0 irqbalance [7664654.433669] [53108] 0 53108 38960 175 19 84 0 dsm_sa_eventmgr [7664654.442629] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664654.450978] [53139] 997 53139 29446 250 28 128 0 chronyd [7664654.459242] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664654.467502] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664654.475853] [53179] 0 53179 71689 281 85 227 0 sssd_pam [7664654.484204] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664654.493076] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664654.501078] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664654.509434] [53863] 0 53863 176656 249 39 1246 0 collectd [7664654.517788] [53969] 0 53969 31572 205 20 168 0 crond [7664654.525874] [54035] 0 54035 27526 164 10 33 0 agetty [7664654.534048] [54036] 0 54036 27526 158 11 33 0 agetty [7664654.542229] [54186] 0 54186 22934 210 46 272 0 master [7664654.550410] [54206] 89 54206 25545 272 47 271 0 qmgr [7664654.558540] [36317] 0 36317 28294 187 14 61 0 bash [7664654.566548] [36328] 0 36328 154746 223 201 98 0 journalctl [7664654.575078] [36329] 0 36329 28177 160 14 55 0 grep [7664654.583176] [117987] 0 117987 283356 297 509 230727 0 python [7664654.591542] [76204] 89 76204 25501 252 46 282 0 pickup [7664654.599720] [97037] 0 97037 50542 270 55 2086 0 lustre.py [7664654.608156] [97087] 0 97087 34453 276 25 1402 0 mdraid.py [7664654.616590] [97088] 0 97088 51294 272 55 2323 0 lustre-oss-expo [7664654.625552] [97173] 0 97173 48653 264 49 261 0 crond [7664654.633646] [97192] 0 97192 34468 258 25 1344 0 python3 [7664654.641915] [97789] 0 97789 44960 255 44 1248 0 lustre.py [7664654.650355] [97872] 0 97872 48653 263 49 263 0 crond [7664654.658442] [97890] 0 97890 31176 229 18 734 0 python3 [7664654.666712] [98004] 0 98004 31176 237 18 711 0 mdraid.py [7664654.675152] [98087] 0 98087 45129 286 46 1400 0 lustre-oss-expo [7664654.684111] [98530] 0 98530 31341 228 18 642 0 lustre.py [7664654.692543] [98579] 0 98579 48653 266 49 235 0 crond [7664654.700632] [98713] 0 98713 30977 243 16 529 0 python3 [7664654.708900] [98967] 0 98967 30977 239 19 528 0 mdraid.py [7664654.717340] [99292] 0 99292 48653 257 49 261 0 crond [7664654.725428] [99349] 0 99349 4779 217 14 463 0 lustre-oss-expo [7664654.734387] [99450] 0 99450 30913 236 18 446 0 python3 [7664654.742648] [99592] 89 99592 25538 229 47 273 0 cleanup [7664654.750916] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664654.759878] [100032] 0 100032 48653 266 49 240 0 crond [7664654.768147] [100105] 89 100105 25553 264 47 274 0 smtp [7664654.776326] [100203] 0 100203 30816 222 17 351 0 python3 [7664654.784758] [100288] 0 100288 4568 176 14 235 0 lustre.py [7664654.793364] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664654.801112] Killed process 97088 (lustre-oss-expo) total-vm:205176kB, anon-rss:0kB, file-rss:1088kB, shmem-rss:0kB [7664654.830370] lustre-oss-expo: page allocation failure: order:0, mode:0x200da [7664654.837519] CPU: 37 PID: 97088 Comm: lustre-oss-expo Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664654.850909] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664654.858748] Call Trace: [7664654.861396] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664654.866716] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664654.872815] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664654.878401] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664654.884415] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664654.890948] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664654.897486] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664654.903326] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664654.909943] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664654.916217] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664654.922148] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664654.928158] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664654.934082] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664654.940008] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664654.945590] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664654.950909] Mem-Info: [7664654.953388] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34512 inactive_file:35547 isolated_file:992 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823987 slab_unreclaimable:62296552 mapped:1726 shmem:0 pagetables:2953 bounce:0 free:590386 free_pcp:0 free_cma:0 [7664654.987574] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664655.029327] lowmem_reserve[]: 0 1418 63868 63868 [7664655.034248] Node 0 DMA32 free:261320kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:900kB inactive_file:3388kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:180kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686436kB kernel_stack:384kB pagetables:16kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:31147 all_unreclaimable? yes [7664655.079208] lowmem_reserve[]: 0 0 62450 62450 [7664655.083876] Node 0 Normal free:508328kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:43872kB inactive_file:44332kB unevictable:168kB isolated(anon):0kB isolated(file):512kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610908kB slab_unreclaimable:60243084kB kernel_stack:6112kB pagetables:3188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:332616 all_unreclaimable? yes [7664655.130651] lowmem_reserve[]: 0 0 0 0 [7664655.134628] Node 1 Normal free:525316kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15556kB inactive_file:15728kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3800kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:64724 all_unreclaimable? yes [7664655.181404] lowmem_reserve[]: 0 0 0 0 [7664655.185380] Node 2 Normal free:525440kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:34280kB inactive_file:37052kB unevictable:8680kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5708kB shmem:0kB slab_reclaimable:715124kB slab_unreclaimable:62476100kB kernel_stack:7936kB pagetables:1556kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:283481 all_unreclaimable? yes [7664655.232244] lowmem_reserve[]: 0 0 0 0 [7664655.236220] Node 3 Normal free:523388kB min:525460kB low:656824kB high:788188kB active_anon:24kB inactive_anon:0kB active_file:43056kB inactive_file:41952kB unevictable:840kB isolated(anon):0kB isolated(file):2560kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854176kB slab_unreclaimable:62369240kB kernel_stack:4224kB pagetables:3252kB unstable:0kB bounce:0kB free_pcp:2248kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:784740 all_unreclaimable? yes [7664655.283429] lowmem_reserve[]: 0 0 0 0 [7664655.287403] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664655.302241] Node 0 DMA32: 366*4kB (EM) 393*8kB (UEM) 1217*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261456kB [7664655.318647] Node 0 Normal: 6433*4kB (UEM) 5775*8kB (UEM) 3898*16kB (UEM) 4479*32kB (EM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508668kB [7664655.335401] Node 1 Normal: 87993*4kB (EM) 21668*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525316kB [7664655.348469] Node 2 Normal: 27410*4kB (UEM) 40141*8kB (UEM) 837*16kB (UEM) 1683*32kB (UEM) 428*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525408kB [7664655.363957] Node 3 Normal: 131115*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524460kB [7664655.376308] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664655.385175] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664655.393780] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664655.402646] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664655.411253] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664655.420119] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664655.428725] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664655.437590] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664655.446199] 72919 total pagecache pages [7664655.450219] 0 pages in swap cache [7664655.453713] Swap cache stats: add 21120253, delete 21136225, find 4513355/7609757 [7664655.461367] Free swap = 2001492kB [7664655.464953] Total swap = 4194300kB [7664655.468534] 66993253 pages RAM [7664655.471774] 0 pages HighMem/MovableOnly [7664655.475787] 1101945 pages reserved [7664655.633560] ll_ost_io02_095 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664655.642001] ll_ost_io02_095 cpuset=/ mems_allowed=2 [7664655.647061] CPU: 2 PID: 8682 Comm: ll_ost_io02_095 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664655.660270] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664655.668102] Call Trace: [7664655.670735] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664655.676051] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664655.681548] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664655.687391] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664655.693142] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664655.699156] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664655.705506] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664655.711599] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664655.717349] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664655.723883] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664655.730418] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664655.736604] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664655.742609] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664655.748720] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664655.755603] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664655.762831] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664655.769003] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664655.775624] [<ffffffffa021bd89>] ? ___slab_alloc+0x209/0x4f0 [7664655.781554] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664655.787321] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664655.793765] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664655.800646] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664655.806187] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664655.813284] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664655.821039] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664655.828301] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664655.836180] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664655.843194] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664655.851156] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664655.858459] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664655.864969] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664655.872575] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664655.877630] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664655.883899] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664655.890518] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664655.896783] Mem-Info: [7664655.899255] active_anon:0 inactive_anon:13 isolated_anon:0 active_file:34972 inactive_file:34812 isolated_file:1728 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823987 slab_unreclaimable:62296531 mapped:1727 shmem:0 pagetables:2953 bounce:0 free:589976 free_pcp:0 free_cma:0 [7664655.933624] Node 2 Normal free:525408kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:34284kB inactive_file:36920kB unevictable:8680kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5708kB shmem:0kB slab_reclaimable:715124kB slab_unreclaimable:62476132kB kernel_stack:7936kB pagetables:1556kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:365277 all_unreclaimable? yes [7664655.980685] lowmem_reserve[]: 0 0 0 0 [7664655.984654] Node 2 Normal: 27413*4kB (UEM) 40141*8kB (UEM) 837*16kB (UEM) 1683*32kB (UEM) 428*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525420kB [7664656.000142] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664656.009012] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664656.017624] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664656.026490] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664656.035097] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664656.043963] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664656.052571] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664656.061443] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664656.070049] 72934 total pagecache pages [7664656.074063] 0 pages in swap cache [7664656.077554] Swap cache stats: add 21120266, delete 21136238, find 4513356/7609760 [7664656.085206] Free swap = 2010452kB [7664656.088787] Total swap = 4194300kB [7664656.092366] 66993253 pages RAM [7664656.095598] 0 pages HighMem/MovableOnly [7664656.099611] 1101945 pages reserved [7664656.103191] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664656.111239] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664656.120198] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664656.128993] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664656.137583] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664656.145765] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664656.154032] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664656.162638] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664656.170898] [53099] 0 53099 6670 239 18 649 0 smartd [7664656.179081] [53101] 0 53101 1910 64 9 172 0 mdadm [7664656.187174] [53104] 0 53104 74785 315 85 275 0 sssd [7664656.195181] [53106] 0 53106 5514 191 15 219 0 irqbalance [7664656.203702] [53108] 0 53108 38960 175 19 84 0 dsm_sa_eventmgr [7664656.212663] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664656.221017] [53139] 997 53139 29446 250 28 128 0 chronyd [7664656.229277] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664656.237537] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664656.245889] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664656.254247] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664656.263133] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664656.271155] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664656.279515] [53863] 0 53863 176656 249 39 1246 0 collectd [7664656.287864] [53969] 0 53969 31572 205 20 168 0 crond [7664656.295950] [54035] 0 54035 27526 164 10 33 0 agetty [7664656.304131] [54036] 0 54036 27526 158 11 33 0 agetty [7664656.312304] [54186] 0 54186 22934 210 46 272 0 master [7664656.320477] [54206] 89 54206 25545 272 47 271 0 qmgr [7664656.328593] [36317] 0 36317 28294 187 14 61 0 bash [7664656.336597] [36328] 0 36328 154746 223 201 98 0 journalctl [7664656.345117] [36329] 0 36329 28177 160 14 55 0 grep [7664656.353199] [117987] 0 117987 283356 297 509 230727 0 python [7664656.361565] [76204] 89 76204 25501 252 46 282 0 pickup [7664656.369742] [97037] 0 97037 50542 270 55 2086 0 lustre.py [7664656.378182] [97087] 0 97087 34453 276 25 1402 0 mdraid.py [7664656.386628] [97173] 0 97173 48653 264 49 261 0 crond [7664656.394729] [97192] 0 97192 34468 258 25 1344 0 python3 [7664656.402994] [97789] 0 97789 44960 255 44 1248 0 lustre.py [7664656.411426] [97872] 0 97872 48653 263 49 263 0 crond [7664656.419512] [97890] 0 97890 31176 229 18 734 0 python3 [7664656.427773] [98004] 0 98004 31176 237 18 711 0 mdraid.py [7664656.436215] [98087] 0 98087 45129 286 46 1400 0 lustre-oss-expo [7664656.445175] [98530] 0 98530 31341 228 18 642 0 lustre.py [7664656.453607] [98579] 0 98579 48653 266 49 235 0 crond [7664656.461692] [98713] 0 98713 30977 243 16 529 0 python3 [7664656.469953] [98967] 0 98967 30977 239 19 528 0 mdraid.py [7664656.478385] [99292] 0 99292 48653 257 49 261 0 crond [7664656.486472] [99349] 0 99349 4779 217 14 469 0 lustre-oss-expo [7664656.495433] [99450] 0 99450 30913 236 18 446 0 python3 [7664656.503693] [99592] 89 99592 25538 229 47 273 0 cleanup [7664656.511956] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664656.520913] [100032] 0 100032 48653 266 49 240 0 crond [7664656.529173] [100105] 89 100105 25553 264 47 274 0 smtp [7664656.537347] [100203] 0 100203 30816 222 17 351 0 python3 [7664656.545789] [100288] 0 100288 4568 176 14 235 0 lustre.py [7664656.554400] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664656.562146] Killed process 97037 (lustre.py) total-vm:202168kB, anon-rss:0kB, file-rss:1080kB, shmem-rss:0kB [7664656.574178] lustre.py: page allocation failure: order:0, mode:0x200da [7664656.580795] CPU: 34 PID: 97037 Comm: lustre.py Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664656.593653] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664656.601478] Call Trace: [7664656.604113] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664656.609431] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664656.615527] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664656.621110] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664656.627117] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664656.633653] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664656.640191] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664656.646029] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664656.652648] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664656.658916] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664656.664846] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664656.670860] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664656.676793] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664656.682714] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664656.688289] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664656.693609] Mem-Info: [7664656.696089] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34476 inactive_file:35213 isolated_file:2208 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823987 slab_unreclaimable:62296527 mapped:1725 shmem:0 pagetables:2898 bounce:0 free:590281 free_pcp:0 free_cma:0 [7664656.730358] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664656.772117] lowmem_reserve[]: 0 1418 63868 63868 [7664656.777049] Node 0 DMA32 free:261312kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:936kB inactive_file:3376kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:176kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686388kB kernel_stack:384kB pagetables:16kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8546 all_unreclaimable? no [7664656.821833] lowmem_reserve[]: 0 0 62450 62450 [7664656.826495] Node 0 Normal free:508576kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:47604kB inactive_file:43356kB unevictable:168kB isolated(anon):0kB isolated(file):7936kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610908kB slab_unreclaimable:60243024kB kernel_stack:5952kB pagetables:3068kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:270135 all_unreclaimable? yes [7664656.873349] lowmem_reserve[]: 0 0 0 0 [7664656.877325] Node 1 Normal free:525352kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15636kB inactive_file:15620kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3792kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:98366 all_unreclaimable? yes [7664656.924104] lowmem_reserve[]: 0 0 0 0 [7664656.928078] Node 2 Normal free:525420kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:29520kB inactive_file:30544kB unevictable:8680kB isolated(anon):0kB isolated(file):4608kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5708kB shmem:0kB slab_reclaimable:715124kB slab_unreclaimable:62476132kB kernel_stack:7936kB pagetables:1544kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11722 all_unreclaimable? no [7664656.975025] lowmem_reserve[]: 0 0 0 0 [7664656.978996] Node 3 Normal free:524560kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:43400kB inactive_file:42512kB unevictable:840kB isolated(anon):0kB isolated(file):384kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854176kB slab_unreclaimable:62369244kB kernel_stack:4224kB pagetables:3172kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:237933 all_unreclaimable? yes [7664657.025769] lowmem_reserve[]: 0 0 0 0 [7664657.029735] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664657.044573] Node 0 DMA32: 392*4kB (UEM) 393*8kB (UEM) 1216*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261544kB [7664657.061075] Node 0 Normal: 6387*4kB (UEM) 5782*8kB (UEM) 3897*16kB (EM) 4480*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508556kB [7664657.077830] Node 1 Normal: 88002*4kB (EM) 21668*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525352kB [7664657.090899] Node 2 Normal: 27641*4kB (UEM) 40336*8kB (UEM) 873*16kB (UEM) 1690*32kB (UEM) 428*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 528692kB [7664657.106385] Node 3 Normal: 131140*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524560kB [7664657.118824] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.127690] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.136296] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.145165] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.153779] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.162654] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.171268] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.180132] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.188738] 72840 total pagecache pages [7664657.192754] 0 pages in swap cache [7664657.196243] Swap cache stats: add 21120266, delete 21136238, find 4513356/7609760 [7664657.203896] Free swap = 2010452kB [7664657.207476] Total swap = 4194300kB [7664657.211057] 66993253 pages RAM [7664657.214289] 0 pages HighMem/MovableOnly [7664657.218303] 1101945 pages reserved [7664657.421407] ll_ost_io01_077 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [7664657.429593] ll_ost_io01_077 cpuset=/ mems_allowed=1 [7664657.434655] CPU: 41 PID: 90482 Comm: ll_ost_io01_077 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664657.448033] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664657.455857] Call Trace: [7664657.458492] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664657.463807] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664657.469296] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664657.475134] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664657.480882] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664657.486893] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664657.493248] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664657.499342] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664657.505095] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664657.511626] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664657.518157] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664657.524407] [<ffffffffc124293f>] tgt_checksum_niobuf_rw+0xbf/0xe00 [ptlrpc] [7664657.531664] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664657.538979] [<ffffffffc172e0ac>] ? ofd_preprw+0x5dc/0x11b0 [ofd] [7664657.545274] [<ffffffffc0cb71e0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [7664657.552104] [<ffffffffc1247325>] tgt_brw_read+0xc35/0x1e50 [ptlrpc] [7664657.558649] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664657.565999] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664657.573345] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664657.580858] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664657.587782] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664657.594868] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664657.602624] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664657.609889] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664657.617754] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664657.624717] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664657.630157] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664657.636629] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664657.644198] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664657.649258] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664657.655526] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664657.662147] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664657.668419] Mem-Info: [7664657.670879] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32542 inactive_file:34474 isolated_file:4384 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823987 slab_unreclaimable:62296487 mapped:1642 shmem:0 pagetables:2843 bounce:0 free:590338 free_pcp:0 free_cma:0 [7664657.705154] Node 1 Normal free:525404kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:14996kB inactive_file:15620kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:204093 all_unreclaimable? yes [7664657.752011] lowmem_reserve[]: 0 0 0 0 [7664657.755980] Node 1 Normal: 88026*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525472kB [7664657.769052] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.777928] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.786542] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.795414] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.804019] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.812885] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.821492] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664657.830358] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664657.838962] 72858 total pagecache pages [7664657.842977] 0 pages in swap cache [7664657.846468] Swap cache stats: add 21120271, delete 21136243, find 4513357/7609762 [7664657.854120] Free swap = 2018900kB [7664657.857701] Total swap = 4194300kB [7664657.861282] 66993253 pages RAM [7664657.864513] 0 pages HighMem/MovableOnly [7664657.868524] 1101945 pages reserved [7664657.872103] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664657.880156] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664657.889113] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664657.897907] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664657.906483] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664657.914663] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664657.922929] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664657.931538] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664657.939803] [53099] 0 53099 6670 239 18 649 0 smartd [7664657.947975] [53101] 0 53101 1910 64 9 172 0 mdadm [7664657.956062] [53104] 0 53104 74785 315 85 275 0 sssd [7664657.964062] [53106] 0 53106 5514 189 15 219 0 irqbalance [7664657.972582] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664657.981543] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664657.989888] [53139] 997 53139 29446 250 28 128 0 chronyd [7664657.998149] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664658.006416] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664658.014771] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664658.023127] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664658.032003] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664658.040011] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664658.048364] [53863] 0 53863 176656 246 39 1246 0 collectd [7664658.056718] [53969] 0 53969 31572 205 20 168 0 crond [7664658.064804] [54035] 0 54035 27526 164 10 33 0 agetty [7664658.072976] [54036] 0 54036 27526 158 11 33 0 agetty [7664658.081150] [54186] 0 54186 22934 210 46 272 0 master [7664658.089332] [54206] 89 54206 25545 272 47 271 0 qmgr [7664658.097427] [36317] 0 36317 28294 187 14 61 0 bash [7664658.105435] [36328] 0 36328 154746 223 201 98 0 journalctl [7664658.113962] [36329] 0 36329 28177 160 14 55 0 grep [7664658.122053] [117987] 0 117987 283356 282 509 230727 0 python [7664658.130423] [76204] 89 76204 25501 252 46 282 0 pickup [7664658.138608] [97087] 0 97087 34453 266 25 1402 0 mdraid.py [7664658.147046] [97173] 0 97173 48653 264 49 261 0 crond [7664658.155141] [97192] 0 97192 34468 247 25 1344 0 python3 [7664658.163408] [97789] 0 97789 44960 250 44 1248 0 lustre.py [7664658.171853] [97872] 0 97872 48653 263 49 263 0 crond [7664658.179943] [97890] 0 97890 31176 214 18 734 0 python3 [7664658.188213] [98004] 0 98004 31176 224 18 711 0 mdraid.py [7664658.196653] [98087] 0 98087 45129 249 46 1400 0 lustre-oss-expo [7664658.205615] [98530] 0 98530 31341 224 18 642 0 lustre.py [7664658.214056] [98579] 0 98579 48653 266 49 235 0 crond [7664658.222151] [98713] 0 98713 30977 230 16 529 0 python3 [7664658.230418] [98967] 0 98967 30977 227 19 528 0 mdraid.py [7664658.238852] [99292] 0 99292 48653 257 49 261 0 crond [7664658.246947] [99349] 0 99349 4779 194 14 469 0 lustre-oss-expo [7664658.255909] [99450] 0 99450 30913 226 18 446 0 python3 [7664658.264175] [99592] 89 99592 25538 229 47 273 0 cleanup [7664658.272443] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664658.281408] [100032] 0 100032 48653 266 49 240 0 crond [7664658.289674] [100105] 89 100105 25553 264 47 274 0 smtp [7664658.297857] [100203] 0 100203 30816 209 17 333 0 python3 [7664658.306300] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664658.314910] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664658.322659] Killed process 98087 (lustre-oss-expo) total-vm:180516kB, anon-rss:0kB, file-rss:996kB, shmem-rss:0kB [7664658.400559] lustre-oss-expo: page allocation failure: order:0, mode:0x200da [7664658.407702] CPU: 43 PID: 98087 Comm: lustre-oss-expo Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664658.421080] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664658.428910] Call Trace: [7664658.431543] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664658.436862] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664658.442961] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664658.448536] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664658.454551] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664658.461083] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664658.467610] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664658.473444] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664658.480062] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664658.486330] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664658.492249] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664658.498255] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664658.504175] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664658.510094] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664658.515667] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664658.520979] Mem-Info: [7664658.523453] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33535 inactive_file:35253 isolated_file:3840 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823990 slab_unreclaimable:62296479 mapped:1642 shmem:0 pagetables:2843 bounce:0 free:590288 free_pcp:0 free_cma:0 [7664658.557720] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664658.599474] lowmem_reserve[]: 0 1418 63868 63868 [7664658.604398] Node 0 DMA32 free:261344kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:964kB inactive_file:3568kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:176kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686312kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9892 all_unreclaimable? yes [7664658.649263] lowmem_reserve[]: 0 0 62450 62450 [7664658.653930] Node 0 Normal free:508440kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:40556kB inactive_file:41756kB unevictable:168kB isolated(anon):0kB isolated(file):7936kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610908kB slab_unreclaimable:60242984kB kernel_stack:5984kB pagetables:3012kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:524518 all_unreclaimable? yes [7664658.700795] lowmem_reserve[]: 0 0 0 0 [7664658.704768] Node 1 Normal free:525472kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15660kB inactive_file:15492kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:204093 all_unreclaimable? yes [7664658.751625] lowmem_reserve[]: 0 0 0 0 [7664658.755597] Node 2 Normal free:524828kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32208kB inactive_file:38884kB unevictable:8680kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5376kB shmem:0kB slab_reclaimable:715124kB slab_unreclaimable:62476112kB kernel_stack:7936kB pagetables:1480kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:282508 all_unreclaimable? yes [7664658.802624] lowmem_reserve[]: 0 0 0 0 [7664658.806593] Node 3 Normal free:525164kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41172kB inactive_file:42748kB unevictable:840kB isolated(anon):0kB isolated(file):3968kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854188kB slab_unreclaimable:62369188kB kernel_stack:4224kB pagetables:3092kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4640 all_unreclaimable? no [7664658.853194] lowmem_reserve[]: 0 0 0 0 [7664658.857161] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664658.871999] Node 0 DMA32: 384*4kB (EM) 396*8kB (UEM) 1211*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261456kB [7664658.888403] Node 0 Normal: 6378*4kB (EM) 5766*8kB (UEM) 3901*16kB (UEM) 4483*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508552kB [7664658.905157] Node 1 Normal: 88026*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525472kB [7664658.918228] Node 2 Normal: 27386*4kB (UEM) 40221*8kB (UEM) 837*16kB (UEM) 1667*32kB (UEM) 421*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524992kB [7664658.933716] Node 3 Normal: 131301*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525204kB [7664658.946152] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664658.955019] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664658.963625] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664658.972491] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664658.981099] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664658.989963] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664658.998570] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664659.007436] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664659.016041] 73118 total pagecache pages [7664659.020054] 0 pages in swap cache [7664659.023547] Swap cache stats: add 21120302, delete 21136274, find 4513363/7609773 [7664659.031199] Free swap = 2018876kB [7664659.034780] Total swap = 4194300kB [7664659.038360] 66993253 pages RAM [7664659.041590] 0 pages HighMem/MovableOnly [7664659.045602] 1101945 pages reserved [7664659.063445] ll_ost_io03_047 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664659.071885] ll_ost_io03_047 cpuset=/ mems_allowed=3 [7664659.076951] CPU: 31 PID: 6896 Comm: ll_ost_io03_047 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664659.090241] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664659.098072] Call Trace: [7664659.100706] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664659.106022] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664659.111517] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664659.117357] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664659.123105] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664659.129119] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664659.135472] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664659.141574] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664659.147320] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664659.153854] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664659.160381] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664659.166568] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664659.172577] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664659.178691] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664659.185575] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664659.192808] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664659.198975] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664659.205630] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664659.212711] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664659.219373] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664659.226464] [<ffffffffc0c833c9>] ? class_handle2object+0xb9/0x1c0 [obdclass] [7664659.233772] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664659.239523] [<ffffffffa00ddd9e>] ? account_entity_dequeue+0xae/0xd0 [7664659.246049] [<ffffffffa00e192c>] ? dequeue_entity+0x11c/0x5e0 [7664659.252060] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664659.257616] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664659.264721] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664659.272480] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664659.279752] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664659.287621] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664659.294619] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664659.302575] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664659.309831] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664659.316319] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664659.323889] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664659.328951] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664659.335224] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664659.341834] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664659.348100] Mem-Info: [7664659.350563] active_anon:0 inactive_anon:3 isolated_anon:0 active_file:33780 inactive_file:35393 isolated_file:3328 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823991 slab_unreclaimable:62296484 mapped:1641 shmem:0 pagetables:2797 bounce:0 free:590287 free_pcp:0 free_cma:0 [7664659.384840] Node 3 Normal free:525168kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:12kB active_file:39756kB inactive_file:41840kB unevictable:840kB isolated(anon):0kB isolated(file):8192kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854188kB slab_unreclaimable:62369192kB kernel_stack:4224kB pagetables:3076kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:814844 all_unreclaimable? yes [7664659.431796] lowmem_reserve[]: 0 0 0 0 [7664659.435765] Node 3 Normal: 131302*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525208kB [7664659.448202] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664659.457068] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664659.465676] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664659.474540] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664659.483146] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664659.492012] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664659.500621] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664659.509494] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664659.518098] 73117 total pagecache pages [7664659.522113] 0 pages in swap cache [7664659.525604] Swap cache stats: add 21120316, delete 21136288, find 4513363/7609775 [7664659.533257] Free swap = 2024508kB [7664659.536836] Total swap = 4194300kB [7664659.540417] 66993253 pages RAM [7664659.543649] 0 pages HighMem/MovableOnly [7664659.547662] 1101945 pages reserved [7664659.551240] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664659.559290] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664659.568248] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664659.577044] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664659.585648] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664659.593824] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664659.602092] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664659.610706] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664659.618967] [53099] 0 53099 6670 239 18 649 0 smartd [7664659.627147] [53101] 0 53101 1910 64 9 172 0 mdadm [7664659.635233] [53104] 0 53104 74785 315 85 275 0 sssd [7664659.643240] [53106] 0 53106 5514 189 15 219 0 irqbalance [7664659.651761] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664659.660722] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664659.669078] [53139] 997 53139 29446 250 28 128 0 chronyd [7664659.677344] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664659.685603] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664659.693952] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664659.702303] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664659.711175] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664659.719179] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664659.727527] [53863] 0 53863 176656 246 39 1246 0 collectd [7664659.735881] [53969] 0 53969 31572 205 20 168 0 crond [7664659.743974] [54035] 0 54035 27526 164 10 33 0 agetty [7664659.752147] [54036] 0 54036 27526 158 11 33 0 agetty [7664659.760320] [54186] 0 54186 22934 210 46 272 0 master [7664659.768493] [54206] 89 54206 25545 272 47 271 0 qmgr [7664659.776612] [36317] 0 36317 28294 187 14 61 0 bash [7664659.784613] [36328] 0 36328 154746 223 201 98 0 journalctl [7664659.793133] [36329] 0 36329 28177 160 14 55 0 grep [7664659.801220] [117987] 0 117987 283356 282 509 230727 0 python [7664659.809589] [76204] 89 76204 25501 252 46 282 0 pickup [7664659.817768] [97087] 0 97087 34453 266 25 1402 0 mdraid.py [7664659.826207] [97173] 0 97173 48653 264 49 261 0 crond [7664659.834301] [97192] 0 97192 34468 247 25 1344 0 python3 [7664659.842569] [97789] 0 97789 44960 250 44 1248 0 lustre.py [7664659.851009] [97872] 0 97872 48653 263 49 263 0 crond [7664659.859095] [97890] 0 97890 31176 214 18 734 0 python3 [7664659.867355] [98004] 0 98004 31176 224 18 711 0 mdraid.py [7664659.875790] [98530] 0 98530 31341 224 18 642 0 lustre.py [7664659.884230] [98579] 0 98579 48653 266 49 235 0 crond [7664659.892315] [98713] 0 98713 30977 230 16 529 0 python3 [7664659.900576] [98967] 0 98967 30977 227 19 528 0 mdraid.py [7664659.909017] [99292] 0 99292 48653 257 49 261 0 crond [7664659.917111] [99349] 0 99349 4779 194 14 469 0 lustre-oss-expo [7664659.926065] [99450] 0 99450 30913 226 18 446 0 python3 [7664659.934332] [99592] 89 99592 25538 229 47 273 0 cleanup [7664659.942590] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664659.951544] [100032] 0 100032 48653 266 49 240 0 crond [7664659.959813] [100105] 89 100105 25553 264 47 274 0 smtp [7664659.967995] [100203] 0 100203 30816 203 17 333 0 python3 [7664659.976437] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664659.985052] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664659.992807] Killed process 97087 (mdraid.py) total-vm:137812kB, anon-rss:0kB, file-rss:1064kB, shmem-rss:0kB [7664660.348817] mdraid.py: page allocation failure: order:0, mode:0x200da [7664660.355439] CPU: 8 PID: 97087 Comm: mdraid.py Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664660.368209] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664660.376042] Call Trace: [7664660.378676] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664660.383996] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664660.390092] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664660.395666] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664660.401683] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664660.408216] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664660.414750] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664660.420593] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664660.427212] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664660.433486] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664660.439407] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664660.445420] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664660.451339] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664660.457256] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664660.462830] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664660.468143] Mem-Info: [7664660.470620] active_anon:0 inactive_anon:3 isolated_anon:0 active_file:33385 inactive_file:34779 isolated_file:3621 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823991 slab_unreclaimable:62296475 mapped:1635 shmem:0 pagetables:2797 bounce:0 free:590472 free_pcp:0 free_cma:0 [7664660.504889] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664660.546638] lowmem_reserve[]: 0 1418 63868 63868 [7664660.551560] Node 0 DMA32 free:261328kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:984kB inactive_file:3288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:172kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686312kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:113582 all_unreclaimable? yes [7664660.596601] lowmem_reserve[]: 0 0 62450 62450 [7664660.601266] Node 0 Normal free:508652kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:46788kB inactive_file:46692kB unevictable:168kB isolated(anon):0kB isolated(file):2048kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610908kB slab_unreclaimable:60242932kB kernel_stack:5952kB pagetables:2896kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:656327 all_unreclaimable? yes [7664660.648125] lowmem_reserve[]: 0 0 0 0 [7664660.652095] Node 1 Normal free:525472kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15580kB inactive_file:15572kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:204093 all_unreclaimable? yes [7664660.698960] lowmem_reserve[]: 0 0 0 0 [7664660.702936] Node 2 Normal free:525336kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31276kB inactive_file:33872kB unevictable:8680kB isolated(anon):0kB isolated(file):4864kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5352kB shmem:0kB slab_reclaimable:715128kB slab_unreclaimable:62476144kB kernel_stack:7936kB pagetables:1428kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:764000 all_unreclaimable? yes [7664660.750056] lowmem_reserve[]: 0 0 0 0 [7664660.754028] Node 3 Normal free:525196kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:12kB active_file:38940kB inactive_file:43736kB unevictable:840kB isolated(anon):0kB isolated(file):7444kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854188kB slab_unreclaimable:62369192kB kernel_stack:4224kB pagetables:3076kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2078411 all_unreclaimable? yes [7664660.801078] lowmem_reserve[]: 0 0 0 0 [7664660.805046] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664660.819880] Node 0 DMA32: 381*4kB (EM) 395*8kB (UEM) 1211*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261436kB [7664660.836288] Node 0 Normal: 6399*4kB (UEM) 5773*8kB (UEM) 3897*16kB (EM) 4481*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508564kB [7664660.853042] Node 1 Normal: 88026*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525472kB [7664660.866112] Node 2 Normal: 27446*4kB (UEM) 40274*8kB (UEM) 838*16kB (UEM) 1667*32kB (UEM) 421*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525672kB [7664660.881599] Node 3 Normal: 131308*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525232kB [7664660.894037] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664660.902902] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664660.911508] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664660.920374] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664660.928982] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664660.937849] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664660.946454] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664660.955320] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664660.963926] 72976 total pagecache pages [7664660.967941] 0 pages in swap cache [7664660.971431] Swap cache stats: add 21120316, delete 21136288, find 4513363/7609775 [7664660.979083] Free swap = 2024508kB [7664660.982663] Total swap = 4194300kB [7664660.986245] 66993253 pages RAM [7664660.989481] 0 pages HighMem/MovableOnly [7664660.993494] 1101945 pages reserved [7664660.999826] crond invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0 [7664661.007403] crond cpuset=/ mems_allowed=0-3 [7664661.011770] CPU: 13 PID: 53969 Comm: crond Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664661.024286] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664661.032117] Call Trace: [7664661.034751] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664661.040065] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664661.045553] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664661.051394] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664661.057151] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664661.063163] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664661.069523] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664661.075616] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664661.081363] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664661.087899] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664661.094432] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664661.100264] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664661.106876] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664661.113144] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664661.119065] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664661.125078] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664661.131010] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664661.136934] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664661.142505] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664661.147816] Mem-Info: [7664661.150293] active_anon:0 inactive_anon:3 isolated_anon:0 active_file:32182 inactive_file:34249 isolated_file:3781 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823991 slab_unreclaimable:62296475 mapped:1635 shmem:0 pagetables:2797 bounce:0 free:590489 free_pcp:62 free_cma:0 [7664661.184645] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664661.226400] lowmem_reserve[]: 0 1418 63868 63868 [7664661.231322] Node 0 DMA32 free:261380kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:984kB inactive_file:3512kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:172kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686312kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:332256 all_unreclaimable? yes [7664661.276367] lowmem_reserve[]: 0 0 62450 62450 [7664661.281037] Node 0 Normal free:508556kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:45488kB inactive_file:45736kB unevictable:168kB isolated(anon):0kB isolated(file):2048kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610908kB slab_unreclaimable:60242932kB kernel_stack:6048kB pagetables:2800kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:289578 all_unreclaimable? yes [7664661.327902] lowmem_reserve[]: 0 0 0 0 [7664661.331872] Node 1 Normal free:525472kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15580kB inactive_file:15572kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:204093 all_unreclaimable? yes [7664661.378729] lowmem_reserve[]: 0 0 0 0 [7664661.382704] Node 2 Normal free:525440kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31228kB inactive_file:37628kB unevictable:8680kB isolated(anon):0kB isolated(file):1536kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5352kB shmem:0kB slab_reclaimable:715128kB slab_unreclaimable:62476112kB kernel_stack:7936kB pagetables:1424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1005958 all_unreclaimable? yes [7664661.429914] lowmem_reserve[]: 0 0 0 0 [7664661.433882] Node 3 Normal free:525160kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:12kB active_file:37152kB inactive_file:39116kB unevictable:840kB isolated(anon):0kB isolated(file):11412kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854188kB slab_unreclaimable:62369188kB kernel_stack:4224kB pagetables:3076kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:690742 all_unreclaimable? yes [7664661.480916] lowmem_reserve[]: 0 0 0 0 [7664661.484884] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664661.499722] Node 0 DMA32: 379*4kB (EM) 395*8kB (UEM) 1211*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261428kB [7664661.516127] Node 0 Normal: 6399*4kB (UEM) 5775*8kB (UEM) 3897*16kB (EM) 4481*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508580kB [7664661.532881] Node 1 Normal: 88026*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525472kB [7664661.545952] Node 2 Normal: 27446*4kB (UEM) 40274*8kB (UEM) 838*16kB (UEM) 1667*32kB (UEM) 421*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525672kB [7664661.561447] Node 3 Normal: 131298*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525192kB [7664661.573885] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664661.582754] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664661.591367] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664661.600240] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664661.608848] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664661.617722] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664661.626329] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664661.635193] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664661.643799] 72959 total pagecache pages [7664661.647815] 0 pages in swap cache [7664661.651312] Swap cache stats: add 21120330, delete 21136302, find 4513364/7609777 [7664661.658966] Free swap = 2030140kB [7664661.662545] Total swap = 4194300kB [7664661.666127] 66993253 pages RAM [7664661.669356] 0 pages HighMem/MovableOnly [7664661.673371] 1101945 pages reserved [7664661.676949] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664661.684998] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664661.693959] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664661.702751] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664661.711328] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664661.719506] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664661.727775] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664661.736388] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664661.744649] [53099] 0 53099 6670 239 18 649 0 smartd [7664661.752830] [53101] 0 53101 1910 64 9 172 0 mdadm [7664661.760924] [53104] 0 53104 74785 315 85 275 0 sssd [7664661.768929] [53106] 0 53106 5514 189 15 219 0 irqbalance [7664661.777451] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664661.786405] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664661.794758] [53139] 997 53139 29446 250 28 128 0 chronyd [7664661.803019] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664661.811286] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664661.819633] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664661.827983] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664661.836854] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664661.844852] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664661.853199] [53863] 0 53863 176656 246 39 1246 0 collectd [7664661.861543] [53969] 0 53969 31572 205 20 168 0 crond [7664661.869631] [54035] 0 54035 27526 164 10 33 0 agetty [7664661.877805] [54036] 0 54036 27526 158 11 33 0 agetty [7664661.885984] [54186] 0 54186 22934 210 46 272 0 master [7664661.894159] [54206] 89 54206 25545 272 47 271 0 qmgr [7664661.902259] [36317] 0 36317 28294 187 14 61 0 bash [7664661.910263] [36328] 0 36328 154746 223 201 98 0 journalctl [7664661.918791] [36329] 0 36329 28177 160 14 55 0 grep [7664661.926889] [117987] 0 117987 283356 282 509 230727 0 python [7664661.935256] [76204] 89 76204 25501 252 46 282 0 pickup [7664661.943434] [97173] 0 97173 48653 264 49 261 0 crond [7664661.951524] [97192] 0 97192 34468 247 25 1344 0 python3 [7664661.959793] [97789] 0 97789 44960 250 44 1248 0 lustre.py [7664661.968233] [97872] 0 97872 48653 263 49 263 0 crond [7664661.976318] [97890] 0 97890 31176 214 18 734 0 python3 [7664661.984579] [98004] 0 98004 31176 224 18 711 0 mdraid.py [7664661.993013] [98530] 0 98530 31341 224 18 642 0 lustre.py [7664662.001452] [98579] 0 98579 48653 266 49 235 0 crond [7664662.009539] [98713] 0 98713 30977 230 16 529 0 python3 [7664662.017799] [98967] 0 98967 30977 227 19 528 0 mdraid.py [7664662.026244] [99292] 0 99292 48653 257 49 261 0 crond [7664662.034334] [99349] 0 99349 4779 194 14 469 0 lustre-oss-expo [7664662.043287] [99450] 0 99450 30913 226 18 446 0 python3 [7664662.051547] [99592] 89 99592 25538 229 47 273 0 cleanup [7664662.059817] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664662.068777] [100032] 0 100032 48653 266 49 240 0 crond [7664662.077047] [100105] 89 100105 25553 264 47 274 0 smtp [7664662.085226] [100203] 0 100203 30816 203 17 333 0 python3 [7664662.093662] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664662.102273] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664662.110018] Killed process 97789 (lustre.py) total-vm:179840kB, anon-rss:0kB, file-rss:1000kB, shmem-rss:0kB [7664662.144458] lustre.py: page allocation failure: order:0, mode:0x200da [7664662.151079] CPU: 35 PID: 97789 Comm: lustre.py Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664662.163938] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664662.171775] Call Trace: [7664662.174414] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664662.179730] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664662.185831] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664662.191413] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664662.197427] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664662.203961] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664662.210487] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664662.216319] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664662.222931] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664662.229199] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664662.235117] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664662.241125] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664662.247057] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664662.252982] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664662.258563] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664662.263881] Mem-Info: [7664662.266362] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:32816 inactive_file:33553 isolated_file:5472 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:823994 slab_unreclaimable:62296464 mapped:1630 shmem:0 pagetables:2772 bounce:0 free:590449 free_pcp:62 free_cma:0 [7664662.300722] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664662.342475] lowmem_reserve[]: 0 1418 63868 63868 [7664662.347403] Node 0 DMA32 free:261252kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1000kB inactive_file:3448kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:172kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686304kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9308 all_unreclaimable? yes [7664662.392534] lowmem_reserve[]: 0 0 62450 62450 [7664662.397201] Node 0 Normal free:508824kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44184kB inactive_file:41768kB unevictable:168kB isolated(anon):0kB isolated(file):5248kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610908kB slab_unreclaimable:60242900kB kernel_stack:5952kB pagetables:2800kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:369749 all_unreclaimable? yes [7664662.444067] lowmem_reserve[]: 0 0 0 0 [7664662.448047] Node 1 Normal free:525472kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15580kB inactive_file:15572kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3776kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:204093 all_unreclaimable? yes [7664662.494912] lowmem_reserve[]: 0 0 0 0 [7664662.498882] Node 2 Normal free:525232kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31760kB inactive_file:33804kB unevictable:8680kB isolated(anon):0kB isolated(file):10496kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715128kB slab_unreclaimable:62476144kB kernel_stack:7936kB pagetables:1424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:533557 all_unreclaimable? yes [7664662.546093] lowmem_reserve[]: 0 0 0 0 [7664662.550071] Node 3 Normal free:525180kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:39944kB inactive_file:43828kB unevictable:840kB isolated(anon):0kB isolated(file):2688kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854204kB slab_unreclaimable:62369188kB kernel_stack:4224kB pagetables:3076kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2020180 all_unreclaimable? yes [7664662.597025] lowmem_reserve[]: 0 0 0 0 [7664662.600992] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664662.615830] Node 0 DMA32: 375*4kB (EM) 396*8kB (UEM) 1211*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261420kB [7664662.632236] Node 0 Normal: 6432*4kB (UEM) 5776*8kB (UEM) 3902*16kB (UEM) 4482*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508832kB [7664662.649077] Node 1 Normal: 88026*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525472kB [7664662.662144] Node 2 Normal: 27412*4kB (UEM) 40262*8kB (EM) 829*16kB (EM) 1665*32kB (EM) 421*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525232kB [7664662.677286] Node 3 Normal: 131295*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525180kB [7664662.689724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664662.698591] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664662.707203] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664662.716071] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664662.724677] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664662.733545] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664662.742149] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664662.751015] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664662.759619] 73026 total pagecache pages [7664662.763634] 0 pages in swap cache [7664662.767126] Swap cache stats: add 21120330, delete 21136302, find 4513364/7609777 [7664662.774779] Free swap = 2030140kB [7664662.778358] Total swap = 4194300kB [7664662.781939] 66993253 pages RAM [7664662.785168] 0 pages HighMem/MovableOnly [7664662.789182] 1101945 pages reserved [7664663.129112] ll_ost_io03_110 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664663.137555] ll_ost_io03_110 cpuset=/ mems_allowed=3 [7664663.142617] CPU: 43 PID: 8770 Comm: ll_ost_io03_110 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664663.155915] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664663.163743] Call Trace: [7664663.166374] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664663.171693] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664663.177182] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664663.183021] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664663.188768] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664663.194780] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664663.201135] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664663.207240] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664663.212991] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664663.219527] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664663.226060] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664663.232238] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664663.238242] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664663.244353] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664663.251241] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664663.257403] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664663.264500] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664663.271066] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664663.277714] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664663.285067] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664663.291809] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664663.299154] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664663.306677] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664663.313591] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664663.320681] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664663.328431] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664663.335687] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664663.343554] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664663.350514] [<ffffffffa00d7c40>] ? wake_up_state+0x20/0x20 [7664663.356297] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664663.362787] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664663.370362] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664663.375413] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664663.381686] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664663.388302] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664663.394575] Mem-Info: [7664663.397035] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32352 inactive_file:33829 isolated_file:4540 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824007 slab_unreclaimable:62296453 mapped:1628 shmem:0 pagetables:2728 bounce:0 free:590474 free_pcp:0 free_cma:0 [7664663.431312] Node 3 Normal free:525148kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:40988kB inactive_file:40932kB unevictable:840kB isolated(anon):0kB isolated(file):896kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854244kB slab_unreclaimable:62369188kB kernel_stack:4224kB pagetables:3072kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:200668 all_unreclaimable? yes [7664663.478091] lowmem_reserve[]: 0 0 0 0 [7664663.482066] Node 3 Normal: 131498*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525992kB [7664663.494505] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664663.503382] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664663.511995] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664663.520867] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664663.529474] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664663.538339] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664663.546947] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664663.555812] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664663.564419] 72421 total pagecache pages [7664663.568432] 0 pages in swap cache [7664663.571923] Swap cache stats: add 21120342, delete 21136314, find 4513365/7609779 [7664663.579575] Free swap = 2035004kB [7664663.583154] Total swap = 4194300kB [7664663.586735] 66993253 pages RAM [7664663.589965] 0 pages HighMem/MovableOnly [7664663.593979] 1101945 pages reserved [7664663.597560] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664663.605605] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664663.614565] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664663.623349] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664663.631948] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664663.640123] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664663.648386] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664663.657002] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664663.665266] [53099] 0 53099 6670 239 18 649 0 smartd [7664663.673439] [53101] 0 53101 1910 64 9 172 0 mdadm [7664663.681523] [53104] 0 53104 74785 315 85 275 0 sssd [7664663.689525] [53106] 0 53106 5514 189 15 219 0 irqbalance [7664663.698051] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664663.707005] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664663.715351] [53139] 997 53139 29446 250 28 128 0 chronyd [7664663.723610] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664663.731869] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664663.740217] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664663.748571] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664663.757440] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664663.765447] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664663.773799] [53863] 0 53863 176656 246 39 1246 0 collectd [7664663.782149] [53969] 0 53969 31572 205 20 168 0 crond [7664663.790241] [54035] 0 54035 27526 164 10 33 0 agetty [7664663.798421] [54036] 0 54036 27526 158 11 33 0 agetty [7664663.806596] [54186] 0 54186 22934 210 46 272 0 master [7664663.814775] [54206] 89 54206 25545 272 47 271 0 qmgr [7664663.822896] [36317] 0 36317 28294 187 14 61 0 bash [7664663.830897] [36328] 0 36328 154746 223 201 98 0 journalctl [7664663.839425] [36329] 0 36329 28177 160 14 55 0 grep [7664663.847525] [117987] 0 117987 283356 282 509 230727 0 python [7664663.855891] [76204] 89 76204 25501 252 46 282 0 pickup [7664663.864070] [97173] 0 97173 48653 264 49 261 0 crond [7664663.872161] [97192] 0 97192 34468 247 25 1344 0 python3 [7664663.880427] [97872] 0 97872 48653 263 49 263 0 crond [7664663.888514] [97890] 0 97890 31176 214 18 734 0 python3 [7664663.896781] [98004] 0 98004 31176 224 18 711 0 mdraid.py [7664663.905214] [98530] 0 98530 31341 224 18 642 0 lustre.py [7664663.913649] [98579] 0 98579 48653 266 49 235 0 crond [7664663.921741] [98713] 0 98713 30977 230 16 529 0 python3 [7664663.930003] [98967] 0 98967 30977 227 19 528 0 mdraid.py [7664663.938442] [99292] 0 99292 48653 257 49 261 0 crond [7664663.946529] [99349] 0 99349 4779 194 14 469 0 lustre-oss-expo [7664663.955481] [99450] 0 99450 30913 226 18 446 0 python3 [7664663.963745] [99592] 89 99592 25538 229 47 273 0 cleanup [7664663.972008] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664663.980963] [100032] 0 100032 48653 266 49 240 0 crond [7664663.989231] [100105] 89 100105 25553 264 47 274 0 smtp [7664663.997411] [100203] 0 100203 30816 203 17 333 0 python3 [7664664.005844] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664664.014450] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664664.022194] Killed process 98004 (mdraid.py) total-vm:124704kB, anon-rss:0kB, file-rss:896kB, shmem-rss:0kB [7664664.055065] mdraid.py: page allocation failure: order:0, mode:0x200da [7664664.061689] CPU: 3 PID: 98004 Comm: mdraid.py Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664664.074459] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664664.082297] Call Trace: [7664664.084933] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664664.090251] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664664.096354] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664664.101944] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664664.107965] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664664.114502] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664664.121033] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664664.126867] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664664.133479] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664664.139744] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664664.145664] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664664.151672] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664664.157591] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664664.163509] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664664.169081] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664664.174397] Mem-Info: [7664664.176880] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32448 inactive_file:34176 isolated_file:3580 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824007 slab_unreclaimable:62296448 mapped:1628 shmem:0 pagetables:2728 bounce:0 free:590711 free_pcp:272 free_cma:0 [7664664.211333] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664664.253083] lowmem_reserve[]: 0 1418 63868 63868 [7664664.258014] Node 0 DMA32 free:261340kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1024kB inactive_file:3392kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:164kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686300kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9083 all_unreclaimable? no [7664664.302892] lowmem_reserve[]: 0 0 62450 62450 [7664664.307562] Node 0 Normal free:508432kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:42476kB inactive_file:42020kB unevictable:168kB isolated(anon):0kB isolated(file):8544kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610916kB slab_unreclaimable:60242708kB kernel_stack:6128kB pagetables:2688kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1408 all_unreclaimable? no [7664664.354170] lowmem_reserve[]: 0 0 0 0 [7664664.358140] Node 1 Normal free:525504kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15540kB inactive_file:15584kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:52787 all_unreclaimable? yes [7664664.404914] lowmem_reserve[]: 0 0 0 0 [7664664.408882] Node 2 Normal free:525128kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31912kB inactive_file:35548kB unevictable:8680kB isolated(anon):0kB isolated(file):3200kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715168kB slab_unreclaimable:62476128kB kernel_stack:7936kB pagetables:1368kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:388709 all_unreclaimable? no [7664664.455921] lowmem_reserve[]: 0 0 0 0 [7664664.459895] Node 3 Normal free:525308kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41124kB inactive_file:42136kB unevictable:840kB isolated(anon):0kB isolated(file):384kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854256kB slab_unreclaimable:62369164kB kernel_stack:4224kB pagetables:3072kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:322266 all_unreclaimable? no [7664664.506583] lowmem_reserve[]: 0 0 0 0 [7664664.510550] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664664.525388] Node 0 DMA32: 370*4kB (UEM) 395*8kB (UEM) 1211*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261392kB [7664664.541891] Node 0 Normal: 6227*4kB (UEM) 5742*8kB (UEM) 3989*16kB (UEM) 4498*32kB (UEM) 2053*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 510092kB [7664664.558729] Node 1 Normal: 88041*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525532kB [7664664.571799] Node 2 Normal: 27460*4kB (UEM) 40263*8kB (UEM) 874*16kB (UEM) 1692*32kB (UEM) 418*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 526824kB [7664664.587199] Node 3 Normal: 131471*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525884kB [7664664.599638] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664664.608506] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664664.617110] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664664.625976] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664664.634582] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664664.643449] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664664.652056] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664664.660923] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664664.669535] 72861 total pagecache pages [7664664.673548] 0 pages in swap cache [7664664.677040] Swap cache stats: add 21120342, delete 21136314, find 4513365/7609779 [7664664.684692] Free swap = 2035004kB [7664664.688274] Total swap = 4194300kB [7664664.691853] 66993253 pages RAM [7664664.695083] 0 pages HighMem/MovableOnly [7664664.699095] 1101945 pages reserved [7664665.544714] ll_ost_io00_029 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664665.553167] ll_ost_io00_029 cpuset=/ mems_allowed=0 [7664665.558234] CPU: 28 PID: 123071 Comm: ll_ost_io00_029 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664665.571696] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664665.579520] Call Trace: [7664665.582156] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664665.587471] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664665.592965] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664665.598808] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664665.604562] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664665.610568] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664665.616927] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664665.623019] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664665.628766] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664665.635294] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664665.641827] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664665.648007] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664665.654012] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664665.660121] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664665.667005] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664665.674232] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664665.680389] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664665.687048] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664665.694134] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664665.700795] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664665.707845] [<ffffffffa00dca58>] ? __enqueue_entity+0x78/0x80 [7664665.713865] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664665.719396] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664665.726495] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664665.734246] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664665.741506] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664665.749373] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664665.756368] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664665.764316] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664665.771570] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664665.778050] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664665.785619] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664665.790682] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664665.796963] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664665.803579] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664665.809850] Mem-Info: [7664665.812317] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32796 inactive_file:34009 isolated_file:4828 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824028 slab_unreclaimable:62296402 mapped:1628 shmem:0 pagetables:2728 bounce:0 free:590442 free_pcp:0 free_cma:0 [7664665.846584] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664665.888336] lowmem_reserve[]: 0 1418 63868 63868 [7664665.893258] Node 0 DMA32 free:261336kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1032kB inactive_file:3584kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:164kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686300kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:271370 all_unreclaimable? yes [7664665.938391] lowmem_reserve[]: 0 0 62450 62450 [7664665.943061] Node 0 Normal free:508740kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:36936kB inactive_file:38016kB unevictable:168kB isolated(anon):0kB isolated(file):14960kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610944kB slab_unreclaimable:60242704kB kernel_stack:6240kB pagetables:2688kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1099072 all_unreclaimable? no [7664665.990018] lowmem_reserve[]: 0 0 0 0 [7664665.993993] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664666.008829] Node 0 DMA32: 370*4kB (UEM) 395*8kB (UEM) 1211*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261392kB [7664666.025320] Node 0 Normal: 6293*4kB (UEM) 5742*8kB (UEM) 3970*16kB (UEM) 4495*32kB (UEM) 2053*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 509956kB [7664666.042171] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664666.051035] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664666.059644] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664666.068510] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664666.077115] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664666.085982] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664666.094586] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664666.103453] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664666.112059] 72816 total pagecache pages [7664666.116073] 0 pages in swap cache [7664666.119566] Swap cache stats: add 21120351, delete 21136323, find 4513366/7609781 [7664666.127216] Free swap = 2037812kB [7664666.130795] Total swap = 4194300kB [7664666.134376] 66993253 pages RAM [7664666.137608] 0 pages HighMem/MovableOnly [7664666.141620] 1101945 pages reserved [7664666.145200] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664666.153246] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664666.162201] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664666.170994] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664666.179588] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664666.187767] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664666.196034] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664666.204640] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664666.212909] [53099] 0 53099 6670 239 18 649 0 smartd [7664666.221091] [53101] 0 53101 1910 64 9 172 0 mdadm [7664666.229183] [53104] 0 53104 74785 315 85 275 0 sssd [7664666.237183] [53106] 0 53106 5514 189 15 219 0 irqbalance [7664666.245702] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664666.254654] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664666.263002] [53139] 997 53139 29446 250 28 128 0 chronyd [7664666.271261] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664666.279520] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664666.287866] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664666.296214] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664666.305090] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664666.313102] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664666.321455] [53863] 0 53863 176656 246 39 1246 0 collectd [7664666.329807] [53969] 0 53969 31572 205 20 168 0 crond [7664666.337905] [54035] 0 54035 27526 164 10 33 0 agetty [7664666.346083] [54036] 0 54036 27526 158 11 33 0 agetty [7664666.354269] [54186] 0 54186 22934 210 46 272 0 master [7664666.362449] [54206] 89 54206 25545 272 47 271 0 qmgr [7664666.370573] [36317] 0 36317 28294 187 14 61 0 bash [7664666.378573] [36328] 0 36328 154746 223 201 98 0 journalctl [7664666.387092] [36329] 0 36329 28177 160 14 55 0 grep [7664666.395181] [117987] 0 117987 283356 282 509 230727 0 python [7664666.403551] [76204] 89 76204 25501 252 46 282 0 pickup [7664666.411728] [97173] 0 97173 48653 264 49 261 0 crond [7664666.419818] [97192] 0 97192 34468 247 25 1344 0 python3 [7664666.428077] [97872] 0 97872 48653 263 49 263 0 crond [7664666.436166] [97890] 0 97890 31176 214 18 734 0 python3 [7664666.444435] [98530] 0 98530 31341 224 18 642 0 lustre.py [7664666.452872] [98579] 0 98579 48653 266 49 235 0 crond [7664666.460959] [98713] 0 98713 30977 230 16 529 0 python3 [7664666.469221] [98967] 0 98967 30977 227 19 528 0 mdraid.py [7664666.477660] [99292] 0 99292 48653 257 49 261 0 crond [7664666.485746] [99349] 0 99349 4779 194 14 469 0 lustre-oss-expo [7664666.494699] [99450] 0 99450 30913 226 18 446 0 python3 [7664666.502959] [99592] 89 99592 25538 229 47 273 0 cleanup [7664666.511219] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664666.520181] [100032] 0 100032 48653 266 49 240 0 crond [7664666.528448] [100105] 89 100105 25553 264 47 274 0 smtp [7664666.536619] [100203] 0 100203 30816 203 17 333 0 python3 [7664666.545052] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664666.553661] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664666.561413] Killed process 98530 (lustre.py) total-vm:125364kB, anon-rss:0kB, file-rss:896kB, shmem-rss:0kB [7664666.661186] lustre.py: page allocation failure: order:0, mode:0x200da [7664666.667804] CPU: 12 PID: 98530 Comm: lustre.py Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664666.680671] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664666.688513] Call Trace: [7664666.691147] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664666.696466] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664666.702565] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664666.708147] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664666.714159] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664666.720686] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664666.727212] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664666.733054] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664666.739675] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664666.745938] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664666.751861] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664666.757875] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664666.763794] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664666.769714] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664666.775286] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664666.780597] Mem-Info: [7664666.783073] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32430 inactive_file:34395 isolated_file:4928 unevictable:9044 dirty:9 writeback:0 unstable:0 slab_reclaimable:824028 slab_unreclaimable:62296425 mapped:1628 shmem:0 pagetables:2710 bounce:0 free:590587 free_pcp:0 free_cma:0 [7664666.817339] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664666.859098] lowmem_reserve[]: 0 1418 63868 63868 [7664666.864034] Node 0 DMA32 free:261336kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1032kB inactive_file:3588kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:164kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686300kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:276266 all_unreclaimable? yes [7664666.909159] lowmem_reserve[]: 0 0 62450 62450 [7664666.913823] Node 0 Normal free:508708kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:39628kB inactive_file:38440kB unevictable:168kB isolated(anon):0kB isolated(file):16128kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610944kB slab_unreclaimable:60242800kB kernel_stack:6560kB pagetables:2684kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:872275 all_unreclaimable? yes [7664666.960778] lowmem_reserve[]: 0 0 0 0 [7664666.964750] Node 1 Normal free:525532kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15584kB inactive_file:15512kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:12kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:52787 all_unreclaimable? yes [7664667.011616] lowmem_reserve[]: 0 0 0 0 [7664667.015591] Node 2 Normal free:525508kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:30544kB inactive_file:35552kB unevictable:8680kB isolated(anon):0kB isolated(file):4224kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:36kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715172kB slab_unreclaimable:62476116kB kernel_stack:7936kB pagetables:1300kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:165945 all_unreclaimable? no [7664667.062719] lowmem_reserve[]: 0 0 0 0 [7664667.066690] Node 3 Normal free:525420kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41792kB inactive_file:43096kB unevictable:840kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854256kB slab_unreclaimable:62369164kB kernel_stack:4224kB pagetables:3072kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:267246 all_unreclaimable? yes [7664667.113466] lowmem_reserve[]: 0 0 0 0 [7664667.117437] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664667.132277] Node 0 DMA32: 371*4kB (UEM) 397*8kB (UEM) 1212*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261428kB [7664667.148769] Node 0 Normal: 6079*4kB (UEM) 5734*8kB (UEM) 3927*16kB (UEM) 4492*32kB (UEM) 2053*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508252kB [7664667.165608] Node 1 Normal: 88041*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525532kB [7664667.178678] Node 2 Normal: 27128*4kB (EM) 40110*8kB (UEM) 874*16kB (UEM) 1688*32kB (UEM) 418*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524144kB [7664667.193993] Node 3 Normal: 131420*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525680kB [7664667.206430] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664667.215299] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664667.223910] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664667.232776] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664667.241385] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664667.250251] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664667.258859] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664667.267733] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664667.276349] 73167 total pagecache pages [7664667.280368] 0 pages in swap cache [7664667.283859] Swap cache stats: add 21120351, delete 21136323, find 4513366/7609781 [7664667.291512] Free swap = 2037812kB [7664667.295090] Total swap = 4194300kB [7664667.298676] 66993253 pages RAM [7664667.301910] 0 pages HighMem/MovableOnly [7664667.305924] 1101945 pages reserved [7664667.629418] ll_ost_io01_096 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664667.637862] ll_ost_io01_096 cpuset=/ mems_allowed=1 [7664667.642920] CPU: 33 PID: 27189 Comm: ll_ost_io01_096 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664667.656296] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664667.664123] Call Trace: [7664667.666758] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664667.672072] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664667.677560] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664667.683400] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664667.689147] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664667.695160] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664667.701512] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664667.707604] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664667.713350] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664667.719877] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664667.726405] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664667.732592] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664667.738599] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664667.744705] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664667.751589] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664667.757747] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664667.764846] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664667.771404] [<ffffffffc11966b2>] ? ldlm_res_hop_get_locked+0x12/0x20 [ptlrpc] [7664667.778814] [<ffffffffc0a13297>] ? cfs_hash_bd_lookup_intent+0xf7/0x170 [libcfs] [7664667.786505] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664667.793853] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664667.800596] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664667.807943] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664667.815465] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664667.822386] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664667.829473] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664667.837224] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664667.844482] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664667.852352] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664667.859351] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664667.867299] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664667.874553] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664667.881027] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664667.888596] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664667.893655] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664667.899925] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664667.906544] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664667.912818] Mem-Info: [7664667.915279] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:33575 inactive_file:34350 isolated_file:3200 unevictable:9044 dirty:9 writeback:0 unstable:0 slab_reclaimable:824055 slab_unreclaimable:62296433 mapped:1627 shmem:0 pagetables:2692 bounce:0 free:590125 free_pcp:0 free_cma:0 [7664667.949550] Node 1 Normal free:525532kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15584kB inactive_file:15512kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:12kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:52787 all_unreclaimable? yes [7664667.996417] lowmem_reserve[]: 0 0 0 0 [7664668.000381] Node 1 Normal: 88041*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525532kB [7664668.013457] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664668.022329] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664668.030938] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664668.039813] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664668.048425] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664668.057294] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664668.065907] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664668.074772] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664668.083378] 73029 total pagecache pages [7664668.087389] 7 pages in swap cache [7664668.090885] Swap cache stats: add 21120389, delete 21136354, find 4513370/7609789 [7664668.098536] Free swap = 2040356kB [7664668.102115] Total swap = 4194300kB [7664668.105694] 66993253 pages RAM [7664668.108929] 0 pages HighMem/MovableOnly [7664668.112940] 1101945 pages reserved [7664668.116519] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664668.124571] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664668.133526] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664668.142317] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664668.150903] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664668.159086] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664668.167352] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664668.175960] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664668.184227] [53099] 0 53099 6670 239 18 649 0 smartd [7664668.192408] [53101] 0 53101 1910 64 9 172 0 mdadm [7664668.200493] [53104] 0 53104 74785 315 85 275 0 sssd [7664668.208494] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664668.217014] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664668.225976] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664668.234331] [53139] 997 53139 29446 250 28 128 0 chronyd [7664668.242597] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664668.250856] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664668.259202] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664668.267550] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664668.276425] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664668.284434] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664668.292792] [53863] 0 53863 176656 246 39 1246 0 collectd [7664668.301142] [53969] 0 53969 31572 205 20 168 0 crond [7664668.309240] [54035] 0 54035 27526 164 10 33 0 agetty [7664668.317418] [54036] 0 54036 27526 158 11 33 0 agetty [7664668.325602] [54186] 0 54186 22934 210 46 272 0 master [7664668.333781] [54206] 89 54206 25545 272 47 271 0 qmgr [7664668.341883] [36317] 0 36317 28294 187 14 61 0 bash [7664668.349884] [36328] 0 36328 154746 223 201 98 0 journalctl [7664668.358414] [36329] 0 36329 28177 160 14 55 0 grep [7664668.366509] [117987] 0 117987 283356 282 509 230727 0 python [7664668.374879] [76204] 89 76204 25501 252 46 282 0 pickup [7664668.383066] [97173] 0 97173 48653 264 49 261 0 crond [7664668.391153] [97192] 0 97192 34468 247 25 1344 0 python3 [7664668.399414] [97872] 0 97872 48653 263 49 263 0 crond [7664668.407503] [97890] 0 97890 31176 215 18 701 0 python3 [7664668.415774] [98579] 0 98579 48653 266 49 235 0 crond [7664668.423863] [98713] 0 98713 30977 230 16 529 0 python3 [7664668.432133] [98967] 0 98967 30977 227 19 528 0 mdraid.py [7664668.440572] [99292] 0 99292 48653 257 49 261 0 crond [7664668.448661] [99349] 0 99349 4779 194 14 469 0 lustre-oss-expo [7664668.457624] [99450] 0 99450 30913 226 18 446 0 python3 [7664668.465888] [99592] 89 99592 25538 229 47 273 0 cleanup [7664668.474149] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664668.483108] [100032] 0 100032 48653 266 49 240 0 crond [7664668.491369] [100105] 89 100105 25553 264 47 274 0 smtp [7664668.499549] [100203] 0 100203 30816 203 17 333 0 python3 [7664668.507986] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664668.516598] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664668.524350] Killed process 98967 (mdraid.py) total-vm:123908kB, anon-rss:0kB, file-rss:908kB, shmem-rss:0kB [7664668.563588] mdraid.py: page allocation failure: order:0, mode:0x200da [7664668.570211] CPU: 16 PID: 98967 Comm: mdraid.py Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664668.583072] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664668.590910] Call Trace: [7664668.593554] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664668.598876] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664668.604974] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664668.610558] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664668.616575] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664668.623105] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664668.629640] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664668.635481] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664668.642099] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664668.648366] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664668.654287] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664668.660293] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664668.666213] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664668.672141] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664668.677722] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664668.683034] Mem-Info: [7664668.685507] active_anon:0 inactive_anon:4 isolated_anon:0 active_file:33325 inactive_file:35903 isolated_file:1920 unevictable:9044 dirty:9 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296402 mapped:1628 shmem:0 pagetables:2692 bounce:0 free:590365 free_pcp:0 free_cma:0 [7664668.719778] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664668.761539] lowmem_reserve[]: 0 1418 63868 63868 [7664668.766477] Node 0 DMA32 free:261344kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1036kB inactive_file:3540kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:160kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686248kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10496 all_unreclaimable? yes [7664668.811519] lowmem_reserve[]: 0 0 62450 62450 [7664668.816181] Node 0 Normal free:508532kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44380kB inactive_file:45208kB unevictable:168kB isolated(anon):0kB isolated(file):4224kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610944kB slab_unreclaimable:60242840kB kernel_stack:6048kB pagetables:2632kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1947553 all_unreclaimable? yes [7664668.863138] lowmem_reserve[]: 0 0 0 0 [7664668.867118] Node 1 Normal free:525532kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15584kB inactive_file:15512kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:12kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3772kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:52787 all_unreclaimable? yes [7664668.913977] lowmem_reserve[]: 0 0 0 0 [7664668.917945] Node 2 Normal free:525068kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31116kB inactive_file:38308kB unevictable:8680kB isolated(anon):0kB isolated(file):1536kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:36kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715256kB slab_unreclaimable:62476028kB kernel_stack:7936kB pagetables:1292kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:263825 all_unreclaimable? no [7664668.965076] lowmem_reserve[]: 0 0 0 0 [7664668.969046] Node 3 Normal free:525080kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:16kB active_file:37244kB inactive_file:37872kB unevictable:840kB isolated(anon):0kB isolated(file):7040kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:852kB shmem:0kB slab_reclaimable:854284kB slab_unreclaimable:62369172kB kernel_stack:4224kB pagetables:3060kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2148692 all_unreclaimable? no [7664669.015994] lowmem_reserve[]: 0 0 0 0 [7664669.019959] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664669.034798] Node 0 DMA32: 370*4kB (UEM) 397*8kB (UEM) 1214*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261456kB [7664669.051292] Node 0 Normal: 6219*4kB (UEM) 5737*8kB (UEM) 3923*16kB (UEM) 4491*32kB (UEM) 2053*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508740kB [7664669.068132] Node 1 Normal: 88041*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525532kB [7664669.081201] Node 2 Normal: 27389*4kB (UEM) 40197*8kB (UEM) 866*16kB (UEM) 1679*32kB (UEM) 416*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525340kB [7664669.096602] Node 3 Normal: 131249*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524996kB [7664669.109039] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.117908] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.126522] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.135395] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.144002] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.152868] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.161477] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.170348] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.178955] 73025 total pagecache pages [7664669.182969] 0 pages in swap cache [7664669.186470] Swap cache stats: add 21120390, delete 21136362, find 4513370/7609789 [7664669.194120] Free swap = 2040356kB [7664669.197700] Total swap = 4194300kB [7664669.201282] 66993253 pages RAM [7664669.204511] 0 pages HighMem/MovableOnly [7664669.208524] 1101945 pages reserved [7664669.386713] ll_ost_io03_071 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664669.395183] ll_ost_io03_071 cpuset=/ mems_allowed=3 [7664669.400254] CPU: 47 PID: 90679 Comm: ll_ost_io03_071 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664669.413642] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664669.421469] Call Trace: [7664669.424102] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664669.429419] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664669.434914] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664669.440754] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664669.446511] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664669.452527] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664669.458890] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664669.464992] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664669.470744] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664669.477275] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664669.483803] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664669.489987] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664669.495998] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664669.502111] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664669.509000] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664669.515170] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664669.522284] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664669.528818] [<ffffffffa021ab4e>] ? kmalloc_order_trace+0x2e/0xa0 [7664669.535101] [<ffffffffa021e721>] ? __kmalloc+0x211/0x230 [7664669.540719] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664669.548078] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664669.554834] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664669.562194] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664669.569714] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664669.576655] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664669.583758] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664669.591515] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664669.598802] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664669.606669] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664669.613644] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664669.619090] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664669.625573] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664669.633152] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664669.638209] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664669.644486] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664669.651109] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664669.657377] Mem-Info: [7664669.659838] active_anon:0 inactive_anon:8 isolated_anon:0 active_file:33079 inactive_file:34284 isolated_file:5344 unevictable:9044 dirty:9 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296402 mapped:1628 shmem:0 pagetables:2673 bounce:0 free:590386 free_pcp:0 free_cma:0 [7664669.694145] Node 3 Normal free:525080kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:16kB active_file:39404kB inactive_file:43096kB unevictable:840kB isolated(anon):0kB isolated(file):9472kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:852kB shmem:0kB slab_reclaimable:854284kB slab_unreclaimable:62369172kB kernel_stack:4224kB pagetables:3044kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:515512 all_unreclaimable? yes [7664669.741119] lowmem_reserve[]: 0 0 0 0 [7664669.745097] Node 3 Normal: 131253*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525012kB [7664669.757549] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.766425] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.775040] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.783914] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.792529] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.801403] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.810007] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664669.818878] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664669.827490] 73029 total pagecache pages [7664669.831510] 0 pages in swap cache [7664669.835013] Swap cache stats: add 21120396, delete 21136368, find 4513371/7609791 [7664669.842664] Free swap = 2042404kB [7664669.846244] Total swap = 4194300kB [7664669.849825] 66993253 pages RAM [7664669.853054] 0 pages HighMem/MovableOnly [7664669.857070] 1101945 pages reserved [7664669.860649] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664669.868700] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664669.877662] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664669.886457] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664669.895057] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664669.903241] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664669.911508] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664669.920123] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664669.928390] [53099] 0 53099 6670 239 18 649 0 smartd [7664669.936570] [53101] 0 53101 1910 64 9 172 0 mdadm [7664669.944660] [53104] 0 53104 74785 315 85 275 0 sssd [7664669.952670] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664669.961201] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664669.970166] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664669.978521] [53139] 997 53139 29446 250 28 128 0 chronyd [7664669.986789] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664669.995058] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664670.003410] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664670.011767] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664670.020643] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664670.028653] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664670.037002] [53863] 0 53863 176656 246 39 1246 0 collectd [7664670.045360] [53969] 0 53969 31572 205 20 168 0 crond [7664670.053453] [54035] 0 54035 27526 164 10 33 0 agetty [7664670.061633] [54036] 0 54036 27526 158 11 33 0 agetty [7664670.069821] [54186] 0 54186 22934 210 46 272 0 master [7664670.077996] [54206] 89 54206 25545 272 47 271 0 qmgr [7664670.086147] [36317] 0 36317 28294 187 14 61 0 bash [7664670.094151] [36328] 0 36328 154746 223 201 98 0 journalctl [7664670.102678] [36329] 0 36329 28177 160 14 55 0 grep [7664670.110777] [117987] 0 117987 283356 282 509 230727 0 python [7664670.119148] [76204] 89 76204 25501 252 46 282 0 pickup [7664670.127334] [97173] 0 97173 48653 264 49 261 0 crond [7664670.135423] [97192] 0 97192 34468 247 25 1344 0 python3 [7664670.143694] [97872] 0 97872 48653 263 49 263 0 crond [7664670.151788] [97890] 0 97890 31176 215 18 701 0 python3 [7664670.160070] [98579] 0 98579 48653 266 49 235 0 crond [7664670.168169] [98713] 0 98713 30977 230 16 529 0 python3 [7664670.176448] [99292] 0 99292 48653 257 49 261 0 crond [7664670.184539] [99349] 0 99349 4779 194 14 469 0 lustre-oss-expo [7664670.193499] [99450] 0 99450 30913 226 18 446 0 python3 [7664670.201775] [99592] 89 99592 25538 229 47 273 0 cleanup [7664670.210055] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664670.219011] [100032] 0 100032 48653 266 49 240 0 crond [7664670.227274] [100105] 89 100105 25553 264 47 274 0 smtp [7664670.235459] [100203] 0 100203 30816 203 17 333 0 python3 [7664670.243899] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664670.252512] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664670.260269] Killed process 99349 (lustre-oss-expo) total-vm:19116kB, anon-rss:0kB, file-rss:776kB, shmem-rss:0kB [7664670.464170] ll_ost_io02_000 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664670.472632] ll_ost_io02_000 cpuset=/ mems_allowed=2 [7664670.477700] CPU: 14 PID: 101256 Comm: ll_ost_io02_000 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664670.491198] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664670.499025] Call Trace: [7664670.501662] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664670.506976] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664670.512472] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664670.518313] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664670.524065] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664670.530071] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664670.536424] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664670.542518] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664670.548271] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664670.554797] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664670.561325] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664670.567511] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664670.573516] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664670.579625] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664670.586508] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664670.593734] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664670.599901] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664670.606523] [<ffffffffa021bd89>] ? ___slab_alloc+0x209/0x4f0 [7664670.612483] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664670.619134] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664670.626187] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664670.631941] [<ffffffffa00ddd9e>] ? account_entity_dequeue+0xae/0xd0 [7664670.638466] [<ffffffffa00e192c>] ? dequeue_entity+0x11c/0x5e0 [7664670.644473] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664670.650003] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664670.657099] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664670.664848] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664670.672105] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664670.679973] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664670.686941] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664670.692383] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664670.698864] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664670.706433] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664670.711492] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664670.717761] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664670.724380] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664670.730646] Mem-Info: [7664670.733104] active_anon:0 inactive_anon:12 isolated_anon:0 active_file:32083 inactive_file:35695 isolated_file:4187 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296402 mapped:1627 shmem:0 pagetables:2659 bounce:0 free:590537 free_pcp:0 free_cma:0 [7664670.767460] Node 2 Normal free:525388kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:32kB active_file:29828kB inactive_file:36012kB unevictable:8680kB isolated(anon):0kB isolated(file):7808kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715256kB slab_unreclaimable:62476028kB kernel_stack:7936kB pagetables:1240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:784850 all_unreclaimable? yes [7664670.814669] lowmem_reserve[]: 0 0 0 0 [7664670.818637] Node 2 Normal: 27503*4kB (UEM) 40283*8kB (UEM) 874*16kB (UEM) 1665*32kB (UEM) 415*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 526100kB [7664670.834127] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664670.842995] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664670.851610] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664670.860483] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664670.869089] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664670.877953] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664670.886562] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664670.895426] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664670.904039] 72834 total pagecache pages [7664670.908063] 0 pages in swap cache [7664670.911561] Swap cache stats: add 21120405, delete 21136377, find 4513373/7609793 [7664670.919215] Free swap = 2044188kB [7664670.922795] Total swap = 4194300kB [7664670.926375] 66993253 pages RAM [7664670.929606] 0 pages HighMem/MovableOnly [7664670.933620] 1101945 pages reserved [7664670.937198] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664670.945245] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664670.954206] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664670.962996] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664670.971595] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664670.979775] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664670.988041] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664670.996647] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664671.004909] [53099] 0 53099 6670 239 18 649 0 smartd [7664671.013089] [53101] 0 53101 1910 64 9 172 0 mdadm [7664671.021183] [53104] 0 53104 74785 315 85 275 0 sssd [7664671.029191] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664671.037711] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664671.046673] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664671.055027] [53139] 997 53139 29446 250 28 128 0 chronyd [7664671.063287] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664671.071555] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664671.079910] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664671.088263] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664671.097132] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664671.105138] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664671.113487] [53863] 0 53863 176656 246 39 1246 0 collectd [7664671.121912] [53969] 0 53969 31572 205 20 168 0 crond [7664671.130056] [54035] 0 54035 27526 164 10 33 0 agetty [7664671.138336] [54036] 0 54036 27526 158 11 33 0 agetty [7664671.146535] [54186] 0 54186 22934 210 46 272 0 master [7664671.154755] [54206] 89 54206 25545 272 47 271 0 qmgr [7664671.162938] [36317] 0 36317 28294 187 14 61 0 bash [7664671.170961] [36328] 0 36328 154746 223 201 98 0 journalctl [7664671.179480] [36329] 0 36329 28177 160 14 55 0 grep [7664671.187589] [117987] 0 117987 283356 282 509 230727 0 python [7664671.195957] [76204] 89 76204 25501 252 46 282 0 pickup [7664671.204137] [97173] 0 97173 48653 264 49 261 0 crond [7664671.212226] [97192] 0 97192 34468 247 25 1344 0 python3 [7664671.220496] [97872] 0 97872 48653 263 49 263 0 crond [7664671.228590] [97890] 0 97890 31176 215 18 701 0 python3 [7664671.236864] [98579] 0 98579 48653 266 49 235 0 crond [7664671.244953] [98713] 0 98713 30977 230 16 529 0 python3 [7664671.253220] [99292] 0 99292 48653 257 49 261 0 crond [7664671.261314] [99450] 0 99450 30913 226 18 446 0 python3 [7664671.269581] [99592] 89 99592 25538 229 47 273 0 cleanup [7664671.277843] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664671.286803] [100032] 0 100032 48653 266 49 240 0 crond [7664671.295079] [100105] 89 100105 25553 264 47 274 0 smtp [7664671.303264] [100203] 0 100203 30816 203 17 333 0 python3 [7664671.311734] [100288] 0 100288 4568 160 14 235 0 lustre.py [7664671.320341] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664671.328089] Killed process 100288 (lustre.py) total-vm:18272kB, anon-rss:0kB, file-rss:640kB, shmem-rss:0kB [7664671.400244] lustre.py: page allocation failure: order:0, mode:0x200da [7664671.406866] CPU: 35 PID: 100288 Comm: lustre.py Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664671.419810] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664671.427636] Call Trace: [7664671.430267] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664671.435585] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664671.441678] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664671.447260] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664671.453274] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664671.459808] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664671.466336] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664671.472175] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664671.478788] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664671.485053] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664671.490974] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664671.496979] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664671.502901] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664671.508827] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664671.514403] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664671.519723] Mem-Info: [7664671.522205] active_anon:0 inactive_anon:4 isolated_anon:0 active_file:32568 inactive_file:35967 isolated_file:4224 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296396 mapped:1627 shmem:0 pagetables:2659 bounce:0 free:590301 free_pcp:0 free_cma:0 [7664671.556473] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664671.598214] lowmem_reserve[]: 0 1418 63868 63868 [7664671.603139] Node 0 DMA32 free:261296kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1040kB inactive_file:3360kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:160kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686248kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11575 all_unreclaimable? yes [7664671.648180] lowmem_reserve[]: 0 0 62450 62450 [7664671.652844] Node 0 Normal free:508180kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:16kB active_file:44696kB inactive_file:45544kB unevictable:168kB isolated(anon):0kB isolated(file):4736kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610944kB slab_unreclaimable:60242824kB kernel_stack:6320kB pagetables:2632kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:843430 all_unreclaimable? yes [7664671.699790] lowmem_reserve[]: 0 0 0 0 [7664671.703760] Node 1 Normal free:525572kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15712kB inactive_file:15388kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411320kB kernel_stack:20816kB pagetables:3764kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:240998 all_unreclaimable? yes [7664671.750621] lowmem_reserve[]: 0 0 0 0 [7664671.754591] Node 2 Normal free:525068kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:29992kB inactive_file:34336kB unevictable:8680kB isolated(anon):0kB isolated(file):7296kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715256kB slab_unreclaimable:62476020kB kernel_stack:7936kB pagetables:1240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:397513 all_unreclaimable? no [7664671.801616] lowmem_reserve[]: 0 0 0 0 [7664671.805586] Node 3 Normal free:525204kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:42104kB inactive_file:43092kB unevictable:840kB isolated(anon):0kB isolated(file):2176kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854284kB slab_unreclaimable:62369172kB kernel_stack:4224kB pagetables:2988kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:569827 all_unreclaimable? yes [7664671.852448] lowmem_reserve[]: 0 0 0 0 [7664671.856413] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664671.871252] Node 0 DMA32: 368*4kB (UEM) 397*8kB (UEM) 1215*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261464kB [7664671.887747] Node 0 Normal: 6197*4kB (UEM) 5708*8kB (UEM) 3928*16kB (UEM) 4493*32kB (UEM) 2053*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508564kB [7664671.904585] Node 1 Normal: 88060*4kB (EM) 21671*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525608kB [7664671.917655] Node 2 Normal: 27473*4kB (UEM) 40276*8kB (UEM) 869*16kB (UEM) 1662*32kB (UEM) 415*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525748kB [7664671.933142] Node 3 Normal: 131315*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525260kB [7664671.945579] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664671.954447] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664671.963061] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664671.971927] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664671.980535] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664671.989409] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664671.998024] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664672.006898] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664672.015504] 72941 total pagecache pages [7664672.019518] 0 pages in swap cache [7664672.023007] Swap cache stats: add 21120405, delete 21136377, find 4513373/7609793 [7664672.030661] Free swap = 2044188kB [7664672.034239] Total swap = 4194300kB [7664672.037820] 66993253 pages RAM [7664672.041052] 0 pages HighMem/MovableOnly [7664672.045063] 1101945 pages reserved [7664672.303880] ll_ost_io03_031 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664672.312319] ll_ost_io03_031 cpuset=/ mems_allowed=3 [7664672.317379] CPU: 47 PID: 119038 Comm: ll_ost_io03_031 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664672.330843] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664672.338672] Call Trace: [7664672.341302] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664672.346623] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664672.352115] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664672.357956] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664672.363703] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664672.369719] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664672.376077] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664672.382172] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664672.387926] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664672.394460] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664672.400995] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664672.407184] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664672.413198] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664672.419315] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664672.426200] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664672.433432] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664672.439595] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664672.446211] [<ffffffffa002a59e>] ? __switch_to+0xce/0x580 [7664672.451879] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664672.457632] [<ffffffffa00ddd9e>] ? account_entity_dequeue+0xae/0xd0 [7664672.464165] [<ffffffffa00e192c>] ? dequeue_entity+0x11c/0x5e0 [7664672.470171] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664672.475699] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664672.482785] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664672.490538] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664672.497797] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664672.505663] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664672.512622] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664672.518056] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664672.524529] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664672.532099] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664672.537156] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664672.543425] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664672.550045] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664672.556310] Mem-Info: [7664672.558771] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:34376 inactive_file:35546 isolated_file:1312 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824046 slab_unreclaimable:62296402 mapped:1620 shmem:0 pagetables:2645 bounce:0 free:590455 free_pcp:0 free_cma:0 [7664672.593049] Node 3 Normal free:525328kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:8kB active_file:41904kB inactive_file:42400kB unevictable:840kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854284kB slab_unreclaimable:62369164kB kernel_stack:4224kB pagetables:2936kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:146304 all_unreclaimable? no [7664672.639738] lowmem_reserve[]: 0 0 0 0 [7664672.643703] Node 3 Normal: 131384*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525536kB [7664672.656141] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664672.665011] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664672.673614] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664672.682481] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664672.691088] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664672.699953] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664672.708561] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664672.717426] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664672.726037] 72957 total pagecache pages [7664672.730053] 0 pages in swap cache [7664672.733547] Swap cache stats: add 21120410, delete 21136382, find 4513374/7609796 [7664672.741200] Free swap = 2045208kB [7664672.744776] Total swap = 4194300kB [7664672.748358] 66993253 pages RAM [7664672.751589] 0 pages HighMem/MovableOnly [7664672.755602] 1101945 pages reserved [7664672.759180] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664672.767231] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664672.776188] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664672.784979] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664672.793581] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664672.801755] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664672.810024] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664672.818640] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664672.826907] [53099] 0 53099 6670 239 18 649 0 smartd [7664672.835088] [53101] 0 53101 1910 64 9 172 0 mdadm [7664672.843183] [53104] 0 53104 74785 315 85 275 0 sssd [7664672.851191] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664672.859722] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664672.868680] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664672.877034] [53139] 997 53139 29446 250 28 128 0 chronyd [7664672.885296] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664672.893561] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664672.901909] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664672.910261] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664672.919132] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664672.927137] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664672.935484] [53863] 0 53863 176656 246 39 1246 0 collectd [7664672.943840] [53969] 0 53969 31572 205 20 168 0 crond [7664672.951934] [54035] 0 54035 27526 164 10 33 0 agetty [7664672.960112] [54036] 0 54036 27526 158 11 33 0 agetty [7664672.968285] [54186] 0 54186 22934 210 46 272 0 master [7664672.976458] [54206] 89 54206 25545 272 47 271 0 qmgr [7664672.984581] [36317] 0 36317 28294 187 14 61 0 bash [7664672.992588] [36328] 0 36328 154746 223 201 98 0 journalctl [7664673.001108] [36329] 0 36329 28177 160 14 55 0 grep [7664673.009196] [117987] 0 117987 283356 282 509 230727 0 python [7664673.017565] [76204] 89 76204 25501 252 46 282 0 pickup [7664673.025745] [97173] 0 97173 48653 264 49 261 0 crond [7664673.033833] [97192] 0 97192 34468 247 25 1344 0 python3 [7664673.042096] [97872] 0 97872 48653 263 49 263 0 crond [7664673.050188] [97890] 0 97890 31176 215 18 701 0 python3 [7664673.058449] [98579] 0 98579 48653 266 49 235 0 crond [7664673.066541] [98713] 0 98713 30977 230 16 529 0 python3 [7664673.074802] [99292] 0 99292 48653 257 49 261 0 crond [7664673.082896] [99450] 0 99450 30913 226 18 446 0 python3 [7664673.091154] [99592] 89 99592 25538 229 47 273 0 cleanup [7664673.099416] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664673.108376] [100032] 0 100032 48653 266 49 240 0 crond [7664673.116636] [100105] 89 100105 25553 264 47 274 0 smtp [7664673.124810] [100203] 0 100203 30816 203 17 333 0 python3 [7664673.133251] Out of memory: Kill process 117987 (python) score 3 or sacrifice child [7664673.140988] Killed process 117987 (python) total-vm:1133424kB, anon-rss:0kB, file-rss:1128kB, shmem-rss:0kB [7664673.160364] python: page allocation failure: order:0, mode:0x200da [7664673.166724] CPU: 30 PID: 117987 Comm: python Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664673.179410] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664673.187246] Call Trace: [7664673.189890] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664673.195211] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664673.201310] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664673.206886] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664673.212900] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664673.219433] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664673.225959] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664673.231801] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664673.238421] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664673.244690] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664673.250623] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664673.256634] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664673.262559] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664673.268488] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664673.274071] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664673.279390] Mem-Info: [7664673.281863] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32644 inactive_file:34792 isolated_file:4128 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824041 slab_unreclaimable:62296405 mapped:1621 shmem:0 pagetables:2645 bounce:0 free:590149 free_pcp:182 free_cma:0 [7664673.316306] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664673.358059] lowmem_reserve[]: 0 1418 63868 63868 [7664673.362987] Node 0 DMA32 free:261332kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1060kB inactive_file:3572kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:136kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686248kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7737 all_unreclaimable? yes [7664673.407946] lowmem_reserve[]: 0 0 62450 62450 [7664673.412615] Node 0 Normal free:508568kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:46136kB inactive_file:46500kB unevictable:168kB isolated(anon):0kB isolated(file):256kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610944kB slab_unreclaimable:60242792kB kernel_stack:6224kB pagetables:2628kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:160638 all_unreclaimable? yes [7664673.459388] lowmem_reserve[]: 0 0 0 0 [7664673.463359] Node 1 Normal free:525352kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15616kB inactive_file:15656kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:3764kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:404311 all_unreclaimable? yes [7664673.510223] lowmem_reserve[]: 0 0 0 0 [7664673.514198] Node 2 Normal free:524648kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:28168kB inactive_file:32448kB unevictable:8680kB isolated(anon):0kB isolated(file):11648kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476084kB kernel_stack:7936kB pagetables:1240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:511039 all_unreclaimable? yes [7664673.561406] lowmem_reserve[]: 0 0 0 0 [7664673.565375] Node 3 Normal free:524912kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41692kB inactive_file:42612kB unevictable:840kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854276kB slab_unreclaimable:62369164kB kernel_stack:4224kB pagetables:2936kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:272344 all_unreclaimable? yes [7664673.612150] lowmem_reserve[]: 0 0 0 0 [7664673.616118] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664673.630973] Node 0 DMA32: 363*4kB (UEM) 397*8kB (UEM) 1215*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261444kB [7664673.647633] Node 0 Normal: 6237*4kB (UEM) 5710*8kB (UEM) 3929*16kB (UEM) 4491*32kB (UEM) 2053*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508692kB [7664673.664504] Node 1 Normal: 88021*4kB (UEM) 21658*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525348kB [7664673.677668] Node 2 Normal: 27439*4kB (UEM) 40237*8kB (EM) 870*16kB (UEM) 1650*32kB (UEM) 412*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524740kB [7664673.693086] Node 3 Normal: 131254*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525016kB [7664673.705492] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664673.714375] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664673.722992] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664673.731882] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664673.740494] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664673.749361] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664673.757966] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664673.766836] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664673.775446] 73277 total pagecache pages [7664673.779462] 0 pages in swap cache [7664673.782953] Swap cache stats: add 21120410, delete 21136382, find 4513374/7609796 [7664673.790607] Free swap = 2045208kB [7664673.794184] Total swap = 4194300kB [7664673.797765] 66993253 pages RAM [7664673.800995] 0 pages HighMem/MovableOnly [7664673.805011] 1101945 pages reserved [7664675.124509] ll_ost_io02_078 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664675.132960] ll_ost_io02_078 cpuset=/ mems_allowed=2 [7664675.138026] CPU: 14 PID: 83189 Comm: ll_ost_io02_078 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664675.151440] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664675.159280] Call Trace: [7664675.161927] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664675.167247] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664675.172746] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664675.178584] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664675.184340] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664675.190360] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664675.196724] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664675.202832] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664675.208617] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664675.215153] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664675.221695] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664675.227882] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664675.233893] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664675.240004] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664675.246886] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664675.253055] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664675.260168] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664675.266743] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664675.273399] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664675.280763] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664675.287503] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664675.294853] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664675.302375] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664675.309296] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664675.316380] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664675.324132] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664675.331399] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664675.339280] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664675.346285] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664675.354257] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664675.361512] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664675.368003] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664675.375572] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664675.380637] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664675.386914] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664675.393534] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664675.399807] Mem-Info: [7664675.402269] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32702 inactive_file:34505 isolated_file:5345 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824042 slab_unreclaimable:62296393 mapped:1621 shmem:0 pagetables:2269 bounce:0 free:590010 free_pcp:0 free_cma:0 [7664675.436573] Node 2 Normal free:524868kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:27956kB inactive_file:32352kB unevictable:8680kB isolated(anon):0kB isolated(file):13696kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476116kB kernel_stack:7936kB pagetables:1240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:762123 all_unreclaimable? yes [7664675.483800] lowmem_reserve[]: 0 0 0 0 [7664675.487775] Node 2 Normal: 27509*4kB (UEM) 40245*8kB (UEM) 873*16kB (UEM) 1652*32kB (UEM) 412*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525196kB [7664675.503227] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664675.512121] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664675.520729] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664675.529595] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664675.538208] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664675.547087] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664675.555706] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664675.564604] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664675.573221] 73613 total pagecache pages [7664675.577244] 0 pages in swap cache [7664675.580744] Swap cache stats: add 21120420, delete 21136392, find 4513375/7609800 [7664675.588422] Free swap = 2967828kB [7664675.592003] Total swap = 4194300kB [7664675.595584] 66993253 pages RAM [7664675.598815] 0 pages HighMem/MovableOnly [7664675.602828] 1101945 pages reserved [7664675.606407] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664675.614467] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664675.623429] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664675.632248] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664675.640850] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664675.649033] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664675.657295] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664675.665937] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664675.674208] [53099] 0 53099 6670 239 18 649 0 smartd [7664675.682391] [53101] 0 53101 1910 64 9 172 0 mdadm [7664675.690478] [53104] 0 53104 74785 315 85 275 0 sssd [7664675.698516] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664675.707047] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664675.716006] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664675.724357] [53139] 997 53139 29446 250 28 128 0 chronyd [7664675.732625] [53159] 0 53159 110203 310 153 22622 0 sssd_be [7664675.740900] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664675.749255] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664675.757608] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664675.766483] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664675.774514] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664675.782870] [53863] 0 53863 176656 246 39 1246 0 collectd [7664675.791228] [53969] 0 53969 31572 205 20 168 0 crond [7664675.799320] [54035] 0 54035 27526 164 10 33 0 agetty [7664675.807525] [54036] 0 54036 27526 158 11 33 0 agetty [7664675.815709] [54186] 0 54186 22934 210 46 272 0 master [7664675.823891] [54206] 89 54206 25545 272 47 271 0 qmgr [7664675.832013] [36317] 0 36317 28294 187 14 61 0 bash [7664675.840044] [36328] 0 36328 154746 223 201 98 0 journalctl [7664675.848572] [36329] 0 36329 28177 160 14 55 0 grep [7664675.856683] [76204] 89 76204 25501 252 46 282 0 pickup [7664675.864873] [97173] 0 97173 48653 264 49 261 0 crond [7664675.872994] [97192] 0 97192 34468 247 25 1344 0 python3 [7664675.881267] [97872] 0 97872 48653 263 49 263 0 crond [7664675.889362] [97890] 0 97890 31176 215 18 701 0 python3 [7664675.897629] [98579] 0 98579 48653 266 49 235 0 crond [7664675.905750] [98713] 0 98713 30977 230 16 529 0 python3 [7664675.914020] [99292] 0 99292 48653 257 49 261 0 crond [7664675.922126] [99450] 0 99450 30913 226 18 446 0 python3 [7664675.930408] [99592] 89 99592 25538 229 47 273 0 cleanup [7664675.938675] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664675.947659] [100032] 0 100032 48653 266 49 240 0 crond [7664675.955927] [100105] 89 100105 25553 264 47 274 0 smtp [7664675.964109] [100203] 0 100203 30816 203 17 333 0 python3 [7664675.972552] Out of memory: Kill process 53159 (sssd_be) score 0 or sacrifice child [7664675.980325] Killed process 53159 (sssd_be) total-vm:440812kB, anon-rss:0kB, file-rss:1240kB, shmem-rss:0kB [7664676.026201] sssd_be: page allocation failure: order:0, mode:0x200da [7664676.032671] CPU: 42 PID: 53159 Comm: sssd_be Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664676.045357] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664676.053188] Call Trace: [7664676.055824] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664676.061142] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664676.067239] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664676.072814] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664676.078828] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664676.085363] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664676.091897] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664676.097741] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664676.104360] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664676.110627] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664676.116554] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664676.122570] [<ffffffffa076a10e>] ? schedule_hrtimeout_range_clock+0xbe/0x150 [7664676.129885] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664676.135812] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664676.141730] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664676.147305] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664676.152626] Mem-Info: [7664676.155106] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32636 inactive_file:33850 isolated_file:6272 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824042 slab_unreclaimable:62296393 mapped:1621 shmem:0 pagetables:2269 bounce:0 free:590006 free_pcp:0 free_cma:0 [7664676.189374] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664676.231130] lowmem_reserve[]: 0 1418 63868 63868 [7664676.236061] Node 0 DMA32 free:261308kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:816kB inactive_file:3384kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:136kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686248kB kernel_stack:384kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:33811 all_unreclaimable? yes [7664676.281024] lowmem_reserve[]: 0 0 62450 62450 [7664676.285693] Node 0 Normal free:507576kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:38544kB inactive_file:37920kB unevictable:168kB isolated(anon):0kB isolated(file):18048kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610948kB slab_unreclaimable:60242704kB kernel_stack:5952kB pagetables:2628kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:802422 all_unreclaimable? yes [7664676.332640] lowmem_reserve[]: 0 0 0 0 [7664676.336612] Node 1 Normal free:525360kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15992kB inactive_file:16036kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:2260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:117404 all_unreclaimable? yes [7664676.383472] lowmem_reserve[]: 0 0 0 0 [7664676.387442] Node 2 Normal free:524868kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31908kB inactive_file:38220kB unevictable:8680kB isolated(anon):0kB isolated(file):1152kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476116kB kernel_stack:7936kB pagetables:1240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:232011 all_unreclaimable? yes [7664676.434565] lowmem_reserve[]: 0 0 0 0 [7664676.438541] Node 3 Normal free:525016kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:42420kB inactive_file:42860kB unevictable:840kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854276kB slab_unreclaimable:62369168kB kernel_stack:4224kB pagetables:2936kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:410764 all_unreclaimable? yes [7664676.485317] lowmem_reserve[]: 0 0 0 0 [7664676.489281] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664676.504119] Node 0 DMA32: 410*4kB (EM) 396*8kB (UEM) 1210*16kB (UEM) 3689*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261576kB [7664676.520525] Node 0 Normal: 6249*4kB (UEM) 5693*8kB (UEM) 3923*16kB (UEM) 4484*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 507836kB [7664676.537366] Node 1 Normal: 88175*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525788kB [7664676.550895] Node 2 Normal: 27509*4kB (UEM) 40245*8kB (UEM) 873*16kB (UEM) 1652*32kB (UEM) 412*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525196kB [7664676.566294] Node 3 Normal: 131356*4kB (UM) 2*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525440kB [7664676.579020] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664676.587888] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664676.596502] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664676.605367] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664676.613973] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664676.622841] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664676.631445] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664676.640310] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664676.648918] 73563 total pagecache pages [7664676.652930] 0 pages in swap cache [7664676.656423] Swap cache stats: add 21120420, delete 21136392, find 4513375/7609800 [7664676.664075] Free swap = 2967828kB [7664676.667654] Total swap = 4194300kB [7664676.671235] 66993253 pages RAM [7664676.674466] 0 pages HighMem/MovableOnly [7664676.678480] 1101945 pages reserved [7664676.688951] ll_ost_io01_079 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664676.697394] ll_ost_io01_079 cpuset=/ mems_allowed=1 [7664676.702460] CPU: 25 PID: 90484 Comm: ll_ost_io01_079 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664676.715838] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664676.723672] Call Trace: [7664676.726314] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664676.731637] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664676.737132] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664676.742973] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664676.748726] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664676.754739] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664676.761093] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664676.767185] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664676.772933] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664676.779467] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664676.785995] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664676.792181] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664676.798185] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664676.804293] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664676.811175] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664676.817336] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664676.824435] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664676.831003] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664676.837652] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664676.844998] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664676.851744] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664676.859090] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664676.866611] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664676.873523] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664676.880611] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664676.888363] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664676.895621] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664676.903488] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664676.910448] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664676.915889] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664676.922363] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664676.929932] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664676.934990] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664676.941259] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664676.947880] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664676.954144] Mem-Info: [7664676.956604] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33209 inactive_file:35834 isolated_file:3872 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824042 slab_unreclaimable:62296393 mapped:1621 shmem:0 pagetables:2269 bounce:0 free:590009 free_pcp:0 free_cma:0 [7664676.990878] Node 1 Normal free:525360kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15992kB inactive_file:16036kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711252kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:2260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:117404 all_unreclaimable? yes [7664677.037744] lowmem_reserve[]: 0 0 0 0 [7664677.041711] Node 1 Normal: 88175*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525788kB [7664677.055242] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664677.064109] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664677.072713] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664677.081582] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664677.090186] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664677.099053] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664677.107659] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664677.116523] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664677.125131] 73563 total pagecache pages [7664677.129144] 0 pages in swap cache [7664677.132635] Swap cache stats: add 21120431, delete 21136403, find 4513376/7609803 [7664677.140290] Free swap = 3058196kB [7664677.143868] Total swap = 4194300kB [7664677.147449] 66993253 pages RAM [7664677.150678] 0 pages HighMem/MovableOnly [7664677.154693] 1101945 pages reserved [7664677.158272] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664677.166319] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664677.175278] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664677.184067] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664677.192649] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664677.200829] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664677.209098] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664677.217710] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664677.225970] [53099] 0 53099 6670 239 18 649 0 smartd [7664677.234142] [53101] 0 53101 1910 64 9 172 0 mdadm [7664677.242228] [53104] 0 53104 74785 315 85 275 0 sssd [7664677.250227] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664677.258749] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664677.267709] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664677.276055] [53139] 997 53139 29446 250 28 128 0 chronyd [7664677.284316] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664677.292670] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664677.301015] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664677.309885] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664677.317891] [53861] 0 53861 174315 320 170 4518 0 rsyslogd [7664677.326245] [53863] 0 53863 176656 246 39 1246 0 collectd [7664677.334591] [53969] 0 53969 31572 205 20 168 0 crond [7664677.342681] [54035] 0 54035 27526 164 10 33 0 agetty [7664677.350858] [54036] 0 54036 27526 158 11 33 0 agetty [7664677.359031] [54186] 0 54186 22934 210 46 272 0 master [7664677.367203] [54206] 89 54206 25545 272 47 271 0 qmgr [7664677.375293] [36317] 0 36317 28294 187 14 61 0 bash [7664677.383300] [36328] 0 36328 154746 223 201 98 0 journalctl [7664677.391827] [36329] 0 36329 28177 160 14 55 0 grep [7664677.399934] [76204] 89 76204 25501 252 46 282 0 pickup [7664677.408118] [97173] 0 97173 48653 264 49 261 0 crond [7664677.416208] [97192] 0 97192 34468 247 25 1344 0 python3 [7664677.424478] [97872] 0 97872 48653 263 49 263 0 crond [7664677.432569] [97890] 0 97890 31176 215 18 701 0 python3 [7664677.440838] [98579] 0 98579 48653 266 49 235 0 crond [7664677.448931] [98713] 0 98713 30977 230 16 529 0 python3 [7664677.457194] [99292] 0 99292 48653 257 49 261 0 crond [7664677.465287] [99450] 0 99450 30913 226 18 446 0 python3 [7664677.473553] [99592] 89 99592 25538 229 47 273 0 cleanup [7664677.481816] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664677.490775] [100032] 0 100032 48653 266 49 240 0 crond [7664677.499035] [100105] 89 100105 25553 264 47 274 0 smtp [7664677.507215] [100203] 0 100203 30816 203 17 333 0 python3 [7664677.515649] Out of memory: Kill process 53861 (rsyslogd) score 0 or sacrifice child [7664677.523473] Killed process 53861 (rsyslogd) total-vm:697260kB, anon-rss:0kB, file-rss:1280kB, shmem-rss:0kB [7664682.048160] LNetError: 80392:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [7664682.058503] LNetError: 80392:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.219@o2ib7 (6): c: 5, oc: 0, rc: 8 [7664682.072542] LNetError: 80401:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.219@o2ib7 added to recovery queue. Health = 900 [7664682.085670] LNetError: 80401:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 3 previous similar messages [7664684.047403] LNetError: 80392:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [7664684.057747] LNetError: 80392:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.217@o2ib7 (6): c: 4, oc: 0, rc: 8 [7664684.128892] LNetError: 80399:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.217@o2ib7 added to recovery queue. Health = 900 [7664684.142706] LustreError: 80404:0:(events.c:305:request_in_callback()) event type 2, status -5, service ost_io [7664684.153028] LustreError: 123039:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check [7664684.164683] LustreError: 123039:0:(sec.c:2191:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.50.8.41@o2ib2 x1659083427320768 [7664685.994467] LNetError: 80409:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.226@o2ib7 added to recovery queue. Health = 900 [7664686.007593] LNetError: 80409:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 2 previous similar messages [7664686.018600] LustreError: 3140:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(183274) req@ffff9c114c918050 x1659185977603776/t0(0) o3->f56fe6b7-932f-4@10.50.16.1@o2ib2:473/0 lens 488/440 e 0 to 0 dl 1583650723 ref 1 fl Interpret:/0/0 rc 0/0 [7664686.018662] Lustre: fir-OST0021: Bulk IO read error with 430e4894-d38d-4 (at 10.50.14.11@o2ib2), client will retry: rc -110 [7664686.027329] LustreError: 123084:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(74605) req@ffff9c2d8c244050 x1659209236853568/t0(0) o4->541f81d4-bd4f-4@10.50.7.3@o2ib2:488/0 lens 488/448 e 0 to 0 dl 1583650738 ref 1 fl Interpret:/0/0 rc 0/0 [7664686.027361] Lustre: fir-OST001b: Bulk IO write error with 541f81d4-bd4f-4 (at 10.50.7.3@o2ib2), client will retry: rc = -110 [7664686.087185] LustreError: 3140:0:(ldlm_lib.c:3271:target_bulk_io()) Skipped 4 previous similar messages [7664687.047354] LNetError: 80392:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [7664687.057696] LNetError: 80392:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [7664687.068037] LNetError: 80392:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.216@o2ib7 (8): c: 4, oc: 0, rc: 8 [7664687.080196] LNetError: 80392:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 2 previous similar messages [7664689.340976] ll_ost_io02_010 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664689.349423] ll_ost_io02_010 cpuset=/ mems_allowed=2 [7664689.354484] CPU: 6 PID: 119554 Comm: ll_ost_io02_010 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664689.367861] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664689.375689] Call Trace: [7664689.378333] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664689.383654] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664689.389142] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664689.394988] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664689.400740] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664689.406750] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664689.413101] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664689.419195] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664689.424942] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664689.431470] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664689.438006] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664689.444191] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664689.450198] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664689.456304] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664689.463191] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664689.470422] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664689.476588] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664689.483247] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664689.490301] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664689.496054] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664689.502502] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664689.509383] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664689.514911] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664689.521998] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664689.529749] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664689.537006] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664689.544876] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664689.551877] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664689.559831] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664689.567089] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664689.573568] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664689.581139] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664689.586196] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664689.592463] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664689.599084] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664689.605349] Mem-Info: [7664689.607809] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:32457 inactive_file:33840 isolated_file:5280 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824034 slab_unreclaimable:62296356 mapped:1608 shmem:0 pagetables:1813 bounce:0 free:590325 free_pcp:0 free_cma:0 [7664689.642084] Node 2 Normal free:525224kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31568kB inactive_file:37476kB unevictable:8680kB isolated(anon):0kB isolated(file):2944kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715188kB slab_unreclaimable:62476080kB kernel_stack:7920kB pagetables:620kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:299467 all_unreclaimable? yes [7664689.689128] lowmem_reserve[]: 0 0 0 0 [7664689.693098] Node 2 Normal: 27486*4kB (UEM) 40274*8kB (UEM) 880*16kB (UEM) 1653*32kB (UEM) 412*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525480kB [7664689.708501] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664689.717367] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664689.725974] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664689.734839] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664689.743443] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664689.752312] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664689.760928] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664689.769801] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664689.778407] 73887 total pagecache pages [7664689.782422] 0 pages in swap cache [7664689.785921] Swap cache stats: add 21120458, delete 21136430, find 4513385/7609819 [7664689.793573] Free swap = 3075808kB [7664689.797150] Total swap = 4194300kB [7664689.800734] 66993253 pages RAM [7664689.803965] 0 pages HighMem/MovableOnly [7664689.807976] 1101945 pages reserved [7664689.811556] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664689.819605] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664689.828563] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664689.837349] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664689.845940] [53050] 0 53050 13880 123 28 146 -1000 auditd [7664689.854122] [53078] 999 53078 156119 278 64 2197 0 polkitd [7664689.862391] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664689.871005] [53084] 32 53084 17316 115 37 138 0 rpcbind [7664689.879273] [53099] 0 53099 6670 239 18 649 0 smartd [7664689.887453] [53101] 0 53101 1910 64 9 172 0 mdadm [7664689.895543] [53104] 0 53104 74785 324 85 252 0 sssd [7664689.903548] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664689.912078] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664689.921037] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664689.929392] [53139] 997 53139 29446 250 28 128 0 chronyd [7664689.937652] [53178] 0 53178 76774 292 95 239 0 sssd_nss [7664689.946006] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664689.954351] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664689.963221] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664689.971228] [53863] 0 53863 176656 246 39 1246 0 collectd [7664689.979580] [53969] 0 53969 31572 205 20 168 0 crond [7664689.987666] [54035] 0 54035 27526 164 10 33 0 agetty [7664689.995840] [54036] 0 54036 27526 158 11 33 0 agetty [7664690.004013] [54186] 0 54186 22934 210 46 272 0 master [7664690.012185] [54206] 89 54206 25545 272 47 271 0 qmgr [7664690.020307] [36317] 0 36317 28294 187 14 61 0 bash [7664690.028315] [36328] 0 36328 154746 223 201 98 0 journalctl [7664690.036841] [36329] 0 36329 28177 160 14 55 0 grep [7664690.044943] [76204] 89 76204 25501 252 46 282 0 pickup [7664690.053125] [97173] 0 97173 48653 264 49 261 0 crond [7664690.061213] [97192] 0 97192 34468 245 25 1344 0 python3 [7664690.069474] [97872] 0 97872 48653 263 49 263 0 crond [7664690.077567] [97890] 0 97890 31176 215 18 701 0 python3 [7664690.085831] [98579] 0 98579 48653 266 49 235 0 crond [7664690.093922] [98713] 0 98713 30977 227 16 529 0 python3 [7664690.102192] [99292] 0 99292 48653 257 49 261 0 crond [7664690.110285] [99450] 0 99450 30913 224 18 446 0 python3 [7664690.118541] [99592] 89 99592 25538 229 47 273 0 cleanup [7664690.126807] [99739] 89 99739 25502 246 47 260 0 trivial-rewrite [7664690.135765] [100032] 0 100032 48653 266 49 240 0 crond [7664690.144025] [100105] 89 100105 25553 264 47 274 0 smtp [7664690.152207] [100203] 0 100203 30816 202 17 333 0 python3 [7664690.160649] Out of memory: Kill process 53078 (polkitd) score 0 or sacrifice child [7664690.168394] Killed process 53078 (polkitd) total-vm:624476kB, anon-rss:0kB, file-rss:1112kB, shmem-rss:0kB [7664691.020435] LustreError: 90710:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(4194304) req@ffff9c142c7ff050 x1659549650568960/t0(0) o4->b62a8bca-4275-4@10.50.1.4@o2ib2:467/0 lens 488/448 e 0 to 0 dl 1583650717 ref 1 fl Interpret:/0/0 rc 0/0 [7664691.020521] Lustre: fir-OST0023: Bulk IO write error with 12bd00c4-481c-4 (at 10.50.17.8@o2ib2), client will retry: rc = -110 [7664691.020535] LustreError: 123080:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(212363) req@ffff9c3ef347e050 x1659397014590848/t0(0) o3->7c9c28a0-1550-4@10.50.15.11@o2ib2:473/0 lens 488/440 e 0 to 0 dl 1583650723 ref 1 fl Interpret:/0/0 rc 0/0 [7664691.020580] Lustre: fir-OST001f: Bulk IO read error with 7c9c28a0-1550-4 (at 10.50.15.11@o2ib2), client will retry: rc -110 [7664691.020582] Lustre: Skipped 4 previous similar messages [7664691.094947] LustreError: 90710:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) Skipped 1 previous similar message [7664695.735889] trivial-rewrite invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 [7664695.744341] trivial-rewrite cpuset=/ mems_allowed=0-3 [7664695.749581] CPU: 11 PID: 99739 Comm: trivial-rewrite Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664695.762953] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664695.770780] Call Trace: [7664695.773410] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664695.778730] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664695.784226] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664695.790071] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664695.796079] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664695.802431] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664695.808526] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664695.814271] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664695.820796] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664695.827322] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664695.833502] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664695.839508] [<ffffffffa01ba3c8>] filemap_fault+0x298/0x490 [7664695.845283] [<ffffffffc05871c6>] ext4_filemap_fault+0x36/0x50 [ext4] [7664695.851905] [<ffffffffa01e593a>] __do_fault.isra.59+0x8a/0x100 [7664695.858004] [<ffffffffa01e5eec>] do_read_fault.isra.61+0x4c/0x1b0 [7664695.864369] [<ffffffffa01ea874>] handle_pte_fault+0x2f4/0xd10 [7664695.870379] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664695.876298] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664695.882219] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664695.887790] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664695.893102] Mem-Info: [7664695.895580] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34444 inactive_file:34623 isolated_file:4384 unevictable:9044 dirty:90 writeback:0 unstable:0 slab_reclaimable:824034 slab_unreclaimable:62296394 mapped:1608 shmem:0 pagetables:1813 bounce:0 free:590109 free_pcp:0 free_cma:0 [7664695.929934] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664695.971685] lowmem_reserve[]: 0 1418 63868 63868 [7664695.976606] Node 0 DMA32 free:261324kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:864kB inactive_file:2816kB unevictable:0kB isolated(anon):0kB isolated(file):512kB present:1633052kB managed:1452284kB mlocked:0kB dirty:20kB writeback:0kB mapped:84kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686224kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:125842 all_unreclaimable? no [7664696.024342] LustreError: 3107:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(155474) req@ffff9c1fb0425050 x1659467991415552/t0(0) o3->fb2c1382-8f5a-4@10.50.15.10@o2ib2:473/0 lens 488/440 e 0 to 0 dl 1583650723 ref 1 fl Interpret:/0/0 rc 0/0 [7664696.024378] Lustre: fir-OST0023: Bulk IO read error with fb2c1382-8f5a-4 (at 10.50.15.10@o2ib2), client will retry: rc -110 [7664696.021652] lowmem_reserve[]: 0 0 62450 62450 [7664696.060537] Node 0 Normal free:508608kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:4kB active_file:44756kB inactive_file:42316kB unevictable:168kB isolated(anon):0kB isolated(file):7296kB present:64998912kB managed:63949072kB mlocked:168kB dirty:116kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610940kB slab_unreclaimable:60242672kB kernel_stack:6192kB pagetables:2548kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:39925 all_unreclaimable? no [7664696.107398] lowmem_reserve[]: 0 0 0 0 [7664696.111369] Node 1 Normal free:525508kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16500kB inactive_file:16660kB unevictable:26488kB isolated(anon):0kB isolated(file):896kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711248kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:2016kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:831626 all_unreclaimable? yes [7664696.158404] lowmem_reserve[]: 0 0 0 0 [7664696.162381] Node 2 Normal free:524980kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32220kB inactive_file:35688kB unevictable:8680kB isolated(anon):0kB isolated(file):2432kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:160kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715188kB slab_unreclaimable:62476080kB kernel_stack:7920kB pagetables:620kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:470065 all_unreclaimable? no [7664696.209493] lowmem_reserve[]: 0 0 0 0 [7664696.213464] Node 3 Normal free:523724kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:40kB active_file:43652kB inactive_file:43632kB unevictable:840kB isolated(anon):0kB isolated(file):1920kB present:67108352kB managed:66038732kB mlocked:840kB dirty:64kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854272kB slab_unreclaimable:62369264kB kernel_stack:4224kB pagetables:1860kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1304808 all_unreclaimable? yes [7664696.260595] lowmem_reserve[]: 0 0 0 0 [7664696.264566] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664696.279407] Node 0 DMA32: 389*4kB (EM) 400*8kB (UEM) 1213*16kB (UEM) 3689*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261572kB [7664696.295811] Node 0 Normal: 6502*4kB (UEM) 5720*8kB (UEM) 3924*16kB (UEM) 4484*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 509080kB [7664696.312652] Node 1 Normal: 88093*4kB (UEM) 21640*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525508kB [7664696.326182] Node 2 Normal: 27392*4kB (UEM) 40218*8kB (UEM) 896*16kB (UEM) 1675*32kB (UEM) 413*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525680kB [7664696.341582] Node 3 Normal: 130945*4kB (UM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 523836kB [7664696.354313] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664696.363179] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664696.371787] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664696.380653] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664696.389259] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664696.398127] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664696.406740] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664696.415605] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664696.424209] 74066 total pagecache pages [7664696.428226] 0 pages in swap cache [7664696.431716] Swap cache stats: add 21120629, delete 21136601, find 4513416/7609890 [7664696.439367] Free swap = 3084652kB [7664696.442947] Total swap = 4194300kB [7664696.446529] 66993253 pages RAM [7664696.449759] 0 pages HighMem/MovableOnly [7664696.453773] 1101945 pages reserved [7664696.457353] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664696.465401] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664696.474365] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664696.483152] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664696.491742] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664696.499920] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664696.508533] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664696.516799] [53099] 0 53099 6670 239 18 649 0 smartd [7664696.524972] [53101] 0 53101 1910 64 9 172 0 mdadm [7664696.533059] [53104] 0 53104 74785 324 85 252 0 sssd [7664696.541068] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664696.549594] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664696.558548] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664696.566894] [53139] 997 53139 29446 250 28 128 0 chronyd [7664696.575156] [53178] 0 53178 76774 291 95 241 0 sssd_nss [7664696.583508] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664696.591853] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664696.600721] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664696.608730] [53863] 0 53863 176656 246 39 1247 0 collectd [7664696.617085] [53969] 0 53969 31572 205 20 168 0 crond [7664696.625179] [54035] 0 54035 27526 164 10 33 0 agetty [7664696.633359] [54036] 0 54036 27526 158 11 33 0 agetty [7664696.641532] [54186] 0 54186 22934 210 46 272 0 master [7664696.649706] [54206] 89 54206 25545 272 47 271 0 qmgr [7664696.657830] [36317] 0 36317 28294 187 14 61 0 bash [7664696.665834] [36328] 0 36328 154746 223 201 98 0 journalctl [7664696.674354] [36329] 0 36329 28177 160 14 55 0 grep [7664696.682461] [76204] 89 76204 25501 252 46 282 0 pickup [7664696.690642] [97173] 0 97173 48653 264 49 261 0 crond [7664696.698732] [97192] 0 97192 34468 245 25 1344 0 python3 [7664696.706995] [97872] 0 97872 48653 263 49 263 0 crond [7664696.715087] [97890] 0 97890 31176 215 18 701 0 python3 [7664696.723349] [98579] 0 98579 48653 266 49 235 0 crond [7664696.731441] [98713] 0 98713 30977 227 16 529 0 python3 [7664696.739701] [99292] 0 99292 48653 257 49 261 0 crond [7664696.747787] [99450] 0 99450 30913 224 18 446 0 python3 [7664696.756046] [99592] 89 99592 25538 229 47 273 0 cleanup [7664696.764306] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664696.773259] [100032] 0 100032 48653 266 49 240 0 crond [7664696.781527] [100105] 89 100105 25553 264 47 274 0 smtp [7664696.789699] [100203] 0 100203 30816 202 17 333 0 python3 [7664696.798133] Out of memory: Kill process 97192 (python3) score 0 or sacrifice child [7664696.805879] Killed process 97192 (python3) total-vm:137872kB, anon-rss:0kB, file-rss:980kB, shmem-rss:0kB [7664696.845455] python3: page allocation failure: order:0, mode:0x200da [7664696.851913] CPU: 16 PID: 97192 Comm: python3 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664696.864601] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664696.872433] Call Trace: [7664696.875078] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664696.880401] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664696.886499] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664696.892072] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664696.898086] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664696.904614] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664696.911148] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664696.916989] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664696.923602] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664696.929878] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664696.935809] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664696.941822] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664696.947750] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664696.953674] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664696.959250] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664696.964568] Mem-Info: [7664696.967045] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33423 inactive_file:35173 isolated_file:3072 unevictable:9044 dirty:90 writeback:0 unstable:0 slab_reclaimable:824034 slab_unreclaimable:62296402 mapped:1608 shmem:0 pagetables:1749 bounce:0 free:590025 free_pcp:0 free_cma:0 [7664697.001411] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664697.043160] lowmem_reserve[]: 0 1418 63868 63868 [7664697.048090] Node 0 DMA32 free:261348kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:888kB inactive_file:3012kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:20kB writeback:0kB mapped:84kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686224kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:170188 all_unreclaimable? yes [7664697.093048] lowmem_reserve[]: 0 0 62450 62450 [7664697.097716] Node 0 Normal free:508560kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:4kB active_file:42148kB inactive_file:41352kB unevictable:168kB isolated(anon):0kB isolated(file):11136kB present:64998912kB managed:63949072kB mlocked:168kB dirty:116kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610940kB slab_unreclaimable:60242672kB kernel_stack:6352kB pagetables:2516kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:711185 all_unreclaimable? yes [7664697.144827] lowmem_reserve[]: 0 0 0 0 [7664697.148798] Node 1 Normal free:525604kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16760kB inactive_file:14932kB unevictable:26488kB isolated(anon):0kB isolated(file):1024kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711248kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:2000kB unstable:0kB bounce:0kB free_pcp:116kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3488 all_unreclaimable? no [7664697.195837] lowmem_reserve[]: 0 0 0 0 [7664697.199812] Node 2 Normal free:525048kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32344kB inactive_file:37564kB unevictable:8680kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:160kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715188kB slab_unreclaimable:62476112kB kernel_stack:7920kB pagetables:612kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:832 all_unreclaimable? no [7664697.246589] lowmem_reserve[]: 0 0 0 0 [7664697.250563] Node 3 Normal free:523976kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:40kB active_file:44388kB inactive_file:43124kB unevictable:840kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66038732kB mlocked:840kB dirty:64kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854272kB slab_unreclaimable:62369264kB kernel_stack:4224kB pagetables:1860kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:406370 all_unreclaimable? yes [7664697.297513] lowmem_reserve[]: 0 0 0 0 [7664697.301487] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664697.316324] Node 0 DMA32: 388*4kB (EM) 400*8kB (UEM) 1213*16kB (UEM) 3689*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261568kB [7664697.332730] Node 0 Normal: 6522*4kB (UEM) 5721*8kB (UEM) 3911*16kB (UEM) 4485*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508992kB [7664697.349570] Node 1 Normal: 87977*4kB (UEM) 21640*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525044kB [7664697.363099] Node 2 Normal: 27395*4kB (UEM) 40218*8kB (UEM) 895*16kB (UEM) 1675*32kB (UEM) 413*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525676kB [7664697.378499] Node 3 Normal: 131020*4kB (UM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524136kB [7664697.391222] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664697.400090] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664697.408696] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664697.417564] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664697.426167] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664697.435035] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664697.443641] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664697.452508] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664697.461112] 74095 total pagecache pages [7664697.465126] 0 pages in swap cache [7664697.468617] Swap cache stats: add 21120629, delete 21136601, find 4513416/7609890 [7664697.476270] Free swap = 3084652kB [7664697.479848] Total swap = 4194300kB [7664697.483429] 66993253 pages RAM [7664697.486660] 0 pages HighMem/MovableOnly [7664697.490677] 1101945 pages reserved [7664698.679210] ll_ost_io02_052 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [7664698.687398] ll_ost_io02_052 cpuset=/ mems_allowed=2 [7664698.692462] CPU: 18 PID: 6885 Comm: ll_ost_io02_052 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664698.705751] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664698.713585] Call Trace: [7664698.716218] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664698.721533] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664698.727030] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664698.732869] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664698.738626] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664698.744637] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664698.750991] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664698.757084] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664698.762839] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664698.769376] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664698.775911] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664698.782165] [<ffffffffc124293f>] tgt_checksum_niobuf_rw+0xbf/0xe00 [ptlrpc] [7664698.789428] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664698.796767] [<ffffffffc0cb71e0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [7664698.803601] [<ffffffffc1247325>] tgt_brw_read+0xc35/0x1e50 [ptlrpc] [7664698.810160] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664698.817515] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664698.824865] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664698.832380] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664698.839299] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664698.846391] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664698.854141] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664698.861399] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664698.869268] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664698.876237] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664698.881679] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664698.888163] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664698.895738] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664698.900796] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664698.907063] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664698.913683] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664698.919947] Mem-Info: [7664698.922409] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:33575 inactive_file:36254 isolated_file:2688 unevictable:9044 dirty:90 writeback:0 unstable:0 slab_reclaimable:824027 slab_unreclaimable:62296403 mapped:1608 shmem:0 pagetables:1724 bounce:0 free:590090 free_pcp:0 free_cma:0 [7664698.956768] Node 2 Normal free:525360kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31736kB inactive_file:38236kB unevictable:8680kB isolated(anon):0kB isolated(file):896kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:160kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715184kB slab_unreclaimable:62476112kB kernel_stack:7920kB pagetables:612kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:220965 all_unreclaimable? yes [7664699.003902] lowmem_reserve[]: 0 0 0 0 [7664699.007879] Node 2 Normal: 27395*4kB (UEM) 40219*8kB (UEM) 895*16kB (EM) 1675*32kB (UEM) 413*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525684kB [7664699.023239] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664699.032113] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664699.040725] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664699.049601] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664699.058208] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664699.067076] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664699.075682] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664699.084548] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664699.093154] 74044 total pagecache pages [7664699.097168] 0 pages in swap cache [7664699.100668] Swap cache stats: add 21120632, delete 21136604, find 4513416/7609892 [7664699.108321] Free swap = 3090028kB [7664699.111899] Total swap = 4194300kB [7664699.115479] 66993253 pages RAM [7664699.118712] 0 pages HighMem/MovableOnly [7664699.122723] 1101945 pages reserved [7664699.126303] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664699.134353] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664699.143310] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664699.152107] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664699.160691] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664699.168870] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664699.177482] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664699.185744] [53099] 0 53099 6670 239 18 649 0 smartd [7664699.193925] [53101] 0 53101 1910 64 9 172 0 mdadm [7664699.202018] [53104] 0 53104 74785 324 85 252 0 sssd [7664699.210017] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664699.218537] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664699.227492] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664699.235846] [53139] 997 53139 29446 250 28 128 0 chronyd [7664699.244113] [53178] 0 53178 76774 291 95 241 0 sssd_nss [7664699.252467] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664699.260812] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664699.269685] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664699.277687] [53863] 0 53863 176656 246 39 1247 0 collectd [7664699.286035] [53969] 0 53969 31572 205 20 168 0 crond [7664699.294129] [54035] 0 54035 27526 164 10 33 0 agetty [7664699.302302] [54036] 0 54036 27526 158 11 33 0 agetty [7664699.310474] [54186] 0 54186 22934 210 46 272 0 master [7664699.318647] [54206] 89 54206 25545 272 47 271 0 qmgr [7664699.326764] [36317] 0 36317 28294 187 14 61 0 bash [7664699.334767] [36328] 0 36328 154746 223 201 98 0 journalctl [7664699.343287] [36329] 0 36329 28177 160 14 55 0 grep [7664699.351387] [76204] 89 76204 25501 252 46 282 0 pickup [7664699.359571] [97173] 0 97173 48653 264 49 261 0 crond [7664699.367660] [97872] 0 97872 48653 263 49 263 0 crond [7664699.375753] [97890] 0 97890 31176 215 18 701 0 python3 [7664699.384016] [98579] 0 98579 48653 266 49 235 0 crond [7664699.392108] [98713] 0 98713 30977 227 16 529 0 python3 [7664699.400377] [99292] 0 99292 48653 257 49 261 0 crond [7664699.408469] [99450] 0 99450 30913 224 18 446 0 python3 [7664699.416729] [99592] 89 99592 25538 229 47 273 0 cleanup [7664699.424990] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664699.433950] [100032] 0 100032 48653 266 49 240 0 crond [7664699.442209] [100105] 89 100105 25553 264 47 274 0 smtp [7664699.450381] [100203] 0 100203 30816 202 17 333 0 python3 [7664699.458814] Out of memory: Kill process 53863 (collectd) score 0 or sacrifice child [7664699.466641] Killed process 53863 (collectd) total-vm:706624kB, anon-rss:0kB, file-rss:984kB, shmem-rss:0kB [7664699.576429] collectd: page allocation failure: order:0, mode:0x200da [7664699.582970] CPU: 0 PID: 53863 Comm: collectd Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664699.595666] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664699.603500] Call Trace: [7664699.606139] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664699.611455] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664699.617556] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664699.623128] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664699.629137] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664699.635669] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664699.642196] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664699.648036] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664699.654648] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664699.660914] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664699.666836] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664699.672849] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664699.678768] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664699.684686] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664699.690262] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664699.695572] Mem-Info: [7664699.698048] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:33398 inactive_file:34474 isolated_file:4096 unevictable:9044 dirty:90 writeback:0 unstable:0 slab_reclaimable:824027 slab_unreclaimable:62296403 mapped:1608 shmem:0 pagetables:1724 bounce:0 free:590256 free_pcp:0 free_cma:0 [7664699.732404] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664699.774157] lowmem_reserve[]: 0 1418 63868 63868 [7664699.779086] Node 0 DMA32 free:261260kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:948kB inactive_file:3600kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:20kB writeback:0kB mapped:84kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686224kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:220193 all_unreclaimable? yes [7664699.824039] lowmem_reserve[]: 0 0 62450 62450 [7664699.828720] Node 0 Normal free:508524kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44404kB inactive_file:42716kB unevictable:168kB isolated(anon):0kB isolated(file):12416kB present:64998912kB managed:63949072kB mlocked:168kB dirty:116kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610920kB slab_unreclaimable:60242668kB kernel_stack:6512kB pagetables:2420kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:555294 all_unreclaimable? yes [7664699.875844] lowmem_reserve[]: 0 0 0 0 [7664699.879822] Node 1 Normal free:525380kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16236kB inactive_file:16784kB unevictable:26488kB isolated(anon):0kB isolated(file):896kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711248kB slab_unreclaimable:63411344kB kernel_stack:20816kB pagetables:2000kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:167047 all_unreclaimable? yes [7664699.926855] lowmem_reserve[]: 0 0 0 0 [7664699.930826] Node 2 Normal free:525364kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31736kB inactive_file:36824kB unevictable:8680kB isolated(anon):0kB isolated(file):2816kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:160kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715184kB slab_unreclaimable:62476112kB kernel_stack:7920kB pagetables:612kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:154883 all_unreclaimable? yes [7664699.978024] lowmem_reserve[]: 0 0 0 0 [7664699.981992] Node 3 Normal free:524596kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:40kB active_file:41984kB inactive_file:43068kB unevictable:840kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66038732kB mlocked:840kB dirty:64kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369264kB kernel_stack:4208kB pagetables:1856kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:659623 all_unreclaimable? yes [7664700.028945] lowmem_reserve[]: 0 0 0 0 [7664700.032916] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664700.047755] Node 0 DMA32: 365*4kB (EM) 400*8kB (UEM) 1213*16kB (UEM) 3689*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261476kB [7664700.064158] Node 0 Normal: 6369*4kB (UEM) 5722*8kB (UEM) 3935*16kB (UEM) 4485*32kB (UEM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508772kB [7664700.080999] Node 1 Normal: 88022*4kB (UEM) 21640*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525224kB [7664700.094529] Node 2 Normal: 27394*4kB (UEM) 40219*8kB (UEM) 896*16kB (UEM) 1675*32kB (UEM) 413*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525696kB [7664700.109928] Node 3 Normal: 131195*4kB (UEM) 6*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524828kB [7664700.122827] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664700.131693] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664700.140299] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664700.149163] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664700.157776] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664700.166645] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664700.175252] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664700.184122] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664700.192736] 74042 total pagecache pages [7664700.196759] 0 pages in swap cache [7664700.200257] Swap cache stats: add 21120650, delete 21136622, find 4513420/7609903 [7664700.207911] Free swap = 3090028kB [7664700.211499] Total swap = 4194300kB [7664700.215087] 66993253 pages RAM [7664700.218327] 0 pages HighMem/MovableOnly [7664700.222338] 1101945 pages reserved [7664701.049747] LustreError: 89774:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(95248) req@ffff9c2f94fb9050 x1659475663823296/t0(0) o3->b4f8cb5a-edfb-4@10.50.13.3@o2ib2:493/0 lens 488/440 e 1 to 0 dl 1583650743 ref 1 fl Interpret:/0/0 rc 0/0 [7664701.072611] Lustre: fir-OST001f: Bulk IO read error with b4f8cb5a-edfb-4 (at 10.50.13.3@o2ib2), client will retry: rc -110 [7664706.048771] LNetError: 80392:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds [7664706.059117] LNetError: 80392:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 1 previous similar message [7664706.069373] LNetError: 80392:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.227@o2ib7 (6): c: 0, oc: 0, rc: 8 [7664706.081538] LNetError: 80392:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 1 previous similar message [7664706.091934] LustreError: 90708:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(74651) req@ffff9c1f527ac850 x1659179002321472/t0(0) o3->ccea6ca8-94f6-4@10.50.15.3@o2ib2:496/0 lens 488/440 e 1 to 0 dl 1583650746 ref 1 fl Interpret:/0/0 rc 0/0 [7664706.092090] Lustre: fir-OST001d: Bulk IO read error with 430e4894-d38d-4 (at 10.50.14.11@o2ib2), client will retry: rc -110 [7664706.126079] LustreError: 90708:0:(ldlm_lib.c:3271:target_bulk_io()) Skipped 4 previous similar messages [7664706.665150] ll_ost_io00_088 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664706.673594] ll_ost_io00_088 cpuset=/ mems_allowed=0 [7664706.678657] CPU: 20 PID: 90706 Comm: ll_ost_io00_088 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664706.692034] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664706.699864] Call Trace: [7664706.702506] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664706.707820] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664706.713315] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664706.719156] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664706.724910] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664706.730924] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664706.737278] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664706.743382] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664706.749134] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664706.755671] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664706.762204] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664706.768389] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664706.774396] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664706.780513] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664706.787396] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664706.794620] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664706.800777] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664706.807391] [<ffffffffa021bd89>] ? ___slab_alloc+0x209/0x4f0 [7664706.813313] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664706.819067] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664706.825513] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664706.832388] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664706.837938] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664706.845026] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664706.852778] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664706.860039] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664706.867904] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664706.874907] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664706.882869] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664706.890138] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664706.896642] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664706.904232] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664706.909288] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664706.915562] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664706.922181] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664706.928447] Mem-Info: [7664706.930913] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:34050 inactive_file:34936 isolated_file:3744 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824025 slab_unreclaimable:62296545 mapped:1607 shmem:0 pagetables:1685 bounce:0 free:590531 free_pcp:2 free_cma:0 [7664706.965183] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664707.006933] lowmem_reserve[]: 0 1418 63868 63868 [7664707.011857] Node 0 DMA32 free:261316kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1052kB inactive_file:3364kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:80kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686220kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20776 all_unreclaimable? yes [7664707.056724] lowmem_reserve[]: 0 0 62450 62450 [7664707.061387] Node 0 Normal free:508484kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44788kB inactive_file:44696kB unevictable:168kB isolated(anon):0kB isolated(file):4352kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243256kB kernel_stack:6080kB pagetables:2324kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:380279 all_unreclaimable? yes [7664707.108245] lowmem_reserve[]: 0 0 0 0 [7664707.112215] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664707.127055] Node 0 DMA32: 351*4kB (UEM) 399*8kB (UEM) 1214*16kB (UEM) 3689*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261428kB [7664707.143546] Node 0 Normal: 6455*4kB (UEM) 5705*8kB (UEM) 3934*16kB (UEM) 4479*32kB (EM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508772kB [7664707.160301] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664707.169167] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664707.177774] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664707.186640] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664707.195245] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664707.204110] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664707.212717] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664707.221588] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664707.230197] 73699 total pagecache pages [7664707.234212] 0 pages in swap cache [7664707.237703] Swap cache stats: add 21120677, delete 21136649, find 4513424/7609910 [7664707.245355] Free swap = 3094380kB [7664707.248935] Total swap = 4194300kB [7664707.252514] 66993253 pages RAM [7664707.255745] 0 pages HighMem/MovableOnly [7664707.259760] 1101945 pages reserved [7664707.263339] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664707.271388] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664707.280346] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664707.289136] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664707.297736] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664707.305915] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664707.314528] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664707.322795] [53099] 0 53099 6670 239 18 649 0 smartd [7664707.330969] [53101] 0 53101 1910 64 9 172 0 mdadm [7664707.339064] [53104] 0 53104 74785 324 85 252 0 sssd [7664707.347072] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664707.355591] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664707.364542] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664707.372891] [53139] 997 53139 29446 250 28 128 0 chronyd [7664707.381157] [53178] 0 53178 76774 291 95 241 0 sssd_nss [7664707.389505] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664707.397860] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664707.406737] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664707.414743] [53969] 0 53969 31572 205 20 168 0 crond [7664707.422836] [54035] 0 54035 27526 164 10 33 0 agetty [7664707.431008] [54036] 0 54036 27526 158 11 33 0 agetty [7664707.439182] [54186] 0 54186 22934 210 46 272 0 master [7664707.447356] [54206] 89 54206 25545 272 47 271 0 qmgr [7664707.455478] [36317] 0 36317 28294 187 14 61 0 bash [7664707.463483] [36328] 0 36328 154746 223 201 98 0 journalctl [7664707.472002] [36329] 0 36329 28177 160 14 55 0 grep [7664707.480108] [76204] 89 76204 25501 252 46 282 0 pickup [7664707.488295] [97173] 0 97173 48653 264 49 261 0 crond [7664707.496385] [97872] 0 97872 48653 263 49 263 0 crond [7664707.504478] [97890] 0 97890 31176 215 18 701 0 python3 [7664707.512740] [98579] 0 98579 48653 266 49 235 0 crond [7664707.520830] [98713] 0 98713 30977 227 16 529 0 python3 [7664707.529093] [99292] 0 99292 48653 257 49 261 0 crond [7664707.537188] [99450] 0 99450 30913 224 18 446 0 python3 [7664707.545455] [99592] 89 99592 25538 229 47 273 0 cleanup [7664707.553721] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664707.562676] [100032] 0 100032 48653 266 49 240 0 crond [7664707.570947] [100105] 89 100105 25553 264 47 274 0 smtp [7664707.579125] [100203] 0 100203 30816 202 17 333 0 python3 [7664707.587566] Out of memory: Kill process 97890 (python3) score 0 or sacrifice child [7664707.595313] Killed process 97890 (python3) total-vm:124704kB, anon-rss:0kB, file-rss:860kB, shmem-rss:0kB [7664707.689556] python3: page allocation failure: order:0, mode:0x201da [7664707.696007] CPU: 26 PID: 97890 Comm: python3 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664707.708696] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664707.716526] Call Trace: [7664707.719162] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664707.724491] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664707.730596] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664707.736186] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664707.742200] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664707.748732] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664707.755260] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664707.761448] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664707.767463] [<ffffffffa01ba3c8>] filemap_fault+0x298/0x490 [7664707.773250] [<ffffffffc05871c6>] ext4_filemap_fault+0x36/0x50 [ext4] [7664707.779882] [<ffffffffa01e593a>] __do_fault.isra.59+0x8a/0x100 [7664707.785998] [<ffffffffa0233289>] ? __mem_cgroup_uncharge_common+0x49/0x2f0 [7664707.793140] [<ffffffffa01e5eec>] do_read_fault.isra.61+0x4c/0x1b0 [7664707.799495] [<ffffffffa01ea874>] handle_pte_fault+0x2f4/0xd10 [7664707.805500] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664707.811421] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664707.817347] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664707.822920] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664707.828237] Mem-Info: [7664707.830718] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32589 inactive_file:35846 isolated_file:2656 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824025 slab_unreclaimable:62296554 mapped:1607 shmem:0 pagetables:1685 bounce:0 free:590366 free_pcp:0 free_cma:0 [7664707.864992] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664707.906747] lowmem_reserve[]: 0 1418 63868 63868 [7664707.911676] Node 0 DMA32 free:261300kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1072kB inactive_file:3432kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:80kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686220kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:21789 all_unreclaimable? yes [7664707.956574] lowmem_reserve[]: 0 0 62450 62450 [7664707.961373] Node 0 Normal free:508592kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:47580kB inactive_file:44596kB unevictable:168kB isolated(anon):0kB isolated(file):3456kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243256kB kernel_stack:6080kB pagetables:2324kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:778808 all_unreclaimable? yes [7664708.008263] lowmem_reserve[]: 0 0 0 0 [7664708.012235] Node 1 Normal free:525504kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16312kB inactive_file:16572kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711248kB slab_unreclaimable:63411344kB kernel_stack:20816kB pagetables:1988kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:301918 all_unreclaimable? yes [7664708.059135] lowmem_reserve[]: 0 0 0 0 [7664708.063108] Node 2 Normal free:525300kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32796kB inactive_file:35048kB unevictable:8680kB isolated(anon):0kB isolated(file):3200kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715184kB slab_unreclaimable:62476092kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1134574 all_unreclaimable? yes [7664708.110233] lowmem_reserve[]: 0 0 0 0 [7664708.114200] Node 3 Normal free:524888kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:43724kB inactive_file:44284kB unevictable:840kB isolated(anon):0kB isolated(file):384kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369304kB kernel_stack:4208kB pagetables:1852kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1367054 all_unreclaimable? yes [7664708.161055] lowmem_reserve[]: 0 0 0 0 [7664708.165019] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664708.179860] Node 0 DMA32: 352*4kB (UEM) 402*8kB (UEM) 1214*16kB (UEM) 3689*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261456kB [7664708.196353] Node 0 Normal: 6469*4kB (UEM) 5705*8kB (UEM) 3934*16kB (UEM) 4479*32kB (EM) 2046*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508828kB [7664708.213113] Node 1 Normal: 88054*4kB (UEM) 21659*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525504kB [7664708.226645] Node 2 Normal: 27478*4kB (UEM) 40145*8kB (UEM) 894*16kB (UEM) 1669*32kB (UEM) 414*64kB (UEM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525408kB [7664708.242511] Node 3 Normal: 131209*4kB (UEM) 6*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524884kB [7664708.255410] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.264287] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.272902] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.281775] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.290390] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.299264] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.307871] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.316747] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.325358] 73769 total pagecache pages [7664708.329373] 0 pages in swap cache [7664708.332873] Swap cache stats: add 21120677, delete 21136649, find 4513424/7609910 [7664708.340526] Free swap = 3094380kB [7664708.344107] Total swap = 4194300kB [7664708.347694] 66993253 pages RAM [7664708.350927] 0 pages HighMem/MovableOnly [7664708.354946] 1101945 pages reserved [7664708.500838] ll_ost_io03_035 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664708.509276] ll_ost_io03_035 cpuset=/ mems_allowed=3 [7664708.514332] CPU: 47 PID: 3183 Comm: ll_ost_io03_035 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664708.527623] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664708.535448] Call Trace: [7664708.538081] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664708.543399] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664708.548893] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664708.554727] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664708.560483] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664708.566496] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664708.572855] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664708.578951] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664708.584703] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664708.591232] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664708.597765] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664708.603946] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664708.609957] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664708.616066] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664708.622952] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664708.630184] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664708.636351] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664708.643006] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664708.650095] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664708.656708] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664708.662453] [<ffffffffa00dca58>] ? __enqueue_entity+0x78/0x80 [7664708.668459] [<ffffffffa00e367f>] ? enqueue_entity+0x2ef/0xbe0 [7664708.674467] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664708.679995] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664708.687083] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664708.694832] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664708.702087] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664708.709953] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664708.716954] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664708.724907] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664708.732169] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664708.738644] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664708.746211] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664708.751271] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664708.757538] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664708.764158] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664708.770422] Mem-Info: [7664708.772883] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33918 inactive_file:33676 isolated_file:2528 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824025 slab_unreclaimable:62296538 mapped:1588 shmem:0 pagetables:1685 bounce:0 free:590404 free_pcp:0 free_cma:0 [7664708.807153] Node 3 Normal free:524884kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:43272kB inactive_file:42952kB unevictable:840kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369308kB kernel_stack:4208kB pagetables:1852kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1367054 all_unreclaimable? yes [7664708.853840] lowmem_reserve[]: 0 0 0 0 [7664708.857808] Node 3 Normal: 131209*4kB (UEM) 6*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524884kB [7664708.870708] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.879573] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.888178] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.897044] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.905651] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.914516] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.923123] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664708.931990] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664708.940593] 73669 total pagecache pages [7664708.944607] 0 pages in swap cache [7664708.948099] Swap cache stats: add 21120678, delete 21136650, find 4513424/7609912 [7664708.955754] Free swap = 3097196kB [7664708.959330] Total swap = 4194300kB [7664708.962914] 66993253 pages RAM [7664708.966144] 0 pages HighMem/MovableOnly [7664708.970156] 1101945 pages reserved [7664708.973735] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664708.981783] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664708.990742] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664708.999529] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664709.008119] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664709.016297] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664709.024907] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664709.033167] [53099] 0 53099 6670 239 18 649 0 smartd [7664709.041348] [53101] 0 53101 1910 64 9 172 0 mdadm [7664709.049442] [53104] 0 53104 74785 324 85 252 0 sssd [7664709.057444] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664709.065969] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664709.074923] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664709.083269] [53139] 997 53139 29446 250 28 128 0 chronyd [7664709.091541] [53178] 0 53178 76774 291 95 241 0 sssd_nss [7664709.099895] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664709.108248] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664709.117124] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664709.125134] [53969] 0 53969 31572 205 20 168 0 crond [7664709.133228] [54035] 0 54035 27526 164 10 33 0 agetty [7664709.141414] [54036] 0 54036 27526 158 11 33 0 agetty [7664709.149587] [54186] 0 54186 22934 210 46 272 0 master [7664709.157760] [54206] 89 54206 25545 272 47 271 0 qmgr [7664709.165873] [36317] 0 36317 28294 187 14 61 0 bash [7664709.173872] [36328] 0 36328 154746 223 201 98 0 journalctl [7664709.176216] LustreError: 8706:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c26709a4a00 [7664709.193251] [36329] 0 36329 28177 160 14 55 0 grep [7664709.194980] LustreError: 8763:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c3a125b9000 [7664709.212219] [76204] 89 76204 25501 252 46 282 0 pickup [7664709.220402] [97173] 0 97173 48653 264 49 261 0 crond [7664709.227002] LustreError: 90680:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c1dbbd31800 [7664709.233911] LustreError: 8712:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c4a49683200 [7664709.240251] LustreError: 90700:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c45bee18a00 [7664709.240262] LustreError: 90700:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c45bee18a00 [7664709.240273] LustreError: 90700:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c45bee18a00 [7664709.240284] LustreError: 90700:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c45bee18a00 [7664709.294083] [97872] 0 97872 48653 263 49 263 0 crond [7664709.302178] [98579] 0 98579 48653 266 49 235 0 crond [7664709.310268] [98713] 0 98713 30977 211 16 529 0 python3 [7664709.318532] [99292] 0 99292 48653 257 49 261 0 crond [7664709.326626] [99450] 0 99450 30913 208 18 446 0 python3 [7664709.334890] [99592] 89 99592 25538 229 47 273 0 cleanup [7664709.343152] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664709.352104] [100032] 0 100032 48653 266 49 240 0 crond [7664709.360364] [100105] 89 100105 25553 264 47 274 0 smtp [7664709.368543] [100203] 0 100203 30816 185 17 333 0 python3 [7664709.376978] Out of memory: Kill process 53099 (smartd) score 0 or sacrifice child [7664709.384637] Killed process 53099 (smartd) total-vm:26680kB, anon-rss:0kB, file-rss:956kB, shmem-rss:0kB [7664709.504023] ll_ost_io02_077 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664709.512467] ll_ost_io02_077 cpuset=/ mems_allowed=2 [7664709.517530] CPU: 34 PID: 83188 Comm: ll_ost_io02_077 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664709.530907] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664709.538733] Call Trace: [7664709.541368] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664709.546683] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664709.552181] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664709.558018] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664709.563766] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664709.569778] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664709.576141] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664709.582240] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664709.587988] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664709.594523] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664709.601059] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664709.607244] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664709.613251] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664709.619360] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664709.626242] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664709.633467] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664709.639637] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664709.646257] [<ffffffffa002a59e>] ? __switch_to+0xce/0x580 [7664709.651924] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664709.657678] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664709.664128] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664709.671006] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664709.676535] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664709.683629] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664709.691388] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664709.698653] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664709.706549] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664709.713560] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664709.721513] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664709.728776] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664709.735285] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664709.742860] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664709.747922] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664709.754192] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664709.760919] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664709.767201] Mem-Info: [7664709.769666] active_anon:0 inactive_anon:5 isolated_anon:0 active_file:34619 inactive_file:35308 isolated_file:2528 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824034 slab_unreclaimable:62296630 mapped:1588 shmem:0 pagetables:1649 bounce:0 free:590114 free_pcp:139 free_cma:0 [7664709.804144] Node 2 Normal free:525364kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:20kB active_file:31216kB inactive_file:36544kB unevictable:8680kB isolated(anon):0kB isolated(file):1920kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715188kB slab_unreclaimable:62476080kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:43099 all_unreclaimable? no [7664709.851134] lowmem_reserve[]: 0 0 0 0 [7664709.855112] Node 2 Normal: 27391*4kB (UEM) 40175*8kB (UEM) 892*16kB (UEM) 1676*32kB (UEM) 414*64kB (UEM) 2*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525620kB [7664709.871051] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664709.879919] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664709.888522] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664709.897398] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664709.906017] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664709.914925] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664709.923543] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664709.932421] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664709.941036] 73740 total pagecache pages [7664709.945083] 0 pages in swap cache [7664709.948584] Swap cache stats: add 21120686, delete 21136658, find 4513428/7609918 [7664709.956250] Free swap = 3099756kB [7664709.959833] Total swap = 4194300kB [7664709.963415] 66993253 pages RAM [7664709.966645] 0 pages HighMem/MovableOnly [7664709.970656] 1101945 pages reserved [7664709.974236] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664709.982296] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664709.991271] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664710.000066] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664710.008676] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664710.016868] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664710.025487] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664710.033783] [53101] 0 53101 1910 64 9 172 0 mdadm [7664710.041891] [53104] 0 53104 74785 324 85 253 0 sssd [7664710.049903] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664710.058441] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664710.067442] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664710.075800] [53139] 997 53139 29446 250 28 128 0 chronyd [7664710.084076] [53178] 0 53178 76774 291 95 241 0 sssd_nss [7664710.092441] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664710.100834] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664710.109716] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664710.117730] [53969] 0 53969 31572 205 20 168 0 crond [7664710.125834] [54035] 0 54035 27526 164 10 33 0 agetty [7664710.134044] [54036] 0 54036 27526 158 11 33 0 agetty [7664710.142219] [54186] 0 54186 22934 210 46 272 0 master [7664710.150392] [54206] 89 54206 25545 272 47 271 0 qmgr [7664710.158519] [36317] 0 36317 28294 187 14 61 0 bash [7664710.166523] [36328] 0 36328 154746 223 201 98 0 journalctl [7664710.175050] [36329] 0 36329 28177 160 14 55 0 grep [7664710.183158] [76204] 89 76204 25501 252 46 282 0 pickup [7664710.191347] [97173] 0 97173 48653 264 49 261 0 crond [7664710.199458] [97872] 0 97872 48653 263 49 263 0 crond [7664710.207560] [98579] 0 98579 48653 266 49 235 0 crond [7664710.215659] [98713] 0 98713 30977 211 16 529 0 python3 [7664710.223921] [99292] 0 99292 48653 257 49 261 0 crond [7664710.232018] [99450] 0 99450 30913 208 18 446 0 python3 [7664710.240291] [99592] 89 99592 25538 229 47 273 0 cleanup [7664710.248569] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664710.257543] [100032] 0 100032 48653 266 49 240 0 crond [7664710.265841] [100105] 89 100105 25553 264 47 274 0 smtp [7664710.274024] [100203] 0 100203 30816 185 17 333 0 python3 [7664710.282472] Out of memory: Kill process 98713 (python3) score 0 or sacrifice child [7664710.290218] Killed process 98713 (python3) total-vm:123908kB, anon-rss:0kB, file-rss:844kB, shmem-rss:0kB [7664710.498769] python3: page allocation failure: order:0, mode:0x200da [7664710.505220] CPU: 15 PID: 98713 Comm: python3 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664710.517907] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664710.525739] Call Trace: [7664710.528371] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664710.533693] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664710.539790] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664710.545364] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664710.551381] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664710.557915] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664710.564447] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664710.570280] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664710.576892] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664710.583157] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664710.589079] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664710.595092] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664710.601016] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664710.606939] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664710.612515] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664710.617834] Mem-Info: [7664710.620312] active_anon:0 inactive_anon:5 isolated_anon:0 active_file:34378 inactive_file:34621 isolated_file:2709 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824036 slab_unreclaimable:62296631 mapped:1588 shmem:0 pagetables:1649 bounce:0 free:590188 free_pcp:37 free_cma:0 [7664710.654671] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664710.696426] lowmem_reserve[]: 0 1418 63868 63868 [7664710.701354] Node 0 DMA32 free:261312kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:836kB inactive_file:2856kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686256kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:56975 all_unreclaimable? yes [7664710.746050] lowmem_reserve[]: 0 0 62450 62450 [7664710.750712] Node 0 Normal free:508120kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44724kB inactive_file:42264kB unevictable:168kB isolated(anon):0kB isolated(file):5248kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243532kB kernel_stack:5840kB pagetables:2252kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:283308 all_unreclaimable? yes [7664710.797574] lowmem_reserve[]: 0 0 0 0 [7664710.801544] Node 1 Normal free:525216kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17264kB inactive_file:14160kB unevictable:26488kB isolated(anon):0kB isolated(file):944kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411344kB kernel_stack:20816kB pagetables:1916kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1244662 all_unreclaimable? yes [7664710.848666] lowmem_reserve[]: 0 0 0 0 [7664710.852642] Node 2 Normal free:525500kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32880kB inactive_file:35600kB unevictable:8680kB isolated(anon):0kB isolated(file):2688kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715200kB slab_unreclaimable:62476084kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:661237 all_unreclaimable? yes [7664710.899676] lowmem_reserve[]: 0 0 0 0 [7664710.903646] Node 3 Normal free:524884kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:43280kB inactive_file:42944kB unevictable:840kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369308kB kernel_stack:4208kB pagetables:1852kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1367054 all_unreclaimable? yes [7664710.950337] lowmem_reserve[]: 0 0 0 0 [7664710.954308] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664710.969147] Node 0 DMA32: 440*4kB (UEM) 407*8kB (UEM) 1213*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261800kB [7664710.985642] Node 0 Normal: 6370*4kB (UEM) 5719*8kB (UEM) 3934*16kB (UEM) 4480*32kB (UEM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508256kB [7664711.002482] Node 1 Normal: 88044*4kB (UEM) 21630*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525216kB [7664711.015637] Node 2 Normal: 27361*4kB (UEM) 40163*8kB (UEM) 894*16kB (UEM) 1676*32kB (UEM) 414*64kB (UEM) 2*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525436kB [7664711.031585] Node 3 Normal: 131209*4kB (UEM) 6*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524884kB [7664711.044482] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664711.053346] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664711.061956] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664711.070835] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664711.079451] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664711.088319] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664711.095509] LustreError: 80409:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c29177a8400 [7664711.095570] LustreError: 90678:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(78730) req@ffff9c207d7cf050 x1659209236854592/t0(0) o4->541f81d4-bd4f-4@10.50.7.3@o2ib2:494/0 lens 488/448 e 0 to 0 dl 1583650744 ref 1 fl Interpret:/0/0 rc 0/0 [7664711.095598] Lustre: fir-OST001d: Bulk IO write error with 541f81d4-bd4f-4 (at 10.50.7.3@o2ib2), client will retry: rc = -110 [7664711.095600] Lustre: Skipped 1 previous similar message [7664711.147575] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664711.156449] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664711.165065] 73854 total pagecache pages [7664711.169087] 0 pages in swap cache [7664711.172587] Swap cache stats: add 21120687, delete 21136659, find 4513429/7609920 [7664711.180247] Free swap = 3099756kB [7664711.183827] Total swap = 4194300kB [7664711.187406] 66993253 pages RAM [7664711.190639] 0 pages HighMem/MovableOnly [7664711.194650] 1101945 pages reserved [7664711.256602] LustreError: 3109:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c1bafc1ae00 [7664711.611662] ll_ost_io02_088 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664711.620104] ll_ost_io02_088 cpuset=/ mems_allowed=2 [7664711.625172] CPU: 10 PID: 8667 Comm: ll_ost_io02_088 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664711.638457] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664711.646285] Call Trace: [7664711.648920] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664711.654237] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664711.659730] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664711.665571] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664711.671584] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664711.677936] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664711.684030] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664711.689775] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664711.696304] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664711.702838] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664711.709025] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664711.715031] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664711.721147] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664711.728031] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664711.734184] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664711.741277] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664711.747845] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664711.754497] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664711.761844] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664711.768589] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664711.775937] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664711.783454] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664711.790376] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664711.797462] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664711.805217] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664711.812472] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664711.820341] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664711.827315] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664711.832752] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664711.839234] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664711.846802] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664711.851862] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664711.858138] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664711.864758] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664711.871025] Mem-Info: [7664711.873483] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:32145 inactive_file:35045 isolated_file:4704 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296639 mapped:1588 shmem:0 pagetables:1633 bounce:0 free:590317 free_pcp:0 free_cma:0 [7664711.907757] Node 2 Normal free:525116kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32344kB inactive_file:39104kB unevictable:8680kB isolated(anon):0kB isolated(file):1408kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476084kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:139544 all_unreclaimable? yes [7664711.954796] lowmem_reserve[]: 0 0 0 0 [7664711.958762] Node 2 Normal: 27350*4kB (EM) 40156*8kB (UEM) 887*16kB (EM) 1667*32kB (EM) 414*64kB (EM) 2*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524936kB [7664711.974278] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664711.983145] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664711.991749] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664712.000618] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664712.009223] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664712.018090] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664712.026696] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664712.035560] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664712.044170] 73728 total pagecache pages [7664712.048190] 0 pages in swap cache [7664712.051682] Swap cache stats: add 21120697, delete 21136669, find 4513431/7609924 [7664712.059333] Free swap = 3101548kB [7664712.062912] Total swap = 4194300kB [7664712.066495] 66993253 pages RAM [7664712.069725] 0 pages HighMem/MovableOnly [7664712.073735] 1101945 pages reserved [7664712.077317] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664712.085364] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664712.094324] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664712.095569] LustreError: 80409:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c363824fe00 [7664712.114067] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664712.122663] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664712.130840] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664712.139452] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664712.147724] [53101] 0 53101 1910 64 9 172 0 mdadm [7664712.155813] [53104] 0 53104 74785 324 85 253 0 sssd [7664712.163815] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664712.172342] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664712.181296] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664712.189650] [53139] 997 53139 29446 250 28 128 0 chronyd [7664712.197919] [53178] 0 53178 76774 291 95 241 0 sssd_nss [7664712.206271] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664712.214617] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664712.223489] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664712.231494] [53969] 0 53969 31572 205 20 168 0 crond [7664712.239586] [54035] 0 54035 27526 164 10 33 0 agetty [7664712.247763] [54036] 0 54036 27526 158 11 33 0 agetty [7664712.255944] [54186] 0 54186 22934 210 46 273 0 master [7664712.264127] [54206] 89 54206 25545 272 47 271 0 qmgr [7664712.272245] [36317] 0 36317 28294 187 14 61 0 bash [7664712.280253] [36328] 0 36328 154746 223 201 98 0 journalctl [7664712.288780] [36329] 0 36329 28177 160 14 55 0 grep [7664712.296883] [76204] 89 76204 25501 252 46 282 0 pickup [7664712.305061] [97173] 0 97173 48653 264 49 262 0 crond [7664712.313153] [97872] 0 97872 48653 263 49 263 0 crond [7664712.321249] [98579] 0 98579 48653 266 49 235 0 crond [7664712.329343] [99292] 0 99292 48653 257 49 261 0 crond [7664712.337437] [99450] 0 99450 30913 208 18 446 0 python3 [7664712.345702] [99592] 89 99592 25538 229 47 273 0 cleanup [7664712.353962] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664712.362917] [100032] 0 100032 48653 266 49 240 0 crond [7664712.371183] [100105] 89 100105 25553 264 47 274 0 smtp [7664712.379358] [100203] 0 100203 30816 185 17 333 0 python3 [7664712.387798] Out of memory: Kill process 99450 (python3) score 0 or sacrifice child [7664712.395540] Killed process 99450 (python3) total-vm:123652kB, anon-rss:0kB, file-rss:832kB, shmem-rss:0kB [7664712.608236] python3: page allocation failure: order:0, mode:0x200da [7664712.614686] CPU: 11 PID: 99450 Comm: python3 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664712.627375] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664712.635209] Call Trace: [7664712.637839] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664712.643156] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664712.649253] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664712.654829] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664712.660840] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664712.667375] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664712.673907] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664712.679747] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664712.686360] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664712.692626] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664712.698546] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664712.704551] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664712.710470] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664712.716394] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664712.721970] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664712.727282] Mem-Info: [7664712.729759] active_anon:1 inactive_anon:1 isolated_anon:0 active_file:33092 inactive_file:37027 isolated_file:3680 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296639 mapped:1588 shmem:0 pagetables:1633 bounce:0 free:590181 free_pcp:0 free_cma:0 [7664712.764028] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664712.805780] lowmem_reserve[]: 0 1418 63868 63868 [7664712.810710] Node 0 DMA32 free:261344kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:964kB inactive_file:2888kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686256kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:258864 all_unreclaimable? yes [7664712.855494] lowmem_reserve[]: 0 0 62450 62450 [7664712.860161] Node 0 Normal free:507988kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44696kB inactive_file:45632kB unevictable:168kB isolated(anon):0kB isolated(file):4352kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243564kB kernel_stack:6304kB pagetables:2188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1102129 all_unreclaimable? yes [7664712.907115] lowmem_reserve[]: 0 0 0 0 [7664712.911089] Node 1 Normal free:525308kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16124kB inactive_file:17192kB unevictable:26488kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411344kB kernel_stack:20816kB pagetables:1916kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:31567 all_unreclaimable? no [7664712.957950] lowmem_reserve[]: 0 0 0 0 [7664712.961919] Node 2 Normal free:525048kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32568kB inactive_file:39848kB unevictable:8680kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476084kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:453640 all_unreclaimable? yes [7664713.008868] lowmem_reserve[]: 0 0 0 0 [7664713.012845] Node 3 Normal free:525184kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:8kB active_file:42668kB inactive_file:41472kB unevictable:840kB isolated(anon):0kB isolated(file):6528kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369308kB kernel_stack:4208kB pagetables:1852kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:688829 all_unreclaimable? yes [7664713.059699] lowmem_reserve[]: 0 0 0 0 [7664713.063665] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664713.078504] Node 0 DMA32: 409*4kB (UEM) 407*8kB (UEM) 1213*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261676kB [7664713.095003] Node 0 Normal: 6448*4kB (UEM) 5720*8kB (UEM) 3934*16kB (UEM) 4479*32kB (EM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508544kB [7664713.111758] Node 1 Normal: 88065*4kB (UEM) 21631*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525308kB [7664713.124912] Node 2 Normal: 27350*4kB (EM) 40156*8kB (UEM) 887*16kB (EM) 1667*32kB (UEM) 414*64kB (UEM) 2*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524936kB [7664713.140601] Node 3 Normal: 131413*4kB (UM) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525660kB [7664713.153324] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664713.162191] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664713.170797] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664713.179664] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664713.188268] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664713.197136] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664713.205740] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664713.214605] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664713.223213] 73713 total pagecache pages [7664713.227227] 0 pages in swap cache [7664713.230719] Swap cache stats: add 21120697, delete 21136669, find 4513431/7609924 [7664713.238371] Free swap = 3101548kB [7664713.241949] Total swap = 4194300kB [7664713.245532] 66993253 pages RAM [7664713.248762] 0 pages HighMem/MovableOnly [7664713.252773] 1101945 pages reserved [7664713.393655] crond invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0 [7664713.401232] crond cpuset=/ mems_allowed=0-3 [7664713.405600] CPU: 28 PID: 53969 Comm: crond Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664713.418113] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664713.425941] Call Trace: [7664713.428583] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664713.433904] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664713.439396] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664713.445232] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664713.451246] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664713.457600] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664713.463701] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664713.469456] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664713.475990] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664713.482515] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664713.488350] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664713.494971] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664713.501245] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664713.507165] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664713.513177] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664713.519098] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664713.525018] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664713.530592] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664713.535911] Mem-Info: [7664713.538388] active_anon:0 inactive_anon:8 isolated_anon:0 active_file:32941 inactive_file:35907 isolated_file:3424 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296640 mapped:1588 shmem:0 pagetables:1633 bounce:0 free:590269 free_pcp:0 free_cma:0 [7664713.572668] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664713.614432] lowmem_reserve[]: 0 1418 63868 63868 [7664713.619365] Node 0 DMA32 free:261308kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:4kB active_file:988kB inactive_file:2380kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686256kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:67919 all_unreclaimable? yes [7664713.664061] lowmem_reserve[]: 0 0 62450 62450 [7664713.668729] Node 0 Normal free:508240kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:45232kB inactive_file:44788kB unevictable:168kB isolated(anon):0kB isolated(file):6656kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243564kB kernel_stack:6144kB pagetables:2188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1236710 all_unreclaimable? yes [7664713.715681] lowmem_reserve[]: 0 0 0 0 [7664713.719656] Node 1 Normal free:525304kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16272kB inactive_file:17412kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411348kB kernel_stack:20816kB pagetables:1916kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:511474 all_unreclaimable? yes [7664713.766515] lowmem_reserve[]: 0 0 0 0 [7664713.770487] Node 2 Normal free:524928kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32396kB inactive_file:39948kB unevictable:8680kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476084kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:373764 all_unreclaimable? yes [7664713.817436] lowmem_reserve[]: 0 0 0 0 [7664713.821413] Node 3 Normal free:525420kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:20kB active_file:39372kB inactive_file:41620kB unevictable:840kB isolated(anon):0kB isolated(file):2944kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369308kB kernel_stack:4208kB pagetables:1852kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:235096 all_unreclaimable? no [7664713.868278] lowmem_reserve[]: 0 0 0 0 [7664713.872247] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664713.887087] Node 0 DMA32: 399*4kB (UEM) 407*8kB (UEM) 1213*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261636kB [7664713.903579] Node 0 Normal: 6449*4kB (UEM) 5721*8kB (UEM) 3921*16kB (UEM) 4479*32kB (EM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508348kB [7664713.920333] Node 1 Normal: 88065*4kB (UEM) 21631*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525308kB [7664713.933488] Node 2 Normal: 27350*4kB (EM) 40156*8kB (UEM) 887*16kB (EM) 1667*32kB (EM) 414*64kB (EM) 2*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524936kB [7664713.949002] Node 3 Normal: 131394*4kB (UM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525632kB [7664713.961725] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664713.970592] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664713.979200] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664713.988067] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664713.996682] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664714.005557] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664714.014170] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664714.023038] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664714.031650] 73805 total pagecache pages [7664714.035663] 0 pages in swap cache [7664714.039155] Swap cache stats: add 21120703, delete 21136675, find 4513432/7609927 [7664714.046810] Free swap = 3103340kB [7664714.050395] Total swap = 4194300kB [7664714.053978] 66993253 pages RAM [7664714.057215] 0 pages HighMem/MovableOnly [7664714.061234] 1101945 pages reserved [7664714.064817] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664714.072866] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664714.081827] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664714.090622] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664714.099223] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664714.107404] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664714.116014] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664714.124276] [53101] 0 53101 1910 64 9 172 0 mdadm [7664714.132369] [53104] 0 53104 74785 324 85 253 0 sssd [7664714.140378] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664714.148907] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664714.157869] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664714.166221] [53139] 997 53139 29446 250 28 128 0 chronyd [7664714.174489] [53178] 0 53178 76774 291 95 241 0 sssd_nss [7664714.182845] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664714.191199] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664714.200077] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664714.208084] [53969] 0 53969 31572 205 20 168 0 crond [7664714.216175] [54035] 0 54035 27526 164 10 33 0 agetty [7664714.224349] [54036] 0 54036 27526 158 11 33 0 agetty [7664714.232533] [54186] 0 54186 22934 210 46 273 0 master [7664714.240714] [54206] 89 54206 25545 272 47 271 0 qmgr [7664714.248845] [36317] 0 36317 28294 187 14 61 0 bash [7664714.256851] [36328] 0 36328 154746 223 201 98 0 journalctl [7664714.265381] [36329] 0 36329 28177 160 14 55 0 grep [7664714.273507] [76204] 89 76204 25501 252 46 282 0 pickup [7664714.275609] LustreError: 36965:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c10a3347e00 [7664714.292632] [97173] 0 97173 48653 264 49 262 0 crond [7664714.300731] [97872] 0 97872 48653 263 49 263 0 crond [7664714.308827] [98579] 0 98579 48653 266 49 235 0 crond [7664714.316920] [99292] 0 99292 48653 257 49 261 0 crond [7664714.325013] [99592] 89 99592 25538 229 47 273 0 cleanup [7664714.333275] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664714.342240] [100032] 0 100032 48653 266 49 240 0 crond [7664714.350506] [100105] 89 100105 25553 264 47 274 0 smtp [7664714.358689] [100203] 0 100203 30816 185 17 333 0 python3 [7664714.367131] Out of memory: Kill process 53104 (sssd) score 0 or sacrifice child [7664714.374617] Killed process 53178 (sssd_nss) total-vm:307096kB, anon-rss:0kB, file-rss:1164kB, shmem-rss:0kB [7664714.442074] LustreError: 8680:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c2ca5ff6000 [7664714.804647] sssd_nss: page allocation failure: order:0, mode:0x200da [7664714.811188] CPU: 20 PID: 53178 Comm: sssd_nss Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664714.823959] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664714.831786] Call Trace: [7664714.834420] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664714.839736] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664714.845835] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664714.851416] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664714.857424] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664714.863957] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664714.870495] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664714.876338] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664714.882954] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664714.889221] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664714.895144] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664714.901152] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664714.907077] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664714.913001] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664714.918577] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664714.923896] Mem-Info: [7664714.926372] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:34423 inactive_file:35183 isolated_file:3350 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296661 mapped:1588 shmem:0 pagetables:1615 bounce:0 free:590131 free_pcp:0 free_cma:0 [7664714.960649] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664715.002406] lowmem_reserve[]: 0 1418 63868 63868 [7664715.007329] Node 0 DMA32 free:261188kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:764kB inactive_file:2552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686248kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:12559 all_unreclaimable? yes [7664715.052025] lowmem_reserve[]: 0 0 62450 62450 [7664715.056696] Node 0 Normal free:508116kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:40948kB inactive_file:42636kB unevictable:168kB isolated(anon):0kB isolated(file):5336kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243568kB kernel_stack:6352kB pagetables:2188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:274981 all_unreclaimable? no [7664715.103471] lowmem_reserve[]: 0 0 0 0 [7664715.107440] Node 1 Normal free:525304kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16272kB inactive_file:17412kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411348kB kernel_stack:20816kB pagetables:1916kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:511474 all_unreclaimable? yes [7664715.154302] lowmem_reserve[]: 0 0 0 0 [7664715.158271] Node 2 Normal free:524872kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32340kB inactive_file:35312kB unevictable:8680kB isolated(anon):0kB isolated(file):4608kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476148kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:873448 all_unreclaimable? yes [7664715.205300] lowmem_reserve[]: 0 0 0 0 [7664715.209275] Node 3 Normal free:525220kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:40768kB inactive_file:42720kB unevictable:840kB isolated(anon):0kB isolated(file):2560kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369300kB kernel_stack:4208kB pagetables:1780kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:937047 all_unreclaimable? yes [7664715.256132] lowmem_reserve[]: 0 0 0 0 [7664715.260102] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664715.274951] Node 0 DMA32: 412*4kB (UEM) 408*8kB (UEM) 1213*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261696kB [7664715.291443] Node 0 Normal: 6482*4kB (UEM) 5721*8kB (UEM) 3935*16kB (UEM) 4480*32kB (UEM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508736kB [7664715.308284] Node 1 Normal: 88065*4kB (UEM) 21631*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525308kB [7664715.321438] Node 2 Normal: 27354*4kB (UEM) 40162*8kB (UEM) 887*16kB (EM) 1667*32kB (EM) 414*64kB (UEM) 2*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525000kB [7664715.337126] Node 3 Normal: 131302*4kB (UEM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525264kB [7664715.349936] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664715.358804] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664715.367420] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664715.376294] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664715.384911] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664715.393782] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664715.402391] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664715.411264] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664715.419876] 73858 total pagecache pages [7664715.423891] 0 pages in swap cache [7664715.427391] Swap cache stats: add 21120703, delete 21136675, find 4513432/7609927 [7664715.435043] Free swap = 3103340kB [7664715.438622] Total swap = 4194300kB [7664715.442204] 66993253 pages RAM [7664715.445434] 0 pages HighMem/MovableOnly [7664715.449448] 1101945 pages reserved [7664715.663305] ll_ost_io03_074 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664715.671745] ll_ost_io03_074 cpuset=/ mems_allowed=3 [7664715.676806] CPU: 47 PID: 90689 Comm: ll_ost_io03_074 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664715.690185] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664715.698012] Call Trace: [7664715.700644] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664715.705961] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664715.711459] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664715.717295] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664715.723043] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664715.729060] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664715.735421] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664715.741520] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664715.747275] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664715.753799] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664715.760328] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664715.766514] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664715.772523] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664715.778637] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664715.785522] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664715.792754] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664715.798919] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664715.805575] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664715.812626] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664715.818377] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664715.824824] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664715.831697] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664715.837229] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664715.844318] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664715.852074] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664715.859340] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664715.867206] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664715.874208] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664715.882154] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664715.889408] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664715.895883] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664715.903450] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664715.908502] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664715.914767] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664715.921382] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664715.927644] Mem-Info: [7664715.930104] active_anon:0 inactive_anon:5 isolated_anon:0 active_file:32726 inactive_file:35555 isolated_file:4000 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296632 mapped:1588 shmem:0 pagetables:1520 bounce:0 free:590258 free_pcp:0 free_cma:0 [7664715.964372] Node 3 Normal free:525216kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:39828kB inactive_file:44528kB unevictable:840kB isolated(anon):0kB isolated(file):2048kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369296kB kernel_stack:4208kB pagetables:1780kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:973906 all_unreclaimable? yes [7664716.011238] lowmem_reserve[]: 0 0 0 0 [7664716.015203] Node 3 Normal: 131309*4kB (UEM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525292kB [7664716.028016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664716.036883] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664716.045488] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664716.054353] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664716.062961] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664716.071827] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664716.080434] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664716.089298] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664716.096816] LustreError: 8671:0:(ldlm_lib.c:3262:target_bulk_io()) @@@ network error on bulk READ req@ffff9c406e9db050 x1659467991482368/t0(0) o3->fb2c1382-8f5a-4@10.50.15.10@o2ib2:501/0 lens 488/440 e 1 to 0 dl 1583650751 ref 1 fl Interpret:/0/0 rc 0/0 [7664716.096819] LustreError: 8671:0:(ldlm_lib.c:3262:target_bulk_io()) Skipped 16 previous similar messages [7664716.096847] Lustre: fir-OST001b: Bulk IO read error with fb2c1382-8f5a-4 (at 10.50.15.10@o2ib2), client will retry: rc -110 [7664716.096848] Lustre: Skipped 5 previous similar messages [7664716.096879] Lustre: fir-OST0019: Bulk IO write error with c2ca4c5a-e67e-4 (at 10.50.5.43@o2ib2), client will retry: rc = -110 [7664716.158269] 73863 total pagecache pages [7664716.162281] 0 pages in swap cache [7664716.165774] Swap cache stats: add 21120710, delete 21136682, find 4513432/7609928 [7664716.173428] Free swap = 3104364kB [7664716.177006] Total swap = 4194300kB [7664716.180586] 66993253 pages RAM [7664716.183818] 0 pages HighMem/MovableOnly [7664716.187830] 1101945 pages reserved [7664716.191410] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664716.199463] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664716.208417] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664716.217211] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664716.225805] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664716.233985] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664716.242598] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664716.250862] [53101] 0 53101 1910 64 9 172 0 mdadm [7664716.258953] [53104] 0 53104 74785 324 85 253 0 sssd [7664716.266960] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664716.275479] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664716.284433] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664716.292787] [53139] 997 53139 29446 250 28 128 0 chronyd [7664716.301054] [53179] 0 53179 71689 280 85 232 0 sssd_pam [7664716.309403] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664716.318279] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664716.326285] [53969] 0 53969 31572 205 20 168 0 crond [7664716.334380] [54035] 0 54035 27526 164 10 33 0 agetty [7664716.342559] [54036] 0 54036 27526 158 11 33 0 agetty [7664716.350731] [54186] 0 54186 22934 210 46 273 0 master [7664716.358905] [54206] 89 54206 25545 272 47 271 0 qmgr [7664716.367025] [36317] 0 36317 28294 187 14 61 0 bash [7664716.375027] [36328] 0 36328 154746 223 201 98 0 journalctl [7664716.383555] [36329] 0 36329 28177 160 14 55 0 grep [7664716.391670] [76204] 89 76204 25501 252 46 282 0 pickup [7664716.399855] [97173] 0 97173 48653 264 49 262 0 crond [7664716.407945] [97872] 0 97872 48653 263 49 263 0 crond [7664716.416041] [98579] 0 98579 48653 266 49 235 0 crond [7664716.424135] [99292] 0 99292 48653 257 49 261 0 crond [7664716.432226] [99592] 89 99592 25538 229 47 273 0 cleanup [7664716.440488] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664716.449447] [100032] 0 100032 48653 266 49 240 0 crond [7664716.457708] [100105] 89 100105 25553 264 47 274 0 smtp [7664716.462376] LustreError: 119554:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c39d93b2e00 [7664716.476923] [100203] 0 100203 30816 185 17 334 0 python3 [7664716.485364] Out of memory: Kill process 53104 (sssd) score 0 or sacrifice child [7664716.492853] Killed process 53179 (sssd_pam) total-vm:286756kB, anon-rss:0kB, file-rss:1120kB, shmem-rss:0kB [7664716.665807] sssd_pam: page allocation failure: order:0, mode:0x201da [7664716.672344] CPU: 24 PID: 53179 Comm: sssd_pam Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664716.685115] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664716.692948] Call Trace: [7664716.695582] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664716.700900] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664716.706997] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664716.712572] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664716.718582] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664716.725112] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664716.731637] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664716.737817] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664716.743825] [<ffffffffa01ba3c8>] filemap_fault+0x298/0x490 [7664716.749604] [<ffffffffc05871c6>] ext4_filemap_fault+0x36/0x50 [ext4] [7664716.756223] [<ffffffffa01e593a>] __do_fault.isra.59+0x8a/0x100 [7664716.762318] [<ffffffffa01e5eec>] do_read_fault.isra.61+0x4c/0x1b0 [7664716.768669] [<ffffffffa01ea874>] handle_pte_fault+0x2f4/0xd10 [7664716.774677] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664716.780600] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664716.786522] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664716.792095] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664716.797408] Mem-Info: [7664716.799883] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:32885 inactive_file:34869 isolated_file:3936 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296641 mapped:1588 shmem:0 pagetables:1520 bounce:0 free:590476 free_pcp:0 free_cma:0 [7664716.834151] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664716.875897] lowmem_reserve[]: 0 1418 63868 63868 [7664716.880817] Node 0 DMA32 free:261344kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:772kB inactive_file:2096kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686280kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:225382 all_unreclaimable? yes [7664716.925593] lowmem_reserve[]: 0 0 62450 62450 [7664716.930263] Node 0 Normal free:508628kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:43552kB inactive_file:43108kB unevictable:168kB isolated(anon):0kB isolated(file):9600kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243584kB kernel_stack:5856kB pagetables:2188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1276697 all_unreclaimable? yes [7664716.977211] lowmem_reserve[]: 0 0 0 0 [7664716.981181] Node 1 Normal free:525172kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16876kB inactive_file:16324kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:120605 all_unreclaimable? yes [7664717.028055] lowmem_reserve[]: 0 0 0 0 [7664717.032029] Node 2 Normal free:525552kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:4kB active_file:30780kB inactive_file:36352kB unevictable:8680kB isolated(anon):0kB isolated(file):4480kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476076kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:837032 all_unreclaimable? yes [7664717.079080] lowmem_reserve[]: 0 0 0 0 [7664717.083048] Node 3 Normal free:525304kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:41800kB inactive_file:43728kB unevictable:840kB isolated(anon):0kB isolated(file):1280kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369292kB kernel_stack:4208kB pagetables:1780kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:973906 all_unreclaimable? yes [7664717.129917] lowmem_reserve[]: 0 0 0 0 [7664717.133885] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664717.148723] Node 0 DMA32: 410*4kB (UEM) 408*8kB (UEM) 1213*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261688kB [7664717.165217] Node 0 Normal: 6491*4kB (UEM) 5723*8kB (UEM) 3921*16kB (UEM) 4479*32kB (EM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508532kB [7664717.181979] Node 1 Normal: 88012*4kB (UEM) 21632*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525120kB [7664717.195508] Node 2 Normal: 27414*4kB (EM) 40178*8kB (UEM) 891*16kB (UEM) 1669*32kB (UEM) 410*64kB (UEM) 1*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525112kB [7664717.211282] Node 3 Normal: 131309*4kB (UEM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525292kB [7664717.224110] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664717.232976] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664717.241579] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664717.250445] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664717.259053] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664717.267919] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664717.276531] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664717.285404] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664717.294015] 74017 total pagecache pages [7664717.298030] 0 pages in swap cache [7664717.301530] Swap cache stats: add 21120713, delete 21136685, find 4513434/7609933 [7664717.309180] Free swap = 3104308kB [7664717.312759] Total swap = 4194300kB [7664717.316343] 66993253 pages RAM [7664717.319580] 0 pages HighMem/MovableOnly [7664717.323592] 1101945 pages reserved [7664717.845502] ll_ost_io02_070 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664717.853947] ll_ost_io02_070 cpuset=/ mems_allowed=2 [7664717.859006] CPU: 18 PID: 83172 Comm: ll_ost_io02_070 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664717.872384] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664717.880211] Call Trace: [7664717.882843] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664717.888158] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664717.893646] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664717.899488] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664717.905245] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664717.911258] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664717.917620] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664717.923731] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664717.929484] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664717.936025] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664717.942552] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664717.948740] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664717.954755] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664717.960870] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664717.967754] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664717.974980] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664717.981142] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664717.987795] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664717.994849] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664718.000604] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664718.007050] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664718.013933] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664718.019468] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664718.026554] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664718.034306] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664718.041564] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664718.049432] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664718.056399] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664718.061841] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664718.068315] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664718.075885] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664718.080943] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664718.087217] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664718.093830] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664718.100093] Mem-Info: [7664718.102555] active_anon:0 inactive_anon:3 isolated_anon:0 active_file:33182 inactive_file:34986 isolated_file:3617 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296641 mapped:1588 shmem:0 pagetables:1520 bounce:0 free:590268 free_pcp:0 free_cma:0 [7664718.136830] Node 2 Normal free:524996kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:8kB active_file:31084kB inactive_file:38012kB unevictable:8680kB isolated(anon):0kB isolated(file):2688kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476076kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:397828 all_unreclaimable? yes [7664718.183866] lowmem_reserve[]: 0 0 0 0 [7664718.187834] Node 2 Normal: 27414*4kB (UEM) 40178*8kB (UEM) 891*16kB (UEM) 1670*32kB (UEM) 409*64kB (EM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525080kB [7664718.203610] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664718.212475] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664718.221081] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664718.229949] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664718.238553] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664718.247420] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664718.256027] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664718.264896] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664718.273507] 74017 total pagecache pages [7664718.277521] 0 pages in swap cache [7664718.281013] Swap cache stats: add 21120714, delete 21136686, find 4513434/7609934 [7664718.288668] Free swap = 3105332kB [7664718.292243] Total swap = 4194300kB [7664718.295825] 66993253 pages RAM [7664718.299055] 0 pages HighMem/MovableOnly [7664718.303069] 1101945 pages reserved [7664718.306647] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664718.314694] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664718.323646] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664718.332436] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664718.341037] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664718.349215] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664718.357827] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664718.366089] [53101] 0 53101 1910 64 9 172 0 mdadm [7664718.374181] [53104] 0 53104 74785 324 85 253 0 sssd [7664718.382184] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664718.390709] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664718.399661] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664718.408008] [53139] 997 53139 29446 250 28 128 0 chronyd [7664718.416269] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664718.425145] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664718.433152] [53969] 0 53969 31572 205 20 168 0 crond [7664718.441246] [54035] 0 54035 27526 164 10 33 0 agetty [7664718.449418] [54036] 0 54036 27526 158 11 33 0 agetty [7664718.457592] [54186] 0 54186 22934 210 46 273 0 master [7664718.465766] [54206] 89 54206 25545 272 47 271 0 qmgr [7664718.473890] [36317] 0 36317 28294 187 14 61 0 bash [7664718.481893] [36328] 0 36328 154746 223 201 98 0 journalctl [7664718.490411] [36329] 0 36329 28177 160 14 55 0 grep [7664718.498510] [76204] 89 76204 25501 252 46 282 0 pickup [7664718.506695] [97173] 0 97173 48653 264 49 262 0 crond [7664718.514786] [97872] 0 97872 48653 263 49 263 0 crond [7664718.522883] [98579] 0 98579 48653 266 49 236 0 crond [7664718.530983] [99292] 0 99292 48653 257 49 261 0 crond [7664718.539083] [99592] 89 99592 25538 229 47 273 0 cleanup [7664718.547351] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664718.556318] [100032] 0 100032 48653 266 49 240 0 crond [7664718.564592] [100105] 89 100105 25553 264 47 274 0 smtp [7664718.572774] [100203] 0 100203 30816 185 17 334 0 python3 [7664718.581224] Out of memory: Kill process 53104 (sssd) score 0 or sacrifice child [7664718.588720] Killed process 53104 (sssd) total-vm:299140kB, anon-rss:0kB, file-rss:1296kB, shmem-rss:0kB [7664718.624970] sssd: page allocation failure: order:0, mode:0x201da [7664718.631189] CPU: 38 PID: 53104 Comm: sssd Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664718.643618] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664718.651460] Call Trace: [7664718.654100] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664718.659418] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664718.665521] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664718.671107] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664718.677128] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664718.683662] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664718.690224] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664718.696413] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664718.702425] [<ffffffffa01ba3c8>] filemap_fault+0x298/0x490 [7664718.708211] [<ffffffffc05871c6>] ext4_filemap_fault+0x36/0x50 [ext4] [7664718.714864] [<ffffffffa01e593a>] __do_fault.isra.59+0x8a/0x100 [7664718.720963] [<ffffffffa01e5eec>] do_read_fault.isra.61+0x4c/0x1b0 [7664718.727327] [<ffffffffa01ea874>] handle_pte_fault+0x2f4/0xd10 [7664718.733339] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664718.739292] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664718.745223] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664718.750803] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664718.756124] Mem-Info: [7664718.758603] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:34029 inactive_file:35732 isolated_file:2898 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824040 slab_unreclaimable:62296649 mapped:1588 shmem:0 pagetables:1435 bounce:0 free:590344 free_pcp:0 free_cma:0 [7664718.792891] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664718.834648] lowmem_reserve[]: 0 1418 63868 63868 [7664718.839579] Node 0 DMA32 free:261336kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:772kB inactive_file:2020kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686280kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:126997 all_unreclaimable? no [7664718.884314] lowmem_reserve[]: 0 0 62450 62450 [7664718.888989] Node 0 Normal free:508756kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:45552kB inactive_file:42564kB unevictable:168kB isolated(anon):0kB isolated(file):7424kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243584kB kernel_stack:5984kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:808835 all_unreclaimable? no [7664718.935771] lowmem_reserve[]: 0 0 0 0 [7664718.939751] Node 1 Normal free:525172kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16896kB inactive_file:16456kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:120605 all_unreclaimable? yes [7664718.986628] lowmem_reserve[]: 0 0 0 0 [7664718.990599] Node 2 Normal free:524968kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:4kB active_file:31084kB inactive_file:38980kB unevictable:8680kB isolated(anon):0kB isolated(file):1736kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715204kB slab_unreclaimable:62476108kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:150759 all_unreclaimable? yes [7664719.037639] lowmem_reserve[]: 0 0 0 0 [7664719.041621] Node 3 Normal free:525240kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:42120kB inactive_file:43440kB unevictable:840kB isolated(anon):0kB isolated(file):1024kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369292kB kernel_stack:4208kB pagetables:1780kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:973906 all_unreclaimable? yes [7664719.088530] lowmem_reserve[]: 0 0 0 0 [7664719.092507] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664719.107392] Node 0 DMA32: 408*4kB (UEM) 408*8kB (UEM) 1213*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261680kB [7664719.124002] Node 0 Normal: 6581*4kB (UEM) 5723*8kB (UEM) 3923*16kB (UEM) 4479*32kB (EM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508860kB [7664719.140818] Node 1 Normal: 88012*4kB (UEM) 21632*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525120kB [7664719.154404] Node 2 Normal: 27414*4kB (UEM) 40178*8kB (UEM) 891*16kB (UEM) 1670*32kB (UEM) 409*64kB (EM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525080kB [7664719.170319] Node 3 Normal: 131309*4kB (UEM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525292kB [7664719.183186] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664719.192068] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664719.200693] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664719.209586] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664719.218202] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664719.227076] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664719.235693] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664719.244569] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664719.253199] 74029 total pagecache pages [7664719.257215] 0 pages in swap cache [7664719.260705] Swap cache stats: add 21120714, delete 21136686, find 4513434/7609935 [7664719.268364] Free swap = 3105332kB [7664719.271949] Total swap = 4194300kB [7664719.275546] 66993253 pages RAM [7664719.278784] 0 pages HighMem/MovableOnly [7664719.282800] 1101945 pages reserved [7664719.650569] ll_ost_io01_029 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664719.659016] ll_ost_io01_029 cpuset=/ mems_allowed=1 [7664719.664080] CPU: 1 PID: 123076 Comm: ll_ost_io01_029 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664719.677459] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664719.685292] Call Trace: [7664719.687924] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664719.693239] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664719.698729] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664719.704570] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664719.710581] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664719.716935] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664719.723038] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664719.728793] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664719.735328] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664719.741862] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664719.748050] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664719.754063] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664719.760170] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664719.767056] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664719.773228] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664719.780331] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664719.786896] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664719.793545] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664719.800894] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664719.807636] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664719.814984] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664719.822504] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664719.829416] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664719.836505] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664719.844258] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664719.851515] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664719.859383] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664719.866352] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664719.871790] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664719.878265] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664719.885835] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664719.890894] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664719.897162] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664719.903781] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664719.910044] Mem-Info: [7664719.912506] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:34421 inactive_file:36346 isolated_file:2034 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824042 slab_unreclaimable:62296618 mapped:1588 shmem:0 pagetables:1435 bounce:0 free:590253 free_pcp:0 free_cma:0 [7664719.946778] Node 1 Normal free:525100kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17364kB inactive_file:16176kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:120605 all_unreclaimable? yes [7664719.993637] lowmem_reserve[]: 0 0 0 0 [7664719.997606] Node 1 Normal: 88012*4kB (UEM) 21632*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525120kB [7664720.011136] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664720.020003] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664720.028608] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664720.037474] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664720.038678] LustreError: 8709:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c2ff3c81a00 [7664720.056942] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664720.061624] LustreError: 3045:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c40016e1800 [7664720.076676] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664720.085289] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664720.094155] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664720.102760] 74038 total pagecache pages [7664720.106774] 0 pages in swap cache [7664720.110265] Swap cache stats: add 21120716, delete 21136688, find 4513435/7609937 [7664720.117918] Free swap = 3106356kB [7664720.121498] Total swap = 4194300kB [7664720.125077] 66993253 pages RAM [7664720.128309] 0 pages HighMem/MovableOnly [7664720.132323] 1101945 pages reserved [7664720.135901] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664720.143956] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664720.152917] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664720.161706] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664720.170292] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664720.178469] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664720.187083] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664720.195352] [53101] 0 53101 1910 64 9 172 0 mdadm [7664720.203458] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664720.211986] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664720.220942] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664720.229289] [53139] 997 53139 29446 250 28 128 0 chronyd [7664720.237552] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664720.246422] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664720.254422] [53969] 0 53969 31572 205 20 168 0 crond [7664720.262508] [54035] 0 54035 27526 164 10 33 0 agetty [7664720.270681] [54036] 0 54036 27526 158 11 33 0 agetty [7664720.278858] [54186] 0 54186 22934 210 46 273 0 master [7664720.287034] [54206] 89 54206 25545 272 47 271 0 qmgr [7664720.295124] [36317] 0 36317 28294 187 14 61 0 bash [7664720.303130] [36328] 0 36328 154746 223 201 98 0 journalctl [7664720.311657] [36329] 0 36329 28177 160 14 55 0 grep [7664720.319765] [76204] 89 76204 25501 252 46 282 0 pickup [7664720.327947] [97173] 0 97173 48653 264 49 262 0 crond [7664720.336038] [97872] 0 97872 48653 263 49 263 0 crond [7664720.344137] [98579] 0 98579 48653 266 49 236 0 crond [7664720.352230] [99292] 0 99292 48653 257 49 261 0 crond [7664720.360322] [99592] 89 99592 25538 229 47 273 0 cleanup [7664720.368581] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664720.377534] [100032] 0 100032 48653 266 49 240 0 crond [7664720.385791] [100105] 89 100105 25553 264 47 274 0 smtp [7664720.393966] [100203] 0 100203 30816 185 17 334 0 python3 [7664720.402407] Out of memory: Kill process 54206 (qmgr) score 0 or sacrifice child [7664720.409884] Killed process 54206 (qmgr) total-vm:102180kB, anon-rss:0kB, file-rss:1088kB, shmem-rss:0kB [7664720.752886] ll_ost_io00_081 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664720.761328] ll_ost_io00_081 cpuset=/ mems_allowed=0 [7664720.766387] CPU: 12 PID: 90690 Comm: ll_ost_io00_081 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664720.779766] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664720.787590] Call Trace: [7664720.790228] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664720.795540] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664720.801029] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664720.806869] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664720.812624] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664720.818638] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664720.824992] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664720.831091] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664720.836837] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664720.843364] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664720.849890] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664720.856070] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664720.862074] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664720.868189] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664720.875075] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664720.881223] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664720.888320] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664720.894890] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664720.901536] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664720.908882] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664720.915621] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664720.922971] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664720.930497] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664720.937414] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664720.944500] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664720.952253] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664720.959509] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664720.967376] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664720.974370] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664720.982316] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664720.989571] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664720.996054] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664721.003623] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664721.008682] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664721.014950] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664721.021568] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664721.027838] Mem-Info: [7664721.030310] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:34598 inactive_file:35915 isolated_file:3581 unevictable:9044 dirty:0 writeback:10 unstable:0 slab_reclaimable:824042 slab_unreclaimable:62296621 mapped:1588 shmem:0 pagetables:1350 bounce:0 free:590093 free_pcp:0 free_cma:0 [7664721.064666] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664721.097855] LustreError: 90700:0:(ldlm_lib.c:3262:target_bulk_io()) @@@ network error on bulk READ req@ffff9c467d0ab050 x1659494269050880/t0(0) o3->164e843a-d84a-4@10.50.5.36@o2ib2:518/0 lens 488/440 e 2 to 0 dl 1583650768 ref 1 fl Interpret:/0/0 rc 0/0 [7664721.097858] LustreError: 90700:0:(ldlm_lib.c:3262:target_bulk_io()) Skipped 1 previous similar message [7664721.097957] Lustre: fir-OST001b: Bulk IO write error with c2ca4c5a-e67e-4 (at 10.50.5.43@o2ib2), client will retry: rc = -110 [7664721.150010] lowmem_reserve[]: 0 1418 63868 63868 [7664721.154932] Node 0 DMA32 free:261268kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:804kB inactive_file:2236kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:4kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686220kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:53673 all_unreclaimable? yes [7664721.187587] LustreError: 6894:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c4a91670200 [7664721.195610] LustreError: 6894:0:(ldlm_lib.c:3246:target_bulk_io()) @@@ timeout on bulk READ after -5+5s req@ffff9c506a930850 x1659467991533568/t0(0) o3->fb2c1382-8f5a-4@10.50.15.10@o2ib2:487/0 lens 488/440 e 0 to 0 dl 1583650737 ref 2 fl Interpret:/0/0 rc 0/0 [7664721.195613] LustreError: 6894:0:(ldlm_lib.c:3246:target_bulk_io()) Skipped 1 previous similar message [7664721.195645] Lustre: 6894:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (45:5s); client may timeout. req@ffff9c506a930850 x1659467991533568/t0(0) o3->fb2c1382-8f5a-4@10.50.15.10@o2ib2:487/0 lens 488/440 e 0 to 0 dl 1583650737 ref 2 fl Complete:/0/ffffffff rc -110/-1 [7664721.270463] lowmem_reserve[]: 0 0 62450 62450 [7664721.275133] Node 0 Normal free:509008kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:4kB active_file:40288kB inactive_file:43784kB unevictable:168kB isolated(anon):0kB isolated(file):3968kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:4kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243556kB kernel_stack:5984kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:176kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:70135 all_unreclaimable? no [7664721.287912] LustreError: 119547:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c433342ba00 [7664721.287921] LustreError: 119547:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c433342ba00 [7664721.344051] lowmem_reserve[]: 0 0 0 0 [7664721.348019] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664721.362855] Node 0 DMA32: 398*4kB (UEM) 409*8kB (UEM) 1214*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261664kB [7664721.379350] Node 0 Normal: 6477*4kB (UEM) 5812*8kB (UEM) 3930*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 509300kB [7664721.396193] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664721.405064] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664721.413671] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664721.422537] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664721.431142] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664721.440010] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664721.448614] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664721.457481] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664721.466088] 74015 total pagecache pages [7664721.470101] 0 pages in swap cache [7664721.473592] Swap cache stats: add 21120722, delete 21136694, find 4513438/7609944 [7664721.481244] Free swap = 3107636kB [7664721.484825] Total swap = 4194300kB [7664721.488405] 66993253 pages RAM [7664721.491636] 0 pages HighMem/MovableOnly [7664721.495648] 1101945 pages reserved [7664721.499227] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664721.507276] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664721.516237] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664721.525032] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664721.533628] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664721.541804] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664721.550418] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664721.558687] [53101] 0 53101 1910 64 9 172 0 mdadm [7664721.566781] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664721.575307] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664721.584260] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664721.592608] [53139] 997 53139 29446 250 28 128 0 chronyd [7664721.600879] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664721.609752] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664721.617759] [53969] 0 53969 31572 205 20 168 0 crond [7664721.625851] [54035] 0 54035 27526 164 10 33 0 agetty [7664721.634026] [54036] 0 54036 27526 158 11 33 0 agetty [7664721.642199] [54186] 0 54186 22934 210 46 273 0 master [7664721.650494] [36317] 0 36317 28294 187 14 61 0 bash [7664721.658500] [36328] 0 36328 154746 223 201 98 0 journalctl [7664721.667020] [36329] 0 36329 28177 160 14 55 0 grep [7664721.675127] [76204] 89 76204 25501 251 46 283 0 pickup [7664721.683310] [97173] 0 97173 48653 264 49 262 0 crond [7664721.691403] [97872] 0 97872 48653 263 49 264 0 crond [7664721.699496] [98579] 0 98579 48653 266 49 236 0 crond [7664721.707591] [99292] 0 99292 48653 257 49 261 0 crond [7664721.715686] [99592] 89 99592 25538 229 47 273 0 cleanup [7664721.723952] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664721.732906] [100032] 0 100032 48653 266 49 240 0 crond [7664721.741172] [100105] 89 100105 25553 264 47 274 0 smtp [7664721.749345] [100203] 0 100203 30816 185 17 335 0 python3 [7664721.757776] Out of memory: Kill process 100105 (smtp) score 0 or sacrifice child [7664721.765342] Killed process 100105 (smtp) total-vm:102212kB, anon-rss:0kB, file-rss:1056kB, shmem-rss:0kB [7664721.806534] ll_ost_io00_081 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664721.814972] ll_ost_io00_081 cpuset=/ mems_allowed=0 [7664721.820036] CPU: 36 PID: 90690 Comm: ll_ost_io00_081 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664721.833412] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664721.841238] Call Trace: [7664721.843872] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664721.849185] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664721.854678] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664721.860516] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664721.866270] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664721.872284] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664721.878635] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664721.884729] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664721.890477] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664721.897008] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664721.903538] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664721.909725] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664721.915730] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664721.921837] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664721.928723] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664721.934871] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664721.941964] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664721.948523] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664721.955162] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664721.962505] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664721.969239] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664721.976579] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664721.984099] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664721.991016] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664721.998104] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664722.005855] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664722.013112] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664722.020980] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664722.027972] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664722.035919] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664722.043175] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664722.049650] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664722.057226] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664722.062284] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664722.068551] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664722.075164] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664722.081427] Mem-Info: [7664722.083894] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:33366 inactive_file:35866 isolated_file:3136 unevictable:9044 dirty:0 writeback:9 unstable:0 slab_reclaimable:824042 slab_unreclaimable:62296614 mapped:1588 shmem:0 pagetables:1350 bounce:0 free:590438 free_pcp:0 free_cma:0 [7664722.118163] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664722.159919] lowmem_reserve[]: 0 1418 63868 63868 [7664722.164847] Node 0 DMA32 free:261332kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:816kB inactive_file:2316kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:4kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686224kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:51063 all_unreclaimable? yes [7664722.209546] lowmem_reserve[]: 0 0 62450 62450 [7664722.214212] Node 0 Normal free:508688kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:43396kB inactive_file:46968kB unevictable:168kB isolated(anon):0kB isolated(file):4864kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:4kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243556kB kernel_stack:5984kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1515561 all_unreclaimable? yes [7664722.245705] LustreError: 123083:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c16417cd800 [7664722.272200] lowmem_reserve[]: 0 0 0 0 [7664722.276170] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664722.291007] Node 0 DMA32: 393*4kB (UEM) 409*8kB (UEM) 1214*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261644kB [7664722.307499] Node 0 Normal: 6383*4kB (UEM) 5790*8kB (UEM) 3923*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508636kB [7664722.324345] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664722.333216] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664722.341824] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664722.350688] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664722.359295] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664722.368161] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664722.376770] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664722.385640] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664722.394249] 73903 total pagecache pages [7664722.398260] 0 pages in swap cache [7664722.401754] Swap cache stats: add 21120724, delete 21136696, find 4513439/7609946 [7664722.409405] Free swap = 3108660kB [7664722.412983] Total swap = 4194300kB [7664722.416565] 66993253 pages RAM [7664722.419796] 0 pages HighMem/MovableOnly [7664722.423808] 1101945 pages reserved [7664722.427388] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664722.435434] [ 5686] 0 5686 16012 237 39 105 0 systemd-journal [7664722.444386] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664722.453177] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664722.461773] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664722.469957] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664722.478568] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664722.486835] [53101] 0 53101 1910 64 9 172 0 mdadm [7664722.494923] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664722.503450] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664722.512404] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664722.520759] [53139] 997 53139 29446 250 28 128 0 chronyd [7664722.529028] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664722.537902] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664722.545913] [53969] 0 53969 31572 205 20 168 0 crond [7664722.554003] [54035] 0 54035 27526 164 10 33 0 agetty [7664722.562176] [54036] 0 54036 27526 158 11 33 0 agetty [7664722.570350] [54186] 0 54186 22934 210 46 273 0 master [7664722.578641] [36317] 0 36317 28294 187 14 61 0 bash [7664722.586642] [36328] 0 36328 154746 223 201 98 0 journalctl [7664722.595163] [36329] 0 36329 28177 160 14 55 0 grep [7664722.603270] [76204] 89 76204 25501 251 46 283 0 pickup [7664722.611454] [97173] 0 97173 48653 264 49 262 0 crond [7664722.619543] [97872] 0 97872 48653 263 49 264 0 crond [7664722.627639] [98579] 0 98579 48653 266 49 236 0 crond [7664722.635735] [99292] 0 99292 48653 257 49 261 0 crond [7664722.643826] [99592] 89 99592 25538 229 47 273 0 cleanup [7664722.652088] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664722.661047] [100032] 0 100032 48653 266 49 240 0 crond [7664722.669306] [100203] 0 100203 30816 185 17 335 0 python3 [7664722.677738] Out of memory: Kill process 76204 (pickup) score 0 or sacrifice child [7664722.685389] Killed process 76204 (pickup) total-vm:102004kB, anon-rss:0kB, file-rss:1004kB, shmem-rss:0kB [7664723.140797] pickup: page allocation failure: order:0, mode:0x200da [7664723.147160] CPU: 16 PID: 76204 Comm: pickup Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664723.159761] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664723.167594] Call Trace: [7664723.170226] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664723.175544] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664723.181643] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664723.187217] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664723.193231] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664723.199756] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664723.206285] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664723.212124] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664723.218736] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664723.225002] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664723.230926] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664723.236934] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664723.242855] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664723.248775] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664723.254348] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664723.259659] Mem-Info: [7664723.262136] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33358 inactive_file:34610 isolated_file:3744 unevictable:9044 dirty:0 writeback:9 unstable:0 slab_reclaimable:824043 slab_unreclaimable:62296614 mapped:1589 shmem:0 pagetables:1350 bounce:0 free:590279 free_pcp:0 free_cma:0 [7664723.296402] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664723.338154] lowmem_reserve[]: 0 1418 63868 63868 [7664723.343077] Node 0 DMA32 free:261320kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:828kB inactive_file:2460kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:4kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686224kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:49546 all_unreclaimable? no [7664723.387693] lowmem_reserve[]: 0 0 62450 62450 [7664723.392358] Node 0 Normal free:508408kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:45144kB inactive_file:47292kB unevictable:168kB isolated(anon):0kB isolated(file):2560kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:4kB mapped:172kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243556kB kernel_stack:5856kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:469191 all_unreclaimable? yes [7664723.439222] lowmem_reserve[]: 0 0 0 0 [7664723.443197] Node 1 Normal free:525244kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17036kB inactive_file:16396kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:72968 all_unreclaimable? yes [7664723.489976] lowmem_reserve[]: 0 0 0 0 [7664723.493951] Node 2 Normal free:524876kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32160kB inactive_file:36052kB unevictable:8680kB isolated(anon):0kB isolated(file):3072kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:20kB mapped:5332kB shmem:0kB slab_reclaimable:715216kB slab_unreclaimable:62476080kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:853313 all_unreclaimable? yes [7664723.541077] lowmem_reserve[]: 0 0 0 0 [7664723.545058] Node 3 Normal free:525368kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:35180kB inactive_file:34176kB unevictable:840kB isolated(anon):0kB isolated(file):13568kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:8kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369264kB kernel_stack:4208kB pagetables:1440kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1115845 all_unreclaimable? yes [7664723.592090] lowmem_reserve[]: 0 0 0 0 [7664723.596059] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664723.610895] Node 0 DMA32: 391*4kB (UEM) 409*8kB (UEM) 1214*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261636kB [7664723.627391] Node 0 Normal: 6386*4kB (UEM) 5791*8kB (UEM) 3938*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508896kB [7664723.644228] Node 1 Normal: 88055*4kB (UEM) 21632*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525292kB [7664723.657758] Node 2 Normal: 27350*4kB (EM) 40188*8kB (EM) 906*16kB (UEM) 1672*32kB (UEM) 409*64kB (EM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525208kB [7664723.673356] Node 3 Normal: 131478*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525912kB [7664723.685796] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664723.694664] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664723.703269] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664723.712138] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664723.720748] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664723.729615] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664723.738223] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664723.747087] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664723.755692] 74066 total pagecache pages [7664723.759707] 0 pages in swap cache [7664723.763198] Swap cache stats: add 21120724, delete 21136696, find 4513439/7609946 [7664723.770851] Free swap = 3108660kB [7664723.774431] Total swap = 4194300kB [7664723.778012] 66993253 pages RAM [7664723.781242] 0 pages HighMem/MovableOnly [7664723.785256] 1101945 pages reserved [7664724.197228] ll_ost_io00_068 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664724.205666] ll_ost_io00_068 cpuset=/ mems_allowed=0 [7664724.210729] CPU: 32 PID: 96096 Comm: ll_ost_io00_068 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664724.224106] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664724.231932] Call Trace: [7664724.234569] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664724.239882] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664724.245372] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664724.251214] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664724.256965] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664724.262971] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664724.269322] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664724.275414] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664724.281161] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664724.287693] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664724.294223] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664724.300402] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664724.306409] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664724.312518] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664724.319401] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664724.325552] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664724.332646] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664724.339177] [<ffffffffa021bd89>] ? ___slab_alloc+0x209/0x4f0 [7664724.345130] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664724.351770] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664724.359112] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664724.365855] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664724.373202] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664724.380721] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664724.387638] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664724.394724] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664724.402476] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664724.409734] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664724.417594] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664724.424564] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664724.430005] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664724.436480] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664724.444049] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664724.449106] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664724.455376] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664724.461993] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664724.468257] Mem-Info: [7664724.470724] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:35173 inactive_file:34551 isolated_file:4201 unevictable:9044 dirty:0 writeback:9 unstable:0 slab_reclaimable:824043 slab_unreclaimable:62296614 mapped:1589 shmem:0 pagetables:1350 bounce:0 free:590173 free_pcp:0 free_cma:0 [7664724.505003] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664724.546754] lowmem_reserve[]: 0 1418 63868 63868 [7664724.551677] Node 0 DMA32 free:261320kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:836kB inactive_file:3032kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:4kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686224kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:83449 all_unreclaimable? yes [7664724.596372] lowmem_reserve[]: 0 0 62450 62450 [7664724.601034] Node 0 Normal free:508052kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:8kB active_file:42912kB inactive_file:43176kB unevictable:168kB isolated(anon):0kB isolated(file):9856kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:4kB mapped:172kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243556kB kernel_stack:5856kB pagetables:1848kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1788581 all_unreclaimable? yes [7664724.642808] LustreError: 8713:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c184e9f9400 [7664724.658832] lowmem_reserve[]: 0 0 0 0 [7664724.662800] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664724.677638] Node 0 DMA32: 390*4kB (UEM) 409*8kB (UEM) 1214*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261632kB [7664724.694130] Node 0 Normal: 6481*4kB (UEM) 5792*8kB (UEM) 3938*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 509284kB [7664724.710973] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664724.719838] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664724.728444] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664724.737311] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664724.745917] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664724.754784] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664724.763387] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664724.772265] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664724.780869] 74237 total pagecache pages [7664724.784883] 1 pages in swap cache [7664724.788374] Swap cache stats: add 21120731, delete 21136702, find 4513442/7609953 [7664724.796026] Free swap = 3109676kB [7664724.799605] Total swap = 4194300kB [7664724.803187] 66993253 pages RAM [7664724.806418] 0 pages HighMem/MovableOnly [7664724.810430] 1101945 pages reserved [7664724.814013] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664724.822058] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664724.831021] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664724.839815] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664724.848411] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664724.856587] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664724.865202] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664724.873467] [53101] 0 53101 1910 64 9 172 0 mdadm [7664724.881562] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664724.890081] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664724.899034] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664724.907381] [53139] 997 53139 29446 250 28 128 0 chronyd [7664724.915650] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664724.924526] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664724.932535] [53969] 0 53969 31572 205 20 168 0 crond [7664724.940627] [54035] 0 54035 27526 164 10 33 0 agetty [7664724.948809] [54036] 0 54036 27526 158 11 33 0 agetty [7664724.956989] [54186] 0 54186 22934 209 46 274 0 master [7664724.965278] [36317] 0 36317 28294 187 14 61 0 bash [7664724.973281] [36328] 0 36328 154746 223 201 98 0 journalctl [7664724.981801] [36329] 0 36329 28177 160 14 55 0 grep [7664724.989916] [97173] 0 97173 48653 264 49 262 0 crond [7664724.998010] [97872] 0 97872 48653 263 49 264 0 crond [7664725.006106] [98579] 0 98579 48653 266 49 236 0 crond [7664725.014199] [99292] 0 99292 48653 257 49 261 0 crond [7664725.022294] [99592] 89 99592 25538 229 47 273 0 cleanup [7664725.030561] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664725.039523] [100032] 0 100032 48653 266 49 240 0 crond [7664725.047794] [100203] 0 100203 30816 185 17 335 0 python3 [7664725.056232] Out of memory: Kill process 97872 (crond) score 0 or sacrifice child [7664725.063798] Killed process 97872 (crond) total-vm:194612kB, anon-rss:0kB, file-rss:1052kB, shmem-rss:0kB [7664725.151815] crond: page allocation failure: order:0, mode:0x200da [7664725.158088] CPU: 16 PID: 97872 Comm: crond Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664725.170600] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664725.178431] Call Trace: [7664725.181065] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664725.186384] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664725.192481] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664725.198058] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664725.204070] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664725.210597] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664725.217131] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664725.222965] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664725.229587] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664725.235861] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664725.241786] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664725.247793] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664725.253721] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664725.259640] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664725.265213] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664725.270524] Mem-Info: [7664725.273003] active_anon:0 inactive_anon:6 isolated_anon:0 active_file:33644 inactive_file:35741 isolated_file:4222 unevictable:9044 dirty:0 writeback:9 unstable:0 slab_reclaimable:824043 slab_unreclaimable:62296614 mapped:1590 shmem:0 pagetables:1303 bounce:0 free:590130 free_pcp:0 free_cma:0 [7664725.307278] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664725.349029] lowmem_reserve[]: 0 1418 63868 63868 [7664725.353952] Node 0 DMA32 free:261312kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:844kB inactive_file:2572kB unevictable:0kB isolated(anon):0kB isolated(file):512kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:4kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686224kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:186335 all_unreclaimable? no [7664725.398824] lowmem_reserve[]: 0 0 62450 62450 [7664725.403490] Node 0 Normal free:508460kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:16kB active_file:44368kB inactive_file:47692kB unevictable:168kB isolated(anon):0kB isolated(file):4352kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:4kB mapped:172kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243556kB kernel_stack:5856kB pagetables:1836kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:391143 all_unreclaimable? yes [7664725.450439] lowmem_reserve[]: 0 0 0 0 [7664725.454410] Node 1 Normal free:525244kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17052kB inactive_file:16380kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:72968 all_unreclaimable? yes [7664725.501193] lowmem_reserve[]: 0 0 0 0 [7664725.505163] Node 2 Normal free:524868kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32500kB inactive_file:38684kB unevictable:8680kB isolated(anon):0kB isolated(file):1152kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:20kB mapped:5332kB shmem:0kB slab_reclaimable:715216kB slab_unreclaimable:62476080kB kernel_stack:7760kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:411619 all_unreclaimable? yes [7664725.552285] lowmem_reserve[]: 0 0 0 0 [7664725.556260] Node 3 Normal free:524708kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:8kB active_file:40344kB inactive_file:38176kB unevictable:840kB isolated(anon):0kB isolated(file):9208kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:8kB mapped:852kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369264kB kernel_stack:4208kB pagetables:1264kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:607905 all_unreclaimable? yes [7664725.603121] lowmem_reserve[]: 0 0 0 0 [7664725.607088] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664725.621926] Node 0 DMA32: 389*4kB (UEM) 409*8kB (UEM) 1214*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261628kB [7664725.638420] Node 0 Normal: 6481*4kB (UEM) 5792*8kB (UEM) 3938*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 509284kB [7664725.655260] Node 1 Normal: 88055*4kB (UEM) 21632*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525292kB [7664725.668787] Node 2 Normal: 27350*4kB (EM) 40188*8kB (EM) 906*16kB (UEM) 1672*32kB (UEM) 409*64kB (EM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525208kB [7664725.684388] Node 3 Normal: 131237*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524948kB [7664725.696739] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664725.705606] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664725.714221] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664725.723087] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664725.731692] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664725.740559] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664725.749166] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664725.749749] Lustre: 123039:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-10s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff9c506a930850 x1659467991533568/t0(0) o3->fb2c1382-8f5a-4@10.50.15.10@o2ib2:487/0 lens 488/440 e 0 to 0 dl 1583650737 ref 1 fl Complete:/0/ffffffff rc -110/-1 [7664725.788229] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664725.796842] 74261 total pagecache pages [7664725.800855] 0 pages in swap cache [7664725.804346] Swap cache stats: add 21120731, delete 21136703, find 4513442/7609953 [7664725.811998] Free swap = 3109676kB [7664725.815578] Total swap = 4194300kB [7664725.819159] 66993253 pages RAM [7664725.822391] 0 pages HighMem/MovableOnly [7664725.826402] 1101945 pages reserved [7664726.008327] ll_ost_io01_005 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [7664726.016511] ll_ost_io01_005 cpuset=/ mems_allowed=1 [7664726.021574] CPU: 21 PID: 119516 Comm: ll_ost_io01_005 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664726.035042] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664726.042875] Call Trace: [7664726.045516] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664726.050834] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664726.056329] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664726.062167] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664726.067915] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664726.073926] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664726.080280] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664726.086376] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664726.092127] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664726.097982] LustreError: 3109:0:(ldlm_lib.c:3262:target_bulk_io()) @@@ network error on bulk READ req@ffff9c3f875a2050 x1659489523086912/t0(0) o3->f7fe261e-a413-4@10.49.28.2@o2ib1:520/0 lens 488/440 e 2 to 0 dl 1583650770 ref 1 fl Interpret:/0/0 rc 0/0 [7664726.097986] LustreError: 3109:0:(ldlm_lib.c:3262:target_bulk_io()) Skipped 4 previous similar messages [7664726.130688] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664726.137224] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664726.143480] [<ffffffffc124293f>] tgt_checksum_niobuf_rw+0xbf/0xe00 [ptlrpc] [7664726.150741] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664726.158072] [<ffffffffc0cb71e0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [7664726.164900] [<ffffffffc1247325>] tgt_brw_read+0xc35/0x1e50 [ptlrpc] [7664726.171466] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664726.178103] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664726.185454] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664726.192802] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664726.200324] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664726.207237] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664726.214325] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664726.222077] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664726.229332] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664726.237193] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664726.244162] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664726.249606] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664726.256087] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664726.263662] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664726.268722] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664726.274989] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664726.281608] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664726.287875] Mem-Info: [7664726.290334] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:32627 inactive_file:36148 isolated_file:4093 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824044 slab_unreclaimable:62296614 mapped:1589 shmem:0 pagetables:1161 bounce:0 free:590278 free_pcp:0 free_cma:0 [7664726.324600] Node 1 Normal free:525316kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16768kB inactive_file:16488kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:497018 all_unreclaimable? yes [7664726.371467] lowmem_reserve[]: 0 0 0 0 [7664726.375432] Node 1 Normal: 88058*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525320kB [7664726.388965] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664726.397832] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664726.406446] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664726.415310] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664726.423920] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664726.432791] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664726.441398] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664726.450263] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664726.458870] 74410 total pagecache pages [7664726.462884] 0 pages in swap cache [7664726.466375] Swap cache stats: add 21120733, delete 21136705, find 4513443/7609955 [7664726.474028] Free swap = 3110188kB [7664726.477605] Total swap = 4194300kB [7664726.481188] 66993253 pages RAM [7664726.484419] 0 pages HighMem/MovableOnly [7664726.488431] 1101945 pages reserved [7664726.492010] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664726.500061] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664726.509018] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664726.517803] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664726.526383] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664726.534560] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664726.543175] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664726.551444] [53101] 0 53101 1910 64 9 172 0 mdadm [7664726.559537] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664726.568065] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664726.577029] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664726.585380] [53139] 997 53139 29446 250 28 128 0 chronyd [7664726.593642] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664726.602516] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664726.610525] [53969] 0 53969 31572 205 20 168 0 crond [7664726.618624] [54035] 0 54035 27526 164 10 33 0 agetty [7664726.626798] [54036] 0 54036 27526 158 11 33 0 agetty [7664726.634972] [54186] 0 54186 22934 209 46 274 0 master [7664726.643239] [36317] 0 36317 28294 187 14 61 0 bash [7664726.651240] [36328] 0 36328 154746 223 201 98 0 journalctl [7664726.659767] [36329] 0 36329 28177 160 14 55 0 grep [7664726.667879] [97173] 0 97173 48653 264 49 262 0 crond [7664726.675972] [98579] 0 98579 48653 266 49 236 0 crond [7664726.684061] [99292] 0 99292 48653 257 49 261 0 crond [7664726.692158] [99592] 89 99592 25538 229 47 273 0 cleanup [7664726.700425] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664726.709391] [100032] 0 100032 48653 266 49 240 0 crond [7664726.717656] [100203] 0 100203 30816 185 17 335 0 python3 [7664726.726094] Out of memory: Kill process 97173 (crond) score 0 or sacrifice child [7664726.733661] Killed process 97173 (crond) total-vm:194612kB, anon-rss:0kB, file-rss:1056kB, shmem-rss:0kB [7664726.796453] crond: page allocation failure: order:0, mode:0x200da [7664726.802730] CPU: 32 PID: 97173 Comm: crond Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664726.815241] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664726.823064] Call Trace: [7664726.825702] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664726.831024] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664726.837124] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664726.842700] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664726.848711] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664726.855236] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664726.861764] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664726.867598] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664726.874218] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664726.880484] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664726.886407] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664726.892419] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664726.898343] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664726.904269] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664726.909847] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664726.915158] Mem-Info: [7664726.917638] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:33428 inactive_file:36293 isolated_file:3392 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824045 slab_unreclaimable:62296623 mapped:1588 shmem:0 pagetables:1161 bounce:0 free:590387 free_pcp:0 free_cma:0 [7664726.951909] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664726.993661] lowmem_reserve[]: 0 1418 63868 63868 [7664726.998586] Node 0 DMA32 free:261348kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:888kB inactive_file:2784kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686204kB kernel_stack:384kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:58954 all_unreclaimable? yes [7664727.043279] lowmem_reserve[]: 0 0 62450 62450 [7664727.047943] Node 0 Normal free:508404kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:4kB active_file:45620kB inactive_file:49060kB unevictable:168kB isolated(anon):0kB isolated(file):1792kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243608kB kernel_stack:5856kB pagetables:1336kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:548345 all_unreclaimable? yes [7664727.094805] lowmem_reserve[]: 0 0 0 0 [7664727.098785] Node 1 Normal free:525320kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16240kB inactive_file:17196kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:497018 all_unreclaimable? yes [7664727.145668] lowmem_reserve[]: 0 0 0 0 [7664727.149640] Node 2 Normal free:525408kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32088kB inactive_file:39344kB unevictable:8680kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476080kB kernel_stack:7760kB pagetables:552kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:163371 all_unreclaimable? yes [7664727.196616] lowmem_reserve[]: 0 0 0 0 [7664727.200591] Node 3 Normal free:525172kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:37268kB inactive_file:37800kB unevictable:840kB isolated(anon):0kB isolated(file):12544kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369268kB kernel_stack:4208kB pagetables:1212kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1256385 all_unreclaimable? yes [7664727.247617] lowmem_reserve[]: 0 0 0 0 [7664727.251582] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664727.266422] Node 0 DMA32: 352*4kB (UEM) 412*8kB (UEM) 1214*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261504kB [7664727.282914] Node 0 Normal: 6356*4kB (UEM) 5763*8kB (UEM) 3922*16kB (UEM) 4479*32kB (EM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508264kB [7664727.299668] Node 1 Normal: 88058*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525320kB [7664727.313195] Node 2 Normal: 27371*4kB (UEM) 40191*8kB (UEM) 915*16kB (UEM) 1672*32kB (UEM) 409*64kB (EM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525460kB [7664727.328969] Node 3 Normal: 131362*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525448kB [7664727.341408] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664727.350275] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664727.358880] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664727.367746] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664727.376353] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664727.385219] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664727.393824] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664727.402690] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664727.411296] 74308 total pagecache pages [7664727.415309] 0 pages in swap cache [7664727.418802] Swap cache stats: add 21120733, delete 21136705, find 4513443/7609955 [7664727.426453] Free swap = 3110188kB [7664727.430033] Total swap = 4194300kB [7664727.433614] 66993253 pages RAM [7664727.436845] 0 pages HighMem/MovableOnly [7664727.440857] 1101945 pages reserved [7664728.267989] ll_ost_io02_074 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664728.276432] ll_ost_io02_074 cpuset=/ mems_allowed=2 [7664728.281489] CPU: 14 PID: 83183 Comm: ll_ost_io02_074 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664728.294866] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664728.302690] Call Trace: [7664728.305326] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664728.310648] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664728.316138] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664728.321978] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664728.327991] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664728.334345] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664728.340440] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664728.346191] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664728.352719] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664728.359255] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664728.365441] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664728.371446] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664728.377556] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664728.384438] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664728.391661] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664728.397829] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664728.404455] [<ffffffffc0a844f5>] ? lnet_try_match_md+0x1e5/0x330 [lnet] [7664728.411334] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664728.417085] [<ffffffffa00dca58>] ? __enqueue_entity+0x78/0x80 [7664728.423091] [<ffffffffa00e367f>] ? enqueue_entity+0x2ef/0xbe0 [7664728.429098] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664728.434631] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664728.441725] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664728.449473] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664728.456731] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664728.464598] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664728.471601] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664728.479562] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664728.486819] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664728.493291] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664728.500861] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664728.505919] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664728.512188] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664728.518810] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664728.525081] Mem-Info: [7664728.527541] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:34268 inactive_file:35841 isolated_file:3424 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824045 slab_unreclaimable:62296619 mapped:1588 shmem:0 pagetables:1112 bounce:0 free:590389 free_pcp:0 free_cma:0 [7664728.561810] Node 2 Normal free:525456kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32088kB inactive_file:39600kB unevictable:8680kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476064kB kernel_stack:7760kB pagetables:536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:186618 all_unreclaimable? yes [7664728.608587] lowmem_reserve[]: 0 0 0 0 [7664728.612553] Node 2 Normal: 27375*4kB (UEM) 40191*8kB (UEM) 915*16kB (UEM) 1672*32kB (UEM) 409*64kB (EM) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525476kB [7664728.628329] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664728.637194] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664728.645802] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664728.654667] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664728.663275] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664728.672139] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664728.680748] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664728.689621] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664728.698226] 74311 total pagecache pages [7664728.702240] 0 pages in swap cache [7664728.705731] Swap cache stats: add 21120740, delete 21136712, find 4513444/7609957 [7664728.713385] Free swap = 3110700kB [7664728.716963] Total swap = 4194300kB [7664728.720544] 66993253 pages RAM [7664728.723776] 0 pages HighMem/MovableOnly [7664728.727787] 1101945 pages reserved [7664728.731368] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664728.739414] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664728.748374] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664728.757162] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664728.765759] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664728.773943] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664728.782555] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664728.790816] [53101] 0 53101 1910 64 9 172 0 mdadm [7664728.798910] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664728.807440] [53108] 0 53108 38960 167 19 84 0 dsm_sa_eventmgr [7664728.816400] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664728.824754] [53139] 997 53139 29446 250 28 128 0 chronyd [7664728.833014] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664728.841892] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664728.849897] [53969] 0 53969 31572 205 20 168 0 crond [7664728.857991] [54035] 0 54035 27526 164 10 33 0 agetty [7664728.866164] [54036] 0 54036 27526 158 11 33 0 agetty [7664728.874339] [54186] 0 54186 22934 209 46 274 0 master [7664728.882633] [36317] 0 36317 28294 187 14 61 0 bash [7664728.890651] [36328] 0 36328 154746 223 201 98 0 journalctl [7664728.899176] [36329] 0 36329 28177 160 14 55 0 grep [7664728.907293] [98579] 0 98579 48653 266 49 236 0 crond [7664728.915383] [99292] 0 99292 48653 257 49 261 0 crond [7664728.923470] [99592] 89 99592 25538 229 47 273 0 cleanup [7664728.931738] [99739] 89 99739 25502 246 47 261 0 trivial-rewrite [7664728.940699] [100032] 0 100032 48653 266 49 240 0 crond [7664728.948969] [100203] 0 100203 30816 185 17 335 0 python3 [7664728.957408] Out of memory: Kill process 99739 (trivial-rewrite) score 0 or sacrifice child [7664728.965849] Killed process 99739 (trivial-rewrite) total-vm:102008kB, anon-rss:0kB, file-rss:984kB, shmem-rss:0kB [7664729.057875] trivial-rewrite: page allocation failure: order:0, mode:0x201da [7664729.065016] CPU: 15 PID: 99739 Comm: trivial-rewrite Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664729.078398] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664729.086230] Call Trace: [7664729.088860] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664729.094178] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664729.100274] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664729.105854] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664729.111866] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664729.118395] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664729.124929] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664729.131117] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664729.137131] [<ffffffffa01ba3c8>] filemap_fault+0x298/0x490 [7664729.142913] [<ffffffffc05871c6>] ext4_filemap_fault+0x36/0x50 [ext4] [7664729.149533] [<ffffffffa01e593a>] __do_fault.isra.59+0x8a/0x100 [7664729.155625] [<ffffffffa01e5eec>] do_read_fault.isra.61+0x4c/0x1b0 [7664729.161986] [<ffffffffa01ea874>] handle_pte_fault+0x2f4/0xd10 [7664729.167990] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664729.173912] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664729.179831] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664729.185404] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664729.190716] Mem-Info: [7664729.193192] active_anon:0 inactive_anon:1 isolated_anon:0 active_file:32713 inactive_file:34310 isolated_file:4480 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824045 slab_unreclaimable:62296619 mapped:1588 shmem:0 pagetables:1112 bounce:0 free:590487 free_pcp:9 free_cma:0 [7664729.227467] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664729.269224] lowmem_reserve[]: 0 1418 63868 63868 [7664729.274152] Node 0 DMA32 free:261352kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1008kB inactive_file:2976kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686204kB kernel_stack:352kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9326 all_unreclaimable? yes [7664729.318845] lowmem_reserve[]: 0 0 62450 62450 [7664729.323509] Node 0 Normal free:508352kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:4kB active_file:44920kB inactive_file:47280kB unevictable:168kB isolated(anon):0kB isolated(file):3584kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243608kB kernel_stack:6288kB pagetables:1268kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:68642 all_unreclaimable? no [7664729.370199] lowmem_reserve[]: 0 0 0 0 [7664729.374174] Node 1 Normal free:525320kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16548kB inactive_file:16640kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:497018 all_unreclaimable? yes [7664729.421038] lowmem_reserve[]: 0 0 0 0 [7664729.425014] Node 2 Normal free:525564kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:29596kB inactive_file:31524kB unevictable:8680kB isolated(anon):0kB isolated(file):8832kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476064kB kernel_stack:7760kB pagetables:536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:430364 all_unreclaimable? yes [7664729.472052] lowmem_reserve[]: 0 0 0 0 [7664729.476027] Node 3 Normal free:525408kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:39776kB inactive_file:43592kB unevictable:840kB isolated(anon):0kB isolated(file):4096kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369268kB kernel_stack:4208kB pagetables:1100kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:575432 all_unreclaimable? yes [7664729.522891] lowmem_reserve[]: 0 0 0 0 [7664729.526862] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664729.541701] Node 0 DMA32: 359*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3688*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261572kB [7664729.558194] Node 0 Normal: 6426*4kB (UEM) 5764*8kB (UEM) 3942*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508904kB [7664729.575037] Node 1 Normal: 88058*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525320kB [7664729.588564] Node 2 Normal: 27436*4kB (UEM) 40241*8kB (UEM) 892*16kB (UEM) 1671*32kB (UEM) 408*64kB (UEM) 2*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525784kB [7664729.604511] Node 3 Normal: 131215*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524860kB [7664729.616861] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664729.625729] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664729.634345] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664729.643216] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664729.651823] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664729.660688] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664729.669296] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664729.678161] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664729.686766] 74201 total pagecache pages [7664729.690781] 1 pages in swap cache [7664729.694273] Swap cache stats: add 21120743, delete 21136714, find 4513445/7609959 [7664729.701924] Free swap = 3110692kB [7664729.705502] Total swap = 4194300kB [7664729.709084] 66993253 pages RAM [7664729.712316] 0 pages HighMem/MovableOnly [7664729.716329] 1101945 pages reserved [7664729.902284] ll_ost_io00_031 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664729.910728] ll_ost_io00_031 cpuset=/ mems_allowed=0 [7664729.915796] CPU: 4 PID: 123073 Comm: ll_ost_io00_031 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664729.929171] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664729.937006] Call Trace: [7664729.939646] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664729.944963] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664729.950457] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664729.956300] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664729.962051] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664729.968055] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664729.974408] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664729.980503] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664729.986258] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664729.992815] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664729.999349] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664730.005559] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664730.011564] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664730.017672] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664730.024558] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664730.030725] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664730.037818] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664730.044387] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664730.051031] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664730.058378] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664730.065118] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664730.072468] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664730.080022] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664730.086936] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664730.094025] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664730.101795] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664730.109051] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664730.116964] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664730.123966] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664730.131921] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664730.139177] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664730.145657] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664730.153233] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664730.158292] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664730.164560] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664730.171171] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664730.177437] Mem-Info: [7664730.179903] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:32922 inactive_file:34799 isolated_file:5088 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824045 slab_unreclaimable:62296624 mapped:1588 shmem:0 pagetables:1112 bounce:0 free:590371 free_pcp:0 free_cma:0 [7664730.214183] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664730.255937] lowmem_reserve[]: 0 1418 63868 63868 [7664730.260865] Node 0 DMA32 free:261136kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1084kB inactive_file:2240kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686204kB kernel_stack:352kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:19213 all_unreclaimable? no [7664730.305562] lowmem_reserve[]: 0 0 62450 62450 [7664730.310228] Node 0 Normal free:508120kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:47312kB inactive_file:46832kB unevictable:168kB isolated(anon):0kB isolated(file):2304kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243620kB kernel_stack:6672kB pagetables:1268kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:317270 all_unreclaimable? yes [7664730.357101] lowmem_reserve[]: 0 0 0 0 [7664730.361110] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664730.375950] Node 0 DMA32: 340*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261464kB [7664730.392448] Node 0 Normal: 6464*4kB (UEM) 5764*8kB (UEM) 3918*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508672kB [7664730.409292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664730.418155] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664730.426764] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664730.435629] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664730.444234] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664730.453099] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664730.461708] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664730.470572] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664730.479179] 74443 total pagecache pages [7664730.483193] 1 pages in swap cache [7664730.486686] Swap cache stats: add 21120748, delete 21136719, find 4513447/7609964 [7664730.494337] Free swap = 3111632kB [7664730.497916] Total swap = 4194300kB [7664730.501498] 66993253 pages RAM [7664730.504728] 0 pages HighMem/MovableOnly [7664730.508741] 1101945 pages reserved [7664730.512320] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664730.520369] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664730.529327] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664730.538112] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664730.546707] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664730.554889] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664730.563501] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664730.571769] [53101] 0 53101 1910 64 9 172 0 mdadm [7664730.579863] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664730.588382] [53108] 0 53108 38960 167 19 86 0 dsm_sa_eventmgr [7664730.597335] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664730.605683] [53139] 997 53139 29446 250 28 128 0 chronyd [7664730.613952] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664730.622829] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664730.630834] [53969] 0 53969 31572 204 20 169 0 crond [7664730.638926] [54035] 0 54035 27526 164 10 33 0 agetty [7664730.647099] [54036] 0 54036 27526 158 11 33 0 agetty [7664730.655272] [54186] 0 54186 22934 209 46 274 0 master [7664730.663565] [36317] 0 36317 28294 187 14 61 0 bash [7664730.671600] [36328] 0 36328 154746 223 201 98 0 journalctl [7664730.680121] [36329] 0 36329 28177 160 14 55 0 grep [7664730.688247] [98579] 0 98579 48653 266 49 236 0 crond [7664730.696337] [99292] 0 99292 48653 257 49 261 0 crond [7664730.704429] [99592] 89 99592 25538 229 47 273 0 cleanup [7664730.712689] [100032] 0 100032 48653 266 49 240 0 crond [7664730.720950] [100203] 0 100203 30816 185 17 335 0 python3 [7664730.729390] Out of memory: Kill process 99292 (crond) score 0 or sacrifice child [7664730.736955] Killed process 99292 (crond) total-vm:194612kB, anon-rss:0kB, file-rss:1028kB, shmem-rss:0kB [7664730.792964] crond: page allocation failure: order:0, mode:0x200da [7664730.799287] CPU: 28 PID: 99292 Comm: crond Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664730.811819] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664730.819674] Call Trace: [7664730.822315] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664730.827631] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664730.833730] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664730.839313] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664730.845327] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664730.851859] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664730.858387] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664730.864218] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664730.870833] [<ffffffffa076aaba>] ? __schedule+0x42a/0x860 [7664730.876502] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664730.882779] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664730.888704] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664730.894720] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664730.900648] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664730.906573] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664730.912147] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664730.917460] Mem-Info: [7664730.919935] active_anon:0 inactive_anon:4 isolated_anon:0 active_file:34635 inactive_file:33367 isolated_file:5184 unevictable:9044 dirty:0 writeback:1 unstable:0 slab_reclaimable:824045 slab_unreclaimable:62296625 mapped:1595 shmem:0 pagetables:1112 bounce:0 free:590251 free_pcp:0 free_cma:0 [7664730.954219] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664730.995974] lowmem_reserve[]: 0 1418 63868 63868 [7664731.000903] Node 0 DMA32 free:261188kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1084kB inactive_file:2288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686204kB kernel_stack:352kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:58529 all_unreclaimable? yes [7664731.045688] lowmem_reserve[]: 0 0 62450 62450 [7664731.050356] Node 0 Normal free:508608kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44976kB inactive_file:47188kB unevictable:168kB isolated(anon):0kB isolated(file):2304kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243620kB kernel_stack:6672kB pagetables:1268kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:572632 all_unreclaimable? yes [7664731.098924] Lustre: fir-OST001d: Bulk IO write error with 72866633-325f-4 (at 10.50.15.9@o2ib2), client will retry: rc = -110 [7664731.097227] lowmem_reserve[]: 0 0 0 0 [7664731.112674] Node 1 Normal free:525336kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:18044kB inactive_file:14408kB unevictable:26488kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711288kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:301214 all_unreclaimable? yes [7664731.159722] lowmem_reserve[]: 0 0 0 0 [7664731.163693] Node 2 Normal free:524692kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:12kB active_file:28564kB inactive_file:34364kB unevictable:8680kB isolated(anon):0kB isolated(file):7936kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:4kB mapped:5356kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476064kB kernel_stack:7760kB pagetables:536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:119043 all_unreclaimable? no [7664731.210737] lowmem_reserve[]: 0 0 0 0 [7664731.214713] Node 3 Normal free:525096kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:44136kB inactive_file:40748kB unevictable:840kB isolated(anon):0kB isolated(file):5760kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:852kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369276kB kernel_stack:4208kB pagetables:1100kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:198515 all_unreclaimable? yes [7664731.261563] lowmem_reserve[]: 0 0 0 0 [7664731.265529] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664731.280368] Node 0 DMA32: 340*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261464kB [7664731.296861] Node 0 Normal: 6502*4kB (UEM) 5764*8kB (UEM) 3931*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 509032kB [7664731.313700] Node 1 Normal: 88060*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525328kB [7664731.327231] Node 2 Normal: 27486*4kB (UEM) 40197*8kB (UEM) 888*16kB (EM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525120kB [7664731.342544] Node 3 Normal: 131457*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525828kB [7664731.354895] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664731.363761] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664731.372367] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664731.381234] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664731.389841] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664731.398705] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664731.407312] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664731.416177] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664731.424785] 74427 total pagecache pages [7664731.428800] 0 pages in swap cache [7664731.432300] Swap cache stats: add 21120748, delete 21136720, find 4513447/7609964 [7664731.439951] Free swap = 3111632kB [7664731.443530] Total swap = 4194300kB [7664731.447109] 66993253 pages RAM [7664731.450342] 0 pages HighMem/MovableOnly [7664731.454353] 1101945 pages reserved [7664732.173644] ll_ost_io02_054 invoked oom-killer: gfp_mask=0x82d2, order=0, oom_score_adj=0 [7664732.181995] ll_ost_io02_054 cpuset=/ mems_allowed=2 [7664732.187063] CPU: 2 PID: 6889 Comm: ll_ost_io02_054 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664732.200270] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664732.208101] Call Trace: [7664732.210741] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664732.216055] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664732.221553] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664732.227394] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664732.233149] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664732.239161] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664732.245524] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664732.251624] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664732.257378] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664732.263905] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664732.270440] [<ffffffffa01fd95f>] __vmalloc_node_range+0x12f/0x280 [7664732.276871] [<ffffffffc11e6a03>] ? ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664732.283924] [<ffffffffa01fdd5e>] vzalloc_node+0x4e/0x50 [7664732.289450] [<ffffffffc11e6a03>] ? ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664732.296535] [<ffffffffc11e6a03>] ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664732.303450] [<ffffffffc11e6ea1>] ptlrpc_grow_req_bufs+0xe1/0x2a0 [ptlrpc] [7664732.310543] [<ffffffffc11efc85>] ptlrpc_main+0xc05/0x1460 [ptlrpc] [7664732.317035] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664732.324607] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664732.329668] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664732.335935] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664732.342554] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664732.348820] Mem-Info: [7664732.351282] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:34450 inactive_file:35193 isolated_file:2400 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824047 slab_unreclaimable:62296627 mapped:1591 shmem:0 pagetables:1016 bounce:0 free:590377 free_pcp:0 free_cma:0 [7664732.385555] Node 2 Normal free:524864kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:8kB active_file:30784kB inactive_file:37400kB unevictable:8680kB isolated(anon):0kB isolated(file):3968kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5340kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476072kB kernel_stack:7760kB pagetables:368kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1223279 all_unreclaimable? yes [7664732.432681] lowmem_reserve[]: 0 0 0 0 [7664732.436654] Node 2 Normal: 27488*4kB (UEM) 40197*8kB (UEM) 889*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525144kB [7664732.452060] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664732.460934] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664732.469541] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664732.478416] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664732.487029] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664732.495897] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664732.504510] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664732.513375] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664732.521981] 74412 total pagecache pages [7664732.525997] 0 pages in swap cache [7664732.529495] Swap cache stats: add 21120750, delete 21136722, find 4513449/7609966 [7664732.537149] Free swap = 3112144kB [7664732.540726] Total swap = 4194300kB [7664732.544309] 66993253 pages RAM [7664732.547539] 0 pages HighMem/MovableOnly [7664732.551549] 1101945 pages reserved [7664732.555130] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664732.563180] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664732.572138] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664732.580937] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664732.589539] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664732.597717] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664732.606329] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664732.614599] [53101] 0 53101 1910 64 9 172 0 mdadm [7664732.622693] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664732.631220] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664732.640182] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664732.648537] [53139] 997 53139 29446 250 28 128 0 chronyd [7664732.656806] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664732.665682] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664732.673688] [53969] 0 53969 31572 204 20 169 0 crond [7664732.681781] [54035] 0 54035 27526 164 10 33 0 agetty [7664732.689955] [54036] 0 54036 27526 158 11 33 0 agetty [7664732.698136] [54186] 0 54186 22934 209 46 274 0 master [7664732.706434] [36317] 0 36317 28294 187 14 61 0 bash [7664732.714442] [36328] 0 36328 154746 223 201 98 0 journalctl [7664732.722967] [36329] 0 36329 28177 160 14 55 0 grep [7664732.731082] [98579] 0 98579 48653 266 49 236 0 crond [7664732.739175] [99592] 89 99592 25538 229 47 273 0 cleanup [7664732.747443] [100032] 0 100032 48653 266 49 240 0 crond [7664732.755712] [100203] 0 100203 30816 185 17 335 0 python3 [7664732.764151] Out of memory: Kill process 99592 (cleanup) score 0 or sacrifice child [7664732.771897] Killed process 99592 (cleanup) total-vm:102152kB, anon-rss:0kB, file-rss:916kB, shmem-rss:0kB [7664732.812301] cleanup: page allocation failure: order:0, mode:0x200da [7664732.818752] CPU: 0 PID: 99592 Comm: cleanup Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664732.831358] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664732.839189] Call Trace: [7664732.841833] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664732.847158] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664732.853254] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664732.858841] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664732.864849] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664732.871379] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664732.877913] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664732.883753] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664732.890374] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664732.896647] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664732.902572] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664732.908596] [<ffffffffa076a10e>] ? schedule_hrtimeout_range_clock+0xbe/0x150 [7664732.915906] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664732.921835] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664732.927763] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664732.933346] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664732.938668] Mem-Info: [7664732.941150] active_anon:0 inactive_anon:2 isolated_anon:0 active_file:32804 inactive_file:36254 isolated_file:3488 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824047 slab_unreclaimable:62296628 mapped:1591 shmem:0 pagetables:1016 bounce:0 free:590333 free_pcp:134 free_cma:0 [7664732.975602] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664733.010251] LustreError: 120630:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c2126afac00 [7664733.028395] lowmem_reserve[]: 0 1418 63868 63868 [7664733.033322] Node 0 DMA32 free:261344kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1100kB inactive_file:2360kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:282489 all_unreclaimable? yes [7664733.078195] lowmem_reserve[]: 0 0 62450 62450 [7664733.082864] Node 0 Normal free:508256kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44732kB inactive_file:43404kB unevictable:168kB isolated(anon):0kB isolated(file):7936kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243616kB kernel_stack:6640kB pagetables:1108kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:39428 all_unreclaimable? no [7664733.129565] lowmem_reserve[]: 0 0 0 0 [7664733.133535] Node 1 Normal free:525328kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16248kB inactive_file:17544kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711296kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:513849 all_unreclaimable? yes [7664733.173817] LustreError: 8683:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c2d1437e400 [7664733.191265] lowmem_reserve[]: 0 0 0 0 [7664733.195234] Node 2 Normal free:524888kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:4kB active_file:30776kB inactive_file:38860kB unevictable:8680kB isolated(anon):0kB isolated(file):2176kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5340kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476076kB kernel_stack:7760kB pagetables:368kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:73248 all_unreclaimable? no [7664733.199739] LustreError: 107086:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c29177a9200 [7664733.253119] lowmem_reserve[]: 0 0 0 0 [7664733.257089] Node 3 Normal free:524936kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:40824kB inactive_file:40456kB unevictable:840kB isolated(anon):0kB isolated(file):6144kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:852kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:1048kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:214118 all_unreclaimable? no [7664733.303870] lowmem_reserve[]: 0 0 0 0 [7664733.307840] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664733.322676] Node 0 DMA32: 338*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261456kB [7664733.339170] Node 0 Normal: 6323*4kB (UEM) 5744*8kB (UEM) 3931*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508156kB [7664733.356010] Node 1 Normal: 88058*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525320kB [7664733.369538] Node 2 Normal: 27485*4kB (UEM) 40197*8kB (UEM) 889*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525132kB [7664733.384940] Node 3 Normal: 131320*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525280kB [7664733.397376] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664733.406243] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664733.414867] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664733.423734] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664733.432339] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664733.441216] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664733.449829] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664733.458695] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664733.467301] 74577 total pagecache pages [7664733.471319] 0 pages in swap cache [7664733.474817] Swap cache stats: add 21120750, delete 21136722, find 4513449/7609966 [7664733.482468] Free swap = 3112144kB [7664733.486045] Total swap = 4194300kB [7664733.489628] 66993253 pages RAM [7664733.492858] 0 pages HighMem/MovableOnly [7664733.496871] 1101945 pages reserved [7664733.811936] ll_ost_io03_058 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664733.820370] ll_ost_io03_058 cpuset=/ mems_allowed=3 [7664733.825426] CPU: 15 PID: 7282 Comm: ll_ost_io03_058 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664733.838724] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664733.846550] Call Trace: [7664733.849182] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664733.854501] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664733.859995] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664733.865827] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664733.871834] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664733.878187] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664733.884288] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664733.890033] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664733.896563] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664733.903096] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664733.909281] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664733.915287] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664733.921396] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664733.928280] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664733.935505] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664733.941668] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664733.948319] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664733.955374] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664733.961128] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664733.967569] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664733.974449] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664733.979985] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664733.987076] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664733.994833] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664734.002104] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664734.009967] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664734.016931] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664734.022366] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664734.028841] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664734.036416] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664734.041469] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664734.047744] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664734.054363] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664734.060629] Mem-Info: [7664734.063089] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34898 inactive_file:36671 isolated_file:1685 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824052 slab_unreclaimable:62296631 mapped:1589 shmem:0 pagetables:1016 bounce:0 free:590132 free_pcp:0 free_cma:0 [7664734.097367] Node 3 Normal free:525248kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:4kB active_file:43272kB inactive_file:42992kB unevictable:840kB isolated(anon):0kB isolated(file):852kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:852kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:1048kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1041966 all_unreclaimable? yes [7664734.144228] lowmem_reserve[]: 0 0 0 0 [7664734.148195] Node 3 Normal: 131365*4kB (UEM) 2*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525476kB [7664734.161008] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664734.169873] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664734.178482] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664734.187355] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664734.195962] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664734.204828] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664734.213432] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664734.222297] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664734.230908] 74669 total pagecache pages [7664734.234931] 0 pages in swap cache [7664734.238429] Swap cache stats: add 21120759, delete 21136731, find 4513450/7609968 [7664734.246079] Free swap = 3113168kB [7664734.249659] Total swap = 4194300kB [7664734.253239] 66993253 pages RAM [7664734.256470] 0 pages HighMem/MovableOnly [7664734.260485] 1101945 pages reserved [7664734.264063] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664734.272112] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664734.281070] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664734.289853] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664734.298448] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664734.306631] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664734.315245] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664734.323512] [53101] 0 53101 1910 64 9 172 0 mdadm [7664734.331608] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664734.340133] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664734.349087] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664734.357433] [53139] 997 53139 29446 250 28 128 0 chronyd [7664734.365695] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664734.374569] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664734.382576] [53969] 0 53969 31572 204 20 169 0 crond [7664734.390669] [54035] 0 54035 27526 164 10 33 0 agetty [7664734.398842] [54036] 0 54036 27526 158 11 33 0 agetty [7664734.407014] [54186] 0 54186 22934 209 46 274 0 master [7664734.415301] [36317] 0 36317 28294 187 14 61 0 bash [7664734.423299] [36328] 0 36328 154746 223 201 98 0 journalctl [7664734.431821] [36329] 0 36329 28177 160 14 55 0 grep [7664734.439940] [98579] 0 98579 48653 266 49 236 0 crond [7664734.448032] [100032] 0 100032 48653 266 49 240 0 crond [7664734.456293] [100203] 0 100203 30816 185 17 335 0 python3 [7664734.464727] Out of memory: Kill process 100032 (crond) score 0 or sacrifice child [7664734.472379] Killed process 100203 (python3) total-vm:123264kB, anon-rss:0kB, file-rss:740kB, shmem-rss:0kB [7664734.525533] Lustre: 90710:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-14s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff9c114c91b850 x1660577746876416/t0(0) o4->31cd270d-535d-4@10.50.5.29@o2ib2:492/0 lens 488/448 e 1 to 0 dl 1583650742 ref 2 fl Interpret:/0/0 rc 0/0 [7664734.526031] LustreError: 2948:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c1ae02c0e00 [7664734.676783] python3: page allocation failure: order:0, mode:0x200da [7664734.683236] CPU: 20 PID: 100203 Comm: python3 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664734.696009] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664734.703833] Call Trace: [7664734.706467] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664734.711786] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664734.717886] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664734.723467] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664734.729481] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664734.736017] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664734.742550] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664734.748385] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664734.755004] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664734.761269] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664734.767189] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664734.773206] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664734.779132] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664734.785052] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664734.790624] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664734.795937] Mem-Info: [7664734.798411] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34305 inactive_file:36083 isolated_file:2272 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824052 slab_unreclaimable:62296627 mapped:1588 shmem:0 pagetables:969 bounce:0 free:590208 free_pcp:0 free_cma:0 [7664734.832591] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664734.874346] lowmem_reserve[]: 0 1418 63868 63868 [7664734.879276] Node 0 DMA32 free:261332kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1128kB inactive_file:2640kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:280662 all_unreclaimable? yes [7664734.924146] lowmem_reserve[]: 0 0 62450 62450 [7664734.928817] Node 0 Normal free:508272kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44512kB inactive_file:45304kB unevictable:168kB isolated(anon):0kB isolated(file):6016kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243624kB kernel_stack:6016kB pagetables:1088kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:487124 all_unreclaimable? yes [7664734.975685] lowmem_reserve[]: 0 0 0 0 [7664734.979655] Node 1 Normal free:525320kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:18692kB inactive_file:15264kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711304kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:572114 all_unreclaimable? yes [7664735.026507] lowmem_reserve[]: 0 0 0 0 [7664735.030477] Node 2 Normal free:525148kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31560kB inactive_file:40584kB unevictable:8680kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715236kB slab_unreclaimable:62476060kB kernel_stack:7760kB pagetables:368kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:122880 all_unreclaimable? no [7664735.077168] lowmem_reserve[]: 0 0 0 0 [7664735.081142] Node 3 Normal free:524856kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:39888kB inactive_file:44164kB unevictable:840kB isolated(anon):0kB isolated(file):2560kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:880kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1207843 all_unreclaimable? yes [7664735.128005] lowmem_reserve[]: 0 0 0 0 [7664735.131970] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664735.146807] Node 0 DMA32: 336*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261448kB [7664735.163300] Node 0 Normal: 6355*4kB (UEM) 5745*8kB (UEM) 3923*16kB (UEM) 4480*32kB (UEM) 2040*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508164kB [7664735.180141] Node 1 Normal: 88058*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525320kB [7664735.193671] Node 2 Normal: 27525*4kB (UEM) 40223*8kB (UEM) 896*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525612kB [7664735.209071] Node 3 Normal: 131035*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524140kB [7664735.221421] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664735.230287] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664735.238897] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664735.247769] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664735.256376] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664735.265242] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664735.273848] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664735.282713] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664735.291319] 74860 total pagecache pages [7664735.295333] 0 pages in swap cache [7664735.298826] Swap cache stats: add 21120759, delete 21136731, find 4513450/7609968 [7664735.306476] Free swap = 3113168kB [7664735.310056] Total swap = 4194300kB [7664735.313635] 66993253 pages RAM [7664735.316869] 0 pages HighMem/MovableOnly [7664735.320881] 1101945 pages reserved [7664735.734921] ll_ost_io01_005 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [7664735.743103] ll_ost_io01_005 cpuset=/ mems_allowed=1 [7664735.748167] CPU: 21 PID: 119516 Comm: ll_ost_io01_005 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664735.761633] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664735.769466] Call Trace: [7664735.772099] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664735.777414] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664735.782903] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664735.788745] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664735.794499] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664735.800511] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664735.806864] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664735.812957] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664735.818703] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664735.825228] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664735.831756] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664735.838015] [<ffffffffc124293f>] tgt_checksum_niobuf_rw+0xbf/0xe00 [ptlrpc] [7664735.845273] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664735.852607] [<ffffffffc0cb71e0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [7664735.859438] [<ffffffffc1247325>] tgt_brw_read+0xc35/0x1e50 [ptlrpc] [7664735.866008] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664735.872647] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664735.880000] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664735.887345] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664735.894866] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664735.901781] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664735.908866] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664735.916621] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664735.923885] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664735.931751] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664735.938716] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664735.944155] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664735.950632] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664735.958207] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664735.963265] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664735.969532] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664735.976143] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664735.982407] Mem-Info: [7664735.984868] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34394 inactive_file:35163 isolated_file:4023 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824053 slab_unreclaimable:62296627 mapped:1588 shmem:0 pagetables:969 bounce:0 free:590151 free_pcp:0 free_cma:0 [7664736.019049] Node 1 Normal free:525320kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16272kB inactive_file:17236kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:60286 all_unreclaimable? yes [7664736.065832] lowmem_reserve[]: 0 0 0 0 [7664736.069803] Node 1 Normal: 88057*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525316kB [7664736.083332] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664736.092198] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664736.100806] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664736.101757] LustreError: 8713:0:(ldlm_lib.c:3262:target_bulk_io()) @@@ network error on bulk WRITE req@ffff9c1f527ad050 x1659307336926208/t0(0) o4->0716ac8f-8ab5-4@10.50.4.38@o2ib2:522/0 lens 488/448 e 2 to 0 dl 1583650772 ref 1 fl Interpret:/0/0 rc 0/0 [7664736.101760] LustreError: 8713:0:(ldlm_lib.c:3262:target_bulk_io()) Skipped 6 previous similar messages [7664736.141790] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664736.150395] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664736.159264] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664736.167867] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664736.176734] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664736.185340] 74801 total pagecache pages [7664736.189353] 0 pages in swap cache [7664736.192846] Swap cache stats: add 21120761, delete 21136733, find 4513450/7609970 [7664736.200499] Free swap = 3114448kB [7664736.204078] Total swap = 4194300kB [7664736.207660] 66993253 pages RAM [7664736.210890] 0 pages HighMem/MovableOnly [7664736.214903] 1101945 pages reserved [7664736.218483] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664736.226525] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664736.235479] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664736.244268] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664736.252840] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664736.261014] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664736.269625] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664736.277887] [53101] 0 53101 1910 64 9 172 0 mdadm [7664736.285974] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664736.294500] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664736.303453] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664736.311800] [53139] 997 53139 29446 250 28 128 0 chronyd [7664736.320068] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664736.328935] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664736.336946] [53969] 0 53969 31572 204 20 169 0 crond [7664736.345036] [54035] 0 54035 27526 164 10 33 0 agetty [7664736.353209] [54036] 0 54036 27526 158 11 33 0 agetty [7664736.361382] [54186] 0 54186 22934 209 46 274 0 master [7664736.369645] [36317] 0 36317 28294 187 14 61 0 bash [7664736.377648] [36328] 0 36328 154746 223 201 98 0 journalctl [7664736.386170] [36329] 0 36329 28177 160 14 55 0 grep [7664736.394292] [98579] 0 98579 48653 266 49 236 0 crond [7664736.402390] [100032] 0 100032 48653 266 49 240 0 crond [7664736.410655] Out of memory: Kill process 100032 (crond) score 0 or sacrifice child [7664736.418312] Killed process 100032 (crond) total-vm:194612kB, anon-rss:0kB, file-rss:1064kB, shmem-rss:0kB [7664736.464388] crond: page allocation failure: order:0, mode:0x200da [7664736.470664] CPU: 20 PID: 100032 Comm: crond Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664736.483268] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664736.491096] Call Trace: [7664736.493730] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664736.499048] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664736.505145] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664736.510719] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664736.516726] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664736.523261] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664736.529787] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664736.535629] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664736.542249] [<ffffffffa076aaba>] ? __schedule+0x42a/0x860 [7664736.547907] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664736.554173] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664736.560094] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664736.566099] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664736.572020] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664736.577948] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664736.583528] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664736.588840] Mem-Info: [7664736.591319] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34031 inactive_file:35210 isolated_file:4311 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824053 slab_unreclaimable:62296627 mapped:1587 shmem:0 pagetables:952 bounce:0 free:590218 free_pcp:0 free_cma:0 [7664736.625506] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664736.667257] lowmem_reserve[]: 0 1418 63868 63868 [7664736.672180] Node 0 DMA32 free:261308kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1148kB inactive_file:2672kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:24147 all_unreclaimable? yes [7664736.716961] lowmem_reserve[]: 0 0 62450 62450 [7664736.721624] Node 0 Normal free:508572kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:43692kB inactive_file:40916kB unevictable:168kB isolated(anon):0kB isolated(file):11868kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243624kB kernel_stack:6064kB pagetables:1040kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:796187 all_unreclaimable? no [7664736.768488] lowmem_reserve[]: 0 0 0 0 [7664736.772464] Node 1 Normal free:525320kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16208kB inactive_file:17300kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:1536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:60286 all_unreclaimable? yes [7664736.819237] lowmem_reserve[]: 0 0 0 0 [7664736.823206] Node 2 Normal free:525568kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31812kB inactive_file:36140kB unevictable:8680kB isolated(anon):0kB isolated(file):3584kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715236kB slab_unreclaimable:62476060kB kernel_stack:7760kB pagetables:348kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:900914 all_unreclaimable? yes [7664736.870244] lowmem_reserve[]: 0 0 0 0 [7664736.874219] Node 3 Normal free:524200kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:44168kB inactive_file:44048kB unevictable:840kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:880kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1207843 all_unreclaimable? yes [7664736.920995] lowmem_reserve[]: 0 0 0 0 [7664736.924960] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664736.939799] Node 0 DMA32: 331*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261428kB [7664736.956293] Node 0 Normal: 6397*4kB (UEM) 5775*8kB (UEM) 3953*16kB (UEM) 4489*32kB (UEM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 509404kB [7664736.973131] Node 1 Normal: 88057*4kB (UEM) 21634*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525316kB [7664736.986662] Node 2 Normal: 27543*4kB (UEM) 40231*8kB (UEM) 896*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525748kB [7664737.002061] Node 3 Normal: 131035*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524140kB [7664737.014413] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.023279] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.031886] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.040752] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.049358] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.058224] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.066831] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.075696] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.084302] 74607 total pagecache pages [7664737.088315] 0 pages in swap cache [7664737.091807] Swap cache stats: add 21120761, delete 21136733, find 4513450/7609970 [7664737.099460] Free swap = 3114448kB [7664737.103038] Total swap = 4194300kB [7664737.106621] 66993253 pages RAM [7664737.109851] 0 pages HighMem/MovableOnly [7664737.113862] 1101945 pages reserved [7664737.132040] ll_ost_io03_087 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664737.140482] ll_ost_io03_087 cpuset=/ mems_allowed=3 [7664737.145547] CPU: 43 PID: 8685 Comm: ll_ost_io03_087 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664737.158842] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664737.166671] Call Trace: [7664737.169304] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664737.174623] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664737.180119] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664737.185957] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664737.191711] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664737.197718] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664737.204070] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664737.210161] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664737.215907] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664737.222434] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664737.228961] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664737.235138] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664737.241145] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664737.247253] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664737.254139] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664737.261370] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664737.267535] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664737.274184] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664737.281232] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664737.286987] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664737.293434] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664737.300313] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664737.305843] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664737.312927] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664737.320684] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664737.327952] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664737.335821] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664737.342818] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664737.350771] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664737.358027] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664737.364508] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664737.372077] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664737.377135] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664737.383404] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664737.390024] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664737.396288] Mem-Info: [7664737.398750] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34302 inactive_file:36159 isolated_file:3415 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824053 slab_unreclaimable:62296627 mapped:1587 shmem:0 pagetables:903 bounce:0 free:590213 free_pcp:0 free_cma:0 [7664737.432938] Node 3 Normal free:524200kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:44168kB inactive_file:44048kB unevictable:840kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854268kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:768kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1207843 all_unreclaimable? yes [7664737.479720] lowmem_reserve[]: 0 0 0 0 [7664737.483692] Node 3 Normal: 131035*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524140kB [7664737.496045] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.504909] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.513515] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.522383] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.530987] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.539856] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.548468] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664737.557337] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664737.565950] 74698 total pagecache pages [7664737.569966] 0 pages in swap cache [7664737.573464] Swap cache stats: add 21120764, delete 21136736, find 4513450/7609971 [7664737.581115] Free swap = 3114960kB [7664737.584694] Total swap = 4194300kB [7664737.588275] 66993253 pages RAM [7664737.591508] 0 pages HighMem/MovableOnly [7664737.595519] 1101945 pages reserved [7664737.599098] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664737.607148] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664737.616106] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664737.624889] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664737.633489] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664737.641666] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664737.650278] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664737.658540] [53101] 0 53101 1910 64 9 172 0 mdadm [7664737.666636] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664737.675167] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664737.684124] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664737.692477] [53139] 997 53139 29446 250 28 128 0 chronyd [7664737.700737] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664737.709611] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664737.717612] [53969] 0 53969 31572 204 20 169 0 crond [7664737.725706] [54035] 0 54035 27526 164 10 33 0 agetty [7664737.733877] [54036] 0 54036 27526 158 11 33 0 agetty [7664737.742051] [54186] 0 54186 22934 209 46 274 0 master [7664737.750342] [36317] 0 36317 28294 187 14 61 0 bash [7664737.758346] [36328] 0 36328 154746 223 201 98 0 journalctl [7664737.766873] [36329] 0 36329 28177 160 14 55 0 grep [7664737.774990] [98579] 0 98579 48653 266 49 236 0 crond [7664737.783086] Out of memory: Kill process 98579 (crond) score 0 or sacrifice child [7664737.790655] Killed process 98579 (crond) total-vm:194612kB, anon-rss:0kB, file-rss:1064kB, shmem-rss:0kB [7664737.929375] crond: page allocation failure: order:0, mode:0x200da [7664737.935651] CPU: 27 PID: 98579 Comm: crond Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664737.948164] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664737.955988] Call Trace: [7664737.958625] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664737.963938] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664737.970032] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664737.975612] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664737.981617] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664737.988145] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664737.994682] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664738.000522] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664738.007142] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664738.013410] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664738.019338] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664738.025350] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664738.031269] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664738.037189] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664738.042760] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664738.048074] Mem-Info: [7664738.050550] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34946 inactive_file:37099 isolated_file:2304 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824053 slab_unreclaimable:62296627 mapped:1587 shmem:0 pagetables:903 bounce:0 free:590052 free_pcp:0 free_cma:0 [7664738.084729] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664738.126483] lowmem_reserve[]: 0 1418 63868 63868 [7664738.131405] Node 0 DMA32 free:261020kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1216kB inactive_file:2452kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:69388 all_unreclaimable? yes [7664738.176188] lowmem_reserve[]: 0 0 62450 62450 [7664738.180858] Node 0 Normal free:508672kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:46040kB inactive_file:45564kB unevictable:168kB isolated(anon):0kB isolated(file):3584kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243624kB kernel_stack:6016kB pagetables:984kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1088878 all_unreclaimable? yes [7664738.227712] lowmem_reserve[]: 0 0 0 0 [7664738.231681] Node 1 Normal free:525328kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16624kB inactive_file:17200kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411340kB kernel_stack:20816kB pagetables:1524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:382715 all_unreclaimable? yes [7664738.278548] lowmem_reserve[]: 0 0 0 0 [7664738.282521] Node 2 Normal free:525168kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31068kB inactive_file:40408kB unevictable:8680kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476060kB kernel_stack:7760kB pagetables:332kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:423385 all_unreclaimable? yes [7664738.329459] lowmem_reserve[]: 0 0 0 0 [7664738.333428] Node 3 Normal free:524652kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:44164kB inactive_file:44148kB unevictable:840kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854272kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:768kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:456293 all_unreclaimable? yes [7664738.380119] lowmem_reserve[]: 0 0 0 0 [7664738.384091] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664738.398929] Node 0 DMA32: 305*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261324kB [7664738.415423] Node 0 Normal: 6274*4kB (UEM) 5755*8kB (UEM) 3938*16kB (UEM) 4489*32kB (UEM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508512kB [7664738.432260] Node 1 Normal: 88056*4kB (UEM) 21636*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525328kB [7664738.445792] Node 2 Normal: 27435*4kB (UEM) 40216*8kB (UEM) 896*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525196kB [7664738.461190] Node 3 Normal: 131148*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524592kB [7664738.473629] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664738.482497] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664738.491104] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664738.499979] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664738.508596] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664738.517468] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664738.526074] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664738.534939] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664738.543544] 74846 total pagecache pages [7664738.547559] 0 pages in swap cache [7664738.551052] Swap cache stats: add 21120764, delete 21136736, find 4513450/7609971 [7664738.558703] Free swap = 3114960kB [7664738.562280] Total swap = 4194300kB [7664738.565863] 66993253 pages RAM [7664738.569095] 0 pages HighMem/MovableOnly [7664738.573107] 1101945 pages reserved [7664738.584290] ll_ost_io00_058 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664738.592733] ll_ost_io00_058 cpuset=/ mems_allowed=0 [7664738.597793] CPU: 0 PID: 3176 Comm: ll_ost_io00_058 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664738.610998] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664738.618824] Call Trace: [7664738.621458] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664738.626773] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664738.632259] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664738.638094] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664738.643849] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664738.649863] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664738.656223] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664738.662324] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664738.668079] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664738.674607] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664738.681142] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664738.687328] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664738.693333] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664738.699441] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664738.706326] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664738.712473] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664738.719568] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664738.726129] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664738.732766] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664738.740116] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664738.746853] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664738.754201] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664738.761716] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664738.768637] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664738.775723] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664738.783474] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664738.790733] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664738.798602] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664738.805606] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664738.813561] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664738.820822] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664738.827297] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664738.834872] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664738.839933] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664738.846210] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664738.852827] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664738.859093] Mem-Info: [7664738.861559] active_anon:0 inactive_anon:4 isolated_anon:0 active_file:33767 inactive_file:35484 isolated_file:3424 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296628 mapped:1587 shmem:0 pagetables:903 bounce:0 free:590183 free_pcp:0 free_cma:0 [7664738.895746] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664738.937506] lowmem_reserve[]: 0 1418 63868 63868 [7664738.942433] Node 0 DMA32 free:261324kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1256kB inactive_file:3552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:70892 all_unreclaimable? yes [7664738.987216] lowmem_reserve[]: 0 0 62450 62450 [7664738.991880] Node 0 Normal free:508512kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:48564kB inactive_file:41540kB unevictable:168kB isolated(anon):0kB isolated(file):6016kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243624kB kernel_stack:6096kB pagetables:984kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:372493 all_unreclaimable? yes [7664739.038662] lowmem_reserve[]: 0 0 0 0 [7664739.042628] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664739.057468] Node 0 DMA32: 305*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261324kB [7664739.073958] Node 0 Normal: 6274*4kB (UEM) 5756*8kB (UEM) 3922*16kB (UEM) 4489*32kB (UEM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508264kB [7664739.090801] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664739.099666] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664739.108272] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664739.117138] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664739.125745] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664739.134612] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664739.143224] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664739.152091] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664739.160698] 74841 total pagecache pages [7664739.164710] 0 pages in swap cache [7664739.168203] Swap cache stats: add 21120768, delete 21136740, find 4513451/7609973 [7664739.175854] Free swap = 3115472kB [7664739.179434] Total swap = 4194300kB [7664739.183015] 66993253 pages RAM [7664739.186246] 0 pages HighMem/MovableOnly [7664739.190258] 1101945 pages reserved [7664739.193838] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664739.201886] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664739.210846] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664739.219640] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664739.228234] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664739.236414] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664739.245027] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664739.253296] [53101] 0 53101 1910 64 9 172 0 mdadm [7664739.261391] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664739.269919] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664739.278885] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664739.287234] [53139] 997 53139 29446 250 28 128 0 chronyd [7664739.295496] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664739.304370] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664739.312379] [53969] 0 53969 31572 204 20 169 0 crond [7664739.320470] [54035] 0 54035 27526 164 10 33 0 agetty [7664739.328647] [54036] 0 54036 27526 158 11 33 0 agetty [7664739.336825] [54186] 0 54186 22934 209 46 274 0 master [7664739.345112] [36317] 0 36317 28294 187 14 61 0 bash [7664739.353117] [36328] 0 36328 154746 223 201 98 0 journalctl [7664739.361639] [36329] 0 36329 28177 160 14 55 0 grep [7664739.369768] Out of memory: Kill process 54186 (master) score 0 or sacrifice child [7664739.377427] Killed process 54186 (master) total-vm:91736kB, anon-rss:0kB, file-rss:836kB, shmem-rss:0kB [7664739.610832] master: page allocation failure: order:0, mode:0x200da [7664739.617198] CPU: 44 PID: 54186 Comm: master Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664739.629795] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664739.637622] Call Trace: [7664739.640255] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664739.645572] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664739.651673] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664739.657254] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664739.663268] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664739.669796] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664739.676329] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664739.682169] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664739.688782] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664739.695049] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664739.700967] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664739.706974] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664739.712893] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664739.718812] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664739.724385] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664739.729700] [<ffffffffa0037c36>] ? save_xstate_sig+0x166/0x1f0 [7664739.735800] [<ffffffffa0037c23>] ? save_xstate_sig+0x153/0x1f0 [7664739.741891] [<ffffffffa01ed3ad>] ? handle_mm_fault+0x39d/0x9b0 [7664739.747985] [<ffffffffa002b949>] do_signal+0x479/0x6f0 [7664739.753383] [<ffffffffa0772628>] ? __do_page_fault+0x228/0x4f0 [7664739.759475] [<ffffffffa002bc32>] do_notify_resume+0x72/0xc0 [7664739.765309] [<ffffffffa076e56c>] retint_signal+0x48/0x8c [7664739.770881] Mem-Info: [7664739.773358] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34293 inactive_file:37546 isolated_file:2368 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296628 mapped:1587 shmem:0 pagetables:854 bounce:0 free:590151 free_pcp:0 free_cma:0 [7664739.807542] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664739.849280] lowmem_reserve[]: 0 1418 63868 63868 [7664739.854205] Node 0 DMA32 free:261324kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1256kB inactive_file:3552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:4kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:71468 all_unreclaimable? yes [7664739.898983] lowmem_reserve[]: 0 0 62450 62450 [7664739.903648] Node 0 Normal free:508264kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:43792kB inactive_file:43952kB unevictable:168kB isolated(anon):0kB isolated(file):6016kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610912kB slab_unreclaimable:60243624kB kernel_stack:6352kB pagetables:788kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1218480 all_unreclaimable? no [7664739.950423] lowmem_reserve[]: 0 0 0 0 [7664739.954392] Node 1 Normal free:525328kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16676kB inactive_file:16860kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411340kB kernel_stack:20816kB pagetables:1524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:382715 all_unreclaimable? yes [7664740.001254] lowmem_reserve[]: 0 0 0 0 [7664740.005222] Node 2 Normal free:525196kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31244kB inactive_file:41008kB unevictable:8680kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476060kB kernel_stack:7760kB pagetables:332kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:322787 all_unreclaimable? yes [7664740.051998] lowmem_reserve[]: 0 0 0 0 [7664740.055970] Node 3 Normal free:524588kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:43948kB inactive_file:38428kB unevictable:840kB isolated(anon):0kB isolated(file):5120kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854272kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:768kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1881692 all_unreclaimable? yes [7664740.102838] lowmem_reserve[]: 0 0 0 0 [7664740.106805] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664740.121643] Node 0 DMA32: 305*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261324kB [7664740.138134] Node 0 Normal: 6322*4kB (UEM) 5759*8kB (UEM) 3942*16kB (UEM) 4489*32kB (UEM) 2041*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508800kB [7664740.154975] Node 1 Normal: 88056*4kB (UEM) 21636*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525328kB [7664740.168503] Node 2 Normal: 27435*4kB (UEM) 40216*8kB (UEM) 896*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525196kB [7664740.183904] Node 3 Normal: 131148*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524592kB [7664740.196342] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.205211] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.213823] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.222690] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.231300] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.240171] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.248778] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.257650] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.266256] 74854 total pagecache pages [7664740.270272] 0 pages in swap cache [7664740.273761] Swap cache stats: add 21120768, delete 21136740, find 4513451/7609973 [7664740.281415] Free swap = 3115472kB [7664740.284994] Total swap = 4194300kB [7664740.288575] 66993253 pages RAM [7664740.291806] 0 pages HighMem/MovableOnly [7664740.295818] 1101945 pages reserved [7664740.356458] ll_ost_io01_033 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664740.364897] ll_ost_io01_033 cpuset=/ mems_allowed=1 [7664740.369959] CPU: 25 PID: 123087 Comm: ll_ost_io01_033 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664740.383426] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664740.391257] Call Trace: [7664740.393894] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664740.399208] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664740.404703] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664740.410547] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664740.416301] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664740.422312] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664740.428664] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664740.434760] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664740.440515] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664740.447048] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664740.453583] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664740.459761] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664740.465766] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664740.471873] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664740.478751] [<ffffffffa076d7a0>] ? _raw_spin_lock+0x20/0x30 [7664740.484589] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664740.491817] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664740.497984] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664740.504642] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664740.511729] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664740.518385] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664740.525440] [<ffffffffa00dca58>] ? __enqueue_entity+0x78/0x80 [7664740.531453] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664740.536981] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664740.544069] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664740.551820] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664740.559074] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664740.566937] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664740.573941] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664740.581892] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664740.589149] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664740.595634] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664740.603208] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664740.608267] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664740.614534] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664740.621153] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664740.627419] Mem-Info: [7664740.629877] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33694 inactive_file:35804 isolated_file:4416 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296628 mapped:1587 shmem:0 pagetables:854 bounce:0 free:590255 free_pcp:0 free_cma:0 [7664740.664060] Node 1 Normal free:525328kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16676kB inactive_file:16316kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411340kB kernel_stack:20816kB pagetables:1524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:135299 all_unreclaimable? yes [7664740.710924] lowmem_reserve[]: 0 0 0 0 [7664740.714891] Node 1 Normal: 88058*4kB (UEM) 21636*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525336kB [7664740.728425] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.737298] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.745910] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.754776] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.763384] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.772248] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.780856] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664740.789722] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664740.798329] 74854 total pagecache pages [7664740.802342] 0 pages in swap cache [7664740.805832] Swap cache stats: add 21120776, delete 21136748, find 4513454/7609978 [7664740.813486] Free swap = 3116496kB [7664740.817063] Total swap = 4194300kB [7664740.820646] 66993253 pages RAM [7664740.823876] 0 pages HighMem/MovableOnly [7664740.827889] 1101945 pages reserved [7664740.831468] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664740.839519] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664740.848475] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664740.857262] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664740.865840] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664740.874017] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664740.882633] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664740.890899] [53101] 0 53101 1910 64 9 172 0 mdadm [7664740.898987] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664740.907513] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664740.916467] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664740.924820] [53139] 997 53139 29446 250 28 128 0 chronyd [7664740.933082] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664740.941954] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664740.949956] [53969] 0 53969 31572 204 20 169 0 crond [7664740.958048] [54035] 0 54035 27526 164 10 33 0 agetty [7664740.966222] [54036] 0 54036 27526 158 11 33 0 agetty [7664740.974486] [36317] 0 36317 28294 187 14 61 0 bash [7664740.982488] [36328] 0 36328 154746 223 201 98 0 journalctl [7664740.991007] [36329] 0 36329 28177 160 14 55 0 grep [7664740.999132] Out of memory: Kill process 36328 (journalctl) score 0 or sacrifice child [7664741.007137] Killed process 36328 (journalctl) total-vm:618984kB, anon-rss:0kB, file-rss:892kB, shmem-rss:0kB [7664741.188673] ll_ost_io02_052 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [7664741.196863] ll_ost_io02_052 cpuset=/ mems_allowed=2 [7664741.201933] CPU: 34 PID: 6885 Comm: ll_ost_io02_052 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664741.215226] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664741.223058] Call Trace: [7664741.225701] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664741.231026] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664741.236518] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664741.242358] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664741.248110] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664741.254127] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664741.260485] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664741.266576] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664741.272324] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664741.278852] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664741.285387] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664741.291646] [<ffffffffc124293f>] tgt_checksum_niobuf_rw+0xbf/0xe00 [ptlrpc] [7664741.298903] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664741.306236] [<ffffffffc0cb71e0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [7664741.313084] [<ffffffffc1247325>] tgt_brw_read+0xc35/0x1e50 [ptlrpc] [7664741.319637] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664741.326990] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664741.334334] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664741.341851] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664741.348776] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664741.355868] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664741.363622] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664741.370883] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664741.378758] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664741.385725] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664741.391164] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664741.397644] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664741.405212] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664741.410274] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664741.416550] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664741.423168] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664741.429435] Mem-Info: [7664741.431894] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33739 inactive_file:35965 isolated_file:4284 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824056 slab_unreclaimable:62296622 mapped:1587 shmem:0 pagetables:607 bounce:0 free:590432 free_pcp:0 free_cma:0 [7664741.466081] Node 2 Normal free:525284kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31244kB inactive_file:39600kB unevictable:8680kB isolated(anon):0kB isolated(file):1280kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476044kB kernel_stack:7760kB pagetables:260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:208183 all_unreclaimable? yes [7664741.513123] lowmem_reserve[]: 0 0 0 0 [7664741.517097] Node 2 Normal: 27445*4kB (UEM) 40220*8kB (UEM) 897*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525284kB [7664741.532499] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664741.541366] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664741.549972] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664741.558838] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664741.567447] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664741.576319] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664741.584924] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664741.593789] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664741.602395] 74986 total pagecache pages [7664741.606412] 0 pages in swap cache [7664741.609912] Swap cache stats: add 21120779, delete 21136751, find 4513455/7609983 [7664741.617562] Free swap = 3116752kB [7664741.621142] Total swap = 4194300kB [7664741.624724] 66993253 pages RAM [7664741.627952] 0 pages HighMem/MovableOnly [7664741.631966] 1101945 pages reserved [7664741.635546] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664741.643594] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664741.652553] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664741.661348] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664741.669941] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664741.678123] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664741.686735] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664741.695005] [53101] 0 53101 1910 64 9 172 0 mdadm [7664741.703101] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664741.711627] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664741.720586] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664741.728933] [53139] 997 53139 29446 250 28 128 0 chronyd [7664741.737202] [53180] 0 53180 6704 219 18 222 0 systemd-logind [7664741.746078] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664741.754088] [53969] 0 53969 31572 204 20 169 0 crond [7664741.762179] [54035] 0 54035 27526 164 10 33 0 agetty [7664741.770360] [54036] 0 54036 27526 158 11 33 0 agetty [7664741.778659] [36317] 0 36317 28294 187 14 61 0 bash [7664741.786662] [36329] 0 36329 28177 160 14 55 0 grep [7664741.794783] Out of memory: Kill process 53180 (systemd-logind) score 0 or sacrifice child [7664741.803137] Killed process 53180 (systemd-logind) total-vm:26816kB, anon-rss:0kB, file-rss:876kB, shmem-rss:0kB [7664741.817391] systemd-logind: page allocation failure: order:0, mode:0x200da [7664741.824446] CPU: 27 PID: 53180 Comm: systemd-logind Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664741.837738] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664741.845570] Call Trace: [7664741.848199] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664741.853524] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664741.859619] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664741.865195] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664741.871206] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664741.877732] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664741.884261] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664741.890100] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664741.896712] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664741.902977] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664741.908898] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664741.914904] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664741.920824] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664741.926741] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664741.929616] LustreError: 90696:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c14ee558c00 [7664741.943272] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664741.948594] [<ffffffffa028f7a1>] ? ep_send_events_proc+0x101/0x1b0 [7664741.955040] [<ffffffffa076a17d>] ? schedule_hrtimeout_range_clock+0x12d/0x150 [7664741.962436] [<ffffffffa028f6a0>] ? ep_ptable_queue_proc+0xb0/0xb0 [7664741.968794] [<ffffffffa028fe1a>] ep_scan_ready_list.isra.7+0x9a/0x1f0 [7664741.975496] [<ffffffffa02900b3>] ep_poll+0x123/0x360 [7664741.980729] [<ffffffffa00d7c40>] ? wake_up_state+0x20/0x20 [7664741.986476] [<ffffffffa029169d>] SyS_epoll_wait+0xed/0x120 [7664741.992229] [<ffffffffa0777ddb>] system_call_fastpath+0x22/0x27 [7664741.998407] Mem-Info: [7664742.000885] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33879 inactive_file:38131 isolated_file:2848 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824058 slab_unreclaimable:62296622 mapped:1587 shmem:0 pagetables:607 bounce:0 free:590294 free_pcp:0 free_cma:0 [7664742.035073] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664742.076829] lowmem_reserve[]: 0 1418 63868 63868 [7664742.081756] Node 0 DMA32 free:261328kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1256kB inactive_file:3552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:21814 all_unreclaimable? yes [7664742.126538] lowmem_reserve[]: 0 0 62450 62450 [7664742.131201] Node 0 Normal free:508776kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:43564kB inactive_file:47148kB unevictable:168kB isolated(anon):0kB isolated(file):5504kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610920kB slab_unreclaimable:60243624kB kernel_stack:5856kB pagetables:492kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:543077 all_unreclaimable? yes [7664742.177976] lowmem_reserve[]: 0 0 0 0 [7664742.181946] Node 1 Normal free:524952kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17052kB inactive_file:16828kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411332kB kernel_stack:20816kB pagetables:1212kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:181939 all_unreclaimable? yes [7664742.228810] lowmem_reserve[]: 0 0 0 0 [7664742.232796] Node 2 Normal free:525280kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31592kB inactive_file:34256kB unevictable:8680kB isolated(anon):0kB isolated(file):6784kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715248kB slab_unreclaimable:62476044kB kernel_stack:7760kB pagetables:260kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:416034 all_unreclaimable? yes [7664742.279828] lowmem_reserve[]: 0 0 0 0 [7664742.283799] Node 3 Normal free:524964kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:40208kB inactive_file:41396kB unevictable:840kB isolated(anon):0kB isolated(file):4864kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854272kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:464kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:112249 all_unreclaimable? no [7664742.330493] lowmem_reserve[]: 0 0 0 0 [7664742.334459] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664742.349296] Node 0 DMA32: 306*4kB (UEM) 413*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261328kB [7664742.365792] Node 0 Normal: 6278*4kB (EM) 5759*8kB (UEM) 3950*16kB (UEM) 4490*32kB (UEM) 2042*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508848kB [7664742.382544] Node 1 Normal: 87986*4kB (UEM) 21562*8kB (UM) 4*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524504kB [7664742.396074] Node 2 Normal: 27443*4kB (UEM) 40221*8kB (UEM) 897*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525284kB [7664742.411473] Node 3 Normal: 131235*4kB (UEM) 6*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524988kB [7664742.424284] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664742.433148] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664742.441757] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664742.450625] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664742.459238] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664742.468104] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664742.476711] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664742.485577] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664742.494181] 75195 total pagecache pages [7664742.498193] 0 pages in swap cache [7664742.501686] Swap cache stats: add 21120779, delete 21136751, find 4513455/7609983 [7664742.509339] Free swap = 3116752kB [7664742.512917] Total swap = 4194300kB [7664742.516499] 66993253 pages RAM [7664742.519728] 0 pages HighMem/MovableOnly [7664742.523742] 1101945 pages reserved [7664743.092356] ll_ost_io00_067 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664743.100797] ll_ost_io00_067 cpuset=/ mems_allowed=0 [7664743.105859] CPU: 24 PID: 96095 Comm: ll_ost_io00_067 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664743.119237] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664743.127065] Call Trace: [7664743.129697] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664743.135015] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664743.140509] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664743.146350] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664743.152105] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664743.158118] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664743.164473] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664743.170573] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664743.176320] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664743.182855] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664743.189395] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664743.195579] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664743.201594] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664743.207706] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664743.214591] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664743.221817] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664743.227962] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664743.234577] [<ffffffffa021ccc1>] ? __slab_free+0x81/0x2f0 [7664743.240235] [<ffffffffa021ccc1>] ? __slab_free+0x81/0x2f0 [7664743.245897] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664743.251650] [<ffffffffa00ddd9e>] ? account_entity_dequeue+0xae/0xd0 [7664743.258176] [<ffffffffa00e192c>] ? dequeue_entity+0x11c/0x5e0 [7664743.264183] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664743.269704] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664743.276783] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664743.284529] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664743.291784] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664743.299646] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664743.306607] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664743.312053] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664743.318529] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664743.326099] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664743.331151] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664743.337418] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664743.344039] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664743.350312] Mem-Info: [7664743.352779] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35180 inactive_file:37446 isolated_file:1152 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824059 slab_unreclaimable:62296617 mapped:1587 shmem:0 pagetables:589 bounce:0 free:590070 free_pcp:0 free_cma:0 [7664743.386960] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664743.428711] lowmem_reserve[]: 0 1418 63868 63868 [7664743.433634] Node 0 DMA32 free:261328kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1256kB inactive_file:3552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:30550 all_unreclaimable? yes [7664743.478419] lowmem_reserve[]: 0 0 62450 62450 [7664743.483087] Node 0 Normal free:508020kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:48472kB inactive_file:47780kB unevictable:168kB isolated(anon):0kB isolated(file):256kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610920kB slab_unreclaimable:60243616kB kernel_stack:6160kB pagetables:424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1505522 all_unreclaimable? yes [7664743.529862] lowmem_reserve[]: 0 0 0 0 [7664743.533829] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664743.548667] Node 0 DMA32: 306*4kB (UEM) 414*8kB (UEM) 1216*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261336kB [7664743.565160] Node 0 Normal: 6218*4kB (UEM) 5747*8kB (UEM) 3920*16kB (UEM) 4490*32kB (UEM) 2042*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508032kB [7664743.582001] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664743.590869] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664743.599474] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664743.608340] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664743.616947] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664743.625822] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664743.634434] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664743.643302] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664743.651907] 75346 total pagecache pages [7664743.655920] 0 pages in swap cache [7664743.659412] Swap cache stats: add 21120783, delete 21136755, find 4513455/7609984 [7664743.667064] Free swap = 3117520kB [7664743.670643] Total swap = 4194300kB [7664743.674223] 66993253 pages RAM [7664743.677456] 0 pages HighMem/MovableOnly [7664743.681469] 1101945 pages reserved [7664743.685049] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664743.693092] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664743.702045] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664743.710833] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664743.719420] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664743.727597] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664743.736212] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664743.744480] [53101] 0 53101 1910 64 9 172 0 mdadm [7664743.752575] [53106] 0 53106 5514 188 15 221 0 irqbalance [7664743.761101] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664743.770052] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664743.778399] [53139] 997 53139 29446 250 28 128 0 chronyd [7664743.786664] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664743.794667] [53969] 0 53969 31572 204 20 169 0 crond [7664743.802755] [54035] 0 54035 27526 164 10 33 0 agetty [7664743.810934] [54036] 0 54036 27526 158 11 33 0 agetty [7664743.819220] [36317] 0 36317 28294 187 14 61 0 bash [7664743.827222] [36329] 0 36329 28177 160 14 55 0 grep [7664743.835345] Out of memory: Kill process 53106 (irqbalance) score 0 or sacrifice child [7664743.843349] Killed process 53106 (irqbalance) total-vm:22056kB, anon-rss:0kB, file-rss:752kB, shmem-rss:0kB [7664743.938893] irqbalance: page allocation failure: order:0, mode:0x200da [7664743.945604] CPU: 20 PID: 53106 Comm: irqbalance Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664743.958549] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664743.966383] Call Trace: [7664743.969016] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664743.974334] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664743.980432] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664743.986014] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664743.992028] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664743.998567] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664744.005098] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664744.010942] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664744.017559] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664744.023829] [<ffffffffa0200cd2>] swapin_readahead+0xe2/0x110 [7664744.029753] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664744.035761] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664744.041687] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664744.047608] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664744.053188] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664744.058499] Mem-Info: [7664744.060974] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34697 inactive_file:36940 isolated_file:1934 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824059 slab_unreclaimable:62296617 mapped:1587 shmem:0 pagetables:589 bounce:0 free:590053 free_pcp:0 free_cma:0 [7664744.095157] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664744.136908] lowmem_reserve[]: 0 1418 63868 63868 [7664744.141832] Node 0 DMA32 free:261316kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1148kB inactive_file:1284kB unevictable:0kB isolated(anon):0kB isolated(file):1332kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686208kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:766438 all_unreclaimable? yes [7664744.186962] lowmem_reserve[]: 0 0 62450 62450 [7664744.191631] Node 0 Normal free:508032kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:47368kB inactive_file:46768kB unevictable:168kB isolated(anon):0kB isolated(file):896kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610920kB slab_unreclaimable:60243616kB kernel_stack:6096kB pagetables:424kB unstable:0kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:320428 all_unreclaimable? yes [7664744.238321] lowmem_reserve[]: 0 0 0 0 [7664744.242297] Node 1 Normal free:524680kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:15772kB inactive_file:17576kB unevictable:26488kB isolated(anon):0kB isolated(file):1408kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:1212kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:752627 all_unreclaimable? yes [7664744.289420] lowmem_reserve[]: 0 0 0 0 [7664744.293396] Node 2 Normal free:525300kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31240kB inactive_file:39344kB unevictable:8680kB isolated(anon):0kB isolated(file):1536kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715248kB slab_unreclaimable:62476028kB kernel_stack:7760kB pagetables:260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:385766 all_unreclaimable? yes [7664744.340434] lowmem_reserve[]: 0 0 0 0 [7664744.344409] Node 3 Normal free:525128kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:44156kB inactive_file:43388kB unevictable:840kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854272kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:460kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:596549 all_unreclaimable? yes [7664744.390925] lowmem_reserve[]: 0 0 0 0 [7664744.394900] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664744.409738] Node 0 DMA32: 465*4kB (UEM) 401*8kB (EM) 1212*16kB (UEM) 3687*32kB (UEM) 1489*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261804kB [7664744.426143] Node 0 Normal: 6224*4kB (UEM) 5748*8kB (UEM) 3911*16kB (UEM) 4490*32kB (UEM) 2042*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 507920kB [7664744.442983] Node 1 Normal: 88031*4kB (UEM) 21563*8kB (UM) 4*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524692kB [7664744.456513] Node 2 Normal: 27443*4kB (UEM) 40221*8kB (UEM) 898*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525300kB [7664744.471914] Node 3 Normal: 131268*4kB (UEM) 7*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525128kB [7664744.484724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664744.493591] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664744.502206] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664744.511070] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664744.519679] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664744.528552] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664744.537159] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664744.546023] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664744.554630] 75187 total pagecache pages [7664744.558644] 0 pages in swap cache [7664744.562137] Swap cache stats: add 21120784, delete 21136756, find 4513455/7609984 [7664744.569787] Free swap = 3117520kB [7664744.573368] Total swap = 4194300kB [7664744.576949] 66993253 pages RAM [7664744.580186] 0 pages HighMem/MovableOnly [7664744.584201] 1101945 pages reserved [7664744.670646] ll_ost_io03_053 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664744.679092] ll_ost_io03_053 cpuset=/ mems_allowed=3 [7664744.684148] CPU: 31 PID: 7277 Comm: ll_ost_io03_053 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664744.697444] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664744.705274] Call Trace: [7664744.707911] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664744.713224] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664744.718712] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664744.724551] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664744.730297] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664744.736303] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664744.742655] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664744.748749] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664744.754497] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664744.761031] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664744.767568] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664744.773753] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664744.779759] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664744.785866] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664744.792752] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664744.799982] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664744.806146] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664744.812797] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664744.819845] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664744.825598] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664744.832036] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664744.838909] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664744.844440] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664744.851525] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664744.859280] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664744.866543] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664744.874412] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664744.881411] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664744.889360] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664744.896621] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664744.903096] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664744.910663] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664744.915725] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664744.921998] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664744.924369] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664744.924379] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664744.924388] LustreError: 3033:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check [7664744.924391] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664744.924396] LustreError: 3033:0:(sec.c:2191:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.49.20.19@o2ib1 x1659023349352448 [7664744.924399] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664744.994212] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664745.000484] Mem-Info: [7664745.002942] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33705 inactive_file:36695 isolated_file:2944 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824059 slab_unreclaimable:62296617 mapped:1587 shmem:0 pagetables:574 bounce:0 free:590210 free_pcp:0 free_cma:0 [7664745.037135] Node 3 Normal free:525128kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41852kB inactive_file:42144kB unevictable:840kB isolated(anon):0kB isolated(file):512kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854272kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:460kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:302072 all_unreclaimable? yes [7664745.083823] lowmem_reserve[]: 0 0 0 0 [7664745.087790] Node 3 Normal: 131274*4kB (UEM) 7*8kB (U) 5*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525232kB [7664745.100976] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.109850] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.118456] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.127322] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.135926] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.144794] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.153399] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.162264] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.170871] 75179 total pagecache pages [7664745.174885] 0 pages in swap cache [7664745.178377] Swap cache stats: add 21120785, delete 21136757, find 4513455/7609985 [7664745.186030] Free swap = 3118288kB [7664745.189609] Total swap = 4194300kB [7664745.193191] 66993253 pages RAM [7664745.196421] 0 pages HighMem/MovableOnly [7664745.200432] 1101945 pages reserved [7664745.204012] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664745.212060] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664745.221020] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664745.229817] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664745.238420] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664745.246596] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664745.255201] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664745.256579] LustreError: 107084:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c39d93b1e00 [7664745.268215] LustreError: 6896:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c41ea2f7200 [7664745.277951] LustreError: 27189:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c2126afb200 [7664745.296302] [53101] 0 53101 1910 64 9 172 0 mdadm [7664745.304394] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664745.313348] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664745.321702] [53139] 997 53139 29446 250 28 128 0 chronyd [7664745.329976] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664745.337979] [53969] 0 53969 31572 204 20 169 0 crond [7664745.346072] [54035] 0 54035 27526 164 10 33 0 agetty [7664745.354244] [54036] 0 54036 27526 158 11 33 0 agetty [7664745.362540] [36317] 0 36317 28294 187 14 61 0 bash [7664745.370547] [36329] 0 36329 28177 160 14 55 0 grep [7664745.378674] Out of memory: Kill process 53139 (chronyd) score 0 or sacrifice child [7664745.386415] Killed process 53139 (chronyd) total-vm:117784kB, anon-rss:0kB, file-rss:1000kB, shmem-rss:0kB [7664745.533835] ll_ost_io03_093 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664745.542270] ll_ost_io03_093 cpuset=/ mems_allowed=3 [7664745.547324] CPU: 19 PID: 8717 Comm: ll_ost_io03_093 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664745.560619] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664745.568445] Call Trace: [7664745.571083] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664745.576402] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664745.581899] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664745.587736] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664745.593486] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664745.599500] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664745.605252] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664745.611779] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664745.618312] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664745.624491] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664745.630498] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664745.636604] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664745.643489] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664745.649645] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664745.656744] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664745.663307] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664745.669953] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664745.677299] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664745.684038] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664745.691386] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664745.698908] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664745.705829] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664745.712925] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664745.720676] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664745.724600] LustreError: 8712:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c5075370e00 [7664745.738793] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664745.746661] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664745.753662] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664745.761610] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664745.768869] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664745.775345] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664745.782914] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664745.787974] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664745.794240] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664745.800853] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664745.807116] Mem-Info: [7664745.809578] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33054 inactive_file:36598 isolated_file:2939 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824050 slab_unreclaimable:62296614 mapped:1587 shmem:0 pagetables:574 bounce:0 free:590189 free_pcp:0 free_cma:0 [7664745.843767] Node 3 Normal free:525268kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41248kB inactive_file:46336kB unevictable:840kB isolated(anon):0kB isolated(file):896kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854244kB slab_unreclaimable:62369264kB kernel_stack:4208kB pagetables:460kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:235339 all_unreclaimable? yes [7664745.890456] lowmem_reserve[]: 0 0 0 0 [7664745.894424] Node 3 Normal: 131354*4kB (UEM) 7*8kB (U) 5*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525552kB [7664745.907610] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.916474] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.925080] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.933946] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.942553] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.951419] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.960024] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664745.968892] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664745.977499] 75199 total pagecache pages [7664745.981509] 0 pages in swap cache [7664745.985001] Swap cache stats: add 21120788, delete 21136760, find 4513458/7609990 [7664745.992655] Free swap = 3118800kB [7664745.996233] Total swap = 4194300kB [7664745.999816] 66993253 pages RAM [7664746.003044] 0 pages HighMem/MovableOnly [7664746.007057] 1101945 pages reserved [7664746.010636] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664746.018685] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664746.027646] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664746.036437] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664746.045032] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664746.053213] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664746.061827] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664746.070094] [53101] 0 53101 1910 64 9 172 0 mdadm [7664746.078185] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664746.087142] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664746.095493] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664746.103499] [53969] 0 53969 31572 204 20 169 0 crond [7664746.108102] Lustre: fir-OST001b: Bulk IO read error with fb2c1382-8f5a-4 (at 10.50.15.10@o2ib2), client will retry: rc -110 [7664746.108104] Lustre: Skipped 10 previous similar messages [7664746.128371] [54035] 0 54035 27526 164 10 33 0 agetty [7664746.136550] [54036] 0 54036 27526 158 11 33 0 agetty [7664746.144846] [36317] 0 36317 28294 187 14 61 0 bash [7664746.152845] [36329] 0 36329 28177 160 14 55 0 grep [7664746.160974] Out of memory: Kill process 53969 (crond) score 0 or sacrifice child [7664746.168540] Killed process 53969 (crond) total-vm:126288kB, anon-rss:0kB, file-rss:816kB, shmem-rss:0kB [7664747.313604] ll_ost_io02_075 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664747.322052] ll_ost_io02_075 cpuset=/ mems_allowed=2 [7664747.327113] CPU: 6 PID: 83185 Comm: ll_ost_io02_075 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664747.340402] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664747.348238] Call Trace: [7664747.350881] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664747.356203] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664747.361697] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664747.367542] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664747.373295] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664747.379310] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664747.385673] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664747.391772] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664747.397530] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664747.404065] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664747.410599] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664747.416785] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664747.422800] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664747.428919] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664747.435804] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664747.443035] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664747.449207] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664747.455859] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664747.462958] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664747.469586] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664747.475338] [<ffffffffa00dca58>] ? __enqueue_entity+0x78/0x80 [7664747.481350] [<ffffffffa00e367f>] ? enqueue_entity+0x2ef/0xbe0 [7664747.487366] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664747.492901] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664747.499992] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664747.507747] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664747.515008] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664747.522875] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664747.529842] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664747.535291] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664747.541769] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664747.549346] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664747.554405] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664747.560680] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664747.567300] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664747.573576] Mem-Info: [7664747.576043] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:33450 inactive_file:37198 isolated_file:3040 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824058 slab_unreclaimable:62296629 mapped:1587 shmem:0 pagetables:554 bounce:0 free:590294 free_pcp:0 free_cma:0 [7664747.610237] Node 2 Normal free:525308kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31240kB inactive_file:39464kB unevictable:8680kB isolated(anon):0kB isolated(file):1408kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476028kB kernel_stack:7760kB pagetables:260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:552020 all_unreclaimable? yes [7664747.657278] lowmem_reserve[]: 0 0 0 0 [7664747.661250] Node 2 Normal: 27465*4kB (UEM) 40220*8kB (UEM) 899*16kB (UEM) 1669*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525396kB [7664747.676672] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664747.685541] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664747.694155] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664747.703030] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664747.711643] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664747.720510] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664747.729128] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664747.737997] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664747.746608] 75248 total pagecache pages [7664747.750627] 0 pages in swap cache [7664747.754130] Swap cache stats: add 21120793, delete 21136765, find 4513458/7609991 [7664747.761788] Free swap = 3119312kB [7664747.765369] Total swap = 4194300kB [7664747.768952] 66993253 pages RAM [7664747.772190] 0 pages HighMem/MovableOnly [7664747.776210] 1101945 pages reserved [7664747.779790] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664747.787844] [ 5686] 0 5686 16012 235 39 106 0 systemd-journal [7664747.796806] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664747.805603] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664747.814200] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664747.822384] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664747.830993] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664747.839257] [53101] 0 53101 1910 64 9 172 0 mdadm [7664747.847349] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664747.856313] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664747.864673] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664747.872685] [54035] 0 54035 27526 164 10 33 0 agetty [7664747.880866] [54036] 0 54036 27526 158 11 33 0 agetty [7664747.889157] [36317] 0 36317 28294 187 14 61 0 bash [7664747.897165] [36329] 0 36329 28177 160 14 55 0 grep [7664747.905283] Out of memory: Kill process 5686 (systemd-journal) score 0 or sacrifice child [7664747.913630] Killed process 5686 (systemd-journal) total-vm:64048kB, anon-rss:0kB, file-rss:940kB, shmem-rss:0kB [7664748.370656] systemd-journal: page allocation failure: order:0, mode:0x200da [7664748.377799] CPU: 25 PID: 5686 Comm: systemd-journal Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664748.391093] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664748.398928] Call Trace: [7664748.401567] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664748.406886] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664748.412984] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664748.418557] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664748.424564] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664748.431100] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664748.437634] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664748.443476] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664748.450096] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664748.456360] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664748.462281] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664748.468290] [<ffffffffa028f800>] ? ep_send_events_proc+0x160/0x1b0 [7664748.474734] [<ffffffffa076a17d>] ? schedule_hrtimeout_range_clock+0x12d/0x150 [7664748.482127] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664748.488048] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664748.493967] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664748.499538] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664748.504852] Mem-Info: [7664748.507324] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34985 inactive_file:37342 isolated_file:3008 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824059 slab_unreclaimable:62296658 mapped:1587 shmem:0 pagetables:526 bounce:0 free:590176 free_pcp:0 free_cma:0 [7664748.541508] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664748.583261] lowmem_reserve[]: 0 1418 63868 63868 [7664748.588182] Node 0 DMA32 free:261344kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:772kB inactive_file:3472kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686220kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:14958 all_unreclaimable? yes [7664748.633050] lowmem_reserve[]: 0 0 62450 62450 [7664748.637712] Node 0 Normal free:508380kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:48844kB inactive_file:43540kB unevictable:168kB isolated(anon):0kB isolated(file):5376kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610952kB slab_unreclaimable:60243736kB kernel_stack:6128kB pagetables:424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:65243 all_unreclaimable? no [7664748.684316] lowmem_reserve[]: 0 0 0 0 [7664748.688292] Node 1 Normal free:524924kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:16916kB inactive_file:16808kB unevictable:26488kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411336kB kernel_stack:20816kB pagetables:1072kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:166087 all_unreclaimable? yes [7664748.735323] lowmem_reserve[]: 0 0 0 0 [7664748.739295] Node 2 Normal free:525368kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31240kB inactive_file:39456kB unevictable:8680kB isolated(anon):0kB isolated(file):1408kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476060kB kernel_stack:7760kB pagetables:196kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:555146 all_unreclaimable? yes [7664748.786332] lowmem_reserve[]: 0 0 0 0 [7664748.790299] Node 3 Normal free:524832kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:42564kB inactive_file:43740kB unevictable:840kB isolated(anon):0kB isolated(file):2944kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854244kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:412kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:736429 all_unreclaimable? yes [7664748.837075] lowmem_reserve[]: 0 0 0 0 [7664748.841042] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664748.855878] Node 0 DMA32: 399*4kB (UEM) 401*8kB (EM) 1212*16kB (UEM) 3689*32kB (UEM) 1491*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261732kB [7664748.872286] Node 0 Normal: 6283*4kB (UEM) 5750*8kB (UEM) 3945*16kB (UEM) 4487*32kB (UEM) 2042*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508620kB [7664748.889126] Node 1 Normal: 88086*4kB (UEM) 21564*8kB (UM) 4*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524920kB [7664748.902654] Node 2 Normal: 27465*4kB (UEM) 40221*8kB (UEM) 899*16kB (UEM) 1667*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525340kB [7664748.918056] Node 3 Normal: 131206*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524824kB [7664748.930493] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664748.939360] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664748.947966] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664748.956832] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664748.965446] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664748.974312] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664748.982918] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664748.991789] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664749.000400] 75404 total pagecache pages [7664749.004413] 0 pages in swap cache [7664749.007906] Swap cache stats: add 21120793, delete 21136765, find 4513458/7609991 [7664749.015559] Free swap = 3119312kB [7664749.019144] Total swap = 4194300kB [7664749.022726] 66993253 pages RAM [7664749.025956] 0 pages HighMem/MovableOnly [7664749.029972] 1101945 pages reserved [7664749.920145] ll_ost_io02_074 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664749.928587] ll_ost_io02_074 cpuset=/ mems_allowed=2 [7664749.933649] CPU: 22 PID: 83183 Comm: ll_ost_io02_074 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664749.947028] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664749.954855] Call Trace: [7664749.957487] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664749.962832] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664749.968344] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664749.974218] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664749.980260] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664749.986678] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664749.992790] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664749.998551] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664750.005080] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664750.011639] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664750.017819] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664750.023835] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664750.029967] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664750.036872] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664750.044131] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664750.050320] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664750.056999] [<ffffffffc0a844f5>] ? lnet_try_match_md+0x1e5/0x330 [lnet] [7664750.063895] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664750.069645] [<ffffffffa00dca58>] ? __enqueue_entity+0x78/0x80 [7664750.075652] [<ffffffffa00e367f>] ? enqueue_entity+0x2ef/0xbe0 [7664750.081659] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664750.087195] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664750.094284] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664750.102035] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664750.109316] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664750.117203] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664750.124217] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664750.132180] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664750.139461] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664750.145977] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664750.153569] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664750.158670] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664750.164949] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664750.171576] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664750.177849] Mem-Info: [7664750.180321] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35389 inactive_file:38087 isolated_file:1376 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824059 slab_unreclaimable:62296676 mapped:1587 shmem:0 pagetables:487 bounce:0 free:590376 free_pcp:0 free_cma:0 [7664750.214523] Node 2 Normal free:525364kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31240kB inactive_file:40108kB unevictable:8680kB isolated(anon):0kB isolated(file):768kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476092kB kernel_stack:7760kB pagetables:172kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:141218 all_unreclaimable? no [7664750.261398] lowmem_reserve[]: 0 0 0 0 [7664750.265365] Node 2 Normal: 27467*4kB (UEM) 40223*8kB (UEM) 899*16kB (UEM) 1667*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525364kB [7664750.280795] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664750.289663] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664750.298275] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664750.307144] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664750.315759] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664750.324631] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664750.333236] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664750.342101] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664750.350710] 75384 total pagecache pages [7664750.354725] 0 pages in swap cache [7664750.358224] Swap cache stats: add 21120794, delete 21136766, find 4513460/7609994 [7664750.365875] Free swap = 3119568kB [7664750.369453] Total swap = 4194300kB [7664750.373036] 66993253 pages RAM [7664750.376266] 0 pages HighMem/MovableOnly [7664750.380280] 1101945 pages reserved [7664750.383859] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664750.391909] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664750.400700] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664750.409290] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664750.417474] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664750.426087] [53084] 32 53084 17316 110 37 146 0 rpcbind [7664750.434355] [53101] 0 53101 1910 64 9 172 0 mdadm [7664750.442450] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664750.451411] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664750.459770] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664750.467777] [54035] 0 54035 27526 164 10 33 0 agetty [7664750.475955] [54036] 0 54036 27526 158 11 33 0 agetty [7664750.484252] [36317] 0 36317 28294 187 14 61 0 bash [7664750.492258] [36329] 0 36329 28177 160 14 55 0 grep [7664750.500382] Out of memory: Kill process 53084 (rpcbind) score 0 or sacrifice child [7664750.508127] Killed process 53084 (rpcbind) total-vm:69264kB, anon-rss:0kB, file-rss:440kB, shmem-rss:0kB [7664750.719150] rpcbind: page allocation failure: order:0, mode:0x200da [7664750.725594] CPU: 31 PID: 53084 Comm: rpcbind Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664750.738279] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664750.746105] Call Trace: [7664750.748736] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664750.754056] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664750.760154] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664750.765732] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664750.771743] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664750.778270] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664750.784805] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664750.790644] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664750.797257] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664750.803523] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664750.809444] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664750.815457] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664750.821377] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664750.827298] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664750.832878] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664750.838189] Mem-Info: [7664750.840665] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34819 inactive_file:37178 isolated_file:1602 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824059 slab_unreclaimable:62296677 mapped:1587 shmem:0 pagetables:487 bounce:0 free:590306 free_pcp:0 free_cma:0 [7664750.874845] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664750.916600] lowmem_reserve[]: 0 1418 63868 63868 [7664750.921529] Node 0 DMA32 free:261348kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:928kB inactive_file:3032kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686228kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:491959 all_unreclaimable? yes [7664750.966312] lowmem_reserve[]: 0 0 62450 62450 [7664750.970974] Node 0 Normal free:508604kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:44180kB inactive_file:43236kB unevictable:168kB isolated(anon):0kB isolated(file):9472kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610952kB slab_unreclaimable:60243768kB kernel_stack:6160kB pagetables:392kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:476544 all_unreclaimable? yes [7664751.017750] lowmem_reserve[]: 0 0 0 0 [7664751.021727] Node 1 Normal free:525092kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17312kB inactive_file:17260kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711308kB slab_unreclaimable:63411340kB kernel_stack:20816kB pagetables:980kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:53206 all_unreclaimable? yes [7664751.068416] lowmem_reserve[]: 0 0 0 0 [7664751.072391] Node 2 Normal free:525364kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31240kB inactive_file:40236kB unevictable:8680kB isolated(anon):0kB isolated(file):640kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476092kB kernel_stack:7760kB pagetables:172kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:839306 all_unreclaimable? yes [7664751.119330] lowmem_reserve[]: 0 0 0 0 [7664751.123302] Node 3 Normal free:524912kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:43584kB inactive_file:44356kB unevictable:840kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854244kB slab_unreclaimable:62369280kB kernel_stack:4208kB pagetables:404kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:481666 all_unreclaimable? yes [7664751.169816] lowmem_reserve[]: 0 0 0 0 [7664751.173783] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664751.197481] LustreError: 83171:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c363824f600 [7664751.210996] LustreError: 3017:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c105d52a200 [7664751.188620] Node 0 DMA32: 367*4kB (EM) 402*8kB (UEM) 1203*16kB (UEM) 3687*32kB (UEM) 1491*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261404kB [7664751.226841] Node 0 Normal: 5966*4kB (EM) 5703*8kB (UEM) 3932*16kB (UEM) 4486*32kB (UEM) 2042*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 506736kB [7664751.243593] Node 1 Normal: 88118*4kB (UEM) 21565*8kB (UM) 7*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525104kB [7664751.257123] Node 2 Normal: 27467*4kB (UEM) 40223*8kB (UEM) 899*16kB (UEM) 1667*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525364kB [7664751.272524] Node 3 Normal: 131213*4kB (UEM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524852kB [7664751.284961] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.293827] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.302435] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.311301] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.319905] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.328772] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.337377] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.346244] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.354849] 75568 total pagecache pages [7664751.358862] 0 pages in swap cache [7664751.362355] Swap cache stats: add 21120794, delete 21136766, find 4513460/7609994 [7664751.370008] Free swap = 3119568kB [7664751.373587] Total swap = 4194300kB [7664751.377167] 66993253 pages RAM [7664751.380398] 0 pages HighMem/MovableOnly [7664751.384412] 1101945 pages reserved [7664751.530657] ll_ost_io02_023 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664751.539118] ll_ost_io02_023 cpuset=/ mems_allowed=2 [7664751.544182] CPU: 6 PID: 123044 Comm: ll_ost_io02_023 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664751.557561] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664751.565399] Call Trace: [7664751.568039] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664751.573360] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664751.578855] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664751.584700] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664751.590452] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664751.596468] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664751.602828] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664751.608931] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664751.614685] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664751.621219] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664751.627753] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664751.633943] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664751.639957] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664751.646077] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664751.652969] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664751.659132] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664751.666234] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664751.672804] [<ffffffffc11966b2>] ? ldlm_res_hop_get_locked+0x12/0x20 [ptlrpc] [7664751.680217] [<ffffffffc0a13297>] ? cfs_hash_bd_lookup_intent+0xf7/0x170 [libcfs] [7664751.687913] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664751.695266] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664751.702015] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664751.709364] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664751.716886] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664751.723802] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664751.730895] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664751.738653] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664751.745914] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664751.753787] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664751.760787] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664751.768740] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664751.776000] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664751.782477] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664751.790047] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664751.795111] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664751.801390] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664751.808003] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664751.814280] Mem-Info: [7664751.816744] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35933 inactive_file:37401 isolated_file:1024 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824059 slab_unreclaimable:62296678 mapped:1587 shmem:0 pagetables:450 bounce:0 free:590142 free_pcp:0 free_cma:0 [7664751.850937] Node 2 Normal free:525524kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31240kB inactive_file:40748kB unevictable:8680kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715244kB slab_unreclaimable:62476092kB kernel_stack:7760kB pagetables:36kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:122309 all_unreclaimable? yes [7664751.897803] lowmem_reserve[]: 0 0 0 0 [7664751.901781] Node 2 Normal: 27507*4kB (UEM) 40223*8kB (UEM) 899*16kB (UEM) 1667*32kB (UEM) 406*64kB (EM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525524kB [7664751.917188] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.926065] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.934680] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.943550] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.952158] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.961031] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.969636] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664751.978506] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664751.987124] 75585 total pagecache pages [7664751.991144] 0 pages in swap cache [7664751.994641] Swap cache stats: add 21120796, delete 21136768, find 4513461/7609996 [7664752.002294] Free swap = 3120080kB [7664752.005876] Total swap = 4194300kB [7664752.009463] 66993253 pages RAM [7664752.012702] 0 pages HighMem/MovableOnly [7664752.016715] 1101945 pages reserved [7664752.020295] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664752.028341] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664752.037133] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664752.045723] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664752.053906] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664752.062531] [53101] 0 53101 1910 64 9 172 0 mdadm [7664752.070627] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664752.079586] [53113] 0 53113 48774 114 37 130 0 gssproxy [7664752.087946] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664752.095956] [54035] 0 54035 27526 164 10 33 0 agetty [7664752.104137] [54036] 0 54036 27526 158 11 33 0 agetty [7664752.112429] [36317] 0 36317 28294 187 14 61 0 bash [7664752.120440] [36329] 0 36329 28177 160 14 55 0 grep [7664752.128576] Out of memory: Kill process 53113 (gssproxy) score 0 or sacrifice child [7664752.136407] Killed process 53113 (gssproxy) total-vm:195096kB, anon-rss:0kB, file-rss:456kB, shmem-rss:0kB [7664752.792255] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664752.802589] LustreError: 124244:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check [7664752.802592] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664752.802605] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664752.802613] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664752.802622] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664752.802630] LustreError: 80392:0:(events.c:305:request_in_callback()) event type 2, status -103, service ost_io [7664752.802634] LustreError: 119556:0:(sec.c:2191:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.49.25.17@o2ib1 x1659540551090944 [7664752.802637] LustreError: 119556:0:(sec.c:2191:sptlrpc_svc_unwrap_request()) Skipped 3 previous similar messages [7664752.889056] LustreError: 124244:0:(pack_generic.c:605:__lustre_unpack_msg()) Skipped 8 previous similar messages [7664753.157589] LustreError: 90707:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c4f9aed0c00 [7664753.685370] LustreError: 82913:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffff9c211883fe00 [7664756.109065] LustreError: 90696:0:(ldlm_lib.c:3262:target_bulk_io()) @@@ network error on bulk WRITE req@ffff9c207b493850 x1659596557272128/t0(0) o4->37a8be97-d1cb-4@10.50.3.70@o2ib2:549/0 lens 504/448 e 3 to 0 dl 1583650799 ref 1 fl Interpret:/0/0 rc 0/0 [7664756.131829] LustreError: 90696:0:(ldlm_lib.c:3262:target_bulk_io()) Skipped 9 previous similar messages [7664756.141474] Lustre: fir-OST001b: Bulk IO write error with 37a8be97-d1cb-4 (at 10.50.3.70@o2ib2), client will retry: rc = -110 [7664756.152943] Lustre: Skipped 3 previous similar messages [7664764.070016] ll_ost_io00_045 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [7664764.078203] ll_ost_io00_045 cpuset=/ mems_allowed=0 [7664764.083262] CPU: 36 PID: 3011 Comm: ll_ost_io00_045 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664764.096555] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664764.104389] Call Trace: [7664764.107023] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664764.112339] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664764.117835] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664764.123673] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664764.129419] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664764.135425] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664764.141776] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664764.147869] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664764.153617] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664764.160145] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664764.166678] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664764.172929] [<ffffffffc124293f>] tgt_checksum_niobuf_rw+0xbf/0xe00 [ptlrpc] [7664764.180182] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664764.187518] [<ffffffffc0cb71e0>] ? obd_dif_crc_fn+0x20/0x20 [obdclass] [7664764.194354] [<ffffffffc1247325>] tgt_brw_read+0xc35/0x1e50 [ptlrpc] [7664764.200888] [<ffffffffa021bdce>] ? ___slab_alloc+0x24e/0x4f0 [7664764.206852] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664764.213487] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664764.220839] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664764.228184] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664764.235701] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664764.242623] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664764.249708] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664764.257459] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664764.264719] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664764.272587] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664764.279557] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664764.284996] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664764.291470] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664764.299037] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664764.304098] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664764.310366] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664764.316983] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664764.323250] Mem-Info: [7664764.325716] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34531 inactive_file:37066 isolated_file:2912 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296797 mapped:1587 shmem:0 pagetables:413 bounce:0 free:590118 free_pcp:0 free_cma:0 [7664764.359898] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664764.401654] lowmem_reserve[]: 0 1418 63868 63868 [7664764.406582] Node 0 DMA32 free:261320kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1076kB inactive_file:3376kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686216kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:196905 all_unreclaimable? yes [7664764.451453] lowmem_reserve[]: 0 0 62450 62450 [7664764.456123] Node 0 Normal free:507756kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:47016kB inactive_file:48152kB unevictable:168kB isolated(anon):0kB isolated(file):2816kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610948kB slab_unreclaimable:60243912kB kernel_stack:5792kB pagetables:232kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2655704 all_unreclaimable? yes [7664764.502991] lowmem_reserve[]: 0 0 0 0 [7664764.506966] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664764.521805] Node 0 DMA32: 332*4kB (EM) 401*8kB (EM) 1195*16kB (UEM) 3691*32kB (UEM) 1492*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261320kB [7664764.538126] Node 0 Normal: 6185*4kB (UEM) 5719*8kB (UEM) 3942*16kB (UEM) 4481*32kB (UEM) 2037*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 507420kB [7664764.554966] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664764.563843] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664764.572455] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664764.581322] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664764.589924] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664764.598795] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664764.607409] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664764.616276] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664764.624888] 76065 total pagecache pages [7664764.628901] 0 pages in swap cache [7664764.632393] Swap cache stats: add 21120844, delete 21136816, find 4513466/7610007 [7664764.640046] Free swap = 3120592kB [7664764.643626] Total swap = 4194300kB [7664764.647206] 66993253 pages RAM [7664764.650437] 0 pages HighMem/MovableOnly [7664764.654450] 1101945 pages reserved [7664764.658028] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664764.666076] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664764.674869] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664764.683460] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664764.691633] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664764.700239] [53101] 0 53101 1910 64 9 172 0 mdadm [7664764.708324] [53108] 0 53108 38960 161 19 86 0 dsm_sa_eventmgr [7664764.717282] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664764.725286] [54035] 0 54035 27526 164 10 33 0 agetty [7664764.733460] [54036] 0 54036 27526 158 11 33 0 agetty [7664764.741755] [36317] 0 36317 28294 187 14 61 0 bash [7664764.749761] [36329] 0 36329 28177 160 14 55 0 grep [7664764.757875] Out of memory: Kill process 53108 (dsm_sa_eventmgr) score 0 or sacrifice child [7664764.766312] Killed process 53108 (dsm_sa_eventmgr) total-vm:155840kB, anon-rss:0kB, file-rss:644kB, shmem-rss:0kB [7664764.779313] ll_ost_io02_028 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664764.787760] ll_ost_io02_028 cpuset=/ mems_allowed=2 [7664764.792829] CPU: 10 PID: 123082 Comm: ll_ost_io02_028 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664764.806303] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664764.814148] Call Trace: [7664764.816799] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664764.822132] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664764.827655] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664764.833495] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664764.839256] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664764.845276] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664764.851634] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664764.857738] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664764.863493] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664764.870024] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664764.876556] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664764.882744] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664764.888758] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664764.894866] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664764.901756] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664764.908988] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664764.915167] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664764.921830] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664764.928884] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664764.934634] [<ffffffffa006213e>] ? physflat_send_IPI_mask+0xe/0x10 [7664764.941083] [<ffffffffa0056f42>] ? native_smp_send_reschedule+0x52/0x70 [7664764.947965] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664764.953507] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664764.960602] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664764.968354] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664764.975615] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664764.983490] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664764.990501] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664764.998461] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664765.005725] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664765.012206] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664765.019779] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664765.024838] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664765.031113] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664765.037734] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664765.044006] Mem-Info: [7664765.046466] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35472 inactive_file:37843 isolated_file:1600 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296797 mapped:1587 shmem:0 pagetables:413 bounce:0 free:590050 free_pcp:0 free_cma:0 [7664765.080656] Node 2 Normal free:524752kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32192kB inactive_file:40352kB unevictable:8680kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476440kB kernel_stack:7760kB pagetables:36kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:406321 all_unreclaimable? yes [7664765.127519] lowmem_reserve[]: 0 0 0 0 [7664765.131486] Node 2 Normal: 27419*4kB (UEM) 40157*8kB (UEM) 917*16kB (UEM) 1663*32kB (UEM) 404*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524676kB [7664765.146976] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664765.155843] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664765.164457] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664765.173322] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664765.181928] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664765.190796] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664765.199411] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664765.208277] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664765.216890] 76063 total pagecache pages [7664765.220903] 0 pages in swap cache [7664765.224397] Swap cache stats: add 21120847, delete 21136819, find 4513467/7610009 [7664765.232050] Free swap = 3120592kB [7664765.235635] Total swap = 4194300kB [7664765.239217] 66993253 pages RAM [7664765.242446] 0 pages HighMem/MovableOnly [7664765.246461] 1101945 pages reserved [7664765.250047] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664765.258095] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664765.266893] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664765.275489] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664765.283677] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664765.292288] [53101] 0 53101 1910 64 9 172 0 mdadm [7664765.300382] [53133] 0 53108 38960 162 19 86 0 dsm_sa_eventmgr [7664765.309346] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664765.317352] [54035] 0 54035 27526 164 10 33 0 agetty [7664765.325535] [54036] 0 54036 27526 158 11 33 0 agetty [7664765.333828] [36317] 0 36317 28294 187 14 61 0 bash [7664765.341838] [36329] 0 36329 28177 160 14 55 0 grep [7664765.349963] Out of memory: Kill process 53133 (dsm_sa_eventmgr) score 0 or sacrifice child [7664765.358402] Killed process 53133 (dsm_sa_eventmgr) total-vm:155840kB, anon-rss:0kB, file-rss:648kB, shmem-rss:0kB [7664765.762291] dsm_sa_eventmgr: page allocation failure: order:0, mode:0x200da [7664765.769434] CPU: 19 PID: 53133 Comm: dsm_sa_eventmgr Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664765.782814] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664765.790650] Call Trace: [7664765.793290] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664765.798609] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664765.804706] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664765.810281] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664765.816294] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664765.822821] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664765.829348] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664765.835188] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664765.841799] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664765.848066] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664765.853988] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664765.860002] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664765.865932] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664765.871859] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664765.877439] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664765.882759] Mem-Info: [7664765.885242] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35195 inactive_file:37889 isolated_file:2560 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296798 mapped:1587 shmem:0 pagetables:413 bounce:0 free:590065 free_pcp:0 free_cma:0 [7664765.919426] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664765.961180] lowmem_reserve[]: 0 1418 63868 63868 [7664765.966109] Node 0 DMA32 free:261328kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1076kB inactive_file:4024kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686216kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:402182 all_unreclaimable? yes [7664766.010986] lowmem_reserve[]: 0 0 62450 62450 [7664766.015657] Node 0 Normal free:507532kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:45276kB inactive_file:46256kB unevictable:168kB isolated(anon):0kB isolated(file):8320kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610948kB slab_unreclaimable:60243916kB kernel_stack:6064kB pagetables:232kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:900271 all_unreclaimable? yes [7664766.062440] lowmem_reserve[]: 0 0 0 0 [7664766.066417] Node 1 Normal free:525484kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17160kB inactive_file:16928kB unevictable:26488kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711284kB slab_unreclaimable:63411348kB kernel_stack:20816kB pagetables:980kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:65347 all_unreclaimable? yes [7664766.113109] lowmem_reserve[]: 0 0 0 0 [7664766.117087] Node 2 Normal free:524752kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32192kB inactive_file:40608kB unevictable:8680kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476440kB kernel_stack:7760kB pagetables:36kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:585961 all_unreclaimable? yes [7664766.163784] lowmem_reserve[]: 0 0 0 0 [7664766.167756] Node 3 Normal free:525268kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:43188kB inactive_file:43112kB unevictable:840kB isolated(anon):0kB isolated(file):1152kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854212kB slab_unreclaimable:62369272kB kernel_stack:4208kB pagetables:404kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1306682 all_unreclaimable? yes [7664766.214618] lowmem_reserve[]: 0 0 0 0 [7664766.218584] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664766.233423] Node 0 DMA32: 332*4kB (EM) 402*8kB (UEM) 1195*16kB (UEM) 3691*32kB (UEM) 1492*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261328kB [7664766.249831] Node 0 Normal: 6315*4kB (UEM) 5724*8kB (UEM) 3960*16kB (UEM) 4481*32kB (UEM) 2037*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508268kB [7664766.266670] Node 1 Normal: 88171*4kB (UEM) 21586*8kB (UM) 7*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525484kB [7664766.280199] Node 2 Normal: 27419*4kB (UEM) 40157*8kB (UEM) 917*16kB (UEM) 1663*32kB (UEM) 404*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524676kB [7664766.295688] Node 3 Normal: 131335*4kB (UEM) 1*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525348kB [7664766.308497] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664766.317362] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664766.325968] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664766.334834] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664766.343440] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664766.352306] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664766.360912] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664766.369780] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664766.378387] 75973 total pagecache pages [7664766.382400] 0 pages in swap cache [7664766.385899] Swap cache stats: add 21120847, delete 21136819, find 4513467/7610009 [7664766.393551] Free swap = 3120592kB [7664766.397132] Total swap = 4194300kB [7664766.400713] 66993253 pages RAM [7664766.403952] 0 pages HighMem/MovableOnly [7664766.407965] 1101945 pages reserved [7664767.827054] ll_ost_io02_101 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664767.835499] ll_ost_io02_101 cpuset=/ mems_allowed=2 [7664767.840564] CPU: 10 PID: 8716 Comm: ll_ost_io02_101 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664767.853856] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664767.861688] Call Trace: [7664767.864327] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664767.869647] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664767.875141] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664767.880987] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664767.886737] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664767.892749] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664767.899101] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664767.905194] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664767.910943] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664767.917477] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664767.924013] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664767.930198] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664767.936210] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664767.942323] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664767.949210] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664767.955374] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664767.962474] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664767.969046] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664767.975696] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664767.983042] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664767.989784] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664767.997131] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664768.004649] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664768.011569] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664768.018658] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664768.026409] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664768.033666] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664768.041536] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664768.048501] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664768.053949] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664768.060426] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664768.068001] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664768.073060] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664768.079333] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664768.085947] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664768.092213] Mem-Info: [7664768.094673] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35294 inactive_file:36371 isolated_file:2432 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296798 mapped:1587 shmem:0 pagetables:394 bounce:0 free:590259 free_pcp:0 free_cma:0 [7664768.128854] Node 2 Normal free:524756kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32192kB inactive_file:40480kB unevictable:8680kB isolated(anon):0kB isolated(file):0kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476440kB kernel_stack:7760kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:243189 all_unreclaimable? yes [7664768.175547] lowmem_reserve[]: 0 0 0 0 [7664768.179521] Node 2 Normal: 27420*4kB (UEM) 40157*8kB (UEM) 917*16kB (UEM) 1663*32kB (UEM) 404*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524680kB [7664768.195009] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664768.203878] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664768.212483] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664768.221349] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664768.229955] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664768.238821] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664768.247427] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664768.256296] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664768.264907] 75967 total pagecache pages [7664768.268923] 0 pages in swap cache [7664768.272413] Swap cache stats: add 21120849, delete 21136821, find 4513468/7610011 [7664768.280066] Free swap = 3120848kB [7664768.283644] Total swap = 4194300kB [7664768.287226] 66993253 pages RAM [7664768.290456] 0 pages HighMem/MovableOnly [7664768.294469] 1101945 pages reserved [7664768.298050] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664768.306096] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664768.314891] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664768.323480] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664768.331657] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664768.340272] [53101] 0 53101 1910 64 9 172 0 mdadm [7664768.348370] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664768.356373] [54035] 0 54035 27526 164 10 33 0 agetty [7664768.364551] [54036] 0 54036 27526 158 11 33 0 agetty [7664768.372838] [36317] 0 36317 28294 187 14 61 0 bash [7664768.380845] [36329] 0 36329 28177 160 14 55 0 grep [7664768.388965] Out of memory: Kill process 36317 (bash) score 0 or sacrifice child [7664768.396444] Killed process 36329 (grep) total-vm:112708kB, anon-rss:0kB, file-rss:640kB, shmem-rss:0kB [7664768.525361] grep: page allocation failure: order:0, mode:0x200da [7664768.531547] CPU: 26 PID: 36329 Comm: grep Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664768.543969] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664768.551810] Call Trace: [7664768.554446] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664768.559763] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664768.565876] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664768.571467] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664768.577489] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664768.584033] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664768.590577] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664768.596444] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664768.603080] [<ffffffffa076aaba>] ? __schedule+0x42a/0x860 [7664768.608746] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664768.615037] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664768.620982] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664768.626992] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664768.632962] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664768.638882] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664768.644483] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664768.649799] Mem-Info: [7664768.652272] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:36070 inactive_file:35370 isolated_file:4094 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296798 mapped:1587 shmem:0 pagetables:394 bounce:0 free:590297 free_pcp:0 free_cma:0 [7664768.686476] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664768.728274] lowmem_reserve[]: 0 1418 63868 63868 [7664768.733221] Node 0 DMA32 free:261328kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1076kB inactive_file:4060kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686216kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:403302 all_unreclaimable? yes [7664768.778120] lowmem_reserve[]: 0 0 62450 62450 [7664768.782810] Node 0 Normal free:508380kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:50616kB inactive_file:48292kB unevictable:168kB isolated(anon):0kB isolated(file):4088kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610948kB slab_unreclaimable:60243916kB kernel_stack:5760kB pagetables:200kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:688965 all_unreclaimable? yes [7664768.829652] lowmem_reserve[]: 0 0 0 0 [7664768.833659] Node 1 Normal free:525496kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:18924kB inactive_file:13408kB unevictable:26488kB isolated(anon):0kB isolated(file):3328kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711284kB slab_unreclaimable:63411348kB kernel_stack:20816kB pagetables:976kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1762246 all_unreclaimable? yes [7664768.880870] lowmem_reserve[]: 0 0 0 0 [7664768.884852] Node 2 Normal free:524932kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:32016kB inactive_file:36748kB unevictable:8680kB isolated(anon):0kB isolated(file):4096kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476440kB kernel_stack:7760kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:795135 all_unreclaimable? yes [7664768.931833] lowmem_reserve[]: 0 0 0 0 [7664768.935840] Node 3 Normal free:525324kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:44212kB inactive_file:42856kB unevictable:840kB isolated(anon):0kB isolated(file):768kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854212kB slab_unreclaimable:62369272kB kernel_stack:4208kB pagetables:368kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2461940 all_unreclaimable? yes [7664768.982707] lowmem_reserve[]: 0 0 0 0 [7664768.986754] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664769.001732] Node 0 DMA32: 332*4kB (EM) 402*8kB (UEM) 1195*16kB (UEM) 3691*32kB (UEM) 1492*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261328kB [7664769.018209] Node 0 Normal: 6323*4kB (UEM) 5724*8kB (UEM) 3950*16kB (UEM) 4481*32kB (UEM) 2037*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508140kB [7664769.035126] Node 1 Normal: 88172*4kB (UEM) 21587*8kB (UM) 7*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525496kB [7664769.048862] Node 2 Normal: 27473*4kB (UEM) 40182*8kB (UEM) 919*16kB (UEM) 1663*32kB (UEM) 404*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525124kB [7664769.064605] Node 3 Normal: 131349*4kB (UEM) 1*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525404kB [7664769.077644] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.086555] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.095198] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.104131] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.112767] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.121658] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.130299] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.139252] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.147884] 75889 total pagecache pages [7664769.151927] 0 pages in swap cache [7664769.155452] Swap cache stats: add 21120849, delete 21136821, find 4513468/7610011 [7664769.163138] Free swap = 3120848kB [7664769.166748] Total swap = 4194300kB [7664769.170341] 66993253 pages RAM [7664769.173623] 0 pages HighMem/MovableOnly [7664769.177655] 1101945 pages reserved [7664769.509309] ll_ost_io03_086 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664769.517745] ll_ost_io03_086 cpuset=/ mems_allowed=3 [7664769.522809] CPU: 7 PID: 8677 Comm: ll_ost_io03_086 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664769.536014] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664769.543846] Call Trace: [7664769.546478] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664769.551796] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664769.557283] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664769.563125] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664769.568877] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664769.573189] LustreError: 101203:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.50.10.29@o2ib2 ns: filter-fir-OST001f_UUID lock: ffff9c203123e0c0/0xb0d9932fd7de3c6a lrc: 3/0,0 mode: PR/PR res: [0x480000401:0x6e827a2:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->134217727) flags: 0x60000400000020 nid: 10.50.10.29@o2ib2 remote: 0xafe739f8d57c3f80 expref: 63 pid: 124126 timeout: 7664678 lvb_type: 1 [7664769.616343] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664769.622699] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664769.628792] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664769.634540] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664769.641072] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664769.647601] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664769.653786] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664769.659791] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664769.665900] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664769.672783] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664769.680009] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664769.686173] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664769.692827] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664769.699911] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664769.706534] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664769.712289] [<ffffffffa00dca58>] ? __enqueue_entity+0x78/0x80 [7664769.718301] [<ffffffffa00e367f>] ? enqueue_entity+0x2ef/0xbe0 [7664769.724309] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664769.729846] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664769.736933] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664769.744684] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664769.751942] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664769.759808] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664769.766813] [<ffffffffc11e499e>] ? ptlrpc_server_post_idle_rqbds+0x7e/0xf0 [ptlrpc] [7664769.774765] [<ffffffffc11e6e10>] ? ptlrpc_grow_req_bufs+0x50/0x2a0 [ptlrpc] [7664769.782024] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664769.788504] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664769.796077] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664769.801131] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664769.807406] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664769.814025] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664769.820292] Mem-Info: [7664769.822750] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:36231 inactive_file:36744 isolated_file:1824 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296796 mapped:1587 shmem:0 pagetables:380 bounce:0 free:590402 free_pcp:19 free_cma:0 [7664769.857018] Node 3 Normal free:525404kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:44788kB inactive_file:42760kB unevictable:840kB isolated(anon):0kB isolated(file):256kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854212kB slab_unreclaimable:62369272kB kernel_stack:4208kB pagetables:312kB unstable:0kB bounce:0kB free_pcp:76kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2461940 all_unreclaimable? yes [7664769.903883] lowmem_reserve[]: 0 0 0 0 [7664769.907848] Node 3 Normal: 131349*4kB (UEM) 1*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525404kB [7664769.920660] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.929529] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.938142] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.947009] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.955615] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.964481] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.973087] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664769.981954] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664769.990566] 75878 total pagecache pages [7664769.994582] 0 pages in swap cache [7664769.998074] Swap cache stats: add 21120851, delete 21136823, find 4513469/7610013 [7664770.005729] Free swap = 3121104kB [7664770.009312] Total swap = 4194300kB [7664770.012893] 66993253 pages RAM [7664770.016123] 0 pages HighMem/MovableOnly [7664770.020137] 1101945 pages reserved [7664770.023719] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664770.031768] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664770.040557] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664770.049145] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664770.057323] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664770.065937] [53101] 0 53101 1910 64 9 172 0 mdadm [7664770.074037] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664770.082041] [54035] 0 54035 27526 164 10 33 0 agetty [7664770.090219] [54036] 0 54036 27526 158 11 33 0 agetty [7664770.098517] [36317] 0 36317 28294 187 14 61 0 bash [7664770.106648] Out of memory: Kill process 36317 (bash) score 0 or sacrifice child [7664770.114134] Killed process 36317 (bash) total-vm:113176kB, anon-rss:0kB, file-rss:748kB, shmem-rss:0kB [7664770.212170] bash: page allocation failure: order:0, mode:0x200da [7664770.218360] CPU: 38 PID: 36317 Comm: bash Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664770.230785] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664770.238610] Call Trace: [7664770.241245] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664770.246562] [<ffffffffa01bdec0>] warn_alloc_failed+0x110/0x180 [7664770.252663] [<ffffffffa01c0be0>] ? drain_pages+0xb0/0xb0 [7664770.258246] [<ffffffffa00c3f50>] ? wake_up_atomic_t+0x30/0x30 [7664770.264259] [<ffffffffa076074e>] __alloc_pages_slowpath+0x6b6/0x724 [7664770.270794] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664770.277328] [<ffffffffa02128c5>] alloc_pages_vma+0xb5/0x200 [7664770.283171] [<ffffffffa0200b15>] __read_swap_cache_async+0x115/0x190 [7664770.289791] [<ffffffffa0200bb6>] read_swap_cache_async+0x26/0x60 [7664770.296065] [<ffffffffa0200c9c>] swapin_readahead+0xac/0x110 [7664770.301994] [<ffffffffa01ead92>] handle_pte_fault+0x812/0xd10 [7664770.308007] [<ffffffffa01ed3ad>] handle_mm_fault+0x39d/0x9b0 [7664770.313936] [<ffffffffa0772603>] __do_page_fault+0x203/0x4f0 [7664770.319863] [<ffffffffa0772925>] do_page_fault+0x35/0x90 [7664770.325445] [<ffffffffa076e768>] page_fault+0x28/0x30 [7664770.330766] [<ffffffffa0388990>] ? __put_user_4+0x20/0x30 [7664770.336433] [<ffffffffa009e2e1>] ? wait_consider_task+0x8a1/0xb30 [7664770.342784] [<ffffffffa009e670>] do_wait+0x100/0x260 [7664770.348013] [<ffffffffa009f960>] SyS_wait4+0x80/0x110 [7664770.353333] [<ffffffffa009d3c0>] ? task_stopped_code+0x60/0x60 [7664770.359428] [<ffffffffa0777ddb>] system_call_fastpath+0x22/0x27 [7664770.365612] Mem-Info: [7664770.368086] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34634 inactive_file:38360 isolated_file:1824 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296796 mapped:1587 shmem:0 pagetables:380 bounce:0 free:590205 free_pcp:0 free_cma:0 [7664770.402275] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664770.444025] lowmem_reserve[]: 0 1418 63868 63868 [7664770.448954] Node 0 DMA32 free:261328kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1084kB inactive_file:4068kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686216kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:407174 all_unreclaimable? yes [7664770.493823] lowmem_reserve[]: 0 0 62450 62450 [7664770.498491] Node 0 Normal free:508128kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:45384kB inactive_file:45216kB unevictable:168kB isolated(anon):0kB isolated(file):4480kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610948kB slab_unreclaimable:60243916kB kernel_stack:5952kB pagetables:200kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:731572 all_unreclaimable? yes [7664770.545271] lowmem_reserve[]: 0 0 0 0 [7664770.549247] Node 1 Normal free:525504kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17036kB inactive_file:16244kB unevictable:26488kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711284kB slab_unreclaimable:63411340kB kernel_stack:20816kB pagetables:976kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:597742 all_unreclaimable? yes [7664770.596185] lowmem_reserve[]: 0 0 0 0 [7664770.600154] Node 2 Normal free:525096kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31380kB inactive_file:40576kB unevictable:8680kB isolated(anon):0kB isolated(file):384kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476440kB kernel_stack:7760kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:462422 all_unreclaimable? yes [7664770.647018] lowmem_reserve[]: 0 0 0 0 [7664770.650994] Node 3 Normal free:524860kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41244kB inactive_file:40164kB unevictable:840kB isolated(anon):0kB isolated(file):5120kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854212kB slab_unreclaimable:62369272kB kernel_stack:4208kB pagetables:312kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1132211 all_unreclaimable? yes [7664770.697858] lowmem_reserve[]: 0 0 0 0 [7664770.701830] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664770.716670] Node 0 DMA32: 332*4kB (EM) 402*8kB (UEM) 1195*16kB (UEM) 3691*32kB (UEM) 1492*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261328kB [7664770.733078] Node 0 Normal: 6324*4kB (UEM) 5724*8kB (UEM) 3950*16kB (UEM) 4481*32kB (UEM) 2037*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508144kB [7664770.749914] Node 1 Normal: 88172*4kB (UEM) 21588*8kB (UM) 7*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525504kB [7664770.763444] Node 2 Normal: 27473*4kB (UEM) 40182*8kB (UEM) 919*16kB (UEM) 1663*32kB (UEM) 404*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525124kB [7664770.778931] Node 3 Normal: 131229*4kB (UEM) 2*8kB (U) 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 524980kB [7664770.792488] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664770.801364] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664770.809977] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664770.818844] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664770.827450] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664770.836324] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664770.844931] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664770.853805] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664770.862411] 76003 total pagecache pages [7664770.866427] 0 pages in swap cache [7664770.869927] Swap cache stats: add 21120851, delete 21136823, find 4513469/7610013 [7664770.877586] Free swap = 3121104kB [7664770.881167] Total swap = 4194300kB [7664770.884757] 66993253 pages RAM [7664770.887996] 0 pages HighMem/MovableOnly [7664770.892008] 1101945 pages reserved [7664770.896405] ll_ost_io01_087 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664770.904846] ll_ost_io01_087 cpuset=/ mems_allowed=1 [7664770.909909] CPU: 45 PID: 83036 Comm: ll_ost_io01_087 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664770.923288] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664770.931121] Call Trace: [7664770.933754] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664770.939069] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664770.944556] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664770.950400] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664770.956413] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664770.962772] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664770.968864] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664770.974610] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664770.981138] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664770.987665] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664770.993851] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664770.999856] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664771.005963] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664771.012851] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664771.019016] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664771.026104] [<ffffffffc11e3d25>] ? request_in_callback+0x485/0x920 [ptlrpc] [7664771.033373] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664771.039916] [<ffffffffc0b11143>] ? kiblnd_post_rx+0x163/0x520 [ko2iblnd] [7664771.046909] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664771.054258] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664771.060995] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664771.068342] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664771.075855] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664771.082769] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664771.089859] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664771.097609] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664771.104864] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664771.112693] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664771.118136] [<ffffffffc11ebb7f>] ? ptlrpc_server_handle_req_in+0x8df/0xd60 [ptlrpc] [7664771.126090] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664771.132576] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664771.140149] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664771.145211] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664771.151485] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664771.158104] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664771.164369] Mem-Info: [7664771.166828] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35112 inactive_file:38314 isolated_file:1760 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296796 mapped:1587 shmem:0 pagetables:366 bounce:0 free:590218 free_pcp:0 free_cma:0 [7664771.201008] Node 1 Normal free:525504kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:17452kB inactive_file:16016kB unevictable:26488kB isolated(anon):0kB isolated(file):128kB present:67108352kB managed:66054620kB mlocked:26488kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:711284kB slab_unreclaimable:63411340kB kernel_stack:20816kB pagetables:976kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:597742 all_unreclaimable? yes [7664771.247961] lowmem_reserve[]: 0 0 0 0 [7664771.251928] Node 1 Normal: 88172*4kB (UEM) 21588*8kB (UM) 7*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525504kB [7664771.265459] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664771.274325] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664771.282931] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664771.291795] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664771.300402] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664771.309268] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664771.317874] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664771.326741] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664771.335347] 76002 total pagecache pages [7664771.339360] 0 pages in swap cache [7664771.342851] Swap cache stats: add 21120852, delete 21136824, find 4513469/7610013 [7664771.350504] Free swap = 3121360kB [7664771.354083] Total swap = 4194300kB [7664771.357664] 66993253 pages RAM [7664771.360896] 0 pages HighMem/MovableOnly [7664771.364908] 1101945 pages reserved [7664771.368487] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664771.376532] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664771.385316] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664771.393889] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664771.402065] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664771.410672] [53101] 0 53101 1910 64 9 172 0 mdadm [7664771.418769] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664771.426775] [54035] 0 54035 27526 164 10 33 0 agetty [7664771.434948] [54036] 0 54036 27526 158 11 33 0 agetty [7664771.443326] Out of memory: Kill process 53101 (mdadm) score 0 or sacrifice child [7664771.450893] Killed process 53101 (mdadm) total-vm:7640kB, anon-rss:0kB, file-rss:256kB, shmem-rss:0kB [7664771.706326] ll_ost_io02_088 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664771.714766] ll_ost_io02_088 cpuset=/ mems_allowed=2 [7664771.719827] CPU: 10 PID: 8667 Comm: ll_ost_io02_088 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664771.733118] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664771.740944] Call Trace: [7664771.743577] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664771.748892] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664771.754380] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664771.760221] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664771.765967] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664771.771981] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664771.777726] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664771.784252] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664771.790782] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664771.796969] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664771.802980] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664771.809089] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664771.815971] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664771.822132] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664771.829228] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664771.835795] [<ffffffffc11d5b56>] ? ptl_send_buf+0x146/0x530 [ptlrpc] [7664771.842456] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664771.849802] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664771.856555] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664771.863900] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664771.871414] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664771.878326] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664771.885422] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664771.893179] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664771.900440] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664771.908300] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664771.915261] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664771.920693] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664771.927165] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664771.934735] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664771.939794] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664771.946062] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664771.952672] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664771.958938] Mem-Info: [7664771.961398] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35827 inactive_file:36827 isolated_file:2208 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296796 mapped:1587 shmem:0 pagetables:366 bounce:0 free:590178 free_pcp:0 free_cma:0 [7664771.995578] Node 2 Normal free:524852kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:31428kB inactive_file:36036kB unevictable:8680kB isolated(anon):0kB isolated(file):4352kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476440kB kernel_stack:7760kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:532083 all_unreclaimable? no [7664772.042463] lowmem_reserve[]: 0 0 0 0 [7664772.046455] Node 2 Normal: 27540*4kB (UEM) 40211*8kB (UEM) 919*16kB (UEM) 1663*32kB (UEM) 404*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525624kB [7664772.061943] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.070810] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.079417] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.088282] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.096888] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.105754] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.114360] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.123226] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.131832] 76002 total pagecache pages [7664772.135848] 0 pages in swap cache [7664772.139337] Swap cache stats: add 21120860, delete 21136832, find 4513471/7610016 [7664772.146990] Free swap = 3122128kB [7664772.150568] Total swap = 4194300kB [7664772.154152] 66993253 pages RAM [7664772.157382] 0 pages HighMem/MovableOnly [7664772.161393] 1101945 pages reserved [7664772.164973] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664772.173022] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664772.181814] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664772.190403] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664772.198581] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664772.207202] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664772.215208] [54035] 0 54035 27526 164 10 33 0 agetty [7664772.223382] [54036] 0 54036 27526 158 11 33 0 agetty [7664772.231794] Out of memory: Kill process 54035 (agetty) score 0 or sacrifice child [7664772.239448] Killed process 54035 (agetty) total-vm:110104kB, anon-rss:0kB, file-rss:656kB, shmem-rss:0kB [7664772.405609] ll_ost_io03_091 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664772.414050] ll_ost_io03_091 cpuset=/ mems_allowed=3 [7664772.419102] CPU: 47 PID: 8714 Comm: ll_ost_io03_091 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664772.432390] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664772.440222] Call Trace: [7664772.442856] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664772.448172] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664772.453666] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664772.459498] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664772.465505] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664772.471860] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664772.477960] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664772.483712] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664772.490240] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664772.496766] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664772.502946] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664772.508962] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664772.515079] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664772.521961] [<ffffffffc172e1ca>] ofd_preprw+0x6fa/0x11b0 [ofd] [7664772.528124] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664772.535228] [<ffffffffc12470cb>] tgt_brw_read+0x9db/0x1e50 [ptlrpc] [7664772.541790] [<ffffffffc0c82a79>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [7664772.549135] [<ffffffffc1217476>] ? null_alloc_rs+0x186/0x340 [ptlrpc] [7664772.555875] [<ffffffffc11df335>] ? lustre_pack_reply_v2+0x135/0x290 [ptlrpc] [7664772.563223] [<ffffffffc11df4ff>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [7664772.570735] [<ffffffffc11df681>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] [7664772.577650] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664772.584743] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664772.592498] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664772.599755] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664772.607623] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664772.614584] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664772.620025] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664772.626498] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664772.634065] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664772.639118] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664772.645388] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664772.652004] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664772.658269] Mem-Info: [7664772.660729] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:34439 inactive_file:36985 isolated_file:3104 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296796 mapped:1587 shmem:0 pagetables:357 bounce:0 free:590336 free_pcp:0 free_cma:0 [7664772.694910] Node 3 Normal free:525024kB min:525460kB low:656824kB high:788188kB active_anon:0kB inactive_anon:0kB active_file:41860kB inactive_file:43652kB unevictable:840kB isolated(anon):0kB isolated(file):2816kB present:67108352kB managed:66038732kB mlocked:840kB dirty:0kB writeback:0kB mapped:848kB shmem:0kB slab_reclaimable:854212kB slab_unreclaimable:62369272kB kernel_stack:4208kB pagetables:288kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1117683 all_unreclaimable? yes [7664772.741776] lowmem_reserve[]: 0 0 0 0 [7664772.745742] Node 3 Normal: 131256*4kB (UEM) 2*8kB (U) 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525088kB [7664772.759298] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.768167] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.776784] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.785654] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.794260] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.803126] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.811732] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664772.820599] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664772.829206] 75916 total pagecache pages [7664772.833216] 0 pages in swap cache [7664772.836710] Swap cache stats: add 21120860, delete 21136832, find 4513471/7610016 [7664772.844361] Free swap = 3122384kB [7664772.847940] Total swap = 4194300kB [7664772.851521] 66993253 pages RAM [7664772.854752] 0 pages HighMem/MovableOnly [7664772.858767] 1101945 pages reserved [7664772.862347] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664772.870388] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664772.879175] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664772.887768] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664772.895942] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664772.904562] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664772.912565] [54036] 0 54036 27526 158 11 33 0 agetty [7664772.920971] Out of memory: Kill process 54036 (agetty) score 0 or sacrifice child [7664772.928623] Killed process 54036 (agetty) total-vm:110104kB, anon-rss:0kB, file-rss:632kB, shmem-rss:0kB [7664772.940071] ll_ost_io00_025 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [7664772.948515] ll_ost_io00_025 cpuset=/ mems_allowed=0 [7664772.953577] CPU: 12 PID: 123043 Comm: ll_ost_io00_025 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664772.967042] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664772.974870] Call Trace: [7664772.977512] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664772.982825] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664772.988322] [<ffffffffa0102372>] ? ktime_get_ts64+0x52/0xf0 [7664772.994161] [<ffffffffa01595af>] ? delayacct_end+0x8f/0xb0 [7664772.999907] [<ffffffffa01bb904>] oom_kill_process+0x254/0x3d0 [7664773.005911] [<ffffffffa01bb3ad>] ? oom_unkillable_task+0xcd/0x120 [7664773.012265] [<ffffffffa01bb456>] ? find_lock_task_mm+0x56/0xc0 [7664773.018356] [<ffffffffa01bc146>] out_of_memory+0x4b6/0x4f0 [7664773.024104] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664773.030630] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664773.037157] [<ffffffffa020f438>] alloc_pages_current+0x98/0x110 [7664773.043335] [<ffffffffa01b7767>] __page_cache_alloc+0x97/0xb0 [7664773.049340] [<ffffffffa01b88e5>] find_or_create_page+0x45/0xa0 [7664773.055448] [<ffffffffc15ac5c3>] osd_bufs_get+0x413/0x870 [osd_ldiskfs] [7664773.062332] [<ffffffffc172d0a6>] ofd_preprw_write.isra.31+0x476/0xea0 [ofd] [7664773.069556] [<ffffffffc172def2>] ofd_preprw+0x422/0x11b0 [ofd] [7664773.075703] [<ffffffffc12491bc>] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] [7664773.082349] [<ffffffffc11dcbd0>] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] [7664773.089428] [<ffffffffc11dcbe7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [7664773.096078] [<ffffffffc1204163>] ? __req_capsule_get+0x163/0x740 [ptlrpc] [7664773.103127] [<ffffffffa021ccc1>] ? __slab_free+0x81/0x2f0 [7664773.108789] [<ffffffffa00e143c>] ? update_curr+0x14c/0x1e0 [7664773.114542] [<ffffffffa00ddd9e>] ? account_entity_dequeue+0xae/0xd0 [7664773.121067] [<ffffffffa00e192c>] ? dequeue_entity+0x11c/0x5e0 [7664773.127074] [<ffffffffa0769192>] ? mutex_lock+0x12/0x2f [7664773.132593] [<ffffffffc124536a>] tgt_request_handle+0xaea/0x1580 [ptlrpc] [7664773.139672] [<ffffffffc1220da1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [7664773.147422] [<ffffffffc0a07bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [7664773.154673] [<ffffffffc11ec24b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [7664773.162534] [<ffffffffc11e7805>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] [7664773.169499] [<ffffffffa00cfeb4>] ? __wake_up+0x44/0x50 [7664773.174935] [<ffffffffc11efbac>] ptlrpc_main+0xb2c/0x1460 [ptlrpc] [7664773.181408] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664773.188982] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664773.194043] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664773.200309] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664773.206920] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664773.213184] Mem-Info: [7664773.215656] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35235 inactive_file:38111 isolated_file:1248 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296796 mapped:1587 shmem:0 pagetables:357 bounce:0 free:590400 free_pcp:0 free_cma:0 [7664773.249843] Node 0 DMA free:15904kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [7664773.291597] lowmem_reserve[]: 0 1418 63868 63868 [7664773.296518] Node 0 DMA32 free:261328kB min:11552kB low:14440kB high:17328kB active_anon:0kB inactive_anon:0kB active_file:1084kB inactive_file:4068kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1633052kB managed:1452284kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:404488kB slab_unreclaimable:686216kB kernel_stack:352kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:411910 all_unreclaimable? yes [7664773.341378] lowmem_reserve[]: 0 0 62450 62450 [7664773.346048] Node 0 Normal free:508256kB min:508832kB low:636040kB high:763248kB active_anon:0kB inactive_anon:0kB active_file:49216kB inactive_file:48688kB unevictable:168kB isolated(anon):0kB isolated(file):384kB present:64998912kB managed:63949072kB mlocked:168kB dirty:0kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:610948kB slab_unreclaimable:60243916kB kernel_stack:5904kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1450599 all_unreclaimable? yes [7664773.392815] lowmem_reserve[]: 0 0 0 0 [7664773.396781] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB [7664773.411619] Node 0 DMA32: 332*4kB (EM) 402*8kB (UEM) 1195*16kB (UEM) 3691*32kB (UEM) 1492*64kB (UEM) 140*128kB (UEM) 24*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 261328kB [7664773.428034] Node 0 Normal: 6330*4kB (UEM) 5727*8kB (UEM) 3952*16kB (UEM) 4482*32kB (UEM) 2037*64kB (UEM) 570*128kB (UEM) 106*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 508256kB [7664773.444876] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664773.453742] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664773.462348] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664773.471216] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664773.479829] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664773.488694] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664773.497298] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664773.506165] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664773.514772] 75916 total pagecache pages [7664773.518787] 0 pages in swap cache [7664773.522279] Swap cache stats: add 21120860, delete 21136832, find 4513471/7610016 [7664773.529931] Free swap = 3122640kB [7664773.533510] Total swap = 4194300kB [7664773.537089] 66993253 pages RAM [7664773.540319] 0 pages HighMem/MovableOnly [7664773.544335] 1101945 pages reserved [7664773.547914] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664773.555957] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664773.564744] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664773.573334] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664773.581510] [53079] 81 53079 17590 260 36 171 -900 dbus-daemon [7664773.590132] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664773.598374] Out of memory: Kill process 53079 (dbus-daemon) score 0 or sacrifice child [7664773.606471] Killed process 53079 (dbus-daemon) total-vm:70360kB, anon-rss:0kB, file-rss:1040kB, shmem-rss:0kB [7664773.887702] ll_ost_io02_054 invoked oom-killer: gfp_mask=0x82d2, order=0, oom_score_adj=0 [7664773.896074] ll_ost_io02_054 cpuset=/ mems_allowed=2 [7664773.901139] CPU: 38 PID: 6889 Comm: ll_ost_io02_054 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664773.914468] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664773.922303] Call Trace: [7664773.924938] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664773.930252] [<ffffffffa075fb6a>] dump_header+0x90/0x229 [7664773.935752] [<ffffffffa01bc16c>] out_of_memory+0x4dc/0x4f0 [7664773.941506] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664773.948045] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664773.954591] [<ffffffffa01fd95f>] __vmalloc_node_range+0x12f/0x280 [7664773.961020] [<ffffffffc11e6a03>] ? ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664773.968079] [<ffffffffa01fdd5e>] vzalloc_node+0x4e/0x50 [7664773.973613] [<ffffffffc11e6a03>] ? ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664773.980708] [<ffffffffc11e6a03>] ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664773.987625] [<ffffffffc11e6ea1>] ptlrpc_grow_req_bufs+0xe1/0x2a0 [ptlrpc] [7664773.994716] [<ffffffffc11efc85>] ptlrpc_main+0xc05/0x1460 [ptlrpc] [7664774.001206] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664774.008783] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664774.013845] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664774.020122] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664774.026740] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664774.033013] Mem-Info: [7664774.035474] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:35054 inactive_file:37945 isolated_file:2336 unevictable:9044 dirty:0 writeback:0 unstable:0 slab_reclaimable:824039 slab_unreclaimable:62296796 mapped:1587 shmem:0 pagetables:346 bounce:0 free:590337 free_pcp:0 free_cma:0 [7664774.069661] Node 2 Normal free:525456kB min:525584kB low:656980kB high:788376kB active_anon:0kB inactive_anon:0kB active_file:30956kB inactive_file:40012kB unevictable:8680kB isolated(anon):0kB isolated(file):1024kB present:67108352kB managed:66054620kB mlocked:8680kB dirty:0kB writeback:0kB mapped:5332kB shmem:0kB slab_reclaimable:715224kB slab_unreclaimable:62476440kB kernel_stack:7760kB pagetables:20kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:209338 all_unreclaimable? yes [7664774.116614] lowmem_reserve[]: 0 0 0 0 [7664774.120587] Node 2 Normal: 27543*4kB (UEM) 40211*8kB (UEM) 919*16kB (UEM) 1663*32kB (UEM) 404*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 525636kB [7664774.136134] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664774.145020] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664774.153645] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664774.162546] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664774.171180] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664774.180086] Node 2 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664774.188699] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [7664774.197573] Node 3 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [7664774.206229] 75943 total pagecache pages [7664774.210254] 0 pages in swap cache [7664774.213754] Swap cache stats: add 21120866, delete 21136838, find 4513473/7610018 [7664774.221446] Free swap = 3123152kB [7664774.225026] Total swap = 4194300kB [7664774.228619] 66993253 pages RAM [7664774.231862] 0 pages HighMem/MovableOnly [7664774.235908] 1101945 pages reserved [7664774.239515] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [7664774.247585] [ 5717] 0 5717 11193 233 24 190 -1000 systemd-udevd [7664774.256396] [ 6726] 0 6726 2066254 5088 166 0 -1000 multipathd [7664774.265024] [53050] 0 53050 13880 124 28 138 -1000 auditd [7664774.273213] [53860] 0 53860 28216 276 57 257 -1000 sshd [7664774.281439] Kernel panic - not syncing: Out of memory and no killable processes... [7664774.290831] CPU: 38 PID: 6889 Comm: ll_ost_io02_054 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 [7664774.304114] Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 [7664774.311939] Call Trace: [7664774.314576] [<ffffffffa0765147>] dump_stack+0x19/0x1b [7664774.319891] [<ffffffffa075e850>] panic+0xe8/0x21f [7664774.324858] [<ffffffffa01bc17a>] out_of_memory+0x4ea/0x4f0 [7664774.330611] [<ffffffffa076066e>] __alloc_pages_slowpath+0x5d6/0x724 [7664774.337137] [<ffffffffa01c2524>] __alloc_pages_nodemask+0x404/0x420 [7664774.343663] [<ffffffffa01fd95f>] __vmalloc_node_range+0x12f/0x280 [7664774.350077] [<ffffffffc11e6a03>] ? ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664774.357120] [<ffffffffa01fdd5e>] vzalloc_node+0x4e/0x50 [7664774.362644] [<ffffffffc11e6a03>] ? ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664774.369722] [<ffffffffc11e6a03>] ptlrpc_alloc_rqbd+0x213/0x5d0 [ptlrpc] [7664774.376630] [<ffffffffc11e6ea1>] ptlrpc_grow_req_bufs+0xe1/0x2a0 [ptlrpc] [7664774.383710] [<ffffffffc11efc85>] ptlrpc_main+0xc05/0x1460 [ptlrpc] [7664774.390188] [<ffffffffc11ef080>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] [7664774.397752] [<ffffffffa00c2e81>] kthread+0xd1/0xe0 [7664774.402805] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40 [7664774.409072] [<ffffffffa0777c24>] ret_from_fork_nospec_begin+0xe/0x21 [7664774.415682] [<ffffffffa00c2db0>] ? insert_kthread_work+0x40/0x40