[LU-10089] kiblnd_fmr_pool_map() Failed to map mr 10/11 elements Created: 05/Oct/17 Updated: 02/Feb/18 Resolved: 24/Oct/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Olaf Faaland | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl |
| Environment: | Build Version: 2.8.0_12.chaos |
| Attachments: | console.log.jet1.lu-10089.txt |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The following group of messages appears in the console logs of MDTs:

2017-10-04 08:11:00 [407096.858161] LNetError: 174158:0:(o2iblnd.c:1893:kiblnd_fmr_pool_map()) Failed to map mr 10/11 elements
2017-10-04 08:11:00 [407096.869697] LNetError: 174158:0:(o2iblnd_cb.c:590:kiblnd_fmr_map_tx()) Can't map 41033 pages: -22
2017-10-04 08:11:00 [407096.880686] LNetError: 174158:0:(o2iblnd_cb.c:1582:kiblnd_send()) Can't setup GET sink for 172.19.1.112@o2ib100: -22
2017-10-04 08:11:00 [407096.893504] LustreError: 174158:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff883ebaa9bb00
2017-10-04 08:12:40 [407196.901157] LustreError: 174158:0:(ldlm_lib.c:3186:target_bulk_io()) @@@ timeout on bulk WRITE after 100+0s req@ffff883f27232850 x1579913603429696/t0(0) o1000->lquake-MDT0001-mdtlov_UUID@172.19.1.112@o2ib100:-1/-1 lens 352/0 e 0 to 0 dl 1507130003 ref 1 fl Interpret:/0/ffffffff rc 0/-1

The nodes have Mellanox ConnectX-4 IB adapters. |
| Comments |
| Comment by Olaf Faaland [ 05/Oct/17 ] |
|
The file system has 16 MDTs, each on a separate MDS. This log snippet is from the server hosting the MGS and MDT0, NID 172.19.1.111@o2ib100. The nodes referenced by NIDs in the range 172.19.1.112@o2ib100 to 172.19.1.126@o2ib100 are the other MDSs. |
| Comment by Olaf Faaland [ 05/Oct/17 ] |
|
The console log for the MGS/MDS0000 node, in which the "Failed to map mr" messages appear, is attached as console.log.jet1.lu-10089.txt. |
| Comment by Olaf Faaland [ 05/Oct/17 ] |
|
I spot-checked a few examples. There appears to be a corresponding error message on the node referred to by the "timeout on bulk WRITE" message. For example:

jet1:
2017-10-05 06:55:32 [488964.187806] LNetError: 174158:0:(o2iblnd.c:1893:kiblnd_fmr_pool_map()) Failed to map mr 10/11 elements
2017-10-05 06:55:32 [488964.199374] LNetError: 174158:0:(o2iblnd_cb.c:590:kiblnd_fmr_map_tx()) Can't map 41033 pages: -22
2017-10-05 06:55:32 [488964.210361] LNetError: 174158:0:(o2iblnd_cb.c:1582:kiblnd_send()) Can't setup GET sink for 172.19.1.112@o2ib100: -22
2017-10-05 06:55:32 [488964.223175] LustreError: 174158:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff883ebaa9bf00
2017-10-05 06:57:12 [489064.231425] LustreError: 174158:0:(ldlm_lib.c:3186:target_bulk_io()) @@@ timeout on bulk WRITE after 100+0s req@ffff883f27239c50 x1579913603914692/t0(0) o1000->lquake-MDT0001-mdtlov_UUID@172.19.1.112@o2ib100:-1/-1 lens 352/0 e 2 to 0 dl 1507211843 ref 1 fl Interpret:/0/ffffffff rc 0/-1

jet2:
2017-10-05 06:57:12 [489015.158245] LustreError: 11-0: lquake-MDT0000-osp-MDT0001: operation out_update to node 172.19.1.111@o2ib100 failed: rc = -110
2017-10-05 06:57:12 [489015.173453] LustreError: 16866:0:(layout.c:2025:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)
2017-10-05 06:57:12 [489015.173453] req@ffff882b342cdd00 x1579913603914692/t0(0) o1000->lquake-MDT0000-osp-MDT0001@172.19.1.111@o2ib100:24/4 lens 352/192 e 2 to 0 dl 1507211888 ref 2 fl Interpret:ReM/0/0 rc -110/-110
2017-10-05 06:57:16 [489019.757615] LustreError: 17382:0:(llog_cat.c:744:llog_cat_cancel_records()) lquake-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog-records: rc = -116

The nodes use ntp to sync, and all report they are within the same 1/20th of a second or so. |
| Comment by Amir Shehata (Inactive) [ 05/Oct/17 ] |
|
Can you try: https://review.whamcloud.com/29290 |
| Comment by Olaf Faaland [ 05/Oct/17 ] |
Yes, I will. Creating remote directories seems to hang, then fail, and trigger these messages. Is there likely a separate problem, or does it make sense that these two symptoms would be connected? |
| Comment by Amir Shehata (Inactive) [ 05/Oct/17 ] |
|
I was able to reproduce this issue on mlx5 as well. It looks to be due to: Will need to investigate this. |
| Comment by Olaf Faaland [ 06/Oct/17 ] |
|
Amir, OK. Please check our patch stack to see what we've got. A pointer is under "Environment" above.

With patch 29290, I see the following in the console logs of the MDSs, on a quiet filesystem:

2017-10-05 18:19:01 [ 1342.476344] LustreError: 15171:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff887f17347e00
2017-10-05 18:19:01 [ 1342.481443] mlx5_0:dump_cqe:262:(pid 15170): dump error cqe
2017-10-05 18:19:01 [ 1342.481444] 00000000 00000000 00000000 00000000
2017-10-05 18:19:01 [ 1342.481445] 00000000 00000000 00000000 00000000
2017-10-05 18:19:01 [ 1342.481445] 00000000 00000000 00000000 00000000
2017-10-05 18:19:01 [ 1342.481445] 00000000 9d005304 0800180b 000ff6d2
2017-10-05 18:19:01 [ 1342.481630] LustreError: 15170:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff887f1cb76800
2017-10-05 18:19:01 [ 1342.486614] mlx5_0:dump_cqe:262:(pid 15170): dump error cqe
2017-10-05 18:19:01 [ 1342.486617] 00000000 00000000 00000000 00000000
2017-10-05 18:19:01 [ 1342.486618] 00000000 00000000 00000000 00000000
2017-10-05 18:19:01 [ 1342.486619] 00000000 00000000 00000000 00000000
2017-10-05 18:19:01 [ 1342.486620] 00000000 9d005304 0800180c 000ef0d2
2017-10-05 18:19:01 [ 1342.486810] LustreError: 15170:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff887f17347a00 |
| Comment by James A Simmons [ 06/Oct/17 ] |
|
Olaf, are you carrying the |
| Comment by Olaf Faaland [ 07/Oct/17 ] |
|
James: |
| Comment by Olaf Faaland [ 09/Oct/17 ] |
|
Amir, |
| Comment by Amir Shehata (Inactive) [ 10/Oct/17 ] |
|
Olaf, yes. We fixed this issue for MLX-5, but the same fix doesn't work on MLX-4. I can upload a test patch for MLX-5 for now, since I think that is what you're using, correct? |
| Comment by Olaf Faaland [ 10/Oct/17 ] |
|
Amir, thanks. We have both MLX-4 and MLX-5 hardware in our center, so we need both to work. You're correct: the symptoms reported in this issue appear on servers with MLX-5. The filesystem is mounted through routers with MLX-5 and OmniPath, by clients using OmniPath (based on the loaded drivers, mlx5_core/ib). I can give your existing patch a try if it would be helpful. |
| Comment by Gerrit Updater [ 10/Oct/17 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/29551 |
| Comment by Amir Shehata (Inactive) [ 10/Oct/17 ] |
|
OK, so I think we have it resolved. There are three different fixes: Could you try these three patches and see if they resolve your issue? |
| Comment by Olaf Faaland [ 11/Oct/17 ] |
|
I ran on an MLX-5 machine with good success. I'll test on MLX-4 and OPA this morning. |
| Comment by Olaf Faaland [ 11/Oct/17 ] |
|
With brief testing, I see no errors on MLX-4 and OPA machines. What is the next step? |
| Comment by Amir Shehata (Inactive) [ 11/Oct/17 ] |
|
I believe I'll push for landing these three patches, as they stabilize master as well. I'm not sure how you will pick up the patches. Do you maintain your own tree, or would you need these patches backported? |
| Comment by Olaf Faaland [ 11/Oct/17 ] |
|
Backport them to 2.8fe, please. They will likely apply cleanly. The relationship between our stack and 2.8fe is complicated, but they are not that different, and we will be switching to 2.8fe plus a small stack of commits very soon. |
| Comment by Olaf Faaland [ 12/Oct/17 ] |
|
Note that we cannot land these patches to our production tree until they are through your review and testing process, and are merged to master at a minimum. Let me know if there's anything I can do to help that along. |
| Comment by Amir Shehata (Inactive) [ 13/Oct/17 ] |
|
Some notes:
So the best solution is to: The three patches I described earlier seem like the ideal solution for now. |
| Comment by Olaf Faaland [ 16/Oct/17 ] |
|
This appears to be working well in my tests. |
| Comment by Olaf Faaland [ 19/Oct/17 ] |
|
Hi Amir, I see https://review.whamcloud.com/#/c/29551/ has status "Ready to land", but hasn't been landed. Is there further work needed, or is it just waiting for the next time a set of patches get merged? Thanks. |
| Comment by James A Simmons [ 19/Oct/17 ] |
|
There is a question about querying the IB device to see whether it supports IB_MR_TYPE_SG_GAPS, instead of assuming that IB_MR_TYPE_SG_GAPS is always available. |
| Comment by Amir Shehata (Inactive) [ 19/Oct/17 ] |
|
I updated https://review.whamcloud.com/#/c/29551/ to address the comments. One thing to note: if you're using OPA, you should set map-on-demand to 256. I'm still analyzing this issue and hopefully will have a patch soon. This issue is tracked under |
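For reference, map-on-demand is a module parameter of the ko2iblnd LNet driver, so the setting Amir suggests would normally be applied through a modprobe options line. The file path below is only the common convention, not something prescribed in this ticket:

```
# /etc/modprobe.d/ko2iblnd.conf  (illustrative placement)
options ko2iblnd map_on_demand=256
```

The node needs the ko2iblnd module reloaded (or a reboot) for the new value to take effect.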
| Comment by Gerrit Updater [ 24/Oct/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29551/ |
| Comment by Peter Jones [ 24/Oct/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 25/Oct/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29771 |
| Comment by Minh Diep [ 02/Feb/18 ] |
|
We don't need this in 2.10.