[LU-6723] Setting map_on_demand for o2iblnd driver prevents lustre bring up. Created: 15/Jun/15 Updated: 16/Dec/15 Resolved: 16/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0, Lustre 2.5.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James A Simmons | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | lnet |
| Environment: |
Cray routers running SLES11 SP3. Found that this issue exists in all Lustre versions. |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While testing setting map_on_demand with the patch from |
| Comments |
| Comment by James A Simmons [ 15/Jun/15 ] |
|
As a note this happened also when the patch from |
| Comment by Andreas Dilger [ 16/Jun/15 ] |
|
James, could you please provide a bit more information about what you mean by "Cray routers prevented Lustre from functioning"? Any errors in the logs? Does "lctl ping" work? Does the IB-level network testing still work? Is Cray running a customized OFED? It may be that this isn't a Lustre/LNet problem at all. |
| Comment by James A Simmons [ 17/Jun/15 ] |
|
These are the errors that appeared on the OSS nodes when I enabled map_on_demand on the Cray routers: 00000020:02000400:10.0:1433178825.928974:0:28309:0:(tgt_handler.c:1834:tgt_brw_read()) sultan-OST0034: Bulk IO read error with b9cf5051-0ff9-6cf9-cd67-9364a2516176 (at 30@gni1), client will retry: rc -110 The Cray routers are using the mlx5 driver from the OFED 3.12 stack. To work out what the problem actually is, I need to collect logs from the routers. The OSS bulk timeouts are only a symptom of the real problem. |
| Comment by James A Simmons [ 22/Jun/15 ] |
|
As a small note, the OSS that also had problems when map_on_demand was enabled was running RHEL6.5 with the default distro InfiniBand stack, so it is not an InfiniBand issue. |
| Comment by Jian Yu [ 08/Oct/15 ] |
|
Hi James, |
| Comment by James A Simmons [ 08/Oct/15 ] |
|
I haven't tried in a while. Will do. |
| Comment by James A Simmons [ 16/Dec/15 ] |
|
Just tried it. Now that the OFED stack has been updated to a newer 3.12, the mlx5 driver no longer supports FMR, so this issue has gone away. I will be trying the |
| Comment by Jian Yu [ 16/Dec/15 ] |
|
Thank you James. |