[LU-9461] lustre client mount fail after update IB driver and Lustre patch. Created: 08/May/17 Updated: 18/Sep/17 Resolved: 30/Aug/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | sebg-crd-pm (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | llnl |
| Environment: |
CentOS7.3 |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The Lustre client mount fails after updating the IB driver and applying the Lustre client patch (mount failure error message
| Comments |
| Comment by Peter Jones [ 08/May/17 ] |
|
Hi there. Doug, do you have any suggestions here? Peter
| Comment by Doug Oucharek (Inactive) [ 09/May/17 ] |
|
The "Async QP event type 3" is an IB_EVENT_QP_ACCESS_ERR. This error stops the connection from proceeding and explains all the errors that follow in your logs. This error happens when a call to ib_create_qp() fails. The call could fail if the version of the parameters being passed in is wrong (i.e., the size of the parameter structure is incorrect). This could be related to the other issue I am currently working on, which involves MOFED 4: if MOFED 4 has changed the structure we pass as a parameter to this call and we have not adapted to that change, we could see an error like this. Does this error happen on each mount attempt, or was it a one-off?
| Comment by sebg-crd-pm (Inactive) [ 10/May/17 ] |
|
This error happens on each mount attempt. (The test Lustre file system servers run OFED 3.4.)
| Comment by Doug Oucharek (Inactive) [ 16/May/17 ] |
|
I ran into this "Async" error at the same time as the issues I talk about in |
| Comment by Li Dongyang (Inactive) [ 17/May/17 ] |
|
I think it's my fault.
diff --git a/lnet/klnds/o2iblnd/o2iblnd.c b/lnet/klnds/o2iblnd/o2iblnd.c
index 047fe3c..ba7829b 100644
--- a/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1900,8 +1900,6 @@ again:
return n < 0 ? n : -EINVAL;
}
- mr->iova = iov;
-
wr = &frd->frd_fastreg_wr;
memset(wr, 0, sizeof(*wr));
|
| Comment by Doug Oucharek (Inactive) [ 17/May/17 ] |
|
I'm curious: why would you not want to set the mr->iova value? Is this an unneeded step?
| Comment by Li Dongyang (Inactive) [ 18/May/17 ] |
|
Hi Doug, I believe the issue only applies to mlx5 cards using MOFED4. In MOFED4, mr->iova is set by ib_map_mr_sg() -> ib_sg_to_pages(), so it doesn't make sense to reset mr->iova after calling ib_map_mr_sg(). That line of code was introduced to address a similar issue; see the comments on https://review.whamcloud.com/#/c/19168/. I've also done some testing using MOFED4 + lustre-release with mlx4 cards forcing fast reg, and so far I've seen no problems.
| Comment by sebg-crd-pm (Inactive) [ 18/May/17 ] |
|
I can mount Lustre OK (2.9.57_69_g0bc1964 + Is it OK to apply only
| Comment by Doug Oucharek (Inactive) [ 18/May/17 ] |
|
Li: Thank you for the information. I checked and you are right: ib_map_mr_sg() does set mr->iova, so that line is not needed (and could cause a problem). I will update
sebg-crd-pm: Correct, you only need --
| Comment by Giuseppe Di Natale (Inactive) [ 29/Aug/17 ] |
|
We have just encountered this issue as well. Is it possible to have |
| Comment by Peter Jones [ 30/Aug/17 ] |
|
dinatale2, can you please open a new ticket to track this request?
| Comment by sebg-crd-pm (Inactive) [ 08/Sep/17 ] |
|
Hi, I got an error creating a striped directory in Lustre 2.10 with:
[root@hsm client]# lfs mkdir -c 2 dir1
Should I apply any other patch for this issue? Thanks
| Comment by Peter Jones [ 08/Sep/17 ] |
|
This issue is being tracked under |