[LU-2977] network connection rejected due to consumer defined fatal error Created: 17/Mar/13 Updated: 12/Jan/19 Resolved: 12/Jan/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Liang Zhen (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS6.3 Lustre-2.1.3 |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 7259 |
| Description |
|
When a client mounts the Lustre file system, we see the following error. What does "consumer defined fatal error" mean, and why is this connection rejected?

Mar 17 08:42:52 r3169 kernel: Lustre: Lustre: Build Version: RC2--PRISTINE-2.6.32-279.19.1.el6.x86_64
Mar 17 08:42:52 r3169 kernel: Lustre: Added LNI 10.9.55.1@o2ib3 [8/64/0/180]
Mar 17 08:42:53 r3169 kernel: Lustre: Lustre OSC module (ffffffffa0e9b880).
Mar 17 08:42:53 r3169 kernel: Lustre: Lustre LOV module (ffffffffa0f2dce0).
Mar 17 08:42:53 r3169 kernel: Lustre: Lustre client module (ffffffffa1019020).
Mar 17 08:42:53 r3169 kernel: Lustre: MGC10.9.103.1@o2ib3: Reactivating import
Mar 17 08:42:53 r3169 kernel: LustreError: 929:0:(o2iblnd_cb.c:2569:kiblnd_rejected()) 10.9.102.38@o2ib3 rejected: consumer defined fatal error
Mar 17 08:42:53 r3169 kernel: Lustre: 3305:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1363470173/real 1363470173] req@ffff88062749b400 x1429702099075137/t0(0) o8->images-OST002e-osc-ffff880865647000@10.9.102.38@o2ib3:28/4 lens 368/512 e 0 to 1 dl 1363470178 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 17 08:43:43 r3169 kernel: LustreError: 11-0: an error occurred while communicating with 10.9.102.37@o2ib3. The ost_connect operation failed with -19
Mar 17 08:44:08 r3169 kernel: LustreError: 927:0:(o2iblnd_cb.c:2569:kiblnd_rejected()) 10.9.102.38@o2ib3 rejected: consumer defined fatal error
Mar 17 08:44:08 r3169 kernel: Lustre: 3305:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1363470248/real 1363470248] req@ffff881064484800 x1429702099075267/t0(0) o8->images-OST002e-osc-ffff880865647000@10.9.102.38@o2ib3:28/4 lens 368/512 e 0 to 1 dl 1363470259 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 17 08:44:33 r3169 kernel: LustreError: 11-0: an error occurred while communicating with 10.9.102.37@o2ib3. The ost_connect operation failed with -19
Mar 17 08:44:58 r3169 kernel: LustreError: 935:0:(o2iblnd_cb.c:2569:kiblnd_rejected()) 10.9.102.38@o2ib3 rejected: consumer defined fatal error
Mar 17 08:44:58 r3169 kernel: Lustre: 3305:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1363470298/real 1363470298] req@ffff88100a3e6800 x1429702099078280/t0(0) o8->images-OST002e-osc-ffff880865647000@10.9.102.38@o2ib3:28/4 lens 368/512 e 0 to 1 dl 1363470314 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 17 08:45:23 r3169 kernel: LustreError: 11-0: an error occurred while communicating with 10.9.102.37@o2ib3. The ost_connect operation failed with -19
Mar 17 08:45:48 r3169 kernel: LustreError: 935:0:(o2iblnd_cb.c:2569:kiblnd_rejected()) 10.9.102.38@o2ib3 rejected: consumer defined fatal error
Mar 17 08:45:48 r3169 kernel: Lustre: 3305:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1363470348/real 1363470348] req@ffff88100a019000 x1429702099078408/t0(0) o8->images-OST002e-osc-ffff880865647000@10.9.102.38@o2ib3:28/4 lens 368/512 e 0 to 1 dl 1363470369 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Mar 17 08:46:13 r3169 kernel: LustreError: 11-0: an error occurred while communicating with 10.9.102.37@o2ib3. The ost_connect operation failed with -19
Mar 17 08:46:38 r3169 kernel: LustreError: 935:0:(o2iblnd_cb.c:2569:kiblnd_rejected()) 10.9.102.38@o2ib3 rejected: consumer defined fatal error |
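When triaging a log like the one in the description, it can help to tally which peer NIDs are rejecting connections and which operations are failing, since here the rejections come from 10.9.102.38@o2ib3 while the ost_connect failures are reported against 10.9.102.37@o2ib3. The following is a minimal sketch, not a Lustre tool: the function name `summarize` and both regular expressions are hypothetical helpers written only against the message shapes visible in this ticket, and they may need adjusting for other Lustre versions.

```python
import re
from collections import Counter

# Hypothetical pattern for o2iblnd rejection lines, e.g.
# "...kiblnd_rejected()) 10.9.102.38@o2ib3 rejected: consumer defined fatal error"
REJECT_RE = re.compile(r"kiblnd_rejected\(\)\)\s+(\S+)\s+rejected:\s+(.+?)\s*$")

# Hypothetical pattern for "LustreError: 11-0" lines, e.g.
# "...communicating with 10.9.102.37@o2ib3. The ost_connect operation failed with -19"
OP_FAIL_RE = re.compile(
    r"communicating with (\S+?)\.\s+The (\w+) operation failed with (-?\d+)"
)

# Two kinds of messages copied from the description of this ticket.
SAMPLE_LOG = [
    "Mar 17 08:42:53 r3169 kernel: LustreError: 929:0:(o2iblnd_cb.c:2569:kiblnd_rejected()) "
    "10.9.102.38@o2ib3 rejected: consumer defined fatal error",
    "Mar 17 08:43:43 r3169 kernel: LustreError: 11-0: an error occurred while communicating "
    "with 10.9.102.37@o2ib3. The ost_connect operation failed with -19",
    "Mar 17 08:44:08 r3169 kernel: LustreError: 927:0:(o2iblnd_cb.c:2569:kiblnd_rejected()) "
    "10.9.102.38@o2ib3 rejected: consumer defined fatal error",
]


def summarize(lines):
    """Count rejections per (peer NID, reason) and failures per (peer NID, op, rc)."""
    rejects = Counter()
    failures = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            rejects[(match.group(1), match.group(2))] += 1
            continue
        match = OP_FAIL_RE.search(line)
        if match:
            failures[(match.group(1), match.group(2), int(match.group(3)))] += 1
    return rejects, failures


if __name__ == "__main__":
    rejects, failures = summarize(SAMPLE_LOG)
    for (nid, reason), count in rejects.items():
        print(f"{nid} rejected {count} time(s): {reason}")
    for (nid, op, rc), count in failures.items():
        print(f"{op} to {nid} failed {count} time(s) with rc={rc}")
```

To run it over a real log, pass an open file object (for example a saved copy of the kernel messages) to `summarize` instead of `SAMPLE_LOG`. The summary only shows where to look next; it does not explain why a peer rejected the connection.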
| Comments |
| Comment by Peter Jones [ 17/Mar/13 ] |
|
Liang, could you please advise on this one? Thanks, Peter |
| Comment by Isaac Huang (Inactive) [ 20/Mar/13 ] |
|
That's a weird error - client didn't seem to recognize the magic number in reject messages. Was there any error showing up on 10.9.102.38@o2ib3? |
| Comment by Cory Spitz [ 07/Jun/17 ] |
|
From lustre-discuss@lists.lustre.org 4/25/2017: Regarding: Andreas Dilger noted:
Doug Oucharek responded:
|
| Comment by Cory Spitz [ 07/Jun/17 ] |
|
FYI: About the lustre-discuss conversation – it was determined to be a failing IB subnet manager. While sminfo reported good health, a more formal check of the manager proved that it was faulty. |