Details
Type: Bug
Resolution: Duplicate
Priority: Critical
Environment: Lustre 2.10.1-RC1 + OFED 4.0, Mellanox IB EDR (MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.tgz); tested with 2 MDS servers (1 MDT per server)
Severity: 3
Description
Hi,
I tried the OFED 4.0 driver to fix some IB errors seen during a Lustre I/O stress test, but got an error when creating a striped directory.
I have tested both 2.10.1-RC1 and Lustre 2.10.0 (+ LU-9500 patch), and both show the same error.
It looks like something goes wrong in the communication between MDTs on different nodes
(MDTs on the same node have no problem).
// The two MDTs must be on different servers
[root@hsm client]# lfs mkdir -c 2 dir1
error on LL_IOC_LMV_SETSTRIPE 'dir1' (3): Input/output error
error: mkdir: create stripe dir 'dir1' failed
All OSP states are normal:
[mdt1 server]
./osp/jlustre-MDT0000-osp-MDT0001/state:current_state: FULL
./osp/jlustre-MDT0000-osp-MDT0001/import: state: FULL
./osp/jlustre-OST0000-osc-MDT0001/state:current_state: FULL
./osp/jlustre-OST0000-osc-MDT0001/import: state: FULL
[mdt0 server]
./osp/jlustre-MDT0001-osp-MDT0000/state:current_state: FULL
./osp/jlustre-MDT0001-osp-MDT0000/import: state: FULL
./osp/jlustre-OST0000-osc-MDT0000/state:current_state: FULL
./osp/jlustre-OST0000-osc-MDT0000/import: state: FULL
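Every import listed above is FULL, yet the cross-MDT update still fails, so a FULL import on its own does not show that bulk transfers between the MDTs work. When many targets have to be checked, the comparison can be done mechanically. The following is a minimal sketch, assuming the output has been captured as text in exactly the format shown above (the report does not say which command produced it); `osp_import_states` and `not_full` are hypothetical helper names.

```python
import re

# Matches the "import" lines shown above, e.g.
# "./osp/jlustre-MDT0001-osp-MDT0000/import: state: FULL"
IMPORT_RE = re.compile(r"^\./osp/(?P<target>[^/]+)/import:\s+state:\s+(?P<state>\S+)")


def osp_import_states(text):
    """Return {osp_target: import_state} for every import line in `text`."""
    states = {}
    for line in text.splitlines():
        m = IMPORT_RE.match(line.strip())
        if m:
            states[m.group("target")] = m.group("state")
    return states


def not_full(states):
    """Return the OSP targets whose import state is anything other than FULL."""
    return sorted(t for t, s in states.items() if s != "FULL")


# Output copied from the mdt0 server listing above.
mdt0_output = """\
./osp/jlustre-MDT0001-osp-MDT0000/state:current_state: FULL
./osp/jlustre-MDT0001-osp-MDT0000/import: state: FULL
./osp/jlustre-OST0000-osc-MDT0000/state:current_state: FULL
./osp/jlustre-OST0000-osc-MDT0000/import: state: FULL
"""

states = osp_import_states(mdt0_output)
print(states)
print(not_full(states))  # []
```

The `state:current_state:` lines are deliberately skipped because the regex only accepts the `/import:` form, so each target is counted once.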
[/var/log/messages in mdt0 server]
Sep 19 02:50:14 ossb2 kernel: LNetError: 21764:0:(o2iblnd.c:1940:kiblnd_fmr_pool_map()) Failed to map mr 10/11 elements
Sep 19 02:50:14 ossb2 kernel: LNetError: 21764:0:(o2iblnd_cb.c:560:kiblnd_fmr_map_tx()) Can't map 41033 pages: -22
Sep 19 02:50:14 ossb2 kernel: LNetError: 21764:0:(o2iblnd_cb.c:1554:kiblnd_send()) Can't setup GET sink for 172.20.110.209@o2ib: -22
Sep 19 02:50:14 ossb2 kernel: LustreError: 21764:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff88086ea2e400
Sep 19 02:51:54 ossb2 kernel: LustreError: 21764:0:(ldlm_lib.c:3237:target_bulk_io()) @@@ timeout on bulk WRITE after 100+0s req@ffff880457208c50 x1578948605516272/t0(0) o1000->jlustre-MDT0001-mdtlov_UUID@172.20.110.209@o2ib:210/0 lens 376/0 e 4 to 0 dl 1505803920 ref 1 fl Interpret:/0/ffffffff rc 0/-1
[/var/log/messages in mdt1 server]
Sep 19 14:51:22 ossb1 kernel: LustreError: 11-0: jlustre-MDT0000-osp-MDT0001: operation out_update to node 172.20.110.210@o2ib failed: rc = -110
Sep 19 14:51:22 ossb1 kernel: LustreError: 31069:0:(layout.c:2085:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)#012 req@ffff8807d3aa7800 x1578948605516272/t0(0) o1000->jlustre-MDT0000-osp-MDT0001@172.20.110.210@o2ib:24/4 lens 376/192 e 4 to 0 dl 1505803889 ref 2 fl Interpret:ReM/0/0 rc -110/-110
Sep 19 14:51:24 ossb1 kernel: LustreError: 30780:0:(llog_cat.c:773:llog_cat_cancel_records()) jlustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog
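The same request xid, x1578948605516272, appears in the mdt0 bulk-timeout line and in the mdt1 OUT_UPDATE failure, which suggests both servers are logging the same cross-MDT update RPC. The wall-clock timestamps are about 12 hours apart (02:50 vs 14:51), so matching on xid is more reliable than matching on time. The following is a minimal sketch for pulling xids and negative return codes out of such lines, assuming the formats shown above; the regexes and the helpers `parse_line` and `shared_xids` are hypothetical, and the errno table covers only the three Linux values that appear here (5 = EIO, 22 = EINVAL, 110 = ETIMEDOUT).

```python
import re

# Linux errno values for the negative codes that appear in the logs above.
# Only these three are listed; anything else is reported as "errno N".
LINUX_ERRNO = {5: "EIO", 22: "EINVAL", 110: "ETIMEDOUT"}

# Request xid, e.g. "x1578948605516272/t0(0)".
XID_RE = re.compile(r"\bx(\d+)/")
# Negative codes written as "rc = -110", "rc -110", "status -5" or ": -22".
RC_RE = re.compile(r"(?:rc = |rc |status |: )-(\d+)\b")


def parse_line(line):
    """Return (xid or None, [errno names]) for one kernel log line."""
    m = XID_RE.search(line)
    names = [LINUX_ERRNO.get(int(c), f"errno {c}") for c in RC_RE.findall(line)]
    return (m.group(1) if m else None), names


def shared_xids(log_a, log_b):
    """Return the request xids that appear in both servers' logs."""
    xids_a = {parse_line(line)[0] for line in log_a} - {None}
    xids_b = {parse_line(line)[0] for line in log_b} - {None}
    return sorted(xids_a & xids_b)


# Trimmed excerpts of the mdt0 and mdt1 log lines above.
mdt0_log = [
    "kiblnd_send()) Can't setup GET sink for 172.20.110.209@o2ib: -22",
    "server_bulk_callback()) event type 5, status -5, desc ffff88086ea2e400",
    "target_bulk_io()) @@@ timeout on bulk WRITE after 100+0s "
    "req@ffff880457208c50 x1578948605516272/t0(0) o1000->"
    "jlustre-MDT0001-mdtlov_UUID@172.20.110.209@o2ib:210/0",
]
mdt1_log = [
    "jlustre-MDT0000-osp-MDT0001: operation out_update to node "
    "172.20.110.210@o2ib failed: rc = -110",
    "req@ffff8807d3aa7800 x1578948605516272/t0(0) o1000->"
    "jlustre-MDT0000-osp-MDT0001@172.20.110.210@o2ib:24/4 "
    "lens 376/192 e 4 to 0 dl 1505803889 ref 2 fl Interpret:ReM/0/0 rc -110/-110",
]

for line in mdt0_log + mdt1_log:
    print(parse_line(line))
print(shared_xids(mdt0_log, mdt1_log))  # ['1578948605516272']
```

The sketch only decodes the first code after `rc`, so a pair such as `rc -110/-110` yields a single ETIMEDOUT entry.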
Issue Links
- duplicates LU-9958 Create striped directory fail in 2.10(with LU-9500 patch) (Resolved)