Lustre / LU-10010

Creating a striped directory fails in 2.10.1-RC1


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • None
    • None
    • Lustre 2.10.1-RC1 + OFED 4.0, Mellanox IB EDR + MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.tgz. Tested with 2 MDS servers (1 MDT per server).
    • 3
    • 9223372036854775807

    Description

      Hi,
      I tried the OFED 4.0 driver to fix some IB errors seen during a Lustre I/O stress test, but creating a striped directory now fails.
      I have tested 2.10.1-RC1, and Lustre 2.10.0 (+ LU-9500 patch) shows the same error.
      It looks like something is wrong with the communication between MDTs on different nodes.
      (When all MDTs are on the same node, there is no problem.)

      // The two MDTs must be on different servers
      [root@hsm client]# lfs mkdir -c 2 dir1
      error on LL_IOC_LMV_SETSTRIPE 'dir1' (3): Input/output error
      error: mkdir: create stripe dir 'dir1' failed

      All OSP states are normal
      [mdt1 server]
      ./osp/jlustre-MDT0000-osp-MDT0001/state:current_state: FULL
      ./osp/jlustre-MDT0000-osp-MDT0001/import: state: FULL
      ./osp/jlustre-OST0000-osc-MDT0001/state:current_state: FULL
      ./osp/jlustre-OST0000-osc-MDT0001/import: state: FULL
      [mdt0 server]
      ./osp/jlustre-MDT0001-osp-MDT0000/state:current_state: FULL
      ./osp/jlustre-MDT0001-osp-MDT0000/import: state: FULL
      ./osp/jlustre-OST0000-osc-MDT0000/state:current_state: FULL
      ./osp/jlustre-OST0000-osc-MDT0000/import: state: FULL
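The "all states are FULL" claim above can be checked mechanically. The following is a minimal sketch, assuming the listing is grep-style output of the form `path:key: value` as pasted in this report; the helper name `parse_states` is hypothetical and not part of any Lustre tool.

```python
# Lines copied verbatim from the listing in this report.
OSP_STATE_LINES = """\
./osp/jlustre-MDT0000-osp-MDT0001/state:current_state: FULL
./osp/jlustre-MDT0000-osp-MDT0001/import: state: FULL
./osp/jlustre-OST0000-osc-MDT0001/state:current_state: FULL
./osp/jlustre-OST0000-osc-MDT0001/import: state: FULL
./osp/jlustre-MDT0001-osp-MDT0000/state:current_state: FULL
./osp/jlustre-MDT0001-osp-MDT0000/import: state: FULL
./osp/jlustre-OST0000-osc-MDT0000/state:current_state: FULL
./osp/jlustre-OST0000-osc-MDT0000/import: state: FULL
"""


def parse_states(text):
    """Return {(device, file_name): state} for each non-empty line.

    Hypothetical helper: assumes each line looks like
    './osp/<device>/<file>:<key>: <STATE>'.
    """
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split the path from the "key: value" part at the first colon.
        path, _, rest = line.partition(":")
        device, file_name = path.split("/")[2:4]
        # The state is whatever follows the last colon.
        state = rest.rsplit(":", 1)[-1].strip()
        states[(device, file_name)] = state
    return states


states = parse_states(OSP_STATE_LINES)
not_full = {key: value for key, value in states.items() if value != "FULL"}

if not_full:
    for (device, file_name), state in sorted(not_full.items()):
        print(f"{device} ({file_name}): {state}")
else:
    print(f"all {len(states)} entries report FULL")
```

Because every import is FULL on both servers, the connection state alone does not explain the failure; the bulk transfer errors in the logs are the more likely place to look.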

      [/var/log/messages on mdt0 server]
      Sep 19 02:50:14 ossb2 kernel: LNetError: 21764:0:(o2iblnd.c:1940:kiblnd_fmr_pool_map()) Failed to map mr 10/11 elements
      Sep 19 02:50:14 ossb2 kernel: LNetError: 21764:0:(o2iblnd_cb.c:560:kiblnd_fmr_map_tx()) Can't map 41033 pages: -22
      Sep 19 02:50:14 ossb2 kernel: LNetError: 21764:0:(o2iblnd_cb.c:1554:kiblnd_send()) Can't setup GET sink for 172.20.110.209@o2ib: -22
      Sep 19 02:50:14 ossb2 kernel: LustreError: 21764:0:(events.c:449:server_bulk_callback()) event type 5, status -5, desc ffff88086ea2e400
      Sep 19 02:51:54 ossb2 kernel: LustreError: 21764:0:(ldlm_lib.c:3237:target_bulk_io()) @@@ timeout on bulk WRITE after 100+0s req@ffff880457208c50 x1578948605516272/t0(0) o1000->jlustre-MDT0001-mdtlov_UUID@172.20.110.209@o2ib:210/0 lens 376/0 e 4 to 0 dl 1505803920 ref 1 fl Interpret:/0/ffffffff rc 0/-1
      [/var/log/messages on mdt1 server]
      Sep 19 14:51:22 ossb1 kernel: LustreError: 11-0: jlustre-MDT0000-osp-MDT0001: operation out_update to node 172.20.110.210@o2ib failed: rc = -110
      Sep 19 14:51:22 ossb1 kernel: LustreError: 31069:0:(layout.c:2085:__req_capsule_get()) @@@ Wrong buffer for field `object_update_reply' (1 of 1) in format `OUT_UPDATE': 0 vs. 4096 (server)#012 req@ffff8807d3aa7800 x1578948605516272/t0(0) o1000->jlustre-MDT0000-osp-MDT0001@172.20.110.210@o2ib:24/4 lens 376/192 e 4 to 0 dl 1505803889 ref 2 fl Interpret:ReM/0/0 rc -110/-110
      Sep 19 14:51:24 ossb1 kernel: LustreError: 30780:0:(llog_cat.c:773:llog_cat_cancel_records()) jlustre-MDT0000-osp-MDT0001: fail to cancel 1 of 1 llog
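The numeric codes in the logs above are easier to follow once translated. This is a minimal sketch, assuming the negative values are standard Linux errno numbers (the usual kernel convention); the table is hardcoded so the result does not depend on the platform running the script, and the log excerpts are shortened copies of the lines in this report.

```python
import re

# Linux errno values that appear in the logs of this report.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Shortened excerpts of the log lines quoted in this report.
LOG_EXCERPTS = [
    "kiblnd_fmr_map_tx()) Can't map 41033 pages: -22",
    "kiblnd_send()) Can't setup GET sink for 172.20.110.209@o2ib: -22",
    "server_bulk_callback()) event type 5, status -5, desc ffff88086ea2e400",
    "operation out_update to node 172.20.110.210@o2ib failed: rc = -110",
]

# A negative code that follows "status ", "rc = ", or ": ".
CODE_RE = re.compile(r"(?:status |rc = |: )-(\d+)\b")


def decode(line):
    """Return (code, name, message) for the first errno found, else None."""
    match = CODE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name, message = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
    return code, name, message


decoded = [decode(line) for line in LOG_EXCERPTS]
for line, result in zip(LOG_EXCERPTS, decoded):
    code, name, message = result
    print(f"-{code:<3} {name:<9} {message:<22} <- {line}")
```

Read this way, the sequence is: the FMR mapping on mdt0 is rejected with EINVAL, the bulk transfer then fails with EIO, and mdt1 eventually gives up on `out_update` with ETIMEDOUT. The client's "Input/output error" is the message text for EIO.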

            People

              wc-triage WC Triage
              sebg-crd-pm sebg-crd-pm (Inactive)