Lustre / LU-1105

File creation fails with Input/output error due to MDT-OST reconnections


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Environment: Lustre 2.1 with Bull patches, bullxlinux6.1 x86_64 (based on Red Hat 6.1);
      file system formatted with Lustre 2.0.
    • 3
    • 6457

    Description

      After creating hundreds of files, file creation fails with "Input/output error".
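The "Input/output error" seen by the application is EIO. The negative return codes in the log extracts of this report are Linux errno values with the sign flipped, the usual kernel convention. A minimal Python sketch for decoding the three codes that appear, assuming Linux numbering: the table is hard-coded from the Linux asm-generic errno headers so the result does not depend on the host running the script, and LINUX_ERRNO and decode_rc are hypothetical names, not part of any Lustre tool.

```python
# Linux errno numbers (asm-generic/errno-base.h, asm-generic/errno.h) for the
# codes seen in this report. Hard-coded on purpose: other kernels number them
# differently (EALREADY is 37 on the BSDs, for example).
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    11: ("EAGAIN", "Resource temporarily unavailable"),
    114: ("EALREADY", "Operation already in progress"),
}


def decode_rc(rc: int) -> str:
    """Turn a negative kernel return code such as -114 into readable text."""
    entry = LINUX_ERRNO.get(-rc)
    if entry is None:
        return f"{rc} (not in table)"
    name, message = entry
    return f"{rc} = -{name} ({message})"


if __name__ == "__main__":
    for rc in (-114, -11, -5):
        print(decode_rc(rc))
```

Read this way, the OSS answers the MDT's ost_connect with -EALREADY, the per-OST sub-object creates fail with -EAGAIN, and the -EIO returned to the user is consistent with the rc = -5 entries at the end of the MDS extract.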

      Here is an extract from the MDS log:

      00000100:02020000:5.0:1329310922.993555:0:3101:0:(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.17.0.4@o2ib. The ost_connect operation failed with -114
      00000100:02020000:5.0:1329310922.993559:0:3101:0:(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.17.0.3@o2ib. The ost_connect operation failed with -114
      00000100:02020000:5.0:1329310922.993563:0:3101:0:(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.17.0.4@o2ib. The ost_connect operation failed with -114
      00000100:02020000:5.0:1329310922.993566:0:3101:0:(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.17.0.3@o2ib. The ost_connect operation failed with -114
      00000100:02020000:5.0:1329310922.993589:0:3101:0:(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.17.0.3@o2ib. The ost_connect operation failed with -114
      00000100:02020000:5.0:1329310922.993592:0:3101:0:(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.17.0.3@o2ib. The ost_connect operation failed with -114
      00000100:02020000:5.0:1329310922.993595:0:3101:0:(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while communicating with 10.17.0.3@o2ib. The ost_connect operation failed with -114
      00020000:00020000:7.0F:1329310937.993023:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 8/30: rc = -11
      00020000:00020000:7.0:1329310937.993032:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 15/30: rc = -11
      00020000:00020000:7.0:1329310937.993035:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 16/30: rc = -11
      00020000:00020000:7.0:1329310937.993037:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 25/30: rc = -11
      00020000:00020000:7.0:1329310937.993040:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 12/30: rc = -11
      00020000:00020000:7.0:1329310937.993042:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 9/30: rc = -11
      00020000:00020000:7.0:1329310937.993044:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 5/30: rc = -11
      00020000:00020000:7.0:1329310937.993046:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 19/30: rc = -11
      00020000:00020000:7.0:1329310937.993049:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 21/30: rc = -11
      00020000:00020000:7.0:1329310937.993051:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 28/30: rc = -11
      00020000:00020000:7.0:1329310937.993053:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 4/30: rc = -11
      00020000:00020000:7.0:1329310937.993055:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 27/30: rc = -11
      00020000:00020000:7.0:1329310937.993057:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 22/30: rc = -11
      00020000:00020000:7.0:1329310937.993060:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 0/30: rc = -11
      00020000:00020000:7.0:1329310937.993062:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 11/30: rc = -11
      00020000:00020000:7.0:1329310937.993064:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 20/30: rc = -11
      00020000:00020000:7.0:1329310937.993066:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 7/30: rc = -11
      00020000:00020000:7.0:1329310937.993068:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 26/30: rc = -11
      00020000:00020000:7.0:1329310937.993071:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 23/30: rc = -11
      00020000:00020000:7.0:1329310937.993073:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 13/30: rc = -11
      00020000:00020000:7.0:1329310937.993075:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 2/30: rc = -11
      00020000:00020000:7.0:1329310937.993077:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 24/30: rc = -11
      00020000:00020000:7.0:1329310937.993080:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 6/30: rc = -11
      00020000:00020000:7.0:1329310937.993082:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 14/30: rc = -11
      00020000:00020000:7.0:1329310937.993084:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 10/30: rc = -11
      00020000:00020000:7.0:1329310937.993086:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 18/30: rc = -11
      00020000:00020000:7.0:1329310937.993090:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 3/30: rc = -11
      00020000:00020000:7.0:1329310937.993100:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 17/30: rc = -11
      00020000:00020000:7.0:1329310937.993103:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 29/30: rc = -11
      00020000:00020000:7.0:1329310937.993109:0:3100:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 1/30: rc = -11
      00020000:00020000:3.0F:1329310937.993138:0:3354:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 7/30: rc = -5
      00020000:00020000:3.0:1329310937.993151:0:3354:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 14/30: rc = -5
      00020000:00020000:3.0:1329310937.993158:0:3354:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 15/30: rc = -5
      00020000:00020000:3.0:1329310937.993164:0:3354:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 24/30: rc = -5
      00020000:00020000:3.0:1329310937.993171:0:3354:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 sub-object on OST idx 11/30: rc = -5
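The lov_update_create_set() lines are easier to read when grouped by return code. A minimal Python sketch that collects the failing indices per (fid, rc) pair; it treats the number after the slash (30 here) as the total the indices range over, and whether that is the OST count or the file's stripe count is an assumption. The function names and the sample list are hypothetical; the sample messages are copied from the extract.

```python
import re
import sys
from collections import defaultdict

# Message emitted by lov_update_create_set() in the MDS extract, e.g.
# "error creating fid 0x6a39 sub-object on OST idx 8/30: rc = -11"
CREATE_ERR = re.compile(
    r"error creating fid (?P<fid>0x[0-9a-f]+) sub-object "
    r"on OST idx (?P<idx>\d+)/(?P<total>\d+): rc = (?P<rc>-?\d+)"
)


def summarize_create_errors(lines):
    """Return (total, {(fid, rc): set of indices}); total is None if no match."""
    failures = defaultdict(set)
    total = None
    for line in lines:
        m = CREATE_ERR.search(line)
        if m is None:
            continue
        total = int(m["total"])
        failures[(m["fid"], int(m["rc"]))].add(int(m["idx"]))
    return total, dict(failures)


def report(lines):
    """One summary line per (fid, rc) pair, most negative rc first."""
    total, failures = summarize_create_errors(lines)
    if total is None:
        return ["no lov_update_create_set() errors found"]
    out = []
    for (fid, rc), idxs in sorted(failures.items()):
        tag = f" (all of 0..{total - 1})" if idxs == set(range(total)) else ""
        out.append(f"fid {fid} rc = {rc}: {len(idxs)}/{total} indices{tag}")
    return out


# Three messages copied from the MDS extract above.
SAMPLE = [
    "error creating fid 0x6a39 sub-object on OST idx 8/30: rc = -11",
    "error creating fid 0x6a39 sub-object on OST idx 15/30: rc = -11",
    "error creating fid 0x6a39 sub-object on OST idx 7/30: rc = -5",
]

if __name__ == "__main__":
    if len(sys.argv) > 1:  # e.g. a saved copy of the dk_MDS attachment
        with open(sys.argv[1]) as f:
            lines = f.readlines()
    else:
        lines = SAMPLE
    print("\n".join(report(lines)))
```

Applied to the full MDS extract, this reports 30/30 indices for rc = -11 (every index from 0 to 29 failed with -EAGAIN for fid 0x6a39) and 5/30 for rc = -5 (indices 7, 11, 14, 15 and 24, in the later burst).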
      

      Here is an extract from the OSS log:

      00010000:00020000:3.0F:1329310843.814130:0:15034:0:(ldlm_lib.c:620:target_handle_reconnect()) scratch-MDT0000-mdtlov_UUID reconnecting from NET_0x500000a110002_UUID, handle mismatch (ours 0x1d014c26baadc6f4, theirs 0x561e1655a7ff215f)
      00010000:00020000:3.0:1329310843.814139:0:15034:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-114)  req@ffff88057cd77000 x1393874236755056/t0(0) o-1-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1329310943 ref 1 fl Interpret:/ffffffff/ffffffff rc -114/-1
      00010000:00020000:3.0:1329310843.814178:0:15034:0:(ldlm_lib.c:620:target_handle_reconnect()) scratch-MDT0000-mdtlov_UUID reconnecting from NET_0x500000a110002_UUID, handle mismatch (ours 0x1d014c26baadc733, theirs 0x561e1655a7ff2189)
      00010000:00020000:7.0F:1329310843.814181:0:15008:0:(ldlm_lib.c:620:target_handle_reconnect()) scratch-MDT0000-mdtlov_UUID reconnecting from NET_0x500000a110002_UUID, handle mismatch (ours 0x1d014c26baadc82f, theirs 0x561e1655a7ff21b3)
      00010000:00020000:3.0:1329310843.814181:0:15034:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-114)  req@ffff88061c7b8000 x1393874236755058/t0(0) o-1-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1329310943 ref 1 fl Interpret:/ffffffff/ffffffff rc -114/-1
      00010000:00020000:7.0:1329310843.814187:0:15008:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-114)  req@ffff8805bd75f400 x1393874236755060/t0(0) o-1-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1329310943 ref 1 fl Interpret:/ffffffff/ffffffff rc -114/-1
      00010000:00020000:7.0:1329310843.814211:0:15008:0:(ldlm_lib.c:620:target_handle_reconnect()) scratch-MDT0000-mdtlov_UUID reconnecting from NET_0x500000a110002_UUID, handle mismatch (ours 0x1d014c26baadc8d7, theirs 0x561e1655a7ff221c)
      00010000:00020000:3.0:1329310843.814211:0:15034:0:(ldlm_lib.c:620:target_handle_reconnect()) scratch-MDT0000-mdtlov_UUID reconnecting from NET_0x500000a110002_UUID, handle mismatch (ours 0x1d014c26baadc859, theirs 0x561e1655a7ff2207)
      00010000:00020000:7.0:1329310843.814215:0:15008:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-114)  req@ffff8805bd4f8400 x1393874236755065/t0(0) o-1-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1329310943 ref 1 fl Interpret:/ffffffff/ffffffff rc -114/-1
      00010000:00020000:4.0F:1329310843.814215:0:15032:0:(ldlm_lib.c:620:target_handle_reconnect()) scratch-MDT0000-mdtlov_UUID reconnecting from NET_0x500000a110002_UUID, handle mismatch (ours 0x1d014c26baadc8c2, theirs 0x561e1655a7ff21dd)
      00010000:00020000:3.0:1329310843.814215:0:15034:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-114)  req@ffff8805cd235850 x1393874236755064/t0(0) o-1-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1329310943 ref 1 fl Interpret:/ffffffff/ffffffff rc -114/-1
      00010000:00020000:4.0:1329310843.814222:0:15032:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-114)  req@ffff8805c0977000 x1393874236755062/t0(0) o-1-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl 1329310943 ref 1 fl Interpret:/ffffffff/ffffffff rc -114/-1
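To line the MDS and OSS extracts up in time, the fixed header at the start of each debug line can be parsed. A minimal Python sketch, assuming the header layout is subsystem:mask:cpu.type[F]:seconds.microseconds:stack:pid:ext_pid:(file:line:function()), which is how these lines read rather than a documented contract; DebugRecord, parse and timeline are hypothetical names. Comparing timestamps taken on the MDS and on the OSS also assumes the two nodes' clocks were synchronized.

```python
import re
from dataclasses import dataclass

# Assumed header layout of a Lustre kernel debug line:
# subsys:mask:cpu.type[F]:sec.usec:stack:pid:ext_pid:(file:line:func()) message
HEADER = re.compile(
    r"^\s*(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):"
    r"(?P<cpu>\d+)\.\d+F?:(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)


@dataclass
class DebugRecord:
    node: str   # which log the line came from (label supplied by the caller)
    ts: float   # seconds since the epoch, as printed on that node
    pid: int
    func: str
    msg: str


def parse(node, line):
    """Return a DebugRecord for one debug line, or None if it does not match."""
    m = HEADER.match(line)
    if m is None:
        return None
    return DebugRecord(node, float(m["ts"]), int(m["pid"]), m["func"], m["msg"])


def timeline(records):
    """Earliest record for each (node, function) pair, sorted by timestamp."""
    first = {}
    for r in records:
        key = (r.node, r.func)
        if key not in first or r.ts < first[key].ts:
            first[key] = r
    return sorted(first.values(), key=lambda r: r.ts)


# The first line of each phase, copied from the extracts in this report.
SAMPLE = [
    ("OSS", "00010000:00020000:3.0F:1329310843.814130:0:15034:0:"
            "(ldlm_lib.c:620:target_handle_reconnect()) scratch-MDT0000-mdtlov_UUID "
            "reconnecting from NET_0x500000a110002_UUID, handle mismatch "
            "(ours 0x1d014c26baadc6f4, theirs 0x561e1655a7ff215f)"),
    ("MDS", "00000100:02020000:5.0:1329310922.993555:0:3101:0:"
            "(client.c:1125:ptlrpc_check_status()) 11-0: an error occurred while "
            "communicating with 10.17.0.4@o2ib. The ost_connect operation failed with -114"),
    ("MDS", "00020000:00020000:7.0F:1329310937.993023:0:3100:0:"
            "(lov_request.c:569:lov_update_create_set()) error creating fid 0x6a39 "
            "sub-object on OST idx 8/30: rc = -11"),
]

if __name__ == "__main__":
    records = [r for r in (parse(node, line) for node, line in SAMPLE) if r]
    events = timeline(records)
    t0 = events[0].ts
    for r in events:
        print(f"+{r.ts - t0:7.3f}s  {r.node}  pid {r.pid:<6} {r.func}()")
```

On these three lines it places the first MDS ost_connect failure (-114) about 79 s after the OSS first logged a handle mismatch, and the first create failure about 15 s after that.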
      

      Attachments

        1. dk_MDS (42 kB)
        2. dk_OSS1 (22 kB)
        3. dk_OSS2 (21 kB)


          People

            Zhenyu Xu (bobijam)
            Gregoire Pichon (pichong)
            Votes: 0
            Watchers: 4
