Lustre / LU-7913

Lustre 2.8 servers and 2.5.5 client - temporary I/O error on 2.8 clients

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.8.0
    • servers:lustre-2.8.0-2.6.32_573.18.1.1chaos.ch5.4.x86_64_g0bbc784.x86_64
      client: lustre-2.5.5-3chaos_2.6.32_573.8.1.2chaos.ch5.4.x86_64.x86_64
    • 3

    Description

      On a 2.8 client, the user sees:
      ls: cannot access /p/ldne/faaland1: Input/output error

      After unmounting and remounting on the 2.8 clients, the problem went away.

      Client console log shows:
      2016-03-23 16:44:33 LustreError: 32645:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
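      The rc -5 in the client message appears to be the same failure that ls reports. As a minimal sketch, assuming the kernel return code is a negated standard Linux errno value (describe_rc is a hypothetical helper, not part of Lustre), it can be decoded like this:

```python
import errno
import os


def describe_rc(rc: int) -> tuple[str, str]:
    """Map a negative kernel-style return code to (errno name, message).

    Assumption: rc is a negated standard errno value, as in "rc -5".
    """
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_rc(-5)
    print(name, "-", message)
```

      On Linux, errno 5 is EIO, which matches the "Input/output error" text in the ls output above.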

      MDS console log shows:
      2016-03-23 16:44:33 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
      2016-03-23 16:44:33 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 20 previous similar messages
      2016-03-23 16:45:22 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
      2016-03-23 16:45:22 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 20 previous similar messages
      2016-03-23 16:45:28 LustreError: 44195:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
      2016-03-23 16:45:28 LustreError: 44195:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 19 previous similar messages
      2016-03-23 16:46:03 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
      2016-03-23 16:51:42 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
      2016-03-23 16:51:42 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 1 previous similar message
      2016-03-23 16:54:19 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
      2016-03-23 16:54:19 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 3 previous similar messages
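
      The bracketed parent value in the MDS messages is a Lustre FID, written as [sequence:object id:version] in hex. A small sketch for splitting it into its fields when comparing log entries (the Fid type and parse_fid helper are hypothetical names for illustration, not Lustre APIs):

```python
from typing import NamedTuple


class Fid(NamedTuple):
    seq: int  # sequence number
    oid: int  # object id within the sequence
    ver: int  # version


def parse_fid(text: str) -> Fid:
    """Parse a FID printed as "[0xSEQ:0xOID:0xVER]" into integers."""
    parts = text.strip().strip("[]").split(":")
    if len(parts) != 3:
        raise ValueError(f"not a FID: {text!r}")
    seq, oid, ver = (int(part, 16) for part in parts)
    return Fid(seq, oid, ver)


if __name__ == "__main__":
    fid = parse_fid("[0x2c0000403:0x2:0x0]")
    print(hex(fid.seq), fid.oid, fid.ver)
```

      Every MDS message above carries the same parent FID, so all of the failed lookups refer to the same parent directory.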

      The following sequence of steps produced the corrupt directory entry:

      1. Formatted the filesystem on servers running 2.8 RC4.
      2. Mounted it on nodes running 2.8 RC4 clients.
      3. Mounted it on a node running the 2.5.5 client.
      4. On the 2.5.5 client, created directories under the root, including "faaland1".
      5. On 2.8 client zwicky80, renamed faaland1 to faaland1.old.
      6. On 2.8 client zwicky80, ran: lfs mkdir faaland1 --count=4; lfs setdirstripe -D --count=4 faaland1
      7. On all 2.8 clients OTHER THAN zwicky80 (e.g. zwicky82), attempts to access faaland1 fail. I thought I saw ENOTSUPP in error output somewhere, but cannot find it now.
      8. After unmounting on all the clients and then remounting on the 2.8 clients, access to the directory appeared to work normally again.

          People

            laisiyao Lai Siyao
            ofaaland Olaf Faaland