Lustre / LU-1262

mkdir followed by rmdir on a different client fails -- Object doesn't exist!

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.1.1
    • None
    • 2
    • 6427

    Description

      The customer creates a directory on one client and puts a file in it. Then, on a different client, the customer recursively removes the directory. Back on the first client, creating the directory again fails.

      Here are the exact steps:

      usrs400 $ mkdir /mnt/lustre/foo
      usrs400 $ touch /mnt/lustre/foo/bar

      usrs399 $ rm -rf /mnt/lustre/foo

      usrs400 $ mkdir /mnt/lustre/foo
      mkdir: cannot create directory `foo': File exists
      usrs400 $ ls /mnt/lustre/foo
      ls: cannot access foo: No such file or directory
      usrs400 $ mkdir /mnt/lustre/foo
      mkdir: cannot create directory `foo': File exists
      usrs400 $ rmdir /mnt/lustre/foo
      rmdir: failed to remove `foo': No such file or directory
      usrs400 $ mkdir /mnt/lustre/foo

      The customer has waited as long as 10 minutes and the problem did not clear. For the /var/log/messages output below, the customer waited only a second or two. The customer also unmounted and remounted the clients first, to keep things simpler.

      usrs400:/var/log/messages:

      Mar 27 12:26:57 usrs400 kernel: [677137.616534] Lustre: Lustre: Build Version: ../lustre/scripts-20120222220600-PRISTINE-../lustre/scripts
      Mar 27 12:26:57 usrs400 kernel: [677137.758165] Lustre: Added LNI 192.168.185.6@tcp [8/256/0/180]
      Mar 27 12:26:57 usrs400 kernel: [677137.758237] Lustre: Accept secure, port 988
      Mar 27 12:27:01 usrs400 kernel: [677141.529389] Lustre: MGC192.168.185.35@tcp: Reactivating import
      Mar 27 12:27:01 usrs400 kernel: [677141.568728] LustreError: 31684:0:(obd_config.c:1147:class_process_proc_param()) writing proc entry checksum_pages err -11
      Mar 27 12:27:01 usrs400 kernel: [677141.774591] Lustre: Client xxxxxx-client has started
      Mar 27 12:27:32 usrs400 kernel: [677173.081643] LustreError: 31698:0:(file.c:2228:ll_inode_revalidate_fini()) failure -2 inode 144115239078592513
      Mar 27 12:27:32 usrs400 kernel: [677173.208438] LustreError: 31698:0:(file.c:2228:ll_inode_revalidate_fini()) failure -2 inode 144115239078592513
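
The numeric codes in the client log above are negated kernel errno values. As a reading aid, here is a minimal sketch that pulls them out of a log line and names them. The helper `decode_errnos` and its regular expression are illustrative assumptions, not part of Lustre, and the names reflect the errno numbering of the machine running the script. On Linux, 2 is ENOENT ("No such file or directory", which matches the `ls` and `rmdir` output in the steps) and 11 is EAGAIN.

```python
import errno
import os
import re

# Matches the two error styles seen in the client log: "failure -2" and "err -11".
ERR_RE = re.compile(r"\b(?:failure|err) -(\d+)\b")


def decode_errnos(line):
    """Return (number, symbolic name, message) for each negated errno in a log line."""
    decoded = []
    for match in ERR_RE.finditer(line):
        number = int(match.group(1))
        name = errno.errorcode.get(number, "UNKNOWN")
        decoded.append((number, name, os.strerror(number)))
    return decoded


# Two lines copied from the usrs400 log (timestamps trimmed).
lines = [
    "LustreError: 31684:0:(obd_config.c:1147:class_process_proc_param()) "
    "writing proc entry checksum_pages err -11",
    "LustreError: 31698:0:(file.c:2228:ll_inode_revalidate_fini()) "
    "failure -2 inode 144115239078592513",
]

for line in lines:
    print(decode_errnos(line))
```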

      usrs399:/var/log/messages:

      Mar 27 12:26:20 usrs399 kernel: [677221.853364] Lustre: Lustre: Build Version: ../lustre/scripts-20120222220600-PRISTINE-../lustre/scripts
      Mar 27 12:26:20 usrs399 kernel: [677221.993703] Lustre: Added LNI 192.168.185.7@tcp [8/256/0/180]
      Mar 27 12:26:20 usrs399 kernel: [677221.993777] Lustre: Accept secure, port 988
      Mar 27 12:26:39 usrs399 kernel: [677240.559658] Lustre: MGC192.168.185.35@tcp: Reactivating import
      Mar 27 12:26:39 usrs399 kernel: [677240.592294] LustreError: 621:0:(obd_config.c:1147:class_process_proc_param()) writing proc entry checksum_pages err -11
      Mar 27 12:26:39 usrs399 kernel: [677240.825955] Lustre: Client xxxxxx-client has started

      On the MDS (I think that there is some clock skew):

      Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(ldlm_lib.c:877:target_handle_connect()) MGS: connection from b92afcf0-1504-ed4b-819e-d31039236758@192.168.185.7@tcp t0 exp (null) cur 1332851199 last 0
      Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGS->NET_0x20000c0a8b907_UUID netid 20000: select flavor null
      Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 10205:0:(ldlm_lib.c:877:target_handle_connect()) xxxxxx-MDT0000: connection from 11909088-d4a2-77fa-030f-1c9e2a493436@192.168.185.7@tcp t0 exp (null) cur 1332851199 last 0
      Mar 27 12:26:39 ts-xxxxxxxx-01 kernel: Lustre: 10205:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import xxxxxx-MDT0000->NET_0x20000c0a8b907_UUID netid 20000: select flavor null
      Mar 27 12:27:01 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(ldlm_lib.c:877:target_handle_connect()) MGS: connection from 8927fc00-6820-4bea-63ad-35146f43bb3e@192.168.185.6@tcp t0 exp (null) cur 1332851221 last 0
      Mar 27 12:27:01 ts-xxxxxxxx-01 kernel: Lustre: 2523:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import MGS->NET_0x20000c0a8b906_UUID netid 20000: select flavor null
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) header@ffff88061fac1ec0[0x0, 1, [0x200000be0:0x1:0x0] hash lru] {
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....mdt@ffff88061fac1f18mdt-object@ffff88061fac1ec0(ioepoch=0 flags=0x0, epochcount=0, writecount=0)
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....cmm@ffff88061fbc83c0[local]
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....mdd@ffff88061fd80380mdd-object@ffff88061fd80380(open_count=0, valid=0, cltime=0, flags=0)
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....osd-ldiskfs@ffff88061e88b180osd-ldiskfs-object@ffff88061e88b180(i:(null):0/0)[plain]
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) } header@ffff88061fac1ec0
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) Object doesn't exist!
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) header@ffff88061fac1ec0[0x0, 1, [0x200000be0:0x1:0x0] hash lru] {
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....mdt@ffff88061fac1f18mdt-object@ffff88061fac1ec0(ioepoch=0 flags=0x0, epochcount=0, writecount=0)
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....cmm@ffff88061fbc83c0[local]
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....mdd@ffff88061fd80380mdd-object@ffff88061fd80380(open_count=0, valid=0, cltime=0, flags=0)
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) ....osd-ldiskfs@ffff88061e88b180osd-ldiskfs-object@ffff88061e88b180(i:(null):0/0)[plain]
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) } header@ffff88061fac1ec0
      Mar 27 12:27:32 ts-xxxxxxxx-01 kernel: Lustre: 2541:0:(mdt_handler.c:1010:mdt_getattr_name_lock()) Object doesn't exist!
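
To help line up the MDS log with the client logs, the sketch below decodes three identifiers that appear in the MDS output: the FID `[0x200000be0:0x1:0x0]`, the hex NID inside `NET_0x20000c0a8b907_UUID`, and the `cur 1332851199` epoch value. All three helpers are hypothetical. In particular, `flatten_fid` is only an assumption about how the client derives an inode number from a FID sequence and object ID; it is offered because its result equals the inode number in the usrs400 `ll_inode_revalidate_fini()` errors, which suggests (but does not prove) that the client and the MDS are complaining about the same object.

```python
from datetime import datetime, timezone
import ipaddress


def flatten_fid(seq, oid):
    """Assumed FID-to-inode mapping (hypothetical helper, not taken from this report)."""
    ino = ((seq << 24) + ((seq >> 24) & 0xFFFFFF0000) + oid) & 0xFFFFFFFFFFFFFFFF
    return ino or oid


def split_nid(hex_nid):
    """Split the hex NID from 'NET_0x<hex>_UUID' into (net id, IPv4 address)."""
    value = int(hex_nid, 16)
    return value >> 32, ipaddress.IPv4Address(value & 0xFFFFFFFF)


def epoch_to_utc(seconds):
    """Convert the 'cur' field of target_handle_connect() to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# FID from the mdt_getattr_name_lock() header line.
print(flatten_fid(0x200000BE0, 0x1))         # 144115239078592513

# NID from 'import MGS->NET_0x20000c0a8b907_UUID netid 20000'.
net, addr = split_nid("20000c0a8b907")
print(hex(net), addr)                        # 0x20000 192.168.185.7

# 'cur' value from the first target_handle_connect() line.
print(epoch_to_utc(1332851199).isoformat())  # 2012-03-27T12:26:39+00:00
```

The decoded address is the usrs399 client (192.168.185.7@tcp), and the decoded epoch gives an absolute reference point for comparing the syslog timestamps on the three hosts when checking the suspected clock skew.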

          People

            laisiyao Lai Siyao
            rspellman Roger Spellman (Inactive)