  1. Lustre
  2. LU-4395

files created with non-existent objects

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.1
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 12059

    Description

      KIT has run into an issue where the MDT is creating files with objects that do not exist. Some of the symptoms look similar to LU-4034.

      On client:

      [root@client scc]# touch tmp/gaga
      touch: setting times of `tmp/gaga': No such file or directory
      [root@client scc]# lfs getstripe tmp/gaga
      tmp/gaga
      lmm_stripe_count:   4
      lmm_stripe_size:    1048576
      lmm_layout_gen:     0
      lmm_stripe_offset:  5
      	obdidx		 objid		 objid		 group
      	     5	      65948624	    0x3ee4bd0	             0
      	    25	      66739551	    0x3fa5d5f	             0
      	     9	      65922640	    0x3ede650	             0
      	    24	      66084357	    0x3f05e05	             0
      LustreError: 11-0: HC3WORK-OST0005-osc-ffff8804987dec00: Communicating with 172.26.3.138@o2ib, operation ldlm_enqueue failed with -12.
      [root@mds2 perftest]# ls -al
      ls: cannot access eaea: Cannot allocate memory
      ls: cannot access gaga: Cannot allocate memory
      total 12
      drwxr-xr-x  3 er2341 scc  4096 Dec 12 21:40 .
      drwx------ 10 er2341 scc  4096 Sep 19 16:16 ..
      -rw-r--r--  1 root   root    0 Dec 12 21:41 e
      -?????????  ? ?      ?       ?            ? eaea
      -rw-r--r--  1 root   root    0 Dec 12 21:41 f
      -?????????  ? ?      ?       ?            ? gaga
      drwxr-xr-x  2 root   root 4096 Dec 12 21:40 tmp
      

      On OSS:

      Dec 12 22:25:05 oss1 kernel: : LustreError: 14167:0:(ldlm_resource.c:1165:ldlm_resource_get()) HC3WORK-OST0005: lvbo_init failed for resource 0x3ee4bd0:0x0: rc = -2
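
The resource in the `lvbo_init` message (0x3ee4bd0) is the same value as the hex objid of the first stripe in the `lfs getstripe` output above. A minimal sketch for making that match mechanically follows. It assumes the first component of the resource name is the object ID, as the matching values in this report suggest, and `find_stripe_for_resource` is a hypothetical helper name, not a Lustre tool.

```shell
#!/bin/sh
# find_stripe_for_resource RESOURCE
# Reads `lfs getstripe` output on stdin and prints the obdidx of the stripe
# whose hex objid equals RESOURCE. RESOURCE may be given as "0x3ee4bd0" or in
# the "0x3ee4bd0:0x0" form seen in the OSS log (the part after ":" is ignored).
find_stripe_for_resource() {
    want=${1%%:*}
    awk -v want="$want" '
        # Object rows have four columns: obdidx, objid (dec), objid (hex), group.
        # The header row also has four columns but never matches a hex value.
        NF == 4 && $3 == want { print $1 }
    '
}

# Usage (on a client):
#   lfs getstripe tmp/gaga | find_stripe_for_resource 0x3ee4bd0:0x0
```

For the file in this report the helper would point at obdidx 5, that is, HC3WORK-OST0005.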
      

      One thing that's odd is that all the other OSTs on the system delete orphan objects around that object ID number, but OST0005 does not:

      # echo $((0x3ee4bd0))
      65948624
      
      Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0002: deleting orphan objects from 0x0:66829746 to 0x0:66830014
      Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0006: deleting orphan objects from 0x0:66151265 to 0x0:66151535
      Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0000: deleting orphan objects from 0x0:66341886 to 0x0:66342155
      Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767207
      Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0003: deleting orphan objects from 0x0:66145109 to 0x0:66145379
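
To spot targets that never log orphan cleanup, the syslog can be filtered along these lines. This is only a sketch: `missing_orphan_cleanup` is a hypothetical helper, and it assumes the message layout is exactly as in the excerpt above, with the target name eight fields from the end of the line.

```shell
#!/bin/sh
# missing_orphan_cleanup TARGET...
# Reads syslog lines on stdin and prints each TARGET for which no
# "deleting orphan objects" message was seen.
missing_orphan_cleanup() {
    awk -v targets="$*" '
        /deleting orphan objects from/ {
            t = $(NF - 7)        # e.g. "HC3WORK-OST0002:"
            sub(/:$/, "", t)     # drop the trailing colon
            seen[t] = 1
        }
        END {
            n = split(targets, want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

# Usage (on the OSS, with the target names served by that node):
#   missing_orphan_cleanup HC3WORK-OST0004 HC3WORK-OST0005 < kern-oss1
```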
      

      Another odd thing is that the OSTs seem to delete the same orphan objects repeatedly. On OST0004 the range always starts at 65766910 while the end keeps advancing:

      Dec  9 15:04:30 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767015
      Dec  9 15:11:41 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767015
      Dec 10 09:58:31 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767047
      Dec 10 16:20:25 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767079
      Dec 10 16:33:00 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767111
      Dec 11 15:54:57 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767143
      Dec 11 16:50:03 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767175
      Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767207
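
The repetition can be summarized per target with a short filter. Again a sketch under the same assumption about the message layout; `repeated_orphan_starts` is a hypothetical helper name.

```shell
#!/bin/sh
# repeated_orphan_starts
# Reads syslog lines on stdin. For every (target, start ID) pair that appears
# in more than one "deleting orphan objects" message, prints:
#   target  start  number_of_messages  lowest_end  highest_end
repeated_orphan_starts() {
    awk '
        /deleting orphan objects from/ {
            t = $(NF - 7); sub(/:$/, "", t)   # target name without the colon
            split($(NF - 2), s, ":")          # "0x0:<start>"
            split($NF, e, ":")                # "0x0:<end>"
            k = t " " s[2]
            end = e[2] + 0
            n[k]++
            if (!(k in lo) || end < lo[k]) lo[k] = end
            if (!(k in hi) || end > hi[k]) hi[k] = end
        }
        END {
            for (k in n)
                if (n[k] > 1) print k, n[k], lo[k], hi[k]
        }
    ' | sort
}

# Usage:
#   repeated_orphan_starts < kern-oss1
```

A start ID that repeats across MDS reconnects while the end grows is the pattern shown in the excerpt above.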
      

      The filesystem was put back into production by disabling the OSTs that have this symptom. Are there any suggestions for what to look at in order to further debug this issue? Any logs we should get?

      Thanks,
      Kit

      Attachments

        1. kern-mds1
          2.51 MB
        2. kern-mds2
          2.78 MB
        3. kern-oss1
          917 kB

          People

            Assignee: Niu Yawei (Inactive)
            Reporter: Kit Westneat (Inactive)