Lustre / LU-8810

sanity-lfsck test_18d: @@@@@@ FAIL: (3.0) MDS1 is not the expected 'scanning-phase2'

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0

    Description

      == sanity-lfsck test 18d: Find out orphan OST-object and repair it (4) =============================== 16:03:13 (1477411393)
      #####
      The target MDT-object layout EA slot is occpuied by some new
      created OST-object when repair dangling reference case. Such
      conflict OST-object has never been modified. Then when found
      the orphan OST-object, LFSCK will replace it with the orphan
      OST-object.
      #####
      [0x280000400:0x4:0x0]
      /mnt/lustre/d18d.sanity-lfsck/a1/f1
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  0
      	obdidx		 objid		 objid		 group
      	     0	             2	          0x2	             0
      
      [0x280000400:0x5:0x0]
      /mnt/lustre/d18d.sanity-lfsck/a1/f2
      lmm_stripe_count:   1
      lmm_stripe_size:    1048576
      lmm_pattern:        1
      lmm_layout_gen:     0
      lmm_stripe_offset:  0
      	obdidx		 objid		 objid		 group
      	     0	             3	          0x3	             0
      
      Inject failure to make /mnt/lustre/d18d.sanity-lfsck/a1/f1 and /mnt/lustre/d18d.sanity-lfsck/a1/f2
      to reference the same OST-object (which is f1's OST-obejct).
      Then drop /mnt/lustre/d18d.sanity-lfsck/a1/f1 and its OST-object, so f2 becomes
      dangling reference case, but f2's old OST-object is there.
      
      fail_loc=0x1618
      fail_loc=0
      stopall to cleanup object cache
      setupall
      pdsh@fre0127: fre0125: ssh exited with exit code 1
      pdsh@fre0127: fre0125: ssh exited with exit code 1
      pdsh@fre0127: fre0125: ssh exited with exit code 1
      pdsh@fre0127: fre0125: ssh exited with exit code 1
      pdsh@fre0127: fre0125: ssh exited with exit code 1
      pdsh@fre0127: fre0125: ssh exited with exit code 1
      pdsh@fre0127: fre0126: ssh exited with exit code 1
      pdsh@fre0127: fre0126: ssh exited with exit code 1
      pdsh@fre0127: fre0126: ssh exited with exit code 1
      pdsh@fre0127: fre0126: ssh exited with exit code 1
      The file size should be incorrect since dangling referenced
      ls: cannot access /mnt/lustre/d18d.sanity-lfsck/a1/f2: No such file or directory
      fail_val=5
      fail_loc=0x1602
      Trigger layout LFSCK on all devices to find out orphan OST-object
      Started LFSCK on the device lustre-MDT0000: scrub layout
      Waiting 120 secs for update
      Waiting 110 secs for update
      Waiting 100 secs for update
      Waiting 90 secs for update
      Waiting 80 secs for update
      Waiting 70 secs for update
      Waiting 60 secs for update
      Waiting 50 secs for update
      Waiting 40 secs for update
      Waiting 30 secs for update
      Waiting 20 secs for update
      Waiting 10 secs for update
      Update not seen after 120s: wanted 'scanning-phase2' got 'completed'
       sanity-lfsck test_18d: @@@@@@ FAIL: (3.0) MDS1 is not the expected 'scanning-phase2' 
      ...
      Resetting fail_loc on all nodes...done.
      FAIL 18d (214s)
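
The "Waiting N secs for update" and "Update not seen after 120s" lines in the log come from a poll-until-timeout check on the LFSCK status. The following is a minimal sketch of that pattern, not the actual test-framework code: `wait_for_status` and its arguments are hypothetical names introduced for illustration. It shows how a check for a transient state such as 'scanning-phase2' reports failure when every poll observes 'completed' instead.

```shell
#!/usr/bin/env bash
# Illustrative polling helper (hypothetical; not the Lustre test-framework code).
# Usage: wait_for_status EXPECTED TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# Runs CMD repeatedly until its output equals EXPECTED or TIMEOUT_SECS elapse.
# INTERVAL_SECS must be a positive integer, otherwise the loop never times out.
wait_for_status() {
    local expected=$1 timeout=$2 interval=$3
    shift 3
    local remaining=$timeout
    local got

    while :; do
        # Sample the current status from the caller-supplied command.
        got=$("$@" 2>/dev/null)
        if [ "$got" = "$expected" ]; then
            echo "Updated after $((timeout - remaining))s: wanted '$expected' got '$got'"
            return 0
        fi
        # Out of time: report the last status that was observed.
        if [ "$remaining" -le 0 ]; then
            echo "Update not seen after ${timeout}s: wanted '$expected' got '$got'"
            return 1
        fi
        echo "Waiting $remaining secs for update"
        sleep "$interval"
        remaining=$((remaining - interval))
    done
}

# Possible use on an MDS node. The parameter path and the awk filter are
# assumptions and may differ between Lustre versions:
# lfsck_layout_status() {
#     lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
#         awk '/^status:/ { print $2 }'
# }
# wait_for_status scanning-phase2 120 10 lfsck_layout_status
```

A poll like this can only succeed if the target state is still current when a sample is taken. If the scan has already moved past 'scanning-phase2' before the first sample, the helper counts down the full timeout and returns non-zero, which matches the shape of the failure recorded above.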
      
      

          People

            yong.fan nasf (Inactive)