LU-14197 (project: Lustre)

sanity-lfsck test_28: Timeout occurred after 251 mins, last suite running was sanity-lfsck


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      == sanity-lfsck test 28: Skip the failed MDT(s) when handle orphan MDT-objects ======================= 04:12:55 (1607400775)
      #####
      The target name entry is lost. The LFSCK should insert the
      orphan MDT-object under .lustre/lost+found/MDTxxxx. But if
      the MDT (on which the orphan MDT-object resides) has ever
      failed to respond some name entry verification during the
      first stage-scanning, then the LFSCK should skip to handle
      orphan MDT-object on this MDT. But other MDTs should not
      be affected.
      #####
      Inject failure stub on MDT0 to simulate the case that
      d1/a1's name entry will be removed, but the d1/a1's object
      and its linkEA are kept in the system. And the case that
      d2/a2's name entry will be removed, but the d2/a2's object
      and its linkEA are kept in the system.
      CMD: trevis-28vm4 /usr/sbin/lctl set_param fail_loc=0x1624
      fail_loc=0x1624
      CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0x1624
      fail_loc=0x1624
      CMD: trevis-28vm4 /usr/sbin/lctl set_param fail_loc=0
      fail_loc=0
      CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0
      fail_loc=0
      Inject failure, to simulate the MDT0 fail to handle
      MDT1 LFSCK request during the first-stage scanning.
      CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0x161c fail_val=0
      fail_loc=0x161c
      fail_val=0
      Trigger namespace LFSCK on all devices to find out orphan object
      CMD: trevis-28vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
      Started LFSCK on the device lustre-MDT0000: scrub namespace
      CMD: trevis-28vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
      awk '/^status/ { print \$2 }'
      CMD: trevis-28vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace |
      awk '/^status/ { print $2 }'
      CMD: trevis-28vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.lfsck_namespace |
      awk '/^status/ { print \$2 }'
      CMD: trevis-28vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.lfsck_namespace |
      awk '/^status/ { print $2 }'
      CMD: trevis-28vm10 /usr/sbin/lctl set_param fail_loc=0 fail_val=0
      fail_loc=0
      fail_val=0
      CMD: trevis-28vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
      CMD: trevis-28vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.lfsck_namespace
      Trigger namespace LFSCK on all devices again to cleanup
      CMD: trevis-28vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace -r -A
      Started LFSCK on the device lustre-MDT0000: scrub namespace
      CMD: trevis-28vm4 /usr/sbin/lctl lfsck_query -t namespace -M lustre-MDT0000 -w |
      awk '/^namespace_mdts_completed/ { print \$2 }'
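The `CMD: ... | awk ...` entries in the log show the test re-running an `lctl` command and filtering a single `key: value` field with awk until it reaches an expected value. The captured log ends during such a wait on `namespace_mdts_completed`, and the run was then reported as a 251-minute timeout. Below is a minimal sketch of that polling pattern with an explicit upper bound, so that a stuck wait reports the last value it saw instead of running until the harness gives up. It is illustrative only: `poll_until`, `get_field` and `fake_query` are hypothetical helpers, not the Lustre test-framework implementation, and the `lfsck_query` output layout that `fake_query` imitates is an assumption.

```shell
#!/bin/bash
# Sketch only: poll_until, get_field and fake_query are hypothetical helpers,
# not part of the Lustre test framework.

# Re-run a command string until its output equals the expected value.
# Checks once per second and gives up (returning 1) after max_wait seconds.
poll_until() {
	local cmd=$1 expect=$2 max_wait=$3
	local waited=0 result

	while true; do
		result=$(eval "$cmd")
		[ "$result" = "$expect" ] && return 0
		[ "$waited" -ge "$max_wait" ] && break
		sleep 1
		waited=$((waited + 1))
	done
	echo "timed out after ${max_wait}s: got '$result', expected '$expect'" >&2
	return 1
}

# Print the value from a "key: value" line on stdin, like the
# awk '/^key/ { print $2 }' filters in the log, but matching the key exactly.
get_field() {
	awk -v key="$1" '$1 == (key ":") { print $2 }'
}

# Stand-in for `lctl lfsck_query -t namespace -M lustre-MDT0000 -w` on a
# two-MDT system; field names and layout are assumed for illustration.
fake_query() {
	printf '%s\n' \
		'namespace_mdts_scanning-phase1: 0' \
		'namespace_mdts_completed: 2'
}

poll_until "fake_query | get_field namespace_mdts_completed" 2 5 &&
	echo "all MDTs completed"
```

Passing the whole pipeline as one string and running it with `eval` mirrors how the log shows a complete `lctl ... | awk ...` command being re-evaluated on each check; the one-second interval is an arbitrary choice for the sketch.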


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Li Xi (lixi_wc)
            Votes: 0
            Watchers: 1
