Lustre / LU-8574

The FID-in-dirent verification logic may handle dangling or corrupted name entries improperly

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Lustre 2.9.0
    • None
    • None
    • 3

    Description

      If a name entry is corrupted and contains a bad inode number, osd_dirent_check_repair() uses that wrong inode number to locate the target inode while verifying the entry. Such an inode may not exist at all, or it may belong to another object.

      == sanity-lfsck test 23b: LFSCK can repair dangling name entry (2) == 01:46:53 (1471830413)
      #####
      The objectA has multiple hard links, one of them corresponding
      to the name entry_B. But there is something wrong for the name
      entry_B and cause entry_B to references non-exist object_C.
      In the first-stage scanning, the LFSCK will think the entry_B
      as dangling, and re-create the lost object_C. When the LFSCK
      comes to the second-stage scanning, it will find that the
      former re-creating object_C is not proper, and will try to
      replace the object_C with the real object_A.
      #####
      Inject failure stub on MDT0 to simulate dangling name entry
      fail_loc=0x1621
      fail_loc=0
      'ls' should fail because of dangling name entry
      Trigger namespace LFSCK to find out dangling name entry
      Started LFSCK on the device lustre-MDT0000: scrub namespace
      Waiting 32 secs for update
      Updated after 5s: wanted 'completed' got 'completed'
       sanity-lfsck test_23b: @@@@@@ FAIL: (9) Fail to repair dangling name entry: 0 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4853:error()
        = /usr/lib64/lustre/tests/sanity-lfsck.sh:3045:test_23b()
        = /usr/lib64/lustre/tests/test-framework.sh:5113:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5151:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4955:run_test()
        = /usr/lib64/lustre/tests/sanity-lfsck.sh:3056:main()
      Dumping lctl log to /tmp/test_logs/1471830385/sanity-lfsck.test_23b.*.1471830419.log
      Resetting fail_loc on all nodes...done.
      FAIL 23b (10s)
      

          People

            yong.fan nasf (Inactive)