Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.11.0, Lustre 2.10.3
    • None
    • 2
    • 9223372036854775807

    Description

      We have some files that were created with a 2.11 client and a 2.10.3 server and that hit the LU-10437 bug. We have since updated our server to 2.10.5, but the old files still can't be seen by the user. We get this:

      on 2.11 clients + updated 2.10.5 server

      $ ls -l
      ls: cannot access 'test': Invalid argument
      total 1040384
      -????????? ? ? ? ? ? test
      

      on 2.10.3 clients + updated 2.10.5 server

      ls -l test
      -rw------- yyyy xxx 8388608 Aug 28 15:44 test
      

      Only root can view them correctly.
      Can we recover those files without copying them?

          Activity

            [LU-11291] recovering from LU-10437
            yujian Jian Yu added a comment -

            Sure, Andreas, I'll work on these improvements.


            adilger Andreas Dilger added a comment -

            If the problem relates to FLR functionality added in 2.11 as indicated in LU-10437, it is possible that running a layout LFSCK on the MDT would detect and correct this problem. However, the FLR support for LFSCK was only recently landed (commit v2_11_53_0-33-g36ba989 patch https://review.whamcloud.com/32705 "LU-10288 lfsck: layout LFSCK for mirrored file") so that functionality is not available in the MDS version you are using, nor in any released version to date.

            My recommendation would be to find the inaccessible files with a 2.11 client, and then use "lfs migrate" on a 2.10 client to fix the layout. Depending on what "lfs getstripe -v" reports for such files (e.g. strange lcme_flags) it may be possible to use something like "lfs find /mnt/lustre --comp-count +1 --comp-flags=stale,prefer,offline" to find these files on a 2.10 client directly. Depending on how many files the lfs find operation locates, it may well be faster to migrate them to clear the flags rather than waiting for a code fix to be developed, tested, and installed on your system.

            Jian,
            for future use, it would be desirable for "lfs getstripe" to also print out unknown flags in hex form after it has printed all of the known flags, like "init,prefer,0x18c40", so that we have some forward compatibility when new flags are added. Similarly, "lfs find" should be able to search for flags by hex value in addition to named flags for the same reason. That would allow something like "lfs find --comp-flags 0x7fffffe0 ..." to locate any files with flags that we don't currently have assigned. It might be desirable to allow a modified master "lfs --component-set" to clear some of the offending flags directly from the client without doing the migration, but that is not possible for all of the flags (e.g. stale at least). We might consider allowing the stale flag to be cleared from a component if all of the init'd components in the file are stale.
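The output format suggested to Jian above (known flags by name, then any leftover bits in hex) can be sketched in Python. This is only an illustration of the formatting idea, not how "lfs getstripe" is implemented: the KNOWN_FLAGS table and its bit values are assumptions made up for the sketch, and format_comp_flags and has_unassigned_flags are hypothetical names. Only the 0x7fffffe0 mask and the "init,prefer,0x18c40" sample string come from the comment.

```python
# Illustrative name/bit table. These values are assumptions for this sketch
# and are not copied from the Lustre headers.
KNOWN_FLAGS = [
    ("init", 0x10),
    ("stale", 0x01),
    ("prefer", 0x02),
    ("offline", 0x08),
]

# Mask quoted in the comment for bits that have no assigned meaning yet.
UNASSIGNED_MASK = 0x7FFFFFE0


def format_comp_flags(flags: int) -> str:
    """Return known flag names, then any remaining bits as one hex value."""
    parts = []
    remaining = flags
    for name, bit in KNOWN_FLAGS:
        if flags & bit:
            parts.append(name)
            remaining &= ~bit  # this bit has been reported by name
    if remaining:
        parts.append(hex(remaining))
    return ",".join(parts)


def has_unassigned_flags(flags: int) -> bool:
    """True if any bit covered by UNASSIGNED_MASK is set."""
    return bool(flags & UNASSIGNED_MASK)
```

With this table, a value that has the "init" and "prefer" bits plus the unknown bits 0x18c40 is rendered in the same shape as the sample in the comment, and has_unassigned_flags flags it as a file worth inspecting.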

            mhanafi Mahmoud Hanafi added a comment -

            Here is what we get in the logs:

            [588862.457231] LustreError: 96988:0:(lcommon_cl.c:187:cl_file_inode_init()) Failure to initialize cl object [0x200000bd6:0xc7d1:0x0]: -22
            [588862.457245] LustreError: 96988:0:(llite_lib.c:2357:ll_prep_inode()) new_inode -fatal: rc -22
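The -22 in both log lines is a negated errno value. Errno 22 is EINVAL, which is consistent with the "Invalid argument" error that ls prints in the description. A small Python sketch for pulling the FID and return code out of such lines follows. The regular expressions and the parse_lustre_error helper are assumptions written against these two sample lines only, not a general Lustre log parser.

```python
import errno
import re

# A bracketed FID such as [0x200000bd6:0xc7d1:0x0]. The leading timestamp
# in brackets does not match because it has no 0x fields.
FID_RE = re.compile(r"\[(0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\]")

# A negative return code at the very end of the line (": -22" or "rc -22").
RC_RE = re.compile(r"(-\d+)\s*$")


def parse_lustre_error(line: str) -> dict:
    """Extract the FID (if any), return code, and symbolic errno name."""
    fid = FID_RE.search(line)
    rc = RC_RE.search(line)
    code = int(rc.group(1)) if rc else None
    return {
        "fid": fid.group(1) if fid else None,
        "rc": code,
        # errno.errorcode maps positive errno numbers to names like "EINVAL".
        "errno_name": errno.errorcode.get(-code) if code is not None else None,
    }
```

Applied to the first log line, this yields the FID of the object that failed to initialize, which is the identifier one would need in order to locate the affected file.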

            People

              yujian Jian Yu
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 5
