LU-4958: do not crash accessing LOV object with FID {0, 0}

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.1.6, Lustre 2.5.1, Lustre 2.4.3
    • 3
    • 13718

    Description

      If an orphan object with stripe_index != 0 is linked to a recreated MDS inode (as done in http://review.whamcloud.com/7810), but not all of the objects are present (e.g. some stripes of the file were lost, but an orphan with a non-zero stripe_index remained), the client will crash when the file is accessed (e.g. "ls -l"):

      LustreError: 19393:0:(ldlm_resource.c:1077:ldlm_resource_get()) ASSERTION( name->name[0] != 0 ) failed: 
      LustreError: 19393:0:(ldlm_resource.c:1077:ldlm_resource_get()) LBUG
      Pid: 19393, comm: ls
      
      Call Trace:
       [<ffffffffa0ef9895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0ef9e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa07c4f20>] ldlm_resource_get+0x700/0x900 [ptlrpc]
       [<ffffffffa07bf1b9>] ldlm_lock_create+0x59/0xcc0 [ptlrpc]
       [<ffffffffa07d8314>] ldlm_cli_enqueue+0xa4/0x790 [ptlrpc]
       [<ffffffffa09ebd44>] osc_enqueue_base+0x1e4/0x5b0 [osc]
       [<ffffffffa0a082fd>] osc_lock_enqueue+0x1ed/0x8c0 [osc]
       [<ffffffffa105be7c>] cl_enqueue_try+0xfc/0x300 [obdclass]
       [<ffffffffa0a5d42a>] lov_lock_enqueue+0x22a/0x850 [lov]
       [<ffffffffa105be7c>] cl_enqueue_try+0xfc/0x300 [obdclass]
       [<ffffffffa105d0cf>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
       [<ffffffffa105dd1e>] cl_lock_request+0x7e/0x270 [obdclass]
       [<ffffffffa123dba0>] cl_glimpse_lock+0x180/0x490 [lustre]
       [<ffffffffa123e415>] cl_glimpse_size0+0x1a5/0x1d0 [lustre]
       [<ffffffffa11eb55d>] ll_inode_revalidate_it+0x1cd/0x660 [lustre]
       [<ffffffffa11eba3a>] ll_getattr_it+0x4a/0x1b0 [lustre]
       [<ffffffffa11ebbd7>] ll_getattr+0x37/0x40 [lustre]
       [<ffffffff81186db1>] vfs_getattr+0x51/0x80
       [<ffffffff81186e40>] vfs_fstatat+0x60/0x80
       [<ffffffff81186ece>] vfs_lstat+0x1e/0x20
       [<ffffffff81186ef4>] sys_newlstat+0x24/0x50
      

      I'm not sure what the right way to handle this is, since this would affect all old clients trying to access files in .lustre/lost+found, so fixing just the 2.6 client is not enough. Either we need to backport the fix to the 2.5.2, 2.4.3, and 2.1.7 clients (not very good, since we can't be sure whether a given client has the fix), or use some other lmm_magic or lmm_pattern to ensure that unpatched clients will not understand it.

      In the second case (using a different lmm_magic or lmm_pattern, maybe LOV_PATTERN_F_SPARSE?) the lfsck_layout_extend_lovea() code would need to decide as stripes are added if the layout is sparse (set the flag, old clients cannot access) or if it is full (clear the flag, old clients can access).
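The flag decision described above can be sketched in a few lines. This is a simplified Python model, not the Lustre C code: the `Layout` class, the `extend_layout()` helper, and the numeric value given to `LOV_PATTERN_F_SPARSE` are assumptions made for illustration, and a FID is reduced to a `(sequence, object id)` pair in which `(0, 0)` marks a missing stripe.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LOV_PATTERN_RAID0 = 0x001          # base striping pattern assumed for this sketch
LOV_PATTERN_F_SPARSE = 0x40000000  # hypothetical flag bit; placeholder value

Fid = Tuple[int, int]              # (sequence, object id), simplified
ZERO_FID: Fid = (0, 0)             # marks a stripe whose object is missing


@dataclass
class Layout:
    pattern: int = LOV_PATTERN_RAID0
    stripes: List[Fid] = field(default_factory=list)


def layout_is_sparse(layout: Layout) -> bool:
    """A layout is sparse if any stripe slot still holds FID {0, 0}."""
    return any(fid == ZERO_FID for fid in layout.stripes)


def extend_layout(layout: Layout, stripe_index: int, fid: Fid) -> Layout:
    """Place an orphan object at stripe_index, padding earlier slots with holes."""
    while len(layout.stripes) <= stripe_index:
        layout.stripes.append(ZERO_FID)
    layout.stripes[stripe_index] = fid

    # Re-evaluate the flag every time a stripe is added.
    if layout_is_sparse(layout):
        layout.pattern |= LOV_PATTERN_F_SPARSE   # old clients must not use it
    else:
        layout.pattern &= ~LOV_PATTERN_F_SPARSE  # layout is full again
    return layout
```

Adding an orphan at stripe index 2 to an empty layout leaves slots 0 and 1 as holes and sets the flag; once those two slots are filled, the flag is cleared and the pattern returns to its base value.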

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.5.4 and 2.7


            adilger Andreas Dilger added a comment -

            Probably no need to port this to those older branches.

            ys Yang Sheng added a comment -

            Does this patch need to be ported to b2_4 and b2_1?

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12740/
            Subject: LU-4958 lov: don't crash accessing LOV object with FID {0,0}
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set:
            Commit: 35046f30b35696b5644328094fc470ba6ccfe71b

            gerrit Gerrit Updater added a comment -

            Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/12742
            Subject: LU-4958 lov: don't crash accessing LOV object with FID {0,0}
            Project: fs/lustre-release
            Branch: b2_4
            Current Patch Set: 1
            Commit: e6ffce296bec44015f2d44a741044096229eab9e

            gerrit Gerrit Updater added a comment -

            Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/12740
            Subject: LU-4958 lov: don't crash accessing LOV object with FID {0,0}
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: b182cd79e47deca83be7d25f26d908f92192202a

            jlevi Jodi Levi (Inactive) added a comment -

            YangSheng,
            Could you have a look at this one?
            Thank you!

            adilger Andreas Dilger added a comment -

            See if we can reassign this bug to someone else.

            adilger Andreas Dilger added a comment -

            Yes, this bug should still be fixed on the older clients because there may still be corrupt LOV EAs that have FID {0,0} even if they were not created by LFSCK. Clients should not crash from data sent over the network.
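The principle in the comment above (validate what the server sends, and fail the operation rather than assert) can be illustrated with a minimal sketch. It is a Python model under stated assumptions, not the landed patch: `build_res_name()` and `lock_resources()` are hypothetical helpers, the four-slot tuple is a simplified stand-in for the lock resource name whose first slot trips the `name->name[0] != 0` assertion, and `EIO` is an arbitrary choice of error code.

```python
import errno
from typing import List, Tuple

Fid = Tuple[int, int]                 # (sequence, object id), simplified
ResName = Tuple[int, int, int, int]   # simplified lock resource name


def build_res_name(fid: Fid) -> ResName:
    """Validate a stripe FID received from the server before using it."""
    seq, oid = fid
    if seq == 0 and oid == 0:
        # Return an error to the caller instead of hitting an assertion (LBUG).
        raise OSError(errno.EIO, "stripe object has FID {0,0}")
    return (oid, seq, 0, 0)


def lock_resources(stripes: List[Fid]) -> List[ResName]:
    """Build one resource name per stripe; any hole fails the whole request."""
    return [build_res_name(fid) for fid in stripes]
```

With this shape, an "ls -l" on a file whose layout contains a hole would see an I/O error for that file while the client keeps running. Whether to fail the whole request or skip the missing stripe is a design choice this sketch does not settle.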

            yong.fan nasf (Inactive) added a comment -

            With the patch http://review.whamcloud.com/#/c/10042/ to be landed on master, when an old client talks to a Lustre 2.6 server, the MDS will reply with a failure to the old client directly if the LOV EA contains a hole, so it will NOT crash the old client. Do we still need to fix the client-side code to operate on files with a LOV EA hole?
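The server-side behaviour described in this comment can be modelled as a simple gate. The sketch below is an assumption-laden illustration, not the code in the referenced patch: `reply_layout()` is a hypothetical helper, the `client_supports_holes` boolean stands in for however the server learns a client's capabilities, and `EIO` is a placeholder error code.

```python
import errno
from typing import List, Tuple

Fid = Tuple[int, int]  # (sequence, object id), simplified


def has_hole(stripes: List[Fid]) -> bool:
    """True if any stripe slot holds FID {0, 0}."""
    return any(fid == (0, 0) for fid in stripes)


def reply_layout(stripes: List[Fid], client_supports_holes: bool) -> List[Fid]:
    """Return the layout, or fail the request for a client that cannot parse holes."""
    if has_hole(stripes) and not client_supports_holes:
        # The old client never sees the sparse layout, so it cannot crash on it.
        raise OSError(errno.EIO, "layout contains a hole; client cannot handle it")
    return list(stripes)
```

A gate like this protects old clients only from layouts served by a patched MDS, which is why the later comments still ask for client-side validation.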

            People

              ys Yang Sheng
              yong.fan nasf (Inactive)