Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5145

kernel slab data memory coruption on MDS due to a file with truncated link extended attribute

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.4.3
    • None
    • 3
    • 14196

    Description

      Hi,

      CEA had 3 consecutive crashes of MDS server due to a corruption of kernel slab, and these crashes were concomitant to Robinhood startup.

      In fact in the Lustre Changelogs we can see a line like:

      4281746500 01CREAT 17:47:33.690285184 2014.03.25 0x0 t=[0x22cb19e89:0x1f9c9:0x0] p=[0x22cb19e89:0x1f9c8:0x0] GRANDEURS_CENTREES
      

      And this is because Robinhood tried to process this Changelog entry that we crashed the MDS. Indeed, looking at the MDT with debugfs, we found out that this file was under the /OBJECTS directory (it used to be a regular file, that was moved here by Lustre for an unkown reason), and that its link EA was containing zero fid (link = "df f1 ea 11 00 00 00 00 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 " (24)).

      This issue can be reproduced by doing the following on a Lustre filesystem originally formatted with Lustre 2.1, and upgraded to Lustre 2.4:

      • on a Lustre client :
      1. touch <lustre dir>/file
      2. lfs path2fid <lustre dir>/file
      • stop the file system and mount the MDT with ldiskfs
      • move the file from <mdt ldiskfs>/ROOT/file to <mdt ldiskfs>/OBJECTS directory
      • with setfattr change the link EA to remove all links :
      1. setfattr -n trusted.link -v 0xdff1ea110000000018000000000000000000000000000000 <mdt ldiskfs>/OBJECTS/file
      • umount ldiskfs MDT and restart the file system
      • on a Lustre client :
      1. lfs fid2path <lustre dir> <fid obtained at first step>
      • the MDS server crashes

      The crash can be avoided with the patch from LU-3474 at http://review.whamcloud.com/10464 .
      But the scenario at CEA is worst that the error case found by Andreas (lfs fid2path on old IGIF FIDs) because here it crashes the MDS.

      So at first we would need this patch to be landed to b2_4, or at least to know if this patch is suitable for use in production with Lustre 2.4.3.

      Secondly, given that a file system at CEA has more than 5,000 files in the OBJECTS directory, we are wondering:
      a) why some regular files are moved to OBJECTS directory? what is the Lustre mechanism leading to this?
      b) why link EA contains zero fid when files are moved to OBJECTS directory?
      c) why moving files to OBJECTS directory generates an entry in Lustre Changelogs?

      TIA,
      Sebastien.

      Attachments

        Issue Links

          Activity

            [LU-5145] kernel slab data memory coruption on MDS due to a file with truncated link extended attribute

            Fixed with patch https://review.whamcloud.com/6772 in 2.5.0 and 2.4.1.

            adilger Andreas Dilger added a comment - Fixed with patch https://review.whamcloud.com/6772 in 2.5.0 and 2.4.1.

            I do this -ENOENT return because in this case, I want to be sure that
            the return will be not -EAGAIN, maybe that is not a good idea, My inspiration
            come from the mdt_object_find() call on mdt_fid2path() function :

            "lustre/mdt/mdt_handler.c" :
            5814         obj = mdt_object_find(info->mti_env, mdt, &fp->gf_fid);
            5815         if (obj == NULL || IS_ERR(obj)) {
            5816                 CDEBUG(D_IOCTL, "no object "DFID": %ld\n", PFID(&fp->gf_fid),
            5817                        PTR_ERR(obj));
            5818                 RETURN(-EINVAL);
            5819         }
             

            You can remove this part of my patch ...

            Did you test a fid2path on a file modified by my proposal command the 06/Jun/14 9:18 ?
            in that case the lee_reclen is zero ... and in this case the only way to protect the strcpy
            is to verify ln_namelen is no equal to zero

            You miss this part of my patch that also could useful

            @@ -5701,7 +5701,7 @@ static int mdt_path_current(struct mdt_t
                            rc = mdt_links_read(info, mdt_obj, &ldata);
                            mdt_object_put(info->mti_env, mdt_obj);
                            if (rc != 0)
            -                       GOTO(out, rc = PTR_ERR(buf));
            +                       GOTO(out, rc);
             
            apercher Antoine Percher added a comment - I do this -ENOENT return because in this case, I want to be sure that the return will be not -EAGAIN, maybe that is not a good idea, My inspiration come from the mdt_object_find() call on mdt_fid2path() function : "lustre/mdt/mdt_handler.c" : 5814 obj = mdt_object_find(info->mti_env, mdt, &fp->gf_fid); 5815 if (obj == NULL || IS_ERR(obj)) { 5816 CDEBUG(D_IOCTL, "no object "DFID": %ld\n", PFID(&fp->gf_fid), 5817 PTR_ERR(obj)); 5818 RETURN(-EINVAL); 5819 } You can remove this part of my patch ... Did you test a fid2path on a file modified by my proposal command the 06/Jun/14 9:18 ? in that case the lee_reclen is zero ... and in this case the only way to protect the strcpy is to verify ln_namelen is no equal to zero You miss this part of my patch that also could useful @@ -5701,7 +5701,7 @@ static int mdt_path_current(struct mdt_t rc = mdt_links_read(info, mdt_obj, &ldata); mdt_object_put(info->mti_env, mdt_obj); if (rc != 0) - GOTO(out, rc = PTR_ERR(buf)); + GOTO(out, rc);

            Hello Antoine and Sebastien, I spent some time to review the enhanced patch version (provided at the end of trace_debug_lascaux110_corrupt_slab_mdt_get_info.txt attachment for this ticket) from Antoine vs the current one for LU-3474 (ie, my patch at http://review.whamcloud.com/6772).

            Basically and as far as I understand, Antoine's version has 2 additional hunks :

            @@ -5688,7 +5688,7 @@ static int mdt_path_current(struct mdt_t
             		mdt_obj = mdt_object_find(info->mti_env, mdt,
             					  &pli->pli_fids[pli->pli_fidcount]);
             		if (IS_ERR(mdt_obj))
            -			GOTO(out, rc = PTR_ERR(mdt_obj));
            +			GOTO(out, rc = -ENOENT);
             		if (mdt_object_remote(mdt_obj)) {
             			mdt_object_put(info->mti_env, mdt_obj);
             			GOTO(remote_out, rc = -EREMOTE);
            

            and

            @@ -5726,6 +5726,8 @@ static int mdt_path_current(struct mdt_t
             		ptr -= tmpname->ln_namelen;
             		if (ptr - 1 <= pli->pli_path)
             			GOTO(out, rc = -EOVERFLOW);
            +               if (tmpname->ln_namelen <= 0)
            +                        GOTO(out, rc = -ENOENT);
             		strncpy(ptr, tmpname->ln_name, tmpname->ln_namelen);
             		*(--ptr) = '/';
             
            

            both in mdt_path_current()/mdt_handler.c routine/file.

            My opinion, and you 2 may want to discuss it further, is that with the common patch part for our 2 changes, which now enables correct error from mdt_links_read()/linkea_init() return+handling, the second additional hunk is not needed since due to leh_reccount being 0 in a link-EA with no parent-fid, linkea_init() will return ENODATA that will now be correctly forwarded by mdt_links_read() that will be handled in mdt_path_current() to not continue further+wrongly in trying to decode non-existing Link-EA entry, which has leaded to the unexpected behavior you have encountered (memory corruption due to a negative length passed to strcpy() when trying to extract parent-FID from un-exiting Link-EA entry).

            Concerning the 1st additional hunk, and we may already have discussed about it off-line ..., but I don't understand the reason why you want to unconditionally return -ENOENT upon mdt_object_find() error return.

            bfaccini Bruno Faccini (Inactive) added a comment - Hello Antoine and Sebastien, I spent some time to review the enhanced patch version (provided at the end of trace_debug_lascaux110_corrupt_slab_mdt_get_info.txt attachment for this ticket) from Antoine vs the current one for LU-3474 (ie, my patch at http://review.whamcloud.com/6772 ). Basically and as far as I understand, Antoine's version has 2 additional hunks : @@ -5688,7 +5688,7 @@ static int mdt_path_current(struct mdt_t mdt_obj = mdt_object_find(info->mti_env, mdt, &pli->pli_fids[pli->pli_fidcount]); if (IS_ERR(mdt_obj)) - GOTO(out, rc = PTR_ERR(mdt_obj)); + GOTO(out, rc = -ENOENT); if (mdt_object_remote(mdt_obj)) { mdt_object_put(info->mti_env, mdt_obj); GOTO(remote_out, rc = -EREMOTE); and @@ -5726,6 +5726,8 @@ static int mdt_path_current(struct mdt_t ptr -= tmpname->ln_namelen; if (ptr - 1 <= pli->pli_path) GOTO(out, rc = -EOVERFLOW); + if (tmpname->ln_namelen <= 0) + GOTO(out, rc = -ENOENT); strncpy(ptr, tmpname->ln_name, tmpname->ln_namelen); *(--ptr) = '/'; both in mdt_path_current()/mdt_handler.c routine/file. My opinion, and you 2 may want to discuss it further, is that with the common patch part for our 2 changes, which now enables correct error from mdt_links_read()/linkea_init() return+handling, the second additional hunk is not needed since due to leh_reccount being 0 in a link-EA with no parent-fid, linkea_init() will return ENODATA that will now be correctly forwarded by mdt_links_read() that will be handled in mdt_path_current() to not continue further+wrongly in trying to decode non-existing Link-EA entry, which has leaded to the unexpected behavior you have encountered (memory corruption due to a negative length passed to strcpy() when trying to extract parent-FID from un-exiting Link-EA entry). Concerning the 1st additional hunk, and we may already have discussed about it off-line ..., but I don't understand the reason why you want to unconditionally return -ENOENT upon mdt_object_find() error return.

            Hi Bruno,

            Did you have the opportunity to look at the enhanced fix proposal from Antoine?

            Cheers,
            Sebastien.

            sebastien.buisson Sebastien Buisson (Inactive) added a comment - Hi Bruno, Did you have the opportunity to look at the enhanced fix proposal from Antoine? Cheers, Sebastien.

            OI scrub will only update the OI mapping files, it may also move the orphans from /lost+found to /O/xxxx/ for OST device. It will not change other directory structure.

            yong.fan nasf (Inactive) added a comment - OI scrub will only update the OI mapping files, it may also move the orphans from /lost+found to /O/xxxx/ for OST device. It will not change other directory structure.

            Sorry for the delay

            add debugfs trace for OBJECTS and PENDING stat information in debugfs.pdf

            answer about lustre downgrade : No, we never downgrade own lustre distribution
            between 2.4 and 2.&. the fs was created with
            lustre 2.[01] in Marsh 2012 and after wa have upgrade in 2.4.2
            in Marsh 17th and after we have upgraded in 2.4.3.

            The crtime of all user file in /OBJECTS have a date between Mars 19th
            and April 6th. Around April 6th we decide to stop OI_scrub. Is it
            possible that OI-scrub could put file on /OBJECTS directories ?

            apercher Antoine Percher added a comment - Sorry for the delay add debugfs trace for OBJECTS and PENDING stat information in debugfs.pdf answer about lustre downgrade : No, we never downgrade own lustre distribution between 2.4 and 2.&. the fs was created with lustre 2. [01] in Marsh 2012 and after wa have upgrade in 2.4.2 in Marsh 17th and after we have upgraded in 2.4.3. The crtime of all user file in /OBJECTS have a date between Mars 19th and April 6th. Around April 6th we decide to stop OI_scrub. Is it possible that OI-scrub could put file on /OBJECTS directories ?

            From the file name format, it seems more like that those files have been "created" under b2_1, but not been "moved" to /OBJECTS. Have you ever downgraded to b2_1 after upgraded to b2_4?

            Generally, the files under /OBJECTS should not be visible to client, and they should not be found in the Changelog, then whether it has valid linkEA will not affect the Robinhood. So would you please to verify whether those files under the /OBJECTS are users' normal files or not?

            yong.fan nasf (Inactive) added a comment - From the file name format, it seems more like that those files have been "created" under b2_1, but not been "moved" to /OBJECTS. Have you ever downgraded to b2_1 after upgraded to b2_4? Generally, the files under /OBJECTS should not be visible to client, and they should not be found in the Changelog, then whether it has valid linkEA will not affect the Robinhood. So would you please to verify whether those files under the /OBJECTS are users' normal files or not?

            Bruno, Here is a badly link example that could reproduce the corrupt with the patch 10464

            setfattr -n trusted.link -v 0xdff1ea11000000002f00000000000000000000000000000000000000000200002b10000000010000000066696c6531 <mdt ldiskfs>/OBJECTS/file
            

            as you know, never trust data on disk

            apercher Antoine Percher added a comment - Bruno, Here is a badly link example that could reproduce the corrupt with the patch 10464 setfattr -n trusted.link -v 0xdff1ea11000000002f00000000000000000000000000000000000000000200002b10000000010000000066696c6531 <mdt ldiskfs>/OBJECTS/file as you know, never trust data on disk

            Antoine, thanks for your help+work on this !!
            Also, I will have a look to your enhanced fix proposal.

            bfaccini Bruno Faccini (Inactive) added a comment - Antoine, thanks for your help+work on this !! Also, I will have a look to your enhanced fix proposal.

            I will check, I just can add that all files name in OBJECTS has
            the format <hex inode file number>:<hex gen number of ?>
            and some of them have empty link attr but some other have
            a good link attribute with good informations but each time I check
            the link parent fid does not exist anymore. and some other was symlink files.

            apercher Antoine Percher added a comment - I will check, I just can add that all files name in OBJECTS has the format <hex inode file number>:<hex gen number of ?> and some of them have empty link attr but some other have a good link attribute with good informations but each time I check the link parent fid does not exist anymore. and some other was symlink files.

            People

              bfaccini Bruno Faccini (Inactive)
              sebastien.buisson Sebastien Buisson (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: