[LU-12580] usercopy exposure attempt detected in LL_IOC_LOV_GETSTRIPE ioctl Created: 23/Jul/19  Updated: 10/Jan/21  Resolved: 24/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Upstream
Fix Version/s: Lustre 2.14.0, Lustre 2.12.5

Type: Bug Priority: Critical
Reporter: Marco Grossi Assignee: Marco Grossi
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.6.1810, starting with kernel 3.10.0-957


Attachments: Text File kernel_trace.txt     File reproducer.c    
Issue Links:
Duplicate
is duplicated by LU-13596 usercopy: kernel memory exposure atte... Resolved
Related
is related to LU-14316 lfs: using old ioctl(LL_IOC_LOV_GETST... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Software like darshan are still using the old `ioctl` way to gather striping info for a file.

The kernel BUG is easily triggered on a not-PFL striped file, using:

lum->lmm_magic = LOV_USER_MAGIC;
lum->lmm_stripe_count = LOV_MAX_STRIPE_COUNT;

Any `lmm_stripe_count` greater than the actual file's stripe count will trigger the bug.

Kernel side the issue appears to be in `lov_getstripe`: with a positive `lum_size`(line 409), `lmm_size` is set as `lum_size`(line 442) even if `lmm_magic != LOV_MAGIC_COMP_V1`(line 414), while instead the structure is just as big as `lmmk_size`:

404         if (lum.lmm_magic == LOV_USER_MAGIC_V1 ||
405             lum.lmm_magic == LOV_USER_MAGIC_V3)
406                 lum_size = lov_user_md_size(lum.lmm_stripe_count,
407                                             lum.lmm_magic);
408 
409         if (lum_size != 0) {
410                 struct lov_mds_md *comp_md = lmmk;
411 
412                 /* Legacy app (ADIO for instance) treats the layout as V1/V3
413                  * blindly, we'd return a reasonable V1/V3 for them. */
414                 if (lmmk->lmm_magic == LOV_MAGIC_COMP_V1) {
[...]
439                 }
440 
441                 lmm = comp_md;
442                 lmm_size = lum_size;
443         } else {
444                 lmm = lmmk;
445                 lmm_size = lmmk_size;
446         }
447         /**
448          * User specified limited buffer size, usually the buffer is
449          * from ll_lov_setstripe(), and the buffer can only hold basic
450          * layout template info.
451          */
452         if (size == 0 || size > lmm_size)
453                 size = lmm_size;
454         if (copy_to_user(lump, lmm, size))
455                 GOTO(out_free, rc = -EFAULT);

Please find as attachment the kernel trace and a reproducer, to be invoked as:

$ gcc reproducer.c -o reproducer -W -Wall --pedantic
$ ./reproducer <file_path>


 Comments   
Comment by Patrick Farrell (Inactive) [ 23/Jul/19 ]

Marco,

It looks like you've worked this out and have a suggested change - are you planning to submit a patch for this?

Comment by Marco Grossi [ 23/Jul/19 ]

Hi Patrick,

Yes I've a patch in mind. I still need to double-check the correctness of a progressive layout case.

I'll check this in the morning, then update the issue.

Kinda newbie here for patch submission, so might need some direction.

Comment by Peter Jones [ 23/Jul/19 ]

mgrossi details of the patch submission steps are at http://wiki.lustre.org/Using_Gerrit . Don't be shy to ask for help.

Comment by Patrick Farrell (Inactive) [ 23/Jul/19 ]

Great, thanks Marco - Like Peter said, don't be shy about asking for help.

Comment by Marco Grossi [ 24/Jul/19 ]

I've few questions about the expected behavior of this ioctl LL_IOC_LOV_GETSTRIPE

`lov_getstripe` is currently setting `lmm_layout_gen`, via `lov_lsm_pack`, but looking at the variety of codes using this ioctl, like ADIO and darshan, they are mainly expecting to get a value for `lmm_stripe_offset` instead. 

The comments associated to "lov_user_md_v1" and "v3" are as following:

__u16 lmm_stripe_offset; /* starting stripe offset in
                          * lmm_objects, use when writing */
__u16 lmm_layout_gen;    /* layout generation number
                          * used when reading */

Can you clarify which one is to return?

Considering a PFL file, at the moment the ioctl always returns 0 for both object id and sequential; also offset is set to -1 for a not instantiated file, while it turns to 0 on first write. What should be returned here?

On a side note, man for "llapi_file_get_stripe.3" still reports old data structure. Should I open an issue for this?

Comment by Marco Grossi [ 24/Jul/19 ]

Forgot to ask last bit: currently is returning -ENODATA on a directory. Is this expected?

Comment by Patrick Farrell (Inactive) [ 24/Jul/19 ]

Those comments are... strange.  They don't reflect on what you should be returning, they're about internal use within the fs.  (I'm also not sure they're fully correct.)

So layout_gen is generally only of interest to tools which are manipulating the layout directly - Not just doing getstripe/setstripe.  It's incremented when the layout changes, like in FLR or when a PFL component is first written to (and is instantiated).

Stripe offset - if set - gives the offset that will be used for the first stripe of an object.  It is "-1" to indicate no preference has been indicated.  Once a component is instantiated, it is set to the first stripe of that component.  (Not 0 - I just tested to confirm, and it's zero for me only if the first stripe is on OST0.)

Considering a PFL file, at the moment the ioctl always returns 0 for both object id and sequential; also offset is set to -1 for a not instantiated file, while it turns to 0 on first write. What should be returned here? 

I can't quite parse this sentence.  For a PFL file, 0 is in which field, of which struct?  What do you mean by "sequential" - are you referring to a field (so, which one, in which struct)?

The offset behavior sounds correct to me: -1 until instantiated, because no offset has been picked, then the offset of the first stripe.

FWIW, looking at stripe offset is kind of odd.  Applications should really be concerned with the offset of all of the stripes, not just the first one for each component.  The stripes are not necessarily sequential by OST or anything, so the offset isn't enough info by itself.  It's really only interesting when you're looking at an uninstantiated layout and you want to see if a default is set.

Comment by Patrick Farrell (Inactive) [ 24/Jul/19 ]

You asked two other things, let's see...

Please do open an issue for the man page, or if you feel it's closely related, that's a minor enough change it could be done under this LU as a cleanup.  (Preferably in a separate patch, but still using this LU.)

I haven't verified this, but I suspect you are getting ENODATA because there is no default layout set on the directory.

Try setting a default layout on the directory:
lfs setstripe -E 10M -E 100M -E -1 testdir

Then calling the ioctl on it.

Comment by Marco Grossi [ 24/Jul/19 ]

Sorry for the confusion; will start by clarifying the first question.

Considering the "struct lov_user_md" passed to the ioctl invocation, it can be either "lov_user_md_v1" or "lov_user_md_v3":

#define lov_user_md lov_user_md_v1

struct lov_user_md_v1 {
        __u32 lmm_magic;
        __u32 lmm_pattern;
        struct ost_id lmm_oi;
        __u32 lmm_stripe_size;
        __u16 lmm_stripe_count;
        union {
                __u16 lmm_stripe_offset;
                __u16 lmm_layout_gen;
        };
        struct lov_user_ost_data_v1 lmm_objects[0];
} __attribute__((packed,  __may_alias__));

struct lov_user_md_v3 {
        __u32 lmm_magic;
        __u32 lmm_pattern;
        struct ost_id lmm_oi;
        __u32 lmm_stripe_size;
        __u16 lmm_stripe_count;
        union {
                __u16 lmm_stripe_offset;
                __u16 lmm_layout_gen;
        };
        char  lmm_pool_name[LOV_MAXPOOLNAME + 1];
        struct lov_user_ost_data_v1 lmm_objects[0];
} __attribute__((packed));

Given that "lmm_stripe_offset" and "lmm_layout_gen" share a union, only one can be returned.

Now, considering a case of "lsm_magic" either of LOV_MAGIC_V1 or LOV_MAGIC_V3, "lov_lsm_pack_v1v3" is the one invoked. In this function, "lmm_layout_gen" is the one chosen to be returned:

lmmv1->lmm_layout_gen = cpu_to_le16(lsm->lsm_layout_gen);

The fact is that majority of codes are expecting to get "lmm_stripe_offset", like ADIO:

    lumlen = sizeof(struct lov_user_md) +
            MAX_LOV_UUID_COUNT * sizeof(struct lov_user_ost_data);
    lum = (struct lov_user_md *)ADIOI_Calloc(1,lumlen);

[...]

    memset(lum, 0, lumlen);
    lum->lmm_magic = LOV_USER_MAGIC;
    err = ioctl(fd->fd_sys, LL_IOC_LOV_GETSTRIPE, (void *)lum);
    if (!err) {

[...]

        fd->hints->fs_hints.lustre.start_iodevice = lum->lmm_stripe_offset;
        sprintf(value, "%d", lum->lmm_stripe_offset);
        ADIOI_Info_set(fd->info, "romio_lustre_start_iodevice", value);

Semantically, which one between "lmm_stripe_offset" and "lmm_layout_gen" should be the correct one to return?

Comment by Gerrit Updater [ 07/Feb/20 ]

Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/37469
Subject: LU-12580 lov: fix out of bound usercopy
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 93df0eba99fbc22af41d11003ba3564394979b29

Comment by Gerrit Updater [ 10/Feb/20 ]

Li Dongyang (dongyangli@ddn.com) uploaded a new patch: https://review.whamcloud.com/37493
Subject: LU-12580 lov: fix typo in lov_comp_md_size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a65fa5d21bb8776735f8d7cf3516aed0db94e64c

Comment by Gerrit Updater [ 24/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37493/
Subject: LU-12580 lov: fix typo in lov_comp_md_size
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d41716533682ed88b8a77654f9b5b050ef5c672c

Comment by Gerrit Updater [ 24/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37469/
Subject: LU-12580 lov: fix out of bound usercopy
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2f1beb33144523467b596f4b6fab882b0a839187

Comment by Peter Jones [ 24/Mar/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 25/Mar/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38050
Subject: LU-12580 lov: fix typo in lov_comp_md_size
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 865177bca926efd4f69d4dcaa74624054a2a5aaf

Comment by Gerrit Updater [ 25/Mar/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38051
Subject: LU-12580 lov: fix out of bound usercopy
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b47ff3cd33984e72fc2f59032ad8147074628d28

Comment by Gerrit Updater [ 06/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38050/
Subject: LU-12580 lov: fix typo in lov_comp_md_size
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 13dc723aea39c60d8ed128a312fba1520921c792

Comment by Gerrit Updater [ 06/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38051/
Subject: LU-12580 lov: fix out of bound usercopy
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: d4dd52a3a1dea9e6117512889837e245fb983556

Generated at Sat Feb 10 02:53:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.