[LU-13119] lustre-initialization crashed in common_file_perm() on SLES12 Created: 09/Jan/20  Updated: 23/Jan/20  Resolved: 23/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11832 ARM servers crashing on MDS startup Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f8038c56-3208-11ea-adca-52540065bddc

lustre-initialization failed with the following error:

'trevis-42vm12 crashed during lustre-initialization-1'

The stack trace on the MDS looks like:

LDISKFS-fs (dm-4): mounted filesystem with ordered data mode.
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffff812d5995>] common_file_perm+0x15/0x180
Oops: 0000 [#1] SMP 
Supported: No, Unsupported modules are loaded
CPU: 0 PID: 2995 Comm: mount.lustre Tainted: 4.4.180-94.100_lustre
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
security_file_permission+0x3e/0xc0
iterate_dir+0x32/0x110
osd_ios_general_scan+0x12e/0x250 [osd_ldiskfs]
osd_initial_OI_scrub+0x5e/0xc00 [osd_ldiskfs]
osd_scrub_setup+0x8f5/0x960 [osd_ldiskfs]
osd_device_alloc+0x5ac/0x8c0 [osd_ldiskfs]
obd_setup+0xb8/0x230 [obdclass]
class_setup+0x468/0x7c0 [obdclass]
class_process_config+0x1890/0x27b0 [obdclass]
do_lcfg+0x235/0x490 [obdclass]
lustre_start_simple+0x85/0x1f0 [obdclass]
server_fill_super+0xe81/0x1640 [obdclass]
lustre_fill_super+0x436/0x8d0 [obdclass]
mount_nodev+0x48/0xa0
mount_fs+0x3a/0x170
vfs_kern_mount+0x62/0x110
do_mount+0x213/0xcd0
SyS_mount+0x85/0xd0

It could be that this is related to iterate_dir() taking a fake filp as an argument, and somehow filp is not filled in sufficiently for security_file_permission()->common_file_perm().
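
As a side note on reading the oops above: a fault address as small as 0x78 usually means the code read a member at that offset through a NULL base pointer. The following is a minimal user-space sketch of that arithmetic. The struct `mock_obj`, its layout, and the helper `mock_fault_address()` are hypothetical and only illustrate the idea; the ticket does not say which kernel structure or member was actually involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: NOT a real kernel structure. */
struct mock_obj {
	char  pad[0x78];	/* stand-in for whatever fields come first */
	void *field;		/* hypothetical member located at offset 0x78 */
};

/* Compile-time check that the mock member really sits at offset 0x78 (C11). */
static_assert(offsetof(struct mock_obj, field) == 0x78,
	      "mock member must sit at offset 0x78");

/*
 * Address that a read of base->field would touch. With base == 0 (NULL)
 * this is simply the member offset, which is what the oops reports.
 */
static uintptr_t mock_fault_address(uintptr_t base)
{
	return base + offsetof(struct mock_obj, field);
}
```

Calling `mock_fault_address(0)` yields 0x78, matching the pattern in the "NULL pointer dereference at 0000000000000078" line.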

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
lustre-initialization lustre-initialization - 'trevis-42vm12 crashed during lustre-initialization-1'



 Comments   
Comment by Andreas Dilger [ 09/Jan/20 ]

It looks like the use of iterate_dir() was introduced by patch https://review.whamcloud.com/34714 "LU-11832 ldiskfs: properly handle VFS parallel locking".

Comment by James A Simmons [ 09/Jan/20 ]

Which SLES is this?

Comment by Andreas Dilger [ 09/Jan/20 ]

According to the data on the Maloo page it is SLES12.3.

Comment by James A Simmons [ 09/Jan/20 ]

SUSE is using AppArmor, which has different requirements. osd-ldiskfs open-codes struct file creation instead of using alloc_file_pseudo(), so some fields are missed. Looking at common_file_perm() in the SUSE kernel code, it expects file->f_cred to be set. Eventually I would like to move to alloc_file_pseudo(), but that is a bit tricky because of the way struct file data structures are handled in ldiskfs as scratch areas.
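
The failure mode described here can be sketched in user space as follows. This is only an illustration under stated assumptions: `mock_cred`, `mock_file`, `mock_file_permission()`, and the two init helpers are hypothetical stand-ins, not the kernel or Lustre definitions. The real hook would dereference the NULL pointer and oops, whereas the mock returns -1 so the difference can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; NOT the real definitions. */
struct mock_cred {
	int profile_id;
};

struct mock_file {
	const struct mock_cred *f_cred;		/* what the LSM hook expects to be set */
	void                   *f_security;
};

/*
 * Stand-in for the security hook. The real code would dereference
 * file->f_cred unconditionally; the mock reports -1 instead of crashing.
 */
static int mock_file_permission(const struct mock_file *file)
{
	if (file->f_cred == NULL)
		return -1;	/* the real hook would oops here */
	return 0;
}

/* Open-coded setup of a scratch file: every field, including f_cred, is zeroed. */
static void mock_init_open_coded(struct mock_file *filp)
{
	*filp = (struct mock_file){ 0 };
}

/* Same setup, but f_cred is also filled in before the hook runs. */
static void mock_init_with_cred(struct mock_file *filp,
				const struct mock_cred *cred)
{
	assert(cred != NULL);
	mock_init_open_coded(filp);
	filp->f_cred = cred;
}
```

With `mock_init_open_coded()` the hook reports the failure; with `mock_init_with_cred()` it passes. A helper such as alloc_file_pseudo() would presumably fill in such fields itself, which is why moving to it is mentioned as the longer-term direction.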

Comment by Gerrit Updater [ 10/Jan/20 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/37184
Subject: LU-13119 osd-ldiskfs: set f_cred for app armour
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 82d5ab9c64a290791be79b83ff006c926ddc0311

Comment by Andreas Dilger [ 11/Jan/20 ]

It seems that the SLES kernel is trying to apply AppArmor security policies to the Lustre filesystem (not sure why), since common_file_perm() only exists in security/apparmor/lsm.c:

static int common_file_perm(int op, struct file *file, u32 mask)
{
        struct aa_file_cxt *fcxt = file->f_security;
        struct aa_profile *profile, *fprofile = aa_cred_profile(file->f_cred);
        int error = 0;

        BUG_ON(!fprofile);

        if (!file->f_path.mnt ||
            !mediated_filesystem(file->f_path.dentry))
                return 0;

        profile = __aa_current_profile();

        /* revalidate access, if task is unconfined, or the cached cred
         * doesn't match or if the request is for more permissions than
         * was granted.
         *
         * Note: the test for !unconfined(fprofile) is to handle file
         *       delegation from unconfined tasks
         */
        if (!unconfined(profile) && !unconfined(fprofile) &&
            ((fprofile != profile) || (mask & ~fcxt->allow)))
                error = aa_file_perm(op, profile, file, mask);

        return error;
}


static int apparmor_file_permission(struct file *file, int mask)
{
        return common_file_perm(OP_FPERM, file, mask);
}

int iterate_dir(struct file *file, struct dir_context *ctx)
{               
        struct inode *inode = file_inode(file);
        bool shared = false;
        int res = -ENOTDIR;
        if (file->f_op->iterate_shared)
                shared = true;
        else if (!file->f_op->iterate)
                goto out;

        res = security_file_permission(file, MAY_READ);
        :
        :
}

osd_ios_general_scan(struct osd_thread_info *info, struct osd_device *dev,
                     struct dentry *dentry, filldir_t filldir)
{
        struct file                  *filp  = &info->oti_file;
        struct inode                 *inode = dentry->d_inode;
        const struct file_operations *fops  = inode->i_fop;

        filp->f_pos = 0;
        filp->f_path.dentry = dentry;
        filp->f_flags |= O_NOATIME;
        filp->f_mode = FMODE_64BITHASH | FMODE_NONOTIFY;
        filp->f_mapping = inode->i_mapping;
        filp->f_op = fops;
        filp->private_data = NULL;
        set_file_inode(filp, inode);
        rc = osd_security_file_alloc(filp);
        if (rc)
                RETURN(rc);

        do {
                buf.oifb_items = 0;
                rc = iterate_dir(filp, &buf.ctx);
        } while (rc >= 0 && buf.oifb_items > 0 &&

Comment by Andreas Dilger [ 11/Jan/20 ]

Sigh. I didn't commit my comment until now...

Comment by Arshad Hussain [ 19/Jan/20 ]

Seen on Master. https://testing.whamcloud.com/test_sets/aedd7978-3a88-11ea-80b4-52540065bddc

Comment by Gerrit Updater [ 23/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37184/
Subject: LU-13119 osd-ldiskfs: set f_cred for app armour
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 33082e057d214793c70085a33f1d82b3915db3a9

Comment by Peter Jones [ 23/Jan/20 ]

Landed for 2.14
