LU-3717: Kernel panic in ll_encode_fh() while testing file handle syscalls on FC18 client

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.4.1, Lustre 2.5.0
    • None
    • 3
    • 9574

    Description

      Hit a kernel panic while testing the new file handle syscalls, name_to_handle_at() and open_by_handle_at().

      To reproduce:
      1) Apply the patch at http://review.whamcloud.com/#/c/7247/, which adds a new file, lustre/tests/check_fhandle_syscalls.c.
      2) Compile the Lustre client.
      3) Set up Lustre (sh lustre/tests/llmount.sh).
      4) Create a temporary file in the file system (echo "testing new syscalls" > /mnt/lustre/temp_file).
      5) Run the test utility as follows:
      cd lustre/tests;
      ./check_fhandle_syscalls temp_file /mnt/lustre
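
For readers without the patch at hand, the kind of call sequence that step 5 exercises can be sketched roughly as follows. This is a minimal illustration and not the contents of check_fhandle_syscalls.c: encode_handle(), print_handle() and reopen_handle() are hypothetical helper names, and the printed field labels are only modeled on the utility's output quoted in the comments of this ticket.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes name_to_handle_at() in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: ask the kernel for a file handle for "name",
 * resolved relative to dirfd. Returns a malloc'd handle that the caller
 * frees, or NULL with errno set. */
static struct file_handle *encode_handle(int dirfd, const char *name,
                                         int *mount_id)
{
        struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
        int saved_errno;

        if (fh == NULL)
                return NULL;
        fh->handle_bytes = MAX_HANDLE_SZ;       /* room offered to the kernel */

        /* flags == 0 requests a plain handle. On the FC18 kernel discussed
         * in this ticket, this path reaches the filesystem's encode_fh(). */
        if (name_to_handle_at(dirfd, name, fh, mount_id, 0) < 0) {
                saved_errno = errno;
                free(fh);
                errno = saved_errno;
                return NULL;
        }
        return fh;
}

/* Hypothetical helper: dump the handle one byte at a time. */
static void print_handle(const struct file_handle *fh)
{
        unsigned int i;

        assert(fh != NULL);
        printf("fh_bytes: %u\n", fh->handle_bytes);
        printf("fh_type: %d\n", fh->handle_type);
        printf("fh_data:");
        for (i = 0; i < fh->handle_bytes; i++)
                printf(" %u", fh->f_handle[i]);
        printf("\n");
}

/* Hypothetical helper: reopen the file from its handle. mount_fd is any
 * open descriptor on the same filesystem; the call needs
 * CAP_DAC_READ_SEARCH, so it normally has to run as root. */
static int reopen_handle(int mount_fd, struct file_handle *fh)
{
        return open_by_handle_at(mount_fd, fh, O_RDONLY);
}
```

A caller would open the mount point (for example /mnt/lustre), pass that descriptor and the file name to encode_handle(), print the result, and then pass the same descriptor and handle to reopen_handle().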

      The following is the stack trace of the panic:

      crash> bt -l
      PID: 2139 TASK: ffff880011495c40 CPU: 0 COMMAND: "check_fhandle_s"
      #0 [ffff8800115cbc90] machine_kexec at ffffffff8103e9a5
      /usr/src/debug/kernel-3.6.fc18/linux-3.6.10-4.fc18.x86_64/arch/x86/kernel/machine_kexec_64.c: 339
      #1 [ffff8800115cbd00] crash_kexec at ffffffff810c4118
      /usr/src/debug/kernel-3.6.fc18/linux-3.6.10-4.fc18.x86_64/kernel/kexec.c: 1100
      #2 [ffff8800115cbdd0] panic at ffffffff816198e2
      /usr/src/debug/kernel-3.6.fc18/linux-3.6.10-4.fc18.x86_64/arch/x86/include/asm/smp.h: 95
      #3 [ffff8800115cbe50] lbug_with_loc at ffffffffa0418e5b [libcfs]
      #4 [ffff8800115cbe90] ll_encode_fh at ffffffffa0961b75 [lustre]
      #5 [ffff8800115cbed0] exportfs_encode_fh at ffffffff81264ce4
      /usr/src/debug/kernel-3.6.fc18/linux-3.6.10-4.fc18.x86_64/fs/exportfs/expfs.c: 361
      #6 [ffff8800115cbf10] sys_name_to_handle_at at ffffffff811e8f36
      /usr/src/debug/kernel-3.6.fc18/linux-3.6.10-4.fc18.x86_64/fs/fhandle.c: 52
      #7 [ffff8800115cbf80] system_call_fastpath at ffffffff8162bae9
      /usr/src/debug/kernel-3.6.fc18/linux-3.6.10-4.fc18.x86_64/arch/x86/kernel/entry_64.S: 532
      RIP: 00000030828f309a RSP: 00007fff2f0d94c8 RFLAGS: 00010202
      RAX: 000000000000012f RBX: ffffffff8162bae9 RCX: 00007fff2f0d9578
      RDX: 0000000000720010 RSI: 00007fff2f0da853 RDI: 0000000000000003
      RBP: 00007fff2f0d95b0 R8: 0000000000000400 R9: 616e20676e696c6c
      R10: 00007fff2f0d9578 R11: 0000000000000202 R12: 0000000000000000
      R13: 0000000000000000 R14: 00007fff2f0d9690 R15: 00000000004008f0
      ORIG_RAX: 000000000000012f CS: 0033 SS: 002b

          Activity

            yujian Jian Yu added a comment -

            The patch in http://review.whamcloud.com/8347 resolves the failure. Let's close this ticket as a duplicate of LU-4231.


            simmonsja James A Simmons added a comment -

            I just tested the http://review.whamcloud.com/8347 patch and got this:

            [root@spoon46 ~]# /usr/lib64/lustre/tests/check_fhandle_syscalls temp-file /lustre/barry/
            fh_bytes: 32
            fh_type: 151
            fh_data: 0 4 0 0 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
            check_fhandle_syscalls test Passed!

            It appears to work correctly.
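
The fh_data bytes above can be read as two 16-byte FIDs, child first and parent second, which is the lnf_child/lnf_parent order used in ll_encode_fh() as quoted elsewhere in this ticket. The sketch below decodes them under two assumptions that are not stated in the ticket: that each FID is laid out as a 64-bit sequence followed by a 32-bit object ID and a 32-bit version, and that the bytes are little-endian, as on an x86_64 client. The names fid_model, decode_fid() and format_fid() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed FID layout: 64-bit sequence, 32-bit object ID, 32-bit version. */
struct fid_model {
        uint64_t seq;
        uint32_t oid;
        uint32_t ver;
};

/* The 32 bytes printed as fh_data in the comment above; the trailing
 * bytes not listed here are zero-initialized. */
static const unsigned char sample_fh_data[32] = {
        0, 4, 0, 0, 2, 0, 0, 0,         /* child sequence */
        1, 0, 0, 0,                     /* child object ID */
        0, 0, 0, 0,                     /* child version */
};

/* Read an n-byte little-endian integer starting at p. */
static uint64_t le_to_u64(const unsigned char *p, size_t n)
{
        uint64_t v = 0;

        while (n-- > 0)
                v = (v << 8) | p[n];
        return v;
}

/* Decode one 16-byte FID starting at p. */
static struct fid_model decode_fid(const unsigned char *p)
{
        struct fid_model f;

        f.seq = le_to_u64(p, 8);
        f.oid = (uint32_t)le_to_u64(p + 8, 4);
        f.ver = (uint32_t)le_to_u64(p + 12, 4);
        return f;
}

/* Render a FID as [seq:oid:ver] in hex, similar to the debug log format. */
static void format_fid(const struct fid_model *f, char *buf, size_t len)
{
        snprintf(buf, len, "[0x%llx:0x%x:0x%x]",
                 (unsigned long long)f->seq, f->oid, f->ver);
}
```

Under those assumptions the first 16 bytes decode to [0x200000400:0x1:0x0] and the last 16 bytes to an all-zero FID, which is what one would expect if the patched code stores a zeroed parent FID when no parent inode is supplied.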

            dmiter Dmitry Eremin (Inactive) added a comment -

            This is similar to LU-4231. I have provided another patch, http://review.whamcloud.com/8347, that more accurately distinguishes the cases where parent is NULL from those where it is not.
            yujian Jian Yu added a comment -

            Patch for master branch is in http://review.whamcloud.com/8072.

            yujian Jian Yu added a comment -

            In Linux kernel 3.6.10-4 used by FC18, exportfs_encode_fh() was called from do_sys_name_to_handle() as follows:

            static long do_sys_name_to_handle(struct path *path,
                                              struct file_handle __user *ufh,
                                              int __user *mnt_id)
            {
                    //......
                    /* we ask for a non connected handle */
                    retval = exportfs_encode_fh(path->dentry,
                                                (struct fid *)handle->f_handle,
                                                &handle_dwords,  0); /* 0 is passed as "connectable" here */
                    //......
            }
            

            The relevant code in exportfs_encode_fh() is:

            int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
                                   int connectable)
            {
                    //......
                    struct inode *inode = dentry->d_inode, *parent = NULL;
            
                    if (connectable && !S_ISDIR(inode->i_mode)) { /* connectable was 0, so this branch is skipped */
                            p = dget_parent(dentry);
                            //......
                            parent = p->d_inode;
                    }
                    if (nop->encode_fh)
                            error = nop->encode_fh(inode, fid->raw, max_len, parent); /* parent was still NULL here */
                    //......
            }
            

            So exportfs_encode_fh() ends up passing NULL as the "parent" parameter to ll_encode_fh().
            ll_encode_fh() should check the "parent" value before calling ll_inode2fid() on it. I'll upload a patch.
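
The check proposed above can be illustrated with a small user-space model. This is a sketch of the idea only, not the patch that was uploaded: fid_model, inode_model, nfs_fid_model and encode_fh_model() are hypothetical stand-ins for the kernel and Lustre types, FH_TYPE_MODEL reuses the fh_type value 151 printed by the test utility in this ticket, FH_INVALID_MODEL is an arbitrary failure code, and storing an all-zero parent FID is an assumption about what "appropriate action" could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the FID, the inode and the NFS file ID. */
struct fid_model {
        uint64_t seq;
        uint32_t oid;
        uint32_t ver;
};

struct inode_model {
        struct fid_model fid;
};

struct nfs_fid_model {
        struct fid_model child;
        struct fid_model parent;
};

#define FH_TYPE_MODEL    151    /* fh_type value seen in this ticket */
#define FH_INVALID_MODEL 255    /* arbitrary "buffer too small" code */

/* Model of an encode_fh callback. *plen_dwords is the buffer size in
 * 32-bit words on entry and the size needed or used on return. */
static int encode_fh_model(const struct inode_model *inode, uint32_t *fh,
                           int *plen_dwords, const struct inode_model *parent)
{
        struct nfs_fid_model nfs_fid;
        const int need = (int)(sizeof(nfs_fid) / sizeof(uint32_t));

        if (*plen_dwords < need) {
                *plen_dwords = need;
                return FH_INVALID_MODEL;
        }

        nfs_fid.child = inode->fid;
        /* The guard: dereference parent only when the caller supplied one. */
        if (parent != NULL)
                nfs_fid.parent = parent->fid;
        else
                memset(&nfs_fid.parent, 0, sizeof(nfs_fid.parent));

        memcpy(fh, &nfs_fid, sizeof(nfs_fid));
        *plen_dwords = need;
        return FH_TYPE_MODEL;
}
```

With a NULL parent the model writes the child FID followed by sixteen zero bytes instead of dereferencing the pointer, and with a non-NULL parent it behaves as the unguarded code would.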

            yujian Jian Yu added a comment -

            On the FC18 client node, I set panic_on_lbug=0 and got the following lctl debug log:

            00000080:00000001:2.0:1382612125.266070:0:21111:0:(llite_nfs.c:187:ll_encode_fh()) Process entered
            00000080:00000040:2.0:1382612125.266071:0:21111:0:(llite_nfs.c:191:ll_encode_fh()) encoding for (144115205255725059,[0x200000400:0x3:0x0]) maxlen=32 minlen=32
            00000080:00040000:2.0:1382612125.266073:0:21111:0:(llite_internal.h:1166:ll_inode2fid()) ASSERTION( inode != ((void *)0) ) failed:
            00000080:00040000:2.0:1382612125.277251:0:21111:0:(llite_internal.h:1166:ll_inode2fid()) LBUG

            In ll_encode_fh():

            static int ll_encode_fh(struct inode *inode, __u32 *fh, int *plen,
                                    struct inode *parent)
            {
                    //......
                    CDEBUG(D_INFO, "encoding for (%lu,"DFID") maxlen=%d minlen=%d\n",
                           inode->i_ino, PFID(ll_inode2fid(inode)), *plen,
                           (int)sizeof(struct lustre_nfs_fid));
            
                    //......
                    nfs_fid->lnf_child = *ll_inode2fid(inode);
                    nfs_fid->lnf_parent = *ll_inode2fid(parent); /* parent was NULL here, which caused the ASSERTION failure */
                    //......
            }
            

            We still need to find out why the "parent" passed from exportfs_encode_fh() to ll_encode_fh() was NULL.

            green Oleg Drokin added a comment -

            The LBUG occurs because ll_inode2fid() asserts that the inode is not NULL, yet a NULL inode is somehow being passed in to ll_encode_fh().
            So we need to check for that and take appropriate action.


            People

              yujian Jian Yu
              spimpale Swapnil Pimpale (Inactive)