[LU-2986] Kernel Oops on ioctl LL_IOC_GET_MDTIDX Created: 19/Mar/13  Updated: 05/Dec/14  Resolved: 23/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Critical
Reporter: Ned Bass Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: ppc
Environment:

github.com/chaos/lustre tag 2.3.58-14chaos
PPC client


Severity: 3
Rank (Obsolete): 7278

 Description   

A PowerPC64 sequoia LAC node panicked when I ran the following test program:

#include <stdio.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <string.h>

#include <sys/stat.h>
#include <fcntl.h>
#include <lustre/lustre_user.h>

int main(int argc, char **argv)
{
        int rc = -1;
        int mdtidx = 0;
        int fd;
        char *path = argv[1];

        if (argc != 2) {
                fprintf(stderr, "Usage: %s <path>\n", argv[0]);
                goto out;
        }

        fd = open(path, O_RDONLY);
        if (fd < 0) {
                fprintf(stderr, "open() error on %s: %s\n", path,
                        strerror(errno));
                goto out;
        }

        rc = ioctl(fd, LL_IOC_GET_MDTIDX, &mdtidx);
        if (rc < 0) {
                fprintf(stderr, "ioctl() error: %s\n", strerror(errno));
                goto out;
        }
        printf("mdtidx %d\n", mdtidx);

out:
        return rc;
}

Here is the backtrace from crash:

PID: 4522   TASK: c000000f565f6900  CPU: 29  COMMAND: "a.out"
 #0 [c000000f4c2632c0] .crash_kexec at c0000000000e5bf4
 #1 [c000000f4c2634c0] .die at c0000000000309d8
 #2 [c000000f4c263570] .bad_page_fault at c000000000043378
 #3 [c000000f4c2635f0] handle_page_fault at c000000000005228
 Data Access error  [300] exception frame:
 R0:  0000000000000000    R1:  c000000f4c2638e0    R2:  d0000000127283e0   
 R3:  c000000e0d6f6f00    R4:  c000000de5ea1700    R5:  0000000000000000   
 R6:  c000000de5ea1838    R7:  0000000000000001    R8:  0000000000000720   
 R9:  0000000000000000    R10: 2b94515100000000    R11: 0000000000003000   
 R12: d0000000136b27f0    R13: c000000001006d80    R14: 000000001012b3dc   
 R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
 R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
 R21: 000000001012daf0    R22: c000000f4b772480    R23: 00000000400466af   
 R24: c000000e046d33f8    R25: 0000000000000000    R26: c000000e0d6f6f00   
 R27: d00000000cc74b48    R28: c000000de5ea1700    R29: d00000000cc74b48   
 R30: d000000012726ee0    R31: c000000f4c2638e0   
 NIP: d000000012701940    MSR: 8000000000009032    OR3: d0000000136e7588
 CTR: d0000000127018c0    LR:  d0000000136166ec    XER: 0000000020000010
 CCR: 0000000024000428    MQ:  0000000000000001    DAR: 0000000000000000
 DSISR: 0000000042000000     Syscall Result: 0000000000000000

 #4 [c000000f4c2638e0] .mdc_getattr at d000000012701940 [mdc]
 [Link Register ]  [c000000f4c2638e0] .ll_get_mdt_idx at d0000000136166ec
 #5 [c000000f4c263990] .ll_get_mdt_idx at d0000000136166ec [lustre]  (unreliable)
 #6 [c000000f4c263a60] .ll_dir_ioctl at d0000000136234c4 [lustre]
 #7 [c000000f4c263c00] .vfs_ioctl at c0000000001d7f24
 #8 [c000000f4c263c90] .do_vfs_ioctl at c0000000001d8170
 #9 [c000000f4c263d80] .sys_ioctl at c0000000001d8954
#10 [c000000f4c263e30] syscall_exit at c000000000008564
 syscall  [c01] exception frame:
 R0:  0000000000000036    R1:  00000fffffffedb0    R2:  0000008053993268   
 R3:  0000000000000003    R4:  00000000400466af    R5:  00000fffffffeea0   
 R6:  0000000000004000    R7:  00000080538c91f0    R8:  800000000200f032   
 R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
 R12: 0000000000000000    R13: 00000080537ac250   
 NIP: 00000080538cfc70    MSR: 800000000200f032    OR3: 0000000000000003
 CTR: 00000080538cfbd0    LR:  000000001000082c    XER: 0000000000000010
 CCR: 0000000042000428    MQ:  0000000000000001    DAR: 00000080538c91dc
 DSISR: 0000000040000000     Syscall Result: 0000000000000003

Console panic message:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xd000000012701940
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: xt_owner nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack mgc(U) lustre(U) mdc(U) fid(U) fld(U) lov(U) osc(U) ptlrpc(U) obdclass(U) lvfs(U) nfs fscache lockd auth_rpcgss nfs_acl ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) sunrpc ipt_LOG xt_multiport iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 uinput raid1 sg ses enclosure mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core e1000e ehea ext4 jbd2 mbcache raid456 async_pq async_xor xor async_raid6_recov raid6_pq async_memcpy async_tx sd_mod crc_t10dif ipr dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
NIP: d000000012701940 LR: d0000000136166ec CTR: d0000000127018c0
REGS: c000000f4c263660 TRAP: 0300   Not tainted  (2.6.32-348.1chaos.bgq62.ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24000428  XER: 20000010
DAR: 0000000000000000, DSISR: 0000000042000000
TASK = c000000f565f6900[4522] 'a.out' THREAD: c000000f4c260000 CPU: 29
GPR00: 0000000000000000 c000000f4c2638e0 d0000000127283e0 c000000e0d6f6f00 
GPR04: c000000de5ea1700 0000000000000000 c000000de5ea1838 0000000000000001 
GPR08: 0000000000000720 0000000000000000 2b94515100000000 0000000000003000 
GPR12: d0000000136b27f0 c000000001006d80 000000001012b3dc 0000000000000000 
GPR16: 0000000000000000 0000000010129c58 0000000010129bf8 000000001012b948 
GPR20: 0000000000000000 000000001012daf0 c000000f4b772480 00000000400466af 
GPR24: c000000e046d33f8 0000000000000000 c000000e0d6f6f00 d00000000cc74b48 
GPR28: c000000de5ea1700 d00000000cc74b48 d000000012726ee0 c000000f4c2638e0 
NIP [d000000012701940] .mdc_getattr+0x80/0x3d0 [mdc]
LR [d0000000136166ec] .ll_get_mdt_idx+0x1ac/0x8d0 [lustre]
Call Trace:
[c000000f4c2638e0] [c000000f4c263990] 0xc000000f4c263990 (unreliable)
[c000000f4c263990] [d0000000136166ec] .ll_get_mdt_idx+0x1ac/0x8d0 [lustre]
[c000000f4c263a60] [d0000000136234c4] .ll_dir_ioctl+0x1d14/0x8080 [lustre]
[c000000f4c263c00] [c0000000001d7f24] .vfs_ioctl+0x54/0x140
[c000000f4c263c90] [c0000000001d8170] .do_vfs_ioctl+0x90/0x7c0
[c000000f4c263d80] [c0000000001d8954] .SyS_ioctl+0xb4/0xd0
[c000000f4c263e30] [c000000000008564] syscall_exit+0x0/0x40
Instruction dump:
7fa85840 41dd02e0 eb7e8028 801b0000 780907e1 41820014 e93e8030 80090000 
7809ffe3 408201fc 38000000 7f43d378 <f8190000> 48012c3d e8410028 e89e8048 

Source code information:

(gdb) l *(mdc_getattr+0x80)
0x11940 is in mdc_getattr (/builddir/build/BUILD/lustre-2.3.58/lustre/mdc/mdc_request.c:211).
206     /builddir/build/BUILD/lustre-2.3.58/lustre/mdc/mdc_request.c: No such file or directory.
        in /builddir/build/BUILD/lustre-2.3.58/lustre/mdc/mdc_request.c
(gdb) 
 204 int mdc_getattr(struct obd_export *exp, struct md_op_data *op_data,
 205                 struct ptlrpc_request **request)
 206 {
 207         struct ptlrpc_request *req;
 208         int                    rc;
 209         ENTRY;
 210 
 211         *request = NULL;
 212         req = ptlrpc_request_alloc(class_exp2cliimp(exp), &RQF_MDS_GETATTR);
 213         if (req == NULL)
 214                 RETURN(-ENOMEM);
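
One way to cross-check the gdb result against the Oops is to decode the bracketed word `<f8190000>` in the instruction dump, which marks the faulting instruction. The following is a minimal sketch, assuming the standard PowerPC DS-form layout (primary opcode 62 with extended opcode 0 is `std`). The names `ds_form` and `decode_ds` are made up for illustration and are not part of any Lustre or kernel API.

```c
#include <stdint.h>

/* Fields of a PowerPC DS-form instruction (the form used by std/stdu). */
struct ds_form {
        unsigned opcode;  /* primary opcode, 62 for std/stdu */
        unsigned rs;      /* source register number */
        unsigned ra;      /* base address register number */
        int      disp;    /* signed byte displacement (DS field << 2) */
        unsigned xo;      /* extended opcode: 0 = std, 1 = stdu */
};

/* Split a 32-bit instruction word into its DS-form fields. */
static struct ds_form decode_ds(uint32_t insn)
{
        struct ds_form f;

        f.opcode = insn >> 26;
        f.rs     = (insn >> 21) & 0x1f;
        f.ra     = (insn >> 16) & 0x1f;
        /* The low 2 bits are the extended opcode, not displacement bits. */
        f.disp   = (int16_t)(insn & 0xfffc);
        f.xo     = insn & 0x3;
        return f;
}
```

Applied to `0xf8190000`, this decodes as `std r0, 0(r25)`. GPR00, GPR25 and DAR are all zero in the register dump, and so is R5 (the third argument register), which is consistent with `*request = NULL` at line 211 storing through a NULL `request` pointer.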


 Comments   
Comment by Bruno Faccini (Inactive) [ 19/Mar/13 ]

Hello Ned,
According to the sources, it seems that ll_get_mdt_idx() simply forgets to declare a "struct ptlrpc_request *req = NULL;" and to pass its address as the 3rd parameter of md_getattr().
I will post and try a patch along these lines.
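
The failure mode described in this comment can be sketched in isolation. This is a minimal userspace illustration, not the Lustre code and not the submitted patch: `fake_request`, `getattr_out` and `get_idx_fixed` are hypothetical names standing in for `struct ptlrpc_request`, the getattr callee, and a corrected caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ptlrpc_request. */
struct fake_request {
        int mdt_index;
};

/* Mimics the callee's contract: the out-parameter is cleared first,
 * so 'request' itself must never be NULL. */
static int getattr_out(struct fake_request **request)
{
        static struct fake_request reply = { .mdt_index = 0 };

        /* Guard added for this sketch only; the listing of
         * mdc_getattr() in the description shows no such check. */
        assert(request != NULL);
        *request = NULL;   /* would fault here if request were NULL */
        *request = &reply; /* pretend the RPC succeeded */
        return 0;
}

/* Corrected caller pattern: declare a local pointer, pass its address. */
static int get_idx_fixed(int *idx)
{
        struct fake_request *req = NULL;
        int rc;

        rc = getattr_out(&req);
        if (rc == 0 && req != NULL)
                *idx = req->mdt_index;
        return rc;
}
```

Calling `get_idx_fixed(&idx)` from a test `main()` returns 0 and sets `idx`. Passing a literal NULL to `getattr_out()` instead reproduces the shape of the bug, a store through a NULL pointer, which the sketch turns into an assertion failure rather than a crash.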

Comment by Bruno Faccini (Inactive) [ 19/Mar/13 ]

Ned,
Just for my understanding, on which path did you run your program?
It seems that you skipped the lmv layer and went straight into the mdc to trigger this crash, which should not happen when going through lmv because of the MF_GET_MDT_IDX usage, as far as I understand!

Comment by Ned Bass [ 19/Mar/13 ]

Hi Bruno, I ran it on the filesystem root, /p/lscratchrza.

Comment by Oleg Drokin [ 19/Mar/13 ]

I wonder if this is actually a dup of LU-2960.

Comment by Bruno Faccini (Inactive) [ 19/Mar/13 ]

Why not, these endianness issues can have strange consequences. What puzzles me here is that lmv was skipped and mdc was called directly, which triggered a hidden bug (not detected by the static analysis tools!) where the 3rd parameter of md_getattr() is passed as NULL from ll_get_mdt_idx().

I think this can at least be fixed with http://review.whamcloud.com/5769, which I just submitted.

Also, thanks to Johann for noticing that lmv does not appear in the list of loaded modules shown in the crash output! This may simply come from a wrong configuration that is missing lmv while the DNE code is in?

Comment by Ned Bass [ 19/Mar/13 ]

From the comments I've read, it seems the client will skip LMV only if it thinks the server was upgraded from 1.8. That is not the case here; the server has always run (pre) Lustre 2.4. Since I can only reproduce this on a PPC client, I wonder if an endianness issue is corrupting the connect flags in a way that makes the client think it is talking to an upgraded 1.8 server.

Comment by Ned Bass [ 19/Mar/13 ]

As another data point, I mounted another Lustre 2.4 filesystem (lsrzb) on the same PPC client. In this case, it did use LMV. I didn't test the ioctl, but I verified that the lmv module reference count goes to 1, and /proc entries show up for it, i.e. /proc/fs/lustre/lmv/lsrzb-clilmv-c000000d475b7300/target_obds/lsrzb-MDT0000-mdc-c000000d475b7300.

So something about the first filesystem (lsrza) makes the client not configure an LMV device. One difference between the two filesystems is that lsrzb was recently updated with the patch from LU-2240. They were also created at different times, so lsrzb was probably formatted with a slightly more recent code revision.

Comment by Ned Bass [ 19/Mar/13 ]

Sorry, I was mistaken about the server version. The lsrza server is running lustre 2.1.2. The lsrzb server is running 2.4. Is it expected that a 2.4 client won't use LMV when talking to a 2.1 server?

Comment by Ned Bass [ 19/Mar/13 ]

Ah, this server was upgraded from 1.8 after all. Sorry for the confusion. Too many filesystems to keep straight.

Comment by Bruno Faccini (Inactive) [ 20/Mar/13 ]

I will check which scenario may fit and lead to triggering the bug, but this may be the key, and may be what Andreas refers to as a "single-MDS system without an LMV".

Patch #2 has been submitted to follow Andreas's advice on how to short-cut the unnecessary dialog with the server.

Comment by Bruno Faccini (Inactive) [ 26/Mar/13 ]

Patch/Change 5769 #2 successfully passed the auto-tests. Ned, would it be possible for you to give it a try, since you have the platform and reproducer available? It would save me a lot of time by not having to set up the same environment. Just tell me if this is possible for you.

Comment by Ned Bass [ 26/Mar/13 ]

Bruno, yes, I can give the patch a try. The affected systems are in production so I cannot do the test right away, but I should be able to get it done this week. Thanks

Comment by Bruno Faccini (Inactive) [ 08/Apr/13 ]

Ned, any news?

Comment by Ned Bass [ 12/Apr/13 ]

Bruno, with your patch I am no longer able to reproduce this bug. Thanks

Comment by Peter Jones [ 23/Apr/13 ]

Landed for 2.4
