Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.1.0, Lustre 1.8.6
-
None
-
Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-238.12.1.el5)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el5,ib_stack=ofa/
Network: IB (OFED 1.5.3.1)
Lustre Servers:
Branch: master
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-238.19.1.el5_lustre.gd4ea36c)
Build: http://newbuild.whamcloud.com/job/lustre-master/262/arch=x86_64,build_type=server,distro=el5,ib_stack=inkernel/
Network: IB (inkernel OFED)
-
3
-
6568
Description
While running the racer test, a Lustre 1.8.6-wc1 client (fat-amd-3) hit a kernel panic as follows:
Lustre: DEBUG MARKER: -----============= acceptance-small: racer ============----- Thu Aug 18 02:28:59 PDT 2011
general protection fault: 0000 [1] SMP
last sysfs file: /block/lloop15/range
CPU 7
Modules linked in: llite_lloop(U) lustre(U) mgc(U) lov(U) osc(U) mdc(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi iscsi_tcp bnx2i cnic uio iw_cxgb3(U) cxgb3(U) libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi rds(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ipv6 xfrm_nalgo crypto_api ib_sa(U) loop dm_mirror dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp mlx4_ib(U) ib_mad(U) ib_core(U) igb 8021q tpm_tis tpm k10temp tpm_bios i2c_piix4 serio_raw sg dca hwmon pcspkr i2c_core mlx4_core(U) amd64_edac_mod edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 27776, comm: rm Tainted: G 2.6.18-238.12.1.el5 #1
RIP: 0010:[] [] :ptlrpc:lustre_msg_buf+0x4/0x90
RSP: 0018:ffff8100cdb65cc8 EFLAGS: 00010292
RAX: ffff81041e84f8e8 RBX: ffff81021f43b680 RCX: ffff81021adfe940
RDX: 00000000000000a8 RSI: 0000000000000002 RDI: 5a5a5a5a5a5a5a5a
RBP: ffff81021adfe940 R08: 0000000000000000 R09: 0000000000000000
R10: ffff810214393c00 R11: 0000000000000248 R12: ffff810214393c00
R13: ffff8100cdb65df8 R14: ffff81041a2c7b80 R15: ffff81041a2c7c50
FS: 00002afe193106e0(0000) GS:ffff810123aac2c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000001770393c CR3: 00000002158c0000 CR4: 00000000000006e0
Process rm (pid: 27776, threadinfo ffff8100cdb64000, task ffff8100d39e7080)
Stack: ffff8100d1098000 ffff81041a2c7ce0 0000000200000400 0000000000000a03
 0000000200000400 ffffffff88b074e7 ffff81021f43b680 ffff81021e5a5540
 ffff81021adfe940 ffff8100cdb65df8 0000000000000000 ffffffff88b0a4a3
Call Trace:
 [] :lustre:ll_och_fill+0x67/0x100
 [] :lustre:ll_local_open+0xe3/0x190
 [] :libcfs:cfs_alloc+0x68/0xc0
 [] :lustre:ll_file_open+0x956/0xd10
 [] :lustre:ll_file_open+0x0/0xd10
 [] __dentry_open+0xd9/0x1dc
 [] do_filp_open+0x2a/0x38
 [] do_sys_open+0x44/0xbe
 [] tracesys+0xd5/0xe0
Code: 8b 47 08 3d d0 0b d0 0b 74 09 3d d3 0b d0 0b 75 1b eb 0e 83
RIP [] :ptlrpc:lustre_msg_buf+0x4/0x90
RSP
<0>Kernel panic - not syncing: Fatal exception
<7>APIC error on CPU13: 00(04)
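In the register dump, RDI (the first function argument on x86_64, here the `m` pointer passed to lustre_msg_buf) holds 5a5a5a5a5a5a5a5a. The sketch below shows two quick checks one might run on such a value: whether it is a canonical x86_64 address, and whether it is a single byte repeated eight times. The helper names are hypothetical, and reading the 0x5a fill as a sign that the pointer came from already-freed, poisoned memory is an assumption; the report itself does not state the cause.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With 48-bit virtual addressing, an x86_64 address is canonical only
 * when bits 63..47 are all zeros or all ones. Dereferencing a
 * non-canonical address raises a general protection fault rather than
 * a page fault.
 */
static bool is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the upper 17 bits */
    return top == 0 || top == 0x1FFFF;
}

/*
 * True when every byte of `value` equals `pattern`, as in the fill
 * patterns that debugging allocators write over memory.
 * (Assumption: 0x5a is such a fill byte in this crash.)
 */
static bool is_repeated_byte(uint64_t value, uint8_t pattern)
{
    return value == 0x0101010101010101ULL * pattern;
}
```

Applied to the dump, `is_canonical_x86_64(0x5a5a5a5a5a5a5a5aULL)` is false, which is consistent with the "general protection fault" line, while an ordinary kernel pointer such as the RBX value ffff81021f43b680 passes the check.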
[root@fat-amd-3 ~]# gdb /lib/modules/2.6.18-238.12.1.el5/updates/kernel/fs/lustre/ptlrpc.ko
(gdb) l *(lustre_msg_buf+0x4)
0x47584 is in lustre_msg_buf (/var/lib/jenkins/workspace/lustre-b1_8/arch/x86_64/build_type/client/distro/el5/ib_stack/ofa/BUILD/BUILD/lustre-1.8.6/lustre/ptlrpc/pack_generic.c:603).
598 /var/lib/jenkins/workspace/lustre-b1_8/arch/x86_64/build_type/client/distro/el5/ib_stack/ofa/BUILD/BUILD/lustre-1.8.6/lustre/ptlrpc/pack_generic.c: No such file or directory.
        in /var/lib/jenkins/workspace/lustre-b1_8/arch/x86_64/build_type/client/distro/el5/ib_stack/ofa/BUILD/BUILD/lustre-1.8.6/lustre/ptlrpc/pack_generic.c
(gdb)
[root@fat-amd-3 ~]# vi /usr/src/lustre-1.8.6/lustre/ptlrpc/pack_generic.c
601 void *lustre_msg_buf(struct lustre_msg *m, int n, int min_size)
602 {
603         switch (m->lm_magic) {
604         case LUSTRE_MSG_MAGIC_V1:
605                 return lustre_msg_buf_v1(m, n - 1, min_size);
606         case LUSTRE_MSG_MAGIC_V2:
607                 return lustre_msg_buf_v2(m, n, min_size);
608         default:
609                 CERROR("incorrect message magic: %08x\n", m->lm_magic);
610                 return NULL;
611         }
612 }
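The "Code:" bytes in the oops line up with line 603 of the listing. `8b 47 08` decodes to `mov 0x8(%rdi),%eax`, a 32-bit load from offset 8 of the first argument, and the two `3d` instructions compare the result against the little-endian immediates `d0 0b d0 0b` and `d3 0b d0 0b`, presumably the two magic constants in the switch. The sketch below reproduces that reading with a simplified stand-in structure. `struct mock_msg`, `mock_msg_version` and `le32` are hypothetical names for illustration and are not the real `struct lustre_msg` definition or any Lustre API.

```c
#include <stddef.h>
#include <stdint.h>

/* Immediates taken from the oops "Code:" bytes, read as little-endian. */
#define MSG_MAGIC_V1 0x0BD00BD0u
#define MSG_MAGIC_V2 0x0BD00BD3u

/*
 * Hypothetical, simplified header. Only the offset of the magic field
 * matters: the faulting instruction reads 4 bytes at offset 8.
 */
struct mock_msg {
    uint64_t first_eight_bytes;  /* placeholder for the leading fields */
    uint32_t magic;              /* loaded by mov 0x8(%rdi),%eax */
};

/* Same dispatch shape as the switch at line 603: 1, 2, or 0 if unknown. */
static int mock_msg_version(const struct mock_msg *m)
{
    switch (m->magic) {
    case MSG_MAGIC_V1:
        return 1;
    case MSG_MAGIC_V2:
        return 2;
    default:
        return 0;
    }
}

/* Decode a 32-bit little-endian immediate from raw instruction bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

Because the load at offset 8 is the first thing the function does with `m`, an invalid `m` faults at lustre_msg_buf+0x4 before the `default:` branch and its CERROR can ever run.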
Maloo report: https://maloo.whamcloud.com/test_sets/4468ed3a-c97f-11e0-8d02-52540025f9af