Lustre / LU-604

1.8<->2.1 interop: RIP: ptlrpc:lustre_msg_buf+0x4/0x90

    Description

      While running the racer test, a Lustre 1.8.6-wc1 client (fat-amd-3) hit the following kernel panic:

      Lustre: DEBUG MARKER: -----============= acceptance-small: racer ============----- Thu Aug 18 02:28:59 PDT 2011
      general protection fault: 0000 [1] SMP 
      last sysfs file: /block/lloop15/range
      CPU 7 
      Modules linked in: llite_lloop(U) lustre(U) mgc(U) lov(U) osc(U) mdc(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi iscsi_tcp bnx2i cnic uio iw_cxgb3(U) cxgb3(U) libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi rds(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ipv6 xfrm_nalgo crypto_api ib_sa(U) loop dm_mirror dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport shpchp mlx4_ib(U) ib_mad(U) ib_core(U) igb 8021q tpm_tis tpm k10temp tpm_bios i2c_piix4 serio_raw sg dca hwmon pcspkr i2c_core mlx4_core(U) amd64_edac_mod edac_mc dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
      Pid: 27776, comm: rm Tainted: G      2.6.18-238.12.1.el5 #1
      RIP: 0010:[]  [] :ptlrpc:lustre_msg_buf+0x4/0x90
      RSP: 0018:ffff8100cdb65cc8  EFLAGS: 00010292
      RAX: ffff81041e84f8e8 RBX: ffff81021f43b680 RCX: ffff81021adfe940
      RDX: 00000000000000a8 RSI: 0000000000000002 RDI: 5a5a5a5a5a5a5a5a
      RBP: ffff81021adfe940 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff810214393c00 R11: 0000000000000248 R12: ffff810214393c00
      R13: ffff8100cdb65df8 R14: ffff81041a2c7b80 R15: ffff81041a2c7c50
      FS:  00002afe193106e0(0000) GS:ffff810123aac2c0(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 000000001770393c CR3: 00000002158c0000 CR4: 00000000000006e0
      Process rm (pid: 27776, threadinfo ffff8100cdb64000, task ffff8100d39e7080)
      Stack:  ffff8100d1098000 ffff81041a2c7ce0 0000000200000400 0000000000000a03
       0000000200000400 ffffffff88b074e7 ffff81021f43b680 ffff81021e5a5540
       ffff81021adfe940 ffff8100cdb65df8 0000000000000000 ffffffff88b0a4a3
      Call Trace:
      
       [] :lustre:ll_och_fill+0x67/0x100
       [] :lustre:ll_local_open+0xe3/0x190
       [] :libcfs:cfs_alloc+0x68/0xc0
       [] :lustre:ll_file_open+0x956/0xd10
       [] :lustre:ll_file_open+0x0/0xd10
       [] __dentry_open+0xd9/0x1dc
       [] do_filp_open+0x2a/0x38
       [] do_sys_open+0x44/0xbe
       [] tracesys+0xd5/0xe0
      
      Code: 8b 47 08 3d d0 0b d0 0b 74 09 3d d3 0b d0 0b 75 1b eb 0e 83 
      RIP  [] :ptlrpc:lustre_msg_buf+0x4/0x90
       RSP 
       <0>Kernel panic - not syncing: Fatal exception
       <7>APIC error on CPU13: 00(04)
      
      [root@fat-amd-3 ~]# gdb /lib/modules/2.6.18-238.12.1.el5/updates/kernel/fs/lustre/ptlrpc.ko
      (gdb) l *(lustre_msg_buf+0x4)
      0x47584 is in lustre_msg_buf (/var/lib/jenkins/workspace/lustre-b1_8/arch/x86_64/build_type/client/distro/el5/ib_stack/ofa/BUILD/BUILD/lustre-1.8.6/lustre/ptlrpc/pack_generic.c:603).
      598     /var/lib/jenkins/workspace/lustre-b1_8/arch/x86_64/build_type/client/distro/el5/ib_stack/ofa/BUILD/BUILD/lustre-1.8.6/lustre/ptlrpc/pack_generic.c: No such file or directory.
              in /var/lib/jenkins/workspace/lustre-b1_8/arch/x86_64/build_type/client/distro/el5/ib_stack/ofa/BUILD/BUILD/lustre-1.8.6/lustre/ptlrpc/pack_generic.c
      (gdb) 
      
      [root@fat-amd-3 ~]# vi /usr/src/lustre-1.8.6/lustre/ptlrpc/pack_generic.c
          601 void *lustre_msg_buf(struct lustre_msg *m, int n, int min_size)
          602 {
          603         switch (m->lm_magic) {
          604         case LUSTRE_MSG_MAGIC_V1:
          605                 return lustre_msg_buf_v1(m, n - 1, min_size);
          606         case LUSTRE_MSG_MAGIC_V2:
          607                 return lustre_msg_buf_v2(m, n, min_size);
          608         default:
          609                 CERROR("incorrect message magic: %08x\n", m->lm_magic);
          610                 return NULL;
          611         }
          612 }
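
      One possible reading of the dump above, offered as a sketch rather than a confirmed diagnosis: under the x86_64 calling convention RDI carries the first argument, here `m`, and it holds 5a5a5a5a5a5a5a5a. The Code bytes start with `8b 47 08`, which decodes as `mov 0x8(%rdi),%eax` and is consistent with the `m->lm_magic` load on line 603. They are followed by two `3d` (cmp) instructions whose immediates look like the V1 and V2 message magics. The assumption here is that 0x5a is the byte Lustre writes over memory when it is freed, which would suggest `ll_och_fill` passed in a message pointer read from an already freed structure. The helper names below are hypothetical and exist only to check the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if all eight bytes of v equal the byte b (e.g. a poison pattern). */
static bool is_repeated_byte(uint64_t v, uint8_t b)
{
    return v == 0x0101010101010101ULL * b;
}

/* True if addr is canonical under 48-bit x86_64 addressing,
 * i.e. bits 63..47 are all zeros or all ones. A load through a
 * non-canonical address raises a general protection fault
 * instead of a page fault. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 bits 63..47 */
    return top == 0 || top == 0x1ffff;
}

/* Decode a 32-bit little-endian immediate from instruction bytes,
 * such as the four bytes following a 0x3d (cmp imm32, %eax) opcode. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

      Applied to this dump, RDI is a repeated 0x5a byte and is not a canonical address, which fits the "general protection fault" line. The bytes `d0 0b d0 0b` and `d3 0b d0 0b` decode to 0x0bd00bd0 and 0x0bd00bd3, the two values the switch compares `lm_magic` against.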
      

      Maloo report: https://maloo.whamcloud.com/test_sets/4468ed3a-c97f-11e0-8d02-52540025f9af

            People

              hongchao.zhang Hongchao Zhang
              yujian Jian Yu