[LU-1251] GPF RIP ptlrpc:lustre_msg_buf+0x8/0x90 Created: 22/Mar/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Ned Bass Assignee: Hongchao Zhang
Resolution: Cannot Reproduce Votes: 0
Labels: llnl
Environment:

client: https://github.com/chaos/lustre/tree/1.8.5-llnl
server: https://github.com/chaos/lustre/tree/2.1.0-llnl


Severity: 3
Rank (Obsolete): 8544

 Description   

This looks possibly related to LU-604 or LU-1020. The client is 1.8.5 and the server is 2.1.0, with local patch stacks at the GitHub links in the Environment section. We didn't get a crash dump.

2012-03-21 17:09:39 LustreError: 21982:0:(file.c:3312:ll_inode_revalidate_fini()) failure -2 inode 144302670109215237
2012-03-21 17:09:39 general protection fault: 0000 [1] SMP
2012-03-21 17:09:39 last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:00.0/irq
2012-03-21 17:09:39 CPU 1
2012-03-21 17:09:39 Modules linked in:
cpufreq_ondemand(U) xt_state(U) ip_conntrack(U) nfnetlink(U) iptable_filter(U) ip_tables(U) ipt_owner(U) mgc(U)
lustre(U) lov(U) mdc(U) lquota(U) osc(U) ptlrpc(U) obdclass(U) lvfs(U)
nfs(U) fscache(U) nfs_acl(U) perfctr(U) ko2iblnd(U) lnet(U) libcfs(U) job(U) lockd(U) sunrpc(U) ib_ipoib(U)
rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_mthca(U) ipoib_helper(U) ib_cm(U)
ib_sa(U) ib_mad(U) ib_core(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) myri10ge(U) inet_lro(U) e1000(U) ipt_LOG(U)
xt_tcpudp(U) xt_multiport(U) x_tables(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) sbs(U) power_meter(U)
backlight(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U)
lp(U) parport(U) ksm(U) kvm(U) joydev(U) sg(U) floppy(U) k8_edac(U) i2c_nforce2(U) usb_storage(U) shpchp(U)
k8temp(U) edac_mc(U) i2c_core(U) hwmon(U) pcspkr(U) serio_raw(U) dm_raid45(U) dm_message(U) dm_region_hash(U)
dm_log(U) dm_mod(U) dm_mem_cache(U) sata_nv(U) pata_acpi(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U)
uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
2012-03-21 17:09:39 Pid: 21982, comm: perl Tainted: G      2.6.18-108chaos #1
2012-03-21 17:09:39 RIP: 0010:[<ffffffff88863338>]  [<ffffffff88863338>] :ptlrpc:lustre_msg_buf+0x8/0x90
2012-03-21 17:09:39 RSP: 0018:ffff81007188fc78  EFLAGS: 00010286
2012-03-21 17:09:39 RAX: ffff81011715ac48 RBX: ffff8100a10277c0 RCX: ffff8100682c1c80
2012-03-21 17:09:39 RDX: 00000000000000a8 RSI: 0000000000000002 RDI: 5a5a5a5a5a5a5a5a
2012-03-21 17:09:39 RBP: ffff81007188fc98 R08: 0000000000000000 R09: 0000000200000003
2012-03-21 17:09:39 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8100682c1c80
2012-03-21 17:09:39 R13: ffff81012c9fec00 R14: ffff81007188fdb8 R15: ffff810038261c00
2012-03-21 17:09:39 FS:  00002aaaac28e420(0000) GS:ffff8102038bc0c0(0000) knlGS:0000000057cf61e0
2012-03-21 17:09:39 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2012-03-21 17:09:39 CR2: 00002aaaac4911f8 CR3: 00000001b4ed5000 CR4: 00000000000006e0
2012-03-21 17:09:39 Process perl (pid: 21982, threadinfo ffff81007188e000, task ffff8100b45357b0)
2012-03-21 17:09:39 Stack:  0000000000000000 ffff8100a10277c0 ffff81004ee78000 ffff810038261d60
2012-03-21 17:09:39  ffff81007188fcd8 ffffffff889e81a6 0000000200aa8390 ffff8100a10277c0
2012-03-21 17:09:39  ffff81007b8e2bc0 ffff8100682c1c80 ffff81007188fdb8 0000000000000000
2012-03-21 17:09:39 Call Trace:
2012-03-21 17:09:39  [<ffffffff889e81a6>] :lustre:ll_och_fill+0x66/0x100
2012-03-21 17:09:39  [<ffffffff889eb0b3>] :lustre:ll_local_open+0xe3/0x190
2012-03-21 17:09:39  [<ffffffff88a17908>] :lustre:ll_stats_ops_tally+0x48/0xf0
2012-03-21 17:09:39  [<ffffffff889ec72b>] :lustre:ll_file_open+0x98b/0xd60
2012-03-21 17:09:39  [<ffffffff8000ced8>] do_path_lookup+0x277/0x2f5
2012-03-21 17:09:39  [<ffffffff889ebda0>] :lustre:ll_file_open+0x0/0xd60
2012-03-21 17:09:39  [<ffffffff8001ef44>] __dentry_open+0xe9/0x1ef
2012-03-21 17:09:39  [<ffffffff80025f04>] nameidata_to_filp+0x2d/0x3f
2012-03-21 17:09:39  [<ffffffff80027f31>] do_filp_open+0x36/0x46
2012-03-21 17:09:39  [<ffffffff800166aa>] get_unused_fd+0x72/0x102
2012-03-21 17:09:39  [<ffffffff8001a6df>] do_sys_open+0x4f/0xcd
2012-03-21 17:09:39  [<ffffffff8003254a>] sys_open+0x1b/0x1d
2012-03-21 17:09:39  [<ffffffff80060116>] system_call+0x7e/0x83
2012-03-21 17:09:39
2012-03-21 17:09:39
2012-03-21 17:09:39 Code: 8b 47 08 3d d0 0b d0 0b 74 0e 3d d3 0b d0 0b 75 1e eb 0f 0f
2012-03-21 17:09:39 RIP  [<ffffffff88863338>] :ptlrpc:lustre_msg_buf+0x8/0x90
2012-03-21 17:09:39  RSP <ffff81007188fc78>
2012-03-21 17:09:39 REWRITING MCP55 CFG REG
2012-03-21 17:09:39 CFG = c1
2012-03-21 17:09:40 Linux version 2.6.18-108chaos (mockbuild@chaos4builder1) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Mon Sep 12 15:32:06 PDT 2011
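
For triage without a crash dump, the register dump in the oops can be checked for values that are not valid pointers. The sketch below is only an illustration and not part of the original report: it parses the general-purpose registers copied from the log and flags any value that is a repeated fill byte or a non-canonical x86_64 address. The fill byte 0x5a is an assumption here (it is believed to be the pattern Lustre 1.8 writes over freed allocations), and the canonical check assumes 48-bit virtual addressing.

```python
import re

# Register lines copied from the oops above (timestamps dropped).
OOPS_REGISTERS = """
RAX: ffff81011715ac48 RBX: ffff8100a10277c0 RCX: ffff8100682c1c80
RDX: 00000000000000a8 RSI: 0000000000000002 RDI: 5a5a5a5a5a5a5a5a
RBP: ffff81007188fc98 R08: 0000000000000000 R09: 0000000200000003
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8100682c1c80
R13: ffff81012c9fec00 R14: ffff81007188fdb8 R15: ffff810038261c00
"""

# Assumption: byte used to poison freed memory on this client.
POISON_BYTE = 0x5A


def parse_registers(text):
    """Return {register name: integer value} for every 'Rxx: <16 hex digits>' pair."""
    pairs = re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})", text)
    return {name: int(value, 16) for name, value in pairs}


def is_poisoned(value, fill=POISON_BYTE):
    """True when all eight bytes of the value equal the fill byte."""
    return value == int.from_bytes(bytes([fill]) * 8, "little")


def is_canonical(value):
    """True when bits 63..47 are all equal (48-bit virtual addressing assumed)."""
    return (value >> 47) in (0, 0x1FFFF)


registers = parse_registers(OOPS_REGISTERS)
suspects = [name for name, value in registers.items() if is_poisoned(value)]

for name in suspects:
    print(f"{name} = {registers[name]:#018x} "
          f"(poison pattern, canonical={is_canonical(registers[name])})")
```

On this oops the only flagged register is RDI, the first argument register in the x86_64 calling convention. Dereferencing a non-canonical address raises a general protection fault rather than a page fault, which matches the fault type reported.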


 Comments   
Comment by Peter Jones [ 22/Mar/12 ]

Hi Hongchao

Could you please look into this one?

Thanks

Peter

Comment by Hongchao Zhang [ 23/Mar/12 ]

Yes, this issue should be a duplicate of LU-1020, which uses a freed "lookup_intent->d.lustre.it_data".

Hi Ned, could you please apply the debug patch from LU-1020 (http://review.whamcloud.com/#change,2152) to help collect more debug info? Thanks!
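
As a rough cross-check of the freed-pointer diagnosis, the `Code:` bytes from the oops can be decoded by hand. The sketch below is an illustration, not part of the original ticket: it handles only the handful of opcodes that appear in this dump, and it assumes the 20 bytes start at the faulting RIP. The two compared constants look like the Lustre message magic values (recalled as LUSTRE_MSG_MAGIC_V1 and LUSTRE_MSG_MAGIC_V2); treat that identification as an assumption.

```python
import struct

# Bytes from the "Code:" line of the oops.
CODE = bytes.fromhex(
    "8b 47 08 3d d0 0b d0 0b 74 0e 3d d3 0b d0 0b 75 1e eb 0f 0f"
)

# Short-jump opcodes seen in this dump (each takes a signed 8-bit offset).
JUMPS = {0x74: "je", 0x75: "jne", 0xEB: "jmp"}


def decode(code):
    """Decode a tiny x86_64 subset; return (instructions, undecoded tail)."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x8B and i + 2 < len(code) and code[i + 1] == 0x47:
            # 8b 47 disp8: 32-bit load from disp8(%rdi) into %eax
            out.append(f"mov 0x{code[i + 2]:x}(%rdi),%eax")
            i += 3
        elif op == 0x3D and i + 4 < len(code):
            # 3d imm32: compare %eax with a little-endian 32-bit immediate
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"cmp $0x{imm:08x},%eax")
            i += 5
        elif op in JUMPS and i + 1 < len(code):
            (rel,) = struct.unpack_from("<b", code, i + 1)
            out.append(f"{JUMPS[op]} {rel:+#x}")
            i += 2
        else:
            break  # anything outside the subset is left undecoded
    return out, code[i:]


instructions, tail = decode(CODE)
for line in instructions:
    print(line)
```

Under those assumptions the first instruction loads a 32-bit field at offset 8 from the pointer in RDI and compares it against two constants, which is what a message-magic check at the top of `lustre_msg_buf` would look like. With RDI holding 5a5a5a5a5a5a5a5a, that load is the access that faults.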

Comment by Ned Bass [ 23/Mar/12 ]

Hi Hongchao,

It is uncertain whether we will be doing any further updates to our 1.8 systems. However, if we do we will include the debug patch. Our plan is to move everything to 2.1. Can you confirm that this bug does not affect 2.1?

Thanks,
Ned

Comment by Hongchao Zhang [ 12/Apr/12 ]

It is not certain whether 2.1 still has the bug, but it has not shown up in 2.1 so far, so this issue may not affect 2.1.

Comment by Peter Jones [ 04/Jun/12 ]

Chris

Have you seen this issue with 2.1.x clients yet? Would you apply the LU-1020 diagnostic patch into production if we ported it?

Peter

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.

Generated at Sat Feb 10 01:14:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.