[LU-4708] BUG: unable to handle kernel paging request in ldiskfs Created: 04/Mar/14  Updated: 29/Apr/14  Resolved: 20/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Major
Reporter: Blake Caldwell Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None
Environment:

RHEL 6.4 / distro IB
Kernel 2.6.32-358.23.2.el6.atlas.x86_64
lustre-2.4.2-2.6.32_358.23.2.el6.atlas.x86_64.x86_64
– with the LU-4008 patch applied


Severity: 3
Rank (Obsolete): 12945

 Description   

This filesystem was upgraded from Lustre 1.8 to 2.4.2 about 3 weeks ago. It was rebooted yesterday with the LU-4008 patch applied. Can this crash be attributed to the upgrade, to the LU-4008 patch, or to something else?

2014-03-04 07:56:40 [66446.201681] BUG: unable to handle kernel paging request at ffff8803b298b000
2014-03-04 07:56:40 [66446.202606] IP: [<ffffffff81281ac0>] memcpy+0x10/0x120
2014-03-04 07:56:40 [66446.202606] PGD 1a86063 PUD 3427067 PMD 35bc067 PTE 80000003b298b060
2014-03-04 07:56:40 [66446.202606] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
2014-03-04 07:56:40 [66446.202606] last sysfs file: /sys/devices/pci0000:00/0000:00:14.4/0000:01:04.0/local_cpus
2014-03-04 07:56:40 [66446.202606] CPU 0
2014-03-04 07:56:40 [66446.202606] Modules linked in: osp(U) lod(U) mdt(U) mgs(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) ldiskfs(U) mbcache lquota(U) jbd2 mdd(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) ko2iblnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 ipmi_devintf 8021q garp stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr mlx4_ib ib_sa ib_mad ib_core dm_mirror dm_region_hash dm_log dm_multipath dm_mod sg ses enclosure sd_mod crc_t10dif sr_mod cdrom aacraid microcode serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd ata_generic pata_acpi pata_atiixp i2c_piix4 i2c_core ahci shpchp ipv6 nfs lockd fscache auth_rpcgss nfs_acl sunrpc mlx4_en mlx4_core igb dca ptp pps_core [last unloaded: scsi_wait_scan]
2014-03-04 07:56:40 [66446.202606]
2014-03-04 07:56:40 [66446.202606] Pid: 20122, comm: mdt_rdpg00_003 Not tainted 2.6.32-358.23.2.el6.atlas.x86_64 #1 Penguin Computing Altus 2800/H8DGU
2014-03-04 07:56:40 [66446.202606] RIP: 0010:[<ffffffff81281ac0>] [<ffffffff81281ac0>] memcpy+0x10/0x120
2014-03-04 07:56:40 [66446.202606] RSP: 0018:ffff8803b49b3908 EFLAGS: 00010202
2014-03-04 07:56:40 [66446.202606] RAX: ffff8803e6e8a3ae RBX: 000000006a8ea4d4 RCX: 0000000000000001
2014-03-04 07:56:40 [66446.202606] RDX: 0000000000000001 RSI: ffff8803b298b000 RDI: ffff8803e6e8a3b6
2014-03-04 07:56:40 [66446.202606] RBP: ffff8803b49b3950 R08: 0000000000000000 R09: ffff8803e6e8a380
2014-03-04 07:56:40 [66446.202606] R10: 0000000000000008 R11: 00000000a0cde035 R12: ffff8803e6e8a380
2014-03-04 07:56:40 [66446.202606] R13: ffff8803b4211700 R14: ffff8803b298aff0 R15: ffff8803b4211700
2014-03-04 07:56:40 [66446.202606] FS: 00007f928c43e700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
2014-03-04 07:56:40 [66446.202606] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2014-03-04 07:56:40 [66446.202606] CR2: ffff8803b298b000 CR3: 0000000418ff2000 CR4: 00000000000007f0
2014-03-04 07:56:40 [66446.202606] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2014-03-04 07:56:40 [66446.202606] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2014-03-04 07:56:40 [66446.202606] Process mdt_rdpg00_003 (pid: 20122, threadinfo ffff8803b49b2000, task ffff880417168080)
2014-03-04 07:56:40 [66446.202606] Stack:
2014-03-04 07:56:40 [66446.202606] ffffffffa0c354f0 6a8ea4d48cf0fcbe f46fc73300000001 ffff8803b49b3950
2014-03-04 07:56:40 [66446.202606] <d> ffff8803b298aff0 ffff8803a4a367b0 ffff8803b49b3a30 ffff8807ac4293d0
2014-03-04 07:56:40 [66446.202606] <d> 000000000000000e ffff8803b49b39c0 ffffffffa0c57718 0000000000000002
2014-03-04 07:56:40 [66446.202606] Call Trace:
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0c354f0>] ? ldiskfs_htree_store_dirent+0xb0/0x1c0 [ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0c57718>] htree_dirblock_to_tree+0x128/0x190 [ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0c5a0ba>] ldiskfs_htree_fill_tree+0x16a/0x280 [ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0c35887>] ldiskfs_readdir+0x127/0x730 [ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0c353a5>] ? call_filldir+0xb5/0x150 [ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0cc3170>] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0c35d09>] ? ldiskfs_readdir+0x5a9/0x730 [ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0cc3170>] ? osd_ldiskfs_filldir+0x0/0x480 [osd_ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0c74190>] ? htree_lock_try+0x40/0x80 [ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0cb597b>] osd_ldiskfs_it_fill+0xab/0x1e0 [osd_ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0cb5c26>] osd_it_ea_next+0x96/0x190 [osd_ldiskfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0e7f4f1>] lod_it_next+0x21/0x90 [lod]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0b3b171>] mdd_dir_page_build+0x91/0x210 [mdd]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0583b12>] dt_index_walk+0x162/0x3d0 [obdclass]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0b3b0e0>] ? mdd_dir_page_build+0x0/0x210 [mdd]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0b3ce2b>] mdd_readpage+0x38b/0x5a0 [mdd]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0dc728f>] mdt_readpage+0x47f/0x960 [mdt]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0db5b37>] mdt_handle_common+0x647/0x16d0 [mdt]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa070bd3c>] ? lustre_msg_get_transno+0x8c/0x100 [ptlrpc]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa0df18c5>] mds_readpage_handle+0x15/0x20 [mdt]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa071b558>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa03af5de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa03c0d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa07128b9>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
2014-03-04 07:56:40 [66446.202606] [<ffffffff81055c93>] ? __wake_up+0x53/0x70
2014-03-04 07:56:40 [66446.202606] [<ffffffffa071c8ee>] ptlrpc_main+0xace/0x1700 [ptlrpc]
2014-03-04 07:56:40 [66446.202606] [<ffffffffa071be20>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
2014-03-04 07:56:40 [66446.202606] [<ffffffff8100c0ca>] child_rip+0xa/0x20
2014-03-04 07:56:40 [66446.202606] Code: c0 49 89 70 58 41 c6 40 4c 04 83 e0 fc 83 c0 08 41 88 40 4d c9 c3 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b
2014-03-04 07:56:40 [66446.202606] RIP [<ffffffff81281ac0>] memcpy+0x10/0x120
2014-03-04 07:56:40 [66446.202606] RSP <ffff8803b49b3908>
2014-03-04 07:56:40 [66446.202606] CR2: ffff8803b298b000



 Comments   
Comment by Peter Jones [ 04/Mar/14 ]

Oleg

Could you please assess whether this is related to the LU-4008 patch or not?

Thanks

Peter

Comment by Oleg Drokin [ 05/Mar/14 ]

I don't think this is related to LU-4008.
mdt_readpage does not receive the MDS_MD buffer so it's not affected.
Also, this crash apparently happens while filling in the entry from the cache, so I think it's not related to any of the client-passed buffers at all.

Reading through the code, it seems like we are trying to read more data from the dentry name than there actually is, due to the LDISKFS_DIRENT_LUFID flag being set but apparently too little data actually being present.

This is something Andreas needs to take a look at, I guess.
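
For illustration only, a minimal sketch (not the ldiskfs source and not the patch in review) of the kind of guard implied here: before copying the name bytes plus an appended FID out of a directory entry, verify that the entry's rec_len actually covers them, rather than trusting the LDISKFS_DIRENT_LUFID flag alone. All struct, constant, and function names below are hypothetical.

    /* Hypothetical sketch -- not ldiskfs code.  Validates a directory entry
     * that claims (via a "FID appended" flag) to carry extra bytes after the
     * name, before memcpy'ing them out. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define DIRENT_FL_LUFID  0x10    /* hypothetical "FID appended" flag */
    #define FID_REC_SIZE     17      /* assumed size of the appended FID record */

    struct fake_dirent {             /* layout loosely modelled on ext4_dir_entry_2 */
            uint32_t inode;
            uint16_t rec_len;        /* total space used by this entry */
            uint8_t  name_len;
            uint8_t  file_type;      /* high bits may carry flags */
            char     name[];         /* name_len bytes, then optional FID */
    };

    /* Return true only if the FID bytes claimed by the flag really fit inside
     * rec_len; otherwise a copy would run past the entry (and potentially past
     * the directory block), as in this crash. */
    static bool dirent_fid_fits(const struct fake_dirent *de)
    {
            size_t needed = offsetof(struct fake_dirent, name) +
                            de->name_len + FID_REC_SIZE;

            if (!(de->file_type & DIRENT_FL_LUFID))
                    return false;    /* no FID claimed at all */
            return de->rec_len >= needed;
    }

    static int copy_dirent_fid(const struct fake_dirent *de, char *out, size_t outlen)
    {
            if (!dirent_fid_fits(de) || outlen < FID_REC_SIZE)
                    return -1;       /* refuse instead of reading past the entry */
            memcpy(out, de->name + de->name_len, FID_REC_SIZE);
            return 0;
    }

The actual fix went through the Gerrit reviews linked in the following comments.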

Comment by Andreas Dilger [ 05/Mar/14 ]

http://review.whamcloud.com/9510

Comment by Andreas Dilger [ 11/Mar/14 ]

Patch for b2_5: http://review.whamcloud.com/9578

Comment by Peter Jones [ 20/Mar/14 ]

Landed for 2.6. b2_5 landing to follow
