[LU-4364] OST Page Fault test sanity test_133f: fldb_seq_start+0x6d Created: 09/Dec/13  Updated: 04/Mar/14  Resolved: 04/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: revzfs, zfs

Severity: 3
Rank (Obsolete): 11947

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/8b0e35da-4d42-11e3-95a5-52540035b04c.

The sub-test test_133f failed with the following error:

test failed to respond and timed out

Info required for matching: sanity 133f

OST console log:

01:15:03:BUG: unable to handle kernel paging request at fffffffffffffffe
01:15:03:IP: [<ffffffffa0b86f8d>] fldb_seq_start+0x6d/0xc0 [fld]
01:15:03:PGD 1a87067 PUD 1a88067 PMD 0 
01:15:03:Oops: 0000 [#1] SMP 
01:15:03:last sysfs file: /sys/devices/system/cpu/possible
01:15:03:CPU 0 
01:15:03:Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_zfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
01:15:03:
01:15:03:Pid: 13161, comm: cat Tainted: P           ---------------    2.6.32-358.23.2.el6_lustre.g02571dc.x86_64 #1 Red Hat KVM
01:15:03:RIP: 0010:[<ffffffffa0b86f8d>]  [<ffffffffa0b86f8d>] fldb_seq_start+0x6d/0xc0 [fld]
01:15:03:RSP: 0018:ffff880072bdbdf8  EFLAGS: 00010246
01:15:03:RAX: fffffffffffffffe RBX: ffff8800662bea40 RCX: 0000000000000000
01:15:03:RDX: ffff880072e7b800 RSI: ffff88007200e470 RDI: ffff88006d1d1400
01:15:03:RBP: ffff880072bdbe18 R08: ffffc90004447000 R09: 0000000000000000
01:15:03:R10: 0000000000000001 R11: ffffffffffffffff R12: ffff880072bdbe60
01:15:03:R13: 0000000000000000 R14: 0000000000008000 R15: ffff880072bdbe60
01:15:03:FS:  00007f4e65a26700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
01:15:03:CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
01:15:03:CR2: fffffffffffffffe CR3: 000000006fa80000 CR4: 00000000000006f0
01:15:03:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
01:15:03:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
01:15:03:Process cat (pid: 13161, threadinfo ffff880072bda000, task ffff88006faa1500)
01:15:03:Stack:
01:15:03: ffff88007bfeb740 ffff88006ea59e00 ffff88007bfeb740 0000000000000000
01:15:03:<d> ffff880072bdbe98 ffffffff811a5356 ffff88006faa1500 0000000001860000
01:15:03:<d> ffff88007bfeb778 ffff880072bdbf48 0000000000008000 0000000000000000
01:15:03:Call Trace:
01:15:03: [<ffffffff811a5356>] seq_read+0x96/0x400
01:15:03: [<ffffffff811e9bae>] proc_reg_read+0x7e/0xc0
01:15:03: [<ffffffff81181ac5>] vfs_read+0xb5/0x1a0
01:15:03: [<ffffffff81181c01>] sys_read+0x51/0x90
01:15:03: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

The first occurrence is linked above. It came on patch http://review.whamcloud.com/7884, which touches fld code and may be responsible.



 Comments   
Comment by Peter Jones [ 10/Dec/13 ]

Di

Could you please comment on this one?

Thanks

Peter

Comment by Oleg Drokin [ 10/Dec/13 ]

It looks like an attempt to dereference a pointer that is actually a -Esomething error value.
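
To illustrate the observation above: the oops reports fffffffffffffffe in both CR2 and RAX, and that bit pattern is -2 when read as a signed 64-bit value, which is ENOENT on Linux. The following is a minimal userspace sketch, not kernel code. The helper names addr_is_err and addr_to_errno are made up for this illustration, and MAX_ERRNO is assumed to be 4095 as in the kernel's err.h.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: same limit the Linux kernel uses for ERR_PTR-encoded values. */
#define MAX_ERRNO 4095

/* Hypothetical helper: returns 1 if the address lies in the top MAX_ERRNO
 * bytes of the 64-bit address space, where error-encoded pointers live. */
static int addr_is_err(uint64_t addr)
{
    return addr >= (uint64_t)(-MAX_ERRNO);
}

/* Hypothetical helper: recovers the positive errno from such an address. */
static long addr_to_errno(uint64_t addr)
{
    return (long)-(int64_t)addr;
}
```

Under these assumptions, addr_is_err(0xfffffffffffffffeULL) is true and addr_to_errno() yields ENOENT for it, while an ordinary kernel address such as the RSP value from the log is not classified as an error.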

Comment by Di Wang [ 11/Dec/13 ]

Hmm, it seems ZFS has different iteration behavior than ldiskfs. This line probably needs to change:

*pos = be64_to_cpu(*(__u64 *)iops->key(&param->fsp_env, param->fsp_it));

i.e. we need to check the return value of key() before dereferencing it.
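
The kind of check described above could look roughly like the sketch below. This is a userspace mock for illustration only, not the contents of the actual patch: struct mock_it, mock_key, load_be64 and mock_seq_start_pos are hypothetical names, and ERR_PTR, PTR_ERR and IS_ERR are local stand-ins assumed to behave like the kernel helpers of the same name.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (assumption). */
#define MAX_ERRNO 4095
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical iterator: holds one big-endian 64-bit key, or none. */
struct mock_it {
    int has_key;
    unsigned char key_be[8];
};

/* Hypothetical key(): returns the key buffer, or ERR_PTR(-ENOENT)
 * when the iterator is not positioned on a record. */
static void *mock_key(struct mock_it *it)
{
    if (!it->has_key)
        return ERR_PTR(-ENOENT);
    return it->key_be;
}

/* Portable big-endian load, standing in for be64_to_cpu(). */
static uint64_t load_be64(const unsigned char *b)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}

/* Checks the return value of key() before dereferencing it.
 * Returns 0 and stores the position, or a negative errno and
 * leaves *pos untouched. */
static int mock_seq_start_pos(struct mock_it *it, uint64_t *pos)
{
    void *key = mock_key(it);

    if (IS_ERR(key))
        return (int)PTR_ERR(key);

    *pos = load_be64(key);
    return 0;
}
```

With an empty iterator the function returns -ENOENT instead of dereferencing the error value, which is the failure mode seen in the oops.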

Comment by Di Wang [ 11/Dec/13 ]

http://review.whamcloud.com/8534

Comment by nasf (Inactive) [ 27/Dec/13 ]

Another failure instance:

https://maloo.whamcloud.com/test_sets/ec56daa6-6e83-11e3-b713-52540035b04c

Comment by Andreas Dilger [ 28/Dec/13 ]

The patch landed on master a few hours ago; hopefully it will fix the problem.

Comment by Jodi Levi (Inactive) [ 04/Mar/14 ]

Patch landed to Master.
