[LU-1334] Oops in ll_fsync on Lustre client / NFS server Created: 18/Apr/12  Updated: 06/Nov/13  Resolved: 06/Nov/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Blocker
Reporter: Andrew Prout
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

Patchless v2.2.0 client running 2.6.32.59; v2.1.1 servers running CentOS 6.2 (2.6.32-220.el6_lustre.g4554b65.x86_64)


Issue Links:
Duplicate
duplicates LU-2900 Null pointer dereference in ll_fsync ... Resolved
Severity: 3
Rank (Obsolete): 3977

 Description   

Reproducible oops when trying to delete a file from an NFS client.

Apr 18 10:05:03 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Apr 18 10:05:03 kernel: IP: [<ffffffffa06da18e>] ll_fsync+0xe/0x1120 [lustre]
Apr 18 10:05:03 kernel: PGD 22d2ce067 PUD 22d415067 PMD 0
Apr 18 10:05:03 kernel: Oops: 0000 [#1] SMP
Apr 18 10:05:03 kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Apr 18 10:05:03 kernel: CPU 0
Apr 18 10:05:03 kernel: Modules linked in: deflate zlib_deflate ctr cast5 crypto_null ccm serpent blowfish twofish twofish_common ecb xcbc cbc md5 sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 lmv mgc lustre lquota lov osc mdc fid fld ksocklnd ptlrpc obdclass lnet lvfs libcfs sunrpc ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_region_hash dm_log bnx2 sg shpchp pcspkr i5000_edac edac_core dcdbas microcode ext3 jbd sd_mod mptsas mptscsih mptbase scsi_transport_sas uhci_hcd ehci_hcd scsi_mod button [last unloaded: freq_table]
Apr 18 10:05:03 kernel: Pid: 2254, comm: nfsd Not tainted 2.6.32.59 #1 PowerEdge 1955
Apr 18 10:05:03 kernel: RIP: 0010:[<ffffffffa06da18e>]  [<ffffffffa06da18e>] ll_fsync+0xe/0x1120 [lustre]
Apr 18 10:05:03 kernel: RSP: 0018:ffff88021dcf7d50  EFLAGS: 00010286
Apr 18 10:05:03 kernel: RAX: ffffffffa06da180 RBX: ffff88021e3d5808 RCX: 0000000000000012
Apr 18 10:05:03 kernel: RDX: 0000000000000000 RSI: ffff8802292539c0 RDI: 0000000000000000
Apr 18 10:05:03 kernel: RBP: 00000000ffffc000 R08: 5a5a5a5a5a5a5a5a R09: 5a5a5a5a5a5a5a5a
Apr 18 10:05:03 kernel: R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: ffff88022920d6c0
Apr 18 10:05:03 kernel: R13: 0000000000000015 R14: ffff8802292539c0 R15: ffff880222ba5368
Apr 18 10:05:03 kernel: FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Apr 18 10:05:03 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 18 10:05:03 kernel: CR2: 0000000000000018 CR3: 000000022d876000 CR4: 00000000000006b0
Apr 18 10:05:03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 18 10:05:03 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 18 10:05:03 kernel: Process nfsd (pid: 2254, threadinfo ffff88021dcf6000, task ffff88022fbb0090)
Apr 18 10:05:03 kernel: Stack:
Apr 18 10:05:03 kernel: 0000000000000200 ffff88022920d6c0 ffff8802292539c0 ffff88022920d6c0
Apr 18 10:05:03 kernel: <0> 0000000000000015 ffffffff810ab2e5 ffff88022dbab7e8 ffff88021e3d5808
Apr 18 10:05:03 kernel: <0> 00000000ffffc000 ffff88022920d6c0 0000000000000015 ffff8802292539c0
Apr 18 10:05:03 kernel: Call Trace:
Apr 18 10:05:03 kernel: [<ffffffff810ab2e5>] ? d_kill+0x57/0x61
Apr 18 10:05:03 kernel: [<ffffffffa035a49c>] ? nfsd_unlink+0x20b/0x24c [nfsd]
Apr 18 10:05:03 kernel: [<ffffffffa0360af0>] ? nfsd3_proc_remove+0x9d/0xa8 [nfsd]
Apr 18 10:05:03 kernel: [<ffffffffa0355784>] ? nfsd_dispatch+0xdf/0x1b2 [nfsd]
Apr 18 10:05:03 kernel: [<ffffffffa01a2182>] ? svc_process+0x413/0x700 [sunrpc]
Apr 18 10:05:03 kernel: [<ffffffff8102f173>] ? default_wake_function+0x0/0x11
Apr 18 10:05:03 kernel: [<ffffffffa03550e1>] ? nfsd+0xe1/0x12a [nfsd]
Apr 18 10:05:03 kernel: [<ffffffffa0355000>] ? nfsd+0x0/0x12a [nfsd]
Apr 18 10:05:03 kernel: [<ffffffff810465cd>] ? kthread+0x75/0x7d
Apr 18 10:05:03 kernel: [<ffffffff8100c8fa>] ? child_rip+0xa/0x20
Apr 18 10:05:03 kernel: [<ffffffff81046558>] ? kthread+0x0/0x7d
Apr 18 10:05:03 kernel: [<ffffffff8100c8f0>] ? child_rip+0x0/0x20
Apr 18 10:05:03 kernel: Code: ff 48 c7 c7 e0 23 74 a0 c7 05 83 82 06 00 00 00 04 00 e8 46 c9 af ff 66 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 38 <48> 8b 47 18 89 54 24 14 48 8b 58 10 8b 05 e4 a4 b1 ff c7 05 4e
Apr 18 10:05:03 kernel: RIP  [<ffffffffa06da18e>] ll_fsync+0xe/0x1120 [lustre]
Apr 18 10:05:03 kernel: RSP <ffff88021dcf7d50>
Apr 18 10:05:03 kernel: CR2: 0000000000000018
Apr 18 10:05:03 kernel: ---[ end trace 81f5758468499db0 ]---
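
One way to read the trace above, offered as an interpretation and not confirmed anywhere in this ticket: RDI, which carries the first integer argument under the x86-64 calling convention, is zero, and the faulting address (CR2) is 0x18. The bytes at the fault marker in the Code: line, `48 8b 47 18`, decode to `mov 0x18(%rdi),%rax`, so the log is consistent with ll_fsync reading a field at offset 0x18 of a NULL first argument. The sketch below pulls those values out of three lines copied from the oops; the helper name `summarize_oops` and the dictionary keys are made up for illustration.

```python
import re

# Three lines copied from the oops above, with the syslog prefix dropped.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffffa06da18e>] ll_fsync+0xe/0x1120 [lustre]
RDX: 0000000000000000 RSI: ffff8802292539c0 RDI: 0000000000000000
"""


def summarize_oops(text):
    """Hypothetical helper: extract fault address, symbol, and RDI."""
    fault = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    ip = re.search(
        r"IP: \[<[0-9a-f]+>\] (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+) \[(\w+)\]",
        text,
    )
    rdi = re.search(r"RDI: ([0-9a-f]+)", text)
    return {
        "fault_addr": int(fault.group(1), 16),  # value reported in CR2
        "symbol": ip.group(1),                  # function containing the fault
        "offset": int(ip.group(2), 16),         # byte offset into that function
        "module": ip.group(4),                  # kernel module owning the symbol
        "rdi": int(rdi.group(1), 16),           # first integer argument register
    }


info = summarize_oops(OOPS)
# When RDI is 0, the faulting address equals the displacement being read.
print(info)
```

The fault sits only 0xe bytes into a 0x1120-byte function, right after the register pushes and stack adjustment visible in the Code: bytes, which suggests the bad pointer was passed in by the caller and not computed inside ll_fsync.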


 Comments   
Comment by Alfonso Pardo [ 04/Sep/12 ]

I am seeing the same error.

Machine spec:

CentOS release 5.5 (Final)
2.6.18-308.8.2.el5
lustre-client-modules-2.2.0-2.6.18_238.19.1.el5
lustre-client-2.2.0-2.6.18_238.19.1.el5

Comment by Andreas Dilger [ 06/Nov/13 ]

This was fixed on master via http://review.whamcloud.com/5585
