[LU-6916] unable to handle kernel NULL pointer dereference at (null) in tgt_client_free() Created: 28/Jul/15  Updated: 19/Mar/19  Resolved: 03/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6831 The ticket for tracking all DNE2 bugs Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I hit this during a 24-hour DNE failover test; it does not look like a DNE bug.

<4>Lustre: lustre-MDT0000: denying duplicate export for lustre-MDT0000-lwp-OST0000_UUID, -114
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffff812957f6>] __list_add+0x26/0xa0
<4>PGD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 0
<4>Modules linked in: osp(U) mdd(U) lod(U) mdt(U) lfsck(U) mgs(U) mgc(U) osd_ldiskfs(U) ldiskfs(U) jbd2 lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 iTCO_wdt iTCO_vendor_support microcode serio_raw mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core i2c_i801 lpc_ich mfd_core ioatdma i7core_edac edac_core ses enclosure sg igb dca i2c_algo_bit i2c_core ptp pps_core ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 2798, comm: mdt00_002 Not tainted 2.6.32-431.29.2.el6_lustre.g2382eb0.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
<4>RIP: 0010:[<ffffffff812957f6>]  [<ffffffff812957f6>] __list_add+0x26/0xa0
<4>RSP: 0018:ffff88081327d960  EFLAGS: 00010246
<4>RAX: 0000000000000000 RBX: ffff88081327d9a0 RCX: 00000000ffffffff
<4>RDX: ffff88082f113aa0 RSI: 0000000000000000 RDI: ffff88081327d9a0
<4>RBP: ffff88081327d980 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082f113aa0
<4>R13: 0000000000000000 R14: ffff88082f113aa0 R15: ffffffffffffffff
<4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000001034430000 CR4: 00000000000007f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process mdt00_002 (pid: 2798, threadinfo ffff88081327c000, task ffff880829ef2aa0)
<4>Stack:
<4> ffffffffffffff10 ffff88082f113a98 ffff880829ef2aa0 ffff88082f113a9c
<4><d> ffff88081327d9f0 ffffffff8152adef ffff88081327dfd8 ffff880829ef2aa0
<4><d> ffff88101daed1d8 0000000000000000 ffff88081327da10 ffffffffa047e087
<4>Call Trace:
<4> [<ffffffff8152adef>] __mutex_lock_slowpath+0xcf/0x180
<4> [<ffffffffa047e087>] ? cfs_hash_putref+0x2e7/0x480 [libcfs]
<4> [<ffffffff8152acfb>] mutex_lock+0x2b/0x50
<4> [<ffffffffa083db72>] tgt_client_free+0x62/0x610 [ptlrpc]
<4> [<ffffffffa0f59f84>] mdt_destroy_export+0x74/0x220 [mdt]
<4> [<ffffffffa05862c5>] class_new_export+0x315/0x9c0 [obdclass]
<4> [<ffffffffa058745e>] class_connect+0xae/0x250 [obdclass]
<4> [<ffffffffa0f57fd1>] mdt_obd_connect+0xb1/0x720 [mdt]
<4> [<ffffffffa07aaf78>] target_handle_connect+0xe58/0x2d30 [ptlrpc]
<4> [<ffffffff8128c1f0>] ? string+0x40/0x100
<4> [<ffffffffa084edf2>] tgt_request_handle+0x5b2/0x1230 [ptlrpc]
<4> [<ffffffffa07f7611>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
<4> [<ffffffffa07f67d0>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
<4> [<ffffffff8109abf6>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 4c 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 27 <4d> 8b 45 00 4d 39 c4 75 40 49 89 5c 24 08 4c 89 23 4c 89 6b 08
<1>RIP  [<ffffffff812957f6>] __list_add+0x26/0xa0
<4> RSP <ffff88081327d960>
<4>CR2: 0000000000000000


 Comments   
Comment by Di Wang [ 28/Jul/15 ]

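Hmm, it seems the ted_lcd_lock init should move from tgt_client_new() to tgt_client_alloc(); otherwise the error-handling path (class_new_export() -> mdt_destroy_export() -> tgt_client_free() in the trace above) takes the mutex before it has been initialized and panics like this.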
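
The ordering problem can be sketched in user space as follows. This is only an illustrative model, not the Lustre code or the merged patch: the struct and the demo_* functions are hypothetical stand-ins for the export data, tgt_client_alloc(), tgt_client_new() and tgt_client_free(), and a pthread mutex stands in for the kernel mutex. A zeroed pthread mutex may happen to work in user space, so the sketch shows the intended init ordering rather than reproducing the oops.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-export target data (not the real Lustre struct). */
struct demo_export_data {
	pthread_mutex_t lcd_lock; /* models ted_lcd_lock */
	int *lcd;                 /* models the per-client data buffer */
};

/* Models tgt_client_alloc(): runs for every new export, so the lock is set up here. */
int demo_client_alloc(struct demo_export_data *ted)
{
	ted->lcd = calloc(1, sizeof(*ted->lcd));
	if (ted->lcd == NULL)
		return -1;
	if (pthread_mutex_init(&ted->lcd_lock, NULL) != 0) {
		free(ted->lcd);
		ted->lcd = NULL;
		return -1;
	}
	return 0;
}

/* Models tgt_client_new(): reached only when the connect succeeds. */
int demo_client_new(struct demo_export_data *ted, int idx)
{
	pthread_mutex_lock(&ted->lcd_lock);
	*ted->lcd = idx;
	pthread_mutex_unlock(&ted->lcd_lock);
	return 0;
}

/* Models tgt_client_free(): takes the lock whenever data was allocated,
 * which includes the error path where demo_client_new() never ran. */
int demo_client_free(struct demo_export_data *ted)
{
	if (ted->lcd == NULL)
		return 0;
	pthread_mutex_lock(&ted->lcd_lock);
	free(ted->lcd);
	ted->lcd = NULL;
	pthread_mutex_unlock(&ted->lcd_lock);
	pthread_mutex_destroy(&ted->lcd_lock);
	return 0;
}
```

With the init placed in the alloc step, calling demo_client_alloc() and then demo_client_free() directly (the duplicate-export error path) is safe, as is the normal alloc, new, free sequence.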

Comment by Gerrit Updater [ 28/Jul/15 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15770
Subject: LU-6916 target: fix ted_lcd_lock init
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 80ecbb145fb0673e0edd63b3332e34449be4c12e

Comment by James A Simmons [ 28/Jul/15 ]

Ah, I just ran into this in my testing.

Comment by Gerrit Updater [ 03/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15770/
Subject: LU-6916 target: fix ted_lcd_lock init
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 96a352c9ddeafe139097762211c7f64086016717
