[LU-4025] RIP [<ffffffffa00add09>] fsfilt_put_ops+0x9/0x20 [lvfs] Created: 30/Sep/13  Updated: 01/Oct/13  Resolved: 01/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Kit Westneat (Inactive)
Assignee: Bruno Faccini (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

virtual machine, Lustre build: 2.4.0-RC2-gd3f91c4-PRISTINE-2.6.32-358.6.2.el6_lustre.g230b174.x86_64


Issue Links:
Duplicate
duplicates LU-3411 Encountered at NULL pointer exception... Resolved
Severity: 3
Rank (Obsolete): 10820

 Description   

During testing on a virtual machine, one OSS rebooted when unmounting OSTs in parallel:

BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
IP: [<ffffffffa00add09>] fsfilt_put_ops+0x9/0x20 [lvfs]
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/host3/target3:0:2/3:0:2:0/state
CPU 0
Modules linked in: osp(U) ofd(U) ost(U) mgc(U) fsfilt_ldiskfs(U) lustre(U) osd_ldiskfs(U) lov(U) ldiskfs(U) osc(U) lquota(U) mdd(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ko2iblnd(U) ksocklnd(U) lnet(U) libcfs(U) rdma_cm iw_cm ib_addr sha512_generic sha256_generic autofs4 ib_srp scsi_transport_srp scsi_tgt sunrpc ib_cm ipv6 ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib mlx4_ib ib_sa mlx4_en mlx4_core ib_mthca ib_mad ib_core dm_round_robin ipmi_devintf ppdev parport_pc parport microcode virtio_net i2c_piix4 i2c_core sg ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sym53c8xx scsi_transport_spi virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
                                                  
Pid: 24601, comm: umount Not tainted 2.6.32-358.6.2.el6_lustre.g230b174.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa00add09>]  [<ffffffffa00add09>] fsfilt_put_ops+0x9/0x20 [lvfs]
RSP: 0018:ffff880029bd1ac8  EFLAGS: 00010282      
RAX: 0000000000000044 RBX: ffff88003d1f6000 RCX: ffff88003746d1c0
RDX: 0000000000000043 RSI: ffff88003d1f6000 RDI: fffffffffffffffe
RBP: ffff880029bd1ac8 R08: 0000000000000000 R09: 0000000000000002
R10: 5a5a5a5a5a5a5a5a R11: 5a5a5a5a5a5a5a5a R12: ffff880029bd1b18
R13: ffffffffa0ece3a0 R14: ffff880029bd1b18 R15: 0000000000000001
FS:  00007f26472de740(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 000000000000000e CR3: 000000002e8c6000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount (pid: 24601, threadinfo ffff880029bd0000, task ffff88002e8c3540)
Stack:                                            
 ffff880029bd1ae8 ffffffffa0e925b9 ffff88003d1f6000 ffff880029bd1b18
<d> ffff880029bd1b08 ffffffffa0e97247 ffff88002f282038 ffff88003d1f6000
<d> ffff880029bd1b88 ffffffffa08a7ba7 0000000210000080 0000000000000000
Call Trace:                                       
 [<ffffffffa0e925b9>] osd_umount+0x39/0x150 [osd_ldiskfs]
 [<ffffffffa0e97247>] osd_device_fini+0x147/0x190 [osd_ldiskfs]
 [<ffffffffa08a7ba7>] class_cleanup+0x577/0xda0 [obdclass]
 [<ffffffffa087cb36>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa08a948c>] class_process_config+0x10bc/0x1c80 [obdclass]
 [<ffffffffa08a2cb3>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
 [<ffffffffa08aa1c9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa0758717>] ? cfs_waitq_broadcast+0x17/0x20 [libcfs]
 [<ffffffffa087aee6>] ? class_export_put+0xf6/0x2b0 [obdclass]
 [<ffffffffa0e9b7a5>] osd_obd_disconnect+0x1c5/0x1d0 [osd_ldiskfs]
 [<ffffffffa08ac1fe>] lustre_put_lsi+0x17e/0x1100 [obdclass]
 [<ffffffffa08b4f58>] lustre_common_put_super+0x5f8/0xc40 [obdclass]
 [<ffffffffa08deada>] server_put_super+0x1ca/0xf00 [obdclass]
 [<ffffffff8118334b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff81183436>] kill_anon_super+0x16/0x60   
 [<ffffffffa08ac026>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff81183bd7>] deactivate_super+0x57/0x80  
 [<ffffffff811a1c4f>] mntput_no_expire+0xbf/0x110 
 [<ffffffff811a26bb>] sys_umount+0x7b/0x3a0       
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Here is the backtrace of the other umount process:

PID: 24600  TASK: ffff88003c7c5500  CPU: 0   COMMAND: "umount"
 #0 [ffff88003d11f798] schedule at ffffffff8150df42
 #1 [ffff88003d11f860] jbd2_log_wait_commit at ffffffffa00bdce5 [jbd2]
 #2 [ffff88003d11f8f0] ldiskfs_sync_fs at ffffffffa0da657f [ldiskfs]
 #3 [ffff88003d11f930] vfs_quota_disable at ffffffff811e1756
 #4 [ffff88003d11fa20] ldiskfs_quota_off at ffffffffa0da98f0 [ldiskfs]
 #5 [ffff88003d11fa80] deactivate_super at ffffffff81183bc6
 #6 [ffff88003d11faa0] mntput_no_expire at ffffffff811a1c4f
 #7 [ffff88003d11fad0] osd_umount at ffffffffa0e925f9 [osd_ldiskfs]
 #8 [ffff88003d11faf0] osd_device_fini at ffffffffa0e97247 [osd_ldiskfs]
 #9 [ffff88003d11fb10] class_cleanup at ffffffffa08a7ba7 [obdclass]
#10 [ffff88003d11fb90] class_process_config at ffffffffa08a948c [obdclass]
#11 [ffff88003d11fc20] class_manual_cleanup at ffffffffa08aa1c9 [obdclass]
#12 [ffff88003d11fce0] osd_obd_disconnect at ffffffffa0e9b7a5 [osd_ldiskfs]
#13 [ffff88003d11fd20] lustre_put_lsi at ffffffffa08ac1fe [obdclass]
#14 [ffff88003d11fd50] lustre_common_put_super at ffffffffa08b4f58 [obdclass]
#15 [ffff88003d11fdc0] server_put_super at ffffffffa08deada [obdclass]
#16 [ffff88003d11fe30] generic_shutdown_super at ffffffff8118334b
#17 [ffff88003d11fe50] kill_anon_super at ffffffff81183436
#18 [ffff88003d11fe70] lustre_kill_super at ffffffffa08ac026 [obdclass]
#19 [ffff88003d11fe90] deactivate_super at ffffffff81183bd7
#20 [ffff88003d11feb0] mntput_no_expire at ffffffff811a1c4f
#21 [ffff88003d11fee0] sys_umount at ffffffff811a26bb
#22 [ffff88003d11ff80] system_call_fastpath at ffffffff8100b072
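
Both umount tasks pass through the same osd_ldiskfs teardown path. As a rough sketch of how to line the two traces up, the snippet below extracts function names from both trace formats and lists the frames they share. The regular expressions and the helper name `frames` are illustrative assumptions, not part of any Lustre or crash tooling, and the input is only an excerpt of the traces above.

```python
import re

# Oops format:  "[<addr>] func+0x39/0x150 [module]"
# Unreliable frames marked with "?" do not match and are skipped.
OOPS_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x")
# crash "bt" format:  "#7 [stack] func at addr [module]"
BT_RE = re.compile(r"#\d+\s+\[[0-9a-f]+\]\s+(\w+)\s+at\s+")

def frames(text):
    """Return function names in trace order (innermost first)."""
    names = []
    for line in text.splitlines():
        m = OOPS_RE.search(line) or BT_RE.search(line)
        if m:
            names.append(m.group(1))
    return names

# Excerpt of the crashing task (pid 24601), copied from the oops.
oops = """\
 [<ffffffffa0e925b9>] osd_umount+0x39/0x150 [osd_ldiskfs]
 [<ffffffffa0e97247>] osd_device_fini+0x147/0x190 [osd_ldiskfs]
 [<ffffffffa08a7ba7>] class_cleanup+0x577/0xda0 [obdclass]
 [<ffffffffa087cb36>] ? class_name2dev+0x56/0xe0 [obdclass]
"""

# Excerpt of the other task (pid 24600), copied from the bt output.
bt = """\
 #6 [ffff88003d11faa0] mntput_no_expire at ffffffff811a1c4f
 #7 [ffff88003d11fad0] osd_umount at ffffffffa0e925f9 [osd_ldiskfs]
 #8 [ffff88003d11faf0] osd_device_fini at ffffffffa0e97247 [osd_ldiskfs]
 #9 [ffff88003d11fb10] class_cleanup at ffffffffa08a7ba7 [obdclass]
"""

other = set(frames(bt))
shared = [name for name in frames(oops) if name in other]
print(shared)
```

On this excerpt the first shared frame is osd_umount: pid 24601 faulted in fsfilt_put_ops called from osd_umount, while pid 24600 was still inside osd_umount waiting on a journal commit.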

The vmcore is only 97 MB, so I can upload it if you would like to see it, or I can run some commands on it if you prefer.



 Comments   
Comment by Bruno Faccini (Inactive) [ 01/Oct/13 ]

Yes Kit, can you upload (as you did for LU-3997) the vmcore and the other necessary files: vmlinux/kernel-debuginfo* and lustre-modules/lustre-debuginfo?
Thanks in advance for your help.

Comment by James A Simmons [ 01/Oct/13 ]

This reminds me of LU-3411. Looks like we will need a similar solution.

Comment by Kit Westneat (Inactive) [ 01/Oct/13 ]

2.4.0 doesn't include the fix for LU-3411, so I think we can mark this as a duplicate.

Comment by James A Simmons [ 01/Oct/13 ]

I missed that this was a 2.4.0-RC2 release rather than a 2.4.1-RC2 release. Yes, this has been fixed by LU-3411.

Comment by Bruno Faccini (Inactive) [ 01/Oct/13 ]

Hmm, thanks guys. I agree with you that "RDI: fffffffffffffffe" is almost direct proof of what you are saying.
So I am marking this as a duplicate of LU-3411.
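
For readers wondering why the RDI value is so telling, here is a minimal sketch of the arithmetic. It assumes the value in RDI is a Linux-style error pointer (a negative errno cast to a pointer, as tested by the kernel's IS_ERR()) and that the faulting instruction read a field of the structure RDI points to. Neither assumption is confirmed in this ticket, and the helper names are illustrative only.

```python
import errno

MAX_ERRNO = 4095           # Linux treats the top 4095 addresses as encoded errnos
MASK64 = (1 << 64) - 1

def to_signed64(value):
    """Reinterpret an unsigned 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value

def is_err_ptr(value):
    """Same comparison as the kernel's IS_ERR(): value >= (unsigned long)-MAX_ERRNO."""
    return value >= ((-MAX_ERRNO) & MASK64)

rdi = 0xfffffffffffffffe   # RDI from the oops (first function argument on x86_64)
cr2 = 0x000000000000000e   # CR2 from the oops (faulting address)
rbx = 0xffff88003d1f6000   # RBX from the oops, an ordinary kernel pointer for contrast

errno_value = -to_signed64(rdi)
# Assumption: the fault was a read at a fixed offset from RDI, so the
# offset is the wrapped 64-bit difference between CR2 and RDI.
field_offset = (cr2 - rdi) & MASK64

print(is_err_ptr(rdi), is_err_ptr(rbx))
print(errno_value, errno.errorcode[errno_value])
print(hex(field_offset))
```

Under these assumptions RDI decodes to -2 (ENOENT), and the faulting address 0xe is exactly that value plus 0x10, which is why the oops is reported as a NULL pointer dereference at a small address even though the pointer was never NULL.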
