[LU-11175] Null pointer dereference in idle_timeout_show recovery-small test 57 Created: 25/Jul/18  Updated: 18/Aug/18  Resolved: 18/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug
Priority: Major
Reporter: Oleg Drokin
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-11184 recovery-small test_57: unable to han... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This regression appears in code added by https://review.whamcloud.com/32719 for LU-8066, but it was likely exposed by the landing of LU-7236 (idle import disconnection). The code blindly assumes that the client import (cl_import) is in place when it is not, and there are probably many other places that make the same assumption.

Is some kind of auditing campaign needed?

[ 1930.731848] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 23:58:28 (1532491108)
[ 1931.971139] BUG: unable to handle kernel NULL pointer dereference at 0000000000000338
[ 1931.972794] IP: [<ffffffffa075326f>] idle_timeout_show+0x1f/0x30 [osc]
[ 1931.973668] PGD 6fb39067 PUD 78ab6067 PMD 0 
[ 1931.974182] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1931.974726] Modules linked in: loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) jbd2 mbcache lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) dm_flakey dm_mod libcfs(OE) crc_t10dif crct10dif_generic crct10dif_common rpcsec_gss_krb5 ata_generic pata_acpi ttm drm_kms_helper drm i2c_piix4 ata_piix pcspkr i2c_core virtio_balloon serio_raw virtio_console virtio_blk libata floppy ip_tables
[ 1931.979448] CPU: 3 PID: 18210 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #2
[ 1931.980483] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1931.981001] task: ffff880046fc8800 ti: ffff8800acb88000 task.ti: ffff8800acb88000
[ 1931.981968] RIP: 0010:[<ffffffffa075326f>]  [<ffffffffa075326f>] idle_timeout_show+0x1f/0x30 [osc]
[ 1931.982980] RSP: 0018:ffff8800acb8bde0  EFLAGS: 00010246
[ 1931.983507] RAX: 0000000000000000 RBX: ffff880070915800 RCX: ffffffffa0753250
[ 1931.984152] RDX: 0000000000000000 RSI: ffffffffa0779433 RDI: ffff88008d32c000
[ 1931.984769] RBP: ffff8800acb8bde0 R08: ffff880079a49738 R09: 0000000000000000
[ 1931.985332] R10: 0000000000001000 R11: 0000000000000000 R12: ffffffffa0372d60
[ 1931.985867] R13: ffff8800acb8bf18 R14: 0000000000000001 R15: ffff880070915800
[ 1931.986431] FS:  00007f1bcb331740(0000) GS:ffff8800bc980000(0000) knlGS:0000000000000000
[ 1931.987404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1931.987919] CR2: 0000000000000338 CR3: 0000000027d70000 CR4: 00000000000006e0
[ 1931.988488] Call Trace:
[ 1931.988980]  [<ffffffffa0318ff6>] lustre_attr_show+0x16/0x20 [obdclass]
[ 1931.989521]  [<ffffffff8129a24c>] sysfs_kf_seq_show+0xcc/0x1e0
[ 1931.990037]  [<ffffffff81298953>] kernfs_seq_show+0x23/0x30
[ 1931.990571]  [<ffffffff81234bd5>] seq_read+0x115/0x3f0
[ 1931.991072]  [<ffffffff8129951d>] kernfs_fop_read+0xfd/0x170
[ 1931.991610]  [<ffffffff8120d91c>] vfs_read+0x9c/0x170
[ 1931.992115]  [<ffffffff8120e7df>] SyS_read+0x7f/0xf0
[ 1931.992636]  [<ffffffff8178383b>] ? system_call_after_swapgs+0xc8/0x160
[ 1931.993166]  [<ffffffff817838e9>] system_call_fastpath+0x16/0x1b
[ 1931.993742]  [<ffffffff8178383b>] ? system_call_after_swapgs+0xc8/0x160
[ 1931.994299] Code: e8 47 07 93 e0 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 d0 48 8b 97 30 f4 ff ff 48 c7 c6 33 94 77 a0 48 89 c7 31 c0 48 89 e5 <8b> 92 38 03 00 00 e8 86 c4 c6 e0 5d 48 98 c3 66 90 0f 1f 44 00 
[ 1931.996406] RIP  [<ffffffffa075326f>] idle_timeout_show+0x1f/0x30 [osc]
(gdb) l *(idle_timeout_show+0x1f)
0xe29f is in idle_timeout_show (/home/green/git/lustre-release/lustre/osc/lproc_osc.c:618).
613	{
614		struct obd_device *obd = container_of(kobj, struct obd_device,
615						      obd_kset.kobj);
616		struct client_obd *cli = &obd->u.cli;
617	
618		return sprintf(buf, "%u\n", cli->cl_import->imp_idle_timeout);
619	}
(gdb) p/x &((struct obd_import *)0)->imp_idle_timeout
$3 = 0x338

So it looks like cli->cl_import is NULL: the faulting address (CR2 = 0x338) equals the offset of imp_idle_timeout within struct obd_import.



 Comments   
Comment by Gerrit Updater [ 26/Jul/18 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32883
Subject: LU-11175 osc: serialize access to idle_timeout vs cleanup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c2df2004ed222f6c30b06b93ba6a415c89c9936e

Comment by Oleg Drokin [ 03/Aug/18 ]

Another similar one:

[18362.344427] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 20:22:40 (1533255760)
[18364.972638] LustreError: 5963:0:(obd_class.h:1075:obd_statfs()) Device 65 not setup
[18364.976977] BUG: unable to handle kernel NULL pointer dereference at 0000000000000340
[18364.977482] IP: [<ffffffffa0d160cd>] grant_shrink_show+0x1d/0x40 [osc]
[18364.977482] PGD 80000002a2499067 PUD 2d6839067 PMD 0 
[18364.977482] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[18364.977482] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi ttm drm_kms_helper ata_piix serio_raw drm virtio_blk i2c_piix4 virtio_balloon virtio_console pcspkr libata i2c_core floppy ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[18364.977482] CPU: 11 PID: 5963 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #1
[18364.977482] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[18364.977482] task: ffff88024775e580 ti: ffff8802f5ab0000 task.ti: ffff8802f5ab0000
[18364.977482] RIP: 0010:[<ffffffffa0d160cd>]  [<ffffffffa0d160cd>] grant_shrink_show+0x1d/0x40 [osc]
[18364.977482] RSP: 0018:ffff8802f5ab3de0  EFLAGS: 00010246
[18364.977482] RAX: 0000000000000000 RBX: ffff88028b07dd40 RCX: ffffffffa0d160b0
[18364.977482] RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffff88030271d000
[18364.977482] RBP: ffff8802f5ab3de0 R08: ffff8802d13e17b8 R09: 0000000000000000
[18364.977482] R10: 0000000000001000 R11: 0000000000000000 R12: ffffffffa0935d60
[18364.977482] R13: ffff8802f5ab3f18 R14: 0000000000000001 R15: ffff88028b07dd40
[18364.977482] FS:  00007f65a2104740(0000) GS:ffff88033dcc0000(0000) knlGS:0000000000000000
[18364.977482] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18364.977482] CR2: 0000000000000340 CR3: 00000002f7e48000 CR4: 00000000000006e0
[18364.977482] Call Trace:
[18364.977482]  [<ffffffffa08dbff6>] lustre_attr_show+0x16/0x20 [obdclass]
[18364.977482]  [<ffffffff8129a1ec>] sysfs_kf_seq_show+0xcc/0x1e0
[18364.977482]  [<ffffffff812988f3>] kernfs_seq_show+0x23/0x30
[18364.977482]  [<ffffffff81234b75>] seq_read+0x115/0x3f0
[18364.977482]  [<ffffffff812994bd>] kernfs_fop_read+0xfd/0x170
[18364.977482]  [<ffffffff8120d8bc>] vfs_read+0x9c/0x170
[18364.977482]  [<ffffffff8120e77f>] SyS_read+0x7f/0xf0
[18364.977482]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[18364.977482]  [<ffffffff81783929>] system_call_fastpath+0x16/0x1b
[18364.977482]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[18364.977482] Code: eb dd e8 e7 d8 36 e0 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 d0 48 8b 97 30 f4 ff ff be 00 10 00 00 48 89 c7 31 c0 48 89 e5 <48> 8b 8a 40 03 00 00 48 c7 c2 33 d4 d3 a0 48 c1 e9 21 83 e1 01 
[18364.977482] RIP  [<ffffffffa0d160cd>] grant_shrink_show+0x1d/0x40 [osc]

Comment by Oleg Drokin [ 03/Aug/18 ]

Not sure if these are 100% related, but I also hit:

[ 9344.618480] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 17:52:51 (1533246771)
[ 9347.040738] BUG: unable to handle kernel paging request at ffffffff81d8fc50
[ 9347.041687] IP: [<ffffffff810fb0f2>] __pv_queued_spin_lock_slowpath+0x1f2/0x3d0
[ 9347.041687] PGD 1c12067 PUD 1c13063 PMD 3263c1063 PTE 8000000001d8f062
[ 9347.041687] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[ 9347.041687] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi ttm drm_kms_helper drm i2c_piix4 ata_piix virtio_balloon pcspkr virtio_blk virtio_console serio_raw i2c_core libata floppy ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[ 9347.041687] CPU: 7 PID: 18520 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #1
[ 9347.041687] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 9347.041687] task: ffff88025ce60f40 ti: ffff8802ef004000 task.ti: ffff8802ef004000
[ 9347.041687] RIP: 0010:[<ffffffff810fb0f2>]  [<ffffffff810fb0f2>] __pv_queued_spin_lock_slowpath+0x1f2/0x3d0
[ 9347.041687] RSP: 0018:ffff8802ef007d20  EFLAGS: 00010086
[ 9347.041687] RAX: 0000000000008000 RBX: ffff8802a85c5488 RCX: 0000000000000001
[ 9347.041687] RDX: 0000000000000010 RSI: 0000000000000000 RDI: ffffffff81d8fc50
[ 9347.041687] RBP: ffff8802ef007d60 R08: 0000000000000000 R09: 0000000000000000
[ 9347.041687] R10: 0000000000000000 R11: 0000000000000246 R12: ffff88033dbd9c40
[ 9347.041687] R13: ffffffff81d8fc50 R14: ffff88033dbd9c84 R15: 0000000000390000
[ 9347.041687] FS:  00007f820a6bc740(0000) GS:ffff88033dbc0000(0000) knlGS:0000000000000000
[ 9347.041687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9347.041687] CR2: ffffffff81d8fc50 CR3: 00000002d7d64000 CR4: 00000000000006e0
[ 9347.041687] Call Trace:
[ 9347.041687]  [<ffffffff813ccc5d>] do_raw_spin_lock+0x6d/0xa0
[ 9347.041687]  [<ffffffff817798c0>] _raw_spin_lock_irqsave+0x30/0x40
[ 9347.041687]  [<ffffffffa090508d>] lprocfs_stats_lock+0x8d/0xf0 [obdclass]
[ 9347.041687]  [<ffffffffa090516e>] lprocfs_stats_collect+0x7e/0x140 [obdclass]
[ 9347.041687]  [<ffffffffa0905aca>] lprocfs_stats_seq_show+0x4a/0x140 [obdclass]
[ 9347.041687]  [<ffffffff81234b75>] seq_read+0x115/0x3f0
[ 9347.041687]  [<ffffffff8120d8bc>] vfs_read+0x9c/0x170
[ 9347.041687]  [<ffffffff8120e77f>] SyS_read+0x7f/0xf0
[ 9347.041687]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[ 9347.041687]  [<ffffffff81783929>] system_call_fastpath+0x16/0x1b
[ 9347.041687]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160

and

[ 2246.251618] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 15:55:37 (1533239737)
[ 2248.581006] BUG: unable to handle kernel paging request at ffff8802aa9cc188
[ 2248.581006] IP: [<ffffffffa08c21b7>] lprocfs_stats_collect+0xc7/0x140 [obdclass]
[ 2248.581006] PGD 23e3067 PUD 33ebfa067 PMD 33eaa5067 PTE 80000002aa9cc060
[ 2248.581006] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2248.581006] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) libcfs(OE) crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi ttm drm_kms_helper drm i2c_piix4 ata_piix virtio_console pcspkr virtio_balloon serio_raw floppy virtio_blk i2c_core libata ip_tables rpcsec_gss_krb5
[ 2248.581006] CPU: 11 PID: 13358 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #1
[ 2248.581006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 2248.609087] task: ffff8802b8576ec0 ti: ffff8800a44f0000 task.ti: ffff8800a44f0000
[ 2248.609087] RIP: 0010:[<ffffffffa08c21b7>]  [<ffffffffa08c21b7>] lprocfs_stats_collect+0xc7/0x140 [obdclass]
[ 2248.609087] RSP: 0018:ffff8800a44f3dc8  EFLAGS: 00010246
[ 2248.609087] RAX: 0000000000000010 RBX: ffff8800a44f3e10 RCX: 0000000000000000
[ 2248.609087] RDX: ffff8802aa9cc168 RSI: 0000000000000000 RDI: 0000000000000000
[ 2248.609087] RBP: ffff8800a44f3df0 R08: 0000000000000168 R09: 0000000000000048
[ 2248.609087] R10: 0000000000000000 R11: ffff8800a44f3c96 R12: ffff88009dc82240
[ 2248.609087] R13: 0000000000000009 R14: ffff8802a84ad000 R15: ffff8802d996f980
[ 2248.609087] FS:  00007f48ff90a740(0000) GS:ffff88033dcc0000(0000) knlGS:0000000000000000
[ 2248.609087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2248.609087] CR2: ffff8802aa9cc188 CR3: 00000000a444a000 CR4: 00000000000006e0
[ 2248.609087] Call Trace:
[ 2248.609087]  [<ffffffffa08c2aca>] lprocfs_stats_seq_show+0x4a/0x140 [obdclass]
[ 2248.609087]  [<ffffffff81234cce>] seq_read+0x26e/0x3f0
[ 2248.609087]  [<ffffffff8120d8bc>] vfs_read+0x9c/0x170
[ 2248.609087]  [<ffffffff8120e77f>] SyS_read+0x7f/0xf0
[ 2248.609087]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[ 2248.609087]  [<ffffffff81783929>] system_call_fastpath+0x16/0x1b
[ 2248.609087]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160

Comment by James A Simmons [ 03/Aug/18 ]

Does Alex's patch fix the issue?

Comment by Oleg Drokin [ 03/Aug/18 ]

Only the first one, supposedly.

Comment by Oleg Drokin [ 07/Aug/18 ]

Here's another one I hit today:

[57676.636894] Lustre: DEBUG MARKER: == recovery-small test 57: read procfs entries causes kernel crash =================================== 03:55:27 (1533628527)
[57681.833491] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[57682.115171] BUG: unable to handle kernel NULL pointer dereference at 00000000000005c8
[57682.116054] IP: [<ffffffff81775ea8>] down_read+0x28/0x50
[57682.116054] PGD 80000000ae72b067 PUD 9c1b0067 PMD 0 
[57682.116054] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[57682.116054] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) dm_flakey dm_mod libcfs(OE) loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi ttm drm_kms_helper ata_piix i2c_piix4 drm virtio_balloon pcspkr serio_raw libata virtio_console virtio_blk i2c_core floppy ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[57682.116054] CPU: 5 PID: 8283 Comm: lctl Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #1
[57682.116054] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[57682.116054] task: ffff8802eca1e640 ti: ffff88009f9a0000 task.ti: ffff88009f9a0000
[57682.116054] RIP: 0010:[<ffffffff81775ea8>]  [<ffffffff81775ea8>] down_read+0x28/0x50
[57682.116054] RSP: 0018:ffff88009f9a3db0  EFLAGS: 00010246
[57682.116054] RAX: 00000000000005c8 RBX: 00000000000005c8 RCX: ffff88009f9a3fd8
[57682.116054] RDX: 0000000000000000 RSI: 0000000000000015 RDI: ffffffff81aa4a9b
[57682.116054] RBP: ffff88009f9a3db8 R08: ffff8803033b9870 R09: 0000000000000000
[57682.116054] R10: 0000000000001000 R11: 0000000000000000 R12: 00000000000005c8
[57682.116054] R13: ffff880298026000 R14: 0000000000000001 R15: ffff8800af34fd80
[57682.116054] FS:  00007fd1df062740(0000) GS:ffff88033db40000(0000) knlGS:0000000000000000
[57682.116054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[57682.116054] CR2: 00000000000005c8 CR3: 000000008ccf0000 CR4: 00000000000006e0
[57682.116054] Call Trace:
[57682.116054]  [<ffffffffa07b0fc4>] active_show+0x24/0x80 [osp]
[57682.116054]  [<ffffffffa0585ff6>] lustre_attr_show+0x16/0x20 [obdclass]
[57682.116054]  [<ffffffff8129a1ec>] sysfs_kf_seq_show+0xcc/0x1e0
[57682.116054]  [<ffffffff812988f3>] kernfs_seq_show+0x23/0x30
[57682.116054]  [<ffffffff81234b75>] seq_read+0x115/0x3f0
[57682.116054]  [<ffffffff812994bd>] kernfs_fop_read+0xfd/0x170
[57682.116054]  [<ffffffff8120d8bc>] vfs_read+0x9c/0x170
[57682.116054]  [<ffffffff8120e77f>] SyS_read+0x7f/0xf0
[57682.116054]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[57682.116054]  [<ffffffff81783929>] system_call_fastpath+0x16/0x1b
[57682.116054]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160
[57682.116054] Code: 00 00 00 0f 1f 44 00 00 55 31 d2 be 15 00 00 00 48 89 e5 53 48 89 fb 48 c7 c7 9b 4a aa 81 e8 90 70 94 ff e8 6b 11 00 00 48 89 d8 <f0> 48 ff 00 79 05 e8 9d c2 c4 ff 48 83 7b 30 01 74 08 48 c7 43 
(gdb) l *(active_show+0x24)
0x19ff4 is in active_show (/home/green/git/lustre-release/lustre/osp/lproc_osp.c:60).
55						    dd_kobj);
56		struct lu_device *lu = dt2lu_dev(dt);
57		struct obd_device *obd = lu->ld_obd;
58		int rc;
59	
60		LPROCFS_CLIMP_CHECK(obd);
61		rc = sprintf(buf, "%d\n", !obd->u.cli.cl_import->imp_deactive);
62		LPROCFS_CLIMP_EXIT(obd);
63		return rc;
64	}

This code is part of https://review.whamcloud.com/32377 from James. I am not sure why it was not hitting before, but it has already hit twice today.

Comment by James A Simmons [ 07/Aug/18 ]

That is strange. The point of the LPROCFS_CLIMP_* macros is to prevent this kind of thing.

Comment by Alex Zhuravlev [ 08/Aug/18 ]

I'm not able to reproduce the issue with the patch anymore.

Comment by Peter Jones [ 08/Aug/18 ]

Oleg

Would the combination of Alex's patch and reverting https://review.whamcloud.com/32377 restore a steady state for you?

Peter

Comment by James A Simmons [ 08/Aug/18 ]

Reverting will not address other potential issues. The LPROCFS_CLIMP_* macros are used for many proc/sysfs files.

Comment by James A Simmons [ 09/Aug/18 ]

I started inspecting the code and found that in several places cl_import is not protected by the semaphore. I suspect that the idle connection patch that landed exposed this problem.

Comment by Gerrit Updater [ 18/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32883/
Subject: LU-11175 osc: serialize access to idle_timeout vs cleanup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5874da0b670b7d48c9ddae38f2f9275db50dcbc5

Comment by Peter Jones [ 18/Aug/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:41:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.