[LU-10232] kernel BUG at cl_object.c:206! Created: 12/Nov/17  Updated: 04/Jan/18  Resolved: 17/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.3

Type: Bug Priority: Major
Reporter: Mikhail Pershin Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9490 MPI-IO Lustre ADIO driver gets Lustre... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This bug was seen during Oleg's tests:

 

[ 2954.010902] ------------[ cut here ]------------
[ 2954.011604] kernel BUG at /home/green/git/lustre-release/lustre/obdclass/cl_object.c:206!
[ 2954.012833] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2954.013401] Modules linked in: loop lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) mbcache lquota(OE) lfsck(OE) jbd2 obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 syscopyarea sysfillrect sysimgblt ata_generic pata_acpi ttm drm_kms_helper i2c_piix4 drm ata_piix pcspkr serio_raw i2c_core virtio_balloon virtio_console libata virtio_blk floppy nfsd ip_tables
[ 2954.018129] CPU: 1 PID: 20034 Comm: lfs Tainted: G OE ------------ 3.10.0-debug #1
[ 2954.019284] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2954.019816] task: ffff88001cb74b80 ti: ffff880004108000 task.ti: ffff880004108000
[ 2954.020797] RIP: 0010:[<ffffffffa037e281>] [<ffffffffa037e281>] cl_object_attr_get+0x141/0x150 [obdclass]
[ 2954.021833] RSP: 0018:ffff88000410bbf8 EFLAGS: 00010246
[ 2954.022354] RAX: 0000000000000000 RBX: ffff880008d97fa0 RCX: ffff880008d97f48
[ 2954.022883] RDX: 0000000000000000 RSI: ffff880008d97fa0 RDI: ffff880008d97fa0
[ 2954.023432] RBP: ffff88000410bc18 R08: 0000000000000008 R09: 00000000000000d8
[ 2954.023985] R10: ffff88005356cf00 R11: 0000000000000000 R12: ffff88005356cf00
[ 2954.024533] R13: ffff880007698f68 R14: ffff88000410bc60 R15: ffff88006b7e7ed0
[ 2954.025093] FS: 00007f648fe5f740(0000) GS:ffff8800bc680000(0000) knlGS:0000000000000000
[ 2954.026098] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2954.026624] CR2: 00007f9901ead10c CR3: 000000008dfbe000 CR4: 00000000000006e0
[ 2954.027184] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2954.027741] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2954.028294] Stack:
[ 2954.028766] 0000000000000100 ffff88005356cf00 0000000000a8f060 ffff88005356cf00
[ 2954.029782] ffff88000410bcd8 ffffffffa08c185c 0000000000000020 ffff880007698f68
[ 2954.030815] 0000000000000020 000000000bd10bd0 0000000000000000 0000000000000000
[ 2954.031811] Call Trace:
[ 2954.032293] [<ffffffffa08c185c>] lov_getstripe+0x79c/0x940 [lov]
[ 2954.033778] [<ffffffffa08bfb4f>] lov_object_getstripe+0x6f/0x180 [lov]
[ 2954.034366] [<ffffffffa037df4b>] cl_object_getstripe+0x6b/0x130 [obdclass]
[ 2954.034953] [<ffffffffa0e1ad80>] ll_file_getstripe+0x70/0x170 [lustre]
[ 2954.046852] [<ffffffffa0e2d322>] ll_lov_setstripe+0x332/0x380 [lustre]
[ 2954.047402] [<ffffffffa0e2ecbe>] ll_file_ioctl+0x116e/0x35f0 [lustre]
[ 2954.047953] [<ffffffff810646c5>] ? kernel_map_pages+0xb5/0x120
[ 2954.048485] [<ffffffff81201985>] do_vfs_ioctl+0x305/0x520
[ 2954.049018] [<ffffffff817063f7>] ? _raw_spin_unlock+0x27/0x40
[ 2954.049546] [<ffffffff81201c41>] SyS_ioctl+0xa1/0xc0
[ 2954.050068] [<ffffffff8170fc89>] system_call_fastpath+0x16/0x1b
[ 2954.050599] Code: 3a a0 c7 05 32 7a 07 00 cf 00 00 00 48 c7 05 33 7a 07 00 00 00 00 00 c7 05 21 7a 07 00 01 00 00 00 e8 d4 89 e5 ff e9 0a ff ff ff <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[ 2954.052727] RIP [<ffffffffa037e281>] cl_object_attr_get+0x141/0x150 [obdclass]
[ 2954.053750] RSP <ffff88000410bbf8>

A quick review shows that line 206 of cl_object.c is the lock assertion at the top of cl_object_attr_get():

int cl_object_attr_get(const struct lu_env *env, struct cl_object *obj,
                       struct cl_attr *attr)
{
        struct lu_object_header *top;
        int result;

        assert_spin_locked(cl_object_attr_guard(obj));

and the caller does not wrap the call in a cl_object_attr_lock()/cl_object_attr_unlock() pair:

lov_getstripe()
...
			cl_obj = cl_object_top(&obj->lo_cl);
			cl_object_attr_get(env, cl_obj, &attr);
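
The locking contract described above can be illustrated with a small userspace model. This is only a sketch, not the Lustre code or the actual patch: every model_* name is hypothetical, the "lock" is a plain flag that is only meaningful in a single thread, and the getter returns -1 where the kernel assertion would trigger a BUG. It contrasts the caller shape from the trace (no lock taken) with the shape named in the patch subject (getter called under the attr lock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the Lustre structures. */
struct model_attr {
	unsigned long long size;
};

struct model_object {
	bool guard_held;          /* models "attr guard lock is held" */
	struct model_attr attr;
};

static void model_attr_lock(struct model_object *obj)
{
	assert(!obj->guard_held); /* the model lock is not recursive */
	obj->guard_held = true;
}

static void model_attr_unlock(struct model_object *obj)
{
	assert(obj->guard_held);
	obj->guard_held = false;
}

/*
 * Getter with a locking precondition. The kernel code asserts that the
 * guard is held; this model reports the violation with -1 instead.
 */
static int model_attr_get(struct model_object *obj, struct model_attr *out)
{
	if (!obj->guard_held)
		return -1;
	*out = obj->attr;
	return 0;
}

/* Caller shape from the report: the getter is called without the lock. */
static int model_getstripe_unlocked(struct model_object *obj,
				    struct model_attr *out)
{
	return model_attr_get(obj, out);
}

/* Caller shape from the patch subject: the getter is called under the lock. */
static int model_getstripe_locked(struct model_object *obj,
				  struct model_attr *out)
{
	int rc;

	model_attr_lock(obj);
	rc = model_attr_get(obj, out);
	model_attr_unlock(obj);
	return rc;
}
```

In this model the unlocked caller always fails the precondition, while the locked caller reads the attributes and releases the guard before returning.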



 Comments   
Comment by Mikhail Pershin [ 12/Nov/17 ]

The code that calls cl_object_attr_get() without the lock was introduced in LU-9490.

Comment by Gerrit Updater [ 12/Nov/17 ]

Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/30052
Subject: LU-10232 lov: call cl_object_attr_get under cl_attr lock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 68c3e1c00c639287f49eb6085f557eed5c922c6c

Comment by Gerrit Updater [ 17/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30052/
Subject: LU-10232 lov: call cl_object_attr_get under cl_attr lock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 80515fa15ee76fb0174fd3be80c4a113a8d3c875

Comment by Peter Jones [ 17/Dec/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 18/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30586
Subject: LU-10232 lov: call cl_object_attr_get under cl_attr lock
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 586b1b749fc86241ad00884bf57c6868e24990c1

Comment by Gerrit Updater [ 04/Jan/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30586/
Subject: LU-10232 lov: call cl_object_attr_get under cl_attr lock
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 9dd706edb71feeea4670360c99853e916ff8c81a
