[LU-6436] Oops in cl_glimpse_size0() Created: 06/Apr/15  Updated: 26/Mar/18  Resolved: 26/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: llite, lov

Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running racer on v2_7_51_0-32-g61787e1, something goes wrong in lov_init_raid0(), and then we oops in cl_glimpse_size0() due to a NULL lli_clob. The first failure did not necessarily cause the second.

[  706.071730] LustreError: 14972:0:(lov_object.c:212:lov_init_sub()) header@ffff8800c375dc78[0x0, 2, [0x100010000:0x2:0x0] hash]{
[  706.071732] 
[  706.074030] LustreError: 14972:0:(lov_object.c:212:lov_init_sub()) ....lovsub@ffff8800c375dd18[0]
[  706.074031] 
[  706.075835] LustreError: 14972:0:(lov_object.c:212:lov_init_sub()) ....osc@ffff8800c37ea908id: 0x0:2 idx: 1 gen: 0 kms_valid: 1 kms 27848 rc: 0 force_sync: 0 min_xid: 0 size: 27848 mtime: 1428330720 atime: 1428330721 ctime: 1428330720 blocks: 0
[  706.075837] 
[  706.079803] LustreError: 14972:0:(lov_object.c:212:lov_init_sub()) } header@ffff8800c375dc78
[  706.079805] 
[  706.081495] LustreError: 14972:0:(lov_object.c:212:lov_init_sub()) stripe 0 is already owned.
[  706.083115] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) header@ffff8800bfec7ac8[0x0, 8, [0x280000401:0x7:0x0] hash]{
[  706.083117] 
[  706.085332] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) ....vvp@ffff8800bfec7b68(0 2) inode: ffff8800b9779018 180144002291466247/41943044 100071 1 1 ffff8800bfec7b68 [0x280000401:0x7:0x0]
[  706.085334] 
[  706.088571] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) ....lov@ffff8800bfec85c8stripes: 1, valid, lsm{ffff8800d02a2230 0x0BD10BD0 1 1 0}:
[  706.088573] 
[  706.091092] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) header@ffff8800c375dc78[0x0, 2, [0x100010000:0x2:0x0] hash]{
[  706.091094] 
[  706.093361] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) ....lovsub@ffff8800c375dd18[0]
[  706.093363] 
[  706.095173] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) ....osc@ffff8800c37ea908id: 0x0:2 idx: 1 gen: 0 kms_valid: 1 kms 27848 rc: 0 force_sync: 0 min_xid: 0 size: 27848 mtime: 1428330720 atime: 1428330721 ctime: 1428330720 blocks: 0
[  706.095175] 
[  706.099079] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) } header@ffff8800c375dc78
[  706.099081] 
[  706.100801] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) 
[  706.100802] 
[  706.102227] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) } header@ffff8800bfec7ac8
[  706.102228] 
[  706.103992] LustreError: 14972:0:(lov_object.c:213:lov_init_sub()) owned.
[  706.105164] LustreError: 14972:0:(lov_object.c:214:lov_init_sub()) header@ffff8800cc490180[0x0, 1, [0x340000400:0x39:0x0]]
[  706.105166] 
[  706.107344] LustreError: 14972:0:(lov_object.c:214:lov_init_sub()) try to own.

[  706.594536] LustreError: 14972:0:(lcommon_cl.c:191:cl_file_inode_init()) Failure to initialize cl object [0x340000400:0x39:0x0]: -5
[  706.596689] LustreError: 14972:0:(llite_lib.c:2347:ll_prep_inode()) new_inode -fatal: rc -5
[  706.598419] BUG: unable to handle kernel NULL pointer dereference at (null)
[  706.599192] IP: [<ffffffffa1291b7e>] cl_object_top+0xe/0x150 [obdclass]
[  706.599192] PGD 0
[  706.599192] Oops: 0000 [#1] SMP
[  706.601440] last sysfs file: /sys/devices/system/cpu/possible
[  706.601440] CPU 1
...
[  706.614418]
[  706.614418] Pid: 15458, comm: ll_agl_14972 Not tainted 2.6.32-431.29.2.el6.lustre.x86_64 #1 Bochs Bochs
[  706.614418] RIP: 0010:[<ffffffffa1291b7e>]  [<ffffffffa1291b7e>] cl_object_top+0xe/0x150 [obdclass]
[  706.614418] RSP: 0018:ffff8800c4ab9d10  EFLAGS: 00010282
[  706.614418] RAX: ffff8800b9f248c8 RBX: ffff880100b85ed0 RCX: 0000000000000000
[  706.614418] RDX: ffff8800baddf978 RSI: ffffffffa12e5f80 RDI: 0000000000000000
[  706.614418] RBP: ffff8800c4ab9d20 R08: 0000000000000001 R09: 0000000000000001
[  706.614418] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801196f50c0
[  706.614418] R13: 0000000000000005 R14: 0000000000000000 R15: ffff8800b9f248c8
[  706.614418] FS:  0000000000000000(0000) GS:ffff88002c200000(0000) knlGS:0000000000000000
[  706.614418] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  706.614418] CR2: 0000000000000000 CR3: 000000011b9b6000 CR4: 00000000000006e0
[  706.614418] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  706.614418] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  706.614418] Process ll_agl_14972 (pid: 15458, threadinfo ffff8800c4ab8000, task ffff8800c0ee2480)
[  706.614418] Stack:
[  706.614418]  ffff8800c4ab9d20 ffff880100b85ed0 ffff8800c4ab9d60 ffffffffa129bfad
[  706.614418] <d> ffff8800c4ab9d60 ffff8800c4ab9d8c ffff8800bfe6e480 0000000000000001
[  706.614418] <d> 0000000000000001 0000000000000001 ffff8800c4ab9db0 ffffffffa0c0bec1
[  706.614418] Call Trace:
[  706.614418]  [<ffffffffa129bfad>] cl_io_init+0x3d/0xe0 [obdclass]
[  706.614418]  [<ffffffffa0c0bec1>] cl_glimpse_size0+0x91/0x1d0 [lustre]
[  706.614418]  [<ffffffffa0c0493a>] ll_agl_trigger+0x12a/0x4c0 [lustre]
[  706.614418]  [<ffffffffa0c05f77>] ll_agl_thread+0x187/0x4a0 [lustre]
[  706.614418]  [<ffffffff81061d90>] ? default_wake_function+0x0/0x20
[  706.614418]  [<ffffffffa0c05df0>] ? ll_agl_thread+0x0/0x4a0 [lustre]
[  706.614418]  [<ffffffff8109e856>] kthread+0x96/0xa0
[  706.614418]  [<ffffffff8100c30a>] child_rip+0xa/0x20
[  706.614418]  [<ffffffff815562e0>] ? _spin_unlock_irq+0x30/0x40
[  706.614418]  [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[  706.614418]  [<ffffffff8109e7c0>] ? kthread+0x0/0xa0
[  706.614418]  [<ffffffff8100c300>] ? child_rip+0x0/0x20
[  706.614418] Code: 05 00 00 00 04 00 e8 62 76 ee ff 48 c7 c7 60 68 2e a1 e8 06 b3 ed ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 <48> 8b 07 0f 1f 80 00 00 00 00 48 89 c2 48 8b 40 50 48 85 c0 75
[  706.614418] RIP  [<ffffffffa1291b7e>] cl_object_top+0xe/0x150 [obdclass]
[  706.614418]  RSP <ffff8800c4ab9d10>
[  706.614418] CR2: 0000000000000000
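The trace is consistent with cl_object_top() being handed a NULL object. On x86-64 the first argument is passed in RDI, which is 0. The faulting bytes `48 8b 07` decode to `mov (%rdi),%rax`, and CR2 (the fault address) is also 0. Below is a heavily simplified model of that sequence. It is not Lustre code, and every type and function name in it is hypothetical. A failed per-file init leaves the inode's object pointer NULL. Compare the -5 (-EIO) from cl_file_inode_init() above. A later accessor that walks up to the top-level header then dereferences that pointer without checking it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: each object layer points at a header, and a
 * header may point at a parent header higher in the stack. */
struct obj_header {
	struct obj_header *parent;	/* NULL at the top of the stack */
};

struct layer_obj {
	struct obj_header *hdr;		/* first field, so it is loaded first */
};

/* Hypothetical per-inode info; clob stays NULL if init fails. */
struct inode_info {
	struct layer_obj *clob;
};

/* Model of per-file object setup: on failure it returns -EIO and
 * leaves lli->clob untouched (still NULL). */
static int inode_obj_init(struct inode_info *lli, struct layer_obj *obj,
			  int fail)
{
	if (fail)
		return -EIO;
	lli->clob = obj;
	return 0;
}

/* Model of a "find the top header" accessor. Its first statement
 * dereferences o without a check. Calling it with o == NULL is undefined
 * behaviour and, in practice, faults at address 0 because hdr is the
 * first field. That is the same shape as the oops above. */
static struct obj_header *object_top_header(const struct layer_obj *o)
{
	struct obj_header *h = o->hdr;

	while (h->parent != NULL)
		h = h->parent;
	return h;
}
```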


 Comments   
Comment by Zhenyu Xu [ 07/Apr/15 ]

I think this is the AGL glimpse thread (triggered by ls -l) trying to access an outdated file whose stripe object is under another IO. We can add error handling in cl_io_get() to detect a NULL cl_object, so that the AGL process can ignore such files.
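Here is a minimal sketch of the kind of guard proposed above. The structures and functions (`cl_obj_model`, `inode_info_model`, `glimpse_size_model`, `agl_batch_model`) are hypothetical stand-ins, not Lustre's API. Only the field name `lli_clob` comes from this report. The glimpse path checks for a NULL object before dereferencing it and reports that it skipped the file. An AGL-style loop can then move on instead of oopsing. The patch that was eventually merged may place the check elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a file's client object. */
struct cl_obj_model {
	uint64_t size;			/* size a glimpse would fetch */
};

/* Hypothetical stand-in for per-inode info. */
struct inode_info_model {
	struct cl_obj_model *lli_clob;	/* NULL if object setup failed */
	uint64_t cached_size;
};

/* Glimpse that tolerates a missing object: if lli_clob is NULL, skip
 * the file and leave cached_size untouched. Returns true only if a
 * glimpse was actually performed. */
static bool glimpse_size_model(struct inode_info_model *lli)
{
	if (lli->lli_clob == NULL)
		return false;
	lli->cached_size = lli->lli_clob->size;
	return true;
}

/* AGL-style batch: glimpse every inode, silently ignoring those with
 * no object. Returns how many inodes were glimpsed. */
static size_t agl_batch_model(struct inode_info_model *batch, size_t n)
{
	size_t glimpsed = 0;

	for (size_t i = 0; i < n; i++)
		if (glimpse_size_model(&batch[i]))
			glimpsed++;
	return glimpsed;
}
```

Returning a "skipped" indication rather than an error keeps asynchronous glimpsing best-effort. A file that cannot be glimpsed in the background can still be stat'ed normally later.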

Comment by Gerrit Updater [ 10/Apr/15 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/14440
Subject: LU-6436 llite: ignore glimpse outdated file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7dd69d85fb6049585f2f4137dc5251024ce95541

Comment by Gerrit Updater [ 22/Jun/17 ]

Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: https://review.whamcloud.com/27777
Subject: LU-6436 llite: NULL pointer dereference in cl_object_top()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f867b051f05e75583f549d4d13bf3527a051267

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27777/
Subject: LU-6436 llite: NULL pointer dereference in cl_object_top()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 13c8d5e4bebf437227d95582c36ec1567b150cac

Comment by Peter Jones [ 19/Jul/17 ]

Bobijam

Does your fix need to land too or can it be abandoned?

Peter

Comment by Joseph Gmitter (Inactive) [ 26/Mar/18 ]

Patch landed for 2.11.0 (part of tag 2.10.51); Bobijam's fix has been abandoned.

Generated at Sat Feb 10 02:00:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.