[LU-9579] LBUG: (osc_page.c:433:osc_page_init()) ASSERTION( result == 0 )
Created: 01/Jun/17  Updated: 04/Aug/17  Resolved: 13/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Major
Reporter: Alexander Boyko
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
Epic/Theme: patch
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The Lustre client hits an LBUG in osc_page_init() when job processes are killed because their memory cgroup has run out of memory. This LBUG occurred on 14 nodes during a recent relrun. The console log and the backtrace of the faulting process follow.

> 2017-05-22T16:41:07.320315-05:00 c0-0c1s6n0 LustreError: 15485:0:(osc_page.c:433:osc_page_init()) ASSERTION( result == 0 ) failed:
> 2017-05-22T16:41:07.320393-05:00 c0-0c1s6n0 Killed process 15246 (namu.exe.6GB_pe) apid 471027 total-vm:8968944kB, anon-rss:5203772kB, file-rss:12kB, shmem-rss:1828kB
> 2017-05-22T16:41:07.320398-05:00 c0-0c1s6n0 Memory cgroup out of memory: Killed 15 processes sharing cpu group with pid 15246.
> 2017-05-22T16:41:07.320404-05:00 c0-0c1s6n0 LustreError: 15485:0:(osc_page.c:433:osc_page_init()) LBUG
> 2017-05-22T16:41:07.320409-05:00 c0-0c1s6n0 Pid: 15485, comm: namu.exe.6GB_pe

> PID: 15485  TASK: ffff8816cf566980  CPU: 55  COMMAND: "namu.exe.6GB_pe"
>  #0 [ffff8816cf56b908] panic at ffffffff8114670e
>  #1 [ffff8816cf56b980] lbug_with_loc at ffffffffa026aead [libcfs]
>  #2 [ffff8816cf56b9a0] osc_page_init at ffffffffa09f9e12 [osc]
>  #3 [ffff8816cf56b9e0] lov_page_init_raid0 at ffffffffa084199b [lov]
>  #4 [ffff8816cf56ba38] lov_page_init at ffffffffa083a34c [lov]
>  #5 [ffff8816cf56ba48] cl_page_alloc at ffffffffa0559bf2 [obdclass]
>  #6 [ffff8816cf56ba88] cl_page_find at ffffffffa0559e1f [obdclass]
>  #7 [ffff8816cf56bad8] ll_readpage at ffffffffa09031c9 [lustre]
>  #8 [ffff8816cf56bbe8] filemap_fault at ffffffff8114b5db
>  #9 [ffff8816cf56bc58] vvp_io_fault_start at ffffffffa093258e [lustre]
> #10 [ffff8816cf56bcc8] cl_io_start at ffffffffa055cfae [obdclass]
> #11 [ffff8816cf56bcf0] cl_io_loop at ffffffffa056036e [obdclass]
> #12 [ffff8816cf56bd20] ll_fault at ffffffffa09137e4 [lustre]
> #13 [ffff8816cf56bd98] __do_fault at ffffffff81175abe
> #14 [ffff8816cf56be00] handle_mm_fault at ffffffff81179528
> #15 [ffff8816cf56bee0] __do_page_fault at ffffffff81048de9
> #16 [ffff8816cf56bf40] do_page_fault at ffffffff8104904c
> #17 [ffff8816cf56bf50] page_fault at ffffffff81506a62
>     RIP: 0000000000415702  RSP: 00002aab20a00480  RFLAGS: 00010202
>     RAX: 0000000000000280  RBX: 000000000000104b  RCX: 0000000000005008
>     RDX: 0000000000000f02  RSI: 000000000517f258  RDI: 0000000000000280
>     RBP: 00002aab20a00670   R8: 0000000106ec9118   R9: 0000000000000781
>     R10: 0000000000000280  R11: 0000000101d49ec0  R12: 000000000cdabab8
>     R13: 0000000000000280  R14: 00000000000013c2  R15: 0000000007c2c860
>     ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b


 Comments   
Comment by Gerrit Updater [ 01/Jun/17 ]

Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: https://review.whamcloud.com/27372
Subject: LU-9579 osc: adds radix_tree_preload
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 12fb2801fa8d225d61d98c8b6023c161005a011c
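
The patch subject names radix_tree_preload, the kernel helper that reserves radix tree nodes before a lock is taken so that a later radix_tree_insert can draw on the reserve instead of allocating under memory pressure. The following is a minimal userspace sketch of that general idea, not the code from the patch. The names node_pool, pool_preload, pool_release, tree_insert, demo_insert and the atomic_alloc_fails flag are hypothetical and exist only for illustration. It also assumes that the failed assertion came from an allocation failure during insertion, which is inferred from the patch subject and is not stated in this ticket.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are Lustre or kernel APIs. */
#define POOL_MAX 4

struct node_pool {
	void *nodes[POOL_MAX];
	int count;
};

/* Simulates memory pressure: when true, non-blocking allocations fail. */
static bool atomic_alloc_fails;

static void *alloc_atomic(size_t size)
{
	return atomic_alloc_fails ? NULL : malloc(size);
}

/* Fill the reserve ahead of time, where failure can be handled cleanly. */
static int pool_preload(struct node_pool *pool, size_t size)
{
	while (pool->count < POOL_MAX) {
		void *node = malloc(size);

		if (node == NULL)
			return -ENOMEM;
		pool->nodes[pool->count++] = node;
	}
	return 0;
}

/* Return any unused reserved nodes. */
static void pool_release(struct node_pool *pool)
{
	while (pool->count > 0)
		free(pool->nodes[--pool->count]);
}

/*
 * Models an insert done while a lock is held: try a non-blocking
 * allocation first, then fall back to the preloaded reserve.
 */
static int tree_insert(struct node_pool *pool, size_t size, void **slot)
{
	void *node = alloc_atomic(size);

	if (node == NULL && pool->count > 0)
		node = pool->nodes[--pool->count];
	if (node == NULL)
		return -ENOMEM;
	*slot = node;
	return 0;
}

/* Runs one insert under simulated memory pressure; returns its result. */
int demo_insert(bool use_preload)
{
	struct node_pool pool = { .count = 0 };
	void *slot = NULL;
	int rc = 0;

	atomic_alloc_fails = true;
	if (use_preload)
		rc = pool_preload(&pool, 64);
	if (rc == 0)
		rc = tree_insert(&pool, 64, &slot);

	free(slot);
	pool_release(&pool);
	return rc;
}
```

In this sketch demo_insert(false) returns -ENOMEM, the kind of result that an assertion such as result == 0 cannot tolerate, while demo_insert(true) returns 0 because the node is taken from the reserve.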

Comment by Gerrit Updater [ 13/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27372/
Subject: LU-9579 osc: adds radix_tree_preload
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 39ea3031dc6794cf8eb7e183a4412b124289c112

Comment by Peter Jones [ 13/Jun/17 ]

Landed for 2.10
