[LU-1878] NULL pointer in ll_readahead() Created: 10/Sep/12  Updated: 29/May/17  Resolved: 29/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Vladimir V. Saveliev Assignee: Bob Glossman (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Attachments: Text File 0001-LU-1878-llite-do-not-shrink-busy-pages.patch    
Severity: 3
Bugzilla ID: 24,376
Epic: client
Rank (Obsolete): 9764

 Description   

Kernel OOPS on a client

<1>[1740431.005226] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[1740431.008694] IP: [<ffffffffa0a9dfda>] ll_readahead+0x4a/0x1d20 [lustre]
<4>[1740431.008694] PGD 2cffabf5067 PUD 2cffabf4067 PMD 0
<0>[1740431.008694] Oops: 0000 1 SMP
<0>[1740431.008694] last sysfs file: /sys/devices/system/node/node63/meminfo
[archkdbcommon]kdb> -bt

Stack traceback for pid 839153
0xffff8acffbd72600 000000 000000 0 000 R 0xffff8acffbd72c90 *magic.exe
[<ffffffffa0a9dfda>] ll_readahead+0x4a/0x1d20 [lustre]
[<ffffffffa0aa0ebb>] ll_readpage+0x120b/0x1760 [lustre]
[<ffffffff810b5cf1>] filemap_fault+0x241/0x3d0
[<ffffffffa0aaebd7>] ll_fault+0x3b7/0xdd0 [lustre]
[<ffffffff810ceae7>] __do_fault+0x57/0x520
[<ffffffff810d3459>] handle_mm_fault+0x199/0x430
[<ffffffff8139dcaf>] do_page_fault+0x1bf/0x3e0
[<ffffffff8139b5cf>] page_fault+0x1f/0x30
[<0000000000472d64>] 0x472d64



 Comments   
Comment by Vladimir V. Saveliev [ 10/Sep/12 ]

The BUG happened because ll_readhead() was called from ll_readpage() with
mapping == NULL, so the line

inode = mapping->host
caused the oops.

At the begining of ll_readpage() page->mapping was not NULL, because the line
struct inode *inode = page->mapping->host;
worked fine. Therefore, page->mapping changed to NULL while running ll_readpage().

At first sight it looks like the page kept locked all the time at ll_readpage().
However, the following race seems to be possible:

Process 1:
ll_readpage() called via handle_mm_fault()>__do_fault()>ll_fault()->filemap_fault():
starts with locked page P and calls ll_issue_page_read() which queues the page for i/o.
Once the page is queued it can be involved into group i/o triggered by another thread.

Process 2:
obd_trigger_group_io()
triggers i/o for group of queued pages, including the page P. The P gets unlocked and uptodate as result of that.

Process 3:
shrink_slab()>llap_shrink_cache()>llap_shrink_cache_internal()
kswapd shrinks slabs and calls ll_shrink_cache() (registered lustre's slab shrinker, which tries to shrink lustre pages.
llap_shrink_cache_internal() locks the page P and calls ll_truncate_complete_page() which removes the page P from page cache (page->mapping becomes NULL).
llap_shrink_cache_internal() avoids to ll_truncate_complete_page() dirty pages, but it does not care of busy pages: the ones with increase reference count.

Process 1 continues with page->mapping == NULL and calls ll_readahead() passing NULL as 3rd argument:
ll_readpage()
...
/* We have just requested the actual page we want, see if we can tack

  • on some readahead to that page's RPC before it is sent. */
    if (ll_i2sbi(inode)->ll_ra_info.ra_max_pages_per_file)
    ll_readahead(&fd->fd_ras, exp, page->mapping, oig,
    fd->fd_flags);

ll_readahead()
{
...
inode = mapping->host; <<<== NULL pointer dereference happens here, as long as mapping is NULL
...

Dump of stacktraces provided by HLRN (https://bugzilla.lustre.org/attachment.cgi?id=33148) contains number of processes:

  • handling pagefaults of lustre pages looking like:
    0xffff8acffbd72600 839153 838417 1 343 R 0xffff8acffbd72c90 *magic.exe
    [<ffffffffa0a9dfda>] ll_readahead+0x4a/0x1d20 [lustre]
    [<ffffffffa0aa0ebb>] ll_readpage+0x120b/0x1760 [lustre]
    [<ffffffff810b5cf1>] filemap_fault+0x241/0x3d0
    [<ffffffffa0aaebd7>] ll_fault+0x3b7/0xdd0 [lustre]
    [<ffffffff810ceae7>] __do_fault+0x57/0x520
    [<ffffffff810d3459>] handle_mm_fault+0x199/0x430
    [<ffffffff8139dcaf>] do_page_fault+0x1bf/0x3e0
    [<ffffffff8139b5cf>] page_fault+0x1f/0x30
  • processes which are entered lustre shrink code trying to allocate memory and kswapd's shrinking slabs with typical trace like:
    0xffff881072408680 7239 2 0 965 D 0xffff881072408d10 kswapd56
    [<ffffffff81399094>] thread_return+0x0/0x34c
    [<ffffffff810cf589>] mapping_sleep+0x9/0x10
    [<ffffffff8139986a>] __wait_on_bit_lock+0x4a/0xa0
    [<ffffffff81399939>] out_of_line_wait_on_bit_lock+0x79/0xa0
    [<ffffffff810d2309>] unmap_mapping_range+0x149/0x310
    [<ffffffffa0aae6c7>] ll_teardown_mmaps+0x67/0x1c0 [lustre]
    [<ffffffffa0aa2072>] llap_shrink_cache+0x5a2/0xa40 [lustre]
    [<ffffffffa0a83763>] ll_shrink_cache+0x33/0x70 [lustre]
    [<ffffffff810c3f8c>] shrink_slab+0x12c/0x190
    [<ffffffff810c49d8>] balance_pgdat+0x4c8/0x7b0
    [<ffffffff810c4de9>] kswapd+0x129/0x180
    [<ffffffff81064366>] kthread+0x96/0xa0

which shows that the race described above is possible.
The proposed fix is to avoid to ll_truncate_complete_page() pages having extra reference counters.

Comment by Vladimir V. Saveliev [ 10/Sep/12 ]

llap_shrink_cache_internal(): avoid shrinking of pages which are in use

Comment by Peter Jones [ 10/Sep/12 ]

Thanks for the patch Vladimir. Could you please upload it into gerrit so that we can review and test it? There are instructions on the whamcloud wiki or I can get an engineer to step you through the process if that helps.

Comment by Vladimir V. Saveliev [ 10/Sep/12 ]

Hello, Peter

That is what I was planning to do next. From http://wiki.whamcloud.com/display/PUB/Submitting+Changes
I realised that JIRA ticket needs to be created first. So, I created
LU-1878.

I hope that my patch matches
2. Test and commit your patch locally using acceptance-small.sh and
3. The patch follows the Requirements for patch submission:
and I was going to do 4. The patch has been uploaded to Gerrit. On
that step I guess that I am to git push the patch somewhere.

If I went wrong way or there is simple way, please get someone to help
me.

TIA
Best regards,
Vladimir

Comment by Peter Jones [ 10/Sep/12 ]

Great! It sounds like you are on the right track. Once the patch is in gerrit then you should post the URL to the changeset as a comment in the JIRA ticket. Thanks!

Comment by Vladimir V. Saveliev [ 10/Sep/12 ]

http://review.whamcloud.com/3927

Comment by Vladimir V. Saveliev [ 10/Sep/12 ]

the patch is in gerrit (http://review.whamcloud.com/3927), however, I
am not sure about inspectors, please advise.

Comment by Peter Jones [ 10/Sep/12 ]

Excellent - thanks Vladimir! Don't worry about the inspectors - we'll organize that.

Bob

Could you please take care of landing this patch and check in particular its applicability to the master branch

Thanks

Peter

Comment by Vladimir V. Saveliev [ 18/Sep/12 ]

http://review.whamcloud.com/4026

this is updated version accordingly to Jinshan Xiong's inspection.

Comment by Andreas Dilger [ 29/May/17 ]

Close old ticket.

Generated at Sat Feb 10 01:20:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.