[LU-14788] sanity test_133g: crash in __proc_lnet_portal_rotor() Created: 24/Jun/21  Updated: 20/Oct/22  Resolved: 29/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Cyril Bordage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14789 sanity 133f and 133g are no longer ef... Resolved
is related to LU-13091 conf-sanity "lctl list_param" test to... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for John Hammond <jhammond@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a71ac2d1-be7a-46a7-9c39-fbd517c03cc2

test_133g failed with the following error:

trevis-40vm4 crashed during sanity test_133g
[ 8345.119546] BUG: unable to handle kernel paging request at fffffffffffffff2
[ 8345.120892] PGD 6660d067 P4D 6660d067 PUD 6660f067 PMD 0 
[ 8345.121923] Oops: 0000 [#1] SMP PTI
[ 8345.122600] CPU: 0 PID: 376752 Comm: badarea_io Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-240.22.1.el8_lustre.x86_64 #1
[ 8345.124909] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 8345.126052] RIP: 0010:strlen+0x0/0x20
[ 8345.126750] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[ 8345.130165] RSP: 0018:ffffa4e7c47e7e20 EFLAGS: 00010246
[ 8345.131134] RAX: fffffffffffffff2 RBX: ffffffffc0c7a12d RCX: 00000000003287d1
[ 8345.132441] RDX: fffffffffffffff2 RSI: 0000000000000000 RDI: fffffffffffffff2
[ 8345.133759] RBP: 0000000000000005 R08: 000000000002f040 R09: ffffffff81c4b73c
[ 8345.135069] R10: ffffd62500916700 R11: 0000000000000000 R12: 0000000000000000
[ 8345.136388] R13: 0000000004096000 R14: ffffffffc0c6d620 R15: fffffffffffffff2
[ 8345.137692] FS:  00007fba872a5740(0000) GS:ffff8c187fc00000(0000) knlGS:0000000000000000
[ 8345.139197] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8345.140251] CR2: fffffffffffffff2 CR3: 000000001c7a2001 CR4: 00000000000606f0
[ 8345.141572] Call Trace:
[ 8345.142130]  strim+0x8/0x60
[ 8345.142851]  __proc_lnet_portal_rotor+0x171/0x3e0 [lnet]
[ 8345.143960]  lprocfs_call_handler+0x1d/0x50 [libcfs]
[ 8345.144917]  lnet_debugfs_write+0x30/0x50 [libcfs]
[ 8345.145835]  full_proxy_write+0x53/0x80
[ 8345.146621]  vfs_write+0xa5/0x1a0
[ 8345.147294]  ksys_write+0x4f/0xb0
[ 8345.147944]  do_syscall_64+0x5b/0x1a0
[ 8345.148695]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 8345.149683] RIP: 0033:0x7fba86bb28a8
[ 8345.150365] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 b5 4c 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 8345.153801] RSP: 002b:00007ffd2d650cd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 8345.155192] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fba86bb28a8
[ 8345.156493] RDX: 0000000000000005 RSI: 0000000004096000 RDI: 0000000000000003
[ 8345.157803] RBP: 00007ffd2d650de8 R08: 00007fba86e83ce0 R09: 00007fba86e83ce0
[ 8345.159108] R10: 0000000000000005 R11: 0000000000000246 R12: 00000000004006e0
[ 8345.160410] R13: 00007ffd2d650de0 R14: 0000000000000000 R15: 0000000000000000
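
The write handler checks the result of memdup_user_nul() only against NULL:

        buf = memdup_user_nul(buffer, nob);
        if (!buf)
                return -ENOMEM;

memdup_user_nul() returns an ERR_PTR on error, never NULL, so this check does not catch the failure. The faulting address fffffffffffffff2 in the oops is -14 (-EFAULT) encoded as a pointer; it passes the NULL check, is handed to strim(), and is dereferenced in strlen(). See https://www.kernel.org/doc/htmldocs/kernel-api/API-memdup-user-nul.html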

The other uses of memdup_user_nul() should also be checked.
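
For reference when auditing those call sites, here is a minimal userspace sketch of the IS_ERR()/PTR_ERR() pattern that the merged patch subject ("check memdup_user_nul using IS_ERR") refers to. It is not the Lustre patch itself. ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here as stand-ins for the kernel helpers from <linux/err.h>, and mock_memdup_user_nul() and sketch_write_handler() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers (assumption:
 * same semantics, errno values encoded in the top 4095 addresses). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical mock of memdup_user_nul(): a NULL source simulates a
 * faulting user address, as produced by the badarea_io test program. */
static char *mock_memdup_user_nul(const char *src, size_t len)
{
	char *p;

	if (!src)
		return ERR_PTR(-EFAULT);

	p = malloc(len + 1);
	if (!p)
		return ERR_PTR(-ENOMEM);

	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/* Hypothetical write handler: returns the string length on success or a
 * negative errno on failure. */
static long sketch_write_handler(const char *buffer, size_t nob)
{
	char *buf = mock_memdup_user_nul(buffer, nob);
	long len;

	/* An error pointer is non-NULL, so "if (!buf)" would miss it. */
	if (IS_ERR(buf))
		return PTR_ERR(buf);

	/* The copy is NUL-terminated, so string functions are safe here. */
	assert(buf[nob] == '\0');
	len = (long)strlen(buf);
	free(buf);
	return len;
}
```

In this sketch, sketch_write_handler(NULL, 5) returns -EFAULT instead of dereferencing the error pointer, which is the behaviour the NULL-only check fails to provide.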

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_133g - trevis-40vm4 crashed during sanity test_133g



 Comments   
Comment by John Hammond [ 24/Jun/21 ]

cbordage could you take a look at this?

Comment by Gerrit Updater [ 28/Jun/21 ]

Cyril Bordage (cbordage@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44091
Subject: LU-14788 tests: sanity test_133g crash fix
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 58a6285054e5b084b79716a1407633e395bd5a99

Comment by Gerrit Updater [ 27/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/44091/
Subject: LU-14788 lnet: check memdup_user_nul using IS_ERR
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 449d046e55a42cc4d1c4ab0217551cded1864bc4

Comment by Peter Jones [ 29/Jul/21 ]

Landed for 2.15
