[LU-1714] crash upon loading libcfs module Created: 07/Aug/12 Updated: 04/Feb/15 Resolved: 20/Feb/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.1.5 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alexander Boyko | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
debug kernel & lustre with |
||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 8136 | ||||||||||||||||
| Description |
RIP: 0010:[<ffffffffa06a2807>] [<ffffffffa06a2807>] cfs_crypto_hash_digest+0xf6/0x166 [libcfs] RSP: 0018:ffff88007c5fbda8 EFLAGS: 00010293 RAX: ffffea0001a55e00 RBX: ffff880078640000 RCX: 0000000087654321 RDX: 0000000000000000 RSI: ffffffffa06c6398 RDI: ffff880078640000 RBP: ffff88007c5fbe28 R08: 0000000000000000 R09: d84156c5635688c0 R10: d84156c5635688c0 R11: 0000000000000020 R12: 0000000000000000 R13: ffff88007c5fbe78 R14: 0000000000020000 R15: ffff88007c5fbe74 FS: 00007f2ef3eb6700(0000) GS:ffff880005000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f4777fa5000 CR3: 000000007c62c000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process insmod (pid: 8489, threadinfo ffff88007c5fa000, task ffff88007c6f00c0) Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 <0> 0000000000000000 0000000000000000 ffff88007c321988 0000000000000000 <0> 00007f2ef1eee010 ffffffffa06c6390 ffff88007c5fbe28 0000000000000000 Call Trace: [<ffffffffa06a29c8>] cfs_crypto_register+0x151/0x35a [libcfs] [<ffffffff8151cbcb>] ? _spin_unlock+0x2b/0x40 [<ffffffffa06b3120>] ? cfs_wi_sched_create+0x3bf/0x4d4 [libcfs] [<ffffffffa06a5d34>] ? init_libcfs_module+0x0/0x377 [libcfs] [<ffffffffa06a5d34>] ? init_libcfs_module+0x0/0x377 [libcfs] [<ffffffffa06a5f5f>] init_libcfs_module+0x22b/0x377 [libcfs] [<ffffffff8100204c>] do_one_initcall+0x3c/0x1d0 [<ffffffff810c1333>] sys_init_module+0xe3/0x260 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b Code: c1 e8 0c 48 8d 0c c5 00 00 00 00 48 c1 e0 06 48 29 c8 48 b9 00 00 00 00 00 ea ff ff 48 8d 04 08 b9 21 43 65 87 48 39 4d 80 74 04 <0f> 0b eb fe f6 c2 01 74 04 0f 0b eb fe 83 e2 03 48 09 c2 48 89 RIP [<ffffffffa06a2807>] cfs_crypto_hash_digest+0xf6/0x166 [libcfs] RSP <ffff88007c5fbda8> ---[ end trace 90f8b1994767d769 ]--- Kernel panic - not syncing: Fatal exception Pid: 8489, comm: insmod Tainted: G D ---------------- 2.6.32-220.7.1.el6.lustre.2015.x86_64.debug #1 Call Trace: [<ffffffff81518f20>] ? panic+0x78/0x148 [<ffffffff8151e564>] ? oops_end+0xe4/0x100 [<ffffffff8100f3bb>] ? die+0x5b/0x90 [<ffffffff8151dc54>] ? do_trap+0xc4/0x160 [<ffffffff8100d005>] ? do_invalid_op+0xa5/0xb0 [<ffffffff8100cff5>] ? do_invalid_op+0x95/0xb0 [<ffffffffa06a2807>] ? cfs_crypto_hash_digest+0xf6/0x166 [libcfs] [<ffffffff8151c661>] ? trace_hardirqs_off_thunk+0x3a/0x3c [<ffffffff8100bf9b>] ? invalid_op+0x1b/0x20 [<ffffffffa06a2807>] ? cfs_crypto_hash_digest+0xf6/0x166 [libcfs] [<ffffffffa06a27d7>] ? cfs_crypto_hash_digest+0xc6/0x166 [libcfs] [<ffffffffa06a29c8>] ? cfs_crypto_register+0x151/0x35a [libcfs] [<ffffffff8151cbcb>] ? _spin_unlock+0x2b/0x40 [<ffffffffa06b3120>] ? cfs_wi_sched_create+0x3bf/0x4d4 [libcfs] [<ffffffffa06a5d34>] ? init_libcfs_module+0x0/0x377 [libcfs] [<ffffffffa06a5d34>] ? init_libcfs_module+0x0/0x377 [libcfs] [<ffffffffa06a5f5f>] ? init_libcfs_module+0x22b/0x377 [libcfs] [<ffffffff8100204c>] ? do_one_initcall+0x3c/0x1d0 [<ffffffff810c1333>] ? sys_init_module+0xe3/0x260 [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b |
| Comments |
| Comment by Alexander Boyko [ 07/Aug/12 ] |
| Comment by Andreas Dilger [ 07/Aug/12 ] |
|
This is the same as |
| Comment by Alexander Boyko [ 07/Aug/12 ] |
|
Sure, check for sg_init_table looks the same. |
| Comment by Andreas Dilger [ 07/Aug/12 ] |
|
OK, looking more closely, this has the same cause as Alexander and Prakash, could you please get together to make a single common patch that fixes these issues, or at least have a single autoconf check. I prefer the approach in http://review.whamcloud.com/3423 that has a wrapper that can be used in all the code, instead of #ifdefs inline in the code as in http://review.whamcloud.com/3552. |
| Comment by Prakash Surya (Inactive) [ 07/Aug/12 ] |
|
Sure, I'm open to merging the two or landing one and reworking the other. It's probably a good idea to scan the rest of the code for areas which need similar fixes as well. |
| Comment by Alexander Boyko [ 17/Aug/12 ] |
|
Prakash Surya, how about merging? Or you did not do it? |
| Comment by Prakash Surya (Inactive) [ 17/Aug/12 ] |
|
Alexander, sure I'll merge your fix into mine. I'll update this ticket when I push the new version and add you to the review list. I should be able to do that later today. |
| Comment by Prakash Surya (Inactive) [ 17/Aug/12 ] |
|
Some other places which may need similar fixes: lnet/klnds/o2iblnd/o2iblnd_cb.c:730 lnet/klnds/o2iblnd/o2iblnd_cb.c:773 lustre/obdclass/capa.c:266 lustre/obdclass/capa.c:308 lustre/obdclass/capa.c:311 lustre/obdclass/capa.c:261 lustre/obdclass/capa.c:364 Those locations all make a direct call to sg_set_page which might need to change to sg_init_one. |
| Comment by Prakash Surya (Inactive) [ 17/Aug/12 ] |
|
Please see: http://review.whamcloud.com/3709 |
| Comment by Alexander Boyko [ 17/Aug/12 ] |
|
I think it`s better to add something like this to libcfs headers. #ifndef HAVE_SCATTERLIST_INITTABLE #define sg_init_table(sg, nents) memset(sg, 0, sizeof(*(sg))*(nents)) #endif #ifndef HAVE_SCATTERLIST_SETPAGE sg_set_page(sg, p, len, off) sg_set_buf(sg, page_address(p) + (off & ~CFS_PAGE_MASK), len) #endif Or maybe cfs_ prefix for this functions. |
| Comment by Prakash Surya (Inactive) [ 17/Aug/12 ] |
|
I decided to drop the "cfs_" prefix since there are other "sg_" calls in the code which do not use any "cfs_" wrappers. So to be consistent with those calls, I think it's best to leave them with the "sg_" prefix. |
| Comment by Peter Jones [ 04/Sep/12 ] |
|
Yujian Could you please help out with this one? Thanks Peter |
| Comment by Lustre Bull [ 19/Oct/12 ] |
|
Prakash,
Index: b/lustre/obdclass/capa.c
===================================================================
--- a/lustre/obdclass/capa.c
+++ b/lustre/obdclass/capa.c
@@ -265,6 +265,7 @@ int capa_hmac(__u8 *hmac, struct lustre_
}
keylen = alg->ha_keylen;
+ sg_init_table(&sl, 1);
sg_set_page(&sl, virt_to_page(capa),
offsetof(struct lustre_capa, lc_hmac),
(unsigned long)(capa) % CFS_PAGE_SIZE);
@@ -306,9 +307,11 @@ int capa_encrypt_id(__u32 *d, __u32 *s,
GOTO(out, rc);
}
+ sg_init_table(&sd, 1);
sg_set_page(&sd, virt_to_page(d), 16,
(unsigned long)(d) % CFS_PAGE_SIZE);
+ sg_init_table(&ss, 1);
sg_set_page(&ss, virt_to_page(s), 16,
(unsigned long)(s) % CFS_PAGE_SIZE);
desc.tfm = tfm;
@@ -358,9 +361,11 @@ int capa_decrypt_id(__u32 *d, __u32 *s,
GOTO(out, rc);
}
+ sg_init_table(&sd, 1);
sg_set_page(&sd, virt_to_page(d), 16,
(unsigned long)(d) % CFS_PAGE_SIZE);
+ sg_init_table(&ss, 1);
sg_set_page(&ss, virt_to_page(s), 16,
(unsigned long)(s) % CFS_PAGE_SIZE);
You have also added the following new definition for sg_set_page() in libcfs/include/libcfs/linux/libcfs.h
#ifndef HAVE_SCATTERLIST_SETPAGE
#define sg_set_page(sg, p, len, off) \
sg_set_buf(sg, page_address(p) + ((off) & ~CFS_PAGE_MASK), len)
#endif
This file also includes libcfs/include/libcfs/linux/kp30.h in which there was already an older definition for sg_set_page()
#ifndef HAVE_SCATTERLIST_SETPAGE
static inline void sg_set_page(struct scatterlist *sg, struct page *page,
unsigned int len, unsigned int offset)
{
sg->page = page;
sg->offset = offset;
sg->length = len;
}
#endif
I think this older definition should be removed. |
| Comment by Nathan Rutman [ 21/Nov/12 ] |
|
Xyratex-bug-id: MRP-611 |
| Comment by Emoly Liu [ 18/Feb/13 ] |
|
b2_1 port is at http://review.whamcloud.com/5452 |
| Comment by Peter Jones [ 20/Feb/13 ] |
|
The existing patches have landed for 2.1.5 and 2.4. Please open a new ticket to track any further changes needed in this area of code. |