[LU-11828] PFL crashes when invariant checking is enabled Created: 26/Dec/18 Updated: 30/Jan/19 Resolved: 30/Jan/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Upstream, Lustre 2.12.0, Lustre 2.10.6 |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James A Simmons | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Any Lustre client using PFL, with Lustre built with invariant checking enabled |
||
| Epic/Theme: | patch, upstream |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
For my Linux Lustre client I always test with invariant checking enabled. With the addition of PFL, running the sanity-pfl.sh test exposed an old bug:

2018-12-25T21:12:38.969914-05:00 ninja84.ccs.ornl.gov kernel: Lustre: DEBUG MARKER: == sanity-pfl test 0: Create full components file, no reused OSTs ==================================== 21:12:38 (1545790358)
2018-12-25T21:12:39.107708-05:00 ninja84.ccs.ornl.gov kernel: LustreError: 16009:0:(cl_io.c:439:cl_io_iter_fini()) ASSERTION( io->ci_state == CIS_UNLOCKED ) failed:
2018-12-25T21:12:39.107772-05:00 ninja84.ccs.ornl.gov kernel: LustreError: 16009:0:(cl_io.c:439:cl_io_iter_fini()) LBUG
2018-12-25T21:12:39.107804-05:00 ninja84.ccs.ornl.gov kernel: Pid: 16009, comm: dd 4.20.0-rc6+ #1 SMP PREEMPT Sat Dec 15 11:22:06 EST 2018
2018-12-25T21:12:39.116789-05:00 ninja84.ccs.ornl.gov kernel: Call Trace:
2018-12-25T21:12:39.120246-05:00 ninja84.ccs.ornl.gov kernel: libcfs_call_trace+0x8b/0xc0 [libcfs]
2018-12-25T21:12:39.125936-05:00 ninja84.ccs.ornl.gov kernel: lbug_with_loc+0x41/0x90 [libcfs]
2018-12-25T21:12:39.131287-05:00 ninja84.ccs.ornl.gov kernel: cl_io_iter_fini+0x10c/0x110 [obdclass]
2018-12-25T21:12:39.137161-05:00 ninja84.ccs.ornl.gov kernel: cl_io_loop+0x46/0x220 [obdclass]
2018-12-25T21:12:39.142525-05:00 ninja84.ccs.ornl.gov kernel: cl_setattr_ost+0x1ed/0x2a0 [lustre]
2018-12-25T21:12:39.148135-05:00 ninja84.ccs.ornl.gov kernel: ll_setattr_raw+0x7b0/0x9a0 [lustre]
2018-12-25T21:12:39.153767-05:00 ninja84.ccs.ornl.gov kernel: notify_change+0x1dc/0x430
2018-12-25T21:12:39.158523-05:00 ninja84.ccs.ornl.gov kernel: do_truncate+0x72/0xc0
2018-12-25T21:12:39.162910-05:00 ninja84.ccs.ornl.gov kernel: do_sys_ftruncate+0xf5/0x160
2018-12-25T21:12:39.167853-05:00 ninja84.ccs.ornl.gov kernel: do_syscall_64+0x68/0x38f
2018-12-25T21:12:39.172474-05:00 ninja84.ccs.ornl.gov kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
2018-12-25T21:12:39.178480-05:00 ninja84.ccs.ornl.gov kernel: 0xffffffffffffffff
2018-12-25T21:12:39.182552-05:00 ninja84.ccs.ornl.gov kernel: Kernel panic - not syncing: LBUG |
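The assertion that fired requires cl_io_iter_fini() to find the IO in CIS_UNLOCKED. What follows is a minimal, hypothetical userspace model of one cl_io_loop()-style iteration, not Lustre code. It assumes that the loop calls the fini step even when the iteration-init step fails, which is one way the fini step can observe a state other than CIS_UNLOCKED. The enumerator names and their order are modeled on Lustre's enum cl_io_state (the order is an assumption). model_io, model_loop_once() and the two check functions are made up for illustration. The relaxed check is included only for comparison and is not claimed to be the change made in patch 33915.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: state names and order are modeled on Lustre's
 * enum cl_io_state, but this is a userspace sketch, not Lustre code. */
enum io_state {
	CIS_ZERO,
	CIS_INIT,
	CIS_IT_STARTED,
	CIS_LOCKED,
	CIS_IO_GOING,
	CIS_IO_FINISHED,
	CIS_UNLOCKED,
	CIS_IT_ENDED,
	CIS_FINI,
};

struct model_io {
	enum io_state state;
};

/* The check that fired: the fini step accepts only CIS_UNLOCKED. */
static bool fini_check_strict(enum io_state s)
{
	return s == CIS_UNLOCKED;
}

/* For comparison only: a relaxed check that also accepts states reached
 * before locking or after unlocking (relies on the enum order above). */
static bool fini_check_relaxed(enum io_state s)
{
	return s <= CIS_IT_STARTED || s > CIS_IO_FINISHED;
}

/* One loop iteration. iter_init_rc simulates the return code of the
 * iteration-init step; the fini step runs unconditionally, as assumed.
 * Returns the state the fini step observes on entry. */
static enum io_state model_loop_once(struct model_io *io, int iter_init_rc)
{
	enum io_state seen_by_fini;

	io->state = CIS_INIT;
	if (iter_init_rc == 0) {
		io->state = CIS_IT_STARTED;   /* iter_init succeeded */
		io->state = CIS_LOCKED;       /* lock */
		io->state = CIS_IO_GOING;     /* start */
		io->state = CIS_IO_FINISHED;  /* end */
		io->state = CIS_UNLOCKED;     /* unlock */
	}
	seen_by_fini = io->state;
	io->state = CIS_IT_ENDED;             /* iter_fini */
	return seen_by_fini;
}
```

When iter_init_rc is 0, the fini step sees CIS_UNLOCKED and both checks pass. When it is nonzero, the fini step sees CIS_INIT. The strict check rejects that state, which would be an LBUG with invariant checking enabled, while the relaxed check accepts it.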
| Comments |
| Comment by Gerrit Updater [ 26/Dec/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33915 |
| Comment by Andreas Dilger [ 22/Jan/19 ] |
|
During review of 33915 there was confusion about the state names used by cl_io_loop(), because the state names don't match the method names. Currently, it looks like:

cl_io_iter_init() -> CIS_IT_STARTED
cl_io_start() -> CIS_IO_GOING, but that is unrelated to "CIS_IT_STARTED"
cl_io_end() -> CIS_IO_FINISHED, also unrelated to CIS_IT_STARTED
cl_io_iter_fini() -> CIS_IT_ENDED, unrelated to cl_io_end()

This whole state machine could use some renaming to make it more consistent. For example, it would be better to use:

cl_io_iter_init() -> CIS_IT_INITIALIZED
cl_io_start() -> cl_io_begin(), and use CIS_IO_BEGUN for the state
cl_io_end() -> CIS_IO_ENDED
cl_io_iter_fini() -> CIS_IT_FINALIZED |
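The proposed renaming pairs each method with a state named after it. The following sketch records that pairing as a lookup table from the current method name to its current and proposed names. The proposed identifiers (cl_io_begin(), CIS_IT_INITIALIZED, CIS_IO_BEGUN, CIS_IO_ENDED, CIS_IT_FINALIZED) are only the suggestion in this comment and are not claimed to exist in any Lustre release. The struct cis_naming type and the cis_lookup() helper are hypothetical illustrations, not Lustre code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of the naming proposal from this review comment. */
struct cis_naming {
	const char *method;          /* current method name */
	const char *current_state;   /* state that method sets today */
	const char *proposed_method; /* suggested method name */
	const char *proposed_state;  /* suggested state name */
};

static const struct cis_naming cis_names[] = {
	{ "cl_io_iter_init", "CIS_IT_STARTED",  "cl_io_iter_init", "CIS_IT_INITIALIZED" },
	{ "cl_io_start",     "CIS_IO_GOING",    "cl_io_begin",     "CIS_IO_BEGUN" },
	{ "cl_io_end",       "CIS_IO_FINISHED", "cl_io_end",       "CIS_IO_ENDED" },
	{ "cl_io_iter_fini", "CIS_IT_ENDED",    "cl_io_iter_fini", "CIS_IT_FINALIZED" },
};

/* Return the entry for a current method name, or NULL if it is not one
 * of the four loop methods covered by the proposal. */
static const struct cis_naming *cis_lookup(const char *method)
{
	size_t i;

	for (i = 0; i < sizeof(cis_names) / sizeof(cis_names[0]); i++)
		if (strcmp(cis_names[i].method, method) == 0)
			return &cis_names[i];
	return NULL;
}
```

With this scheme, each proposed state is the past tense of its method's verb (init -> INITIALIZED, begin -> BEGUN, end -> ENDED, fini -> FINALIZED). That is the consistency the comment asks for.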
| Comment by James A Simmons [ 22/Jan/19 ] |
|
Should we open a new ticket or continue here? |
| Comment by Andreas Dilger [ 22/Jan/19 ] |
|
Either is probably ok if the second patch is available soon. Otherwise it should be separated. |
| Comment by Gerrit Updater [ 30/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33915/ |
| Comment by Peter Jones [ 30/Jan/19 ] |
|
Landed for 2.13 |