[LU-11828] PFL crashes when invariant checking is enabled Created: 26/Dec/18  Updated: 30/Jan/19  Resolved: 30/Jan/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Upstream, Lustre 2.12.0, Lustre 2.10.6
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None
Environment:

Any Lustre client using PFL, with Lustre built with invariants enabled


Epic/Theme: patch, upstream
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I always test my Linux Lustre client with invariant checking enabled. With the addition of PFL, this exposed an old bug when running the sanity-pfl.sh test:

2018-12-25T21:12:38.969914-05:00 ninja84.ccs.ornl.gov kernel: Lustre: DEBUG MARKER: == sanity-pfl test 0: Create full components file, no reused OSTs ==================================== 21:12:38 (1545790358)

2018-12-25T21:12:39.107708-05:00 ninja84.ccs.ornl.gov kernel: LustreError: 16009:0:(cl_io.c:439:cl_io_iter_fini()) ASSERTION( io->ci_state == CIS_UNLOCKED ) failed:

2018-12-25T21:12:39.107772-05:00 ninja84.ccs.ornl.gov kernel: LustreError: 16009:0:(cl_io.c:439:cl_io_iter_fini()) LBUG

2018-12-25T21:12:39.107804-05:00 ninja84.ccs.ornl.gov kernel: Pid: 16009, comm: dd 4.20.0-rc6+ #1 SMP PREEMPT Sat Dec 15 11:22:06 EST 2018

2018-12-25T21:12:39.116789-05:00 ninja84.ccs.ornl.gov kernel: Call Trace:

2018-12-25T21:12:39.120246-05:00 ninja84.ccs.ornl.gov kernel: libcfs_call_trace+0x8b/0xc0 [libcfs]

2018-12-25T21:12:39.125936-05:00 ninja84.ccs.ornl.gov kernel: lbug_with_loc+0x41/0x90 [libcfs]

2018-12-25T21:12:39.131287-05:00 ninja84.ccs.ornl.gov kernel: cl_io_iter_fini+0x10c/0x110 [obdclass]

2018-12-25T21:12:39.137161-05:00 ninja84.ccs.ornl.gov kernel: cl_io_loop+0x46/0x220 [obdclass]

2018-12-25T21:12:39.142525-05:00 ninja84.ccs.ornl.gov kernel: cl_setattr_ost+0x1ed/0x2a0 [lustre]

2018-12-25T21:12:39.148135-05:00 ninja84.ccs.ornl.gov kernel: ll_setattr_raw+0x7b0/0x9a0 [lustre]

2018-12-25T21:12:39.153767-05:00 ninja84.ccs.ornl.gov kernel: notify_change+0x1dc/0x430

2018-12-25T21:12:39.158523-05:00 ninja84.ccs.ornl.gov kernel: do_truncate+0x72/0xc0

2018-12-25T21:12:39.162910-05:00 ninja84.ccs.ornl.gov kernel: do_sys_ftruncate+0xf5/0x160

2018-12-25T21:12:39.167853-05:00 ninja84.ccs.ornl.gov kernel: do_syscall_64+0x68/0x38f

2018-12-25T21:12:39.172474-05:00 ninja84.ccs.ornl.gov kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9

2018-12-25T21:12:39.178480-05:00 ninja84.ccs.ornl.gov kernel: 0xffffffffffffffff

2018-12-25T21:12:39.182552-05:00 ninja84.ccs.ornl.gov kernel: Kernel panic - not syncing: LBUG



 Comments   
Comment by Gerrit Updater [ 26/Dec/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33915
Subject: LU-11828 clio: fix incorrect invariant in cl_io_iter_fini()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 514cc090ebe76baf0d29966df46ab1f6c033de24

Comment by Andreas Dilger [ 22/Jan/19 ]

During review of 33915 there was confusion about the state names used by cl_io_loop() because the state names don't match the method names. Currently, it looks like:

cl_io_iter_init() -> CIS_IT_STARTED
cl_io_start() -> CIS_IO_GOING, but that is unrelated to "CIS_IT_STARTED"
cl_io_end() -> CIS_IO_FINISHED, also unrelated to CIS_IT_STARTED
cl_io_iter_fini() -> CIS_IT_ENDED, unrelated to cl_io_end()

This whole state machine could use some renaming to be a bit more sane. For example, it would be better to use:

cl_io_iter_init() -> CIS_IT_INITIALIZED
cl_io_start() -> cl_io_begin() and use CIS_IO_BEGUN for the state
cl_io_end() -> CIS_IO_ENDED
cl_io_iter_fini() -> CIS_IT_FINALIZED
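
For illustration only (this is not code from patch 33915): the renaming proposed above can be written down as a small lookup table, which makes it easier to see which names would change. The struct, table, and helper names (`sketch_rename`, `sketch_renames`, `sketch_proposed_state`) are hypothetical; only the cl_io_*() method names and CIS_* state names come from the comment.

```c
#include <stddef.h>
#include <string.h>

/* One row per cl_io method discussed in the comment (hypothetical type). */
struct sketch_rename {
	const char *method;          /* current method name */
	const char *state;           /* state currently set by that method */
	const char *proposed_method; /* method name after the proposed rename */
	const char *proposed_state;  /* state name after the proposed rename */
};

/* Rows are listed in the order the comment gives them. */
static const struct sketch_rename sketch_renames[] = {
	{ "cl_io_iter_init", "CIS_IT_STARTED",  "cl_io_iter_init", "CIS_IT_INITIALIZED" },
	{ "cl_io_start",     "CIS_IO_GOING",    "cl_io_begin",     "CIS_IO_BEGUN" },
	{ "cl_io_end",       "CIS_IO_FINISHED", "cl_io_end",       "CIS_IO_ENDED" },
	{ "cl_io_iter_fini", "CIS_IT_ENDED",    "cl_io_iter_fini", "CIS_IT_FINALIZED" },
};

/*
 * Return the proposed state name for a current state name, or NULL
 * if the state is not one of those covered by the proposal.
 */
static const char *sketch_proposed_state(const char *current_state)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_renames) / sizeof(sketch_renames[0]); i++) {
		if (strcmp(sketch_renames[i].state, current_state) == 0)
			return sketch_renames[i].proposed_state;
	}
	return NULL;
}
```

With this table, `sketch_proposed_state("CIS_IO_GOING")` yields "CIS_IO_BEGUN", while a state the proposal does not mention, such as "CIS_UNLOCKED", yields NULL.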

Comment by James A Simmons [ 22/Jan/19 ]

Should we open a new ticket or continue here?

Comment by Andreas Dilger [ 22/Jan/19 ]

Either is probably ok if the second patch is available soon. Otherwise it should be separated.

Comment by Gerrit Updater [ 30/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33915/
Subject: LU-11828 clio: fix incorrect invariant in cl_io_iter_fini()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8160b9bdf16cc8ed887216b0a9a83932b86d5705

Comment by Peter Jones [ 30/Jan/19 ]

Landed for 2.13
