[LU-6173] CPU stalled with obd_zombid running Created: 28/Jan/15 Updated: 14/Jun/18 Resolved: 25/May/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.4.3, Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jay Lan (Inactive) | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Git repo can be found at https://github.com/jlan/lustre-nas |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 17274 | ||||
| Description |
|
Yesterday we experienced a network problem. Consequently, a number of clients stalled; at least four hung in this situation. We captured a vmcore on one of the systems. Console logs showed that one of the CPUs was detected as stalled: All CPUs on r305i7n2 except CPU 9 were running the migration process, and the stack trace is: PID: 5070 TASK: ffff88046f086300 CPU: 9 COMMAND: "obd_zombid" |
| Comments |
| Comment by Peter Jones [ 29/Jan/15 ] |
|
Emoly, could you please advise? Peter |
| Comment by Emoly Liu [ 03/Feb/15 ] |
|
Jay, could you please upload the full vmcore file for further investigation? Thanks. |
| Comment by Emoly Liu [ 03/Feb/15 ] |
|
BTW, do you have the dmesg log? I want to know what happened during the network problem. |
| Comment by Jay Lan (Inactive) [ 03/Feb/15 ] |
|
File r305i7n2-20150128.bz2 contains the console log of the system from when it got into trouble. Since stack traces of all CPUs were printed every 10 minutes, the network error messages were flushed out of the dmesg buffer, so I attached the console log here instead. The user ID in the log has been replaced with 'xxx'. The vmcore can only be seen by US citizens. I can send crash analysis information to you if that is OK. Otherwise, I will consult with others on how to send you an encrypted vmcore. Please advise. |
| Comment by Emoly Liu [ 05/Feb/15 ] |
|
Jay, I am not a US citizen. You can send the crash analysis information to me first. If necessary, I will ask a colleague who is a US citizen to help. |
| Comment by Jay Lan (Inactive) [ 05/Feb/15 ] |
|
This tarball contains the output of 'bt -a', 'ps -a', 'kmem -i' and 'kmem -s'. Let me know if you want the output of other crash commands. |
| Comment by Oleg Drokin [ 05/Feb/15 ] |
|
Hm, this really reminds me of |
| Comment by Jay Lan (Inactive) [ 05/Feb/15 ] |
|
Slightly different. |
| Comment by Oleg Drokin [ 06/Feb/15 ] |
|
Well, you see - I hit kernel panic in And that's why I think it's something very similar. |
| Comment by Emoly Liu [ 06/Feb/15 ] |
|
I ever suspected |
| Comment by Oleg Drokin [ 06/Feb/15 ] |
|
Yes, it does need more investigation, no question about that. |
| Comment by Jay Lan (Inactive) [ 06/Feb/15 ] |
|
Hi Oleg, The lustre client debuginfo rpm has been uploaded to ftp.whamcloud.com. I attached ". |
| Comment by Oleg Drokin [ 10/Feb/15 ] |
|
So, poking around in the crashdump, it looks like it is indeed something very similar to So, examining the disconnect code, it looks like client_common_put_super assumes the mere call to obd_disconnect(sbi->ll_dt_exp); just marks the import disconnected, but if there are any requests in flight (highly likely if you have a broken connection and requests take seconds to time out), then the actual final import put would not happen until this last request is finished (every request holds an import reference), and only then would the final class_import_put() happen, which would call obd_zombie_import_add(), increasing the zombie task list count, and would stall obd_zombie_barrier(). So the "fix" for Actually I guess that would lead to unmount hanging until all requests finish processing, which might not be ideal either in the face of a broken connection, so potentially sbi freeing could be made asynchronous too. |
| Comment by Niu Yawei (Inactive) [ 10/Feb/15 ] |
Will the inflight RPC hold the OSC export refcount as well? I was thinking that obd_disconnect() in client_common_put_super() shall put the last refcount of OSC export and make the umount wait in obd_zombie_barrier(). |
| Comment by Oleg Drokin [ 10/Feb/15 ] |
|
Niu: It's right in the __ptlrpc_request_alloc(): request->rq_import = class_import_get(imp); and the import stays put until all requests are drained, which might take a while if the requests are stuck on the network. |
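The reference-counting behaviour discussed in the comments above can be illustrated with a small toy model in C. This is only a sketch under stated assumptions, not Lustre code: every `toy_*` name and the `zombie_count` variable are hypothetical stand-ins for the import, the request, `class_import_get()`/`class_import_put()` and the zombie list. It also assumes one interpretation of the discussion, namely that the import is queued for the zombie thread only on the final put, so a barrier called right after disconnect sees nothing to wait for while a request is still in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these types exist in Lustre. */
struct toy_import {
	int  refcount;       /* references currently held on the import */
	bool disconnected;   /* set by the disconnect call */
	bool on_zombie_list; /* queued for the zombie thread */
};

struct toy_request {
	struct toy_import *rq_import; /* each request pins its import */
};

static int zombie_count; /* imports queued for the zombie thread */

static struct toy_import *toy_import_get(struct toy_import *imp)
{
	imp->refcount++;
	return imp;
}

/* Only the final put queues the import for the zombie thread. */
static void toy_import_put(struct toy_import *imp)
{
	assert(imp->refcount > 0);
	if (--imp->refcount == 0) {
		imp->on_zombie_list = true;
		zombie_count++;
	}
}

/* Mirrors the idea of request->rq_import = class_import_get(imp). */
static void toy_request_alloc(struct toy_request *req, struct toy_import *imp)
{
	req->rq_import = toy_import_get(imp);
}

static void toy_request_finish(struct toy_request *req)
{
	toy_import_put(req->rq_import);
	req->rq_import = NULL;
}

/* Marks the import disconnected and drops the connection's own reference. */
static void toy_disconnect(struct toy_import *imp)
{
	imp->disconnected = true;
	toy_import_put(imp);
}

/*
 * Replays the unmount sequence and returns the number of queued imports
 * that a barrier called right after disconnect would observe.
 */
int toy_zombies_seen_at_barrier(bool request_in_flight)
{
	struct toy_import imp = { .refcount = 1 };
	struct toy_request req = { NULL };
	int seen;

	zombie_count = 0;
	if (request_in_flight)
		toy_request_alloc(&req, &imp);
	toy_disconnect(&imp);
	seen = zombie_count;              /* what the barrier could wait for */
	if (request_in_flight)
		toy_request_finish(&req); /* the late final put */
	return seen;
}
```

With no request in flight, the disconnect performs the final put and the barrier has one queued import to wait for. With a request in flight, the barrier sees none, and the import is queued only later when the request finishes, which is the window the comments describe.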
| Comment by Gerrit Updater [ 11/Feb/15 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/13727 |
| Comment by Emoly Liu [ 11/Feb/15 ] |
|
Thanks to Niu and Oleg for the help! I pushed a patch for b2_4 for review. |
| Comment by Peter Jones [ 11/Feb/15 ] |
|
Emoly, is this patch also required for master/b2_5? Peter |
| Comment by Emoly Liu [ 12/Feb/15 ] |
|
Peter, yes, both master and b2_5 need the patch. I will create one for master later. |
| Comment by Gerrit Updater [ 12/Feb/15 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: http://review.whamcloud.com/13746 |
| Comment by Gerrit Updater [ 03/Mar/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13746/ |
| Comment by Peter Jones [ 25/May/15 ] |
|
Landed for 2.8 |
| Comment by Jay Lan (Inactive) [ 26/May/15 ] |
|
Could you provide a 2.5 backport? Thanks! |
| Comment by Peter Jones [ 27/May/15 ] |
|
Yes, this is being worked on.