[LU-5126] Kernel crashed after debug_daemon started Created: 30/May/14  Updated: 20/Jan/17  Resolved: 16/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.4.2
Fix Version/s: Lustre 2.6.0, Lustre 2.5.3

Type: Bug Priority: Minor
Reporter: Li Xi (Inactive) Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: duu, patch
Environment:

Lustre-2.4.2


Severity: 3
Rank (Obsolete): 14143

 Description   

The following crash happened after debug_daemon was started.

2014-05-30 16:30:19 LNetError: 2775:0/root/rpmbuild/BUILD/lustre-2.4.2/libcfs/libcfs/tracefile.c:1035:tracefiled()) ASSERTION(cfs_page_count(tage->page) > 0) failed
2014-05-30 16:30:19 Kernel panic - not syncing: Lustre debug assertion failure
2014-05-30 16:30:19
2014-05-30 16:30:19 Pid: 2775, comm: ktracefiled Not tainted 2.6.32-358.18.1.el6_lustre.x86_64 #1
2014-05-30 16:30:19 Call Trace:
2014-05-30 16:30:19 [<ffffffff8150de58>] ? panic+0xa7/0x16f
2014-05-30 16:30:19 [<ffffffff810a15fa>] ? do_gettimeofday+0x1a/0x50
2014-05-30 16:30:19 [<ffffffffa0410c14>] ? cfs_trace_assertion_failed+0x74/0x80 [libcfs]
2014-05-30 16:30:19 [<ffffffffa0412b60>] ? tracefiled+0x400/0x530 [libcfs]
2014-05-30 16:30:19 [<ffffffffa0412760>] ? tracefiled+0x0/0x530 [libcfs]
2014-05-30 16:30:19 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
2014-05-30 16:30:19 [<ffffffffa0412760>] ? tracefiled+0x0/0x530 [libcfs]
2014-05-30 16:30:19 [<ffffffffa0412760>] ? tracefiled+0x0/0x530 [libcfs]
2014-05-30 16:30:19 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2014-05-30 16:34:33 Initializing cgroup subsys cpuset
2014-05-30 16:34:33 Initializing cgroup subsys cpu

The same crash was reported a long time ago: https://jira.hpdd.intel.com/browse/LU-1311?focusedCommentId=35871&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35871

There was an important log line in that report:
Apr 30 08:21:41 barry-oss4 kernel: [247839.213070] wanted to write 1008 but wrote -5
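
The "-5" in that log line is most likely a negated errno value, since kernel write paths conventionally return -errno on failure; on Linux, errno 5 is EIO (I/O error). A minimal sketch for decoding such return codes follows. The helper name describe_kernel_rc is hypothetical and is not part of Lustre or libcfs.

```python
import errno
import os


def describe_kernel_rc(rc: int) -> str:
    """Describe a kernel-style return code (hypothetical helper).

    Assumption: non-negative values are byte counts and negative
    values are negated errno numbers, as is conventional for
    kernel write paths.
    """
    if rc >= 0:
        return f"{rc} bytes written"
    code = -rc
    # errno.errorcode maps numeric errno values to symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"-{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The value reported in the LU-1311 log line quoted above.
    print(describe_kernel_rc(-5))
```

Read this way, the message suggests the trace file write failed with an I/O error rather than completing as a short write.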



 Comments   
Comment by Li Xi (Inactive) [ 30/May/14 ]

Here is a patch which tries to fix this problem.
http://review.whamcloud.com/10524

Comment by Peter Jones [ 31/May/14 ]

Thanks Li Xi!

Emoly

Could you please help with this patch?

Thanks

Peter

Comment by Emoly Liu [ 16/Jun/14 ]

The patch landed to 2.6.

Comment by Jian Yu [ 14/Aug/14 ]

Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/11454

Comment by Gerrit Updater [ 20/Jan/17 ]

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/25008
Subject: LU-5126 hsm: cancel HSM actions when CT unregisters
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0d7632ccfd776087258ae79482d24ba0e5c232fb

Comment by Quentin Bouget [ 20/Jan/17 ]

My bad, the patch above was meant for LU-5216 and not LU-5126, sorry.

Generated at Sat Feb 10 01:48:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.