[LU-3577] BUG: soft lockup - CPU#25 stuck for 67s! [jbd2/dm-8-8:8966]; Kernel panic - not syncing: softlockup: hung tasks Created: 11/Jul/13 Updated: 21/Mar/18 Resolved: 21/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Roger Spellman (Inactive) | Assignee: | Peter Jones |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Kernel: 2.6.32-279.19.1.el6_lustre.2.1.5_1.0.3 |
||
| Attachments: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9055 |
| Description |
|
We have a Lustre 2.1.5 system with two MDSes (active/standby) and two OSSes (active/active). Each OSS has 6 OSTs.

We filled the file system to 100%. To remove the files, one Lustre client ran the following script:

rm -rf /mnt/hss45/ost/ost-00/* &

One OSS crashed with this error: The OSS was STONITH'ed. Shortly thereafter, the second OSS got the same error:

BUG: soft lockup - CPU#17 stuck for 67s! [jbd2/dm-6-8:21440]

I have attached the full console output. There was nothing in /var/log/messages. |
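For anyone triaging the attached console output, the soft lockup messages can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the function name `find_soft_lockups` is hypothetical, and the pattern is based only on the two messages quoted in this ticket, so other kernel versions may format the line differently.

```python
import re

# Pattern modeled on the message quoted in this ticket, e.g.
# "BUG: soft lockup - CPU#17 stuck for 67s! [jbd2/dm-6-8:21440]"
# Assumption: the bracketed part is always "<task name>:<pid>".
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(text):
    """Return one dict per soft lockup message found in console text."""
    return [
        {
            "cpu": int(m.group("cpu")),
            "secs": int(m.group("secs")),
            "task": m.group("task"),
            "pid": int(m.group("pid")),
        }
        for m in SOFT_LOCKUP.finditer(text)
    ]


if __name__ == "__main__":
    sample = "BUG: soft lockup - CPU#17 stuck for 67s! [jbd2/dm-6-8:21440]"
    for hit in find_soft_lockups(sample):
        print(hit)
```

Grouping the results by task name (here a jbd2 journal thread on a device-mapper volume) makes it easier to see whether both OSSes stalled in the same kind of thread.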
| Comments |
| Comment by Peter Jones [ 11/Jul/13 ] |
|
Thanks for the report, Roger. Given that you are running RHEL 6.4, is there any reason you chose 2.1.5 over 2.1.6? Other than rebuilding for RHEL 6.4, are there any other changes you made from a standard 2.1.5? |
| Comment by Roger Spellman (Inactive) [ 11/Jul/13 ] |
|
Peter,

We are using 2.1.5 because we started this project a couple of months ago (before 2.1.6 was released), and we have promised a 2.1.x release to a customer pretty soon. We are pretty far into our QA cycle, so switching Lustre versions right now would set us back a bit. We will go to 2.1.6 very soon. But if you say that this is a known bug in 2.1.5 that is fixed in 2.1.6, that will push us to 2.1.6 even sooner.

We make changes to configure scripts and Makefiles so that we can build on our build machine. We also make some minor functional changes to the code (we made them some time ago, in earlier releases). Here are the patches that are functional changes:

diff -rcN -x '~' -x '.orig' /build/lustre/lustre-2.1.5/lustre/ldlm/ldlm_pool.c 2.1.5/trunk/lustre-working_lustre.patch/lustre/ldlm/ldlm_pool.c
/*
/*
--- 170,179 ---- ***************
+ /* Terascala */
LASSERT ((lsm != NULL) == ((body->valid & OBD_MD_FLEASIZE) != 0)); ***************
Hope this helps. |