[LU-6222] LustreError (statahead.c:262:sa_kill()) ASSERTION( !list_empty(&entry->se_list) ) Created: 06/Feb/15 Updated: 08/Jun/15 Resolved: 13/Feb/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andrew Zenk | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: |
2.6.93 on the clients and 2.6.92 on the servers; CentOS with RPMs from the lustre-master Jenkins tree. |
|
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 17399 |
| Description |
|
We seem to be having an issue similar to the one described by the following console message: kernel: LustreError: 13007:0:(statahead.c:262:sa_kill()) ASSERTION( !list_empty(&entry->se_list) ) failed: |
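|
For context, here is a minimal user-space sketch of the invariant this ASSERTION checks, under the usual kernel list semantics. Only sa_kill(), the se_list field, and the list_empty() test come from the console message; every other name below is a simplified assumption for illustration, not the actual statahead.c source. |
{code}
/*
 * Minimal model of the invariant behind the failed LASSERT. The list
 * helpers and the sa_entry layout are simplified assumptions, not the
 * real statahead.c source.
 */
#include <assert.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* list_del_init() leaves the node self-linked, so list_empty() is true. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* Hypothetical stand-in for a statahead entry. */
struct sa_entry { struct list_head se_list; };

/*
 * sa_kill() expects the entry to still be linked on a statahead list;
 * the assertion fires when some other path has already unlinked it,
 * e.g. two teardown/revalidate paths racing over the same entry.
 */
static void sa_kill(struct sa_entry *entry)
{
	assert(!list_empty(&entry->se_list));	/* the failing check */
	list_del_init(&entry->se_list);
}

int main(void)
{
	struct list_head sai_list;
	struct sa_entry entry;

	INIT_LIST_HEAD(&sai_list);
	INIT_LIST_HEAD(&entry.se_list);
	list_add(&entry.se_list, &sai_list);

	sa_kill(&entry);	/* fine: the entry is on the list */
	/* sa_kill(&entry); */	/* a second, racing kill would assert */
	printf("invariant held\n");
	return 0;
}
{code}
Because list_del_init() leaves a removed node self-linked, a second, racing removal of the same entry shows up as list_empty() being true, which is exactly the state the ASSERTION complains about. |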
| Comments |
| Comment by Jodi Levi (Inactive) [ 06/Feb/15 ] |
|
Lai, |
| Comment by Oleg Drokin [ 06/Feb/15 ] |
|
If this is something that can easily be replicated, can you please detail your reproduction steps? |
| Comment by Andrew Zenk [ 06/Feb/15 ] |
|
We do not use DNE. |
|
In this case a user is running an rsync as follows: |
|
rsync --size-only --progress -av --prune-empty-dirs --include=/ --exclude=.man --exclude=_eot.txt --exclude=.MAN --exclude=_EOT.TXT --include=052903541090_01* --exclude=* /lustre_mountpoint/staging/orig/_uploads/DG_A11281 /lustre_mountpoint/somepath/northslope/ |
|
This command is run several times serially from a single client, with slight variations of the include pattern and source directory on each run. At some seemingly random point during the sequence of rsync jobs, the kernel on the client node panics. |
|
Configuration: I'm happy to supply exact specs on RAID configurations and disk counts if you feel that's important, but we'll skip that for now. The MDS uses a single target on an SSD RAID 10; the OSTs are SATA of various types. All servers are connected to our QDR IB fabric as well as a gigabit VLAN; the latter is used for connecting 3 clients that aren't experiencing any issues. There are approximately 20 clients, also running CentOS 6.6. The Lustre clients are installed from the pre-built lustre-master RPMs, just like the servers, though the clients are of slightly mixed build versions. The two clients we've reproduced the issue on were both running build #2835. The entire filesystem has a stripe count of 1. |
|
Let me know if you need any additional information. Thanks! |
| Comment by Lai Siyao [ 09/Feb/15 ] |
|
Could you list all process backtraces in the dump? |
| Comment by Andrew Zenk [ 09/Feb/15 ] |
|
Attached the output from 'foreach bt -l'. |
| Comment by Gerrit Updater [ 10/Feb/15 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/13708 |
| Comment by Lai Siyao [ 10/Feb/15 ] |
|
Andrew, I just uploaded a fix for this issue; could you apply it and test again? |
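|
The ticket does not quote the patch body, so purely as a hedged illustration (not the actual change in http://review.whamcloud.com/13708): one common hardening for this class of failure is to tolerate an entry that a racing path has already unlinked, reusing the list helpers from the sketch in the description. |
{code}
/*
 * Illustrative only: a generic "already unlinked" guard, not the real
 * fix from review 13708. It reuses list_empty(), list_del_init() and
 * struct sa_entry from the earlier sketch, and assumes the caller
 * holds whatever lock serializes se_list updates.
 */
static void sa_kill_tolerant(struct sa_entry *entry)
{
	/*
	 * list_del_init() leaves a removed node self-linked, so a
	 * racing second unlink is detectable and can be skipped
	 * instead of tripping an assertion.
	 */
	if (list_empty(&entry->se_list))
		return;	/* another path already removed this entry */
	list_del_init(&entry->se_list);
}
{code}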
| Comment by Andrew Zenk [ 10/Feb/15 ] |
|
Thanks! We're testing it now. |
| Comment by Andrew Zenk [ 11/Feb/15 ] |
|
That seems to have fixed it. The rsync script that consistently triggered the issue within a minute or two has now been running flawlessly for many hours. Thanks again. |
| Comment by Gerrit Updater [ 13/Feb/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13708/ |
| Comment by Peter Jones [ 13/Feb/15 ] |
|
Landed for 2.7 |
| Comment by Gerrit Updater [ 08/Jun/15 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/15178 |