[LUDOC-7] character set problems in HTML manual Created: 06/Jul/11 Updated: 13/Feb/14 Resolved: 08/Oct/13 |
|
| Status: | Closed |
| Project: | Lustre Documentation |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Richard Henwood (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | QInfrastructure | ||
| Environment: |
Mac OS/X 10.6.6 + Firefox 3.6.17 |
||
| Issue Links: |
|
| Business Value: | 1 |
| Severity: | 3 |
| Rank (Obsolete): | 7203 |
| Description |
|
There are a large number of "unknown" characters in the HTML version of the Lustre manual. On my system they appear as black diamonds with a question mark, like '�'. For example, the copyright (C) character right at the start of the manual and all of the accented characters in the Oracle boilerplate are shown this way, as is the hard-space (I guess) character in every section and subsection title. |
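The symptom can be reproduced outside the browser. The following is a minimal Python sketch, assuming (as a later comment establishes) that the file is written as ISO-8859-1 but read as UTF-8. The sample characters are illustrative picks matching the description: the copyright sign, an accented letter, and the no-break space U+00A0 that the "hard-space" guess most likely refers to. `misread_as_utf8` is a hypothetical helper name.

```python
# Characters reported as black-diamond replacement glyphs (examples only).
SAMPLES = {
    "copyright sign": "\u00a9",
    "e with acute accent": "\u00e9",
    "no-break space": "\u00a0",
}


def misread_as_utf8(text: str) -> str:
    """Encode text as ISO-8859-1 (what the file contains), then decode it as
    UTF-8 (what the browser was told), substituting U+FFFD for bad bytes."""
    raw = text.encode("iso-8859-1")
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        # A lone ISO-8859-1 byte >= 0x80 is never a complete UTF-8 sequence.
        print(f"{name}: {ch.encode('iso-8859-1')!r} -> {misread_as_utf8(ch)!r}")
```

Each of these characters occupies one byte in ISO-8859-1, and a single byte of 0x80 or above cannot stand alone in UTF-8, so every one of them turns into U+FFFD.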
| Comments |
| Comment by Richard Henwood (Inactive) [ 06/Jul/11 ] |
|
It seems this might be an encoding issue. The XML is currently declared as 'UTF-8'. UTF-8 is not a good choice; I propose ISO-8859-1. |
| Comment by Richard Henwood (Inactive) [ 06/Jul/11 ] |
|
More detail is now available: xsltproc produces HTML output with the directive <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">. When you point your browser at Jenkins for the HTML build, Jenkins tells the browser that the file is encoded as UTF-8. In my testing, the browser takes the UTF-8 encoding as correct and shows the '�' characters. If I change the encoding manually in the browser, the document renders fine. xsltproc with docbook.xsl specifies ISO-8859-1 encoding because the stylesheet asks for ISO-8859-1 output. For example, the 'copyright © date DDDD' lines are generated from the <copyright> and <date> DocBook elements. A work-around is to manually set the encoding to ISO-8859-1 in your browser when reading the manual. Two alternative solutions may also be possible:
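For illustration only, and not necessarily either of the two solutions referred to above: one way to make the file agree with the UTF-8 header that Jenkins sends is to transcode the xsltproc output to UTF-8 and rewrite its `<meta>` declaration to match. This is a minimal sketch assuming the `<meta>` tag has exactly the form quoted in the comment; `transcode_to_utf8` is a hypothetical name, not part of any existing build script.

```python
import re

# Matches the exact <meta> form quoted above; other spellings are not handled.
META_ISO = re.compile(
    rb'(<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=)'
    rb'ISO-8859-1(")',
    re.IGNORECASE,
)


def transcode_to_utf8(html_bytes: bytes) -> bytes:
    """Re-encode ISO-8859-1 HTML as UTF-8 and update the <meta> charset."""
    # Decoding as ISO-8859-1 cannot fail: every byte maps to one code point.
    text = html_bytes.decode("iso-8859-1")
    utf8 = text.encode("utf-8")
    # The <meta> tag is pure ASCII, so it is byte-identical in both encodings.
    return META_ISO.sub(rb"\1UTF-8\2", utf8, count=1)
```

Such a step would run on the generated HTML file before it is published, for example by reading the file in binary mode, passing the bytes through `transcode_to_utf8`, and writing the result back.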
|
| Comment by Jessica A. Popp (Inactive) [ 20/Dec/11 ] |
|
Would this be something for Joshua to help resolve? |
| Comment by Andreas Dilger [ 20/Dec/11 ] |
|
Yes, Joshua could probably understand what is going on here a lot faster than I could. |
| Comment by Richard Henwood (Inactive) [ 31/Jan/12 ] |
|
One additional piece of information I've just noticed: the encoding appears correct in lustre_manual.diff.html. |
| Comment by Andreas Dilger [ 23/Apr/13 ] |
|
It looks like the .xhtml version of the manual does not have this problem. Is that just because the filename extension is not .html? |
| Comment by Andreas Dilger [ 23/Apr/13 ] |
|
I just found that there are HTML codes for these:
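Such codes can be applied automatically. The following is a minimal sketch, assuming numeric character references such as `&#169;` are acceptable. Named entities such as `&copy;` and `&nbsp;` would work equally well for these characters. The output is plain ASCII, so it reads the same whether the browser is told ISO-8859-1 or UTF-8. `to_ascii_char_refs` is a hypothetical helper.

```python
def to_ascii_char_refs(html_text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference
    (e.g. U+00A9 -> '&#169;'), leaving only ASCII characters, which decode
    identically under ISO-8859-1 and UTF-8."""
    return html_text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

One caveat: browsers do not expand character references inside `<script>` or `<style>` content, so the conversion is only safe for ordinary markup and text.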
|
| Comment by Richard Henwood (Inactive) [ 23/Apr/13 ] |
|
I believe the 'x' makes a difference to the rendering because the webserver that Jira uses is not very clever. I've discussed this in: https://jira.hpdd.intel.com/browse/LUDOC-7?focusedCommentId=17320&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17320 |
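Under the HTML standard's encoding-sniffing rules, a charset given in the HTTP `Content-Type` header takes precedence over a `<meta>` declaration, which matches the behaviour described here. The following hypothetical helper flags the disagreement for a given page. It is a sketch: it does not query Jenkins or the webserver, and the caller must supply the header value (for example, copied from the browser's developer tools) and the file contents.

```python
import codecs
import re
from email.message import Message
from typing import Optional, Tuple

# Finds the charset in either <meta charset="..."> or the http-equiv form.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def _canonical(name: str) -> str:
    """Normalize aliases such as 'latin-1' and 'ISO-8859-1' to one name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def charset_mismatch(content_type: str, body: bytes) -> Optional[Tuple[str, str]]:
    """Return (served, declared) if the HTTP header and the <meta> tag disagree,
    or None if they agree or either one carries no charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    served = msg.get_content_charset()  # lower-cased, or None if absent
    match = META_CHARSET.search(body)
    declared = match.group(1).decode("ascii") if match else None
    if served is None or declared is None:
        return None
    if _canonical(served) == _canonical(declared):
        return None
    return served, declared
```

Parsing the header with `email.message.Message` avoids hand-splitting the `charset=` parameter. `codecs.lookup` lets aliases of the same encoding compare equal.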
| Comment by Richard Henwood (Inactive) [ 02/May/13 ] |
|
The config of the webserver is both:
|
| Comment by Andreas Dilger [ 24/Sep/13 ] |
|
http://review.whamcloud.com/7739
|
| Comment by Andreas Dilger [ 24/Sep/13 ] |
|
Sadly, this doesn't work:
|
| Comment by Richard Henwood (Inactive) [ 08/Oct/13 ] |
|
We won't be fixing this. The xhtml version renders fine, and that is the version most commonly linked to. |