[LUDOC-7] character set problems in HTML manual - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Won't Fix
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- QInfrastructure
Environment:
Mac OS/X 10.6.6 + Firefox 3.6.17
Fedora 13 + Firefox 3.5.15

Business Value:
1
Severity:
3
Rank (Obsolete):
7203

Description

There are a large number of "unknown" characters in the HTML version of the Lustre manual. On my system they appear as black diamonds with a question mark like '�'.

For example, the copyright (C) character right at the start of the manual and all of the accented characters in the Oracle boilerplate are shown this way, as is the hard-space character (I guess) in every section and subsection title.

Issue Links

is related to

LUDOC-192 Lustre manual doesn't open in some browsers

Closed

LUDOC-217 Lustre manual should be indexed by google

Closed

Activity

Richard Henwood (Inactive) added a comment - 23/Apr/13 9:06 PM

I believe the reason the 'x' makes a difference to the rendering is because the webserver that Jira uses is not very clever... I've discussed this in: https://jira.hpdd.intel.com/browse/LUDOC-7?focusedCommentId=17320&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17320

Andreas Dilger added a comment - 23/Apr/13 6:51 PM - edited

I just found that there are HTML codes for these:

copyright = &copy;
trademark = &trade;
accented characters - see http://symbolcodes.tlt.psu.edu/web/codehtml.html#accent

Andreas Dilger added a comment - 23/Apr/13 6:46 PM

It looks like the .xhtml version of the manual does not have this problem. Is that just because the filename extension is not .html?

Richard Henwood (Inactive) added a comment - 31/Jan/12 1:34 PM

One additional piece of information I've just noticed:

lustre_manual.diff.html <- encoding appears correct
lustre_manual.html <- encoding appears incorrect

Andreas Dilger added a comment - 20/Dec/11 7:23 PM

Yes, Joshua could probably understand what is going on here a lot faster than I could.

Jessica A. Popp (Inactive) added a comment - 20/Dec/11 6:24 PM

Would this be something for Joshua to help resolve?

Richard Henwood (Inactive) added a comment - 06/Jul/11 3:24 PM

More detail is now available:

xsltproc produces HTML output with a directive:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

When you point your browser at Jenkins for the HTML build, Jenkins tells the browser that the file is encoded as UTF-8.

Testing with my browser, it seems to take the UTF-8 encoding as correct, and shows the '�' characters. I can change the encoding manually in the browser and the document renders fine.
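The mismatch described above can be reproduced outside the browser. The following is a minimal sketch, assuming the copyright sign is stored as the single ISO-8859-1 byte 0xA9, as the meta directive implies; the helper name `decode_as` is hypothetical and is not part of the manual's build.

```python
def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes the way a lenient reader would: undecodable
    bytes become U+FFFD, the replacement character, instead of raising."""
    return data.decode(charset, errors="replace")


# Assumption: the copyright sign as written in ISO-8859-1 is one byte, 0xA9.
latin1_bytes = "\u00a9".encode("iso-8859-1")

# Decoded as UTF-8 (what the server claims): 0xA9 alone is not valid UTF-8,
# so it is replaced with U+FFFD.
as_utf8 = decode_as(latin1_bytes, "utf-8")

# Decoded as ISO-8859-1 (what the meta tag claims): the copyright sign comes back.
as_latin1 = decode_as(latin1_bytes, "iso-8859-1")

print(repr(as_utf8), repr(as_latin1))
```

This matches the observation that switching the browser encoding by hand makes the document render correctly: the bytes are fine, only the declared charset differs.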

xsltproc with docbook.xsl specifies ISO-8859-1 encoding because it wants ISO-8859-1 for its output. For example, the 'copyright � date DDDD' lines are generated from <copyright> and <date> docbook elements.

A work-around is to manually specify encoding = ISO-8859-1 in your browser for the manual.

Two alternative solutions may also be possible:

Tell Jenkins to serve HTML content with ISO-8859-1
MORE SPECULATIVE: Create our own xsl to handle generating of the copyright bit at the beginning and convert all '�' chars to html entities.
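The second, more speculative option amounts to replacing every non-ASCII character with an HTML entity, so the page renders the same whichever charset the server announces. The sketch below illustrates that idea as a post-processing step in Python rather than in XSL; this is a swapped-in approach for illustration, and the function name `to_entities` is hypothetical. It uses the standard library's `html.entities.codepoint2name` table and falls back to numeric character references.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity.

    ASCII characters, including existing markup, are left untouched.
    Named entities are used where HTML 4 defines one; anything else
    becomes a decimal numeric character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_entities("Copyright \u00a9 2011"))
```

Because the result is pure ASCII, it decodes identically as UTF-8 or ISO-8859-1, which sidesteps the disagreement between the server header and the meta tag.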

Richard Henwood (Inactive) added a comment - 06/Jul/11 2:15 PM

It seems this might be an encoding issue. The xml is currently specified as 'UTF-8'. UTF-8 is not a good choice.

I propose: ISO-8859-1.

People

Assignee: Richard Henwood (Inactive)

Reporter: Andreas Dilger

Votes: 0

Watchers: 2

Dates

Created: 06/Jul/11 1:29 PM

Updated: 13/Feb/14 2:40 PM

Resolved: 08/Oct/13 6:26 PM