Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • None
    • None
    • Mac OS/X 10.6.6 + Firefox 3.6.17
      Fedora 13 + Firefox 3.5.15
    • 1
    • 3
    • 7203

    Description

      There are a large number of "unknown" characters in the HTML version of the Lustre manual. On my system they appear as black diamonds with a question mark like '�'.

      For example, the copyright (C) character right at the start of the manual and all of the accented characters in the Oracle boilerplate are shown this way, as is the hard-space (I guess) character in every section and subsection title.

          Activity

            [LUDOC-7] character set problems in HTML manual

            rhenwood Richard Henwood (Inactive) added a comment:

            I believe the 'x' makes a difference to the rendering because the webserver that Jira uses is not very clever... I've discussed this in: https://jira.hpdd.intel.com/browse/LUDOC-7?focusedCommentId=17320&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17320
            adilger Andreas Dilger added a comment (edited):

            I just found that there are HTML codes for these:

            copyright = &copy;
            trademark = &trade;
            accented characters: see http://symbolcodes.tlt.psu.edu/web/codehtml.html#accent
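
As a rough illustration of the entity approach, the Python sketch below replaces each non-ASCII character with a named HTML entity where the standard library knows one, and with a numeric character reference otherwise. The helper name `to_html_entities` and the sample string are hypothetical and are not part of the manual's build.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entities (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII is left untouched (including &, < and >).
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity, e.g. U+00A9 becomes &copy;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity known: fall back to a numeric reference.
            out.append(f"&#{cp};")
    return "".join(out)


print(to_html_entities("Copyright \u00a9 Oracle\u2122 caf\u00e9"))
```

Output written this way is pure ASCII, so it should render the same whether the browser assumes ISO-8859-1 or UTF-8.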

            adilger Andreas Dilger added a comment:

            It looks like the .xhtml version of the manual does not have this problem. Is that just because the filename extension is not .html?

            rhenwood Richard Henwood (Inactive) added a comment:

            One additional piece of information I've just noticed:

            lustre_manual.diff.html <- encoding appears correct
            lustre_manual.html <- encoding appears incorrect

            adilger Andreas Dilger added a comment:

            Yes, Joshua could probably understand what is going on here a lot faster than I could.

            jessica Jessica A. Popp (Inactive) added a comment:

            Would this be something for Joshua to help resolve?

            rhenwood Richard Henwood (Inactive) added a comment:

            More detail is now available:

            xsltproc produces HTML output with a directive:

            <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

            When you point your browser at Jenkins for the HTML build, Jenkins tells the browser that the file is encoded as UTF-8.

            Testing with my browser, it seems to take the UTF-8 encoding as correct and shows the '�' characters. I can change the encoding manually in the browser and the document renders fine.

            xsltproc with the docbook.xsl specifies ISO-8859-1 encoding because it wants ISO-8859-1 for its output. For example, the 'copyright � date DDDD' lines are generated from <copyright> and <date> docbook elements.

            A work-around is to manually specify encoding = ISO-8859-1 in your browser for the manual.

            Two alternative solutions may also be possible:

            • Tell Jenkins to serve HTML content with ISO-8859-1
            • MORE SPECULATIVE: Create our own xsl to handle generating the copyright bit at the beginning and convert all '�' chars to HTML entities.
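
The mismatch described in this comment can be reproduced in a few lines. This is a minimal Python sketch, not taken from the manual's build; the sample string is an assumption, chosen to contain a copyright sign and a non-breaking ("hard") space.

```python
# Sample text (assumed): a copyright sign (U+00A9) followed by a
# non-breaking space (U+00A0).
text = "Copyright \u00a9\u00a02011"

# Bytes as written when the output encoding is ISO-8859-1:
# each of these characters becomes a single byte (0xA9, 0xA0).
latin1_bytes = text.encode("iso-8859-1")

# Decoding those bytes as UTF-8, as a browser does when the server
# declares UTF-8: 0xA9 and 0xA0 are not valid UTF-8 on their own, so
# each is replaced by U+FFFD, the black diamond with a question mark.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Manually selecting ISO-8859-1 in the browser corresponds to this decode.
correct = latin1_bytes.decode("iso-8859-1")

print(repr(misread))
print(repr(correct))
```

The same bytes are fine under one decoding and broken under the other, which is consistent with the document rendering correctly once the browser encoding is switched by hand.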

            rhenwood Richard Henwood (Inactive) added a comment:

            It seems this might be an encoding issue. The XML is currently specified as 'UTF-8'. UTF-8 is not a good choice.

            I propose: ISO-8859-1.

            People

              rhenwood Richard Henwood (Inactive)
              adilger Andreas Dilger