Bug: Question Marks in Posts

  • 8 Replies
  • 2482 Views
*

Parsifal

  • Official Member
  • 36118
  • Bendy Light specialist
Bug: Question Marks in Posts
« on: May 24, 2010, 04:16:00 PM »
While this isn't so much a suggestion or a concern for the forum administration, it is both a suggestion and a concern for posters, and so I believe it belongs in this forum.

You may have noticed that, on occasion, when you paste text on FES, quotation marks, apostrophes, dashes and a few other characters are mysteriously transformed into question marks. Having just discovered the cause of this phenomenon, which in fact extends beyond FES and has nothing to do with the SMF forum software, I would like to share it with you.

This transformation is caused by Microsoft's use of a character set which is incompatible with all existing standards. When text is copied from Microsoft Word onto the web, therefore, anybody reading it on a non-Microsoft platform will see these characters as question marks, or they will be omitted altogether. It seems SMF either performs its own conversion to a different character set, or otherwise transforms the data in such a way that these incompatible characters are stripped even for users of Microsoft platforms.

Fortunately, there are two easy solutions to this. One, and the one I suggest to you, is to stop using Microsoft products. The other is to use the demoroniser, which is a program designed to "correct moronic and gratuitously incompatible HTML generated by Microsoft applications". Either one will repair corrupted text generated by Microsoft Word, and cause your posts to be readable.
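To illustrate the kind of clean-up such a tool performs, here is a minimal Python sketch. It is not the demoroniser's actual code or its replacement table; the mapping `SMART_TO_ASCII` and the function `plain_punctuation` are hypothetical names chosen for this example, and the sketch assumes the text has already been decoded to Unicode correctly.

```python
# Hypothetical mapping for illustration only; this is NOT the demoroniser's
# real table, just a few common "smart" punctuation marks.
SMART_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (also used as apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_TO_ASCII)


def plain_punctuation(text: str) -> str:
    """Replace common smart punctuation with plain ASCII equivalents."""
    return text.translate(_TABLE)


print(plain_punctuation("\u201cIt\u2019s fine\u201d \u2014 really\u2026"))
# prints: "It's fine" -- really...
```

Running pasted text through something like this before posting would sidestep the problem, at the cost of losing the typographic punctuation.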

A more detailed explanation from the linked page:

A little detective work revealed that, as is usually the case when you encounter something shoddy in the vicinity of a computer, Microsoft incompetence and gratuitous incompatibility were to blame. Western language HTML documents are written in the ISO 8859-1 Latin-1 character set, with a specified set of escapes for special characters. Blithely ignoring this prescription, as usual, Microsoft use their own "extension" to Latin-1, in which a variety of characters which do not appear in Latin-1 are inserted in the range 0x82 through 0x95--this having the merit of being incompatible with both Latin-1 and Unicode, which reserve this region for additional control characters.

These characters include open and close single and double quotes, em and en dashes, an ellipsis and a variety of other things you've been dying for, such as a capital Y umlaut and a florin symbol. Well, okay, you say, if Microsoft want to have their own little incompatible character set, why not? Because it doesn't stop there--in their inimitable fashion (who would want to?)--they aggressively pollute the Web pages of unknowing and innocent victims worldwide with these characters, with the result that the owners of these pages look like semi-literate morons when their pages are viewed on non-Microsoft platforms (or on Microsoft platforms, for that matter, if the user has selected as the browser's font one of the many TrueType fonts which do not include the incompatible Microsoft characters).
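The mismatch described above can be reproduced with Python's standard codecs. This is only a sketch under the assumption that the Microsoft character set in question is the one Python calls `cp1252` (Windows-1252); the byte values are picked for illustration and are not taken from any actual post.

```python
# Bytes as a Windows-1252 application might emit them:
# 0x93 and 0x94 are curly double quotes, 0x97 is an em dash.
raw = bytes([0x93, 0x48, 0x69, 0x94, 0x97])

# Decoded with the character set the text was written in, the punctuation survives.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252)

# Decoded as Latin-1, the same bytes land on C1 control characters
# (U+0093, U+0094, U+0097), which have no printable glyph.
as_latin1 = raw.decode("latin-1")
print([hex(ord(c)) for c in as_latin1])

# Forcing the correctly decoded text into Latin-1 loses the punctuation:
# characters with no Latin-1 equivalent are replaced by question marks.
lossy = as_cp1252.encode("latin-1", errors="replace")
print(lossy)  # prints: b'?Hi??'
```

The last step is one plausible way for the question marks to appear: a lossy conversion somewhere between the paste and the stored post.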
« Last Edit: June 26, 2012, 12:07:40 PM by John Davis »
I'm going to side with the white supremacists.

*

Crustinator

  • 7813
  • Bamhammer horror!
Re: An explanation for the question marks that litter some posts
« Reply #1 on: May 24, 2010, 04:19:09 PM »
My suggestions are more low tech:

1) Read what you post before you post it.
2) Don't post copypasta

*

James

  • Flat Earther
  • The Elder Ones
  • 5613
Re: An explanation for the question marks that litter some posts
« Reply #2 on: May 24, 2010, 05:10:10 PM »
This does not seem to solve the issue of foreign-language characters such as o-umlaut, c-cedilla, etc., which are genuinely unsupported by this board's software. This is an issue that I have raised repeatedly but that has not been addressed by anyone with the relevant know-how or permissions (if, indeed, it is possible at all).
"For your own sake, as well as for that of our beloved country, be bold and firm against error and evil of every kind." - David Wardlaw Scott, Terra Firma 1901

*

Parsifal

  • Official Member
  • 36118
  • Bendy Light specialist
Re: An explanation for the question marks that litter some posts
« Reply #3 on: May 24, 2010, 06:09:32 PM »
Quote from: James
This does not seem to solve the issue of foreign-language characters such as o-umlaut, c-cedilla, etc., which are genuinely unsupported by this board's software. This is an issue that I have raised repeatedly but that has not been addressed by anyone with the relevant know-how or permissions (if, indeed, it is possible at all).

That's actually a more complex issue. There appear to be two write mechanisms into the database: one for normal posting and editing, and the other for the Quick Edit feature (which I do NOT endorse, as using Quick Edit causes part of the non-free SMF software to run on your own personal computer in the form of a JavaScript program). The one for normal posting doesn't support such foreign-language characters, while the Quick Edit feature handles them without a problem. I have no idea why this is; I haven't looked at the SMF source.

*

markjo

  • Content Nazi
  • The Elder Ones
  • 42682
Re: An explanation for the question marks that litter some posts
« Reply #4 on: May 24, 2010, 07:56:09 PM »
Or, you could just turn off smart quotes in your word processor.
Science is what happens when preconception meets verification.
Quote from: Robosteve
Besides, perhaps FET is a conspiracy too.
Quote from: bullhorn
It is just the way it is, you understanding it doesn't concern me.

*

Parsifal

  • Official Member
  • 36118
  • Bendy Light specialist
Re: An explanation for the question marks that litter some posts
« Reply #5 on: May 24, 2010, 09:42:07 PM »
Quote from: markjo
Or, you could just turn off smart quotes in your word processor.

I don't use Microsoft Word, so I had no idea whether such a feature was available or how to use it. The article indicates that it is possible, but the article is more than six years old and Microsoft has a tendency to remove useful things and add stuff nobody wants in later versions of their software.

*

markjo

  • Content Nazi
  • The Elder Ones
  • 42682
Re: An explanation for the question marks that litter some posts
« Reply #6 on: May 25, 2010, 05:04:38 AM »
Quote from: Parsifal
Quote from: markjo
Or, you could just turn off smart quotes in your word processor.

I don't use Microsoft Word, so I had no idea whether such a feature was available or how to use it. The article indicates that it is possible, but the article is more than six years old and Microsoft has a tendency to remove useful things and add stuff nobody wants in later versions of their software.

Microsoft isn't the only company to use smart quotes.

*

Parsifal

  • Official Member
  • 36118
  • Bendy Light specialist
Re: An explanation for the question marks that litter some posts
« Reply #7 on: May 25, 2010, 05:14:04 AM »
Quote from: markjo
Microsoft isn't the only company to use smart quotes.

I don't know of any other software developer who uses Microsoft's non-standard character set. OpenOffice.org, for instance, has a feature similar to Microsoft's smart quotes, but replaces characters in accordance with Unicode standards rather than breaking compatibility.

Edit: I just tested copying text from OpenOffice.org, and its quotation marks don't work on SMF either. It turns out the issue is SMF's incompatibility, and not Microsoft's, although Microsoft compounds the issue, in that even if the SMF issue were fixed, there wouldn't be complete resolution.
« Last Edit: May 25, 2010, 05:18:30 AM by Parsifal »

*

Username

  • Administrator
  • 17693
  • President of The Flat Earth Society
Re: An explanation for the question marks that litter some posts
« Reply #8 on: June 23, 2012, 03:44:59 PM »
This should be solved by converting to UTF-8.  Adding this to the list.
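As a rough illustration of why UTF-8 should help, the Python sketch below round-trips a sample string through several encodings. It says nothing about how SMF stores posts; `round_trip` and `sample` are hypothetical names for this example, and the sample text is made up.

```python
# Sample text with curly quotes plus the characters mentioned earlier in the
# thread (o-umlaut, c-cedilla), written with escapes so the source stays ASCII.
sample = "\u201cna\u00efve\u201d \u00f6 \u00e7"


def round_trip(text: str, encoding: str) -> str:
    """Encode and decode text, substituting '?' for anything the encoding lacks."""
    return text.encode(encoding, errors="replace").decode(encoding)


# UTF-8 can represent every Unicode character, so nothing is lost.
print(round_trip(sample, "utf-8") == sample)  # prints: True

# Latin-1 keeps the accented letters but drops the curly quotes.
print(round_trip(sample, "latin-1"))

# ASCII drops every non-ASCII character.
print(round_trip(sample, "ascii"))  # prints: ?na?ve? ? ?
```

If both the database and the pages served are UTF-8 throughout, smart quotes and accented letters alike should survive a paste.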
The illusion is shattered if we ask what goes on behind the scenes.