Javascript - Source code living in a promiscuously encoded world
Going through a lot of Javascript debugging sessions with other folks, I've noticed a bug pattern that seems to stem from unclear statements and documentation about encodings in the Javascript community at large.
Non-English software developers have a great deal of exposure to encodings and therefore to non-ASCII characters. Most programming languages share exactly the same problem as Javascript: in which encoding is the source code itself stored? Most file systems don't record an explicit encoding for a text file.
Sticking to ASCII is what most English-speaking developers do naturally, as they don't need any language-specific characters in string literals or comments.
In the browser world, however, the problem gets a whole lot worse, especially with Javascript and some of its unique uses.
A lot of people assume that Javascript itself should be encoded as UTF-8, partly because of the JSON documentation, which emphasizes this point.
Conflicts arise, however, in the following scenarios:
- Javascript is included in an HTML page whose explicit encoding is something other than UTF-8,
- The web server's configured default encoding differs from the encoding declared in the HTML header,
- The web server's configured default encoding differs from the assumption made by the person who wrote the source code,
- A web page with an explicit encoding requests Javascript from several servers, but that encoding differs from the assumptions the other servers made,
- You test a static HTML page with an included Javascript file from your local file system and assume a specific encoding, but your browser disagrees,
- You have a UTF-8 encoded HTML page and include non-UTF-8 encoded Javascript. In IE, included Javascript inherits the requesting page's encoding. Which is bad. It produces broken Javascript, because ISO-8859-1 bytes above 0x7F usually form invalid sequences in UTF-8. Which is bad.
To make things worse, libraries that serialize into JSON (e.g. for Java) have their own assumptions, or just don't care.
Living in a multi-encoding, interconnected world is not simple and never has been, but padawans in the Javascript world need some guidance and preparation to make these bugs easier to locate.
My brute-force recommendation in this case is kind of harsh, but it always works without special care: USE ASCII, Luke.
If a character has the eighth bit set, remove it. Use \u Unicode escapes (e.g. \u00fc for ü) where needed.
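To make the \u escaping advice concrete, here is a minimal sketch, assuming Node.js or a modern browser; the function names `toAsciiEscapes` and `toAsciiJson` are hypothetical and not from any library. It replaces every UTF-16 code unit above 0x7F with a `\uXXXX` escape, which keeps string literals and JSON text 7-bit clean without changing their value.

```javascript
// Hypothetical helper: replace every non-ASCII UTF-16 code unit with \uXXXX.
// Without the regex "u" flag, each half of a surrogate pair is matched
// separately, so emoji become two escapes, which JS and JSON both accept.
function toAsciiEscapes(text) {
  return text.replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// JSON.stringify leaves non-ASCII characters as-is. They can only occur
// inside JSON strings, so escaping the output keeps it valid and equivalent.
function toAsciiJson(value) {
  return toAsciiEscapes(JSON.stringify(value));
}

console.log(toAsciiEscapes('var greeting = "Grüße";'));
// prints: var greeting = "Gr\u00fc\u00dfe";
console.log(toAsciiJson({ city: "München" }));
// prints: {"city":"M\u00fcnchen"}
```

This is meant for string contents and serialized JSON; comments are better simply written in ASCII in the first place.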
Use only ASCII in your comments.
Use an automatic tool to check your code for 7-bit cleanliness.
You'll never have to worry again about reusing, including or JSON-ifying data or Javascript code. It just works, with every browser, every web server and every local file system.