Javascript - Source code living in a promiscuously encoded world
While debugging a lot of Javascript with other folks, I've noticed a recurring bug pattern. I suspect it stems from unclear statements and documentation about encodings in the Javascript community at large.
Non-English software developers deal with encodings, and therefore with non-ASCII characters, a great deal. Most programming languages have exactly the same problem as Javascript: what encoding is the source code itself in, given that most file systems don't record an explicit encoding for a text file?
Sticking with ASCII is what most English-speaking developers do naturally, since they don't need any language-specific characters to write string literals or comments.
In the browser world, however, the problem gets a whole lot worse, especially with Javascript and some of its unique uses.
A lot of people assume that Javascript itself should be encoded as UTF-8. This is partly due to the JSON documentation, which emphasizes UTF-8.
Conflicts arise, however, in the following scenarios:
- Javascript is included in an HTML page, and the surrounding page explicitly declares an encoding other than UTF-8,
- The web server's configured default encoding differs from the encoding declared in the HTML header,
- The web server's configured default encoding differs from the encoding assumed by whoever wrote the source code,
- A web page requests Javascript from several servers and declares an explicit encoding, but that encoding differs from the assumptions the other servers made,
- You test a static HTML page with an included Javascript file from your local file system, assuming a specific encoding, and your browser disagrees,
- You have a UTF-8 encoded HTML page and include non-UTF-8 encoded Javascript. In IE, included Javascript inherits the requesting page's encoding. Which is bad. The result is broken Javascript (non-ASCII characters encoded as ISO-8859-1 are not valid UTF-8 byte sequences). Which is bad.
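The last scenario is easy to reproduce outside the browser. The following sketch assumes Node.js, whose Buffer API encodes and decodes bytes explicitly, with Node's 'latin1' encoding standing in for ISO-8859-1. The variable names are made up, and a browser's decoder may differ in details such as how many replacement characters it emits.

```javascript
// Sketch (assumes Node.js): the same source text, written in one encoding
// and decoded in another. The literal itself stays 7-bit clean by using
// a \u escape for the u-umlaut.
const source = 'var name = "M\u00fcller";';

// Written as UTF-8, read as ISO-8859-1: the umlaut becomes two characters.
const utf8Bytes = Buffer.from(source, 'utf8');
const seenAsLatin1 = utf8Bytes.toString('latin1');

// Written as ISO-8859-1, read as UTF-8: the lone byte 0xFC is not valid
// UTF-8, so the decoder substitutes U+FFFD (the replacement character).
const latin1Bytes = Buffer.from(source, 'latin1');
const seenAsUtf8 = latin1Bytes.toString('utf8');

console.log(seenAsLatin1); // "M" + "\u00c3\u00bc" + "ller" (A-tilde, 1/4 sign)
console.log(seenAsUtf8);   // replacement character in place of the umlaut
```

Neither result is an error the Javascript engine will complain about: the code still parses, it just carries the wrong data.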
To make things worse, libraries that serialize data to JSON (e.g. for Java) make their own assumptions, or just don't care.
Living in a multi-encoding, interconnected world is not simple, and never has been. But padawans in the Javascript world need some guidance and preparation to make these bugs easier to track down.
My brute-force recommendation is kind of harsh, but it always works without special care: USE ASCII, Luke.
If a character has the eighth bit set, remove it. Use \u Unicode escapes (e.g. \u00e4 for "ä") where needed.
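A minimal sketch of the \u escaping rule, assuming a runtime with ES2017 (for String.prototype.padStart). The helper names toAsciiEscapes and toAsciiJson are hypothetical; the idea is to run string literals through the first before they go into a .js file, and to post-process the output of JSON serializers that leave non-ASCII characters unescaped.

```javascript
// Hypothetical helper: replace every UTF-16 code unit above 0x7F with a
// \uXXXX escape, so the result is pure 7-bit ASCII. Characters outside
// the BMP become two escapes (a surrogate pair), which is exactly how
// Javascript string literals represent them.
function toAsciiEscapes(text) {
  return text.replace(/[\u0080-\uffff]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// The same idea for JSON: JSON.stringify leaves non-ASCII characters
// as-is, so escape them afterwards. The escapes are valid JSON and
// parse back to the original string.
function toAsciiJson(value) {
  return toAsciiEscapes(JSON.stringify(value));
}

console.log(toAsciiEscapes('Gr\u00fc\u00dfe'));        // Gr\u00fc\u00dfe
console.log(toAsciiJson({ city: 'M\u00fcnchen' }));    // {"city":"M\u00fcnchen"}
```

Because \uXXXX escapes are legal in both Javascript string literals and JSON strings, the escaped output means exactly the same thing as the input, whatever encoding the file is later read in. In comments the escapes are just text, so there plain ASCII wording is the only option.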
Use only ASCII in your comments.
Use an automatic tool to check your code for 7-bit cleanness.
You'll never have to worry again about reusing or including Javascript code, or about JSON-fying data. It just works, with every browser, every web server and every local file system.