ska: unmasked interrupts
Thursday, 5. March 2009

JavaScript - Source code living in a promiscuously encoded world

While debugging a lot of JavaScript with other folks, I've noticed a recurring bug pattern. It seems to stem from unclear statements about, or missing documentation of, encodings in the JavaScript community at large.

Non-English software developers get a great deal of exposure to encodings and therefore to non-ASCII characters. Most programming languages share JavaScript's problem here: in what encoding is the code itself written? Most file systems don't store an explicit encoding for a text file.

Going with ASCII is what most English-speaking developers do naturally, as they don't need any language-specific characters to express string literals or comments.

In the browser world, however, the problem gets a whole lot worse, especially with JavaScript and some of its unique uses.

A lot of people assume that JavaScript itself should be encoded as UTF-8. This is due, for example, to parts of the JSON documentation, which emphasize this point.

Conflicts arise however in the following scenarios:

  • JavaScript is embedded in an HTML page, and the surrounding page has an explicit encoding set to something other than UTF-8,
  • The web server's configured default encoding differs from the encoding declared in the HTML header,
  • The web server's configured default encoding differs from what the developer who wrote the source code assumed,
  • A web page requests JavaScript from several servers and has an explicit encoding, but that encoding differs from the assumptions the other servers made,
  • You test a static HTML page with an included JavaScript file from your local file system and assume a specific encoding, but your browser disagrees,
  • You have a UTF-8 encoded HTML page and include non-UTF-8 encoded JavaScript. In IE, included JavaScript inherits the requesting page's encoding. Which is bad. Which makes broken JavaScript (ISO-8859-1 bytes above 0x7F generally do not form valid UTF-8 sequences). Which is bad.
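To see what goes wrong in the last scenario, here is a minimal sketch. It assumes an environment that provides the standard TextEncoder and TextDecoder globals (current browsers, recent Node.js); the variable names are made up for illustration. It takes the single character U+00E4 (a-umlaut), stores it once the ISO-8859-1 way and once the UTF-8 way, and then reads each byte sequence with the wrong assumption.

```javascript
// U+00E4 as an ISO-8859-1 editor would store it: a single byte.
const latin1Bytes = new Uint8Array([0xe4]);

// The same character as a UTF-8 editor would store it: two bytes (0xC3 0xA4).
const utf8Bytes = new TextEncoder().encode("\u00E4");

// ISO-8859-1 bytes read as UTF-8: 0xE4 announces a multi-byte sequence
// that never arrives, so the decoder substitutes U+FFFD.
const misreadAsUtf8 = new TextDecoder("utf-8").decode(latin1Bytes);

// UTF-8 bytes read as ISO-8859-1: every byte maps to the code point
// with the same number, so one character turns into two.
const misreadAsLatin1 = Array.from(utf8Bytes, (b) => String.fromCharCode(b)).join("");

// Print code points in hex so the output itself stays ASCII.
const hex = (s) => Array.from(s, (c) => c.charCodeAt(0).toString(16)).join(" ");
console.log(hex(misreadAsUtf8));   // fffd
console.log(hex(misreadAsLatin1)); // c3 a4
```

Neither direction throws an error here. The string literal just silently contains the wrong characters, which is why these bugs are so tedious to locate.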

To make things worse, libraries that serialize into JSON (e.g. for Java) make their own assumptions, or just don't care.

Living in a multi-encoding, interconnected world is not simple and never has been, but padawans in the JavaScript world need some guidance and preparation to make these bugs easier to locate.

My brute-force recommendation in this case is kind of harsh, but it always works without special care: USE ASCII, Luke.
If it has the eighth bit set: remove it. Use \uXXXX Unicode escapes where needed.
Use only ASCII in your comments.
Use an automatic tool to check your code for 7-bit cleanness.
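Such a check does not need much. The following is only a sketch of what a tool could do, not a reference to any existing one; the function names isSevenBitClean and findNonAscii are hypothetical. The first works on the raw bytes of a file, the second on already decoded source text and suggests the \u escape to use instead.

```javascript
// Hypothetical helper: true if no byte has the eighth bit set.
// Expects raw file content as a Uint8Array.
function isSevenBitClean(bytes) {
  return bytes.every((b) => b < 0x80);
}

// Hypothetical helper: list every UTF-16 code unit above 0x7F in decoded
// source text, with 1-based position and the \u escape that replaces it.
function findNonAscii(source) {
  const findings = [];
  source.split(/\r\n|\r|\n/).forEach((line, lineIndex) => {
    for (let col = 0; col < line.length; col++) {
      const unit = line.charCodeAt(col);
      if (unit > 0x7f) {
        findings.push({
          line: lineIndex + 1,
          column: col + 1,
          escape: "\\u" + unit.toString(16).toUpperCase().padStart(4, "0"),
        });
      }
    }
  });
  return findings;
}

// Sample input: the escapes below become real non-ASCII characters at
// runtime, standing in for a source file that contains them literally.
const sample = 'var s = "Gr\u00FC\u00DFe";\n// plain ASCII comment';
console.log(findNonAscii(sample));
```

Characters outside the Basic Multilingual Plane show up as two findings, one per surrogate. That is fine for this purpose, since a surrogate pair written as two \u escapes is a valid JavaScript string literal.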

You'll never again have to worry about reusing, including, or JSON-ifying data or JavaScript code. It just works. With every browser, every web server and every local file system.
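The same rule can be applied to JSON data. Here is a minimal sketch, assuming the standard JSON.stringify and JSON.parse; the helper name toAsciiJson is made up. JSON.stringify only emits non-ASCII characters inside strings, so replacing each of them with its \u escape keeps the output valid JSON while making it pure ASCII.

```javascript
// Hypothetical helper: serialize to JSON, then replace every UTF-16 code
// unit above 0x7F with its \uXXXX escape. Non-ASCII characters can only
// occur inside JSON strings, where such escapes are legal.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(
    /[\u0080-\uffff]/g,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const json = toAsciiJson({ city: "K\u00F6ln" });
console.log(json); // {"city":"K\u00f6ln"}

// Parsing restores the original characters, whatever encoding the
// transport in between assumed.
console.log(JSON.parse(json).city === "K\u00F6ln"); // true
```

Since every byte of the result is below 0x80, ISO-8859-1, UTF-8 and plain ASCII all decode it identically, so a wrong guess by the server or browser no longer matters.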
