Unicode Character Encoding: The Problem and Solution
Written by Jeffrey Bennett
Proper character encoding is rarely talked about, but it is essential for any computer program or website that handles text to function properly.
Without knowing which character encoding a piece of text uses, your letters and numbers may appear jumbled or, worse, completely unreadable.
This applies to every file that contains text, both on the internet and on your computer.
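Here is a minimal sketch of what “jumbled” looks like in practice. It assumes the text was saved as UTF-8 and then read back by something that wrongly guessed Latin-1; the sample word and both encodings are my own picks for illustration.

```python
# Text as the author wrote it (sample word chosen for illustration).
original = "café"

# Assumption: the text is stored as bytes using UTF-8.
raw_bytes = original.encode("utf-8")

# A reader that wrongly assumes Latin-1 sees jumbled text.
garbled = raw_bytes.decode("latin-1")

# Reading with the encoding that was actually used restores the text.
correct = raw_bytes.decode("utf-8")

print(garbled)  # cafÃ©
print(correct)  # café
```

The bytes never changed. Only the guess about how to read them did.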
To really understand this stuff, I need to give you a history lesson.
Stay with me! Knowing this stuff can be the difference between being a good developer and a really great one!
An “A” is an “A”. Or is it?
It all started way back when computers were first starting to display text on a screen. Things were simple back then. Typing the letter “A” meant you wanted to show “A”, “B” meant you wanted to show “B”, and so on.
For a computer to display a letter or number, each of these characters must be mapped to a specific character code. Even spaces, tabs, and line breaks need this!
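You can see this mapping for yourself. The sketch below uses Python's built-in ord() and chr() to turn characters into their numeric codes and back; the sample string is just something I made up to include a space, a tab, and a line break.

```python
# Sample text containing a letter, a space, another letter, a tab, and a line break.
text = "A b\t\n"

# ord() gives the numeric character code for each character.
codes = [ord(ch) for ch in text]
print(codes)  # [65, 32, 98, 9, 10]

# chr() goes the other way: from character code back to character.
restored = "".join(chr(code) for code in codes)
print(restored == text)  # True
```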
Whenever you press a letter key on your keyboard, you’re getting so much more than the letter itself!
The keyboard sends data in the form of 0s and 1s identifying which key was pressed, and the operating system translates that information into the appropriate character code.
Tables that map character codes to characters are built into the machine’s operating system, so that whenever a character code is requested, it will return the associated character.
Voilà! Within a fraction of a second, you have the letter that you pressed on your keyboard showing up on your screen.
Just like magic! 🌟💥✨
But then, computers began to handle other languages like Thai, Chinese, or Japanese.
Now the same character code could be displayed as a bunch of different characters, each entirely up to the language that was needed. There was information clashing and chaos running amok.
The advent of the internet made this minor problem, normally isolated to a single machine, a global epidemic.
Now you have all these languages that use the same character codes overlapping each other and competing for attention.
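To make the overlap concrete, here is a small sketch that reads the exact same byte using three older, language-specific encodings. The byte value and the three encodings (Latin-1 for Western European, cp1251 for Cyrillic, cp874 for Thai) are examples I picked, not an exhaustive list.

```python
# One single byte, with no information about which encoding it belongs to.
raw = bytes([0xE0])

# Three legacy encodings that each assign this byte a different character.
encodings = ["latin-1", "cp1251", "cp874"]

# Decode the same byte under each encoding.
decoded = {name: raw.decode(name) for name in encodings}

for name, ch in decoded.items():
    print(f"{name}: {ch!r} (U+{ord(ch):04X})")
# latin-1: 'à' (U+00E0)
# cp1251: 'а' (U+0430)
# cp874: 'เ' (U+0E40)
```

Same byte, three different characters. Unless the encoding travels along with the text, the reader can only guess.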
This created serious problems that were not easily solved by simply changing the text of a page or source code of an application.
In fact, it became such a problem that the Unicode Consortium had to be created. They are now responsible for creating and maintaining the Unicode Standard, which assigns a unique code to every character, regardless of what language it’s written in.
A Happy Ending
Everybody loves a happy ending, right? Well, this is where we rejoice!
Because of the Unicode Consortium, we now have a fantastic way to display many different languages on an individual webpage or application.
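As a rough illustration, the sketch below puts several languages into one string and stores it with UTF-8, one of the encodings defined for Unicode. The article doesn't name a specific encoding, so treat UTF-8 and the sample greetings as my assumptions.

```python
# One string mixing English, Thai, Chinese, and Japanese (sample text).
text = "Hello, สวัสดี, 你好, こんにちは"

# Assumption: UTF-8 is the encoding used to store or send the text.
data = text.encode("utf-8")

# Decoding with the same encoding gives back exactly the same text.
round_trip = data.decode("utf-8")
print(round_trip == text)  # True

# Every character has its own Unicode code point, whatever the language.
# The last number is how many bytes UTF-8 uses for that character.
summary = [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in "Aก你こ"]
for ch, code_point, size in summary:
    print(ch, code_point, size)
# A U+0041 1
# ก U+0E01 3
# 你 U+4F60 3
# こ U+3053 3
```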
Without these standards, websites like Facebook or Google would never have been able to grow to the massive global scale they have.
The next time you type that email to Grandma or post a status update to Facebook, think of the Unicode Consortium. They truly have allowed the internet to become the global space it is today.