Characters → Unicode

Encode text to codepoints (hex, \u, HTML)

The Characters to Unicode Converter is a developer utility that converts plain text into Unicode representation formats. For each character you enter, it shows the Unicode codepoint, the hexadecimal and decimal values, and escaped forms such as \uXXXX sequences and HTML entities, ready to paste into your code.

Unicode is the global standard for encoding text, allowing computers to represent characters from virtually every writing system, symbol set, and emoji collection. When programming, building web pages, or managing databases, you often need to write special characters as escape sequences so they survive files and pipelines with mismatched encodings. This tool automates that translation and produces clean snippets for your code.

All conversions run locally in your web browser. No text inputs or code snippets are sent to external servers, protecting your privacy. It is a secure, fast, and completely free tool built to support developers, web designers, and database administrators.

Supported Unicode formats

This converter translates text into the most common formats used in programming and web design. It displays the Unicode codepoint (the standard U+XXXX designation), the hexadecimal value, and the decimal value for each character.

It also generates escaped formats: the \uXXXX escape sequence used in C, C++, Java, and JavaScript, the CSS escape (\XXXX), and HTML numeric character references in hex (&#xXXXX;) and decimal (&#NNNN;) form. These escapes let you embed special symbols in code without depending on the file's character encoding.
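To show how the formats above relate, here is a minimal JavaScript sketch that derives all of them from a single character. The function name `formatCodepoint` and its field names are illustrative assumptions, not the converter's actual source; the CSS escape is zero-padded to four digits to match the \XXXX form listed above.

```javascript
// Hypothetical helper: builds every listed format for one character.
function formatCodepoint(char) {
  const code = char.codePointAt(0);            // numeric codepoint
  const hex = code.toString(16).toUpperCase(); // uppercase hex digits, no prefix
  return {
    codepoint: "U+" + hex.padStart(4, "0"),    // standard notation, at least 4 digits
    hex: hex,
    decimal: code,
    // JavaScript: 4-digit form inside the BMP, braced form above U+FFFF
    jsEscape: code > 0xFFFF ? "\\u{" + hex + "}" : "\\u" + hex.padStart(4, "0"),
    // CSS: backslash + hex digits; add a trailing space if the next
    // character in the stylesheet is itself a hex digit or a space
    cssEscape: "\\" + hex.padStart(4, "0"),
    htmlHex: "&#x" + hex + ";",                // HTML hex character reference
    htmlDecimal: "&#" + code + ";",            // HTML decimal character reference
  };
}

const info = formatCodepoint("©");
console.log(info.codepoint, info.jsEscape, info.htmlDecimal); // U+00A9 \u00A9 &#169;
```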

The character translation process

The converter loops through the input text one codepoint at a time and retrieves the Unicode value of each character using JavaScript's native string methods:

Retrieve the codepoint value: Code = Character.codePointAt(0).
Convert to hexadecimal: Hex = Code.toString(16).toUpperCase().
Format as a Unicode escape sequence: "\\u" + Hex.padStart(4, "0") for codepoints up to U+FFFF, or "\\u{" + Hex + "}" for codepoints above U+FFFF.

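Because the text is read by codepoint rather than by 16-bit unit, a surrogate pair (such as most emoji) is treated as one character instead of two halves, so characters from every Unicode plane, not just the Basic Multilingual Plane, are represented correctly.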
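The loop described above can be sketched as follows, assuming it iterates with `for...of` (which walks a string by codepoint). The function names `toJsEscapes` and `utf16Units` are hypothetical; the second exists only to show what an index-based loop would see instead.

```javascript
// Per-character loop: for...of yields whole codepoints, so the emoji
// U+1F60A is visited once rather than as two surrogate halves.
function toJsEscapes(text) {
  const rows = [];
  for (const ch of text) {
    const code = ch.codePointAt(0);
    const hex = code.toString(16).toUpperCase();
    rows.push({
      char: ch,
      codepoint: "U+" + hex.padStart(4, "0"),
      escape: code > 0xFFFF ? `\\u{${hex}}` : "\\u" + hex.padStart(4, "0"),
    });
  }
  return rows;
}

// Index-based loop for comparison: charCodeAt returns raw 16-bit units,
// so a supplementary character shows up as two separate surrogates.
function utf16Units(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16).toUpperCase());
  }
  return units;
}

const sample = "a😊";
console.log(toJsEscapes(sample).map(r => r.escape).join(" ")); // \u0061 \u{1F60A}
console.log(utf16Units(sample).join(" "));                      // 61 D83D DE0A
```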

Worked encoding examples

Let us look at how some common special characters are converted into Unicode representations:

Copyright Symbol (©):
- Codepoint: U+00A9 (Decimal: 169).
- JS Escape: \u00A9.
- HTML Entity: &copy; (or &#169;).
Emoji Smile (😊):
- Codepoint: U+1F60A (Decimal: 128522).
- JS Escape: \u{1F60A} (or the surrogate pair \uD83D\uDE0A).
- HTML Entity: &#128522; (or &#x1F60A;).
Greek Letter Beta (β):
- Codepoint: U+03B2 (Decimal: 946).
- JS Escape: \u03B2.
- HTML Entity: &beta; (or &#946;).
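The worked examples above can be checked directly in JavaScript: each escape should parse to the same string as the literal character, and its codepoint should match the listed value. This is a small verification sketch; the `examples` array simply restates the values from the list.

```javascript
// Each entry restates one worked example: literal, escape, expected codepoint.
const examples = [
  { char: "©", literal: "\u00A9", code: 0x00A9 },
  { char: "😊", literal: "\u{1F60A}", code: 0x1F60A },
  { char: "β", literal: "\u03B2", code: 0x03B2 },
];

for (const { char, literal, code } of examples) {
  // Prints the character, true, true, then the decimal value (169, 128522, 946)
  console.log(char, char === literal, char.codePointAt(0) === code, code);
}

// The braced escape and the surrogate-pair escape produce the same string.
console.log("\u{1F60A}" === "\uD83D\uDE0A"); // true
```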

When to use this converter

Use this tool when writing code that requires special characters, such as math symbols, foreign language glyphs, or emojis, in formats that are safe from encoding issues. It is an excellent resource for web developers who need to generate HTML entity codes, or software engineers formatting string literals in Java, C#, Python, or JavaScript.

The clean tabular layout lets you copy specific codes for any character instantly. Simply type your text, and the character breakdown updates immediately, allowing you to grab the codes you need.

Surrogate pairs and UTF-8 differences

Note that the four-digit \uXXXX escape can only express characters in the Basic Multilingual Plane (BMP), U+0000 to U+FFFF. Characters beyond this range (such as emoji or historic scripts) need either a UTF-16 surrogate pair written as two \uXXXX escapes or, in JavaScript (ES2015 and later), the braced form \u{XXXXX}, which accepts up to six hex digits.
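For reference, the surrogate pair for a supplementary codepoint follows from a fixed UTF-16 formula: subtract 0x10000, then split the remaining 20 bits into a high and a low half. The sketch below applies it; `toSurrogatePair` is a hypothetical helper name used only for illustration.

```javascript
// Splits a codepoint in U+10000..U+10FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(code) {
  if (code <= 0xFFFF || code > 0x10FFFF) {
    throw new RangeError("expected a codepoint in U+10000..U+10FFFF");
  }
  const offset = code - 0x10000;          // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> low surrogate
  return [high, low];
}

const [hi, lo] = toSurrogatePair(0x1F60A);
const fmt = n => "\\u" + n.toString(16).toUpperCase();
console.log(fmt(hi) + fmt(lo)); // \uD83D\uDE0A
```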

Additionally, remember that Unicode codepoint values differ from UTF-8 byte sequences. A codepoint is a numerical index, while UTF-8 is a specific way of encoding that index into bytes. This converter outputs codepoints and standard escape sequences, which are the formats expected in source code strings.
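To see the difference concretely, the sketch below prints each character's codepoint next to its UTF-8 bytes, using the standard `TextEncoder` (available globally in modern browsers and Node.js, and always encoding to UTF-8). The helper name `utf8Summary` is hypothetical.

```javascript
// Compares a character's codepoint with the bytes UTF-8 uses to store it.
const encoder = new TextEncoder();

function utf8Summary(char) {
  const code = char.codePointAt(0);
  const bytes = Array.from(encoder.encode(char), b =>
    b.toString(16).toUpperCase().padStart(2, "0"));
  return "U+" + code.toString(16).toUpperCase().padStart(4, "0") +
    " -> UTF-8 " + bytes.join(" ");
}

console.log(utf8Summary("A"));  // U+0041 -> UTF-8 41
console.log(utf8Summary("©"));  // U+00A9 -> UTF-8 C2 A9
console.log(utf8Summary("😊")); // U+1F60A -> UTF-8 F0 9F 98 8A
```

ASCII characters keep a single byte equal to their codepoint, while higher codepoints expand to two, three, or four bytes.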

Frequently asked questions

Why does an emoji show two codepoints sometimes?

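Many emoji are single codepoints in the supplementary planes (above U+FFFF), even though JavaScript stores each of them as two 16-bit units. Others are sequences: emoji with skin-tone modifiers, flags (pairs of regional indicator symbols), and ZWJ (zero-width joiner) sequences combine multiple codepoints into one visible symbol. This tool shows the actual codepoints, not the number of visible characters (graphemes).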
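The sketch below illustrates the three cases by building a few emoji from explicit codepoints with `String.fromCodePoint` and comparing UTF-16 length with the number of codepoints. The sample names are just labels chosen for this example.

```javascript
// Emoji built from explicit codepoints, so each sequence is unambiguous.
const samples = {
  smile: String.fromCodePoint(0x1F60A),                 // one codepoint
  thumbsMedium: String.fromCodePoint(0x1F44D, 0x1F3FD), // base + skin-tone modifier
  flagFR: String.fromCodePoint(0x1F1EB, 0x1F1F7),       // two regional indicators
  family: String.fromCodePoint(0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467), // ZWJ sequence
};

for (const [name, s] of Object.entries(samples)) {
  // Array.from iterates by codepoint; .length counts 16-bit units.
  const codepoints = Array.from(s, c => "U+" + c.codePointAt(0).toString(16).toUpperCase());
  console.log(name, "UTF-16 units:", s.length, "codepoints:", codepoints.join(" "));
}
```

Each of these renders as a single symbol on most platforms, yet the family sequence alone spans five codepoints and eight UTF-16 units.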

What is the difference between U+XXXX and \uXXXX?

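For characters in the BMP they express the same hex value: U+00A9 is the standard Unicode notation, while \u00A9 is the escape syntax used in JavaScript (and Java, C#, and other languages) string literals. Above U+FFFF the forms diverge, because \uXXXX holds only four digits: U+1F60A becomes \u{1F60A} in JavaScript, or a surrogate pair.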
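Because both notations carry the same hex number, converting between them is just parsing. Here is a minimal sketch, assuming input of the form "U+" followed by four to six hex digits; `fromUnicodeNotation` is a hypothetical helper name.

```javascript
// Parses "U+XXXX" notation and rebuilds the character and its JS escape.
function fromUnicodeNotation(notation) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation);
  if (!match) throw new Error("expected U+ followed by 4-6 hex digits");
  const code = parseInt(match[1], 16);
  const hex = code.toString(16).toUpperCase();
  return {
    char: String.fromCodePoint(code), // throws RangeError above U+10FFFF
    jsEscape: code > 0xFFFF ? `\\u{${hex}}` : "\\u" + hex.padStart(4, "0"),
  };
}

console.log(fromUnicodeNotation("U+03B2").jsEscape);  // \u03B2
console.log(fromUnicodeNotation("U+1F60A").jsEscape); // \u{1F60A}
```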

Why does ASCII text have small codepoints?

The first 128 Unicode codepoints (U+0000 to U+007F) are identical to ASCII, because Unicode was designed as a superset of ASCII. The letter A, for example, is 65 (0x41) in both, so it appears as U+0041.

What is a Unicode codepoint?

A Unicode codepoint is the unique numerical value assigned to a specific character or symbol in the Unicode standard, conventionally written as at least four hexadecimal digits prefixed with "U+" (e.g., U+0020 for a space).

Why should I use Unicode escapes in my code?

Unicode escape sequences keep your source code in plain ASCII, so special characters come out correctly regardless of how the file is saved or transmitted, instead of being garbled into unintended symbols (often called mojibake).
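As a sketch of that idea, the function below turns any text into an ASCII-only JavaScript string literal by escaping every character outside printable ASCII (plus quotes and backslashes). The name `escapeNonAscii` and the choice to keep printable ASCII unescaped are assumptions made for this example.

```javascript
// Hypothetical helper: produces a double-quoted JS literal containing only ASCII.
function escapeNonAscii(text) {
  let out = "";
  for (const ch of text) {                 // iterate by codepoint
    const code = ch.codePointAt(0);
    if (ch === "\\" || ch === '"') {
      out += "\\" + ch;                    // escape characters that would break the literal
    } else if (code >= 0x20 && code < 0x7F) {
      out += ch;                           // printable ASCII is kept as-is
    } else {
      const hex = code.toString(16).toUpperCase();
      out += code > 0xFFFF ? `\\u{${hex}}` : "\\u" + hex.padStart(4, "0");
    }
  }
  return '"' + out + '"';
}

console.log(escapeNonAscii("5 € © 😊")); // "5 \u20AC \u00A9 \u{1F60A}"
```

Pasting the printed literal into a JavaScript file reproduces the original text, whatever encoding the file is later saved in.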