Characters → Unicode
Encode text to codepoints (hex, \u, HTML)
This converter takes a string of characters and returns the Unicode codepoint for each character in several common formats: hexadecimal, decimal, JavaScript \u escapes, and HTML numeric character references.
Useful for debugging text encoding issues, escaping characters for source code, embedding special characters in HTML or JSON, and inspecting strings that contain combining marks or emoji.
Output formats
- Hex (U+XXXX): e.g. U+00E9 for "é".
- Decimal: e.g. 233 for "é".
- JavaScript escapes: \u00E9, or \u{1F600} for emoji.
- HTML numeric entities: &#233; or &#xE9;.
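As a rough illustration of the formats listed above, the sketch below formats one codepoint each way. The helper name formatCodePoint and the shape of the returned object are assumptions made for this example, not the converter's actual code.

```javascript
// Format a single codepoint (a number) in each output format.
// formatCodePoint is a hypothetical helper, not the tool's real implementation.
function formatCodePoint(cp) {
  const hex = cp.toString(16).toUpperCase();
  const padded = hex.padStart(4, "0");
  return {
    hex: "U+" + padded,
    decimal: String(cp),
    // \uXXXX takes exactly four hex digits, so values above U+FFFF
    // use the ES2015 \u{...} form instead.
    js: cp <= 0xffff ? "\\u" + padded : "\\u{" + hex + "}",
    htmlDecimal: "&#" + cp + ";",
    htmlHex: "&#x" + hex + ";",
  };
}

console.log(formatCodePoint(0xe9));
console.log(formatCodePoint(0x1f600));
```

For U+00E9 this yields U+00E9, 233, \u00E9, &#233; and &#xE9;.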
How it works
The browser exposes each character's Unicode codepoint via String.prototype.codePointAt. The converter iterates over the string one codepoint at a time and formats each one in the chosen output format. Surrogate pairs are handled correctly, so an emoji above U+FFFF such as 😀 is reported as one codepoint rather than as two UTF-16 code units.
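One way to get this behavior is sketched below, assuming the string's built-in iterator is used; the converter's real code may differ. The function name toCodePoints is made up for this example.

```javascript
// Walk a string by codepoint rather than by UTF-16 code unit.
// toCodePoints is a hypothetical name used only in this sketch.
function toCodePoints(text) {
  const result = [];
  // for...of uses the string iterator, which yields whole codepoints,
  // so a surrogate pair arrives as a single two-unit string.
  for (const ch of text) {
    result.push(ch.codePointAt(0));
  }
  return result;
}

// "Café 😀", written with escapes so the input is unambiguous.
const hexList = toCodePoints("Caf\u00E9 \u{1F600}").map(
  (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0")
);
console.log(hexList.join(" "));
// U+0043 U+0061 U+0066 U+00E9 U+0020 U+1F600

console.log("\u{1F600}".length); // 2 (UTF-16 code units)
console.log(toCodePoints("\u{1F600}").length); // 1 (codepoint)
```

Indexing the string directly (text[i] or charCodeAt) would instead return the two surrogate halves, 0xD83D and 0xDE00, for the emoji.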
Worked example
Input: Café 😀
Output (hex):
- C → U+0043
- a → U+0061
- f → U+0066
- é → U+00E9
- (space) → U+0020
- 😀 → U+1F600
When this is useful
The converter helps when text looks correct but does not match an expected string (often because of a hidden combining character or zero-width space), when you need to embed non-ASCII characters in source code as escapes, and when you want to learn about Unicode by inspecting real text.
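The hidden-character case can be reproduced in a few lines. This is a minimal sketch: the helper findSuspects and its "outside printable ASCII" rule are assumptions chosen for illustration, and real text with legitimate non-ASCII letters would need a narrower filter.

```javascript
// List every character outside printable ASCII (U+0020..U+007E),
// with its position counted in codepoints.
// findSuspects is a hypothetical helper for this sketch.
function findSuspects(text) {
  const suspects = [];
  let index = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0x20 || cp > 0x7e) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      suspects.push({ index, codePoint: "U+" + hex });
    }
    index += 1;
  }
  return suspects;
}

const expectedName = "admin";
const receivedName = "ad\u200Bmin"; // contains a ZERO WIDTH SPACE

console.log(expectedName === receivedName); // false, although both look like "admin"
console.log(findSuspects(receivedName)); // [ { index: 2, codePoint: 'U+200B' } ]
```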
Combining characters and graphemes
Some visible "characters" are actually composed of multiple codepoints: an accented letter can be a base letter followed by a combining mark, a flag emoji is a pair of regional indicator symbols, and many compound emoji are joined with zero-width joiners (ZWJ). This tool shows each underlying codepoint individually, which is usually what you want for debugging encoding issues.
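The three cases can be seen side by side in the sketch below. The helper codePointsOf is a hypothetical name. The normalize call is a standard String method, shown only to illustrate why two identical-looking strings can compare unequal.

```javascript
// Return the U+XXXX notation for every codepoint in a string.
// codePointsOf is a hypothetical helper for this sketch.
function codePointsOf(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const precomposed = "\u00E9"; // é as a single codepoint
const decomposed = "e\u0301"; // e followed by COMBINING ACUTE ACCENT

console.log(codePointsOf(precomposed)); // [ 'U+00E9' ]
console.log(codePointsOf(decomposed)); // [ 'U+0065', 'U+0301' ]
console.log(precomposed === decomposed); // false
console.log(precomposed === decomposed.normalize("NFC")); // true

// Flag of France: two regional indicator symbols (F and R).
console.log(codePointsOf("\u{1F1EB}\u{1F1F7}")); // [ 'U+1F1EB', 'U+1F1F7' ]

// Woman technologist: woman + ZWJ + laptop.
console.log(codePointsOf("\u{1F469}\u200D\u{1F4BB}")); // [ 'U+1F469', 'U+200D', 'U+1F4BB' ]
```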
Frequently asked questions
Why does an emoji sometimes show two or more codepoints?
Many emoji are single codepoints in the supplementary planes (above U+FFFF). Others, such as emoji with skin-tone modifiers, flags, and ZWJ sequences, combine multiple codepoints. This tool shows the actual codepoints, not the visual grapheme count.
What is the difference between U+XXXX and \uXXXX?
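They represent the same value. U+ is the standard Unicode notation; \u is the JavaScript string escape syntax. Both are hex. The \uXXXX form takes exactly four hex digits, so codepoints above U+FFFF are written as \u{XXXXX} or as a surrogate pair.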
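The equivalence can be checked directly. In this sketch parseUPlus is a hypothetical helper that turns U+ notation back into a string. The 4 to 6 hex digit pattern is an assumption about what input to accept.

```javascript
// Convert "U+XXXX" notation into the character it names.
// parseUPlus is a hypothetical helper for this sketch.
function parseUPlus(notation) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation);
  if (!match) {
    throw new Error("Not U+ notation: " + notation);
  }
  // String.fromCodePoint throws a RangeError for values above 0x10FFFF.
  return String.fromCodePoint(parseInt(match[1], 16));
}

console.log(parseUPlus("U+00E9") === "\u00E9"); // true
console.log(parseUPlus("U+1F600") === "\u{1F600}"); // true
// The braced escape and the surrogate-pair escape produce the same string.
console.log("\u{1F600}" === "\uD83D\uDE00"); // true
```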
Why does ASCII text have small codepoints?
The first 128 Unicode codepoints (U+0000 to U+007F) are identical to ASCII. Unicode was designed to be a superset.