core.internal.utf
Encode and decode UTF-8, UTF-16 and UTF-32 strings.
For Win32 systems, the C wchar_t type is UTF-16 and corresponds to the D wchar type. For Posix systems, the C wchar_t type is UTF-32 and corresponds to the D dchar type.
UTF character support is restricted to (\u0000 <= character <= \U0010FFFF).
See Also
Wikipedia
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1335
Copyright
Copyright Digital Mars 2003 - 2016.
License
Types 1
alias wptr = const(wchar)*
Functions 31
void onUnicodeError(string msg, size_t idx, string file = __FILE__, size_t line = __LINE__) @safe pure
uint stride(const scope char[] s, size_t i) @safe @nogc pure nothrow
stride() returns the length of a UTF-8 sequence starting at index i in string s.
Returns: The number of bytes in the UTF-8 sequence, or 0xFF if s[i] is not the start of a UTF-8 sequence.
uint stride(const scope wchar[] s, size_t i) @safe @nogc pure nothrow
stride() returns the length of a UTF-16 sequence starting at index i in string s.
uint stride(const scope dchar[] s, size_t i) @safe @nogc pure nothrow
stride() returns the length of a UTF-32 sequence starting at index i in string s.
Returns: The return value will always be 1.
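As a rough illustration of the UTF-8 overload, the sketch below steps through a string one sequence at a time and counts code points. The helper countCodePoints is hypothetical and not part of the module, and the sketch assumes that core.internal.utf can be imported from user code, which an internal druntime module does not promise.

```d
import core.internal.utf : stride;

// Hypothetical helper (not part of core.internal.utf): counts code points in
// a UTF-8 string by jumping from one sequence start to the next.
// Returns size_t.max if an index lands on a byte that cannot start a sequence.
size_t countCodePoints(const scope char[] s) @safe @nogc pure nothrow
{
    size_t count = 0;
    size_t i = 0;
    while (i < s.length)
    {
        immutable n = stride(s, i);
        if (n == 0xFF)
            return size_t.max; // s[i] is not the start of a sequence
        i += n;
        ++count;
    }
    return count;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two bytes in UTF-8
    assert(s.length == 6);
    assert(countCodePoints(s) == 5);
}
```

Note that stride() only inspects the first byte of a sequence, so this count says nothing about whether the trailing bytes are valid.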
size_t toUCSindex(const scope char[] s, size_t i) @safe pure
Given an index i into an array of characters s[], and assuming that index i is at the start of a UTF character, determines the number of UCS characters up to that index i.
size_t toUTFindex(const scope char[] s, size_t n) @safe pure
Given a UCS index n into an array of characters s[], returns the UTF index.
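The two index conversions are inverses of each other at sequence boundaries. The following sketch, which assumes the module is importable from user code, walks a small string whose characters take 1, 2, 3 and 1 bytes; the sample string is made up for illustration.

```d
import core.internal.utf : toUCSindex, toUTFindex;

void main()
{
    // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), 'b' (1 byte)
    string s = "a\u00E9\u20ACb";
    assert(s.length == 7);

    // Byte offset 3 is where U+20AC starts; two characters precede it.
    assert(toUCSindex(s, 3) == 2);
    // Character number 2 (zero-based) therefore starts at byte offset 3.
    assert(toUTFindex(s, 2) == 3);

    // 'b' is character number 3 and starts at byte offset 6.
    assert(toUTFindex(s, 3) == 6);
    assert(toUCSindex(s, 6) == 3);
}
```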
dchar decode(const scope char[] s, ref size_t idx) @safe pure
Decodes and returns the character starting at s[idx]. idx is advanced past the decoded character. If the character is not well formed, a UtfException is thrown and idx remains unchanged.
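A minimal sketch of a decoding loop built on decode(), assuming the module is importable from user code. The helper decodeAll is hypothetical. The failure case catches the general Exception type rather than naming a specific exception class.

```d
import core.internal.utf : decode;

// Hypothetical helper: decodes every code point of a UTF-8 string in order.
dchar[] decodeAll(const scope char[] s) @safe pure
{
    dchar[] result;
    size_t idx = 0;
    while (idx < s.length)
        result ~= decode(s, idx); // decode() advances idx past the sequence
    return result;
}

void main()
{
    assert(decodeAll("a\u00E9\u20AC") == "a\u00E9\u20AC"d);

    // A lone continuation byte (0x80) cannot start a sequence.
    char[] bad = [cast(char) 0x80];
    size_t idx = 0;
    bool threw = false;
    try
    {
        cast(void) decode(bad, idx);
    }
    catch (Exception e)
    {
        threw = true;
    }
    assert(threw);
    assert(idx == 0); // idx is left unchanged on failure
}
```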
void encode(ref char[] s, dchar c) @safe pure nothrow
Encodes character c and appends it to array s[].
ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
Returns the number of code units needed to encode c when C is the code unit type. The length is counted in characters of type C, not in bytes.
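The sketch below relates encode() and codeLength(): for each sample character, the number of code units appended by encode() should match what codeLength!char reports. It assumes the module is importable from user code; the sample characters are chosen only to cover the 1-, 2-, 3- and 4-byte UTF-8 cases.

```d
import core.internal.utf : codeLength, encode;

void main()
{
    // 'a', U+00E9, U+20AC and U+1F600 need 1, 2, 3 and 4 UTF-8 bytes.
    immutable dchar[] samples = ['a', '\u00E9', '\u20AC', '\U0001F600'];

    char[] buf;
    foreach (dchar c; samples)
    {
        immutable before = buf.length;
        encode(buf, c); // appends the UTF-8 form of c to buf
        assert(buf.length - before == codeLength!char(c));
    }
    assert(buf == "a\u00E9\u20AC\U0001F600");

    // The same character measured in UTF-16 and UTF-32 code units.
    assert(codeLength!wchar('\u20AC') == 1);
    assert(codeLength!wchar('\U0001F600') == 2); // surrogate pair
    assert(codeLength!dchar('\U0001F600') == 1);
}
```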
bool isValidString(S)(const scope S s) @safe pure nothrow
Checks whether string s is well formed. S can be an array of char, wchar, or dchar. Returns true if s is well formed and false if it is not. Use this to check all untrusted input for correctness.
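A short sketch of validating input before use, assuming the module is importable from user code. The truncated buffer is a made-up example of malformed UTF-8: its last byte announces a two-byte sequence that never completes.

```d
import core.internal.utf : isValidString;

void main()
{
    // Well-formed UTF-8 and UTF-16 input, and the empty string.
    assert(isValidString("h\u00E9llo"));
    assert(isValidString("h\u00E9llo"w));
    assert(isValidString(""));

    // 0xC3 starts a 2-byte sequence, but the input ends right after it.
    char[] truncated = ['a', cast(char) 0xC3];
    assert(!isValidString(truncated));
}
```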
string toUTF8(return scope string s) @safe pure nothrow
Encodes string s into UTF-8 and returns the encoded string.
wstring toUTF16(const scope char[] s) @trusted pure
Encodes string s into UTF-16 and returns the encoded string. toUTF16z() is suitable for calling the 'W' functions in the Win32 API that take an LPWSTR or LPCWSTR argument.
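To make the UTF-16 conversion concrete, the sketch below transcodes a string containing a character outside the Basic Multilingual Plane and checks the resulting surrogate pair. It assumes the module is importable from user code; the sample text is arbitrary.

```d
import core.internal.utf : toUTF16;

void main()
{
    // 'a' and U+20AC take one UTF-16 code unit each; U+1F600 takes two.
    string s = "a\u20AC\U0001F600";
    wstring w = toUTF16(s);

    assert(w == "a\u20AC\U0001F600"w);
    assert(w.length == 4);

    // U+1F600 is represented by the surrogate pair D83D DE00.
    assert(w[2] == 0xD83D);
    assert(w[3] == 0xDE00);
}
```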
dstring toUTF32(const scope char[] s) @trusted pure
Encodes string s into UTF-32 and returns the encoded string.
Variables 1
var
[
cast(ubyte)
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 0xFF, 0xFF,
] UTF8stride
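The table is indexed by the first byte of a sequence, and each entry appears to follow from the number of leading 1 bits in that byte. The sketch below recomputes every entry from the bit pattern and compares it with the table. The helper strideFromLeadByte is hypothetical, and the sketch assumes UTF8stride is importable from user code.

```d
import core.internal.utf : UTF8stride;

// Hypothetical helper: derives the sequence length from the bit pattern of a
// lead byte by counting its leading 1 bits.
uint strideFromLeadByte(ubyte b) @safe @nogc pure nothrow
{
    uint ones = 0;
    for (uint mask = 0x80; mask != 0 && (b & mask) != 0; mask >>= 1)
        ++ones;

    if (ones == 0)
        return 1;    // 0xxxxxxx: single-byte (ASCII) character
    if (ones == 1 || ones > 6)
        return 0xFF; // 10xxxxxx continuation byte, or 0xFE / 0xFF
    return ones;     // 110xxxxx -> 2, 1110xxxx -> 3, and so on up to 6
}

void main()
{
    assert(UTF8stride.length == 256);
    foreach (i; 0 .. 256)
        assert(UTF8stride[i] == strideFromLeadByte(cast(ubyte) i));
}
```

The entries 5 and 6 cover lead bytes 0xF8 to 0xFD. Since character support is restricted to \U0010FFFF, which needs at most four bytes, a stride of 5 or 6 indicates input outside the supported range.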