  Here is a description of how you can use STLport to read/write utf8 files.
utf8 is a way of encoding wide characters. In the C++ Standard library,
encoding is handled by the codecvt locale facet, which is part of the ctype
category. However, utf8 only describes how encoding must be performed; it
cannot be used to classify characters, so it does not provide enough
information to generate all the ctype category facets of a locale
instance.

In C++ this means that the following code will throw an exception to
signal that the locale creation failed:

#include <locale>
// Will throw a std::runtime_error exception.
std::locale loc(".utf8");

For the same reason, building a locale whose ctype facets are based on
utf8 also fails:

// Will throw a std::runtime_error exception:
std::locale loc(std::locale::classic(), ".utf8", std::locale::ctype);
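Since both constructors above report failure through std::runtime_error,
calling code may want to probe for support rather than let the exception
escape. The following is only a minimal sketch; can_create_locale and
can_create_ctype_locale are hypothetical helper names, not part of STLport.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper (not part of STLport): reports whether a locale with
// the given name can be constructed on the current platform.
bool can_create_locale(const char* name)
{
    try {
        std::locale loc(name);
        return true;
    } catch (const std::runtime_error&) {
        // The name is unknown, or it does not describe every category.
        return false;
    }
}

// Same probe for the constructor that takes only one category from the
// named locale (here: ctype) and the rest from the classic locale.
bool can_create_ctype_locale(const char* name)
{
    try {
        std::locale loc(std::locale::classic(), name, std::locale::ctype);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

According to the text above, both helpers are expected to return false for
".utf8", while a name such as "C" should be accepted.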

The only way to get a locale instance that will handle utf8 encoding
is to explicitly request that only the codecvt facet be based on utf8
encoding:

// Will succeed if there is necessary platform support.
std::locale loc(std::locale::classic(),
                new std::codecvt_byname<wchar_t, char, std::mbstate_t>(".utf8"));
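
  Once you have obtained a locale instance you can imbue a file stream with
it to read/write utf8 files:

// Use a wide stream: the codecvt<wchar_t, char, mbstate_t> facet is only
// consulted by wchar_t based file streams.
std::wfstream fstr("file.utf8");
fstr.imbue(loc);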
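As a slightly fuller illustration, the sketch below writes one line of wide
text through a locale and reads it back. It uses wide file streams, because
the codecvt<wchar_t, char, mbstate_t> facet is the one consulted by wchar_t
based streams. write_line and read_line are hypothetical helper names, and
the locale is imbued before the file is opened so that no I/O happens under
another encoding.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` followed by a newline to `path`,
// converting wide characters with the codecvt facet of `loc`.
bool write_line(const std::locale& loc, const char* path,
                const std::wstring& text)
{
    std::wofstream out;
    out.imbue(loc);   // imbue before opening, so no I/O has happened yet
    out.open(path);
    out << text << L'\n';
    out.close();      // flushes; a conversion or write failure sets failbit
    return !out.fail();
}

// Hypothetical helper: reads the first line of `path` back into wide
// characters, again through the codecvt facet of `loc`.
std::wstring read_line(const std::locale& loc, const char* path)
{
    std::wifstream in;
    in.imbue(loc);
    in.open(path);
    std::wstring line;
    std::getline(in, line);
    return line;
}
```

With a locale built on the utf8 codecvt facet, the bytes stored in the file
would be utf8; with the classic locale only plain ASCII text can be expected
to round-trip.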

You can also access the facet directly to perform utf8 encoding/decoding
operations:

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_t;
const codecvt_t& encoding = std::use_facet<codecvt_t>(loc);
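To show what a direct encoding operation might look like, here is a minimal
sketch that converts a wide string to its external byte representation with
the facet's out() member. encode_wide is a hypothetical helper name, and
throwing on any result other than ok is a simplification chosen for this
example.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_t;

// Hypothetical helper: encodes `text` with the codecvt facet found in `loc`.
std::string encode_wide(const std::locale& loc, const std::wstring& text)
{
    const codecvt_t& encoding = std::use_facet<codecvt_t>(loc);

    std::mbstate_t state = std::mbstate_t();   // initial conversion state
    // max_length() bounds the number of bytes produced per wide character.
    std::string bytes(text.size() * encoding.max_length() + 1, '\0');

    const wchar_t* from_next = 0;
    char* to_next = 0;
    codecvt_t::result res =
        encoding.out(state,
                     text.data(), text.data() + text.size(), from_next,
                     &bytes[0], &bytes[0] + bytes.size(), to_next);
    if (res != codecvt_t::ok)
        throw std::runtime_error("encode_wide: conversion failed");

    bytes.resize(to_next - &bytes[0]);         // keep only the bytes written
    return bytes;
}
```

Decoding works the same way in the other direction through the facet's in()
member, with the roles of the char and wchar_t buffers exchanged.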

Notes:

1. The dot ('.') is mandatory in front of utf8. This is a POSIX convention;
locale names have the following format:
language[_country[.encoding]]

Ex: 'fr_FR'
    'french'
    'ru_RU.koi8r'

2. For the moment, utf8 encoding is only supported under Windows. The less
common utf7 encoding is also supported.