==========
gchardet
==========

This is a charset detector based upon the `Mozilla Universal Charset
detector <http://www.mozilla.org/projects/intl/detectorsrc.html>`_ with
a *g-friendly* API.


Using it
========

For a complete example, take a look at ``test.c``; the whole API is
declared in ``gchardet.h``.


String-oriented API
-------------------

This is useful for quickly checking short strings: a single function does
all the work. For example::

    #include <stdio.h>
    #include <glib.h>
    #include "gchardet.h"

    int main (int argc, char **argv)
    {
        int i;

        /* argv[0] is the program name, so start with the first argument. */
        for (i = 1; i < argc; i++) {
            gchar *encoding = g_chardet_detect (argv[i], G_CHARDET_ANY);
            printf ("%s: %s\n", argv[i], (encoding != NULL) ? encoding : "unknown");
            /* The result is newly allocated (or NULL); release it with g_free(). */
            g_free (encoding);
        }
        return 0;
    }

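
The one-shot API takes no length argument, so it presumably expects
NUL-terminated text, which suits ``argv`` entries or small files read
into memory. The helper below is a minimal sketch in standard C and is
not part of gchardet: ``read_all()`` is a hypothetical name for a
function that reads a whole ``FILE *`` into a freshly allocated,
NUL-terminated buffer::

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper (not part of gchardet): read all of `fp` into a
     * malloc()ed, NUL-terminated buffer. Returns NULL on failure; the
     * caller frees the result. If `len_out` is non-NULL, it receives the
     * number of bytes read. */
    char *read_all (FILE *fp, size_t *len_out)
    {
        size_t cap = 4096, len = 0;
        char *buf = malloc (cap + 1);           /* +1 for the final NUL */

        if (buf == NULL)
            return NULL;

        for (;;) {
            size_t n = fread (buf + len, 1, cap - len, fp);
            len += n;
            if (len < cap) {
                /* A short read means end of file or a read error. */
                if (ferror (fp)) {
                    free (buf);
                    return NULL;
                }
                break;
            }
            /* Buffer full: double its capacity and keep reading. */
            char *bigger = realloc (buf, cap * 2 + 1);
            if (bigger == NULL) {
                free (buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }

        buf[len] = '\0';
        if (len_out != NULL)
            *len_out = len;
        return buf;
    }

Assuming the text is short, the result could then be passed to
``g_chardet_detect (text, G_CHARDET_ANY)`` and released with ``free()``.
Input that contains NUL bytes, such as UTF-16, would be cut short by a
NUL-terminated interface; the stream-oriented API, which takes an
explicit length, is likely the better fit there.
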

Stream-oriented API
-------------------

This is useful for large amounts of text that are fed into the detector
gradually, in a stream-like fashion. For example, you could add text from
a web page while it is being downloaded::

    gchar *buffer;
    const gchar *charset;
    g_chardet_t *cd = g_chardet_new (G_CHARDET_ANY);

    /* download_chunk() and do_something_else_with_buffer() stand for your
     * own code; download_chunk() is assumed to return a NUL-terminated
     * chunk, or NULL once the download is finished. */
    while ((buffer = download_chunk ()) != NULL) {
        /* The chunk length is passed explicitly; for data that may contain
         * NUL bytes, pass the real byte count instead of strlen(). */
        g_chardet_handle (cd, buffer, strlen (buffer));
        do_something_else_with_buffer (buffer);
    }

    /* Query the result once all the data has been fed in. */
    charset = g_chardet_charset (cd);
    if (charset != NULL) {
        printf ("Detected charset: %s\n", charset);
    }

    g_chardet_free (cd);

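
When the text comes from a file or a pipe rather than a download, the
chunks typically come from ``fread()``, which does not NUL-terminate its
output, so ``strlen()`` cannot be used to measure them. The following is
a minimal sketch in standard C, not part of gchardet; ``feed_stream()``,
``chunk_handler`` and ``count_bytes()`` are hypothetical names. It reads a
``FILE *`` in 4 KiB chunks and hands each chunk to a callback together
with its exact length; in real code the callback would forward to
``g_chardet_handle (cd, data, len)``::

    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical callback type; a real handler would call
     * g_chardet_handle (cd, data, len), with cd passed via user_data. */
    typedef void (*chunk_handler) (void *user_data, const char *data, size_t len);

    /* Feed `fp` to `handler` in fixed-size chunks. fread() does not add a
     * terminating NUL, so each chunk's length is fread()'s return value.
     * Returns the total number of bytes fed, or -1 on a read error. */
    long feed_stream (FILE *fp, chunk_handler handler, void *user_data)
    {
        char chunk[4096];
        size_t n;
        long total = 0;

        while ((n = fread (chunk, 1, sizeof chunk, fp)) > 0) {
            handler (user_data, chunk, n);
            total += (long) n;
        }
        return ferror (fp) ? -1 : total;
    }

    /* Stand-in handler for illustration only: totals the bytes it sees. */
    void count_bytes (void *user_data, const char *data, size_t len)
    {
        (void) data;                      /* contents are not inspected */
        *(size_t *) user_data += len;
    }

Keeping the reading loop separate from the detector means it can be
exercised without gchardet, as ``count_bytes()`` does; swapping in a
handler that calls ``g_chardet_handle()`` would not change the loop.
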

Building
========

1. Download a copy of the Firefox sources and unpack it::

       wget http://releases.mozilla.org/pub/mozilla.org/firefox/releases/3.5/source/firefox-3.5-source.tar.bz2
       tar -xjf firefox-3.5-source.tar.bz2

   (A copy of the sources is already included, in case you want to
   avoid downloading the 40 MiB Firefox tarball.)

2. Copy the relevant files over to this directory::

       cp mozilla-1.9.1/extensions/universalchardet/src/base/* .

3. Build::

       make
       make test

4. Enjoy::

       ./test < nginx.utf8