==========
 gchardet
==========

This is a charset detector based upon the `Mozilla Universal Charset
detector <http://www.mozilla.org/projects/intl/detectorsrc.html>`_ with
a *g-friendly* API.


Using it
========

For an example, take a look at ``test.c``; the whole API is declared in
``gchardet.h``.


String-oriented API
-------------------
This is useful for quickly checking short strings: a single function does
the whole job. For example::

  #include <stdio.h>
  #include <glib.h>
  #include "gchardet.h"

  int main (int argc, char **argv)
  {
    int i;
    /* Start at 1 to skip argv[0], the program name. */
    for (i = 1; i < argc; i++) {
      gchar *encoding = g_chardet_detect (argv[i], G_CHARDET_ANY);
      printf ("%s: %s\n", argv[i], (encoding != NULL) ? encoding : "unknown");
      g_free (encoding);
    }
    return 0;
  }


Stream-oriented API
-------------------
This is useful for big chunks of text which are to be gradually fed into the
detector, in a stream-like fashion. For example, you could add text from
a web page while it is being downloaded::

  gchar *buffer;
  g_chardet_t *cd = g_chardet_new (G_CHARDET_ANY);

  /* Feed each downloaded chunk to the detector as it arrives. */
  while ((buffer = download_chunk ()) != NULL) {
    g_chardet_handle (cd, buffer, strlen (buffer));
    do_something_else_with_buffer (buffer);
  }

  /* Report the result only if a charset was detected. */
  if (g_chardet_charset (cd) != NULL) {
    printf ("Detected charset: %s\n", g_chardet_charset (cd));
  }

  g_chardet_free (cd);
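
The snippet above relies on ``strlen``, which only works if every chunk is
a NUL-terminated string. When the data comes from ``fread`` or a socket,
it is probably safer to pass the byte count reported by the read call.
The following is a minimal sketch of such a chunked read loop using only
the C standard library. The names ``feed_stream``, ``chunk_fn`` and
``count_bytes`` are hypothetical helpers made up for illustration and are
not part of gchardet; the assumption that the third argument of
``g_chardet_handle`` is a byte count is inferred from the example above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical callback type: receives one chunk and its exact length. */
typedef void (*chunk_fn) (const char *data, size_t len, void *user_data);

/*
 * Read `fp` in fixed-size chunks and hand each one to `handle`.
 * Returns the total number of bytes fed, so the caller can sanity-check it.
 */
static size_t
feed_stream (FILE *fp, chunk_fn handle, void *user_data)
{
  char chunk[4096];
  size_t n, total = 0;

  /* fread() reports how many bytes it actually read; pass that count on
     instead of calling strlen(), since the chunk is not NUL-terminated. */
  while ((n = fread (chunk, 1, sizeof chunk, fp)) > 0) {
    handle (chunk, n, user_data);
    total += n;
  }
  return total;
}

/* Stand-in callback that only counts bytes. With gchardet, the callback
   would presumably wrap the call shown earlier instead, e.g.:
     g_chardet_handle ((g_chardet_t *) user_data, data, len);           */
static void
count_bytes (const char *data, size_t len, void *user_data)
{
  (void) data;
  *(size_t *) user_data += len;
}
```

Calling ``feed_stream (fp, count_bytes, &seen)`` on an open file adds the
file's size to ``seen``. Swapping in a callback that forwards to
``g_chardet_handle`` should feed the detector the same bytes, under the
assumption stated above.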


Building
========

1. Download a copy of the Firefox sources and unpack it::

    wget http://releases.mozilla.org/pub/mozilla.org/firefox/releases/3.5/source/firefox-3.5-source.tar.bz2
    tar -xjf firefox-3.5-source.tar.bz2

   (A copy of the sources is already included, just in case you
   want to avoid downloading the 40 MiB Firefox tarball.)

2. Copy the relevant files over to this directory::

    cp mozilla-1.9.1/extensions/universalchardet/src/base/* .

3. Build::

    make
    make test

4. Enjoy::

    ./test < nginx.utf8