Commit 01dbdf59135967c1262c69706a59fa62629b5522

Add some missing files and a list of files that still need porting.
AUTHORS
(7 / 0)
  
Jos van den Oever <jos@vandenoever.info>
Ben van Klinken <bvanklinken@gmail.com>
Flavio Castelli <micron@madlab.it>
Arend van Beelen jr. <www.arendjr.nl>
Christian Ehrlicher <ch.ehrlicher@gmx.de>
Christopher Blauvelt <cblauvelt@gmail.com>
Jakub Stachowski <qbast@go2.pl>
COPYING
(486 / 0)
  
NOTE! The LGPL below is copyrighted by the Free Software Foundation, but
the instance of code that it refers to (strigi) are copyrighted
by the authors who actually wrote it.

---------------------------------------------------------------------------
 GNU LIBRARY GENERAL PUBLIC LICENSE
 Version 2, June 1991

 Copyright (C) 1991 Free Software Foundation, Inc.
 51 Franklin Street, Fifth Floor
 Boston, MA 02110-1301, USA.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

[This is the first released version of the library GPL. It is
 numbered 2 because it goes with version 2 of the ordinary GPL.]

 Preamble

 The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.

 This license, the Library General Public License, applies to some
specially designated Free Software Foundation software, and to any
other libraries whose authors decide to use it. You can use it for
your libraries, too.

 When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

 To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if
you distribute copies of the library, or if you modify it.

 For example, if you distribute copies of the library, whether gratis
or for a fee, you must give the recipients all the rights that we gave
you. You must make sure that they, too, receive or can get the source
code. If you link a program with the library, you must provide
complete object files to the recipients so that they can relink them
with the library, after making changes to the library and recompiling
it. And you must show them these terms so they know their rights.

 Our method of protecting your rights has two steps: (1) copyright
the library, and (2) offer you this license which gives you legal
permission to copy, distribute and/or modify the library.

 Also, for each distributor's protection, we want to make certain
that everyone understands that there is no warranty for this free
library. If the library is modified by someone else and passed on, we
want its recipients to know that what they have is not the original
version, so that any problems introduced by others will not reflect on
the original authors' reputations.

 Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that companies distributing free
software will individually obtain patent licenses, thus in effect
transforming the program into proprietary software. To prevent this,
we have made it clear that any patent must be licensed for everyone's
free use or not licensed at all.

 Most GNU software, including some libraries, is covered by the ordinary
GNU General Public License, which was designed for utility programs. This
license, the GNU Library General Public License, applies to certain
designated libraries. This license is quite different from the ordinary
one; be sure to read it in full, and don't assume that anything in it is
the same as in the ordinary license.

 The reason we have a separate public license for some libraries is that
they blur the distinction we usually make between modifying or adding to a
program and simply using it. Linking a program with a library, without
changing the library, is in some sense simply using the library, and is
analogous to running a utility program or application program. However, in
a textual and legal sense, the linked executable is a combined work, a
derivative of the original library, and the ordinary General Public License
treats it as such.

 Because of this blurred distinction, using the ordinary General
Public License for libraries did not effectively promote software
sharing, because most developers did not use the libraries. We
concluded that weaker conditions might promote sharing better.

 However, unrestricted linking of non-free programs would deprive the
users of those programs of all benefit from the free status of the
libraries themselves. This Library General Public License is intended to
permit developers of non-free programs to use free libraries, while
preserving your freedom as a user of such programs to change the free
libraries that are incorporated in them. (We have not seen how to achieve
this as regards changes in header files, but we have achieved it as regards
changes in the actual functions of the Library.) The hope is that this
will lead to faster development of free libraries.

 The precise terms and conditions for copying, distribution and
modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, while the latter only
works together with the library.

 Note that it is possible for a library to be covered by the ordinary
General Public License rather than by this special one.

 GNU LIBRARY GENERAL PUBLIC LICENSE
 TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

 0. This License Agreement applies to any software library which
contains a notice placed by the copyright holder or other authorized
party saying it may be distributed under the terms of this Library
General Public License (also called "this License"). Each licensee is
addressed as "you".

 A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.

 The "Library", below, refers to any such software library or work
which has been distributed under these terms. A "work based on the
Library" means either the Library or any derivative work under
copyright law: that is to say, a work containing the Library or a
portion of it, either verbatim or with modifications and/or translated
straightforwardly into another language. (Hereinafter, translation is
included without limitation in the term "modification".)

 "Source code" for a work means the preferred form of the work for
making modifications to it. For a library, complete source code means
all the source code for all modules it contains, plus any associated
interface definition files, plus the scripts used to control compilation
and installation of the library.

 Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running a program using the Library is not restricted, and output from
such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.

 1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any
warranty; and distribute a copy of this License along with the
Library.

 You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.

 2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

 a) The modified work must itself be a software library.

 b) You must cause the files modified to carry prominent notices
 stating that you changed the files and the date of any change.

 c) You must cause the whole of the work to be licensed at no
 charge to all third parties under the terms of this License.

 d) If a facility in the modified Library refers to a function or a
 table of data to be supplied by an application program that uses
 the facility, other than as an argument passed when the facility
 is invoked, then you must make a good faith effort to ensure that,
 in the event an application does not supply such function or
 table, the facility still operates, and performs whatever part of
 its purpose remains meaningful.

 (For example, a function in a library to compute square roots has
 a purpose that is entirely well-defined independent of the
 application. Therefore, Subsection 2d requires that any
 application-supplied function or table used by this function must
 be optional: if the application does not supply it, the square
 root function must still compute square roots.)

These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Library,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Library, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote
it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Library.

In addition, mere aggregation of another work not based on the Library
with the Library (or with a work based on the Library) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

 3. You may opt to apply the terms of the ordinary GNU General Public
License instead of this License to a given copy of the Library. To do
this, you must alter all the notices that refer to this License, so
that they refer to the ordinary GNU General Public License, version 2,
instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.

 Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.

 This option is useful when you wish to copy part of the code of
the Library into a program that is not a library.

 4. You may copy and distribute the Library (or a portion or
derivative of it, under Section 2) in object code or executable form
under the terms of Sections 1 and 2 above provided that you accompany
it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange.

 If distribution of object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the
source code from the same place satisfies the requirement to
distribute the source code, even though third parties are not
compelled to copy the source along with the object code.

 5. A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.

 However, linking a "work that uses the Library" with the Library
creates an executable that is a derivative of the Library (because it
contains portions of the Library), rather than a "work that uses the
library". The executable is therefore covered by this License.
Section 6 states terms for distribution of such executables.

 When a "work that uses the Library" uses material from a header file
that is part of the Library, the object code for the work may be a
derivative work of the Library even though the source code is not.
Whether this is true is especially significant if the work can be
linked without the Library, or if the work is itself a library. The
threshold for this to be true is not precisely defined by law.

 If such an object file uses only numerical parameters, data
structure layouts and accessors, and small macros and small inline
functions (ten lines or less in length), then the use of the object
file is unrestricted, regardless of whether it is legally a derivative
work. (Executables containing this object code plus portions of the
Library will still fall under Section 6.)

 Otherwise, if the work is a derivative of the Library, you may
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.

 6. As an exception to the Sections above, you may also compile or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
under terms of your choice, provided that the terms permit
modification of the work for the customer's own use and reverse
engineering for debugging such modifications.

 You must give prominent notice with each copy of the work that the
Library is used in it and that the Library and its use are covered by
this License. You must supply a copy of this License. If the work
during execution displays copyright notices, you must include the
copyright notice for the Library among them, as well as a reference
directing the user to the copy of this License. Also, you must do one
of these things:

 a) Accompany the work with the complete corresponding
 machine-readable source code for the Library including whatever
 changes were used in the work (which must be distributed under
 Sections 1 and 2 above); and, if the work is an executable linked
 with the Library, with the complete machine-readable "work that
 uses the Library", as object code and/or source code, so that the
 user can modify the Library and then relink to produce a modified
 executable containing the modified Library. (It is understood
 that the user who changes the contents of definitions files in the
 Library will not necessarily be able to recompile the application
 to use the modified definitions.)

 b) Accompany the work with a written offer, valid for at
 least three years, to give the same user the materials
 specified in Subsection 6a, above, for a charge no more
 than the cost of performing this distribution.

 c) If distribution of the work is made by offering access to copy
 from a designated place, offer equivalent access to copy the above
 specified materials from the same place.

 d) Verify that the user has already received a copy of these
 materials or that you have already sent this user a copy.

 For an executable, the required form of the "work that uses the
Library" must include any data and utility programs needed for
reproducing the executable from it. However, as a special exception,
the source code distributed need not include anything that is normally
distributed (in either source or binary form) with the major
components (compiler, kernel, and so on) of the operating system on
which the executable runs, unless that component itself accompanies
the executable.

 It may happen that this requirement contradicts the license
restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.

 7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
library, provided that the separate distribution of the work based on
the Library and of the other library facilities is otherwise
permitted, and provided that you do these two things:

 a) Accompany the combined library with a copy of the same work
 based on the Library, uncombined with any other library
 facilities. This must be distributed under the terms of the
 Sections above.

 b) Give prominent notice with the combined library of the fact
 that part of it is a work based on the Library, and explaining
 where to find the accompanying uncombined form of the same work.

 8. You may not copy, modify, sublicense, link with, or distribute
the Library except as expressly provided under this License. Any
attempt otherwise to copy, modify, sublicense, link with, or
distribute the Library is void, and will automatically terminate your
rights under this License. However, parties who have received copies,
or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.

 9. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Library or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Library (or any work based on the
Library), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Library or works based on it.

 10. Each time you redistribute the Library (or any work based on the
Library), the recipient automatically receives a license from the
original licensor to copy, distribute, link with or modify the Library
subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

 11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Library at all. For example, if a patent
license would not permit royalty-free redistribution of the Library by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Library.

If any portion of this section is held invalid or unenforceable under any
particular circumstance, the balance of the section is intended to apply,
and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

 12. If the distribution and/or use of the Library is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Library under this License may add
an explicit geographical distribution limitation excluding those countries,
so that distribution is permitted only in or among countries not thus
excluded. In such case, this License incorporates the limitation as if
written in the body of this License.

 13. The Free Software Foundation may publish revised and/or new
versions of the Library General Public License from time to time.
Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Library
specifies a version number of this License which applies to it and
"any later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.

 14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status
of all derivatives of our free software and of promoting the sharing
and reuse of software generally.

 NO WARRANTY

 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.

 END OF TERMS AND CONDITIONS
 How to Apply These Terms to Your New Libraries

 If you develop a new library, and you want it to be of the greatest
possible use to the public, we recommend making it free software that
everyone can redistribute and change. You can do so by permitting
redistribution under these terms (or, alternatively, under the terms of the
ordinary General Public License).

 To apply these terms, attach the following notices to the library. It is
safest to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.

 <one line to give the library's name and a brief idea of what it does.>
 Copyright (C) <year> <name of author>

 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 2 of the License, or (at your option) any later version.

 This library is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 Lesser General Public License for more details.

 You should have received a copy of the GNU Lesser General Public
 License along with this library; if not, write to the Free Software
 Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the library, if
necessary. Here is a sample; alter the names:

 Yoyodyne, Inc., hereby disclaims all copyright interest in the
 library `Frob' (a library for tweaking knobs) written by James Random Hacker.

 <signature of Ty Coon>, 1 April 1990
 Ty Coon, President of Vice

That's all there is to it!
ChangeLog
(218 / 0)
  
0.7.2
 - Improve cpp analyzer speed and output
 - Fix crash due to deep nesting of calls in pdf analyzer
 - Fix iconv use on Mac OS X
0.7.1
 - Support more fields from ODF documents
 - Improved skipping behavior on streams for large files.
 - Added album art support.
 - Added support for ID3v1 tags.
 - Added MP3 stream metadata extraction, UTF-16 support in tags.
 - Extended the range of metadata extracted by ID3 analyzer.
 - Added a FLAC audio file analyzer.
 - Significantly unbreak the PDF analyzer.
 - Fix scanning trees where permissions are insufficient to read some parts
 - Check for multithreaded version of libxml2
 - Require newer CLucene version (0.9.21)
0.7.0
 - Change to Nepomuk ontologies (Evgeny Egorochkin)
 - Set file property for embedded ar streams. This fixes the opening of these streams in archivereader.
 - Instead of reading each .rdf file into memory at once and then parsing it, use the libxml2 I/O API to read chunks of the file when requested.
 - The attribute value is not '\0' terminated but has a pointer to the end of the string. In addition, string comparison was sped up by first comparing the string length.
0.6.5
 - Fix KDE bug 185551: Strigi now allows paths that start with protocol:/* like 'file:///' or 'remote:/'
 - Add a new function AnalysisResult::child(). This function allows an AnalysisResult instance to access the last child it has had indexed. This is needed for cases when a parent knows something about a child which the child does not know. In such cases the parent can call child()->addValue(...).
 - Adjust to the new library naming scheme in iconv-1.12
 - Implemented missing addTriplet method
 - Rewrite the implementation of ArchiveReader. The new implementation is more
   efficient in listing contents of directories. Now single directory entries can be returned without the need for reading the entire archive of which the directory is a part.
0.6.4
 - Path fixes to the build system for the benefit of Windows users (sengels)
 - Clean up of class ArchiveReader
 - Support for LZMA compressed streams in archives, notably .deb and .rpm
 - Remove preceding ./ from file path in tar archives.
 - Make parsing ar and deb files easier to abort: useful in e.g. Dolphin
 - Better method of removing deleted files from the CLucene index
 - Do not tokenize the URL in the index to improve polling speed
 - Fix the bz2 header check: more bz2 archives are recognized (pino)
 - Fix infinite loop on parsing SGI image files
 - Fix reading of zip files without central directory.
0.6.3
 - Move Strigi::DirLister in archivereader.h to ArchiveReader::DirLister. Two classes with this name were present in the code. The one in archivereader.h was not used in any code outside of Strigi, so we are changing it. Note that this change means that one should not use Strigi 0.6.2.
 - Change type of EntryInfo.mtime from 'unsigned' to time_t.
 - The spec of SDF files was found and used to implement a more precise syntax check for the header of SDF files.
 - Fix memory corruption bug in ArchiveReader.
 - Change type of ontology entry 'exposureTime' to string. In theory something like duration would make sense but in practice xsd:string is the used one.
 - Add a default rule to find mail box directories with pattern '.*.directory'. Since these directory names start with a dot, they are normally not found.
 - Add '$HOME/.kde4' to the directories that are indexed by default.
 - Simplify matching of file paths in the rules for including or excluding directories from the index. The code is now more readable and easier to maintain.
 - Fix a big performance problem: Whenever a directory mtime changed, all files inside the directory were re-indexed.
 - Fix bug where a gz archive that contains a file that is identical to the
   original archive is indexed over and over. The depth of nested files that are indexed is now limited to 127.
0.6.2
 - Better support for nice IO priorities on Linux (Sebastian Trueg)
 - Compile with development version of CLucene (Ben van Klinken)
 - Explicitly use 'unsigned char' or 'signed char' instead of 'char' since 'char' can be either signed or unsigned on different processors. E.g. on ARM 'char' means 'unsigned char' and on i386 'char' means 'signed char'. This change makes libstreamanalyzer 0.6.2 binary incompatible with versions < 0.6.0. (Jos van den Oever)
 - Many CMake cleanups (Alexander Neundorf)
 - 6.5x speedup of C++ comment analyzer (Jakub Stachowski)
 - Various stability fixes (Jos van den Oever, Sebastian Trueg)
 - Support for ePub format (Jakub Stachowski)
 - Handle RIFF file with unspecified size for the RIFF packet. (Jos van den Oever)
0.5.11
 - Fix a bug that can cause a crash on an executable zip file.
 - Fix parsing of empty headers when CRLFCRLF is followed by a space. In other words, fix parsing of emails that have a space as the first character in the body.
 - Fix two broken (by design) throughanalyzers by replacing them with one eventanalyzer.
 - Updated xesam ontology to include proper ranges. This is necessary for the Nepomuk backend but does not change anything for clucene (where everything is a string anyway)
 - Make sure the app can handle environments where HOME is not defined.
 - Make the zip analyzer check more often if it should stop analyzing.
 - Fix wrong comparison when checking if we are finished yet.
 - Make the analyzer respect a configuration that only wants part of the stream to be analyzed.
 - Add an analyzer for Windows self-extracting zip archives.
 - Ask the analyzerconfiguration if we should continue and put a cap on the maximum length of stream we read
 - Log parse errors in the analysisresult.
730.5.10
74 - Improved Xesam support. strigidaemon can now be queried with the client from
75 the Xesam test suite.
76 - Fix a bug in subinputstream.
77 Under certain circumstances the function read() of the internal stream could
78 be called with max < min. read() specifies that in such cases, there is no
79 limit on the number of bytes that may be read. This would cause
80 SubInputStream to malfunction because it would allow too much of the internal
81 stream to be read.
82 - Reenable a number of endanalyzers.
83 By accident, the analyzers for .tar and .gz files were disabled in the
84 previous release. Now they are re-enabled.
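
The subinputstream entry above describes a read(min, max) convention in which max < min means "no upper limit". A wrapper that exposes only part of an inner stream therefore has to clamp the request before forwarding it. The sketch below shows one way to do that; ReadBounds and clampToRemaining are hypothetical names, and this is not necessarily how the actual fix was written.

```cpp
#include <cassert>
#include <cstdint>

// Bounds to pass on to the inner stream's read().
struct ReadBounds {
    int32_t min;
    int32_t max;
};

// Clamp a read request so that at most 'remaining' bytes of the inner
// stream can be consumed. A request with max < min ("no limit") is turned
// into an explicit limit instead of being forwarded unchanged.
ReadBounds clampToRemaining(int32_t min, int32_t max, int64_t remaining) {
    assert(remaining >= 0);
    const int32_t cap = remaining > INT32_MAX
        ? INT32_MAX
        : static_cast<int32_t>(remaining);
    if (max < min || max > cap) {
        max = cap;
    }
    if (min > max) {
        min = max;
    }
    return ReadBounds{min, max};
}
```

After clamping, max is never smaller than min, so the inner stream never sees the "no limit" form of the request.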
850.5.9
86 - Fix bug that would severely bloat the strigi index.
87 - Improve latency when calling strigi to stop.
88 - Better (but not yet complete) Xesam support
890.5.8
90 - Improve quitting latency of the most important analyzers. Now Strigi reacts more quickly when you tell it to stop indexing.
91 - Add a tool to analyze the analyzer latency profile and find analyzers that have a high latency.
92 - Bring field names in line with the Xesam ontology.
93 - New analyzers for avi, wav, dds, rgb, sid and ico file types.
94 - Fix deepgrep (finally working again since 0.5.2) and extend the number of fields deepgrep searches in. Now it also searches in fields that are passed as "unsigned char*" to the IndexWriter, but only if they are not registered as being binary fields.
95 - Install two headers that provide metadata information about field types. Basically, these classes publish the ontology that strigi uses.
96 - Fix a problem with CLucene throwing CLuceneError. Because of -fvisibility=hidden, the code did not recognize CLuceneError and caused it to fall through, thus crashing programs using libstreamanalyzer. A unit test to avoid the problem from reappearing has been added.
97 - Fix for systems where setenv() is not available (for instance Windows). Hopefully those systems have putenv() :)
98 - Remove support for starting strigidaemon with an arbitrary index type and index dir, but add an option to use a different configuration file. This effectively gives the user the same possibilities.
99 - Fixes to the build system that allow strigi to be built and tested as part of a larger project (e.g. kdesupport).
100 - 'strigicmd listFiles' now can be used to retrieve all files/dir indexed under a certain path
101 - Added support for Gentoo-way compilation flags. Implemented more consistent and pretty optional dependency handling.
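
The setenv()/putenv() entry above could look roughly like the following. This is a sketch under assumptions: HAVE_SETENV is a hypothetical macro that a build system would define, and setEnvironmentVariable is a hypothetical helper, not a Strigi function.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Set an environment variable, preferring setenv() and falling back to
// putenv() where setenv() is missing. Returns 0 on success.
int setEnvironmentVariable(const std::string& name, const std::string& value) {
    assert(!name.empty());
#ifdef HAVE_SETENV  // hypothetical macro provided by the build system
    return setenv(name.c_str(), value.c_str(), 1);
#else
    // putenv() keeps the pointer it is given instead of copying the string,
    // so the buffer is deliberately never freed.
    const std::string entry = name + "=" + value;
    char* buffer = new char[entry.size() + 1];
    std::memcpy(buffer, entry.c_str(), entry.size() + 1);
    return putenv(buffer);
#endif
}
```

The important difference between the two calls is ownership: setenv() copies its arguments, while putenv() makes the passed string part of the environment.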
1020.5.7
103 - use plugins instead of shared libraries for the indexer backends
104 - lots of bugfixes and cleanups
105 - allow backends to be used in RAM by using ':memory:' as the index name
1060.5.6
107 - Added Xesam User Language parser. Now it will be possible to handle Xesam UserLanguage queries (http://wiki.freedesktop.org/wiki/XesamUserSearchLanguage).
108 - Replaced .ini-based ontology parser with RDF/XML one.
109 - Updated strigicmd: now it's possible to perform searches formulated
110 following xesam userlanguage specifications.
111 - Improved ontology introspection API: properties and classes now have child lists and applicable classes/properties lists.
112 - change IndexReader::getFiles to IndexReader::getChildren.
113 - removed IndexReader::documentId and IndexReader::mTime.
114 - loads of build issues fixed
115 - added a script that helps you to find the patch that broke a unit test
116 - add fieldname for document content per the Xesam standard.
117 - lots more
1180.5.5
119 - GUI now uses a .ui file making future improvements much easier
120 - install detection script for ease of use in other cmake projects
121 - modifying the signature of endAnalysis to endAnalysis(bool complete)
122 for StreamLineAnalyzer, StreamEventAnalyzer, and StreamSaxAnalyzer
123 - add a function to AnalyzerConfiguration that tells how many bytes can
124 be read at most from a stream
125 - add an SAX analyzer plugin that extracts the namespaces used in XML
126 documents. With this it is possible to get all XML documents that contain e.g.
127 Chemical Markup Language or Dublin Core.
128 - add a stream for changing the encoding of an incoming stream on the fly
129 - use the new encoding stream to do better email parsing
130 - add m3u stream analyzer.
131 - add simple test program for strigi xesam query builder. It loads a file
132 containing the xesam query. It converts the xesam query into a Strigi::Query
133 object. It serializes the Strigi::Query object to xml for e.g. quality
134 control.
135 - add xesamquery option to strigicmd: now it's possible to make queries
136 using Xesam language.
137 - add XesamQueryLanguage queries support. Now it is possible to translate
138 xesam queries formulated using XesamQueryLanguage into Strigi::Query objects.
139 - add a cgi executable that takes multipart/form-data and outputs an analysis
140 of the data as xml
141 - give xmlindexer the ability to read from stdin
142 - big improvement in parsing ms word files
143 - better input sanity checking. thanks to zzuf for reporting the errors
144 - cleanup of private variables in classes by introducing a d-pointer
145
1460.5.4
147 - simplify PollingListener by letting it reuse code from DirAnalyzer
148 - improve parsing speed by reading incrementally large blocks and only if no throughanalyzer is ready yet
149 - extract more data from ogg and ID3 files
150 - new registerField(fieldname) function that gets additional data from the
151 ontology
152 - support of indexwriter calls: addValue(index, field, data, size),
153 addValue(index, field, double_value) to CLucene backend.
154 - enable passing of "Tokenized" flag parameter to CLucene backend
155 - support for the Keyword Terms which are not tokenized during queries
156 - handling of optional indexing flags, which are loaded from the ontology
157 - handling of cardinality constraint when indexing
158 - add keyword query type which allows for using keywords that are not split
159 up. e.g. chemistry.molecular_formula#"C 4 H 10". basically "#" sign tells -- do not tokenize
160 - parse the userlanguage wrapped in xesam query language xml
161 - add serialization to xml for Strigi::Query and Strigi::Term, useful for
162 debugging purposes
163 - add types from the xesam dbus interface to strigitypes.h
164 - add support for gif files
165 - add support for analyzing jpeg files.
166 - add prioritized, multithreaded queue for incoming requests
167 - add option --lastfiletoskip to diranalyzer and xmlindexer
168 - add support for Cc: Bcc: Message-ID: In-Reply-To: References: From: and To:
169 - add exclude and include filters to strigicmd create and update commands
170 - add deindex option, it can be used for removing dirs or files from an index
171 created by strigi
172
1730.3.11
174 - SunOS, BSD, 64 bit and Coverity compatibility fixes
175 - Search in a set of default fields and not just in the text content of a file, if no specific field is specified.
176 - Add histogram widget to simple search client
177 - Add support for Ogg Vorbis
178 - Better decoding of email headers
179 - Expand Query object to handle nested queries
180 - Fix highlighting and display of title in search results.
181 - Fix path for the child indexables
182 - Fix memory problems in archivereader
183 - Check for too short file names and omit the RPM trailer from the results.
184 - Add an additional unit test for the RPM stream provider.
185 - Revert raise() to kill(getpid()) because raise hangs the thread.
186 - Install qtdbus library for strigi.
187
1880.3.10
189 - Convenience classes for using Strigi over Qt 4.2 DBus
190 - Change buildsystem to allow building of deepfind, deepgrep and xmlindexer
191 separately
192 - Speedup of deepfind by selectively using only the analyzers deepfind needs
193 - Many portability fixes (GCC 3, Forte, MSVC)
194 - New, more efficient plugin loading
195 - Add IFilter plugin for the Windows version
196 - Remove the big Strigi lock (faster indexing)
197 - Switch strigiclient to communicate over DBus instead of over a unix socket
198 - Reorganization of the indexer with a new IndexerConfiguration
199 - Improvements of file name filters
200 - New Qt widget for configuring file name filters
201 - Add file name setting to the DBus interface
202 - Move verbose unit tests
203 - Bugfixes in some streams
204
2050.3.9
206 - Added deepfind and deepgrep, programs that are enhanced versions of find
207 and grep.
208 - Added a new way of storing the configuration in an xml file.
209 - Added a way to search in multiple indexes.
210 - Added xmlindexer, a program that outputs the file parsing results as xml.
211 This is convenient for debugging and can also be used by other programs that
212 do not want to write their own indexer. It makes the superior Strigi
213 indexer available to other software in a convenient way.
214 - More versatile filters that determine which files to index. (Flavio
215 Castelli)
216 - Add possibility to index files from the client by feeding the file into the
217 daemon. This opens the way to indexing email from remote servers and web
218 pages.
README.win32
(57 / 0)
  
1==Strigi On Windows==
2
3
4Microsoft Visual compiling instructions
5=======================================
6
7I have managed to get Strigi to compile with Microsoft Visual (6 and 8/.NET). So far I have only ported the CLucene indexer since my interest in Strigi is as an indexing application.
8
9Steps to get it to work:
10
111. I had to get all the dependencies from the http://gnuwin32.sourceforge.net/packages.html website.
12I found putting all the libraries into 1 folder resembling linux structure was the easiest:
13
14package
15 \bin
16 bzip2.dll, magic1.dll, zlib1.dll, etc
17 \include
18 bzlib.h, iconv.h, magic.h, zconf.h, zlib.h, etc
19 \lib
20 bzip2.lib, iconv.lib, etc
21
22Basically you just need to download each package that cmake requests. If there is enough interest, I may make my dependencies directory available somewhere so that you can get started quicker.
23
24You'll also need to download CLucene from http://clucene.sourceforge.net. You can either:
25
26* build CLucene - you should build with Unicode, multithreaded and use the same type of runtime libraries as strigi
27OR
28* modify the build once cmake is finished and add the clucene CLMonolithic.cpp to luceneindexer (or whatever binary you end up making). This can be easier since it's a bit quicker and easier to match the library's runtime libraries, etc.
29
302. Get CMake running. Select the strigi folder to configure, and where to build the binaries. Then hit configure; it will ask what you want to build for. I used Visual Studio 6, but I suppose other compilers might work too.
313. Almost immediately it will complain about missing dependencies. Don't worry, this is because you haven't specified the dependencies folder. Under STRIGI_EXTRA_(INC|LIB)_DIRECTORY set the appropriate paths - also add the src directory of the clucene package to the includes directory (separate with a semicolon ;).
324. Turn off ENABLE_DBUS in the CMake config; I haven't tried getting it to work yet (though you might be able to).
335. Set the EXECUTABLE_OUTPUT_PATH and LIBRARY_OUTPUT_PATH, this will make things easier since everything won't be spread out.
346. Hit Ok and you should have all your libraries built.
35
36I haven't got all the packages running yet, but there are a few that you should compile straight away. Start with streams and streamindexer, these are the core of strigi.
37
38Now the actual indexer... luceneindexer. Hopefully it's just a matter of compiling it, otherwise let me know :)
39
40The next aim is to get the daemon and the inotify equivalent stuff working
41
42
43Mingw compiling instructions
44============================
45
461. Install all the dependencies from the http://gnuwin32.sourceforge.net/packages.html
47 website into a common directory, which will be used in step 5.
482. download and install recent cmake version from http://www.cmake.org
493. download CLucene from http://clucene.sourceforge.net and build it. See the msvc section for the
50 available options
514. create build dir for example <your buildroot>\strigi-mingw-build
525. enter the build dir and run
53 cmake -G "MinGW Makefiles" <strigi-source-root> -DCMAKE_INCLUDE_PATH=<gnuwin32-installation-root>\include -DCMAKE_LIBRARY_PATH=<gnuwin32-installation-root>\lib
546. run
55 mingw32-make install
56 to compile and install strigi into <ProgramFiles>/strigi
57
SPLIT_TODO
(4 / 0)
  
1- see how to support uninstall in the subprojects (cmake_uninstall.cmake.in)
2- FindCppUnit.cmake
3- Stop supporting expat and clean up remnants since we need libxml2 anyway, expat has no value anymore (is it faster?)
4- FindRegex.cmake
TODOFILES
(199 / 0)
  
1AUTHORS
2ChangeLog
3- cleanindexatts.sh
4- cmake/cmake_uninstall.cmake.in
5- cmake/FindBZip2.cmake
6? cmake/FindCppUnit.cmake
7- cmake/FindExpat.cmake
8- cmake/FindHyperEstraier.cmake
9- cmake/FindLibXml2.cmake
10- cmake/FindQt4.cmake
11? cmake/FindRegex.cmake
12- cmake/FindSQLite.cmake
13? cmake/FindXAttr.cmake
14? cmake/FindXSD.cmake
15? cmake/MacroPushRequiredVars.cmake
16COPYING
17? doc/fieldproperties.txt
18? doc/images/streamindexer.svg
19? doc/xesam/xesam.rdfs
20? Doxyfile
21NEWS
22README.win32
23# archivereader was not installed, byebye
24- src/archiveengine/archivedirengine.cpp
25- src/archiveengine/archivedirengine.h
26- src/archiveengine/archiveengine.cpp
27- src/archiveengine/archiveengine.h
28- src/archiveengine/archiveenginehandler.cpp
29- src/archiveengine/archiveenginehandler.h
30- src/archiveengine/fsfileinputstream.cpp
31- src/archiveengine/fsfileinputstream.h
32- src/archiveengine/streamengine.cpp
33- src/archiveengine/streamengine.h
34- src/archiveengine/tests/ArchiveEngineHandlerTest.cpp
35- src/archiveengine/tests/ArchiveEngineHandlerTest.h
36- src/archiveengine/tests/makestestrunner.pl
37- src/archiveengine/tests/valgrindtest.sh
38# archivecat was not installed, byebye
39- src/archivereader/archivecat.cpp
40- src/archivereader/qclient/archiveenginehandler.cpp
41- src/archivereader/qclient/archiveenginehandler.h
42- src/archivereader/qclient/filebrowser.cpp
43- src/archivereader/qclient/filebrowser.h
44- src/archivereader/qclient/filehandler.cpp
45- src/archivereader/qclient/fsfileinputstream.cpp
46- src/archivereader/qclient/fsfileinputstream.h
47- src/archivereader/qclient/qclient.cpp
48# no-one was using the estraier backend
49- src/estraierindexer/estraierindexer.cpp
50- src/estraierindexer/estraierindexmanager.cpp
51- src/estraierindexer/estraierindexmanager.h
52- src/estraierindexer/estraierindexreader.cpp
53- src/estraierindexer/estraierindexreader.h
54- src/estraierindexer/estraierindexwriter.cpp
55- src/estraierindexer/estraierindexwriter.h
56- src/estraierindexer/tests/EstraierTest.cpp
57? src/indexertests/indexmanagertests.cpp
58? src/indexertests/indexmanagertests.h
59? src/indexertests/indexreadertests.cpp
60? src/indexertests/indexreadertests.h
61? src/indexertests/indexwritertests.cpp
62? src/indexertests/indexwritertests.h
63? src/indexertests/verify.h
64# qclientarchivecat and archivedialog were not installed
65- src/qclient/archivecat.cpp
66- src/qclient/archivedialog.cpp
67- src/qclient/filebrowser.cpp
68- src/qclient/filebrowser.h
69- src/qclient/filehandler.cpp
70# no-one was using the sqlite backend
71- src/sqliteindexer/sqliteindexer.cpp
72- src/sqliteindexer/sqliteindexmanager.cpp
73- src/sqliteindexer/sqliteindexmanager.h
74- src/sqliteindexer/sqliteindexreader.cpp
75- src/sqliteindexer/sqliteindexreader.h
76- src/sqliteindexer/sqliteindexwriter.cpp
77- src/sqliteindexer/sqliteindexwriter.h
78- src/sqliteindexer/tests/simpletest.cpp
79- src/sqliteindexer/tests/SqliteTest.cpp
80? src/streamanalyzer/fieldproperties/chemical.rdfs
81# fieldproperties files are not used anymore, right?
82- src/streamanalyzer/fieldproperties/strigi_chemistry.fieldproperties
83- src/streamanalyzer/fieldproperties/strigi_cursor.fieldproperties
84- src/streamanalyzer/fieldproperties/strigi_diff.fieldproperties
85- src/streamanalyzer/fieldproperties/strigi_documentstats.fieldproperties
86- src/streamanalyzer/fieldproperties/strigi_font.fieldproperties
87- src/streamanalyzer/fieldproperties/strigi_ole.fieldproperties
88- src/streamanalyzer/fieldproperties/strigi_source_code.fieldproperties
89- src/streamanalyzer/filelistertest.cpp
90- src/streamanalyzer/indexwriter.cpp
91- src/streamanalyzer/programthroughanalyzer.h
92? src/streamanalyzer/tests/indextests.cpp
93? src/streamanalyzer/tests/querytests.cpp
94# what to do with xesam queries ?
95? src/streamanalyzer/xesam/location.hh
96? src/streamanalyzer/xesam/position.hh
97? src/streamanalyzer/xesam/stack.hh
98? src/streamanalyzer/xesam/StrigiQueryBuilder.cc
99? src/streamanalyzer/xesam/StrigiQueryBuilder.h
100? src/streamanalyzer/xesam/test.cpp
101? src/streamanalyzer/xesam/testqueries/africa2.txt
102? src/streamanalyzer/xesam/testqueries/africa.txt
103? src/streamanalyzer/xesam/testqueries/helloworld2.txt
104? src/streamanalyzer/xesam/testqueries/helloworld3.txt
105? src/streamanalyzer/xesam/testqueries/helloworld.txt
106? src/streamanalyzer/xesam/testqueries/hendrix2.txt
107? src/streamanalyzer/xesam/testqueries/hendrix.txt
108? src/streamanalyzer/xesam/testqueries/hendrix.xml
109? src/streamanalyzer/xesam/testqueries/inSet.xml
110? src/streamanalyzer/xesam/testqueries/irc_oever.xml
111? src/streamanalyzer/xesam/testqueries/negate.txt
112? src/streamanalyzer/xesam/testqueries/uglyduckling2.xml
113? src/streamanalyzer/xesam/testqueries/uglyduckling.xml
114? src/streamanalyzer/xesam/testqueries/userQuery.xml
115? src/streamanalyzer/xesam/xesam2strigi.cpp
116? src/streamanalyzer/xesam/xesam2strigi.h
117? src/streamanalyzer/xesam/XesamParser.h
118? src/streamanalyzer/xesam/XesamQLParser.cc
119? src/streamanalyzer/xesam/XesamQLParser.h
120? src/streamanalyzer/xesam/XesamQueryBuilder.cc
121? src/streamanalyzer/xesam/XesamQueryBuilder.h
122? src/streamanalyzer/xesam/xesam_ul_driver.cc
123? src/streamanalyzer/xesam/xesam_ul_driver.hh
124? src/streamanalyzer/xesam/xesam_ul_file_scanner.cpp
125? src/streamanalyzer/xesam/xesam_ul_file_scanner.h
126? src/streamanalyzer/xesam/xesam_ul_parser.cc
127? src/streamanalyzer/xesam/xesam_ul_parser.hh
128? src/streamanalyzer/xesam/xesam_ul_parser.yy
129? src/streamanalyzer/xesam/xesam_ul_scanner.cpp
130? src/streamanalyzer/xesam/xesam_ul_scanner.h
131? src/streamanalyzer/xesam/xesam_ul_string_scanner.cpp
132? src/streamanalyzer/xesam/xesam_ul_string_scanner.h
133- src/streams/decodebase64.cpp
134- src/streams/filereader.cpp
135- src/streams/oletest.cpp
136- src/streams/strigi/jstreamsconfig.h
137# we might have to add regex support back, not sure which systems do not have it natively
138? src/streams/strigi/regex/regcomp.c
139? src/streams/strigi/regex/regex.c
140? src/streams/strigi/regex/regexec.c
141? src/streams/strigi/regex/regex.h
142? src/streams/strigi/regex/regex_internal.c
143? src/streams/strigi/regex/regex_internal.h
144? src/streams/strigi/strigiconfig.h.win32.cmake
145- src/streams/strigi/strigi_thread.h
146- src/streams/testpt.cpp
147# no-one was using the xapian backend
148? src/xapianindexer/xapianindexer.cpp
149? src/xapianindexer/xapianindexmanager.cpp
150? src/xapianindexer/xapianindexmanager.h
151? src/xapianindexer/xapianindexreader.cpp
152? src/xapianindexer/xapianindexreader.h
153? src/xapianindexer/xapianindexwriter.cpp
154? src/xapianindexer/xapianindexwriter.h
155- testdata/analyzers/all/config
156- testdata/analyzers/zip/config
157./tests/bashscripts/findPatchThatBrokeUnitTest.sh
158? tests/bashscripts/simpleupdate.sh
159? tests/bashscripts/twofileupdate.sh
160./tests/daemon/daemonconfiguratortest.cpp
161./tests/daemon/daemonconfiguratortest.h
162./tests/daemon/dbus/daemondbustest.cpp
163./tests/daemon/dbus/daemondbustest.h
164./tests/daemon/dbus/runner.cpp
165./tests/daemon/dbus/strigidaemonunittestsession.cpp
166./tests/daemon/dbus/strigidaemonunittestsession.h
167./tests/daemon/dbus/test.cpp
168./tests/daemon/dbus/xesamdbustest.cpp
169./tests/daemon/dbus/xesamdbustest.h
170./tests/daemon/dbus/xesam/generatexesambindings.sh
171./tests/daemon/dbus/xesamlistener.cpp
172./tests/daemon/dbus/xesamlistener.h
173./tests/daemon/dbus/xesam/xesamdbus.cpp
174./tests/daemon/dbus/xesam/xesamdbus.h
175./tests/daemon/dbus/xesam/xesamtypes.h
176./tests/indextesters/clucenetests.cpp
177- tests/indextesters/estraiertests.cpp
178./tests/indextesters/indexmanagertester.cpp
179./tests/indextesters/indexmanagertester.h
180./tests/indextesters/indexreadertester.cpp
181./tests/indextesters/indexreadertester.h
182./tests/indextesters/indexsearchtester.cpp
183./tests/indextesters/indexsearchtester.h
184./tests/indextesters/indextest.cpp
185./tests/indextesters/indextest.h
186./tests/indextesters/indexwritertester.cpp
187./tests/indextesters/indexwritertester.h
188- tests/indextesters/sqlitetests.cpp
189./tests/streamanalyzer/diranalyzertester.cpp
190./tests/streamanalyzer/diranalyzertester.h
191./tests/streamanalyzer/xesam/xesam2strigitest.cpp
192./tests/streamanalyzer/xesam/xesam2strigitest.h
193./tests/test_runner.cpp
194./tests/utils/unittestfunctions.cpp
195./tests/utils/unittestfunctions.h
196./TODO
197./TODOMONDAY
198./TODO.Phreedom
199./zzuf.txt