Home
- News
- Overall Architecture
- goodiff-core software requirements
- goodiff-frontend software requirements
- Datasets
- Playground, ongoing experiments and development

GooDiff is a service for automated tracking of semantic changes in web service policies.
The Gitorious page hosts the ongoing development of GooDiff, including the sharing of the collected datasets.
News
- 02 January 2010 – Raw pages dataset available
- 27 November 2009 – First free software release of GooDiff @ VJ12
- 28 November 2009 – First GooDiff Hackathon @ hackerspace.be
- 29 November 2009 – GooDiff – How can the free society survive the grey goo? @ VJ12
Overall Architecture

goodiff-core software requirements
- Python
- BeautifulSoup – a version earlier than 3.1, such as 3.0.7a
- pysvn
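The version constraint on BeautifulSoup can be checked before running goodiff-core. The following is a minimal sketch, not part of GooDiff itself: the helper names `parse_version` and `is_supported` are hypothetical, and it assumes the installed BeautifulSoup 3.x module reports its version as a string such as "3.0.7a".

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string.

    For example, "3.0.7a" becomes (3, 0, 7); the trailing letter is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("not a version string: %r" % (version,))
    return tuple(int(part) for part in match.group(0).split("."))


def is_supported(version):
    """True when the version is earlier than 3.1, as the requirement list asks."""
    return parse_version(version) < (3, 1)


# Usage (assumption: the version string is read from the installed module,
# for example from its __version__ attribute):
#     is_supported("3.0.7a")
```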
goodiff-frontend software requirements
- Python
- Trac
- Apache httpd server
Note: if you move the Trac installation to a different Subversion repository, you may need to run “trac-admin /home/goodiff/trac/GooDiff resync”.
Datasets
The datasets, containing all revisions from mid-2006 until now (with some large gaps in time), are available as git bundles at http://www.foo.be/goodiff/bundle/ .
- The datasets named “GooDiff-gitbundle-YYYY-MM-DD” contain the cleaned text files generated by the HTML processing
- The datasets named “GooDiff-gitbundle-rawpages-YYYY-MM_DD” contain the raw HTML pages without any processing
The datasets can be used for further processing, such as semantic analysis or additional clean-up of the legal documents.
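As one possible starting point for such processing, two revisions of a cleaned text file can be compared line by line. This is a minimal sketch using Python's standard difflib module, not the method GooDiff itself uses; the function name `changed_lines` and the sample policy text are hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added): lines only in the old and only in the new revision."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_lines[j1:j2])
    return removed, added


if __name__ == "__main__":
    # Hypothetical policy text, invented for illustration only.
    old = "We collect data.\nWe keep logs for 18 months.\nContact us."
    new = "We collect data.\nWe keep logs for 9 months.\nContact us."
    removed, added = changed_lines(old, new)
    for line in removed:
        print("- " + line)
    for line in added:
        print("+ " + line)
```

A line-based comparison only reports textual changes; deciding whether a change is semantically significant would need further analysis on top of it.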
How to import the dataset locally
mkdir dataset
cd dataset; git init
git pull ../GooDiff-gitbundle-yyyy-mm-dd master
Now you have your local GooDiff dataset with all the changesets.
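Once the import steps above have completed, the changesets can be enumerated from a script. The sketch below is an illustration under assumptions: the function names `list_changesets` and `parse_log` are hypothetical, the repository is assumed to live in the `dataset` directory created above, and the installed git is assumed to be recent enough to support the `-C` option and the `%cI` (ISO 8601 committer date) format placeholder.

```python
import subprocess


def parse_log(output):
    """Parse lines of the form '<hash> <date>' into (hash, date) tuples."""
    changesets = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, date = line.split(" ", 1)
        changesets.append((commit_hash, date))
    return changesets


def list_changesets(repo_dir, path=None):
    """Return (hash, date) tuples for the repository, oldest changeset first.

    If path is given, only changesets touching that file are listed.
    """
    cmd = ["git", "-C", repo_dir, "log", "--reverse", "--format=%H %cI"]
    if path is not None:
        cmd += ["--", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_log(result.stdout)


# Usage, assuming the import steps above were run in the current directory:
#     for commit_hash, date in list_changesets("dataset"):
#         print(date, commit_hash)
```

Keeping the parsing separate from the git call makes it possible to test the parsing without a repository at hand.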

