
GooDiff is a service for automated tracking of semantic changes in web service policies.

Ongoing development of GooDiff, including the sharing of the collected datasets, takes place on the project's Gitorious page.

News

  • 02 January 2010 – Raw pages dataset available
  • 27 November 2009 – First free software release of GooDiff @ VJ12
  • 28 November 2009 – First GooDiff Hackathon @ hackerspace.be
  • 29 November 2009 – GooDiff – How can the free society survive the grey goo? @ VJ12

Overall Architecture

[Figure: GooDiff overall architecture]

goodiff-core software requirements

goodiff-frontend software requirements

  • Python
  • Trac
  • Apache httpd server

Note: if you move the Trac installation to a different Subversion repository source, you may need to run “trac-admin /home/goodiff/trac/GooDiff resync”.

Datasets

Datasets containing all revisions from mid-2006 until now (with some large gaps in time) are available as git bundles at the following location: http://www.foo.be/goodiff/bundle/ .

  • Datasets named “GooDiff-gitbundle-YYYY-MM-DD” contain the cleaned text files generated by the HTML processing
  • Datasets named “GooDiff-gitbundle-rawpages-YYYY-MM_DD” contain the raw HTML pages without any processing

The datasets can be used for further processing, such as semantic analysis or additional clean-up of the legal documents.

How to import the dataset locally

mkdir dataset
cd dataset; git init
git pull ../GooDiff-gitbundle-yyyy-mm-dd master

Now you have your local GooDiff dataset with all the changesets.
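
The import steps above can also be wrapped in a small script that checks the bundle before pulling and then lists the imported changesets. This is only a sketch: the function names import_bundle and list_changesets are illustrative and not part of GooDiff, and it assumes the bundle carries a master branch, as in the commands above.

```shell
#!/bin/sh
# Sketch only: the function names and paths are illustrative, not part of GooDiff.

# import_bundle BUNDLE DEST
# Create DEST, check the bundle, then pull its master branch (same steps as above).
import_bundle() {
    # Resolve the bundle to an absolute path, since we change directory below.
    bundle=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    dest=$2
    mkdir -p "$dest" || return 1
    (
        cd "$dest" || exit 1
        git init -q || exit 1
        # "git bundle verify" has to run inside a repository.
        git bundle verify "$bundle" >/dev/null 2>&1 || exit 1
        git pull -q "$bundle" master
    )
}

# list_changesets DEST [FILE]
# Print one line per changeset, optionally limited to a single tracked file.
list_changesets() {
    dest=$1
    file=${2:-.}
    git -C "$dest" log --date=short --format='%h %ad %s' -- "$file"
}
```

With these functions loaded, “import_bundle ../GooDiff-gitbundle-yyyy-mm-dd dataset” followed by “list_changesets dataset” should show the history of the whole dataset. Two revisions of a single policy file can then be compared with “git diff --word-diff”, which highlights changes word by word and may be easier to read for legal text than a line-based diff.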

Playground, ongoing experiments and development