The code in this directory makes up the "git data miner," a simple hack
which attempts to figure things out from the revision history in a git
repository.

RUNNING GITDM

Run it like this:

   git log -p -M [details] | gitdm [options]

The [details] tell git which changesets are of interest; the [options] can
be:

	-a	If a patch contains signoff lines from both Andrew Morton
		and Linus Torvalds, omit Linus's.

	-c file	Specify the name of the gitdm configuration file.
		By default, "./gitdm.config" is used.

	-d	Omit the developer reports, giving employer information
		only.

	-D	Rather than create the usual statistics, create a
		file providing lines changed per day, suitable for
		feeding to a tool like gnuplot.

	-h file	Generate HTML output to the given file.

	-l num	Only list the top <num> entries in each report.

	-o file	Write text output to the given file (default is stdout).

	-r pat	Only generate statistics for changes to files whose
		name matches the given regular expression.

	-s	Ignore Signed-off-by lines which match the author of
		each patch.

	-u	Group all unknown developers under the "(Unknown)"
		employer.

	-z	Dump out the hacker database to "database.dump".
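
The signoff rules behind -s and -a can be illustrated with a small Python
sketch.  This is not gitdm's own code; the function name, its parameters,
and the choice to match people by name are all assumptions made for the
example.

```python
# Hypothetical sketch of the -s and -a signoff rules; not gitdm's real code.
def filter_signoffs(author, signoffs, skip_author=False, akpm_over_linus=False):
    """Return the signoff names that would still be counted.

    author          -- name of the patch author
    signoffs        -- names taken from the Signed-off-by lines, in order
    skip_author     -- mimic -s: drop signoffs matching the author
    akpm_over_linus -- mimic -a: drop Linus's signoff when Andrew's is present
    """
    # Decide on the -a rule from the signoffs as they appear in the patch.
    has_both = "Andrew Morton" in signoffs and "Linus Torvalds" in signoffs
    kept = list(signoffs)
    if skip_author:
        kept = [name for name in kept if name != author]
    if akpm_over_linus and has_both:
        kept = [name for name in kept if name != "Linus Torvalds"]
    return kept


if __name__ == "__main__":
    names = ["Jane Hacker", "Andrew Morton", "Linus Torvalds"]
    print(filter_signoffs("Jane Hacker", names, skip_author=True,
                          akpm_over_linus=True))
```

With both flags set, only Andrew Morton's signoff survives for this
made-up patch.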

A typical command line used to generate the "who wrote 2.6.x" LWN articles
looks like:

    git log -p -M v2.6.19..v2.6.20 | \
	gitdm -u -s -a -o results -h results.html


CONFIGURATION FILE

The main purpose of the configuration file is to direct the mapping of
email addresses onto employers.  Please note that the config file parser is
exceptionally stupid and fragile at this point, but it gets the job done.

Blank lines and lines beginning with "#" are ignored.  Everything else
specifies a file with some sort of mapping:

EmailAliases file

	Developers often post code under a number of different email
	addresses, but it can be desirable to group them all together in
	the statistics.  An EmailAliases file just contains a bunch of
	lines of the form:

		alias@address  canonical@address

	Any patches originating from alias@address will be treated as if
	they had come from canonical@address.
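
As an illustration of the alias mapping, here is a minimal Python sketch.
It is not gitdm's own parser; the function names are hypothetical, and
skipping blank lines and "#" comments inside the alias file is an
assumption carried over from the configuration file rules.

```python
# Hypothetical sketch of EmailAliases handling; not gitdm's real code.
def load_aliases(lines):
    """Build a dict mapping each alias address to its canonical address."""
    aliases = {}
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and comments are skipped here as well.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore anything that is not "alias canonical"
        alias, canonical = fields
        aliases[alias] = canonical
    return aliases


def canonical_address(email, aliases):
    """Return the canonical address for email, or email itself if unknown."""
    return aliases.get(email, email)


if __name__ == "__main__":
    table = load_aliases(["alias@address  canonical@address"])
    print(canonical_address("alias@address", table))
    print(canonical_address("other@address", table))
```

Addresses that have no alias entry pass through unchanged.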


EmailMap file

	Map email addresses onto employers.  These files contain lines
	like:

		[user@]domain  employer  [< yyyy-mm-dd]

	If the "user@" portion is missing, all email from the given domain
	will be treated as being associated with the given employer.  If a
	date is provided, the entry is only valid up to that date;
	otherwise it is considered valid into the indefinite future.  This
	feature can be useful for properly tracking developers' work when
	they change employers but do not change email addresses.
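
The date handling described above might be sketched in Python as follows.
This is an illustration only, not gitdm's own code.  The function names
are hypothetical, and several details are assumptions: the end date is
treated as exclusive, a full user@domain entry is preferred over a
domain-only entry, and the entry expiring soonest wins when several are
valid.

```python
import datetime


# Hypothetical sketch of EmailMap handling; not gitdm's real code.
def parse_emailmap_line(line):
    """Split '[user@]domain  employer  [< yyyy-mm-dd]' into a tuple of
    (address, employer, end_date); end_date is None when no date is given."""
    end_date = None
    if "<" in line:
        line, _, date_text = line.rpartition("<")
        end_date = datetime.date.fromisoformat(date_text.strip())
    # The employer name may contain spaces, so split only once.
    address, employer = line.split(None, 1)
    return address, employer.strip(), end_date


def employer_for(email, when, entries):
    """Return the employer for email on the date when, or None if unknown."""
    domain = email.rpartition("@")[2]
    # Assumption: an exact address match takes priority over the domain.
    for key in (email, domain):
        valid = [(end or datetime.date.max, employer)
                 for address, employer, end in entries
                 if address == key and (end is None or when < end)]
        if valid:
            # Assumption: prefer the entry that expires soonest.
            return min(valid)[1]
    return None


if __name__ == "__main__":
    entries = [parse_emailmap_line(text) for text in (
        "jane@example.org  Old Corp  < 2007-01-01",
        "jane@example.org  New Corp",
        "example.com  Example Inc",
    )]
    print(employer_for("jane@example.org", datetime.date(2006, 6, 1), entries))
    print(employer_for("jane@example.org", datetime.date(2007, 6, 1), entries))
    print(employer_for("bob@example.com", datetime.date(2007, 6, 1), entries))
```

In this made-up data, jane@example.org is counted under "Old Corp" before
2007 and under "New Corp" afterwards, without changing her address.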


GroupMap file employer

	This is a variant of EmailMap provided for convenience; it contains
	email addresses only, all of which are associated with the given
	employer.
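
To show how the three directives fit together, here is a rough Python
sketch of reading the configuration file.  It is not gitdm's actual
parser; the function names and the file names in the example are
hypothetical, and raising an error on unrecognized lines is an
assumption.

```python
# Hypothetical sketch of configuration file parsing; not gitdm's real code.
def parse_config(lines):
    """Return a list of (directive, filename, employer) tuples.

    employer is None except for GroupMap lines.
    """
    directives = []
    for line in lines:
        line = line.strip()
        # Blank lines and lines beginning with "#" are ignored.
        if not line or line.startswith("#"):
            continue
        # Split at most twice so that the employer name may contain spaces.
        fields = line.split(None, 2)
        keyword = fields[0]
        if keyword in ("EmailAliases", "EmailMap") and len(fields) == 2:
            directives.append((keyword, fields[1], None))
        elif keyword == "GroupMap" and len(fields) == 3:
            directives.append((keyword, fields[1], fields[2]))
        else:
            raise ValueError("unrecognized config line: " + line)
    return directives


def group_map_entries(addresses, employer):
    """Pair every address listed in a GroupMap file with its employer."""
    return [(address.strip(), employer)
            for address in addresses if address.strip()]


if __name__ == "__main__":
    config = [
        "# sample configuration with made-up file names",
        "",
        "EmailAliases sample-aliases",
        "EmailMap sample-domain-map",
        "GroupMap sample-group Example Inc",
    ]
    for directive in parse_config(config):
        print(directive)
    print(group_map_entries(["a@example.com", "b@example.com"], "Example Inc"))
```

Each GroupMap address ends up with the same employer, which is what makes
the directive a shorthand for a list of single-address EmailMap entries.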


NOTES AND CREDITS

Gitdm was written by Jonathan Corbet; many useful contributions have come
from Greg Kroah-Hartman.

Please note that this tool is provided in the hope that it will be useful,
but it is not put forward as an example of excellence in design or
implementation.  Hacking on gitdm tends to stop as soon as it performs
whatever task is required of it at the time.  Patches to make it less
hacky, less ugly, and more robust are welcome.

Jonathan Corbet
corbet@lwn.net