DataMatrix - a Pythonic implementation of R's data.frame
(C) 2008 Luca Beltrame
(C) 2008 Giovanni Marco Dall'Olio

Contents

1. License
2. Description
3. Requirements and installation
4. Usage
5. Credits and contact information

------------------------------------------------------------------------------
1. License
------------------------------------------------------------------------------

This program is distributed under the terms of the GNU General Public License
(GPL), version 2. This means that you can freely modify, copy and distribute
the program, under the terms of said license. The COPYING file gives a good
overview of what you can and can't do.

Although we hope that this program will be useful, it comes without ANY
WARRANTY, without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE.

------------------------------------------------------------------------------
2. Description
------------------------------------------------------------------------------

DataMatrix is a Python module that tries to emulate the behavior of the
"data.frame" data structure of the R programming language. Data.frames are
essentially tables that can be queried either by row or by column, or, in the
presence of a header, even by column name.

DataMatrix emulates this behavior by reading a text file (or file-like object)
into a dictionary which in turn contains lists (the columns of the table).
For each row in the original text file, the columns contain a two-item tuple,
whose first item is the "row name" and the second the actual value for that
row and column.

Row names are a way to identify a row precisely (R uses them because some of
its operations return the names of the rows rather than their values). They
are either read from a specific column or, if no such column is given,
generated as progressive numeric values.
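
The layout described above can be sketched with the standard library alone.
This is only an illustration under stated assumptions, not DataMatrix's actual
code: read_columns is a hypothetical helper, it is written for Python 3, it
assumes the file has a header, and it keeps the row name column among the
ordinary columns, which DataMatrix may or may not do.

```python
import csv
from io import StringIO


def read_columns(handle, row_names=None, delimiter=" "):
    """Hypothetical helper: read a delimited file with a header into
    {column name: [(row name, value), ...]}.

    row_names is the 1-based index of the column holding the row names.
    When it is None, rows are numbered progressively, starting from 1.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)                      # first line names the columns
    columns = {name: [] for name in header}
    for number, fields in enumerate(reader, start=1):
        if row_names is None:
            row_name = str(number)             # progressive numeric row name
        else:
            row_name = fields[row_names - 1]   # row name taken from a column
        for name, value in zip(header, fields):
            columns[name].append((row_name, value))
    return columns


text = "name surname\nAlbert Einstein\nGroucho Marx\n"
table = read_columns(StringIO(text))
print(table["surname"])   # [('1', 'Einstein'), ('2', 'Marx')]
```

Passing row_names=1 to the same helper would label the rows "Albert" and
"Groucho" instead of "1" and "2".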

DataMatrix objects can be queried with a dictionary-like syntax (e.g.,
matrix["column name"]), or by other methods.

------------------------------------------------------------------------------
3. Requirements and installation
------------------------------------------------------------------------------

DataMatrix requires Python, at least version 2.5. Python 2.6 is
also known to work. It will likely not work on Python 3.0. It makes no use of
third-party modules, so it should work on a standard Python install. Being a
pure Python module, it will work reliably on Windows, Linux, *BSD and OS X.

The latest version can be downloaded from

http://www.dennogumi.org/projects-2/datamatrix

both as a source distribution and a Windows installer. Windows users just need
to run the installer and follow the on-screen instructions. Users of other
operating systems should download the source distribution, unpack it
(tar xvzf filename) and then install it by issuing

python setup.py install

as root.

------------------------------------------------------------------------------
4. Usage
------------------------------------------------------------------------------

DataMatrix requires a file, or file-like object. We will use the StringIO
module to create a virtual file, so you will also get an idea of what an
input file should look like:

>>> from StringIO import StringIO
>>> fh = StringIO('''name surname
... Albert Einstein
... Groucho Marx
... ''')

The basic usage is to create a matrix object by passing a file object
to datamatrix.DataMatrix:

>>> import datamatrix
>>> matrix = datamatrix.DataMatrix(fh, header=True, delimiter=' ')

Aside from the file object, which is mandatory, there are a number of
parameters that can be used. First of all, the "header" parameter tells
DataMatrix whether the file to read has a header. If it does, the header is
used to assign names to the columns. Otherwise, each column is simply
identified by a number.

To specify the column where row names are located, the row_names parameter is
used:

>>> fh.seek(0)  # rewind: the previous DataMatrix call already read the file
>>> matrix2 = datamatrix.DataMatrix(fh, header=True, row_names=1, delimiter=' ')

In this case, row names are obtained from the first column in the file.

DataMatrix uses the csv module to do its parsing, so you can specify
additional parameters to define the format of your data, such as delimiter
(the separator between fields), lineterminator and quoting (how to deal with
non-numeric fields). See the csv module documentation for additional details.
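
To give an idea of what these options do, here is a small sketch that calls
the csv module directly on a semicolon-separated sample, written for Python 3.
It does not use DataMatrix, and how DataMatrix forwards these parameters to
csv is not shown here; only delimiter and quoting are demonstrated, and the
sample data is made up for the example.

```python
import csv
from io import StringIO

# Semicolon-separated sample: text fields are quoted, numbers are not.
sample = StringIO(
    '"name";"surname";"born"\n'
    '"Albert";"Einstein";1879\n'
    '"Groucho";"Marx";1890\n'
)

# delimiter selects the field separator; QUOTE_NONNUMERIC makes the
# reader convert every unquoted field to a float.
reader = csv.reader(sample, delimiter=";", quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)
print(rows[0])   # ['name', 'surname', 'born']
print(rows[1])   # ['Albert', 'Einstein', 1879.0]
```

Note that with QUOTE_NONNUMERIC an unquoted text field makes the reader raise
a ValueError, which is why the header is quoted in the sample.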

If you print a DataMatrix instance, you'll get some basic information:

>>> print matrix
File name:
Column with identifier names: None (numeric)
No. of rows: 2
No. of columns: 2
Columns: name, surname

With the columns attribute you can view the columns as a list:

>>> print matrix.columns
['name', 'surname']

You can access specific rows with the getRow method:

>>> matrix.getRow(1)
['1', 'Albert', 'Einstein']

Or specific columns with the getColumn method:

>>> matrix.getColumn("surname", column_name=True)
['surname', 'Einstein', 'Marx']

To get a representation of your data, there is the view method:

>>> matrix.view()
1 Albert Einstein
2 Groucho Marx

Columns and rows can be appended with the append and appendRow methods,
respectively. In both cases, the item to be appended needs to be a sequence
(list or tuple) and must be as long as the other columns (when appending
columns) or cover all the columns (when appending rows):

>>> profession = ["scientist", "comedian"] # new column
>>> matrix.append(profession, "Job")

>>> entry = ["Isaac", "Asimov", "writer"] # new row
>>> matrix.appendRow(entry, "3")

Notice that when you append a column you must pass a column name to the
method, and when you append a row you must pass a row name, as the examples
above show.
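
The length requirements can be made concrete with a sketch on a plain
dictionary of (row name, value) lists. The helpers append_column and
append_row, and the separate names list that tracks column order, are
hypothetical and only mirror the rules stated above; they are not the
DataMatrix implementation, and the error type is an assumption.

```python
def append_column(table, names, values, name):
    """Add a column; it needs exactly one value per existing row."""
    row_names = [row_name for row_name, _ in table[names[0]]]
    if len(values) != len(row_names):
        raise ValueError("expected %d values, got %d"
                         % (len(row_names), len(values)))
    table[name] = list(zip(row_names, values))
    names.append(name)


def append_row(table, names, values, row_name):
    """Add a row; it needs exactly one value per existing column."""
    if len(values) != len(names):
        raise ValueError("expected %d values, got %d"
                         % (len(names), len(values)))
    for name, value in zip(names, values):
        table[name].append((row_name, value))


names = ["name", "surname"]
table = {
    "name": [("1", "Albert"), ("2", "Groucho")],
    "surname": [("1", "Einstein"), ("2", "Marx")],
}
append_column(table, names, ["scientist", "comedian"], "Job")
append_row(table, names, ["Isaac", "Asimov", "writer"], "3")
print(table["Job"])
# [('1', 'scientist'), ('2', 'comedian'), ('3', 'writer')]
```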

Finally, you can write DataMatrix objects to files or file-like objects with
the writeMatrix function in the datamatrix module:

>>> fh = StringIO()
>>> datamatrix.writeMatrix(matrix, fh)

Output formatting is again set via options to the csv module. Optionally you
can save only some of the columns, specified as a list:

>>> datamatrix.writeMatrix(matrix, fh, columns=["name", "Job"])
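
As a rough picture of what writing a subset of columns involves, the sketch
below uses csv.writer on a plain dictionary of (row name, value) lists. The
write_columns helper is hypothetical, written for Python 3, and makes its own
choices (a header line is written, row names are not); the exact output of
writeMatrix may differ.

```python
import csv
from io import StringIO


def write_columns(table, names, handle, columns=None, delimiter=" "):
    """Hypothetical helper: write the selected columns, header first."""
    selected = list(columns) if columns is not None else list(names)
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(selected)
    for i in range(len(table[selected[0]])):
        # Each cell is a (row name, value) tuple; only the value is written.
        writer.writerow([table[name][i][1] for name in selected])


names = ["name", "surname", "Job"]
table = {
    "name": [("1", "Albert"), ("2", "Groucho")],
    "surname": [("1", "Einstein"), ("2", "Marx")],
    "Job": [("1", "scientist"), ("2", "comedian")],
}
out = StringIO()
write_columns(table, names, out, columns=["name", "Job"])
print(out.getvalue())
```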

For other uses, please see the API documentation (generated with pydoc as an
HTML file), also present in the distribution.

------------------------------------------------------------------------------
5. Credits and contact information
------------------------------------------------------------------------------

DataMatrix was started by me (Luca Beltrame). Giovanni joined later,
contributing code fixes and, most importantly, unit tests.
Bug reports, feature requests and comments should be sent either via email
(einar at heavensinferno dot net) or by leaving a comment at
http://www.dennogumi.org/projects-2/datamatrix