DataMatrix - a Pythonic implementation of R's data.frame
(C) 2008 Luca Beltrame
(C) 2008 Giovanni Marco Dall'Olio

Contents

1. License
2. Description
3. Requirements and installation
4. Usage
5. Credits and contact information

------------------------------------------------------------------------------
1. License
------------------------------------------------------------------------------

This program is distributed under the terms of the GNU General Public License
(GPL), version 2. This means that you can freely modify, copy and distribute
the program under the terms of that license. The COPYING file gives a good
overview of what you can and cannot do.

Although we hope that this program will be useful, it comes WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE.

------------------------------------------------------------------------------
2. Description
------------------------------------------------------------------------------

DataMatrix is a Python module that tries to emulate the behavior of the
"data.frame" data structure of the R programming language. Data frames are
essentially tables that can be queried by row or by column or, when a header
is present, by column name.

DataMatrix emulates this behavior by reading a text file (or file-like object)
into a dictionary whose values are lists (the columns of the table). For each
row in the original text file, every column holds a two-item tuple whose first
item is the "row name" and whose second item is the actual value for that row
and column.
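
The column layout just described can be sketched with the standard csv
module. This is a hypothetical helper written for Python 3, not code taken
from datamatrix; the name build_columns and the numbering of rows from 1 are
assumptions made for illustration.

```python
import csv
import io

# Hypothetical helper, not part of datamatrix: builds a dictionary of
# columns where every cell is a (row name, value) tuple. Row names are
# progressive numbers starting from 1 (an assumption).
def build_columns(text, delimiter=" "):
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    columns = {name: [] for name in header}
    for number, row in enumerate(reader, start=1):
        row_name = str(number)
        for name, value in zip(header, row):
            columns[name].append((row_name, value))
    return columns

data = "name surname\nAlbert Einstein\nGroucho Marx\n"
columns = build_columns(data)
print(columns["surname"])  # [('1', 'Einstein'), ('2', 'Marx')]
```

Keeping the row name inside every cell means a single column can be handed
around on its own without losing track of which row each value came from.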

Row names are a way to identify a row precisely (R uses them because some of
its operations return the names of the rows rather than their values). They
are either read from a specified column or, if no such column is given,
assigned as progressive numbers.

DataMatrix objects can be queried with a dictionary-like syntax (e.g.,
matrix["column name"]) or through the methods described in the Usage section.
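
Dictionary-like access of this kind is usually provided by defining
__getitem__. The class below is a hypothetical, minimal illustration in
Python 3, not the datamatrix implementation; that a lookup returns the stored
column of (row name, value) tuples unchanged is an assumption.

```python
# Hypothetical minimal class, not datamatrix itself: shows how
# matrix["column name"] can be mapped onto a dictionary of columns.
class TinyMatrix:
    def __init__(self, columns):
        # columns: dict mapping column name -> list of (row name, value)
        self._columns = columns

    def __getitem__(self, column_name):
        # Dictionary-style lookup; unknown names raise KeyError.
        return self._columns[column_name]

matrix = TinyMatrix({
    "name": [("1", "Albert"), ("2", "Groucho")],
    "surname": [("1", "Einstein"), ("2", "Marx")],
})
print(matrix["surname"])  # [('1', 'Einstein'), ('2', 'Marx')]
```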

------------------------------------------------------------------------------
3. Requirements and installation
------------------------------------------------------------------------------

DataMatrix requires Python 2.5 or later; Python 2.6 is also known to work. It
will most likely not work on Python 3.0. It uses no third-party modules, so a
standard Python installation is enough. Being a pure Python module, it should
work on any platform Python runs on, including Windows, Linux, *BSD and OS X.

The latest version can be downloaded from

http://www.dennogumi.org/projects-2/datamatrix

both as a source distribution and as a Windows installer. Windows users just
need to run the installer and follow the on-screen instructions. Users of
other operating systems should download the source distribution, unpack it
(tar xvzf filename) and then install it by issuing

python setup.py install

as root.

------------------------------------------------------------------------------
4. Usage
------------------------------------------------------------------------------

DataMatrix requires a file or file-like object. In the examples below we use
the StringIO module to create a virtual file, which also shows what an input
file should look like:

>>> from StringIO import StringIO
>>> fh = StringIO('''name surname
... Albert Einstein
... Groucho Marx
... ''')

The basic usage is to create a matrix object by passing a file handle to
datamatrix.DataMatrix:

>>> import datamatrix
>>> matrix = datamatrix.DataMatrix(fh, header=True, delimiter=' ')

Aside from the file object, which is mandatory, a number of optional
parameters can be used. First of all, the "header" parameter tells DataMatrix
whether the file has a header line; if it does, the header is used to name the
columns. Otherwise, each column is simply identified by a number.
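
The effect of the header parameter can be sketched with a small Python 3
helper. The name column_names is hypothetical and not part of datamatrix, and
numbering headerless columns from 1 is an assumption; the real module may
number them differently.

```python
import csv
import io

# Hypothetical helper for illustration; it is not part of datamatrix.
# Numbering headerless columns from 1 is an assumption.
def column_names(text, header, delimiter=" "):
    """Return (column names, data rows) for a delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if header:
        # The first line supplies the column names; the rest is data.
        return rows[0], rows[1:]
    # Without a header every line is data and columns are just numbered.
    return list(range(1, len(rows[0]) + 1)), rows

print(column_names("name surname\nAlbert Einstein\n", header=True))
# (['name', 'surname'], [['Albert', 'Einstein']])
print(column_names("Albert Einstein\n", header=False))
# ([1, 2], [['Albert', 'Einstein']])
```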

To specify the column where row names are located, use the row_names
parameter (the file handle has already been read once, so rewind it first):

>>> fh.seek(0)
>>> matrix2 = datamatrix.DataMatrix(fh, header=True, delimiter=' ', row_names=1)

In this case, row names are read from the first column of the file.
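
The two ways of obtaining row names (a given column, or progressive numbers)
can be sketched as follows in Python 3. split_row_names is a hypothetical
helper, not part of datamatrix; treating row_names as 1-based follows the
example above, while removing the row-name column from the values is an
assumption.

```python
# Hypothetical helper for illustration; it is not part of datamatrix.
# row_names is 1-based, as in row_names=1 above; dropping the row-name
# column from the remaining values is an assumption.
def split_row_names(rows, row_names=None):
    """Pair each data row with a row name."""
    result = []
    for number, row in enumerate(rows, start=1):
        if row_names is None:
            # No row-name column: use progressive numbers instead.
            result.append((str(number), row))
        else:
            index = row_names - 1
            result.append((row[index], row[:index] + row[index + 1:]))
    return result

rows = [["Albert", "Einstein"], ["Groucho", "Marx"]]
print(split_row_names(rows))
# [('1', ['Albert', 'Einstein']), ('2', ['Groucho', 'Marx'])]
print(split_row_names(rows, row_names=1))
# [('Albert', ['Einstein']), ('Groucho', ['Marx'])]
```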

DataMatrix uses the csv module to do its parsing, so you can pass additional
parameters to describe the format of your data, such as delimiter (the
separator between fields), lineterminator and quoting (which fields are
quoted, e.g. only the non-numeric ones). See the csv module documentation for
details.
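
The following Python 3 snippet uses only the standard csv module, not
datamatrix, to show what two of these options do: delimiter splits fields on
spaces, and quoting=csv.QUOTE_NONNUMERIC makes the reader convert every
unquoted field to a float. The sample data is made up for illustration.

```python
import csv
import io

# Standard-library illustration of the csv options mentioned above;
# independent of datamatrix.
text = '"name" "year"\n"Albert" 1879\n"Groucho" 1890\n'

# QUOTE_NONNUMERIC: quoted fields stay strings, unquoted ones become floats.
reader = csv.reader(io.StringIO(text), delimiter=" ",
                    quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)
print(rows)  # [['name', 'year'], ['Albert', 1879.0], ['Groucho', 1890.0]]
```

Whether DataMatrix forwards each of these options unchanged to csv.reader is
not stated here, so check the API documentation before relying on it.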

If you print a DataMatrix instance, you'll get some basic information:

>>> print matrix
File name:
Column with identifier names: None (numeric)
No. of rows: 2
No. of columns: 2
Columns: name, surname

With the columns attribute you can view the columns as a list:

>>> print matrix.columns
['name', 'surname']

You can access specific rows with the getRow method:

>>> matrix.getRow(1)
['1', 'Albert', 'Einstein']

Or specific columns with the getColumn method (column_name=True puts the
column name in front of the values):

>>> matrix.getColumn("surname", column_name=True)
['surname', 'Einstein', 'Marx']
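
How results like these can be assembled from the (row name, value) layout is
sketched below in Python 3. get_row and get_column are hypothetical stand-ins,
not the real getRow and getColumn; treating the argument of get_row as a
1-based position rather than a row name is an assumption.

```python
# Hypothetical re-implementation for illustration only; the real getRow
# and getColumn may work differently. Column order is passed explicitly
# because plain dictionaries in Python 2.5/2.6 do not keep insertion order.
columns = {
    "name": [("1", "Albert"), ("2", "Groucho")],
    "surname": [("1", "Einstein"), ("2", "Marx")],
}
order = ["name", "surname"]

def get_row(columns, order, number):
    # number is treated as a 1-based position (an assumption).
    index = number - 1
    row_name = columns[order[0]][index][0]
    return [row_name] + [columns[name][index][1] for name in order]

def get_column(columns, name, column_name=False):
    values = [value for _row_name, value in columns[name]]
    # column_name=True puts the column's name in front of its values.
    return [name] + values if column_name else values

print(get_row(columns, order, 1))  # ['1', 'Albert', 'Einstein']
print(get_column(columns, "surname", column_name=True))
# ['surname', 'Einstein', 'Marx']
```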

To print a plain-text view of your data, use the view method:

>>> matrix.view()
1 Albert Einstein
2 Groucho Marx

Columns and rows can be appended with the append and appendRow methods,
respectively. In both cases, the item to be appended must be a sequence (list
or tuple): a new column must be as long as the existing columns, and a new row
must supply a value for every column:

>>> profession = ["scientist", "comedian"]  # new column
>>> matrix.append(profession, "Job")

>>> entry = ["Isaac", "Asimov", "writer"]  # new row
>>> matrix.appendRow(entry, "3")

Notice that when you append a column or a row you must also give it a name
(a column name or a row name, respectively), as the examples above show.
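
The length rules for appending can be sketched in Python 3 as below.
append_column and append_row are hypothetical helpers, not datamatrix
methods, and the ValueError they raise is an assumption; the real module may
signal a length mismatch differently. The sketch assumes the table already
has at least one column.

```python
# Hypothetical sketch of the length checks described above; names, error
# type and messages are made up and are not datamatrix's own.
def append_column(columns, order, values, name):
    row_names = [row_name for row_name, _value in columns[order[0]]]
    if len(values) != len(row_names):
        raise ValueError("new column must have %d items" % len(row_names))
    # Reuse existing row names so every cell stays a (row name, value) pair.
    columns[name] = list(zip(row_names, values))
    order.append(name)

def append_row(columns, order, values, row_name):
    if len(values) != len(order):
        raise ValueError("new row must have %d items" % len(order))
    for name, value in zip(order, values):
        columns[name].append((row_name, value))

columns = {
    "name": [("1", "Albert"), ("2", "Groucho")],
    "surname": [("1", "Einstein"), ("2", "Marx")],
}
order = ["name", "surname"]
append_column(columns, order, ["scientist", "comedian"], "Job")
append_row(columns, order, ["Isaac", "Asimov", "writer"], "3")
print(columns["Job"])  # [('1', 'scientist'), ('2', 'comedian'), ('3', 'writer')]
```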

Finally, you can write DataMatrix objects to files or file-like objects with
the writeMatrix function of the datamatrix module:

>>> fh = StringIO()
>>> datamatrix.writeMatrix(matrix, fh)

Output formatting is again controlled through csv module options. Optionally,
you can save only a subset of the columns, given as a list:

>>> fh = StringIO()
>>> datamatrix.writeMatrix(matrix, fh, columns=["name", "Job"])
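
Writing a subset of columns with csv.writer might look like the Python 3
sketch below. write_columns is a hypothetical helper, not writeMatrix; its
tab delimiter, its header line and its omission of row names are choices made
for this illustration only.

```python
import csv
import io

# Hypothetical sketch, not datamatrix's writeMatrix: writes a header plus
# the values of the selected columns, dropping the row names.
def write_columns(columns, order, fh, selected=None, delimiter="\t"):
    names = order if selected is None else selected
    # lineterminator="\n" overrides the csv default of "\r\n".
    writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
    writer.writerow(names)
    for index in range(len(columns[names[0]])):
        # Keep only the value half of each (row name, value) tuple.
        writer.writerow([columns[name][index][1] for name in names])

columns = {
    "name": [("1", "Albert"), ("2", "Groucho")],
    "surname": [("1", "Einstein"), ("2", "Marx")],
    "Job": [("1", "scientist"), ("2", "comedian")],
}
out = io.StringIO()
write_columns(columns, ["name", "surname", "Job"], out,
              selected=["name", "Job"])
print(out.getvalue())
```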

For other uses, please see the API documentation (generated with pydoc as an
HTML file), which is also included in the distribution.

------------------------------------------------------------------------------
5. Credits and contact information
------------------------------------------------------------------------------

DataMatrix was started by me (Luca Beltrame); Giovanni joined later,
contributing code fixes and, most importantly, unit tests.
Bug reports, feature requests and comments can be sent either via email
(einar at heavensinferno dot net) or as a comment at
http://www.dennogumi.org/projects-2/datamatrix