| 099ab5e by Waylan Limberg at 2008-10-16 |

Writing Extensions for Python-Markdown
======================================

Overview
--------

Python-Markdown includes an API for extension writers to plug their own
custom functionality and/or syntax into the parser. There are preprocessors
which allow you to alter the source before it is passed to the parser,
inline patterns which allow you to add, remove or override the syntax of
any inline elements, and postprocessors which allow munging of the
output of the parser before it is returned. If you really want to dive in,
there are also blockprocessors which are part of the core BlockParser.

As the parser builds an [ElementTree][] object which is later rendered
as Unicode text, there are also some helpers provided to ease manipulation of
the tree. Each part of the API is discussed in its respective section below.
Additionally, reading the source of some [[Available Extensions]] may be helpful.
For example, the [[Footnotes]] extension uses most of the features documented
here.

* [Preprocessors][]
* [InlinePatterns][]
* [Treeprocessors][]
* [Postprocessors][]
* [BlockParser][]
* [Working with the ElementTree][]
* [Integrating your code into Markdown][]
    * [extendMarkdown][]
    * [OrderedDict][]
    * [registerExtension][]
    * [Config Settings][]
    * [makeExtension][]

<h3 id="preprocessors">Preprocessors</h3>

Preprocessors munge the source text before it is passed into the Markdown
core. This is an excellent place to clean up bad syntax, extract things the
parser may otherwise choke on and perhaps even store them for later retrieval.

Preprocessors should inherit from ``markdown.preprocessors.Preprocessor`` and
implement a ``run`` method with one argument ``lines``. The ``run`` method of
each Preprocessor will be passed the entire source text as a list of Unicode
strings. Each string will contain one line of text. The ``run`` method should
return either that list, or an altered list of Unicode strings.

A pseudo example:

    from markdown.preprocessors import Preprocessor

    class MyPreprocessor(Preprocessor):
        def run(self, lines):
            # MYREGEX is assumed to be a compiled regular expression
            # defined elsewhere in your extension.
            new_lines = []
            for line in lines:
                m = MYREGEX.match(line)
                if m:
                    # do stuff with the match; the line is not kept
                    pass
                else:
                    new_lines.append(line)
            return new_lines

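To make the ``run`` contract concrete, here is a minimal sketch that runs
without Markdown installed. The ``Preprocessor`` stand-in base class, the
``%%`` comment syntax, ``COMMENT_RE`` and the ``stash`` attribute are all
assumptions made for illustration; a real extension would inherit from
``markdown.preprocessors.Preprocessor`` instead.

```python
import re

# Hypothetical syntax: a line such as "%% some note" is an author comment.
COMMENT_RE = re.compile(r'^%%\s*(.*)$')


class Preprocessor(object):
    """Stand-in for markdown.preprocessors.Preprocessor (assumption)."""


class CommentPreprocessor(Preprocessor):
    def __init__(self):
        self.stash = []  # extracted text, kept for later retrieval

    def run(self, lines):
        new_lines = []
        for line in lines:
            m = COMMENT_RE.match(line)
            if m:
                # Pull the comment out so the parser never sees it.
                self.stash.append(m.group(1))
            else:
                new_lines.append(line)
        return new_lines


pre = CommentPreprocessor()
result = pre.run(['A paragraph.', '%% remember to expand this', 'More text.'])
```

Here ``result`` holds only the two ordinary lines, while the comment text is
left in ``pre.stash`` where a later processor could pick it up.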

<h3 id="inlinepatterns">Inline Patterns</h3>

Inline Patterns implement the inline HTML element syntax for Markdown such as
``*emphasis*`` or ``[links](http://example.com)``. Pattern objects should be
instances of classes that inherit from ``markdown.inlinepatterns.Pattern`` or
one of its children. Each pattern object uses a single regular expression and
must have the following methods:

* **``getCompiledRegExp()``**:

    Returns a compiled regular expression.

* **``handleMatch(m)``**:

    Accepts a match object and returns an ElementTree element or a plain
    Unicode string.

Note that any regular expression returned by ``getCompiledRegExp`` must capture
the whole block. Therefore, they should all start with ``r'^(.*?)'`` and end
with ``r'(.*?)$'``. When using the default ``getCompiledRegExp()`` method
provided in the ``Pattern`` class, you can pass in a regular expression without
that wrapper and ``getCompiledRegExp`` will wrap your expression for you and
set the `re.DOTALL` and `re.UNICODE` flags. This means that the first group of
your match will be ``m.group(2)`` as ``m.group(1)`` will match everything
before the pattern.

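The group numbering can be checked with the standard ``re`` module alone. This
sketch assumes the wrapper is ``^(.*?)`` in front and ``(.*?)$`` behind, as
described above, and applies it by hand to an oversimplified emphasis regex;
it does not call into Markdown itself.

```python
import re

MYPATTERN = r'\*([^*]+)\*'  # an oversimplified emphasis regex

# Wrap the pattern by hand so that it captures the whole block of text.
wrapped = re.compile(r'^(.*?)%s(.*?)$' % MYPATTERN, re.DOTALL | re.UNICODE)

m = wrapped.match('some *emphasized* text')
before = m.group(1)  # everything before the pattern
inner = m.group(2)   # the pattern's own first group
after = m.group(3)   # everything after the pattern
```

Because the wrapper contributes a group at each end, every group defined in
your own expression is shifted up by one.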

For an example, consider this simplified emphasis pattern:

    from markdown.inlinepatterns import Pattern
    from markdown.util import etree

    class EmphasisPattern(Pattern):
        def handleMatch(self, m):
            el = etree.Element('em')
            # group(1) is the text before the match, so the single
            # group in the pattern below is group(2)
            el.text = m.group(2)
            return el

As discussed in [Integrating Your Code Into Markdown][], an instance of this
class will need to be provided to Markdown. That instance would be created
like so:

    # an oversimplified regex
    MYPATTERN = r'\*([^*]+)\*'
    # pass in pattern and create instance
    emphasis = EmphasisPattern(MYPATTERN)

Actually it would not be necessary to create that pattern (and not just because
a more sophisticated emphasis pattern already exists in Markdown). The fact is,
that example pattern is not very DRY. A pattern for `**strong**` text would
be almost identical, with the exception that it would create a 'strong' element.
Therefore, Markdown provides a number of generic pattern classes that can
provide some common functionality. For example, both emphasis and strong are
implemented with separate instances of the ``SimpleTagPattern`` listed below.
Feel free to use or extend any of the Pattern classes found at `markdown.inlinepatterns`.

**Generic Pattern Classes**

* **``SimpleTextPattern(pattern)``**:

    Returns simple text of ``group(2)`` of a ``pattern``.

* **``SimpleTagPattern(pattern, tag)``**:

    Returns an element of type "`tag`" with a text attribute of ``group(3)``
    of a ``pattern``. ``tag`` should be a string of a HTML element (e.g. 'em').

* **``SubstituteTagPattern(pattern, tag)``**:

    Returns an element of type "`tag`" with no children or text (e.g. 'br').

There may be other Pattern classes in the Markdown source that you could extend
or use as well. Read through the source and see if there is anything you can
use. You might even get a few ideas for different approaches to your specific
situation.

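The idea behind a generic tag pattern can be sketched with the standard
library only. ``TagPattern`` below is a hypothetical stand-in written for this
illustration, not the library's ``SimpleTagPattern``, and the two regular
expressions are assumptions. Each puts the delimiter in its own group, so the
text ends up in ``group(3)``, as described for ``SimpleTagPattern`` above.

```python
import re
import xml.etree.ElementTree as etree


class TagPattern(object):
    """Hypothetical generic pattern: wraps `pattern`, emits a `tag` element."""

    def __init__(self, pattern, tag):
        self.compiled_re = re.compile(r'^(.*?)%s(.*?)$' % pattern,
                                      re.DOTALL | re.UNICODE)
        self.tag = tag

    def getCompiledRegExp(self):
        return self.compiled_re

    def handleMatch(self, m):
        el = etree.Element(self.tag)
        el.text = m.group(3)  # group(2) is the delimiter in the patterns below
        return el


# One class, two instances: only the regex and the tag name differ.
# After wrapping, \2 refers to the delimiter group of each pattern.
strong = TagPattern(r'(\*{2})(.+?)\2', 'strong')
em = TagPattern(r'(\*)([^*]+)\2', 'em')

strong_el = strong.handleMatch(
    strong.getCompiledRegExp().match('a **bold** word'))
em_el = em.handleMatch(em.getCompiledRegExp().match('an *italic* word'))
```

Supporting another simple inline tag then only requires a new instance, not a
new class.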

<h3 id="treeprocessors">Treeprocessors</h3>

Treeprocessors manipulate an ElementTree object after it has passed through the
core BlockParser. This is where additional manipulation of the tree takes
place. Additionally, the InlineProcessor is a Treeprocessor which steps through
the tree and runs the InlinePatterns on the text of each Element in the tree.

A Treeprocessor should inherit from ``markdown.treeprocessors.Treeprocessor``
and override the ``run`` method, which takes one argument ``root`` (an
ElementTree object) and returns either that root element or a modified root
element.

A pseudo example:

    from markdown.treeprocessors import Treeprocessor

    class MyTreeprocessor(Treeprocessor):
        def run(self, root):
            # do stuff to the tree here
            return root

For specifics on manipulating the ElementTree, see
[Working with the ElementTree][] below.

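As a runnable illustration of the ``run(root)`` contract, the sketch below
uses the standard library's ElementTree and a stand-in ``Treeprocessor`` base
class, so it does not depend on Markdown being installed. The class name
``HeaderClassTreeprocessor`` and the ``section`` class value are assumptions
chosen for the example.

```python
import xml.etree.ElementTree as etree


class Treeprocessor(object):
    """Stand-in for markdown.treeprocessors.Treeprocessor (assumption)."""


class HeaderClassTreeprocessor(Treeprocessor):
    def run(self, root):
        # iter() walks the whole tree, so nested headers are found too.
        for el in root.iter('h2'):
            el.set('class', 'section')
        return root


root = etree.fromstring('<div><h2>One</h2><p>text</p><h2>Two</h2></div>')
new_root = HeaderClassTreeprocessor().run(root)
html = etree.tostring(new_root, encoding='unicode')
```

The tree is modified in place and the same root is returned, which is the
simpler of the two options the ``run`` method allows.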

<h3 id="postprocessors">Postprocessors</h3>

Postprocessors manipulate the document after the ElementTree has been
serialized into a string. Postprocessors should be used to work with the
text just before output.

A Postprocessor should inherit from ``markdown.postprocessors.Postprocessor``
and override the ``run`` method, which takes one argument ``text`` and returns
a Unicode string.

Because postprocessors work on the serialized Unicode text, this may be an
appropriate place to add a table of contents to a document, for example:

    from markdown.postprocessors import Postprocessor

    class TocPostprocessor(Postprocessor):
        def run(self, text):
            # MYMARKERRE (a compiled regex) and MyToc (the replacement)
            # are assumed to be defined elsewhere.
            return MYMARKERRE.sub(MyToc, text)

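A self-contained version of that idea might look like the following. The
``[TOC]`` marker, both regular expressions and the stand-in ``Postprocessor``
base class are assumptions for illustration only; the sketch works purely on
strings and does not import Markdown.

```python
import re

MARKER_RE = re.compile(r'<p>\[TOC\]</p>')   # hypothetical placeholder marker
HEADER_RE = re.compile(r'<h2>(.*?)</h2>')   # naive header matcher


class Postprocessor(object):
    """Stand-in for markdown.postprocessors.Postprocessor (assumption)."""


class TocPostprocessor(Postprocessor):
    def run(self, text):
        items = ''.join('<li>%s</li>' % title
                        for title in HEADER_RE.findall(text))
        toc = '<ul>%s</ul>' % items
        # A function is used as the replacement so that backslashes in
        # header text are not interpreted as escape sequences.
        return MARKER_RE.sub(lambda m: toc, text)


html = TocPostprocessor().run('<p>[TOC]</p>\n<h2>Intro</h2>\n<h2>Usage</h2>')
```

Matching HTML with regular expressions is fragile, which is one reason to do
structural work in a Treeprocessor and keep postprocessors for simple text
substitutions.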

<h3 id="blockparser">BlockParser</h3>

Sometimes, pre/tree/postprocessors and Inline Patterns aren't going to do what
you need. Perhaps you want a new type of block that needs to be integrated
into the core parsing. In such a situation, you can add/change/remove
functionality of the core ``BlockParser``. The BlockParser is composed of a
number of Blockprocessors. The BlockParser steps through each block of text
(split by blank lines) and passes each block to the appropriate Blockprocessor.
That Blockprocessor parses the block and adds it to the ElementTree. The
[[Definition Lists]] extension would be a good example of an extension that
adds/modifies Blockprocessors.

A Blockprocessor should inherit from ``markdown.blockprocessors.BlockProcessor``
and implement both the ``test`` and ``run`` methods.

The ``test`` method is used by BlockParser to identify the type of block.
Therefore the ``test`` method must return a boolean value. If the test returns
``True``, then the BlockParser will call that Blockprocessor's ``run`` method.
If it returns ``False``, the BlockParser will move on to the next
BlockProcessor.

The **``test``** method takes two arguments:

* **``parent``**: The parent etree Element of the block. This can be useful as
  the block may need to be treated differently if it is inside a list, for
  example.

* **``block``**: A string of the current block of text. The test may be a
  simple string method (such as ``block.startswith(some_text)``) or a complex
  regular expression.

The **``run``** method takes two arguments:

* **``parent``**: A pointer to the parent etree Element of the block. The run
  method will most likely attach additional nodes to this parent. Note that
  nothing is returned by the method. The ElementTree object is altered in place.

* **``blocks``**: A list of all remaining blocks of the document. Your run
  method must remove (pop) the first block from the list (which is altered in
  place, not returned) and parse that block. You may find that a block of text
  legitimately contains multiple block types. Therefore, after processing the
  first type, your processor can insert the remaining text into the beginning
  of the ``blocks`` list for future parsing.

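The ``test``/``run`` contract described above can be sketched as follows. The
``!!! `` marker, the ``NoteProcessor`` class and the stand-in
``BlockProcessor`` base class are hypothetical; the sketch uses the standard
library's ElementTree and calls the two methods by hand, roughly in the way
the BlockParser is described as doing.

```python
import xml.etree.ElementTree as etree


class BlockProcessor(object):
    """Stand-in for markdown.blockprocessors.BlockProcessor (assumption)."""


class NoteProcessor(BlockProcessor):
    PREFIX = '!!! '  # hypothetical block marker

    def test(self, parent, block):
        return block.startswith(self.PREFIX)

    def run(self, parent, blocks):
        block = blocks.pop(0)  # must remove the block being handled
        first, sep, rest = block.partition('\n')
        note = etree.SubElement(parent, 'div')
        note.set('class', 'note')
        note.text = first[len(self.PREFIX):]
        if rest:
            # The remaining lines are some other block type; put them
            # back at the front of the list for future parsing.
            blocks.insert(0, rest)
        # Nothing is returned: parent and blocks are altered in place.


parent = etree.Element('div')
blocks = ['!!! Be careful\nAn ordinary paragraph.', 'Last block.']
proc = NoteProcessor()
if proc.test(parent, blocks[0]):
    proc.run(parent, blocks)
```

After the call, the note has been attached to ``parent`` and the leftover
paragraph text sits at the front of ``blocks``, ready for whichever processor
claims it next.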

Please be aware that a single block can span multiple text blocks. For example,
the official Markdown syntax rules state that a blank line does not end a
Code Block. If the next block of text is also indented, then it is part of
the previous block. The BlockParser was specifically designed to
address these types of situations. If you look at the ``CodeBlockProcessor``
in the core, you will note that it checks the last child of the ``parent``.
If the last child is a code block (``<pre><code>...</code></pre>``), then it
appends that block to the previous code block rather than creating a new
code block.

Each BlockProcessor has the following utility methods available:

* **``lastChild(parent)``**:

    Returns the last child of the given etree Element or ``None`` if it had no
    children.

* **``detab(text)``**:

    Removes one level of indent (four spaces by default) from the front of each
    line of the given text string.

* **``looseDetab(text, level)``**:

    Removes "level" levels of indent (defaults to 1) from the front of each line
    of the given text string. However, this method allows secondary lines to
    not be indented, as do some parts of the Markdown syntax.

Each BlockProcessor also has a pointer to the containing BlockParser instance at
``self.parser``, which can be used to check or alter the state of the parser.
The BlockParser tracks its state in a stack at ``parser.state``. The state
stack is an instance of the ``State`` class.

**``State``** is a subclass of ``list`` and has the additional methods:

* **``set(state)``**:

    Set a new state to string ``state``. The new state is appended to the end
    of the stack.

* **``reset()``**:

    Step back one step in the stack. The last state at the end is removed from
    the stack.

* **``isstate(state)``**:

    Test that the top (current) level of the stack is of the given string
    ``state``.

Note that to ensure that the state stack doesn't become corrupted, each time a
state is set for a block, that state *must* be reset when the parser finishes
parsing that block.

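The behaviour of the state stack, as described above, can be sketched in a few
lines. This is a reconstruction from the description rather than the library
source, and the ``'list'`` state name is just an example value. The
``try``/``finally`` block is one way to guarantee the matching ``reset``.

```python
class State(list):
    """A sketch of the state stack described above (not the library source)."""

    def set(self, state):
        # Push the new state onto the end of the stack.
        self.append(state)

    def reset(self):
        # Step back one level by dropping the most recent state.
        self.pop()

    def isstate(self, state):
        # Compare against the top of the stack; an empty stack matches nothing.
        return bool(self) and self[-1] == state


state = State()
state.set('list')
try:
    inside = state.isstate('list')  # nested content would be parsed here
finally:
    state.reset()                   # always undo the set() when done
```

Pairing every ``set`` with a ``reset`` in this way leaves the stack exactly as
it was found, even if parsing the nested content raises an exception.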

An instance of the **``BlockParser``** is found at ``Markdown.parser``.
``BlockParser`` has the following methods:

* **``parseDocument(lines)``**:

    Given a list of lines, an ElementTree object is returned. This should be
    passed an entire document and is the only method the ``Markdown`` class
    calls directly.

* **``parseChunk(parent, text)``**:

    Parses a chunk of markdown text composed of multiple blocks and attaches
    those blocks to the ``parent`` Element. The ``parent`` is altered in place
    and nothing is returned. Extensions would most likely use this method for
    block parsing.

* **``parseBlocks(parent, blocks)``**:

    Parses a list of blocks of text and attaches those blocks to the ``parent``
    Element. The ``parent`` is altered in place and nothing is returned. This
    method will generally only be used internally to recursively parse nested
    blocks of text.

While it is not recommended, an extension could subclass or completely replace
the ``BlockParser``. The new class would have to provide the same public API.
However, be aware that other extensions may expect the core parser provided
and will not work with such a drastically different parser.


<h3 id="working_with_et">Working with the ElementTree</h3>

As mentioned, the Markdown parser converts a source document to an
[ElementTree][] object before serializing that back to Unicode text.
Markdown has provided some helpers to ease that manipulation within the context
of the Markdown module.

First, to get access to the ElementTree module import ElementTree from
``markdown`` rather than importing it directly. This will ensure you are using
the same version of ElementTree as markdown. The module is found at
``markdown.util.etree`` within Markdown.

    from markdown.util import etree

``markdown.util.etree`` tries to import ElementTree from any known location,
first as a standard library module (from ``xml.etree`` in Python 2.5), then as
a third party package (``elementtree``). In each instance, ``cElementTree`` is
tried first, then ``ElementTree`` if the faster C implementation is not
available on your system.

Sometimes you may want text inserted into an element to be parsed by
[InlinePatterns][]. In such a situation, simply insert the text as you normally
would and the text will be automatically run through the InlinePatterns.
However, if you do *not* want some text to be parsed by InlinePatterns,
then insert the text as an ``AtomicString``.

    from markdown.util import AtomicString
    some_element.text = AtomicString(some_text)

Here's a basic example which creates an HTML table. Note that the contents of
the second cell (``td2``) will be run through InlinePatterns later:

    table = etree.Element("table")
    table.set("cellpadding", "2")                      # Set cellpadding to 2
    tr = etree.SubElement(table, "tr")                 # Add child tr to table
    td1 = etree.SubElement(tr, "td")                   # Add child td1 to tr
    td1.text = AtomicString("Cell content")            # Add plain text content
    td2 = etree.SubElement(tr, "td")                   # Add second td to tr
    td2.text = "*text* with **inline** formatting."    # Add markup text
    table.tail = "Text after table"                    # Add text after table

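To see what such a tree amounts to, it can be rebuilt with the standard
library's ``xml.etree.ElementTree`` and serialized directly. This is only a
sketch: a plain string stands in for ``AtomicString``, and because no
InlinePatterns are run here, the emphasis markers in the second cell remain
literal text.

```python
import xml.etree.ElementTree as etree

table = etree.Element("table")
table.set("cellpadding", "2")
tr = etree.SubElement(table, "tr")
td1 = etree.SubElement(tr, "td")
td1.text = "Cell content"            # plain str stands in for AtomicString
td2 = etree.SubElement(tr, "td")
td2.text = "*text* with **inline** formatting."
table.tail = "Text after table"

# Serialize the element, including its tail text, to a Unicode string.
markup = etree.tostring(table, encoding="unicode")
```

Note that ``tail`` belongs to the element it follows, so the text after the
table is emitted immediately after the closing ``</table>`` tag.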

You can also manipulate an existing tree. Consider the following example which
adds a ``class`` attribute to ``<a>`` elements:

    def set_link_class(element):
        for child in element:
            if child.tag == "a":
                child.set("class", "myclass")  # set the class attribute
            set_link_class(child)              # run recursively on children

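The recursive walk can be tried on a small fragment using the standard
library's ElementTree. The sample markup is an assumption for the example, and
the function is written as a plain module-level function so that the recursive
call resolves.

```python
import xml.etree.ElementTree as etree


def set_link_class(element):
    for child in element:
        if child.tag == "a":
            child.set("class", "myclass")  # set the class attribute
        set_link_class(child)              # run recursively on children


root = etree.fromstring(
    '<div><p>See <a href="#x">this</a>.</p><a href="#y">that</a></div>')
set_link_class(root)
classes = [a.get("class") for a in root.iter("a")]
```

Both the nested link and the top-level link receive the attribute, since every
child is visited regardless of its own tag.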

For more information about working with ElementTree see the ElementTree
[Documentation](http://effbot.org/zone/element-index.htm)
([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)).

<h3 id="integrating_into_markdown">Integrating Your Code Into Markdown</h3>

Once you have the various pieces of your extension built, you need to tell
Markdown about them and ensure that they are run in the proper sequence.
Markdown accepts an ``Extension`` instance for each extension. Therefore, you
will need to define a class that extends ``markdown.extensions.Extension`` and
overrides the ``extendMarkdown`` method. Within this class you will manage
configuration options for your extension and attach the various processors and
patterns to the Markdown instance.

It is important to note that the order of the various processors and patterns
matters. For example, if we replace ``http://...`` links with ``<a>`` elements,
and *then* try to deal with inline html, we will end up with a mess. Therefore,
the various types of processors and patterns are stored within an instance of
the Markdown class in [OrderedDict][]s. Your ``Extension`` class will need to
manipulate those OrderedDicts appropriately. You may insert instances of your
processors and patterns into the appropriate location in an OrderedDict, remove
a built-in instance, or replace a built-in instance with your own.


<h4 id="extendmarkdown">extendMarkdown</h4>

The ``extendMarkdown`` method of a ``markdown.extensions.Extension`` class
accepts two arguments:

* **``md``**:

    A pointer to the instance of the Markdown class. You should use this to
    access the [OrderedDict][]s of processors and patterns. They are found
    under the following attributes:

    * ``md.preprocessors``
    * ``md.inlinePatterns``
    * ``md.parser.blockprocessors``
    * ``md.treeprocessors``
    * ``md.postprocessors``

    Some other things you may want to access in the markdown instance are:

    * ``md.htmlStash``
    * ``md.output_formats``
    * ``md.set_output_format()``
    * ``md.registerExtension()``
    * ``md.html_replacement_text``
    * ``md.tab_length``
    * ``md.enable_attributes``
    * ``md.smart_emphasis``

* **``md_globals``**:

    Contains all the various global variables within the markdown module.

Of course, with access to those items, theoretically you have the option of
changing anything through various [monkey_patching][] techniques. However, you
should be aware that the various undocumented or private parts of markdown
may change without notice and your monkey_patches may break with a new release.
Therefore, what you really should be doing is inserting processors and patterns
into the markdown pipeline. Consider yourself warned.

[monkey_patching]: http://en.wikipedia.org/wiki/Monkey_patch

A simple example:

    from markdown.extensions import Extension

    class MyExtension(Extension):
        def extendMarkdown(self, md, md_globals):
            # Insert instance of 'mypattern' before 'references' pattern
            md.inlinePatterns.add('mypattern', MyPattern(md), '<references')

| c6e6f94 by Waylan Limberg at 2008-10-20 |
428 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
429 |
<h4 id="ordereddict">OrderedDict</h4> |
| c6e6f94 by Waylan Limberg at 2008-10-20 |
430 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
431 |
An OrderedDict is a dictionary like object that retains the order of it's |
|
432 |
items. The items are ordered in the order in which they were appended to |
|
433 |
the OrderedDict. However, an item can also be inserted into the OrderedDict |
|
434 |
in a specific location in relation to the existing items. |
| c6e6f94 by Waylan Limberg at 2008-10-20 |
435 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
436 |
Think of OrderedDict as a combination of a list and a dictionary as it has |
|
437 |
methods common to both. For example, you can get and set items using the |
|
438 |
``od[key] = value`` syntax and the methods ``keys()``, ``values()``, and |
|
439 |
``items()`` work as expected with the keys, values and items returned in the |
|
440 |
proper order. At the same time, you can use ``insert()``, ``append()``, and |
|
441 |
``index()`` as you would with a list. |
| c6e6f94 by Waylan Limberg at 2008-10-20 |
442 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
443 |
Generally speaking, within Markdown extensions you will be using the special |
|
444 |
helper method ``add()`` to add additional items to an existing OrderedDict. |
| c6e6f94 by Waylan Limberg at 2008-10-20 |
445 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
446 |
The ``add()`` method accepts three arguments: |
| c6e6f94 by Waylan Limberg at 2008-10-20 |
447 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
448 |
* **``key``**: A string. The key is used for later reference to the item. |
| 3dfcbc8 by Waylan Limberg at 2008-11-15 |
449 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
450 |
* **``value``**: The object instance stored in this item. |
| 3dfcbc8 by Waylan Limberg at 2008-11-15 |
451 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
452 |
* **``location``**: Optional. The items location in relation to other items. |
| 3dfcbc8 by Waylan Limberg at 2008-11-15 |
453 |
|
|
454 |
Note that the location can consist of a few different values: |
| c6e6f94 by Waylan Limberg at 2008-10-20 |
455 |
|
| 870ddad by Waylan Limberg at 2008-10-29 |
456 |
* The special strings ``"_begin"`` and ``"_end"`` insert that item at the |
|
457 |
beginning or end of the OrderedDict respectively. |
|
458 |
|
|
459 |
* A less-than sign (``<``) followed by an existing key (i.e.: |
|
460 |
``"<somekey"``) inserts that item before the existing key. |
|
461 |
|
|
462 |
* A greater-than sign (``>``) followed by an existing key (i.e.: |
|
463 |
``">somekey"``) inserts that item after the existing key. |
|
464 |
|
|
465 |
Consider the following example: |
|
466 |
|
| ddc27d5 by Waylan Limberg at 2010-07-12 |
467 |
>>> from markdown.odict import OrderedDict |
|
468 |
>>> od = OrderedDict() |
| 870ddad by Waylan Limberg at 2008-10-29 |
469 |
>>> od['one'] = 1 # The same as: od.add('one', 1, '_begin') |
|
470 |
>>> od['three'] = 3 # The same as: od.add('three', 3, '>one') |
|
471 |
>>> od['four'] = 4 # The same as: od.add('four', 4, '_end') |
|
472 |
>>> od.items() |
|
473 |
[("one", 1), ("three", 3), ("four", 4)] |
|
474 |
|
|
475 |
Note that when building an OrderedDict in order, the extra features of the |
|
476 |
``add`` method offer no real value and are not necessary. However, when |
| 3dfcbc8 by Waylan Limberg at 2008-11-15 |
477 |
manipulating an existing OrderedDict, ``add`` can be very helpful. So let's |
| 870ddad by Waylan Limberg at 2008-10-29 |
478 |
insert another item into the OrderedDict. |
|
479 |
|
|
480 |
>>> od.add('two', 2, '>one') # Insert after 'one' |
|
481 |
>>> od.values() |
|
482 |
[1, 2, 3, 4] |
|
483 |
|
| 3dfcbc8 by Waylan Limberg at 2008-11-15 |
484 |
Now let's insert another item. |
| 870ddad by Waylan Limberg at 2008-10-29 |
485 |
|
|
486 |
>>> od.add('twohalf', 2.5, '<three') # Insert before 'three' |
|
487 |
>>> od.keys() |
|
488 |
["one", "two", "twohalf", "three", "four"] |
|
Note that we could also have set the location of "twohalf" to be 'after two'
(i.e.: ``'>two'``). However, you are unlikely to control the order in which
extensions are loaded, and that order can affect the final order of an
OrderedDict. For example, suppose the extension adding 'twohalf' in the above
examples were loaded before a separate extension that adds 'two': the key 'two'
would not yet exist when ``'>two'`` is used. Take this into consideration when
adding your extension components to the various markdown OrderedDicts.
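
The effect of load order can be tried out with a small stand-in for the
location logic. The sketch below is a toy model written for this illustration,
not the code in ``markdown.odict``: the ``add`` and ``load`` helpers are
hypothetical, a plain list plays the role of the key order, and raising
``ValueError`` when the referenced key is missing is an assumption of the
sketch.

```python
def add(order, key, location):
    """Insert `key` into the list `order` according to a location string."""
    if location == "_begin":
        order.insert(0, key)
    elif location == "_end":
        order.append(key)
    elif location.startswith("<"):
        # list.index raises ValueError if the referenced key is absent.
        order.insert(order.index(location[1:]), key)
    elif location.startswith(">"):
        order.insert(order.index(location[1:]) + 1, key)
    else:
        raise ValueError("unsupported location: %r" % (location,))
    return order


def load(extensions):
    """Apply (key, location) pairs, in load order, to a fresh key list."""
    order = ["one", "three", "four"]
    for key, location in extensions:
        add(order, key, location)
    return order


two = ("two", ">one")
twohalf_before_three = ("twohalf", "<three")
twohalf_after_two = ("twohalf", ">two")

# Anchoring on 'three' gives the same result in either load order.
print(load([two, twohalf_before_three]))
# ['one', 'two', 'twohalf', 'three', 'four']
print(load([twohalf_before_three, two]))
# ['one', 'two', 'twohalf', 'three', 'four']

# Anchoring on 'two' only works if 'two' has already been added.
try:
    load([twohalf_after_two, two])
    depends_on_order = False
except ValueError:
    depends_on_order = True
print(depends_on_order)
# True
```

Anchoring on a key that is always present (here ``'three'``) keeps the result
independent of the order in which the two hypothetical extensions are loaded.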
|
Once an OrderedDict is created, the items are available via key:

    MyNode = od['somekey']

Therefore, to delete an existing item:

    del od['somekey']

To change the value of an existing item (leaving location unchanged):

    od['somekey'] = MyNewObject()

To change the location of an existing item:

    od.link('somekey', '<otherkey')
|
<h4 id="registerextension">registerExtension</h4>

Some extensions may need to have their state reset between multiple runs of the
Markdown class. For example, consider the following use of the [[Footnotes]]
extension:

    md = markdown.Markdown(extensions=['footnotes'])
    html1 = md.convert(text_with_footnote)
    md.reset()
    html2 = md.convert(text_without_footnote)
|
Without calling ``reset``, the footnote definitions from the first document will
be inserted into the second document, as they are still stored within the class
instance. Therefore, the ``Extension`` class needs to define a ``reset`` method
that will reset the state of the extension (e.g.: ``self.footnotes = {}``).
However, as many extensions have no need for ``reset``, ``reset`` is only
called on extensions that are registered.
|
To register an extension, call ``md.registerExtension`` from within your
``extendMarkdown`` method:

    def extendMarkdown(self, md, md_globals):
        md.registerExtension(self)
        # insert processors and patterns here
|
Then, each time ``reset`` is called on the Markdown instance, the ``reset``
method of each registered extension will be called as well. You should also
note that ``reset`` will be called on each registered extension after it is
initialized the first time. Keep that in mind when overriding the extension's
``reset`` method.
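
The registration and reset flow described above can be modelled in a few
lines. This is a simplified sketch, not Python-Markdown's implementation:
``ToyMarkdown`` and ``ToyFootnoteExtension`` are hypothetical stand-ins, and
the ``registeredExtensions`` attribute name and the ``register`` flag are
assumptions made for the illustration.

```python
class ToyMarkdown:
    """Hypothetical stand-in for the Markdown class."""

    def __init__(self, extensions=()):
        self.registeredExtensions = []
        for extension in extensions:
            extension.extendMarkdown(self, {})
        # Registered extensions are reset once right after initialization.
        self.reset()

    def registerExtension(self, extension):
        self.registeredExtensions.append(extension)

    def reset(self):
        # Only registered extensions are reset.
        for extension in self.registeredExtensions:
            extension.reset()


class ToyFootnoteExtension:
    """Hypothetical extension that stores per-document state."""

    def __init__(self, register=True):
        self.register = register
        self.footnotes = {}
        self.reset_calls = 0

    def extendMarkdown(self, md, md_globals):
        if self.register:
            md.registerExtension(self)

    def reset(self):
        self.reset_calls += 1
        self.footnotes = {}


registered = ToyFootnoteExtension()
unregistered = ToyFootnoteExtension(register=False)
md = ToyMarkdown(extensions=[registered, unregistered])
print(registered.reset_calls)
# 1

# Simulate state collected while converting a first document.
registered.footnotes["1"] = "Left over from the first document."
unregistered.footnotes["1"] = "Left over from the first document."
md.reset()
print(registered.footnotes)
# {}
print(unregistered.footnotes)
# {'1': 'Left over from the first document.'}
```

Because ``reset`` already runs once during initialization in this model, the
method has to work on an extension that has not processed any document yet.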
|
|
<h4 id="configsettings">Config Settings</h4>

If an extension uses any parameters that the user may want to change,
those parameters should be stored in ``self.config`` of your
``markdown.Extension`` class in the following format:

    self.config = {parameter_1_name : [value1, description1],
                   parameter_2_name : [value2, description2] }
|
When stored this way, the config parameters can be overridden from the
command line or at the time Markdown is initialized:

    markdown.py -x "myextension(SOME_PARAM=2)" inputfile.txt > output.txt
|
Note that parameters should always be assumed to be set to string
values, and should be converted at run time. For example:

    i = int(self.getConfig("SOME_PARAM"))
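
To see why the conversion matters, here is a minimal sketch of a setting
travelling from a command-line style string into an extension. Everything in
it is hypothetical and written for this illustration: ``parse_extension_arg``
is not the library's parser, and ``ToyExtension`` only imitates the
``self.config`` layout and the ``getConfig`` call shown above.

```python
def parse_extension_arg(arg):
    """Split 'name(KEY=value,KEY2=value2)' into a name and a dict of strings."""
    name, _, rest = arg.partition("(")
    configs = {}
    for pair in rest.rstrip(")").split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            configs[key.strip()] = value.strip()
    return name, configs


class ToyExtension:
    """Hypothetical extension using the [value, description] config layout."""

    def __init__(self, configs=None):
        self.config = {"SOME_PARAM": [1, "An example integer setting"]}
        for key, value in (configs or {}).items():
            self.setConfig(key, value)

    def getConfig(self, key):
        return self.config[key][0]

    def setConfig(self, key, value):
        self.config[key][0] = value

    def some_param(self):
        # Convert at run time: the stored value may be the int default
        # or a string override.
        return int(self.getConfig("SOME_PARAM"))


name, configs = parse_extension_arg("myextension(SOME_PARAM=2)")
print(name, configs)
# myextension {'SOME_PARAM': '2'}

ext = ToyExtension(configs)
print(repr(ext.getConfig("SOME_PARAM")))
# '2'
print(ext.some_param() + 1)
# 3
```

The override arrives as the string ``'2'``, so arithmetic on the raw value
would fail or misbehave; converting inside the accessor handles both the
default and the override.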
|
<h4 id="makeextension">makeExtension</h4>

Each extension should ideally be placed in its own module starting
with the ``mdx_`` prefix (e.g. ``mdx_footnotes.py``). The module must
provide a module-level function called ``makeExtension`` that takes
an optional parameter consisting of a dictionary of configuration overrides
and returns an instance of the extension. An example from the footnote
extension:

    def makeExtension(configs=None):
        return FootnoteExtension(configs=configs)
|
By following the above example, when Markdown is passed the name of your
extension as a string (e.g.: ``'footnotes'``), it will automatically import
the module and call the ``makeExtension`` function to instantiate your
extension.
|
You may have noticed that the extensions packaged with Python-Markdown do not
use the ``mdx_`` prefix in their module names. This is because they are all
part of the ``markdown.extensions`` package. Markdown will first try to import
from ``markdown.extensions.extname`` and upon failure, ``mdx_extname``. If both
fail, Markdown will continue without the extension.
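
The lookup order can be sketched with ``importlib``. This is only an
illustration of the fallback described above, not the loader used by
Python-Markdown: ``candidate_modules``, ``import_first`` and
``load_extension`` are hypothetical helpers, and passing ``configs`` straight
to ``makeExtension`` is an assumption based on the signature shown earlier.

```python
import importlib


def candidate_modules(ext_name):
    """Module names to try, in order, for an extension name."""
    return ["markdown.extensions." + ext_name, "mdx_" + ext_name]


def import_first(module_names):
    """Return the first module that imports, or None if none of them do."""
    for module_name in module_names:
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue
    return None


def load_extension(ext_name, configs=None):
    """Return an extension instance, or None to continue without it."""
    module = import_first(candidate_modules(ext_name))
    if module is None:
        return None
    return module.makeExtension(configs)


print(candidate_modules("footnotes"))
# ['markdown.extensions.footnotes', 'mdx_footnotes']
```

A name that matches neither location yields ``None`` in this sketch, which
mirrors the "continue without the extension" behavior.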
|
However, Markdown will also accept an already existing instance of an extension.
For example:

    import markdown
    import myextension
    configs = {...}
    myext = myextension.MyExtension(configs=configs)
    md = markdown.Markdown(extensions=[myext])
|
This is useful if you need to implement a large number of extensions with more
than one residing in a module.
|
[Preprocessors]: #preprocessors
[InlinePatterns]: #inlinepatterns
[Treeprocessors]: #treeprocessors
[Postprocessors]: #postprocessors
[BlockParser]: #blockparser
[Working with the ElementTree]: #working_with_et
[Integrating your code into Markdown]: #integrating_into_markdown
[extendMarkdown]: #extendmarkdown
[OrderedDict]: #ordereddict
[registerExtension]: #registerextension
[Config Settings]: #configsettings
[makeExtension]: #makeextension
[ElementTree]: http://effbot.org/zone/element-index.htm