099ab5e by Waylan Limberg at 2008-10-16 1
Writing Extensions for Python-Markdown
======================================

Overview
--------

Python-Markdown includes an API for extension writers to plug their own
custom functionality and/or syntax into the parser. There are preprocessors
which allow you to alter the source before it is passed to the parser,
inline patterns which allow you to add, remove or override the syntax of
any inline elements, and postprocessors which allow munging of the
output of the parser before it is returned. If you really want to dive in,
there are also blockprocessors which are part of the core BlockParser.

As the parser builds an [ElementTree][] object which is later rendered
as Unicode text, there are also some helpers provided to ease manipulation of
the tree. Each part of the API is discussed in its respective section below.
Additionally, reading the source of some [[Available Extensions]] may be helpful.
For example, the [[Footnotes]] extension uses most of the features documented
here.

* [Preprocessors][]
* [InlinePatterns][]
* [Treeprocessors][]
* [Postprocessors][]
* [BlockParser][]
* [Working with the ElementTree][]
* [Integrating your code into Markdown][]
    * [extendMarkdown][]
    * [OrderedDict][]
    * [registerExtension][]
    * [Config Settings][]
    * [makeExtension][]

<h3 id="preprocessors">Preprocessors</h3>

Preprocessors munge the source text before it is passed into the Markdown
core. This is an excellent place to clean up bad syntax, extract things the
parser may otherwise choke on, and perhaps even store them for later retrieval.

Preprocessors should inherit from ``markdown.preprocessors.Preprocessor`` and
implement a ``run`` method with one argument ``lines``. The ``run`` method of
each Preprocessor will be passed the entire source text as a list of Unicode
strings. Each string will contain one line of text. The ``run`` method should
return either that list, or an altered list of Unicode strings.

A pseudo example:

    import re
    from markdown.preprocessors import Preprocessor

    # A placeholder pattern; substitute whatever syntax your extension handles.
    MYREGEX = re.compile(r'^%%')

    class MyPreprocessor(Preprocessor):
        def run(self, lines):
            new_lines = []
            for line in lines:
                m = MYREGEX.match(line)
                if m:
                    # do stuff (as written, matching lines are simply dropped)
                    pass
                else:
                    new_lines.append(line)
            return new_lines
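
The pseudo example can be made concrete with a small, standalone sketch. It does not import Markdown (so it omits the ``Preprocessor`` base class), and the ``%%`` comment syntax and the ``StripCommentsPreprocessor`` name are invented for illustration. It follows the ``run(lines)`` contract described above: a list of line strings in, a new list out.

```python
import re


class StripCommentsPreprocessor:
    """Hypothetical preprocessor sketch (standalone, no Markdown import).

    Lines starting with '%%' (an invented comment syntax) are removed
    before the parser would ever see them.
    """

    COMMENT_RE = re.compile(r'^%%')

    def run(self, lines):
        new_lines = []
        for line in lines:
            if self.COMMENT_RE.match(line):
                continue  # drop the comment line entirely
            new_lines.append(line)
        return new_lines


source = "# Title\n%% internal note\nSome *text*.".split("\n")
print(StripCommentsPreprocessor().run(source))  # ['# Title', 'Some *text*.']
```

In a real extension the class would subclass ``markdown.preprocessors.Preprocessor`` and be registered as described in the integration section.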

<h3 id="inlinepatterns">Inline Patterns</h3>

Inline Patterns implement the inline HTML element syntax for Markdown such as
``*emphasis*`` or ``[links](http://example.com)``. Pattern objects should be
instances of classes that inherit from ``markdown.inlinepatterns.Pattern`` or
one of its children. Each pattern object uses a single regular expression and
must have the following methods:

* **``getCompiledRegExp()``**:

    Returns a compiled regular expression.

* **``handleMatch(m)``**:

    Accepts a match object and returns an ElementTree element or a plain
    Unicode string.

Note that any regular expression returned by ``getCompiledRegExp`` must capture
the whole block. Therefore, they should all start with ``r'^(.*?)'`` and end
with ``r'(.*?)$'``. When using the default ``getCompiledRegExp()`` method
provided in the ``Pattern`` class, you can pass in a regular expression without
that wrapping, and ``getCompiledRegExp`` will wrap your expression for you and
set the ``re.DOTALL`` and ``re.UNICODE`` flags. This means that the first group
of your pattern will be ``m.group(2)``, as ``m.group(1)`` will match everything
before the pattern.

For an example, consider this simplified emphasis pattern:

    from markdown.inlinepatterns import Pattern
    from markdown.util import etree

    class EmphasisPattern(Pattern):
        def handleMatch(self, m):
            el = etree.Element('em')
            # m.group(1) is the text before the match, so the single group
            # in the pattern below is m.group(2).
            el.text = m.group(2)
            return el

As discussed in [Integrating Your Code Into Markdown][], an instance of this
class will need to be provided to Markdown. That instance would be created
like so:

    # an oversimplified regex
    MYPATTERN = r'\*([^*]+)\*'
    # pass in pattern and create instance
    emphasis = EmphasisPattern(MYPATTERN)
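
To see why the first group of your pattern is ``m.group(2)``, the wrapping can be reproduced outside Markdown. This standalone sketch assumes the wrapper has the ``^(.*?)...(.*?)$`` form described above, uses the standard library's ``xml.etree.ElementTree`` in place of ``markdown.util.etree``, and ``handle_match`` is a hypothetical stand-in for ``handleMatch``:

```python
import re
import xml.etree.ElementTree as etree

# The user-supplied (oversimplified) pattern from the example above.
MYPATTERN = r'\*([^*]+)\*'

# Sketch of the wrapping: everything before the match lands in group(1)
# and everything after it in the last group.
compiled = re.compile(r'^(.*?)%s(.*?)$' % MYPATTERN, re.DOTALL | re.UNICODE)


def handle_match(m):
    """Build an <em> element from the pattern's first group (group 2)."""
    el = etree.Element('em')
    el.text = m.group(2)
    return el


m = compiled.match('Some *emphasized* text.')
print(repr(m.group(1)))                 # 'Some '
print(etree.tostring(handle_match(m)))  # b'<em>emphasized</em>'
print(repr(m.group(3)))                 # ' text.'
```

The same offset explains why generic classes that read ``group(3)`` expect patterns with two groups of their own.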

Actually, it would not be necessary to create that pattern (and not just because
a more sophisticated emphasis pattern already exists in Markdown). The fact is
that this example pattern is not very DRY. A pattern for ``**strong**`` text would
be almost identical, with the exception that it would create a 'strong' element.
Therefore, Markdown provides a number of generic pattern classes that can
provide some common functionality. For example, both emphasis and strong are
implemented with separate instances of the ``SimpleTagPattern`` listed below.
Feel free to use or extend any of the Pattern classes found at
``markdown.inlinepatterns``.

**Generic Pattern Classes**

* **``SimpleTextPattern(pattern)``**:

    Returns simple text of ``group(2)`` of a ``pattern``.

* **``SimpleTagPattern(pattern, tag)``**:

    Returns an element of type "``tag``" with a text attribute of ``group(3)``
    of a ``pattern``. Because ``group(1)`` holds the text before the match, the
    ``pattern`` you pass in needs at least two groups, with the element's text
    in the second. ``tag`` should be a string of an HTML element (e.g.: 'em').

* **``SubstituteTagPattern(pattern, tag)``**:

    Returns an element of type "``tag``" with no children or text (e.g.: 'br').

There may be other Pattern classes in the Markdown source that you could extend
or use as well. Read through the source and see if there is anything you can
use. You might even get a few ideas for different approaches to your specific
situation.

<h3 id="treeprocessors">Treeprocessors</h3>

Treeprocessors manipulate an ElementTree object after it has passed through the
core BlockParser. This is where additional manipulation of the tree takes
place. Additionally, the InlineProcessor is a Treeprocessor which steps through
the tree and runs the InlinePatterns on the text of each Element in the tree.

A Treeprocessor should inherit from ``markdown.treeprocessors.Treeprocessor``
and over-ride the ``run`` method, which takes one argument ``root`` (an
ElementTree object) and returns either that root element or a modified root
element.

A pseudo example:

    from markdown.treeprocessors import Treeprocessor

    class MyTreeprocessor(Treeprocessor):
        def run(self, root):
            # do stuff: alter root (and its children) in place
            return root

For specifics on manipulating the ElementTree, see
[Working with the ElementTree][] below.
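
As a concrete, standalone illustration of the ``run(root)`` contract, the sketch below renames ``<b>`` elements to ``<strong>``. It does not import Markdown (so there is no ``Treeprocessor`` base class), uses the standard library's ``xml.etree.ElementTree``, and the class name is hypothetical:

```python
import xml.etree.ElementTree as etree


class StrongifyTreeprocessor:
    """Hypothetical treeprocessor sketch (standalone, no Markdown import).

    Renames every <b> element to <strong>, modifying the tree in place,
    and returns the same root element.
    """

    def run(self, root):
        # Collect matches first so the walk is not affected by the renames.
        for el in list(root.iter('b')):
            el.tag = 'strong'
        return root


root = etree.fromstring('<div><p>Plain <b>bold</b> text</p></div>')
root = StrongifyTreeprocessor().run(root)
print(etree.tostring(root, encoding='unicode'))
# <div><p>Plain <strong>bold</strong> text</p></div>
```

Returning the same root object keeps the example simple; a treeprocessor that builds a new tree would return that new root instead.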

<h3 id="postprocessors">Postprocessors</h3>

Postprocessors manipulate the document after the ElementTree has been
serialized into a string. Postprocessors should be used to work with the
text just before output.

A Postprocessor should inherit from ``markdown.postprocessors.Postprocessor``
and over-ride the ``run`` method, which takes one argument ``text`` and returns
a Unicode string.

Because they work on the final serialized text, postprocessors may be an
appropriate place to add a table of contents to a document:

    from markdown.postprocessors import Postprocessor

    class TocPostprocessor(Postprocessor):
        def run(self, text):
            # MYMARKERRE and MyToc are placeholders for your own compiled
            # marker regex and replacement (a string or a function).
            return MYMARKERRE.sub(MyToc, text)
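
A standalone sketch of what ``MYMARKERRE`` and ``MyToc`` might look like follows. It assumes an invented ``<p>[TOC]</p>`` marker in the serialized HTML and builds the contents list from ``<h2>`` headings with a regex; the marker, the class, and the ``toc`` CSS class are all hypothetical:

```python
import re


class TocPostprocessor:
    """Hypothetical postprocessor sketch (standalone, no Markdown import)."""

    MARKER_RE = re.compile(r'<p>\[TOC\]</p>')
    HEADING_RE = re.compile(r'<h2>(.*?)</h2>')

    def run(self, text):
        items = ''.join('<li>%s</li>' % h for h in self.HEADING_RE.findall(text))
        toc = '<ul class="toc">%s</ul>' % items
        # Use a function so backslashes in the TOC are not treated as escapes.
        return self.MARKER_RE.sub(lambda m: toc, text)


html = '<p>[TOC]</p>\n<h2>Intro</h2>\n<p>Hi</p>\n<h2>Usage</h2>'
print(TocPostprocessor().run(html).splitlines()[0])
# <ul class="toc"><li>Intro</li><li>Usage</li></ul>
```

Regex matching on HTML is only reasonable here because the input is Markdown's own predictable output; anything more involved is usually easier as a Treeprocessor.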

<h3 id="blockparser">BlockParser</h3>

Sometimes, pre/tree/postprocessors and Inline Patterns aren't going to do what
you need. Perhaps you want a new block type that needs to be integrated
into the core parsing. In such a situation, you can add/change/remove
functionality of the core ``BlockParser``. The BlockParser is composed of a
number of BlockProcessors. The BlockParser steps through each block of text
(split by blank lines) and passes each block to the appropriate BlockProcessor.
That BlockProcessor parses the block and adds it to the ElementTree. The
[[Definition Lists]] extension would be a good example of an extension that
adds/modifies BlockProcessors.

A BlockProcessor should inherit from ``markdown.blockprocessors.BlockProcessor``
and implement both the ``test`` and ``run`` methods.

The ``test`` method is used by the BlockParser to identify the type of block.
Therefore the ``test`` method must return a boolean value. If the test returns
``True``, then the BlockParser will call that BlockProcessor's ``run`` method.
If it returns ``False``, the BlockParser will move on to the next
BlockProcessor.

The **``test``** method takes two arguments:

* **``parent``**: The parent etree Element of the block. This can be useful as
  the block may need to be treated differently if it is inside a list, for
  example.

* **``block``**: A string of the current block of text. The test may be a
  simple string method (such as ``block.startswith(some_text)``) or a complex
  regular expression.

The **``run``** method takes two arguments:

* **``parent``**: A pointer to the parent etree Element of the block. The run
  method will most likely attach additional nodes to this parent. Note that
  nothing is returned by the method. The ElementTree object is altered in place.

* **``blocks``**: A list of all remaining blocks of the document. Your run
  method must remove (pop) the first block from the list (the list is altered
  in place, not returned) and parse that block. You may find that a block of
  text legitimately contains multiple block types. Therefore, after processing
  the first type, your processor can insert the remaining text into the
  beginning of the ``blocks`` list for future parsing.

Please be aware that a single block can span multiple text blocks. For example,
the official Markdown syntax rules state that a blank line does not end a
Code Block. If the next block of text is also indented, then it is part of
the previous block. Therefore, the BlockParser was specifically designed to
address these types of situations. If you look at the ``CodeBlockProcessor``
in the core, you will see that it checks the last child of the ``parent``.
If the last child is a code block (``<pre><code>...</code></pre>``), then it
appends that block to the previous code block rather than creating a new
code block.
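
The ``test``/``run`` contract, pushing leftover text back onto ``blocks``, and the last-child continuation trick can all be seen in one small sketch. It is standalone: it uses the standard library's ``xml.etree.ElementTree`` instead of ``markdown.util.etree``, does not subclass ``BlockProcessor``, and reimplements ``lastChild`` locally. The ``| `` aside syntax and the ``AsideProcessor`` name are invented for illustration:

```python
import xml.etree.ElementTree as etree


class AsideProcessor:
    """Hypothetical block processor for lines that start with '| '."""

    PREFIX = '| '

    def lastChild(self, parent):
        # Local stand-in for the utility method of the same name.
        return parent[-1] if len(parent) else None

    def test(self, parent, block):
        return block.startswith(self.PREFIX)

    def run(self, parent, blocks):
        lines = blocks.pop(0).split('\n')  # remove the block we handle
        mine = []
        for line in lines:
            if not line.startswith(self.PREFIX):
                break
            mine.append(line[len(self.PREFIX):])
        rest = lines[len(mine):]
        if rest:
            # The block held more than one block type: push the remainder
            # back onto the front of the list for later parsing.
            blocks.insert(0, '\n'.join(rest))

        sibling = self.lastChild(parent)
        if sibling is not None and sibling.tag == 'aside':
            # Like CodeBlockProcessor, continue the previous element rather
            # than starting a new one after a blank line.
            sibling.text += '\n\n' + '\n'.join(mine)
        else:
            aside = etree.SubElement(parent, 'aside')
            aside.text = '\n'.join(mine)


root = etree.Element('div')
blocks = ['| one\n| two', '| three\nplain text']
processor = AsideProcessor()
while blocks and processor.test(root, blocks[0]):
    processor.run(root, blocks)
print(len(root), blocks)  # 1 ['plain text']
```

Both ``| `` blocks end up in a single ``<aside>``, and the trailing ``plain text`` is left at the front of ``blocks`` for whichever processor handles it next.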

Each BlockProcessor has the following utility methods available:

* **``lastChild(parent)``**:

    Returns the last child of the given etree Element or ``None`` if it has no
    children.

* **``detab(text)``**:

    Removes one level of indent (four spaces by default) from the front of each
    line of the given text string.

* **``looseDetab(text, level)``**:

    Removes "level" levels of indent (defaults to 1) from the front of each line
    of the given text string. However, this method allows secondary lines to
    not be indented, as some parts of the Markdown syntax permit.

Each BlockProcessor also has a pointer to the containing BlockParser instance at
``self.parser``, which can be used to check or alter the state of the parser.
The BlockParser tracks its state in a stack at ``parser.state``. The state
stack is an instance of the ``State`` class.

**``State``** is a subclass of ``list`` and has the additional methods:

* **``set(state)``**:

    Sets a new state to the string ``state``. The new state is appended to the
    end of the stack.

* **``reset()``**:

    Steps back one step in the stack. The last state at the end is removed from
    the stack.

* **``isstate(state)``**:

    Tests that the top (current) level of the stack is of the given string
    ``state``.

Note that to ensure that the state stack doesn't become corrupted, each time a
state is set for a block, that state *must* be reset when the parser finishes
parsing that block.
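
A ``State`` stack with the behavior described above takes only a few lines. This is a sketch written from the description, not Markdown's source; ``parse_nested`` and the ``'mystate'`` string are illustrative. The ``try``/``finally`` shows one way to honor the rule that every ``set`` is matched by a ``reset``:

```python
class State(list):
    """Sketch of a state stack with the three methods described above."""

    def set(self, state):
        """Push a new state onto the end of the stack."""
        self.append(state)

    def reset(self):
        """Step back one level by removing the last state."""
        self.pop()

    def isstate(self, state):
        """Return True if the current (top) state equals ``state``."""
        return len(self) > 0 and self[-1] == state


def parse_nested(state, text):
    # Hypothetical helper: set a state for a nested block and reset it even
    # if parsing fails, so the stack cannot become corrupted.
    state.set('mystate')
    try:
        return state.isstate('mystate'), text.upper()
    finally:
        state.reset()


stack = State()
print(parse_nested(stack, 'item'))  # (True, 'ITEM')
print(stack)                        # []
```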

An instance of the **``BlockParser``** is found at ``Markdown.parser``.
``BlockParser`` has the following methods:

* **``parseDocument(lines)``**:

    Given a list of lines, an ElementTree object is returned. This should be
    passed an entire document and is the only method the ``Markdown`` class
    calls directly.

* **``parseChunk(parent, text)``**:

    Parses a chunk of markdown text composed of multiple blocks and attaches
    those blocks to the ``parent`` Element. The ``parent`` is altered in place
    and nothing is returned. Extensions would most likely use this method for
    block parsing.

* **``parseBlocks(parent, blocks)``**:

    Parses a list of blocks of text and attaches those blocks to the ``parent``
    Element. The ``parent`` is altered in place and nothing is returned. This
    method will generally only be used internally to recursively parse nested
    blocks of text.
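
The relationship between ``parseChunk`` and ``parseBlocks`` can be sketched as follows, assuming (as described in the BlockParser overview) that blocks are separated by blank lines and that processors are tried in order until one's ``test`` returns ``True``. ``MiniBlockParser`` and ``ParagraphProcessor`` are hypothetical names, and the ``ValueError`` guard is an addition of this sketch to avoid looping forever when no processor accepts a block:

```python
import xml.etree.ElementTree as etree


class ParagraphProcessor:
    """Hypothetical catch-all processor: every block becomes a <p>."""

    def test(self, parent, block):
        return True

    def run(self, parent, blocks):
        p = etree.SubElement(parent, 'p')
        p.text = blocks.pop(0).strip()


class MiniBlockParser:
    """Sketch of how parseChunk hands blocks to parseBlocks."""

    def __init__(self, processors):
        self.blockprocessors = processors

    def parseChunk(self, parent, text):
        # Split on blank lines, then parse the resulting list of blocks.
        self.parseBlocks(parent, text.split('\n\n'))

    def parseBlocks(self, parent, blocks):
        while blocks:
            for processor in self.blockprocessors:
                if processor.test(parent, blocks[0]):
                    processor.run(parent, blocks)  # pops at least one block
                    break
            else:
                raise ValueError('no processor accepted %r' % blocks[0])


root = etree.Element('div')
MiniBlockParser([ParagraphProcessor()]).parseChunk(root, 'First block.\n\nSecond block.')
print(etree.tostring(root, encoding='unicode'))
# <div><p>First block.</p><p>Second block.</p></div>
```

Because a processor receives the whole ``blocks`` list, it can consume several blocks or push text back, which is what makes the code-block continuation described earlier possible.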

While it is not recommended, an extension could subclass or completely replace
the ``BlockParser``. The new class would have to provide the same public API.
However, be aware that other extensions may expect the core parser provided
and will not work with such a drastically different parser.

<h3 id="working_with_et">Working with the ElementTree</h3>

As mentioned, the Markdown parser converts a source document to an
[ElementTree][] object before serializing that back to Unicode text.
Markdown has provided some helpers to ease that manipulation within the context
of the Markdown module.

First, to get access to the ElementTree module, import ElementTree from
``markdown`` rather than importing it directly. This will ensure you are using
the same version of ElementTree as markdown. The module is found at
``markdown.util.etree`` within Markdown.

    from markdown.util import etree

``markdown.util.etree`` tries to import ElementTree from any known location,
first as a standard library module (from ``xml.etree``, available since
Python 2.5), then as a third party package (``elementtree``). In each
instance, ``cElementTree`` is tried first, then ``ElementTree`` if the faster C
implementation is not available on your system.
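
For code that wants the same fallback behavior outside Markdown, the import chain described above could be written roughly like this. It is an illustration of the described order only; Markdown's own module may differ in detail. Note that Python 3.9 removed the ``xml.etree.cElementTree`` alias, so on current versions the second import is the one that succeeds:

```python
# Try the fast C implementation first, then the pure-Python one; standard
# library locations come before the old third-party packages.
try:
    import xml.etree.cElementTree as etree          # stdlib, C version
except ImportError:
    try:
        import xml.etree.ElementTree as etree       # stdlib, pure Python
    except ImportError:
        try:
            import cElementTree as etree            # third-party, C version
        except ImportError:
            import elementtree.ElementTree as etree  # third-party package

print(etree.tostring(etree.Element('hr')))
```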

Sometimes you may want text inserted into an element to be parsed by
[InlinePatterns][]. In such a situation, simply insert the text as you normally
would and the text will be automatically run through the InlinePatterns.
However, if you do *not* want some text to be parsed by InlinePatterns,
then insert the text as an ``AtomicString``.

    from markdown.util import AtomicString
    some_element.text = AtomicString(some_text)

Here's a basic example which creates an HTML table (note that the contents of
the second cell (``td2``) will be run through InlinePatterns later):

    from markdown.util import etree, AtomicString

    table = etree.Element("table")
    table.set("cellpadding", "2")                      # Set cellpadding to 2
    tr = etree.SubElement(table, "tr")                 # Add child tr to table
    td1 = etree.SubElement(tr, "td")                   # Add child td1 to tr
    td1.text = AtomicString("Cell content")            # Add plain text content
    td2 = etree.SubElement(tr, "td")                   # Add second td to tr
    td2.text = "*text* with **inline** formatting."    # Add markup text
    table.tail = "Text after table"                    # Add text after table

You can also manipulate an existing tree. Consider the following example which
adds a ``class`` attribute to ``<a>`` elements:

    def set_link_class(element):
        for child in element:
            if child.tag == "a":
                child.set("class", "myclass")  # set the class attribute
            set_link_class(child)  # run recursively on children
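
To try the function outside Markdown, the standard library's ``xml.etree.ElementTree`` (the first location ``markdown.util.etree`` tries) is enough. The sample HTML below is invented:

```python
import xml.etree.ElementTree as etree


def set_link_class(element):
    for child in element:
        if child.tag == "a":
            child.set("class", "myclass")  # set the class attribute
        set_link_class(child)  # run recursively on children


root = etree.fromstring(
    '<div><p>See <a href="/a">one</a> and <em><a href="/b">two</a></em>.</p></div>'
)
set_link_class(root)
print([a.get("class") for a in root.iter("a")])  # ['myclass', 'myclass']
```

In current Python versions ``root.iter("a")`` already walks the whole tree, so a flat loop over it would give the same result without explicit recursion.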

For more information about working with ElementTree, see the ElementTree
[Documentation](http://effbot.org/zone/element-index.htm)
([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)).

<h3 id="integrating_into_markdown">Integrating Your Code Into Markdown</h3>

Once you have the various pieces of your extension built, you need to tell
Markdown about them and ensure that they are run in the proper sequence.
Markdown accepts an ``Extension`` instance for each extension. Therefore, you
will need to define a class that extends ``markdown.extensions.Extension`` and
over-rides the ``extendMarkdown`` method. Within this class you will manage
configuration options for your extension and attach the various processors and
patterns to the Markdown instance.

It is important to note that the order of the various processors and patterns
matters. For example, if we replace ``http://...`` links with ``<a>`` elements,
and *then* try to deal with inline HTML, we will end up with a mess. Therefore,
the various types of processors and patterns are stored within an instance of
the Markdown class in [OrderedDict][]s. Your ``Extension`` class will need to
manipulate those OrderedDicts appropriately. You may insert instances of your
processors and patterns into the appropriate location in an OrderedDict, remove
a built-in instance, or replace a built-in instance with your own.

<h4 id="extendmarkdown">extendMarkdown</h4>

The ``extendMarkdown`` method of a ``markdown.extensions.Extension`` class
accepts two arguments:

* **``md``**:

    A pointer to the instance of the Markdown class. You should use this to
    access the [OrderedDict][]s of processors and patterns. They are found
    under the following attributes:

    * ``md.preprocessors``
    * ``md.inlinePatterns``
    * ``md.parser.blockprocessors``
    * ``md.treeprocessors``
    * ``md.postprocessors``

    Some other things you may want to access in the markdown instance are:

    * ``md.htmlStash``
    * ``md.output_formats``
    * ``md.set_output_format()``
    * ``md.registerExtension()``
    * ``md.html_replacement_text``
    * ``md.tab_length``
    * ``md.enable_attributes``
    * ``md.smart_emphasis``

* **``md_globals``**:

    Contains all the various global variables within the markdown module.

Of course, with access to those items, theoretically you have the option of
changing anything through various [monkey_patching][] techniques. However, you
should be aware that the various undocumented or private parts of markdown
may change without notice and your monkey patches may break with a new release.
Therefore, what you really should be doing is inserting processors and patterns
into the markdown pipeline. Consider yourself warned.

[monkey_patching]: http://en.wikipedia.org/wiki/Monkey_patch

A simple example:

    from markdown.extensions import Extension

    class MyExtension(Extension):
        def extendMarkdown(self, md, md_globals):
            # Insert an instance of your own Pattern subclass (MyPattern, built
            # from your own regex MYPATTERN) before the 'reference' pattern
            md.inlinePatterns.add('mypattern', MyPattern(MYPATTERN, md), '<reference')

<h4 id="ordereddict">OrderedDict</h4>

An OrderedDict is a dictionary-like object that retains the order of its
items. The items are ordered in the order in which they were appended to
the OrderedDict. However, an item can also be inserted into the OrderedDict
in a specific location in relation to the existing items.

Think of OrderedDict as a combination of a list and a dictionary as it has
methods common to both. For example, you can get and set items using the
``od[key] = value`` syntax and the methods ``keys()``, ``values()``, and
``items()`` work as expected with the keys, values and items returned in the
proper order. At the same time, you can use ``insert()``, ``append()``, and
``index()`` as you would with a list.

Generally speaking, within Markdown extensions you will be using the special
helper method ``add()`` to add additional items to an existing OrderedDict.

The ``add()`` method accepts three arguments:

* **``key``**: A string. The key is used for later reference to the item.

* **``value``**: The object instance stored in this item.

* **``location``**: Optional. The item's location in relation to other items.

    Note that the location can consist of a few different values:

    * The special strings ``"_begin"`` and ``"_end"`` insert that item at the
      beginning or end of the OrderedDict respectively.

    * A less-than sign (``<``) followed by an existing key (e.g.:
      ``"<somekey"``) inserts that item before the existing key.

    * A greater-than sign (``>``) followed by an existing key (e.g.:
      ``">somekey"``) inserts that item after the existing key.

Consider the following example:

    >>> from markdown.odict import OrderedDict
    >>> od = OrderedDict()
    >>> od['one'] = 1            # The same as: od.add('one', 1, '_begin')
    >>> od['three'] = 3          # The same as: od.add('three', 3, '>one')
    >>> od['four'] = 4           # The same as: od.add('four', 4, '_end')
    >>> od.items()
    [('one', 1), ('three', 3), ('four', 4)]

Note that when building an OrderedDict in order, the extra features of the
``add`` method offer no real value and are not necessary. However, when
manipulating an existing OrderedDict, ``add`` can be very helpful. So let's
insert another item into the OrderedDict.

    >>> od.add('two', 2, '>one')         # Insert after 'one'
    >>> od.values()
    [1, 2, 3, 4]

Now let's insert another item.

    >>> od.add('twohalf', 2.5, '<three') # Insert before 'three'
    >>> od.keys()
    ['one', 'two', 'twohalf', 'three', 'four']

Note that we also could have set the location of "twohalf" to be 'after two'
(i.e.: ``'>two'``). However, it's unlikely that you will have control over the
order in which extensions will be loaded, and this could affect the final
sorted order of an OrderedDict. For example, suppose an extension adding
'twohalf' in the above examples was loaded before a separate extension which
adds 'two'. You may need to take this into consideration when adding your
extension components to the various markdown OrderedDicts.

Once an OrderedDict is created, the items are available via key:

    MyNode = od['somekey']

Therefore, to delete an existing item:

    del od['somekey']

To change the value of an existing item (leaving location unchanged):

    od['somekey'] = MyNewObject()

To change the location of an existing item:

    od.link('somekey', '<otherkey')
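
To make the location rules concrete, here is a small, hypothetical reimplementation of just the behavior described in this section. It is not ``markdown.odict.OrderedDict``; in particular, rejecting duplicate keys in ``add`` and restoring the old position when ``link`` fails are choices of this sketch:

```python
class LocationDict:
    """Hypothetical stand-in modelling add(), link() and key access."""

    def __init__(self):
        self._keys = []
        self._data = {}

    def _index(self, location):
        """Translate a location string into a list index."""
        if location == '_begin':
            return 0
        if location == '_end':
            return len(self._keys)
        if location[:1] == '<':
            return self._keys.index(location[1:])      # before existing key
        if location[:1] == '>':
            return self._keys.index(location[1:]) + 1  # after existing key
        raise ValueError('Not a valid location: %r' % location)

    def add(self, key, value, location='_end'):
        if key in self._data:
            raise KeyError('%r already exists' % key)
        self._keys.insert(self._index(location), key)
        self._data[key] = value

    def link(self, key, location):
        old = self._keys.index(key)
        self._keys.remove(key)
        try:
            self._keys.insert(self._index(location), key)
        except ValueError:
            self._keys.insert(old, key)  # restore position, then re-raise
            raise

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key not in self._data:
            self._keys.append(key)  # new keys go to the end
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]
        self._keys.remove(key)

    def keys(self):
        return list(self._keys)

    def values(self):
        return [self._data[k] for k in self._keys]


od = LocationDict()
od['one'] = 1
od['three'] = 3
od['four'] = 4
od.add('two', 2, '>one')
od.add('twohalf', 2.5, '<three')
print(od.keys())    # ['one', 'two', 'twohalf', 'three', 'four']
od.link('four', '_begin')
print(od.values())  # [4, 1, 2, 2.5, 3]
```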

<h4 id="registerextension">registerExtension</h4>

Some extensions may need to have their state reset between multiple runs of the
same Markdown instance. For example, consider the following use of the
[[Footnotes]] extension:

    import markdown

    md = markdown.Markdown(extensions=['footnotes'])
    html1 = md.convert(text_with_footnote)
    md.reset()
    html2 = md.convert(text_without_footnote)

Without calling ``reset``, the footnote definitions from the first document would
be inserted into the second document, as they are still stored within the
extension instance. Therefore the extension class needs to define a ``reset``
method that resets its state (e.g. ``self.footnotes = {}``). However, as many
extensions have no need for ``reset``, it is only called on extensions that
have been registered.

To register an extension, call ``md.registerExtension`` from within your
``extendMarkdown`` method:

    def extendMarkdown(self, md, md_globals):
        md.registerExtension(self)
        # insert processors and patterns here

Then, each time ``reset`` is called on the Markdown instance, the ``reset``
method of each registered extension will be called as well. Note also that
``reset`` will be called on each registered extension after it is first
initialized. Keep that in mind when overriding the extension's
``reset`` method.
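
The reset contract described above can be modelled in plain Python without installing Markdown. `ToyMarkdown`, `ToyFootnotes` and the `collect` hook are hypothetical names for this sketch; the real `Markdown` class does far more, and `extendMarkdown` here drops the `md_globals` argument for brevity. The sketch only assumes what the text states: registered extensions are reset whenever the instance is reset, and once after initialization.

```python
class ToyFootnotes:
    """Hypothetical extension that keeps per-document state."""

    def __init__(self):
        self.footnotes = {}
        self.reset_calls = 0

    def extendMarkdown(self, md):
        # Register so the host calls our reset() whenever it is reset.
        md.registerExtension(self)

    def collect(self, text):
        # Remember definition lines such as "[^1]: Some note." and return
        # the labels that would be rendered for the current document.
        for line in text.splitlines():
            if line.startswith('[^') and ']:' in line:
                label, note = line[2:].split(']:', 1)
                self.footnotes[label] = note.strip()
        return sorted(self.footnotes)

    def reset(self):
        # Forget everything collected from the previous document.
        self.footnotes = {}
        self.reset_calls += 1


class ToyMarkdown:
    """Hypothetical host that mimics only the reset bookkeeping."""

    def __init__(self, extensions):
        self.registeredExtensions = []
        for ext in extensions:
            ext.extendMarkdown(self)
        self.reset()  # registered extensions are reset once after setup

    def registerExtension(self, extension):
        self.registeredExtensions.append(extension)

    def reset(self):
        for ext in self.registeredExtensions:
            ext.reset()

    def convert(self, text):
        # Stand-in for real parsing: each registered extension sees the text.
        return [ext.collect(text) for ext in self.registeredExtensions]


notes = ToyFootnotes()
md = ToyMarkdown(extensions=[notes])
first = md.convert("Text[^a]\n\n[^a]: A note.")
md.reset()
second = md.convert("No footnotes here.")
```

Removing the `md.reset()` call makes the second conversion report `['a']` as well, which is the leak between documents that the text warns about.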

<h4 id="configsettings">Config Settings</h4>

If an extension uses any parameters that the user may want to change,
those parameters should be stored in the ``self.config`` attribute of your
``markdown.Extension`` subclass in the following format:

    self.config = {'parameter_1_name': [value1, 'description1'],
                   'parameter_2_name': [value2, 'description2']}
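
A minimal sketch of how an extension might store, override and read values kept in this `{name: [value, description]}` format. `ToyExtension`, `MyExtension` and `setConfig` are hypothetical stand-ins written for illustration; `getConfig` mirrors the call used in this section's run-time conversion example, but this is not Markdown's own `Extension` class.

```python
class ToyExtension:
    """Hypothetical base class; subclasses fill in self.config first."""

    def __init__(self, configs=None):
        # Apply user overrides on top of the defaults set by the subclass.
        for key, value in (configs or {}).items():
            self.setConfig(key, value)

    def getConfig(self, key):
        # Index 0 of each entry holds the current value.
        return self.config[key][0]

    def setConfig(self, key, value):
        # Only known options may be overridden; the description is kept.
        if key not in self.config:
            raise KeyError("unknown config option: %r" % key)
        self.config[key][0] = value


class MyExtension(ToyExtension):
    def __init__(self, configs=None):
        # Defaults, one [value, description] list per parameter.
        self.config = {
            'SOME_PARAM': ['1', 'A count, stored as a string'],
            'MARKER': ['///Here///', 'Placeholder text for output'],
        }
        super().__init__(configs)


ext = MyExtension(configs={'SOME_PARAM': '2'})
value = ext.getConfig('SOME_PARAM')  # the string '2'
count = int(value)                   # converted at run time
```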

When stored this way, the config parameters can be overridden from the
command line or when Markdown is instantiated:

    markdown.py -x "myextension(SOME_PARAM=2)" inputfile.txt > output.txt

Note that parameters should always be assumed to be string values and
should be converted at run time. For example:

    i = int(self.getConfig("SOME_PARAM"))
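
Parameters arrive as strings because they originate as text, such as the `-x` argument above. The hypothetical helper `parse_extension_arg` below shows one way such an argument could be split into an extension name and a dict of string values. It is an illustration, not Markdown's own parser, and it assumes comma-separated `KEY=VALUE` pairs with no nested parentheses or quoting.

```python
def parse_extension_arg(arg):
    """Split 'name(KEY=VALUE,...)' into ('name', {'KEY': 'VALUE', ...})."""
    if '(' not in arg:
        return arg, {}
    if not arg.endswith(')'):
        raise ValueError("missing closing parenthesis in %r" % arg)
    name, _, inner = arg[:-1].partition('(')
    configs = {}
    for pair in inner.split(','):
        if not pair.strip():
            continue  # tolerate 'name()' and trailing commas
        key, sep, value = pair.partition('=')
        if not sep:
            raise ValueError("expected KEY=VALUE, got %r" % pair)
        configs[key.strip()] = value.strip()  # values stay strings
    return name, configs


name, configs = parse_extension_arg('myextension(SOME_PARAM=2)')
some_param = int(configs['SOME_PARAM'])  # convert at run time
```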

<h4 id="makeextension">makeExtension</h4>

Each extension should ideally be placed in its own module starting
with the ``mdx_`` prefix (e.g. ``mdx_footnotes.py``). The module must
provide a module-level function called ``makeExtension`` that takes
an optional parameter consisting of a dictionary of configuration overrides
and returns an instance of the extension. An example from the footnotes
extension:

    def makeExtension(configs=None):
        return FootnoteExtension(configs=configs)

By following the above example, when Markdown is passed the name of your
extension as a string (e.g. ``'footnotes'``), it will automatically import
the module and call its ``makeExtension`` function to initialize your extension.

You may have noticed that the extensions packaged with Python-Markdown do not
use the ``mdx_`` prefix in their module names. This is because they are all
part of the ``markdown.extensions`` package. Markdown will first try to import
from ``markdown.extensions.extname`` and, upon failure, from ``mdx_extname``.
If both fail, Markdown will continue without the extension.
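
The lookup order just described can be sketched with the standard library's `importlib`. This is an illustrative approximation, not Markdown's actual loading code: the function name `load_extension` is hypothetical, and it assumes the `makeExtension(configs=None)` convention shown above. To stay self-contained, the example registers a fake `mdx_demo` module in `sys.modules` instead of writing a file.

```python
import importlib
import sys
import types


def load_extension(ext_name, configs=None):
    """Return an extension instance, or None if no module could be found."""
    for module_name in ('markdown.extensions.' + ext_name, 'mdx_' + ext_name):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # try the next candidate module name
        return module.makeExtension(configs=configs)
    return None  # both imports failed: carry on without the extension


# A fake third-party module, standing in for a file named mdx_demo.py.
class DemoExtension:
    def __init__(self, configs=None):
        self.configs = configs or {}


demo = types.ModuleType('mdx_demo')
demo.makeExtension = lambda configs=None: DemoExtension(configs=configs)
sys.modules['mdx_demo'] = demo

ext = load_extension('demo', configs={'SOME_PARAM': '2'})
missing = load_extension('no_such_extension_xyz')
```

Catching `ImportError` this broadly also hides import errors raised inside an extension's own module, so a loader along these lines may want to report such failures rather than skip them silently.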

Markdown will also accept an already existing instance of an extension instead
of its name. For example:

    import markdown
    import myextension
    configs = {...}
    myext = myextension.MyExtension(configs=configs)
    md = markdown.Markdown(extensions=[myext])

This is useful if you need to implement a large number of extensions, with
more than one residing in a single module.

[Preprocessors]: #preprocessors
[InlinePatterns]: #inlinepatterns
[Treeprocessors]: #treeprocessors
[Postprocessors]: #postprocessors
[BlockParser]: #blockparser
[Working with the ElementTree]: #working_with_et
[Integrating your code into Markdown]: #integrating_into_markdown
[extendMarkdown]: #extendmarkdown
[OrderedDict]: #ordereddict
[registerExtension]: #registerextension
[Config Settings]: #configsettings
[makeExtension]: #makeextension
[ElementTree]: http://effbot.org/zone/element-index.htm