Using Markdown as a Python Library
==================================

First and foremost, Python-Markdown is intended to be a Python library module
used by various projects to convert Markdown syntax into HTML.

The Basics
----------

To use markdown as a module:

    import markdown
    html = markdown.markdown(your_text_string)

Encoded Text
------------

Note that ``markdown()`` expects **Unicode** as input (although a simple ASCII
string should work) and returns output as Unicode. Do not pass encoded strings
to it! If your input is encoded, e.g. as UTF-8, it is your responsibility to
decode it first. For example:

    import codecs
    import markdown
    input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8")
    text = input_file.read()  # already decoded to Unicode by codecs
    input_file.close()
    html = markdown.markdown(text, extensions=[])  # add extension names as needed

If you later want to write it to disk, you should encode it yourself:

    output_file = codecs.open("some_file.html", "w", encoding="utf-8")
    output_file.write(html)
    output_file.close()
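
The decode, convert, encode round trip above can be wrapped in a single helper. The following is a minimal sketch and not part of Python-Markdown: ``convert_utf8_file`` and its ``convert`` parameter are hypothetical names, and ``convert`` stands for any function that maps Unicode text to Unicode HTML, such as ``markdown.markdown``.

```python
import io


def convert_utf8_file(src_path, dst_path, convert, encoding="utf-8"):
    """Read src_path, convert its text, and write the result to dst_path.

    `convert` is any callable that takes and returns Unicode text
    (for example markdown.markdown); it is an assumption of this sketch.
    """
    # Decode on the way in: io.open returns Unicode text, not bytes.
    with io.open(src_path, "r", encoding=encoding) as input_file:
        text = input_file.read()

    html = convert(text)

    # Encode on the way out: the file object encodes the Unicode result.
    with io.open(dst_path, "w", encoding=encoding) as output_file:
        output_file.write(html)
    return html
```

With Python-Markdown installed, ``convert_utf8_file("some_file.txt", "some_file.html", markdown.markdown)`` would perform the same steps as the two snippets above.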

More Options
------------

If you want to pass more options, you can create an instance of the ``Markdown``
class yourself and then use ``convert()`` to generate HTML:

    import markdown
    md = markdown.Markdown(
        extensions=['footnotes'],
        extension_configs={'footnotes': ('PLACE_MARKER', '~~~~~~~~')},
        output_format='html4',
        safe_mode="replace",
        html_replacement_text="--NO HTML ALLOWED--",
        tab_length=8,
        enable_attributes=False,
        smart_emphasis=False,
    )
    html = md.convert(some_text)

You should also use this method if you want to process multiple strings:

    md = markdown.Markdown()
    html1 = md.convert(text1)
    html2 = md.convert(text2)

Any options accepted by the ``Markdown`` class are also accepted by the
``markdown`` shortcut function. However, a new instance of the class will be
created each time the shortcut function is called.

Working with Files
------------------

While the ``Markdown`` class is only intended to work with Unicode text, some
encoding/decoding is required for the command line features. The functions
and methods below are only intended to fit the common use case.

The ``Markdown`` class has the method ``convertFile``, which reads in a file and
writes the result to a file or file-like object:

    md = markdown.Markdown()
    md.convertFile(input="in.txt", output="out.html", encoding="utf-8")

The markdown module also includes a shortcut function, ``markdownFromFile``,
that wraps the above method:

    markdown.markdownFromFile(input="in.txt",
                              output="out.html",
                              extensions=[],
                              encoding="utf-8",
                              safe_mode=False)

In either case, if the ``output`` keyword is passed a file name (e.g.
``output="out.html"``), it will try to write to a file by that name. If
``output`` is passed a file-like object (e.g. ``output=StringIO.StringIO()``),
it will attempt to write out to that object. Finally, if ``output`` is
set to ``None``, it will write to ``stdout``.
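
The three cases for ``output`` can be pictured as a small dispatch. This is only an illustrative sketch of the behaviour described above, not the library's implementation: ``write_html`` is a hypothetical helper, it is written for Python 3, and it writes text where the real methods may write encoded bytes.

```python
import io
import sys


def write_html(html, output=None, encoding="utf-8"):
    """Send `html` to a named file, a file-like object, or stdout."""
    if output is None:
        # No destination given: fall back to standard output.
        sys.stdout.write(html)
    elif isinstance(output, str):
        # A string is treated as a file name and opened for writing.
        with io.open(output, "w", encoding=encoding) as output_file:
            output_file.write(html)
    else:
        # Anything else is assumed to be a file-like object with write().
        output.write(html)
```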

Using Extensions
----------------

One of the parameters that you can pass is a list of extensions. Extensions
must be available as Python modules either within the ``markdown.extensions``
package or on your ``PYTHONPATH`` with names starting with ``mdx_``, followed by
the name of the extension. Thus, ``extensions=['footnotes']`` will first look
for the module ``markdown.extensions.footnotes``, then a module named
``mdx_footnotes``. See the documentation specific to the extension you are
using for help in specifying configuration settings for that extension.
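
The lookup order described above can be sketched with ``importlib``. The helper names ``candidate_module_names`` and ``first_importable`` are hypothetical, and this is not Python-Markdown's actual loader; it only illustrates trying the ``markdown.extensions`` package before the ``mdx_`` prefix.

```python
import importlib


def candidate_module_names(ext_name):
    """Return the module names to try for an extension, in lookup order."""
    return ["markdown.extensions." + ext_name, "mdx_" + ext_name]


def first_importable(module_names):
    """Import and return the first module that exists, or None."""
    for module_name in module_names:
        try:
            return importlib.import_module(module_name)
        except ImportError:
            # Not found under this name; try the next candidate.
            continue
    return None
```

Under these assumptions, ``first_importable(candidate_module_names('footnotes'))`` would return the bundled footnotes module when it is available, and otherwise fall back to a module named ``mdx_footnotes`` on the path.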

Note that some extensions may need their state reset between each call to
``convert``:

    html1 = md.convert(text1)
    md.reset()
    html2 = md.convert(text2)

Safe Mode
---------

If you are using Markdown on a web system which will transform text provided
by untrusted users, you may want to use the ``safe_mode`` option, which ensures
that the user's HTML tags are either replaced, removed or escaped. (They can
still create links using Markdown syntax.)

* To replace HTML, set ``safe_mode="replace"`` (``safe_mode=True`` still works
  for backward compatibility with older versions). The HTML will be replaced
  with the text assigned to ``html_replacement_text``, which defaults to
  ``[HTML_REMOVED]``. To replace the HTML with something else:

        md = markdown.Markdown(safe_mode="replace",
                html_replacement_text="--RAW HTML NOT ALLOWED--")

* To remove HTML, set ``safe_mode="remove"``. Any raw HTML will be completely
  stripped from the text with no warning to the author.

* To escape HTML, set ``safe_mode="escape"``. The HTML will be escaped and
  included in the document.

Note that ``safe_mode`` does not alter the ``enable_attributes`` option, which
could allow someone to inject JavaScript (e.g. ``{@onclick=alert(1)}``). You
may also want to set ``enable_attributes=False`` when using ``safe_mode``.
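
As a rough picture of the three modes listed above, the sketch below applies each one to a single fragment of raw HTML. It is an illustration only and does not reproduce how Python-Markdown finds or processes raw HTML: ``apply_safe_mode`` is a hypothetical helper, and the escaping is done with the Python 3 standard library's ``html.escape``.

```python
import html


def apply_safe_mode(raw_html, safe_mode, html_replacement_text="[HTML_REMOVED]"):
    """Show what happens to one piece of raw HTML under each safe mode."""
    if safe_mode == "replace":
        # The raw HTML is swapped for the replacement text.
        return html_replacement_text
    if safe_mode == "remove":
        # The raw HTML is dropped with nothing left in its place.
        return ""
    if safe_mode == "escape":
        # The markup is kept but rendered harmless as literal text.
        return html.escape(raw_html)
    raise ValueError("unknown safe_mode: %r" % (safe_mode,))
```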

Output Formats
--------------

If Markdown is outputting (X)HTML as part of a web page, most likely you will
want the output to match the (X)HTML version used by the rest of your page/site.
Currently, Markdown offers two output formats out of the box: "HTML4" and
"XHTML1" (the default). Markdown will also accept the formats "HTML" and
"XHTML", which currently map to "HTML4" and "XHTML1" respectively. However,
you should use the more explicit keys, as the general keys may change in the
future if it makes sense at that time. The keys can be either lowercase or
uppercase.
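
The key handling described above amounts to a case-insensitive lookup with two general aliases. A minimal sketch, assuming a hypothetical ``resolve_output_format`` helper and alias table that are not part of the library:

```python
# Explicit keys map to themselves; the general keys are aliases for the
# current explicit versions and could be repointed in a later release.
OUTPUT_FORMAT_ALIASES = {
    "html4": "html4",
    "xhtml1": "xhtml1",
    "html": "html4",
    "xhtml": "xhtml1",
}


def resolve_output_format(key):
    """Return the explicit format name for a key in any letter case."""
    try:
        return OUTPUT_FORMAT_ALIASES[key.lower()]
    except KeyError:
        raise ValueError("unsupported output format: %r" % (key,))
```

Passing an explicit key such as ``'html4'`` keeps the result stable even if the general aliases are later changed.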

To set the output format, do:

    html = markdown.markdown(text, output_format='html4')

Or, when using the ``Markdown`` class:

    md = markdown.Markdown(output_format='html4')
    html = md.convert(text)

Note that the output format is only set once for the class and cannot be
specified each time ``convert()`` is called. If you really must change the
output format for the class, you can use the ``set_output_format`` method:

    md.set_output_format('xhtml1')