Import Upstream version 0.4.3

2022-06-02 17:41:00 +08:00 · 2022-06-02 17:41:00 +08:00 · e73391db57
commit e73391db57
47 changed files with 29103 additions and 0 deletions
--- a/.travis.yml
+++ b/.travis.yml
@ -0,0 +1,34 @@
+# YAML definition for travis-ci.com continuous integration.
+# See https://docs.travis-ci.com/user/languages/c
+
+language: c
+dist: bionic
+
+compiler:
+    - gcc
+
+addons:
+    apt:
+        packages:
+            - python3   # for running tests
+            - lcov      # for generating code coverage report
+
+before_script:
+    - mkdir build
+    - cd build
+    # We enforce -Wdeclaration-after-statement because Qt project needs to
+    # build MD4C with Integrity compiler which chokes whenever a declaration
+    # is not at the beginning of a block.
+    - CFLAGS='--coverage -g -O0 -Wall -Wdeclaration-after-statement -Werror' cmake -DCMAKE_BUILD_TYPE=Debug -G 'Unix Makefiles' ..
+
+script:
+    - make VERBOSE=1
+
+after_success:
+    - ../scripts/run-tests.sh
+    # Creating report
+    - lcov --directory . --capture --output-file coverage.info # capture coverage info
+    - lcov --remove coverage.info '/usr/*' --output-file coverage.info # filter out system
+    - lcov --list coverage.info # debug info
+    # Uploading report to CodeCov
+    - bash <(curl -s https://codecov.io/bash) || echo "Codecov did not collect coverage reports"
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -0,0 +1,268 @@
+
+# MD4C Change Log
+
+
+## Version 0.4.3
+
+New features:
+
+ * With `MD_FLAG_UNDERLINE`, spans enclosed in underscore (`_foo_`) are seen
+   as underline (`MD_SPAN_UNDERLINE`) rather than an ordinary emphasis or
+   strong emphasis.
+
+Changes:
+
+ * The implementation of wiki-links extension (with `MD_FLAG_WIKILINKS`) has
+   been simplified.
+
+    - A noticeable increase of MD4C's memory footprint introduced by the
+      extension implementation in 0.4.0 has been removed.
+    - The priority handling towards other inline elements has been unified.
+      (This affects an obscure case where image syntax inside the wiki-link
+      destination made the wiki-link invalid. Now *all* inline spans in the
+      wiki-link destination, including images, are suppressed.)
+    - The length limitation of 100 characters now always applies to wiki-link
+      destination.
+
+ * Recognition of strike-through spans (with the flag `MD_FLAG_STRIKETHROUGH`)
+   has become much stricter and, arguably, more reasonable.
+
+    - Only single tildes (`~`) and double tildes (`~~`) are recognized as
+      strike-through marks. Longer ones are not anymore.
+    - The opener and closer marks have to be of the same length.
+    - The tildes cannot open a strike-through span if a whitespace follows.
+    - The tildes cannot close a strike-through span if a whitespace precedes.
+
+   This change follows the changes of behavior in cmark-gfm some time ago, so
+   it is also beneficial from a compatibility point of view.
+
+ * When building MD4C by hand instead of using its CMake-based build, the UTF-8
+   support was by default disabled, unless explicitly asked for by defining
+   a preprocessor macro `MD4C_USE_UTF8`.
+
+   This has been changed and the UTF-8 mode now becomes the default, no matter
+   how `md4c.c` is compiled. If you need to disable it and use the ASCII-only
+   mode, you have to explicitly define macro `MD4C_USE_ASCII` when compiling it.
+
+   (The CMake-based build as provided in our repository explicitly asked for
+   the UTF-8 support with `-DMD4C_USE_UTF8`. I.e. if you are using MD4C library
+   built with our vanilla `CMakeLists.txt` files, this change should not affect
+   you.)
+
+Fixes:
+
+ * Fixed some string length handling in the special `MD4C_USE_UTF16` build.
+
+   (This does not affect you unless you are on Windows and explicitly define
+   the macro when building MD4C.)
+
+ * [#100](https://github.com/mity/md4c/issues/100):
+   Fixed an off-by-one error in the maximal length limit of some segments
+   of e-mail addresses used in autolinks.
+
+ * [#107](https://github.com/mity/md4c/issues/107):
+   Fix mis-detection of asterisk-encoded emphasis in some corner cases when
+   length of the opener and closer differs, as in `***foo *bar baz***`.
+
+
+## Version 0.4.2
+
+Fixes:
+
+ * [#98](https://github.com/mity/md4c/issues/98):
+   Fix mis-detection of asterisk-encoded emphasis in some corner cases when
+   length of the opener and closer differs, as in `**a *b c** d*`.
+
+
+## Version 0.4.1
+
+Unfortunately, 0.4.0 has been released with a badly updated ChangeLog. Fixing
+this is the only change in 0.4.1.
+
+
+## Version 0.4.0
+
+New features:
+
+ * With `MD_FLAG_LATEXMATHSPANS`, LaTeX math spans (`$...$`) and LaTeX display
+   math spans (`$$...$$`) are now recognized. (Note though that the HTML
+   renderer outputs them verbatim in a custom `<x-equation>` tag.)
+
+   Contributed by [Tilman Roeder](https://github.com/dyedgreen).
+
+ * With `MD_FLAG_WIKILINKS`, Wiki-style links (`[[...]]`) are now recognized.
+   (Note though that the HTML renderer renders them as a custom `<x-wikilink>`
+   tag.)
+
+   Contributed by [Nils Blomqvist](https://github.com/niblo).
+
+Changes:
+
+ * Parsing of tables (with `MD_FLAG_TABLES`) is now closer to the way
+   cmark-gfm parses tables, as we no longer require every row of the table to
+   contain a pipe `|`.
+
+   As a consequence, paragraphs now cannot interrupt tables. A paragraph which
+   follows the table has to be delimited with a blank line.
+
+Fixes:
+
+ * [#94](https://github.com/mity/md4c/issues/94):
+   `md_build_ref_def_hashtable()`: Do not allocate more memory than strictly
+   needed.
+
+ * [#95](https://github.com/mity/md4c/issues/95):
+   `md_is_container_mark()`: Ordered list mark requires at least one digit.
+
+ * [#96](https://github.com/mity/md4c/issues/96):
+   Some fixes for link label comparison.
+
+
+## Version 0.3.4
+
+Changes:
+
+ * Make Unicode-specific code compliant to Unicode 12.1.
+
+ * Structure `MD_BLOCK_CODE_DETAIL` got new member `fenced_char`. Application
+   can use it to detect character used to form the block fences (`` ` `` or
+   `~`). In the case of indented code block, it is set to zero.
+
+Fixes:
+
+ * [#77](https://github.com/mity/md4c/issues/77):
+   Fix maximal count of digits for numerical character references, as requested
+   by CommonMark specification 0.29.
+
+ * [#78](https://github.com/mity/md4c/issues/78):
+   Fix link reference definition label matching for Unicode characters where
+   the folding mapping leads to multiple codepoints, as e.g. in `ẞ` -> `SS`.
+
+ * [#83](https://github.com/mity/md4c/issues/83):
+   Fix recognition of an empty blockquote which interrupts a paragraph.
+
+
+## Version 0.3.3
+
+Changes:
+
+ * Make permissive URL autolink and permissive WWW autolink extensions stricter.
+
+   This brings the behavior closer to GFM and mitigates risk of false positives.
+   In particular, the domain has to contain at least one dot and parentheses
+   can be part of the link destination only if `(` and `)` are balanced.
+
+Fixes:
+
+ * [#73](https://github.com/mity/md4c/issues/73):
+   Some raw HTML inputs could lead to quadratic parsing times.
+
+ * [#74](https://github.com/mity/md4c/issues/74):
+   Fix input leading to a crash. Found by fuzzing.
+
+ * [#76](https://github.com/mity/md4c/issues/76):
+   Fix handling of parentheses in some corner cases of permissive URL autolink
+   and permissive WWW autolink extensions.
+
+
+## Version 0.3.2
+
+Changes:
+
+ * Changes mandated by CommonMark specification 0.29.
+
+   Most importantly, the white-space trimming rules for code spans have changed.
+   At most one space/newline is trimmed from beginning/end of the code span
+   (if the code span contains some non-space contents, and if it begins and
+   ends with space at the same time). In all other cases the spaces in the code
+   span are now left intact.
+
+   Other changes in behavior are in corner cases only. Refer to [CommonMark
+   0.29 notes](https://github.com/commonmark/commonmark-spec/releases/tag/0.29)
+   for more info.
+
+Fixes:
+
+ * [#68](https://github.com/mity/md4c/issues/68):
+   Some specific HTML blocks were not recognized when EOF follows without any
+   end-of-line character.
+
+ * [#69](https://github.com/mity/md4c/issues/69):
+   Strike-through span not working correctly when its opener mark is directly
+   followed by other opener mark; or when other closer mark directly precedes
+   its closer mark.
+
+
+## Version 0.3.1
+
+Fixes:
+
+ * [#58](https://github.com/mity/md4c/issues/58),
+   [#59](https://github.com/mity/md4c/issues/59),
+   [#60](https://github.com/mity/md4c/issues/60),
+   [#63](https://github.com/mity/md4c/issues/63),
+   [#66](https://github.com/mity/md4c/issues/66):
+   Some inputs could lead to quadratic parsing times. Thanks to Anders Kaseorg
+   for finding all those issues.
+
+ * [#61](https://github.com/mity/md4c/issues/61):
+   Flag `MD_FLAG_NOHTMLSPANS` erroneously affected also recognition of
+   CommonMark autolinks.
+
+
+## Version 0.3.0
+
+New features:
+
+ * Add extension for GitHub-style task lists:
+
+   ```
+    * [x] foo
+    * [x] bar
+    * [ ] baz
+   ```
+
+   (It has to be explicitly enabled with `MD_FLAG_TASKLISTS`.)
+
+ * Added support for building as a shared library. On non-Windows platforms,
+   this is now the default; on Windows, a static library is still the default.
+   The CMake option `BUILD_SHARED_LIBS` can be used to request one or the other
+   explicitly.
+
+   Contributed by Lisandro Damián Nicanor Pérez Meyer.
+
+ * Renamed structure `MD_RENDERER` to `MD_PARSER` and refactored its contents
+   a little bit. Note this is source-level incompatible and initialization code
+   in apps may need to be updated.
+
+   The aim of the change is to be more friendly for long-term ABI compatibility
+   we shall maintain, starting with this release.
+
+ * Added `CHANGELOG.md` (this file).
+
+ * Make sure `md_process_table_row()` reports the same count of table cells for
+   all table rows, no matter how broken the input is. The cell count is derived
+   from table underline line. Bogus cells in other rows are silently ignored.
+   Missing cells in other rows are reported as empty ones.
+
+Fixes:
+
+ * CID 1475544:
+   Calling `md_free_attribute()` on uninitialized data.
+
+ * [#47](https://github.com/mity/md4c/issues/47):
+   Using bad offsets in `md_is_entity_str()`, in some cases leading to buffer
+   overflow.
+
+ * [#51](https://github.com/mity/md4c/issues/51):
+   Segfault in `md_process_table_cell()`.
+
+ * [#53](https://github.com/mity/md4c/issues/53):
+   With `MD_FLAG_PERMISSIVEURLAUTOLINKS` or `MD_FLAG_PERMISSIVEWWWAUTOLINKS`
+   we could generate bad output for ordinary Markdown links, if a non-space
+   character immediately follows like e.g. in `[link](http://github.com)X`.
+
+
+## Version 0.2.7
+
+This was the last version before the changelog was added.
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -0,0 +1,56 @@
+
+cmake_minimum_required(VERSION 3.4)
+project(MD4C C)
+
+set(MD_VERSION_MAJOR 0)
+set(MD_VERSION_MINOR 4)
+set(MD_VERSION_RELEASE 3)
+set(MD_VERSION "${MD_VERSION_MAJOR}.${MD_VERSION_MINOR}.${MD_VERSION_RELEASE}")
+
+if(WIN32)
+    # On Windows, given there is no standard lib install dir etc., we rather
+    # by default build static lib.
+    option(BUILD_SHARED_LIBS "Build a shared library instead of a static one" OFF)
+else()
+    # On Linux, MD4C is slowly being added into some distros which prefer
+    # shared lib.
+    option(BUILD_SHARED_LIBS "Build a shared library instead of a static one" ON)
+endif()
+
+add_definitions(
+    -DMD_VERSION_MAJOR=${MD_VERSION_MAJOR}
+    -DMD_VERSION_MINOR=${MD_VERSION_MINOR}
+    -DMD_VERSION_RELEASE=${MD_VERSION_RELEASE}
+)
+
+set(CMAKE_CONFIGURATION_TYPES Debug Release RelWithDebInfo MinSizeRel)
+if("${CMAKE_BUILD_TYPE}" STREQUAL "")
+    set(CMAKE_BUILD_TYPE $ENV{CMAKE_BUILD_TYPE})
+
+    if("${CMAKE_BUILD_TYPE}" STREQUAL "")
+        set(CMAKE_BUILD_TYPE "Release")
+    endif()
+endif()
+
+
+if(${CMAKE_C_COMPILER_ID} MATCHES GNU|Clang)
+    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
+elseif(MSVC)
+    # Disable warnings about the so-called unsecured functions:
+    add_definitions(/D_CRT_SECURE_NO_WARNINGS)
+
+    # Specify proper C runtime library:
+    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE}")
+    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO}")
+    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL}")
+    set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /MTd")
+    set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /MT")
+    set(CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO} /MT")
+    set(CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL} /MT")
+endif()
+
+include(GNUInstallDirs)
+
+add_subdirectory(md4c)
+add_subdirectory(md2html)
--- a/LICENSE.md
+++ b/LICENSE.md
@ -0,0 +1,22 @@
+
+# The MIT License (MIT)
+
+Copyright © 2016-2020 Martin Mitáš
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the “Software”),
+to deal in the Software without restriction, including without limitation
+the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included
+in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+IN THE SOFTWARE.
--- a/README.md
+++ b/README.md
@ -0,0 +1,286 @@
+[![Linux Build Status (travis-ci.com)](https://img.shields.io/travis/mity/md4c/master.svg?logo=linux&label=linux%20build)](https://travis-ci.org/mity/md4c)
+[![Windows Build Status (appveyor.com)](https://img.shields.io/appveyor/ci/mity/md4c/master.svg?logo=windows&label=windows%20build)](https://ci.appveyor.com/project/mity/md4c/branch/master)
+[![Code Coverage Status (codecov.io)](https://img.shields.io/codecov/c/github/mity/md4c/master.svg?logo=codecov&label=code%20coverage)](https://codecov.io/github/mity/md4c)
+[![Coverity Scan Status](https://img.shields.io/coverity/scan/mity-md4c.svg?label=coverity%20scan)](https://scan.coverity.com/projects/mity-md4c)
+
+
+# MD4C Readme
+
+* Home: http://github.com/mity/md4c
+* Wiki: http://github.com/mity/md4c/wiki
+* Issue tracker: http://github.com/mity/md4c/issues
+
+MD4C stands for "Markdown for C" and that's exactly what this project is about.
+
+
+## What is Markdown
+
+In short, Markdown is the markup language this `README.md` file is written in.
+
+The following resources can explain more if you are unfamiliar with it:
+* [Wikipedia article](http://en.wikipedia.org/wiki/Markdown)
+* [CommonMark site](http://commonmark.org)
+
+
+## What is MD4C
+
+MD4C is a C Markdown parser with the following features:
+
+* **Compliance:** Generally MD4C aims to be compliant to the latest version of
+  [CommonMark specification](http://spec.commonmark.org/). Currently, we are
+  fully compliant to CommonMark 0.29.
+
+* **Extensions:** MD4C supports some commonly requested and accepted extensions.
+  See below.
+
+* **Compactness:** MD4C is implemented in one source file and one header file.
+  There are no dependencies other than the standard C library.
+
+* **Embedding:** MD4C is easy to reuse in other projects; its API is very
+  straightforward: There is actually just one function, `md_parse()`.
+
+* **Push model:** MD4C parses the complete document and calls a few callback
+  functions provided by the application to inform it about a start/end of
+  every block, a start/end of every span, and with any textual contents.
+
+* **Portability:** MD4C builds and works on Windows and POSIX-compliant OSes.
+  (It should be simple to make it run also on most other platforms, at least as
+  long as the platform provides C standard library, including a heap memory
+  management.)
+
+* **Encoding:** MD4C can be compiled to recognize ASCII-only control characters,
+  UTF-8 and, on Windows, also UTF-16 (i.e. what is on Windows commonly called
+  just "Unicode"). See more details below.
+
+* **Permissive license:** MD4C is available under the MIT license.
+
+* **Performance:** MD4C is [very fast](https://talk.commonmark.org/t/2520).
+
+
+## Using MD4C
+
+An application has to include the header `md4c.h` and link against the MD4C
+library; alternatively, it may add `md4c.h` and `md4c.c` directly to its source
+base, as the parser is implemented in a single C source file.
+
+The main provided function is `md_parse()`. It takes a text in the Markdown
+syntax and a pointer to a structure which provides pointers to several callback
+functions.
+
+As `md_parse()` processes the input, it calls the callbacks (when entering or
+leaving any Markdown block or span; and when outputting any textual content of
+the document), allowing the application to convert it into another format or
+render it onto the screen.
+
+An example implementation of a simple renderer is available in the `md2html`
+directory which implements a conversion utility from Markdown to HTML.
+
+
+## Markdown Extensions
+
+The default behavior is to recognize only Markdown syntax defined by the
+[CommonMark specification](http://spec.commonmark.org/).
+
+However with appropriate flags, the behavior can be tuned to enable some
+additional extensions:
+
+* With the flag `MD_FLAG_COLLAPSEWHITESPACE`, a non-trivial whitespace is
+  collapsed into a single space.
+
+* With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported.
+
+* With the flag `MD_FLAG_TASKLISTS`, GitHub-style task lists are supported.
+
+* With the flag `MD_FLAG_STRIKETHROUGH`, strike-through spans are enabled
+  (text enclosed in tilde marks, e.g. `~foo bar~`).
+
+* With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, permissive URL autolinks
+  (not enclosed in `<` and `>`) are supported.
+
+* With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, permissive e-mail
+  autolinks (not enclosed in `<` and `>`) are supported.
+
+* With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, permissive WWW autolinks
+  without any scheme specified (e.g. `www.example.com`) are supported. MD4C
+  then assumes `http:` scheme.
+
+* With the flag `MD_FLAG_LATEXMATHSPANS`, LaTeX math spans (`$...$`) and
+  LaTeX display math spans (`$$...$$`) are supported. (Note though that the
+  HTML renderer outputs them verbatim in a custom tag `<x-equation>`.)
+
+* With the flag `MD_FLAG_WIKILINKS`, wiki-style links (`[[link label]]` and
+  `[[target article|link label]]`) are supported. (Note that the HTML renderer
+  outputs them in a custom tag `<x-wikilink>`.)
+
+* With the flag `MD_FLAG_UNDERLINE`, underscore (`_`) denotes an underline
+  instead of an ordinary emphasis or strong emphasis.
+
+A few features of CommonMark (those some people see as mis-features) may be
+disabled:
+
+* With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTMLBLOCKS`, raw inline
+  HTML or raw HTML blocks respectively are disabled.
+
+* With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are
+  disabled.
+
+
+## Input/Output Encoding
+
+The CommonMark specification generally assumes UTF-8 input, but on closer
+inspection, Unicode plays a role only in a few very specific situations when
+parsing Markdown documents:
+
+1. For detection of word boundaries when processing emphasis and strong
+   emphasis, some classification of Unicode characters (whether it is
+   a whitespace or a punctuation) is needed.
+
+2. For (case-insensitive) matching of a link reference label with the
+   corresponding link reference definition, Unicode case folding is used.
+
+3. For translating HTML entities (e.g. `&amp;`) and numeric character
+   references (e.g. `&#35;` or `&#xcab;`) into their Unicode equivalents.
+
+   However, MD4C leaves this translation to the renderer/application, as the
+   renderer is supposed to really know output encoding and whether it really
+   needs to perform this kind of translation. (For example, when the renderer
+   outputs HTML, it may leave the entities untranslated and defer the work to
+   a web browser.)
+
+MD4C relies on this property of the CommonMark and the implementation is, to
+a large degree, encoding-agnostic. Most of MD4C code only assumes that the
+encoding of your choice is compatible with ASCII, i.e. that the codepoints
+below 128 have the same numeric values as ASCII.
+
+Any input MD4C does not understand is simply seen as part of the document text
+and sent to the renderer's callback functions unchanged.
+
+The two situations (word boundary detection and link reference matching) where
+MD4C has to understand Unicode are handled as specified by the following rules:
+
+* If preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8 for the
+  word boundary detection and for the case-insensitive matching of link labels.
+
+  When none of these macros is explicitly used, this is the default behavior.
+
+* On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses
+  `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.
+  (UTF-16 is what Windows developers usually call just "Unicode" and what
+  Win32API generally works with.)
+
+  Note that because this macro affects also the types in `md4c.h`, you have
+  to define the macro both when building MD4C as well as when including
+  `md4c.h`.
+
+  Also note this is only supported in the parser (`md4c.[hc]`). The HTML
+  renderer does not support this and you will have to write your own custom
+  renderer to use this feature.
+
+* If preprocessor macro `MD4C_USE_ASCII` is defined, MD4C assumes nothing but
+  an ASCII input.
+
+  That effectively means that non-ASCII whitespace or punctuation characters
+  won't be recognized as such and that link reference matching will work in
+  a case-insensitive way only for ASCII letters (`[a-zA-Z]`).
+
+
+## Documentation
+
+The API is quite well documented in the comments in the `md4c.h` header.
+
+There is also a [project wiki](http://github.com/mity/md4c/wiki) which provides
+some more comprehensive documentation. However, note that it is incomplete and
+some details may be a little bit outdated.
+
+
+## FAQ
+
+**Q: In my code, I need to convert Markdown to HTML. How?**
+
+**A:** Indeed the API, as provided by `md4c.h`, is just a SAX-like Markdown
+parser. Nothing more and nothing less.
+
+That said, there is a complete HTML generator built on top of the parser in the
+directory `md2html` (the files `render_html.[hc]` and `entity.[hc]`). At this
+time, you have to directly reuse that code in your project.
+
+There is [some discussion](https://github.com/mity/md4c/issues/82) whether this
+should be changed (and how) in the future.
+
+**Q: How does MD4C compare to a parser XY?**
+
+**A:** Some other implementations combine Markdown parser and HTML generator
+into a single entangled code hidden behind an interface which just allows the
+conversion from Markdown to HTML, and they are unusable if you want to process
+the input in any other way.
+
+Even when the parsing is available as a standalone feature, most parsers (if
+not all of them; at least within the scope of C/C++ language) are full DOM-like
+parsers: They construct an abstract syntax tree (AST) representation of the
+whole Markdown document. That takes time and leads to a bigger memory footprint.
+
+It's completely fine as long as you really need it. If you don't need the full
+AST, there is a very high chance that using MD4C will be faster and much less
+memory-hungry.
+
+Last but not least, some Markdown parsers are implemented in a naive way. When
+fed with a [smartly crafted input pattern](test/pathological_tests.py), they
+may exhibit quadratic (or even worse) parsing times. What MD4C can still parse
+in a fraction of second may turn into long minutes or possibly hours with them.
+Hence, when such a naive parser is used to process an input from an untrusted
+source, the possibility of denial-of-service attacks becomes a real danger.
+
+A lot of our effort went into providing linear parsing times no matter what
+kind of crazy input MD4C parser is fed with. (If you encounter an input pattern
+which leads to super-linear parsing times, please do not hesitate to report it
+as a bug.)
+
+**Q: Does MD4C perform any input validation?**
+
+**A:** No.
+
+CommonMark specification declares that any sequence of (Unicode) characters is
+a valid Markdown document; i.e. that it does not matter whether some Markdown
+syntax is in some way broken or not. If it is broken, it will simply not be
+recognized and the parser should see the broken syntax construction just as a
+verbatim text.
+
+MD4C takes this a step further. It sees any sequence of bytes as a valid input,
+following completely the GIGO philosophy (garbage in, garbage out).
+
+If you need to validate that the input is, say, a valid UTF-8 document, you
+have to do it on your own. You can simply validate the whole Markdown document
+before passing it to the MD4C parser.
+
+Alternatively, you may perform the validation on the fly during the parsing,
+in the `MD_PARSER::text()` callback. (Given how MD4C works internally, it will
+never break a sequence of bytes into multiple calls of `MD_PARSER::text()`,
+unless that sequence is already broken to multiple pieces in the input by some
+whitespace, new line character(s) and/or any Markdown syntax construction.)
+
+
+## License
+
+MD4C is covered by the MIT license, see the file `LICENSE.md`.
+
+
+## Links to Related Projects
+
+Ports and bindings to other languages:
+
+* [commonmark-d](https://github.com/AuburnSounds/commonmark-d):
+  Port of MD4C to D language.
+
+* [markdown-wasm](https://github.com/rsms/markdown-wasm):
+  Markdown parser and HTML generator for WebAssembly, based on MD4C.
+
+Software using MD4C:
+
+* [Qt](https://www.qt.io/):
+  Cross-platform C++ GUI framework.
+
+* [Textosaurus](https://github.com/martinrotter/textosaurus):
+  Cross-platform text editor based on Qt and Scintilla.
+
+* [8th](https://8th-dev.com/):
+  Cross-platform concatenative programming language.
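
Note on the README's FAQ entry about input validation: it says MD4C performs none and that an application needing, say, valid UTF-8 can validate the whole document before passing it to the parser. A minimal sketch of such a pre-check follows. The helper name `is_valid_utf8` is hypothetical and is not part of MD4C or of this commit; the checks (no overlong forms, no surrogates, nothing above U+10FFFF) follow the general UTF-8 rules, not any MD4C code.

```c
#include <stddef.h>

/* Hypothetical helper, not part of MD4C.
 * Returns 1 if buf[0..size) is well-formed UTF-8, 0 otherwise.
 * Rejects overlong encodings, surrogates (U+D800..U+DFFF) and
 * code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *buf, size_t size)
{
    size_t i = 0;

    while (i < size) {
        unsigned char b = buf[i];
        size_t n;               /* continuation bytes expected */
        unsigned long cp, min;  /* decoded code point, smallest legal value */

        if (b < 0x80) {
            i++;                /* plain ASCII byte */
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;           /* stray continuation or invalid lead byte */
        }

        if (size - i <= n)
            return 0;           /* sequence truncated by end of buffer */

        i++;
        while (n-- > 0) {
            if ((buf[i] & 0xC0) != 0x80)
                return 0;       /* not a continuation byte */
            cp = (cp << 6) | (buf[i] & 0x3F);
            i++;
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;           /* overlong, out of range, or surrogate */
    }
    return 1;
}
```

An application could call this on the raw input buffer and refuse or sanitize the document before handing it to `md_parse()`. Whether rejecting is the right policy is up to the application, since the parser itself accepts any byte sequence.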
--- a/appveyor.yml
+++ b/appveyor.yml
@ -0,0 +1,29 @@
+# YAML definition for Appveyor.com continuous integration.
+# See http://www.appveyor.com/docs/appveyor-yml
+
+version: '{branch}-{build}'
+
+before_build:
+  - 'cmake --version'
+  - 'if "%PLATFORM%"=="x64" cmake -G "Visual Studio 12 Win64" .'
+  - 'if not "%PLATFORM%"=="x64" cmake -G "Visual Studio 12" .'
+
+build:
+  project: md4c.sln
+  verbosity: detailed
+
+skip_tags: true
+
+os:
+  - Windows Server 2012 R2
+
+configuration:
+  - Debug
+  - Release
+
+platform:
+  - x64    # 64-bit build
+  - win32  # 32-bit build
+
+artifacts:
+  - path: $(configuration)/md2html/md2html.exe
--- a/codecov.yml
+++ b/codecov.yml
@ -0,0 +1,4 @@
+# YAML definition for codecov.io code coverage reports.
+
+ignore:
+    - "md2html"
--- a/md2html/CMakeLists.txt
+++ b/md2html/CMakeLists.txt
@ -0,0 +1,15 @@
+
+include_directories("${PROJECT_SOURCE_DIR}/md4c")
+
+add_executable(md2html cmdline.c cmdline.h entity.c entity.h md2html.c render_html.c render_html.h)
+target_link_libraries(md2html md4c)
+
+install(
+    TARGETS md2html
+    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
+    LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
+    RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
+    PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+)
+
+install(FILES "md2html.1" DESTINATION "${CMAKE_INSTALL_MANDIR}/man1")
--- a/md2html/cmdline.c
+++ b/md2html/cmdline.c
@ -0,0 +1,296 @@
+/* cmdline.c: a reentrant version of getopt(). Written 2006 by Brian
+ * Raiter. This code is in the public domain.
+ */
+
+#include	<stdio.h>
+#include	<stdlib.h>
+#include	<string.h>
+#include	<ctype.h>
+#include	"cmdline.h"
+
+#define	docallback(opt, val) \
+	    do { if ((r = callback(opt, val, data)) != 0) return r; } while (0)
+
+/* Parse the given cmdline arguments.
+ */
+int readoptions(option const* list, int argc, char **argv,
+		int (*callback)(int, char const*, void*), void *data)
+{
+    char		argstring[] = "--";
+    option const       *opt;
+    char const	       *val;
+    char const	       *p;
+    int			stop = 0;
+    int			argi, len, r;
+
+    if (!list || !callback)
+	return -1;
+
+    for (argi = 1 ; argi < argc ; ++argi)
+    {
+	/* First, check for "--", which forces all remaining arguments
+	 * to be treated as non-options.
+	 */
+	if (!stop && argv[argi][0] == '-' && argv[argi][1] == '-'
+					  && argv[argi][2] == '\0') {
+	    stop = 1;
+	    continue;
+	}
+
+	/* Arguments that do not begin with '-' (or are only "-") are
+	 * not options.
+	 */
+	if (stop || argv[argi][0] != '-' || argv[argi][1] == '\0') {
+	    docallback(0, argv[argi]);
+	    continue;
+	}
+
+	if (argv[argi][1] == '-')
+	{
+	    /* Arguments that begin with a double-dash are long
+	     * options.
+	     */
+	    p = argv[argi] + 2;
+	    val = strchr(p, '=');
+	    if (val)
+		len = val++ - p;
+	    else
+		len = strlen(p);
+
+	    /* Is it on the list of valid options? If so, does it
+	     * expect a parameter?
+	     */
+	    for (opt = list ; opt->optval ; ++opt)
+		if (opt->name && !strncmp(p, opt->name, len)
+			      && !opt->name[len])
+		    break;
+	    if (!opt->optval) {
+		docallback('?', argv[argi]);
+	    } else if (!val && opt->arg == 1) {
+		docallback(':', argv[argi]);
+	    } else if (val && opt->arg == 0) {
+		docallback('=', argv[argi]);
+	    } else {
+		docallback(opt->optval, val);
+	    }
+	}
+	else
+	{
+	    /* Arguments that begin with a single dash contain one or
+	     * more short options. Each character in the argument is
+	     * examined in turn, unless a parameter consumes the rest
+	     * of the argument (or possibly even the following
+	     * argument).
+	     */
+	    for (p = argv[argi] + 1 ; *p ; ++p) {
+		for (opt = list ; opt->optval ; ++opt)
+		    if (opt->chname == *p)
+			break;
+		if (!opt->optval) {
+		    argstring[1] = *p;
+		    docallback('?', argstring);
+		    continue;
+		} else if (opt->arg == 0) {
+		    docallback(opt->optval, NULL);
+		    continue;
+		} else if (p[1]) {
+		    docallback(opt->optval, p + 1);
+		    break;
+		} else if (argi + 1 < argc && strcmp(argv[argi + 1], "--")) {
+		    ++argi;
+		    docallback(opt->optval, argv[argi]);
+		    break;
+		} else if (opt->arg == 2) {
+		    docallback(opt->optval, NULL);
+		    continue;
+		} else {
+		    argstring[1] = *p;
+		    docallback(':', argstring);
+		    break;
+		}
+	    }
+	}
+    }
+    return 0;
+}
+
+/* Verify that str points to an ASCII zero or one (optionally with
+ * whitespace) and return the value present, or -1 if str's contents
+ * are anything else.
+ */
+static int readboolvalue(char const *str)
+{
+    char	d;
+
+    while (isspace(*str))
+	++str;
+    if (!*str)
+	return -1;
+    d = *str++;
+    while (isspace(*str))
+	++str;
+    if (*str)
+	return -1;
+    if (d == '0')
+	return 0;
+    else if (d == '1')
+	return 1;
+    else
+	return -1;
+}
+
+/* Parse a configuration file.
+ */
+int readcfgfile(option const* list, FILE *fp,
+		int (*callback)(int, char const*, void*), void *data)
+{
+    char		buf[1024];
+    option const       *opt;
+    char	       *name, *val, *p;
+    int			len, f, r;
+
+    while (fgets(buf, sizeof buf, fp) != NULL)
+    {
+	/* Strip off the trailing newline and any leading whitespace.
+	 * If the line begins with a hash sign, skip it entirely.
+	 */
+	len = strlen(buf);
+	if (len && buf[len - 1] == '\n')
+	    buf[--len] = '\0';
+	for (p = buf ; isspace(*p) ; ++p) ;
+	if (!*p || *p == '#')
+	    continue;
+
+	/* Find the end of the option's name and the beginning of the
+	 * parameter, if any.
+	 */
+	for (name = p ; *p && *p != '=' && !isspace(*p) ; ++p) ;
+	len = p - name;
+	for ( ; *p == '=' || isspace(*p) ; ++p) ;
+	val = p;
+
+	/* Is it on the list of valid options? Does it take a
+	 * full parameter, or just an optional boolean?
+	 */
+	for (opt = list ; opt->optval ; ++opt)
+	    if (opt->name && !strncmp(name, opt->name, len)
+			  && !opt->name[len])
+		    break;
+	if (!opt->optval) {
+	    docallback('?', name);
+	} else if (!*val && opt->arg == 1) {
+	    docallback(':', name);
+	} else if (*val && opt->arg == 0) {
+	    f = readboolvalue(val);
+	    if (f < 0)
+		docallback('=', name);
+	    else if (f == 1)
+		docallback(opt->optval, NULL);
+	} else {
+	    docallback(opt->optval, val);
+	}
+    }
+    return ferror(fp) ? -1 : 0;
+}
+
+/* Turn a string containing a cmdline into an argc-argv pair.
+ */
+int makecmdline(char const *cmdline, int *argcp, char ***argvp)
+{
+    char      **argv;
+    int		argc;
+    char const *s;
+    int		n, quoted;
+
+    if (!cmdline)
+	return 0;
+
+    /* Calculate argc by counting the number of "clumps" of non-spaces.
+     */
+    for (s = cmdline ; isspace(*s) ; ++s) ;
+    if (!*s) {
+	*argcp = 1;
+	if (argvp) {
+	    *argvp = malloc(2 * sizeof(char*));
+	    if (!*argvp)
+		return 0;
+	    (*argvp)[0] = NULL;
+	    (*argvp)[1] = NULL;
+	}
+	return 1;
+    }
+    for (argc = 2, quoted = 0 ; *s ; ++s) {
+	if (quoted == '"') {
+	    if (*s == '"')
+		quoted = 0;
+	    else if (*s == '\\' && s[1])
+		++s;
+	} else if (quoted == '\'') {
+	    if (*s == '\'')
+		quoted = 0;
+	} else {
+	    if (isspace(*s)) {
+		for ( ; isspace(s[1]) ; ++s) ;
+		if (!s[1])
+		    break;
+		++argc;
+	    } else if (*s == '"' || *s == '\'') {
+		quoted = *s;
+	    }
+	}
+    }
+
+    *argcp = argc;
+    if (!argvp)
+	return 1;
+
+    /* Allocate space for all the arguments and their pointers.
+     */
+    argv = malloc((argc + 1) * sizeof(char*) + strlen(cmdline) + 1);
+    *argvp = argv;
+    if (!argv)
+	return 0;
+    argv[0] = NULL;
+    argv[1] = (char*)(argv + argc + 1);
+
+    /* Copy the string into the allocated memory immediately after the
+     * argv array. Where a space immediately follows a nonspace,
+     * replace it with a \0. Where a nonspace immediately follows
+     * spaces, store a pointer to it. (Except, of course, when the
+     * space-nonspace transitions occur within quotes.)
+     */
+    for (s = cmdline ; isspace(*s) ; ++s) ;
+    for (argc = 1, n = 0, quoted = 0 ; *s ; ++s) {
+	if (quoted == '"') {
+	    if (*s == '"') {
+		quoted = 0;
+	    } else {
+		if (*s == '\\' && s[1])
+		    ++s;
+		argv[argc][n++] = *s;
+	    }
+	} else if (quoted == '\'') {
+	    if (*s == '\'')
+		quoted = 0;
+	    else
+		argv[argc][n++] = *s;
+	} else {
+	    if (isspace(*s)) {
+		argv[argc][n] = '\0';
+		for ( ; isspace(s[1]) ; ++s) ;
+		if (!s[1])
+		    break;
+		argv[argc + 1] = argv[argc] + n + 1;
+		++argc;
+		n = 0;
+	    } else {
+		if (*s == '"' || *s == '\'')
+		    quoted = *s;
+		else
+		    argv[argc][n++] = *s;
+	    }
+	}
+    }
+    argv[argc + 1] = NULL;
+    return 1;
+}
--- a/md2html/cmdline.h
+++ b/md2html/cmdline.h
@ -0,0 +1,86 @@
+/* cmdline.h: a reentrant version of getopt(). Written 2006 by Brian
+ * Raiter. This code is in the public domain.
+ */
+
+#ifndef	_cmdline_h_
+#define	_cmdline_h_
+
+/* The information specifying a single cmdline option.
+ */
+typedef struct option {
+    char const *name;		/* the option's long name, or "" if none */
+    char	chname;		/* a single-char name, or zero if none */
+    int		optval;		/* a unique value representing this option */
+    int		arg;		/* 0 = no arg, 1 = arg req'd, 2 = optional */
+} option;
+
+/* Parse the given cmdline arguments. list is an array of option
+ * structs, each entry specifying a valid option. The last struct in
+ * the array must have optval set to zero. argc and argv give the
+ * cmdline to parse. callback is the function to call for each option
+ * and non-option found on the cmdline. data is a pointer that is
+ * passed to each invocation of callback. The return value of callback
+ * should be zero to continue processing the cmdline, or any other
+ * value to abort. The return value of readoptions() is the value
+ * returned from the last callback, or zero if no arguments were
+ * found, or -1 if an error occurred.
+ *
+ * When readoptions() encounters a regular cmdline argument (i.e. a
+ * non-option argument), callback() is invoked with opt equal to zero
+ * and val pointing to the argument. When an option is found,
+ * callback() is invoked with opt equal to the optval field in the
+ * option struct corresponding to that option, and val points to the
+ * option's parameter, or is NULL if the option does not take a
+ * parameter. If readoptions() finds an option that does not appear in
+ * the list of valid options, callback() is invoked with opt equal to
+ * '?'. If readoptions() encounters an option that is missing its
+ * required parameter, callback() is invoked with opt equal to ':'. If
+ * readoptions() finds a parameter on a long option that does not
+ * admit a parameter, callback() is invoked with opt equal to '='. In
+ * each of these cases, val will point to the erroneous option
+ * argument.
+ */
+extern int readoptions(option const* list, int argc, char **argv,
+		       int (*callback)(int opt, char const *val, void *data),
+		       void *data);
+
+/* Parse the given file. list is an array of option structs, in the
+ * same form as taken by readoptions(). fp is a pointer to an open
+ * text file. callback is the function to call for each line found in
+ * the configuration file. data is a pointer that is passed to each
+ * invocation of callback. The return value of readcfgfile() is the
+ * value returned from the last callback, or zero if no arguments were
+ * found, or -1 if an error occurred while reading the file.
+ *
+ * The function will ignore lines that contain only whitespace, or
+ * lines that begin with a hash sign. All other lines should be of the
+ * form "OPTION=VALUE", where OPTION is one of the long options in
+ * list. Whitespace around the equal sign is permitted. An option that
+ * takes no arguments can either have a VALUE of 0 or 1, or omit the
+ * "=VALUE" entirely. (A VALUE of 0 will behave the same as if the
+ * line was not present.)
+ */
+extern int readcfgfile(option const* list, FILE *fp,
+		       int (*callback)(int opt, char const *val, void *data),
+		       void *data);
+
+
+/* Create an argc-argv pair from a string containing a command line.
+ * cmdline is the string to be parsed. argcp points to the variable to
+ * receive the argc value, and argvp points to the variable to receive
+ * the argv value. argvp can be NULL if the caller just wants to get
+ * argc. Zero is returned on failure. This function allocates memory
+ * on behalf of the caller. The memory is allocated as a single block,
+ * so it is sufficient to simply free() the pointer returned through
+ * argvp. Note that argv[0] will always be initialized to NULL; the
+ * first argument will be stored in argv[1]. The string is parsed by
+ * separating arguments on whitespace boundaries. Whitespace within
+ * single-quoted substrings does not separate arguments. A substring
+ * enclosed in double-quotes is treated the same, except that the
+ * backslash is recognized as an escape character within such a
+ * substring. Enclosing quotes and escaping backslashes are not copied
+ * into the argv values.
+ */
+extern int makecmdline(char const *cmdline, int *argcp, char ***argvp);
+
+#endif
--- a/md2html/entity.c
+++ b/md2html/entity.c
--- a/md2html/entity.h
+++ b/md2html/entity.h
@ -0,0 +1,42 @@
+/*
+ * MD4C: Markdown parser for C
+ * (http://github.com/mity/md4c)
+ *
+ * Copyright (c) 2016-2017 Martin Mitas
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef MD2HTML_ENTITY_H
+#define MD2HTML_ENTITY_H
+
+#include <stdlib.h>
+
+
+/* Most entities consist of a single Unicode codepoint; a few consist of two.
+ * Single-codepoint entities have codepoints[1] set to zero. */
+struct entity {
+    const char* name;
+    unsigned codepoints[2];
+};
+
+const struct entity* entity_lookup(const char* name, size_t name_size);
+
+
+#endif  /* MD2HTML_ENTITY_H */
--- a/md2html/md2html.1
+++ b/md2html/md2html.1
@ -0,0 +1,113 @@
+.TH MD2HTML 1 "June 2019" "" "General Commands Manual"
+.nh
+.ad l
+.
+.SH NAME
+.
+md2html \- convert Markdown to HTML
+.
+.SH SYNOPSIS
+.
+.B md2html
+.RI [ OPTION ]...\&
+.RI [ FILE ]
+.
+.SH OPTIONS
+.
+.SS General options:
+.
+.TP
+.BR -o ", " --output= \fIOUTFILE\fR
+Write output to \fIOUTFILE\fR instead of \fBstdout\fR(3)
+.
+.TP
+.BR -f ", " --full-html
+Generate full HTML document, including header
+.
+.TP
+.BR -s ", " --stat
+Measure time of input parsing
+.
+.TP
+.BR -h ", " --help
+Display help and exit
+.
+.TP
+.BR -v ", " --version
+Display version and exit
+.
+.SS Markdown dialect options:
+.
+.TP
+.B --commonmark
+CommonMark (the default)
+.
+.TP
+.B --github
+GitHub Flavored Markdown
+.
+.PP
+Note: dialect options are equivalent to some combination of flags below.
+.
+.SS Markdown extension options:
+.
+.TP
+.B --fcollapse-whitespace
+Collapse non-trivial whitespace
+.
+.TP
+.B --fverbatim-entities
+Do not translate entities
+.
+.TP
+.B --fpermissive-atx-headers
+Allow ATX headers without delimiting space
+.
+.TP
+.B --fpermissive-url-autolinks
+Allow URL autolinks without "<" and ">" delimiters
+.
+.TP
+.B --fpermissive-www-autolinks
+Allow WWW autolinks without any scheme (e.g. "www.example.com")
+.
+.TP
+.B --fpermissive-email-autolinks
+Allow e-mail autolinks without "<", ">" and "mailto:"
+.
+.TP
+.B --fpermissive-autolinks
+Enable all 3 of the above permissive autolinks options
+.
+.TP
+.B --fno-indented-code
+Disable indented code blocks
+.
+.TP
+.B --fno-html-blocks
+Disable raw HTML blocks
+.
+.TP
+.B --fno-html-spans
+Disable raw HTML spans
+.
+.TP
+.B --fno-html
+Same as \fB--fno-html-blocks --fno-html-spans\fR
+.
+.TP
+.B --ftables
+Enable tables
+.
+.TP
+.B --fstrikethrough
+Enable strikethrough spans
+.
+.TP
+.B --ftasklists
+Enable task lists
+.
+.SH SEE ALSO
+.
+https://github.com/mity/md4c
+.
--- a/md2html/md2html.c
+++ b/md2html/md2html.c
@ -0,0 +1,371 @@
+/*
+ * MD4C: Markdown parser for C
+ * (http://github.com/mity/md4c)
+ *
+ * Copyright (c) 2016-2017 Martin Mitas
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+
+#include "render_html.h"
+#include "cmdline.h"
+
+
+
+/* Global options. */
+static unsigned parser_flags = 0;
+static unsigned renderer_flags = MD_RENDER_FLAG_DEBUG;
+static int want_fullhtml = 0;
+static int want_stat = 0;
+
+
+/*********************************
+ ***  Simple grow-able buffer  ***
+ *********************************/
+
+/* We render to a memory buffer instead of directly outputting the rendered
+ * document, as this allows using this utility for evaluating performance
+ * of MD4C (--stat option): we can then measure only the parsing time,
+ * without the I/O.
+ */
+
+struct membuffer {
+    char* data;
+    size_t asize;
+    size_t size;
+};
+
+static void
+membuf_init(struct membuffer* buf, MD_SIZE new_asize)
+{
+    buf->size = 0;
+    buf->asize = new_asize;
+    buf->data = malloc(buf->asize);
+    if(buf->data == NULL) {
+        fprintf(stderr, "membuf_init: malloc() failed.\n");
+        exit(1);
+    }
+}
+
+static void
+membuf_fini(struct membuffer* buf)
+{
+    if(buf->data)
+        free(buf->data);
+}
+
+static void
+membuf_grow(struct membuffer* buf, size_t new_asize)
+{
+    buf->data = realloc(buf->data, new_asize);
+    if(buf->data == NULL) {
+        fprintf(stderr, "membuf_grow: realloc() failed.\n");
+        exit(1);
+    }
+    buf->asize = new_asize;
+}
+
+static void
+membuf_append(struct membuffer* buf, const char* data, MD_SIZE size)
+{
+    if(buf->asize < buf->size + size)
+        membuf_grow(buf, buf->size + buf->size / 2 + size);
+    memcpy(buf->data + buf->size, data, size);
+    buf->size += size;
+}
+
+
+/**********************
+ ***  Main program  ***
+ **********************/
+
+static void
+process_output(const MD_CHAR* text, MD_SIZE size, void* userdata)
+{
+    membuf_append((struct membuffer*) userdata, text, size);
+}
+
+static int
+process_file(FILE* in, FILE* out)
+{
+    MD_SIZE n;
+    struct membuffer buf_in = {0};
+    struct membuffer buf_out = {0};
+    int ret = -1;
+    clock_t t0, t1;
+
+    membuf_init(&buf_in, 32 * 1024);
+
+    /* Read the input file into a buffer. */
+    while(1) {
+        if(buf_in.size >= buf_in.asize)
+            membuf_grow(&buf_in, buf_in.asize + buf_in.asize / 2);
+
+        n = fread(buf_in.data + buf_in.size, 1, buf_in.asize - buf_in.size, in);
+        if(n == 0)
+            break;
+        buf_in.size += n;
+    }
+
+    /* Input size is a good estimate of output size. Add some reserve to
+     * deal with the HTML header/footer and tags. */
+    membuf_init(&buf_out, buf_in.size + buf_in.size/8 + 64);
+
+    /* Parse the document. This calls our process_output() callback with
+     * each chunk of rendered HTML. */
+    t0 = clock();
+
+    ret = md_render_html(buf_in.data, buf_in.size, process_output,
+                (void*) &buf_out, parser_flags, renderer_flags);
+
+    t1 = clock();
+    if(ret != 0) {
+        fprintf(stderr, "Parsing failed.\n");
+        goto out;
+    }
+
+    /* Write out the document in HTML format. */
+    if(want_fullhtml) {
+        fprintf(out, "<html>\n");
+        fprintf(out, "<head>\n");
+        fprintf(out, "<title></title>\n");
+        fprintf(out, "<meta name=\"generator\" content=\"md2html\">\n");
+        fprintf(out, "</head>\n");
+        fprintf(out, "<body>\n");
+    }
+
+    fwrite(buf_out.data, 1, buf_out.size, out);
+
+    if(want_fullhtml) {
+        fprintf(out, "</body>\n");
+        fprintf(out, "</html>\n");
+    }
+
+    if(want_stat) {
+        if(t0 != (clock_t)-1  &&  t1 != (clock_t)-1) {
+            double elapsed = (double)(t1 - t0) / CLOCKS_PER_SEC;
+            if (elapsed < 1)
+                fprintf(stderr, "Time spent on parsing: %7.2f ms.\n", elapsed*1e3);
+            else
+                fprintf(stderr, "Time spent on parsing: %6.3f s.\n", elapsed);
+        }
+    }
+
+    /* Success if we have reached here. */
+    ret = 0;
+
+out:
+    membuf_fini(&buf_in);
+    membuf_fini(&buf_out);
+
+    return ret;
+}
+
+
+#define OPTION_ARG_NONE         0
+#define OPTION_ARG_REQUIRED     1
+#define OPTION_ARG_OPTIONAL     2
+
+static const option cmdline_options[] = {
+    { "output",                     'o', 'o', OPTION_ARG_REQUIRED },
+    { "full-html",                  'f', 'f', OPTION_ARG_NONE },
+    { "stat",                       's', 's', OPTION_ARG_NONE },
+    { "help",                       'h', 'h', OPTION_ARG_NONE },
+    { "version",                    'v', 'v', OPTION_ARG_NONE },
+
+    { "commonmark",                  0,  'c', OPTION_ARG_NONE },
+    { "github",                      0,  'g', OPTION_ARG_NONE },
+
+    { "fcollapse-whitespace",        0,  'W', OPTION_ARG_NONE },
+    { "flatex-math",                 0,  'L', OPTION_ARG_NONE },
+    { "fpermissive-atx-headers",     0,  'A', OPTION_ARG_NONE },
+    { "fpermissive-autolinks",       0,  'V', OPTION_ARG_NONE },
+    { "fpermissive-email-autolinks", 0,  '@', OPTION_ARG_NONE },
+    { "fpermissive-url-autolinks",   0,  'U', OPTION_ARG_NONE },
+    { "fpermissive-www-autolinks",   0,  '.', OPTION_ARG_NONE },
+    { "fstrikethrough",              0,  'S', OPTION_ARG_NONE },
+    { "ftables",                     0,  'T', OPTION_ARG_NONE },
+    { "ftasklists",                  0,  'X', OPTION_ARG_NONE },
+    { "funderline",                  0,  '_', OPTION_ARG_NONE },
+    { "fverbatim-entities",          0,  'E', OPTION_ARG_NONE },
+    { "fwiki-links",                 0,  'K', OPTION_ARG_NONE },
+
+    { "fno-html-blocks",             0,  'F', OPTION_ARG_NONE },
+    { "fno-html-spans",              0,  'G', OPTION_ARG_NONE },
+    { "fno-html",                    0,  'H', OPTION_ARG_NONE },
+    { "fno-indented-code",           0,  'I', OPTION_ARG_NONE },
+
+    { 0 }
+};
+
+static void
+usage(void)
+{
+    printf(
+        "Usage: md2html [OPTION]... [FILE]\n"
+        "Convert input FILE (or standard input) in Markdown format to HTML.\n"
+        "\n"
+        "General options:\n"
+        "  -o, --output=FILE    Output file (default is standard output)\n"
+        "  -f, --full-html      Generate full HTML document, including header\n"
+        "  -s, --stat           Measure time of input parsing\n"
+        "  -h, --help           Display this help and exit\n"
+        "  -v, --version        Display version and exit\n"
+        "\n"
+        "Markdown dialect options:\n"
+        "(note these are equivalent to some combinations of the flags below)\n"
+        "      --commonmark     CommonMark (this is default)\n"
+        "      --github         GitHub Flavored Markdown\n"
+        "\n"
+        "Markdown extension options:\n"
+        "      --fcollapse-whitespace\n"
+        "                       Collapse non-trivial whitespace\n"
+        "      --flatex-math    Enable LaTeX style mathematics spans\n"
+        "      --fpermissive-atx-headers\n"
+        "                       Allow ATX headers without delimiting space\n"
+        "      --fpermissive-url-autolinks\n"
+        "                       Allow URL autolinks without '<', '>'\n"
+        "      --fpermissive-www-autolinks\n"
+        "                       Allow WWW autolinks without any scheme (e.g. 'www.example.com')\n"
+        "      --fpermissive-email-autolinks\n"
+        "                       Allow e-mail autolinks without '<', '>' and 'mailto:'\n"
+        "      --fpermissive-autolinks\n"
+        "                       Same as --fpermissive-url-autolinks --fpermissive-www-autolinks\n"
+        "                       --fpermissive-email-autolinks\n"
+        "      --fstrikethrough Enable strike-through spans\n"
+        "      --ftables        Enable tables\n"
+        "      --ftasklists     Enable task lists\n"
+        "      --funderline     Enable underline spans\n"
+        "      --fwiki-links    Enable wiki links\n"
+        "\n"
+        "Markdown suppression options:\n"
+        "      --fno-html-blocks\n"
+        "                       Disable raw HTML blocks\n"
+        "      --fno-html-spans\n"
+        "                       Disable raw HTML spans\n"
+        "      --fno-html       Same as --fno-html-blocks --fno-html-spans\n"
+        "      --fno-indented-code\n"
+        "                       Disable indented code blocks\n"
+        "\n"
+        "HTML generator options:\n"
+        "      --fverbatim-entities\n"
+        "                       Do not translate entities\n"
+        "\n"
+    );
+}
+
+static void
+version(void)
+{
+    printf("%d.%d.%d\n", MD_VERSION_MAJOR, MD_VERSION_MINOR, MD_VERSION_RELEASE);
+}
+
+static const char* input_path = NULL;
+static const char* output_path = NULL;
+
+static int
+cmdline_callback(int opt, char const* value, void* data)
+{
+    switch(opt) {
+        case 0:
+            if(input_path) {
+                fprintf(stderr, "Too many arguments. Only one input file can be specified.\n");
+                fprintf(stderr, "Use --help for more info.\n");
+                exit(1);
+            }
+            input_path = value;
+            break;
+
+        case 'o':   output_path = value; break;
+        case 'f':   want_fullhtml = 1; break;
+        case 's':   want_stat = 1; break;
+        case 'h':   usage(); exit(0); break;
+        case 'v':   version(); exit(0); break;
+
+        case 'c':   parser_flags = MD_DIALECT_COMMONMARK; break;
+        case 'g':   parser_flags = MD_DIALECT_GITHUB; break;
+
+        case 'E':   renderer_flags |= MD_RENDER_FLAG_VERBATIM_ENTITIES; break;
+        case 'A':   parser_flags |= MD_FLAG_PERMISSIVEATXHEADERS; break;
+        case 'I':   parser_flags |= MD_FLAG_NOINDENTEDCODEBLOCKS; break;
+        case 'F':   parser_flags |= MD_FLAG_NOHTMLBLOCKS; break;
+        case 'G':   parser_flags |= MD_FLAG_NOHTMLSPANS; break;
+        case 'H':   parser_flags |= MD_FLAG_NOHTML; break;
+        case 'W':   parser_flags |= MD_FLAG_COLLAPSEWHITESPACE; break;
+        case 'U':   parser_flags |= MD_FLAG_PERMISSIVEURLAUTOLINKS; break;
+        case '.':   parser_flags |= MD_FLAG_PERMISSIVEWWWAUTOLINKS; break;
+        case '@':   parser_flags |= MD_FLAG_PERMISSIVEEMAILAUTOLINKS; break;
+        case 'V':   parser_flags |= MD_FLAG_PERMISSIVEAUTOLINKS; break;
+        case 'T':   parser_flags |= MD_FLAG_TABLES; break;
+        case 'S':   parser_flags |= MD_FLAG_STRIKETHROUGH; break;
+        case 'L':   parser_flags |= MD_FLAG_LATEXMATHSPANS; break;
+        case 'K':   parser_flags |= MD_FLAG_WIKILINKS; break;
+        case 'X':   parser_flags |= MD_FLAG_TASKLISTS; break;
+        case '_':   parser_flags |= MD_FLAG_UNDERLINE; break;
+
+        default:
+            fprintf(stderr, "Illegal option: %s\n", value);
+            fprintf(stderr, "Use --help for more info.\n");
+            exit(1);
+            break;
+    }
+
+    return 0;
+}
+
+int
+main(int argc, char** argv)
+{
+    FILE* in = stdin;
+    FILE* out = stdout;
+    int ret = 0;
+
+    if(readoptions(cmdline_options, argc, argv, cmdline_callback, NULL) < 0) {
+        usage();
+        exit(1);
+    }
+
+    if(input_path != NULL && strcmp(input_path, "-") != 0) {
+        in = fopen(input_path, "rb");
+        if(in == NULL) {
+            fprintf(stderr, "Cannot open %s.\n", input_path);
+            exit(1);
+        }
+    }
+    if(output_path != NULL && strcmp(output_path, "-") != 0) {
+        out = fopen(output_path, "wt");
+        if(out == NULL) {
+            fprintf(stderr, "Cannot open %s.\n", output_path);
+            exit(1);
+        }
+    }
+
+    ret = process_file(in, out);
+    if(in != stdin)
+        fclose(in);
+    if(out != stdout)
+        fclose(out);
+
+    return ret;
+}
--- a/md2html/render_html.c
+++ b/md2html/render_html.c
@ -0,0 +1,561 @@
+/*
+ * MD4C: Markdown parser for C
+ * (http://github.com/mity/md4c)
+ *
+ * Copyright (c) 2016-2019 Martin Mitas
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+
+#include "render_html.h"
+#include "entity.h"
+
+
+#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
+    /* Pre-C99 compilers may not understand "inline". */
+    #if defined __GNUC__
+        #define inline __inline__
+    #elif defined _MSC_VER
+        #define inline __inline
+    #else
+        #define inline
+    #endif
+#endif
+
+#ifdef _WIN32
+    #define snprintf _snprintf
+#endif
+
+
+
+typedef struct MD_RENDER_HTML_tag MD_RENDER_HTML;
+struct MD_RENDER_HTML_tag {
+    void (*process_output)(const MD_CHAR*, MD_SIZE, void*);
+    void* userdata;
+    unsigned flags;
+    int image_nesting_level;
+    char escape_map[256];
+};
+
+#define NEED_HTML_ESC_FLAG   0x1
+#define NEED_URL_ESC_FLAG    0x2
+
+
+/*****************************************
+ ***  HTML rendering helper functions  ***
+ *****************************************/
+
+#define ISDIGIT(ch)     ('0' <= (ch) && (ch) <= '9')
+#define ISLOWER(ch)     ('a' <= (ch) && (ch) <= 'z')
+#define ISUPPER(ch)     ('A' <= (ch) && (ch) <= 'Z')
+#define ISALNUM(ch)     (ISLOWER(ch) || ISUPPER(ch) || ISDIGIT(ch))
+
+
+static inline void
+render_verbatim(MD_RENDER_HTML* r, const MD_CHAR* text, MD_SIZE size)
+{
+    r->process_output(text, size, r->userdata);
+}
+
+/* Keep this as a macro. Most compilers should then be smart enough to replace
+ * the strlen() call with a compile-time constant if the string is a C literal. */
+#define RENDER_VERBATIM(r, verbatim)                                    \
+        render_verbatim((r), (verbatim), (MD_SIZE) (strlen(verbatim)))
+
+
+static void
+render_html_escaped(MD_RENDER_HTML* r, const MD_CHAR* data, MD_SIZE size)
+{
+    MD_OFFSET beg = 0;
+    MD_OFFSET off = 0;
+
+    /* Some characters need to be escaped in normal HTML text. */
+    #define NEED_HTML_ESC(ch)   (r->escape_map[(unsigned char)(ch)] & NEED_HTML_ESC_FLAG)
+
+    while(1) {
+        /* Optimization: Use some loop unrolling. */
+        while(off + 3 < size  &&  !NEED_HTML_ESC(data[off+0])  &&  !NEED_HTML_ESC(data[off+1])
+                              &&  !NEED_HTML_ESC(data[off+2])  &&  !NEED_HTML_ESC(data[off+3]))
+            off += 4;
+        while(off < size  &&  !NEED_HTML_ESC(data[off]))
+            off++;
+
+        if(off > beg)
+            render_verbatim(r, data + beg, off - beg);
+
+        if(off < size) {
+            switch(data[off]) {
+                case '&':   RENDER_VERBATIM(r, "&amp;"); break;
+                case '<':   RENDER_VERBATIM(r, "&lt;"); break;
+                case '>':   RENDER_VERBATIM(r, "&gt;"); break;
+                case '"':   RENDER_VERBATIM(r, "&quot;"); break;
+            }
+            off++;
+        } else {
+            break;
+        }
+        beg = off;
+    }
+}
+
+static void
+render_url_escaped(MD_RENDER_HTML* r, const MD_CHAR* data, MD_SIZE size)
+{
+    static const MD_CHAR hex_chars[] = "0123456789ABCDEF";
+    MD_OFFSET beg = 0;
+    MD_OFFSET off = 0;
+
+    /* Some characters need to be escaped in URL attributes. */
+    #define NEED_URL_ESC(ch)    (r->escape_map[(unsigned char)(ch)] & NEED_URL_ESC_FLAG)
+
+    while(1) {
+        while(off < size  &&  !NEED_URL_ESC(data[off]))
+            off++;
+        if(off > beg)
+            render_verbatim(r, data + beg, off - beg);
+
+        if(off < size) {
+            char hex[3];
+
+            switch(data[off]) {
+                case '&':   RENDER_VERBATIM(r, "&amp;"); break;
+                default:
+                    hex[0] = '%';
+                    hex[1] = hex_chars[((unsigned)data[off] >> 4) & 0xf];
+                    hex[2] = hex_chars[((unsigned)data[off] >> 0) & 0xf];
+                    render_verbatim(r, hex, 3);
+                    break;
+            }
+            off++;
+        } else {
+            break;
+        }
+
+        beg = off;
+    }
+}
+
+static unsigned
+hex_val(char ch)
+{
+    if('0' <= ch && ch <= '9')
+        return ch - '0';
+    if('A' <= ch && ch <= 'Z')
+        return ch - 'A' + 10;
+    else
+        return ch - 'a' + 10;
+}
+
+static void
+render_utf8_codepoint(MD_RENDER_HTML* r, unsigned codepoint,
+                      void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
+{
+    static const MD_CHAR utf8_replacement_char[] = { 0xef, 0xbf, 0xbd };
+
+    unsigned char utf8[4];
+    size_t n;
+
+    if(codepoint <= 0x7f) {
+        n = 1;
+        utf8[0] = codepoint;
+    } else if(codepoint <= 0x7ff) {
+        n = 2;
+        utf8[0] = 0xc0 | ((codepoint >>  6) & 0x1f);
+        utf8[1] = 0x80 + ((codepoint >>  0) & 0x3f);
+    } else if(codepoint <= 0xffff) {
+        n = 3;
+        utf8[0] = 0xe0 | ((codepoint >> 12) & 0xf);
+        utf8[1] = 0x80 + ((codepoint >>  6) & 0x3f);
+        utf8[2] = 0x80 + ((codepoint >>  0) & 0x3f);
+    } else {
+        n = 4;
+        utf8[0] = 0xf0 | ((codepoint >> 18) & 0x7);
+        utf8[1] = 0x80 + ((codepoint >> 12) & 0x3f);
+        utf8[2] = 0x80 + ((codepoint >>  6) & 0x3f);
+        utf8[3] = 0x80 + ((codepoint >>  0) & 0x3f);
+    }
+
+    if(0 < codepoint  &&  codepoint <= 0x10ffff)
+        fn_append(r, (char*)utf8, n);
+    else
+        fn_append(r, utf8_replacement_char, 3);
+}
+
+/* Translate entity to its UTF-8 equivalent, or output the verbatim one
+ * if such entity is unknown (or if the translation is disabled). */
+static void
+render_entity(MD_RENDER_HTML* r, const MD_CHAR* text, MD_SIZE size,
+              void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
+{
+    if(r->flags & MD_RENDER_FLAG_VERBATIM_ENTITIES) {
+        fn_append(r, text, size);
+        return;
+    }
+
+    /* We assume UTF-8 output is what is desired. */
+    if(size > 3 && text[1] == '#') {
+        unsigned codepoint = 0;
+
+        if(text[2] == 'x' || text[2] == 'X') {
+            /* Hexadecimal entity (e.g. "&#x1234abcd;"). */
+            MD_SIZE i;
+            for(i = 3; i < size-1; i++)
+                codepoint = 16 * codepoint + hex_val(text[i]);
+        } else {
+            /* Decimal entity (e.g. "&#1234;"). */
+            MD_SIZE i;
+            for(i = 2; i < size-1; i++)
+                codepoint = 10 * codepoint + (text[i] - '0');
+        }
+
+        render_utf8_codepoint(r, codepoint, fn_append);
+        return;
+    } else {
+        /* Named entity (e.g. "&nbsp;"). */
+        const struct entity* ent;
+
+        ent = entity_lookup(text, size);
+        if(ent != NULL) {
+            render_utf8_codepoint(r, ent->codepoints[0], fn_append);
+            if(ent->codepoints[1])
+                render_utf8_codepoint(r, ent->codepoints[1], fn_append);
+            return;
+        }
+    }
+
+    fn_append(r, text, size);
+}
+
+static void
+render_attribute(MD_RENDER_HTML* r, const MD_ATTRIBUTE* attr,
+                 void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
+{
+    int i;
+
+    for(i = 0; attr->substr_offsets[i] < attr->size; i++) {
+        MD_TEXTTYPE type = attr->substr_types[i];
+        MD_OFFSET off = attr->substr_offsets[i];
+        MD_SIZE size = attr->substr_offsets[i+1] - off;
+        const MD_CHAR* text = attr->text + off;
+
+        switch(type) {
+            case MD_TEXT_NULLCHAR:  render_utf8_codepoint(r, 0x0000, render_verbatim); break;
+            case MD_TEXT_ENTITY:    render_entity(r, text, size, fn_append); break;
+            default:                fn_append(r, text, size); break;
+        }
+    }
+}
+
+
+static void
+render_open_ol_block(MD_RENDER_HTML* r, const MD_BLOCK_OL_DETAIL* det)
+{
+    char buf[64];
+
+    if(det->start == 1) {
+        RENDER_VERBATIM(r, "<ol>\n");
+        return;
+    }
+
+    snprintf(buf, sizeof(buf), "<ol start=\"%u\">\n", det->start);
+    RENDER_VERBATIM(r, buf);
+}
+
+static void
+render_open_li_block(MD_RENDER_HTML* r, const MD_BLOCK_LI_DETAIL* det)
+{
+    if(det->is_task) {
+        RENDER_VERBATIM(r, "<li class=\"task-list-item\">"
+                          "<input type=\"checkbox\" class=\"task-list-item-checkbox\" disabled");
+        if(det->task_mark == 'x' || det->task_mark == 'X')
+            RENDER_VERBATIM(r, " checked");
+        RENDER_VERBATIM(r, ">");
+    } else {
+        RENDER_VERBATIM(r, "<li>");
+    }
+}
+
+static void
+render_open_code_block(MD_RENDER_HTML* r, const MD_BLOCK_CODE_DETAIL* det)
+{
+    RENDER_VERBATIM(r, "<pre><code");
+
+    /* If known, output the HTML 5 attribute class="language-LANGNAME". */
+    if(det->lang.text != NULL) {
+        RENDER_VERBATIM(r, " class=\"language-");
+        render_attribute(r, &det->lang, render_html_escaped);
+        RENDER_VERBATIM(r, "\"");
+    }
+
+    RENDER_VERBATIM(r, ">");
+}
+
+static void
+render_open_td_block(MD_RENDER_HTML* r, const MD_CHAR* cell_type, const MD_BLOCK_TD_DETAIL* det)
+{
+    RENDER_VERBATIM(r, "<");
+    RENDER_VERBATIM(r, cell_type);
+
+    switch(det->align) {
+        case MD_ALIGN_LEFT:     RENDER_VERBATIM(r, " align=\"left\">"); break;
+        case MD_ALIGN_CENTER:   RENDER_VERBATIM(r, " align=\"center\">"); break;
+        case MD_ALIGN_RIGHT:    RENDER_VERBATIM(r, " align=\"right\">"); break;
+        default:                RENDER_VERBATIM(r, ">"); break;
+    }
+}
+
+static void
+render_open_a_span(MD_RENDER_HTML* r, const MD_SPAN_A_DETAIL* det)
+{
+    RENDER_VERBATIM(r, "<a href=\"");
+    render_attribute(r, &det->href, render_url_escaped);
+
+    if(det->title.text != NULL) {
+        RENDER_VERBATIM(r, "\" title=\"");
+        render_attribute(r, &det->title, render_html_escaped);
+    }
+
+    RENDER_VERBATIM(r, "\">");
+}
+
+static void
+render_open_img_span(MD_RENDER_HTML* r, const MD_SPAN_IMG_DETAIL* det)
+{
+    RENDER_VERBATIM(r, "<img src=\"");
+    render_attribute(r, &det->src, render_url_escaped);
+
+    RENDER_VERBATIM(r, "\" alt=\"");
+
+    r->image_nesting_level++;
+}
+
+static void
+render_close_img_span(MD_RENDER_HTML* r, const MD_SPAN_IMG_DETAIL* det)
+{
+    if(det->title.text != NULL) {
+        RENDER_VERBATIM(r, "\" title=\"");
+        render_attribute(r, &det->title, render_html_escaped);
+    }
+
+    RENDER_VERBATIM(r, "\">");
+
+    r->image_nesting_level--;
+}
+
+static void
+render_open_wikilink_span(MD_RENDER_HTML* r, const MD_SPAN_WIKILINK_DETAIL* det)
+{
+    RENDER_VERBATIM(r, "<x-wikilink data-target=\"");
+    render_attribute(r, &det->target, render_html_escaped);
+
+    RENDER_VERBATIM(r, "\">");
+}
+
+
+/**************************************
+ ***  HTML renderer implementation  ***
+ **************************************/
+
+static int
+enter_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
+{
+    static const MD_CHAR* head[6] = { "<h1>", "<h2>", "<h3>", "<h4>", "<h5>", "<h6>" };
+    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
+
+    switch(type) {
+        case MD_BLOCK_DOC:      /* noop */ break;
+        case MD_BLOCK_QUOTE:    RENDER_VERBATIM(r, "<blockquote>\n"); break;
+        case MD_BLOCK_UL:       RENDER_VERBATIM(r, "<ul>\n"); break;
+        case MD_BLOCK_OL:       render_open_ol_block(r, (const MD_BLOCK_OL_DETAIL*)detail); break;
+        case MD_BLOCK_LI:       render_open_li_block(r, (const MD_BLOCK_LI_DETAIL*)detail); break;
+        case MD_BLOCK_HR:       RENDER_VERBATIM(r, "<hr>\n"); break;
+        case MD_BLOCK_H:        RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break;
+        case MD_BLOCK_CODE:     render_open_code_block(r, (const MD_BLOCK_CODE_DETAIL*) detail); break;
+        case MD_BLOCK_HTML:     /* noop */ break;
+        case MD_BLOCK_P:        RENDER_VERBATIM(r, "<p>"); break;
+        case MD_BLOCK_TABLE:    RENDER_VERBATIM(r, "<table>\n"); break;
+        case MD_BLOCK_THEAD:    RENDER_VERBATIM(r, "<thead>\n"); break;
+        case MD_BLOCK_TBODY:    RENDER_VERBATIM(r, "<tbody>\n"); break;
+        case MD_BLOCK_TR:       RENDER_VERBATIM(r, "<tr>\n"); break;
+        case MD_BLOCK_TH:       render_open_td_block(r, "th", (MD_BLOCK_TD_DETAIL*)detail); break;
+        case MD_BLOCK_TD:       render_open_td_block(r, "td", (MD_BLOCK_TD_DETAIL*)detail); break;
+    }
+
+    return 0;
+}
+
+static int
+leave_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
+{
+    static const MD_CHAR* head[6] = { "</h1>\n", "</h2>\n", "</h3>\n", "</h4>\n", "</h5>\n", "</h6>\n" };
+    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
+
+    switch(type) {
+        case MD_BLOCK_DOC:      /*noop*/ break;
+        case MD_BLOCK_QUOTE:    RENDER_VERBATIM(r, "</blockquote>\n"); break;
+        case MD_BLOCK_UL:       RENDER_VERBATIM(r, "</ul>\n"); break;
+        case MD_BLOCK_OL:       RENDER_VERBATIM(r, "</ol>\n"); break;
+        case MD_BLOCK_LI:       RENDER_VERBATIM(r, "</li>\n"); break;
+        case MD_BLOCK_HR:       /*noop*/ break;
+        case MD_BLOCK_H:        RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break;
+        case MD_BLOCK_CODE:     RENDER_VERBATIM(r, "</code></pre>\n"); break;
+        case MD_BLOCK_HTML:     /* noop */ break;
+        case MD_BLOCK_P:        RENDER_VERBATIM(r, "</p>\n"); break;
+        case MD_BLOCK_TABLE:    RENDER_VERBATIM(r, "</table>\n"); break;
+        case MD_BLOCK_THEAD:    RENDER_VERBATIM(r, "</thead>\n"); break;
+        case MD_BLOCK_TBODY:    RENDER_VERBATIM(r, "</tbody>\n"); break;
+        case MD_BLOCK_TR:       RENDER_VERBATIM(r, "</tr>\n"); break;
+        case MD_BLOCK_TH:       RENDER_VERBATIM(r, "</th>\n"); break;
+        case MD_BLOCK_TD:       RENDER_VERBATIM(r, "</td>\n"); break;
+    }
+
+    return 0;
+}
+
+static int
+enter_span_callback(MD_SPANTYPE type, void* detail, void* userdata)
+{
+    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
+
+    if(r->image_nesting_level > 0) {
+        /* We are inside a Markdown image label. Markdown allows any
+         * emphasis and other rich content in that context, just as in
+         * any link label.
+         *
+         * However, unlike in the case of links (where that content becomes
+         * the content of the <a>...</a> tag), in the case of images the
+         * content is supposed to go into the alt attribute: <img alt="...">.
+         *
+         * In that context we naturally cannot output nested HTML tags. So let's
+         * suppress them and only output the plain text (i.e. what goes into
+         * the text() callback).
+         *
+         * This make-it-a-plain-text approach is the practice recommended by
+         * the CommonMark specification (for HTML output).
+         */
+        return 0;
+    }
+
+    switch(type) {
+        case MD_SPAN_EM:                RENDER_VERBATIM(r, "<em>"); break;
+        case MD_SPAN_STRONG:            RENDER_VERBATIM(r, "<strong>"); break;
+        case MD_SPAN_U:                 RENDER_VERBATIM(r, "<u>"); break;
+        case MD_SPAN_A:                 render_open_a_span(r, (MD_SPAN_A_DETAIL*) detail); break;
+        case MD_SPAN_IMG:               render_open_img_span(r, (MD_SPAN_IMG_DETAIL*) detail); break;
+        case MD_SPAN_CODE:              RENDER_VERBATIM(r, "<code>"); break;
+        case MD_SPAN_DEL:               RENDER_VERBATIM(r, "<del>"); break;
+        case MD_SPAN_LATEXMATH:         RENDER_VERBATIM(r, "<x-equation>"); break;
+        case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "<x-equation type=\"display\">"); break;
+        case MD_SPAN_WIKILINK:          render_open_wikilink_span(r, (MD_SPAN_WIKILINK_DETAIL*) detail); break;
+    }
+
+    return 0;
+}
+
+static int
+leave_span_callback(MD_SPANTYPE type, void* detail, void* userdata)
+{
+    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
+
+    if(r->image_nesting_level > 0) {
+        /* Ditto as in enter_span_callback(), except we have to allow the
+         * end of the <img> tag. */
+        if(r->image_nesting_level == 1  &&  type == MD_SPAN_IMG)
+            render_close_img_span(r, (MD_SPAN_IMG_DETAIL*) detail);
+        return 0;
+    }
+
+    switch(type) {
+        case MD_SPAN_EM:                RENDER_VERBATIM(r, "</em>"); break;
+        case MD_SPAN_STRONG:            RENDER_VERBATIM(r, "</strong>"); break;
+        case MD_SPAN_U:                 RENDER_VERBATIM(r, "</u>"); break;
+        case MD_SPAN_A:                 RENDER_VERBATIM(r, "</a>"); break;
+        case MD_SPAN_IMG:               /*noop, handled above*/ break;
+        case MD_SPAN_CODE:              RENDER_VERBATIM(r, "</code>"); break;
+        case MD_SPAN_DEL:               RENDER_VERBATIM(r, "</del>"); break;
+        case MD_SPAN_LATEXMATH:         /*fall through*/
+        case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "</x-equation>"); break;
+        case MD_SPAN_WIKILINK:          RENDER_VERBATIM(r, "</x-wikilink>"); break;
+    }
+
+    return 0;
+}
+
+static int
+text_callback(MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size, void* userdata)
+{
+    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
+
+    switch(type) {
+        case MD_TEXT_NULLCHAR:  render_utf8_codepoint(r, 0x0000, render_verbatim); break;
+        case MD_TEXT_BR:        RENDER_VERBATIM(r, (r->image_nesting_level == 0 ? "<br>\n" : " ")); break;
+        case MD_TEXT_SOFTBR:    RENDER_VERBATIM(r, (r->image_nesting_level == 0 ? "\n" : " ")); break;
+        case MD_TEXT_HTML:      render_verbatim(r, text, size); break;
+        case MD_TEXT_ENTITY:    render_entity(r, text, size, render_html_escaped); break;
+        default:                render_html_escaped(r, text, size); break;
+    }
+
+    return 0;
+}
+
+static void
+debug_log_callback(const char* msg, void* userdata)
+{
+    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
+    if(r->flags & MD_RENDER_FLAG_DEBUG)
+        fprintf(stderr, "MD4C: %s\n", msg);
+}
+
+int
+md_render_html(const MD_CHAR* input, MD_SIZE input_size,
+               void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
+               void* userdata, unsigned parser_flags, unsigned renderer_flags)
+{
+    MD_RENDER_HTML render = { process_output, userdata, renderer_flags, 0, { 0 } };
+    int i;
+
+    MD_PARSER parser = {
+        0,
+        parser_flags,
+        enter_block_callback,
+        leave_block_callback,
+        enter_span_callback,
+        leave_span_callback,
+        text_callback,
+        debug_log_callback,
+        NULL
+    };
+
+    /* Build map of characters which need escaping. */
+    for(i = 0; i < 256; i++) {
+        unsigned char ch = (unsigned char) i;
+
+        if(strchr("\"&<>", ch) != NULL)
+            render.escape_map[i] |= NEED_HTML_ESC_FLAG;
+
+        if(!ISALNUM(ch)  &&  strchr("-_.+!*(),%#@?=;:/,+$", ch) == NULL)
+            render.escape_map[i] |= NEED_URL_ESC_FLAG;
+    }
+
+    return md_parse(input, input_size, &parser, (void*) &render);
+}
+
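Reviewer note on render_utf8_codepoint() in render_html.c above: the function picks the UTF-8 sequence length from the code point's magnitude, then splits the value into a lead byte and 6-bit continuation bytes. It falls back to U+FFFD (bytes EF BF BD) for zero and for values above 0x10ffff. The standalone sketch below restates that logic for illustration only. The name `utf8_encode` and its buffer-based signature are hypothetical and not part of MD4C. Like the original, it does not reject surrogate code points.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MD4C): encode `codepoint` as UTF-8 into
 * `out`, which must hold at least 4 bytes. Returns the number of bytes
 * written. Zero and values above 0x10ffff are replaced with U+FFFD,
 * mirroring the fallback in render_utf8_codepoint(). */
size_t
utf8_encode(unsigned codepoint, unsigned char out[4])
{
    if(codepoint == 0 || codepoint > 0x10ffff)
        codepoint = 0xfffd;

    if(codepoint <= 0x7f) {
        /* 0xxxxxxx */
        out[0] = (unsigned char) codepoint;
        return 1;
    } else if(codepoint <= 0x7ff) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xc0 | ((codepoint >> 6) & 0x1f));
        out[1] = (unsigned char) (0x80 | (codepoint & 0x3f));
        return 2;
    } else if(codepoint <= 0xffff) {
        /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xe0 | ((codepoint >> 12) & 0x0f));
        out[1] = (unsigned char) (0x80 | ((codepoint >> 6) & 0x3f));
        out[2] = (unsigned char) (0x80 | (codepoint & 0x3f));
        return 3;
    } else {
        /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xf0 | ((codepoint >> 18) & 0x07));
        out[1] = (unsigned char) (0x80 | ((codepoint >> 12) & 0x3f));
        out[2] = (unsigned char) (0x80 | ((codepoint >> 6) & 0x3f));
        out[3] = (unsigned char) (0x80 | (codepoint & 0x3f));
        return 4;
    }
}
```

Returning the length instead of calling an append callback keeps the sketch independent of MD_RENDER_HTML. The renderer itself passes the bytes straight to fn_append().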
--- a/md2html/render_html.h
+++ b/md2html/render_html.h
@ -0,0 +1,66 @@
+/*
+ * MD4C: Markdown parser for C
+ * (http://github.com/mity/md4c)
+ *
+ * Copyright (c) 2016-2017 Martin Mitas
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef MD4C_RENDER_HTML_H
+#define MD4C_RENDER_HTML_H
+
+#include "md4c.h"
+
+#ifdef __cplusplus
+    extern "C" {
+#endif
+
+
+/* If set, debug output from md_parse() is sent to stderr. */
+#define MD_RENDER_FLAG_DEBUG                0x0001
+#define MD_RENDER_FLAG_VERBATIM_ENTITIES    0x0002
+
+
+/* Render Markdown into HTML.
+ *
+ * Note only the contents of the <body> tag are generated. The caller must
+ * generate the HTML header/footer manually before/after calling md_render_html().
+ *
+ * Params input and input_size specify the Markdown input.
+ * Callback process_output() gets called with chunks of HTML output.
+ * (A typical implementation may just write the bytes to a file or append
+ * them to some buffer.)
+ * Param userdata is just propagated back to the process_output() callback.
+ * Param parser_flags are flags from md4c.h propagated to md_parse().
+ * Param renderer_flags is a bitmask of MD_RENDER_FLAG_xxxx.
+ *
+ * Returns -1 on error (if md_parse() fails).
+ * Returns 0 on success.
+ */
+int md_render_html(const MD_CHAR* input, MD_SIZE input_size,
+                   void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
+                   void* userdata, unsigned parser_flags, unsigned renderer_flags);
+
+
+#ifdef __cplusplus
+    }  /* extern "C" { */
+#endif
+
+#endif  /* MD4C_RENDER_HTML_H */
--- a/md4c/CMakeLists.txt
+++ b/md4c/CMakeLists.txt
@ -0,0 +1,32 @@
+# Be sure to export all symbols in Windows.
+set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS 1)
+
+set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUG")
+
+set(md4c_src
+    md4c.c
+)
+
+add_library(md4c ${md4c_src})
+
+set_target_properties(md4c PROPERTIES
+    VERSION ${MD_VERSION}
+    SOVERSION ${MD_VERSION_MAJOR}
+    PUBLIC_HEADER md4c.h
+)
+
+install(
+    TARGETS md4c
+    EXPORT md4cConfig
+    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
+    LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
+    RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
+    PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+)
+
+# Create a pkg-config file
+configure_file(md4c.pc.in md4c.pc @ONLY)
+install(FILES ${CMAKE_BINARY_DIR}/md4c/md4c.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig)
+
+# And a CMake file
+install(EXPORT md4cConfig DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/md4c/)
--- a/md4c/md4c.c
+++ b/md4c/md4c.c
--- a/md4c/md4c.h
+++ b/md4c/md4c.h
@ -0,0 +1,388 @@
+/*
+ * MD4C: Markdown parser for C
+ * (http://github.com/mity/md4c)
+ *
+ * Copyright (c) 2016-2020 Martin Mitas
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef MD4C_MARKDOWN_H
+#define MD4C_MARKDOWN_H
+
+#ifdef __cplusplus
+    extern "C" {
+#endif
+
+#if defined MD4C_USE_UTF16
+    /* Magic to support UTF-16. Note that in order to use it, you have to define
+     * the macro MD4C_USE_UTF16 both when building MD4C as well as when
+     * including this header in your code. */
+    #ifdef _WIN32
+        #include <windows.h>
+        typedef WCHAR       MD_CHAR;
+    #else
+        #error MD4C_USE_UTF16 is only supported on Windows.
+    #endif
+#else
+    typedef char            MD_CHAR;
+#endif
+
+typedef unsigned MD_SIZE;
+typedef unsigned MD_OFFSET;
+
+
+/* Block represents a part of document hierarchy structure like a paragraph
+ * or list item.
+ */
+typedef enum MD_BLOCKTYPE {
+    /* <body>...</body> */
+    MD_BLOCK_DOC = 0,
+
+    /* <blockquote>...</blockquote> */
+    MD_BLOCK_QUOTE,
+
+    /* <ul>...</ul>
+     * Detail: Structure MD_BLOCK_UL_DETAIL. */
+    MD_BLOCK_UL,
+
+    /* <ol>...</ol>
+     * Detail: Structure MD_BLOCK_OL_DETAIL. */
+    MD_BLOCK_OL,
+
+    /* <li>...</li>
+     * Detail: Structure MD_BLOCK_LI_DETAIL. */
+    MD_BLOCK_LI,
+
+    /* <hr> */
+    MD_BLOCK_HR,
+
+    /* <h1>...</h1> (for levels up to 6)
+     * Detail: Structure MD_BLOCK_H_DETAIL. */
+    MD_BLOCK_H,
+
+    /* <pre><code>...</code></pre>
+     * Note the text lines within code blocks are terminated with '\n'
+     * instead of explicit MD_TEXT_BR. */
+    MD_BLOCK_CODE,
+
+    /* Raw HTML block. This itself does not correspond to any particular HTML
+     * tag. The contents of it _is_ raw HTML source intended to be put
+     * in verbatim form to the HTML output. */
+    MD_BLOCK_HTML,
+
+    /* <p>...</p> */
+    MD_BLOCK_P,
+
+    /* <table>...</table> and its contents.
+     * Detail: Structure MD_BLOCK_TD_DETAIL (used with MD_BLOCK_TH and MD_BLOCK_TD)
+     * Note all of these are used only if extension MD_FLAG_TABLES is enabled. */
+    MD_BLOCK_TABLE,
+    MD_BLOCK_THEAD,
+    MD_BLOCK_TBODY,
+    MD_BLOCK_TR,
+    MD_BLOCK_TH,
+    MD_BLOCK_TD
+} MD_BLOCKTYPE;
+
+/* Span represents an in-line piece of a document which should be rendered with
+ * the same font, color and other attributes. A sequence of spans forms a block
+ * like paragraph or list item. */
+typedef enum MD_SPANTYPE {
+    /* <em>...</em> */
+    MD_SPAN_EM,
+
+    /* <strong>...</strong> */
+    MD_SPAN_STRONG,
+
+    /* <a href="xxx">...</a>
+     * Detail: Structure MD_SPAN_A_DETAIL. */
+    MD_SPAN_A,
+
+    /* <img src="xxx" alt="...">
+     * Detail: Structure MD_SPAN_IMG_DETAIL.
+     * Note: Image text can contain nested spans and even nested images.
+     * If rendered into the ALT attribute of the HTML <IMG> tag, it is the
+     * responsibility of the renderer to deal with it.
+     */
+    MD_SPAN_IMG,
+
+    /* <code>...</code> */
+    MD_SPAN_CODE,
+
+    /* <del>...</del>
+     * Note: Recognized only when MD_FLAG_STRIKETHROUGH is enabled.
+     */
+    MD_SPAN_DEL,
+
+    /* For recognizing inline ($) and display ($$) equations
+     * Note: Recognized only when MD_FLAG_LATEXMATHSPANS is enabled.
+     */
+    MD_SPAN_LATEXMATH,
+    MD_SPAN_LATEXMATH_DISPLAY,
+
+    /* Wiki links
+     * Note: Recognized only when MD_FLAG_WIKILINKS is enabled.
+     */
+    MD_SPAN_WIKILINK,
+
+    /* <u>...</u>
+     * Note: Recognized only when MD_FLAG_UNDERLINE is enabled. */
+    MD_SPAN_U
+} MD_SPANTYPE;
+
+/* Text is the actual textual contents of span. */
+typedef enum MD_TEXTTYPE {
+    /* Normal text. */
+    MD_TEXT_NORMAL = 0,
+
+    /* NULL character. CommonMark requires replacing NULL character with
+     * the replacement char U+FFFD, so this allows caller to do that easily. */
+    MD_TEXT_NULLCHAR,
+
+    /* Line breaks.
+     * Note these are not sent from blocks with verbatim output (MD_BLOCK_CODE
+     * or MD_BLOCK_HTML). In such cases, '\n' is part of the text itself. */
+    MD_TEXT_BR,         /* <br> (hard break) */
+    MD_TEXT_SOFTBR,     /* '\n' in source text where it is not semantically meaningful (soft break) */
+
+    /* Entity.
+     * (a) Named entity, e.g. &nbsp; 
+     *     (Note MD4C does not have a list of known entities.
+     *     Anything matching the regexp /&[A-Za-z][A-Za-z0-9]{1,47};/ is
+     *     treated as a named entity.)
+     * (b) Numerical entity, e.g. &#1234;
+     * (c) Hexadecimal entity, e.g. &#x12AB;
+     *
+     * As MD4C is mostly encoding agnostic, the application gets the verbatim
+     * entity text in the MD_PARSER::text() callback. */
+    MD_TEXT_ENTITY,
+
+    /* Text in a code block (inside MD_BLOCK_CODE) or inlined code (`code`).
+     * If it is inside MD_BLOCK_CODE, it includes spaces for indentation and
+     * '\n' for new lines. MD_TEXT_BR and MD_TEXT_SOFTBR are not sent for this
+     * kind of text. */
+    MD_TEXT_CODE,
+
+    /* Text is raw HTML. If it is the contents of a raw HTML block (i.e. not
+     * inline raw HTML), then MD_TEXT_BR and MD_TEXT_SOFTBR are not used.
+     * The text contains verbatim '\n' for the new lines. */
+    MD_TEXT_HTML,
+
+    /* Text is inside an equation. This is processed the same way as inlined code
+     * spans (`code`). */
+    MD_TEXT_LATEXMATH
+} MD_TEXTTYPE;
+
+
+/* Alignment enumeration. */
+typedef enum MD_ALIGN {
+    MD_ALIGN_DEFAULT = 0,   /* When unspecified. */
+    MD_ALIGN_LEFT,
+    MD_ALIGN_CENTER,
+    MD_ALIGN_RIGHT
+} MD_ALIGN;
+
+
+/* String attribute.
+ *
+ * This wraps strings which are outside of a normal text flow and which are
+ * propagated within various detailed structures, but which still may contain
+ * string portions of different types like e.g. entities.
+ *
+ * So, for example, let's consider an image whose title attribute string
+ * is set to "foo &quot; bar". (Note the string size is 14.)
+ *
+ * Then the attribute MD_SPAN_IMG_DETAIL::title shall provide the following:
+ *  -- [0]: "foo "   (substr_types[0] == MD_TEXT_NORMAL; substr_offsets[0] == 0)
+ *  -- [1]: "&quot;" (substr_types[1] == MD_TEXT_ENTITY; substr_offsets[1] == 4)
+ *  -- [2]: " bar"   (substr_types[2] == MD_TEXT_NORMAL; substr_offsets[2] == 10)
+ *  -- [3]: (n/a)    (n/a                              ; substr_offsets[3] == 14)
+ *
+ * Note that these conditions are guaranteed:
+ *  -- substr_offsets[0] == 0
+ *  -- substr_offsets[LAST+1] == size
+ *  -- Only MD_TEXT_NORMAL, MD_TEXT_ENTITY, MD_TEXT_NULLCHAR substrings can appear.
+ */
+typedef struct MD_ATTRIBUTE {
+    const MD_CHAR* text;
+    MD_SIZE size;
+    const MD_TEXTTYPE* substr_types;
+    const MD_OFFSET* substr_offsets;
+} MD_ATTRIBUTE;
+
+
+/* Detailed info for MD_BLOCK_UL. */
+typedef struct MD_BLOCK_UL_DETAIL {
+    int is_tight;           /* Non-zero if tight list, zero if loose. */
+    MD_CHAR mark;           /* Item bullet character in Markdown source of the list, e.g. '-', '+', '*'. */
+} MD_BLOCK_UL_DETAIL;
+
+/* Detailed info for MD_BLOCK_OL. */
+typedef struct MD_BLOCK_OL_DETAIL {
+    unsigned start;         /* Start index of the ordered list. */
+    int is_tight;           /* Non-zero if tight list, zero if loose. */
+    MD_CHAR mark_delimiter; /* Character delimiting the item marks in Markdown source, e.g. '.' or ')' */
+} MD_BLOCK_OL_DETAIL;
+
+/* Detailed info for MD_BLOCK_LI. */
+typedef struct MD_BLOCK_LI_DETAIL {
+    int is_task;            /* Can be non-zero only with MD_FLAG_TASKLISTS */
+    MD_CHAR task_mark;      /* If is_task, then one of 'x', 'X' or ' '. Undefined otherwise. */
+    MD_OFFSET task_mark_offset;  /* If is_task, then offset in the input of the char between '[' and ']'. */
+} MD_BLOCK_LI_DETAIL;
+
+/* Detailed info for MD_BLOCK_H. */
+typedef struct MD_BLOCK_H_DETAIL {
+    unsigned level;         /* Header level (1 - 6) */
+} MD_BLOCK_H_DETAIL;
+
+/* Detailed info for MD_BLOCK_CODE. */
+typedef struct MD_BLOCK_CODE_DETAIL {
+    MD_ATTRIBUTE info;
+    MD_ATTRIBUTE lang;
+    MD_CHAR fence_char;     /* The character used for fenced code block; or zero for indented code block. */
+} MD_BLOCK_CODE_DETAIL;
+
+/* Detailed info for MD_BLOCK_TH and MD_BLOCK_TD. */
+typedef struct MD_BLOCK_TD_DETAIL {
+    MD_ALIGN align;
+} MD_BLOCK_TD_DETAIL;
+
+/* Detailed info for MD_SPAN_A. */
+typedef struct MD_SPAN_A_DETAIL {
+    MD_ATTRIBUTE href;
+    MD_ATTRIBUTE title;
+} MD_SPAN_A_DETAIL;
+
+/* Detailed info for MD_SPAN_IMG. */
+typedef struct MD_SPAN_IMG_DETAIL {
+    MD_ATTRIBUTE src;
+    MD_ATTRIBUTE title;
+} MD_SPAN_IMG_DETAIL;
+
+/* Detailed info for MD_SPAN_WIKILINK. */
+typedef struct MD_SPAN_WIKILINK {
+    MD_ATTRIBUTE target;
+} MD_SPAN_WIKILINK_DETAIL;
+
+/* Flags specifying extensions/deviations from CommonMark specification.
+ *
+ * By default (when MD_PARSER::flags == 0), we follow CommonMark specification.
+ * The following flags may allow some extensions or deviations from it.
+ */
+#define MD_FLAG_COLLAPSEWHITESPACE          0x0001  /* In MD_TEXT_NORMAL, collapse non-trivial whitespace into single ' ' */
+#define MD_FLAG_PERMISSIVEATXHEADERS        0x0002  /* Do not require space in ATX headers ( ###header ) */
+#define MD_FLAG_PERMISSIVEURLAUTOLINKS      0x0004  /* Recognize URLs as autolinks even without '<', '>' */
+#define MD_FLAG_PERMISSIVEEMAILAUTOLINKS    0x0008  /* Recognize e-mails as autolinks even without '<', '>' and 'mailto:' */
+#define MD_FLAG_NOINDENTEDCODEBLOCKS        0x0010  /* Disable indented code blocks. (Only fenced code works.) */
+#define MD_FLAG_NOHTMLBLOCKS                0x0020  /* Disable raw HTML blocks. */
+#define MD_FLAG_NOHTMLSPANS                 0x0040  /* Disable raw HTML (inline). */
+#define MD_FLAG_TABLES                      0x0100  /* Enable tables extension. */
+#define MD_FLAG_STRIKETHROUGH               0x0200  /* Enable strikethrough extension. */
+#define MD_FLAG_PERMISSIVEWWWAUTOLINKS      0x0400  /* Enable WWW autolinks (even without any scheme prefix, if they begin with 'www.') */
+#define MD_FLAG_TASKLISTS                   0x0800  /* Enable task list extension. */
+#define MD_FLAG_LATEXMATHSPANS              0x1000  /* Enable $ and $$ containing LaTeX equations. */
+#define MD_FLAG_WIKILINKS                   0x2000  /* Enable wiki links extension. */
+#define MD_FLAG_UNDERLINE                   0x4000  /* Enable underline extension (and disable '_' for normal emphasis). */
+
+#define MD_FLAG_PERMISSIVEAUTOLINKS         (MD_FLAG_PERMISSIVEEMAILAUTOLINKS | MD_FLAG_PERMISSIVEURLAUTOLINKS | MD_FLAG_PERMISSIVEWWWAUTOLINKS)
+#define MD_FLAG_NOHTML                      (MD_FLAG_NOHTMLBLOCKS | MD_FLAG_NOHTMLSPANS)
+
+/* Convenient sets of flags corresponding to well-known Markdown dialects.
+ *
+ * Note we may only support a subset of the features of the referred dialect.
+ * The constant just enables those extensions which bring us as close as
+ * possible given what features we implement.
+ *
+ * ABI compatibility note: Meaning of these can change in time as new
+ * extensions, bringing the dialect closer to the original, are implemented.
+ */
+#define MD_DIALECT_COMMONMARK               0
+#define MD_DIALECT_GITHUB                   (MD_FLAG_PERMISSIVEAUTOLINKS | MD_FLAG_TABLES | MD_FLAG_STRIKETHROUGH | MD_FLAG_TASKLISTS)
+
+/* Parser structure.
+ */
+typedef struct MD_PARSER {
+    /* Reserved. Set to zero.
+     */
+    unsigned abi_version;
+
+    /* Dialect options. Bitmask of MD_FLAG_xxxx values.
+     */
+    unsigned flags;
+
+    /* Caller-provided rendering callbacks.
+     *
+     * For some block/span types, more detailed information is provided in a
+     * type-specific structure pointed to by the argument 'detail'.
+     *
+     * The last argument of all callbacks, 'userdata', is just propagated from
+     * md_parse() and is available for any use by the application.
+     *
+     * Note any strings provided to the callbacks as their arguments or as
+     * members of any detail structure are generally not zero-terminated.
+     * Application has to take the respective size information into account.
+     *
+     * Callbacks may abort further parsing of the document by returning non-zero.
+     */
+    int (*enter_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
+    int (*leave_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
+
+    int (*enter_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
+    int (*leave_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
+
+    int (*text)(MD_TEXTTYPE /*type*/, const MD_CHAR* /*text*/, MD_SIZE /*size*/, void* /*userdata*/);
+
+    /* Debug callback. Optional (may be NULL).
+     *
+     * If provided and something goes wrong, this function gets called.
+     * This is intended for debugging and problem diagnosis for developers;
+     * it is not intended to provide any errors suitable for displaying to an
+     * end user.
+     */
+    void (*debug_log)(const char* /*msg*/, void* /*userdata*/);
+
+    /* Reserved. Set to NULL.
+     */
+    void (*syntax)(void);
+} MD_PARSER;
+
+
+/* For backward compatibility. Do not use in new code. */
+typedef MD_PARSER MD_RENDERER;
+
+
+/* Parse the Markdown document stored in the string 'text' of size 'size'.
+ * The parser structure provides callbacks to be called during the parsing so
+ * the caller can render the document on the screen or convert the Markdown
+ * to another format.
+ *
+ * Zero is returned on success. If a runtime error occurs (e.g. a memory
+ * allocation fails), -1 is returned. If the processing is aborted due to any
+ * callback returning non-zero, the return value of the callback is returned.
+ */
+int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);
+
+
+#ifdef __cplusplus
+    }  /* extern "C" { */
+#endif
+
+#endif  /* MD4C_MARKDOWN_H */
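Reviewer note on the MD_ATTRIBUTE comment in md4c.h above: the substring arrays are walked by index until substr_offsets[i] reaches size, and the length of substring i is substr_offsets[i+1] - substr_offsets[i]. The sketch below illustrates that traversal with the "foo &quot; bar" example from the header. To stay self-contained it uses local stand-in types (DEMO_TEXTTYPE, DEMO_ATTRIBUTE) and hypothetical helper names. None of these are part of MD4C, and real code would include md4c.h and use MD_ATTRIBUTE directly.

```c
/* Stand-ins mirroring the MD_ATTRIBUTE layout described in md4c.h.
 * These DEMO_* names are hypothetical and exist only for this sketch. */
typedef enum {
    DEMO_TEXT_NORMAL = 0,
    DEMO_TEXT_NULLCHAR,
    DEMO_TEXT_ENTITY
} DEMO_TEXTTYPE;

typedef struct {
    const char* text;
    unsigned size;
    const DEMO_TEXTTYPE* substr_types;
    const unsigned* substr_offsets;
} DEMO_ATTRIBUTE;

/* Walk all substrings of the attribute. Returns how many substrings it has
 * and stores the number of entity substrings in *n_entities. The loop
 * condition relies on the guarantee substr_offsets[LAST+1] == size. */
unsigned
demo_count_substrings(const DEMO_ATTRIBUTE* attr, unsigned* n_entities)
{
    unsigned i;

    *n_entities = 0;
    for(i = 0; attr->substr_offsets[i] < attr->size; i++) {
        if(attr->substr_types[i] == DEMO_TEXT_ENTITY)
            (*n_entities)++;
    }
    return i;
}

/* Size of substring i: the distance to the next offset. The text is not
 * zero-terminated per substring, so this size must always be used. */
unsigned
demo_substring_size(const DEMO_ATTRIBUTE* attr, unsigned i)
{
    return attr->substr_offsets[i+1] - attr->substr_offsets[i];
}
```

For the header's example (offsets 0, 4, 10, 14 and size 14) this yields three substrings, one of them an entity of size 6. The same loop shape appears in render_attribute() in md2html/render_html.c.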
--- a/md4c/md4c.pc.in
+++ b/md4c/md4c.pc.in
@ -0,0 +1,12 @@
+prefix=@CMAKE_INSTALL_PREFIX@
+exec_prefix=@CMAKE_INSTALL_PREFIX@
+libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@
+includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@
+
+Name: @PROJECT_NAME@
+Description: @PROJECT_DESCRIPTION@
+Version: @PROJECT_VERSION@
+
+Requires:
+Libs: -L${libdir} -lmd4c
+Cflags: -I${includedir}
--- a/scripts/build_folding_map.py
+++ b/scripts/build_folding_map.py
@ -0,0 +1,118 @@
+#!/usr/bin/env python3
+
+import os
+import sys
+import textwrap
+
+
+self_path = os.path.dirname(os.path.realpath(__file__))
+f = open(self_path + "/unicode/CaseFolding.txt", "r")
+
+status_list = [ "C", "F" ]
+
+folding_list = [ dict(), dict(), dict() ]
+
+# Filter the foldings for "full" folding.
+for line in f:
+    comment_off = line.find("#")
+    if comment_off >= 0:
+        line = line[:comment_off]
+    line = line.strip()
+    if not line:
+        continue
+
+    raw_codepoint, status, raw_mapping, ignored_tail = line.split(";", 3)
+    if not status.strip() in status_list:
+        continue
+    codepoint = int(raw_codepoint.strip(), 16)
+    mapping = [int(it, 16) for it in raw_mapping.strip().split(" ")]
+    mapping_len = len(mapping)
+
+    if mapping_len in range(1, 4):
+        folding_list[mapping_len-1][codepoint] = mapping
+    else:
+        assert(False)
+f.close()
+
+
+# Assuming that the codepoints at index0 ... index-1 already form a range,
+# check that the codepoint at index is compatible with it too.
+#
+# We are able to handle ranges which:
+#
+# (1) either form a consecutive sequence of codepoints which maps to another
+#     consecutive sequence of codepoints;
+#
+# (2) or form a sequence of codepoints with step 2 where each codepoint
+#     CP is mapped to the next codepoint CP+1
+#     (e.g. 0x1234 -> 0x1235; 0x1236 -> 0x1237; ...).
+#
+# (If the mappings have multiple codepoints, only the 1st mapped codepoint is
+# considered and all the other ones have to be the same for the whole range.)
+def is_range_compatible(folding, codepoint_list, index0, index):
+    N = index - index0
+    codepoint0 = codepoint_list[index0]
+    codepoint1 = codepoint_list[index0+1]
+    codepointN = codepoint_list[index]
+    mapping0 = folding[codepoint0]
+    mapping1 = folding[codepoint1]
+    mappingN = folding[codepointN]
+
+    # Check the range type (1):
+    if codepoint1 - codepoint0 == 1 and codepointN - codepoint0 == N                \
+            and mapping1[0] - mapping0[0] == 1 and mapping1[1:] == mapping0[1:]     \
+            and mappingN[0] - mapping0[0] == N and mappingN[1:] == mapping0[1:]:
+        return True
+
+    # Check the range type (2):
+    if codepoint1 - codepoint0 == 2 and codepointN - codepoint0 == 2 * N            \
+            and mapping0[0] - codepoint0 == 1                                       \
+            and mapping1[0] - codepoint1 == 1 and mapping1[1:] == mapping0[1:]      \
+            and mappingN[0] - codepointN == 1 and mappingN[1:] == mapping0[1:]:
+        return True
+
+    return False
+
+
+def mapping_str(list, mapping):
+    return ",".join("0x{:04x}".format(x) for x in mapping)
+
+for mapping_len in range(1, 4):
+    folding = folding_list[mapping_len-1]
+    codepoint_list = list(folding)
+
+    index0 = 0
+    count = len(folding)
+
+    records = list()
+    data_records = list()
+
+    while index0 < count:
+        index1 = index0 + 1
+        while index1 < count and is_range_compatible(folding, codepoint_list, index0, index1):
+            index1 += 1
+
+        if index1 - index0 > 2:
+            # Range of codepoints
+            records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
+            data_records.append(mapping_str(data_records, folding[codepoint_list[index0]]))
+            data_records.append(mapping_str(data_records, folding[codepoint_list[index1-1]]))
+        else:
+            # Single codepoint
+            records.append("S(0x{:04x})".format(codepoint_list[index0]))
+            data_records.append(mapping_str(data_records, folding[codepoint_list[index0]]))
+
+        index0 = index1
+
+    sys.stdout.write("static const unsigned FOLD_MAP_{}[] = {{\n".format(mapping_len))
+    sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
+                        initial_indent = "    ", subsequent_indent="    ")))
+    sys.stdout.write("\n};\n")
+
+    sys.stdout.write("static const unsigned FOLD_MAP_{}_DATA[] = {{\n".format(mapping_len))
+    sys.stdout.write("\n".join(textwrap.wrap(", ".join(data_records), 110,
+                        initial_indent = "    ", subsequent_indent="    ")))
+    sys.stdout.write("\n};\n")
+
+
+
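The two range shapes described in the comment above `is_range_compatible()` can be illustrated on a toy folding table. This is only a minimal sketch: `range_kind` is a hypothetical helper that classifies a whole run at once, whereas the script extends a run index by index. The two sample tables are typed in by hand rather than read from `CaseFolding.txt`.

```python
def range_kind(folding, codepoints):
    """Return 1 or 2 for the two range shapes, or None if the run fits neither.

    `folding` maps a codepoint to its list of folded codepoints; `codepoints`
    is the sorted run to classify. Hypothetical helper for illustration only.
    """
    first = codepoints[0]
    base = folding[first]

    def same_tail(cp):
        # Every mapping in a range must share all but its first codepoint.
        return folding[cp][1:] == base[1:]

    # Type (1): consecutive codepoints mapped to consecutive codepoints.
    if all(cp == first + i and folding[cp][0] == base[0] + i and same_tail(cp)
           for i, cp in enumerate(codepoints)):
        return 1

    # Type (2): every second codepoint, each mapped to its successor.
    if all(cp == first + 2 * i and folding[cp][0] == cp + 1 and same_tail(cp)
           for i, cp in enumerate(codepoints)):
        return 2

    return None


# 'A'..'Z' fold to 'a'..'z': a type (1) range.
ascii_upper = {cp: [cp + 0x20] for cp in range(0x41, 0x5B)}
# U+0100, U+0102, U+0104 fold to the codepoint right after them: type (2).
latin_ext = {cp: [cp + 1] for cp in (0x0100, 0x0102, 0x0104)}

print(range_kind(ascii_upper, sorted(ascii_upper)))
print(range_kind(latin_ext, sorted(latin_ext)))
```

Only the first and last mapping of a range are emitted into the `_DATA` array, which is why both shapes must be fully determined by their endpoints.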
--- a/scripts/build_punct_map.py
+++ b/scripts/build_punct_map.py
@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+
+import os
+import sys
+import textwrap
+
+
+self_path = os.path.dirname(os.path.realpath(__file__))
+f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r")
+
+codepoint_list = []
+category_list = [ "Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps" ]
+
+# Filter codepoints falling in the right category:
+for line in f:
+    comment_off = line.find("#")
+    if comment_off >= 0:
+        line = line[:comment_off]
+    line = line.strip()
+    if not line:
+        continue
+
+    char_range, category = line.split(";")
+    char_range = char_range.strip()
+    category = category.strip()
+
+    if not category in category_list:
+        continue
+
+    delim_off = char_range.find("..")
+    if delim_off >= 0:
+        codepoint0 = int(char_range[:delim_off], 16)
+        codepoint1 = int(char_range[delim_off+2:], 16)
+        for codepoint in range(codepoint0, codepoint1 + 1):
+            codepoint_list.append(codepoint)
+    else:
+        codepoint = int(char_range, 16)
+        codepoint_list.append(codepoint)
+f.close()
+
+
+codepoint_list.sort()
+
+
+index0 = 0
+count = len(codepoint_list)
+
+records = list()
+while index0 < count:
+    index1 = index0 + 1
+    while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1:
+        index1 += 1
+
+    if index1 - index0 > 1:
+        # Range of codepoints
+        records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
+    else:
+        # Single codepoint
+        records.append("S(0x{:04x})".format(codepoint_list[index0]))
+
+    index0 = index1
+
+sys.stdout.write("static const unsigned PUNCT_MAP[] = {\n")
+sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
+                    initial_indent = "    ", subsequent_indent="    ")))
+sys.stdout.write("\n};\n\n")
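The grouping loop at the end of `build_punct_map.py` can be tried in isolation. The sketch below repeats the same logic as a function; the name `compress` and the sample input are assumptions made for illustration, and nothing is read from `DerivedGeneralCategory.txt`.

```python
def compress(codepoints):
    """Collapse sorted codepoints into R(lo,hi) and S(cp) record strings."""
    records = []
    i = 0
    while i < len(codepoints):
        j = i + 1
        # Extend the run while the next codepoint is exactly one higher.
        while j < len(codepoints) and codepoints[j] == codepoints[j - 1] + 1:
            j += 1
        if j - i > 1:
            records.append("R(0x{:04x},0x{:04x})".format(codepoints[i], codepoints[j - 1]))
        else:
            records.append("S(0x{:04x})".format(codepoints[i]))
        i = j
    return records


# '!', '"', '#' are consecutive; '%' and '*' stand alone.
print(compress([0x21, 0x22, 0x23, 0x25, 0x2A]))
# ['R(0x0021,0x0023)', 'S(0x0025)', 'S(0x002a)']
```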
--- a/scripts/build_whitespace_map.py
+++ b/scripts/build_whitespace_map.py
@ -0,0 +1,66 @@
+#!/usr/bin/env python3
+
+import os
+import sys
+import textwrap
+
+
+self_path = os.path.dirname(os.path.realpath(__file__))
+f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r")
+
+codepoint_list = []
+category_list = [ "Zs" ]
+
+# Filter codepoints falling in the right category:
+for line in f:
+    comment_off = line.find("#")
+    if comment_off >= 0:
+        line = line[:comment_off]
+    line = line.strip()
+    if not line:
+        continue
+
+    char_range, category = line.split(";")
+    char_range = char_range.strip()
+    category = category.strip()
+
+    if not category in category_list:
+        continue
+
+    delim_off = char_range.find("..")
+    if delim_off >= 0:
+        codepoint0 = int(char_range[:delim_off], 16)
+        codepoint1 = int(char_range[delim_off+2:], 16)
+        for codepoint in range(codepoint0, codepoint1 + 1):
+            codepoint_list.append(codepoint)
+    else:
+        codepoint = int(char_range, 16)
+        codepoint_list.append(codepoint)
+f.close()
+
+
+codepoint_list.sort()
+
+
+index0 = 0
+count = len(codepoint_list)
+
+records = list()
+while index0 < count:
+    index1 = index0 + 1
+    while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1:
+        index1 += 1
+
+    if index1 - index0 > 1:
+        # Range of codepoints
+        records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
+    else:
+        # Single codepoint
+        records.append("S(0x{:04x})".format(codepoint_list[index0]))
+
+    index0 = index1
+
+sys.stdout.write("static const unsigned WHITESPACE_MAP[] = {\n")
+sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
+                    initial_indent = "    ", subsequent_indent="    ")))
+sys.stdout.write("\n};\n\n")
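A sorted list of ranges and single codepoints, as emitted above, lends itself to a binary search. The following is a sketch of that idea only; it is not taken from MD4C's C sources, and the names `ZS_RANGES` and `is_zs` are hypothetical. The table lists the `Zs` codepoints of recent Unicode versions, typed in by hand.

```python
import bisect

# Zs codepoints as (first, last) pairs, i.e. what R(...) and S(...) records
# describe. Typed in by hand for this sketch, not generated by the script.
ZS_RANGES = [
    (0x0020, 0x0020), (0x00A0, 0x00A0), (0x1680, 0x1680),
    (0x2000, 0x200A), (0x202F, 0x202F), (0x205F, 0x205F),
    (0x3000, 0x3000),
]
ZS_STARTS = [lo for lo, _ in ZS_RANGES]


def is_zs(codepoint):
    """Binary-search the sorted ranges for the one that could hold codepoint."""
    idx = bisect.bisect_right(ZS_STARTS, codepoint) - 1
    return idx >= 0 and codepoint <= ZS_RANGES[idx][1]


print(is_zs(0x2003))   # EM SPACE, inside the U+2000..U+200A range
print(is_zs(0x200B))   # ZERO WIDTH SPACE, just past that range
```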
--- a/scripts/coverity.sh
+++ b/scripts/coverity.sh
@ -0,0 +1,70 @@
+#!/bin/sh
+#
+# This script attempts to build the project via the cov-build utility and to
+# prepare a package for uploading to the Coverity Scan service.
+#
+# (See http://scan.coverity.com for more info.)
+
+set -e
+
+# Check presence of coverity static analyzer.
+if ! which cov-build; then
+    echo "Utility cov-build not found in PATH."
+    exit 1
+fi
+
+# Choose a build system (ninja or GNU make).
+if which ninja; then
+    BUILD_TOOL=ninja
+    GENERATOR=Ninja
+elif which make; then
+    BUILD_TOOL=make
+    GENERATOR="MSYS Makefiles"
+else
+    echo "No suitable build system found."
+    exit 1
+fi
+
+# Choose a zip tool.
+if which 7za; then
+    MKZIP="7za a -r -mx9"
+elif which 7z; then
+    MKZIP="7z a -r -mx9"
+elif which zip; then
+    MKZIP="zip -r"
+else
+    echo "No suitable zip utility found"
+    exit 1
+fi
+
+# Change dir to project root.
+cd `dirname "$0"`/..
+
+CWD=`pwd`
+ROOT_DIR="$CWD"
+BUILD_DIR="$CWD/coverity"
+OUTPUT="$CWD/cov-int.zip"
+
+# Sanity checks.
+if [ ! -x "$ROOT_DIR/scripts/coverity.sh" ]; then
+    echo "There is some path mismatch."
+    exit 1
+fi
+if [ -e "$BUILD_DIR" ]; then
+    echo "Path $BUILD_DIR already exists. Delete it and retry."
+    exit 1
+fi
+if [ -e "$OUTPUT" ]; then
+    echo "Path $OUTPUT already exists. Delete it and retry."
+    exit 1
+fi
+
+# Build the project with the Coverity analysis enabled.
+mkdir -p "$BUILD_DIR"
+cd "$BUILD_DIR"
+cmake -G "$GENERATOR" "$ROOT_DIR"
+cov-build --dir cov-int "$BUILD_TOOL"
+$MKZIP "$OUTPUT" "cov-int"
+cd "$ROOT_DIR"
+rm -rf "$BUILD_DIR"
+
--- a/scripts/run-tests.sh
+++ b/scripts/run-tests.sh
@ -0,0 +1,75 @@
+#!/bin/sh
+#
+# Run this script from the build directory.
+
+#set -e
+
+SELF_DIR=`dirname $0`
+PROJECT_DIR="$SELF_DIR/.."
+TEST_DIR="$PROJECT_DIR/test"
+
+
+PROGRAM="md2html/md2html"
+if [ ! -x "$PROGRAM" ]; then
+    echo "Cannot find the $PROGRAM." >&2
+    echo "You have to run this script from the build directory." >&2
+    exit 1
+fi
+
+if which py >>/dev/null 2>&1; then
+    PYTHON=py
+elif which python3 >>/dev/null 2>&1; then
+    PYTHON=python3
+elif which python >>/dev/null 2>&1; then
+    if [ `python --version | awk '{print $2}' | cut -d. -f1` -ge 3 ]; then
+        PYTHON=python
+    fi
+fi
+
+echo
+echo "CommonMark specification:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/spec.txt" -p "$PROGRAM"
+
+echo
+echo "Code coverage & regressions:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/coverage.txt" -p "$PROGRAM"
+
+echo
+echo "Permissive e-mail autolinks extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-email-autolinks.txt" -p "$PROGRAM --fpermissive-email-autolinks"
+
+echo
+echo "Permissive URL autolinks extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-url-autolinks.txt" -p "$PROGRAM --fpermissive-url-autolinks"
+
+echo
+echo "WWW autolinks extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-www-autolinks.txt" -p "$PROGRAM --fpermissive-www-autolinks"
+
+echo
+echo "Tables extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tables.txt" -p "$PROGRAM --ftables"
+
+echo
+echo "Strikethrough extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/strikethrough.txt" -p "$PROGRAM --fstrikethrough"
+
+echo
+echo "Task lists extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tasklists.txt" -p "$PROGRAM --ftasklists"
+
+echo
+echo "LaTeX extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/latex-math.txt" -p "$PROGRAM --flatex-math"
+
+echo
+echo "Wiki links extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/wiki-links.txt" -p "$PROGRAM --fwiki-links --ftables"
+
+echo
+echo "Underline extension:"
+$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/underline.txt" -p "$PROGRAM --funderline"
+
+echo
+echo "Pathological input:"
+$PYTHON "$TEST_DIR/pathological_tests.py" -p "$PROGRAM"
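The invocations in `run-tests.sh` all follow one pattern: a spec file plus the `md2html` flags that enable the extension under test. As a sketch only, the same information can be written as a table. The names `SUITES` and `command_for` are hypothetical; the file names and flags are copied from the script above.

```python
# (spec file, extra md2html flags), copied from the shell script above.
SUITES = [
    ("spec.txt", ""),
    ("coverage.txt", ""),
    ("permissive-email-autolinks.txt", "--fpermissive-email-autolinks"),
    ("permissive-url-autolinks.txt", "--fpermissive-url-autolinks"),
    ("permissive-www-autolinks.txt", "--fpermissive-www-autolinks"),
    ("tables.txt", "--ftables"),
    ("strikethrough.txt", "--fstrikethrough"),
    ("tasklists.txt", "--ftasklists"),
    ("latex-math.txt", "--flatex-math"),
    ("wiki-links.txt", "--fwiki-links --ftables"),
    ("underline.txt", "--funderline"),
]


def command_for(python, test_dir, program, spec, flags):
    """Build the argv list for one spec_tests.py run."""
    prog = (program + " " + flags).strip()
    return [python, test_dir + "/spec_tests.py",
            "-s", test_dir + "/" + spec, "-p", prog]


for spec, flags in SUITES:
    print(" ".join(command_for("python3", "../test", "md2html/md2html", spec, flags)))
```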
--- a/scripts/unicode/CaseFolding.txt
+++ b/scripts/unicode/CaseFolding.txt
--- a/scripts/unicode/DerivedGeneralCategory.txt
+++ b/scripts/unicode/DerivedGeneralCategory.txt
--- a/test/LICENSE
+++ b/test/LICENSE
@ -0,0 +1,64 @@
+The CommonMark spec (spec.txt) and DTD (CommonMark.dtd) are
+
+Copyright (C) 2014-16 John MacFarlane
+
+Released under the Creative Commons CC-BY-SA 4.0 license:
+<http://creativecommons.org/licenses/by-sa/4.0/>.
+
+---
+
+The test software in test/ and the programs in tools/ are
+
+Copyright (c) 2014, John MacFarlane
+
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+
+    * Redistributions in binary form must reproduce the above
+      copyright notice, this list of conditions and the following
+      disclaimer in the documentation and/or other materials provided
+      with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+---
+
+The normalization code in runtests.py was derived from the
+markdowntest project, Copyright 2013 Karl Dubost:
+
+The MIT License (MIT)
+
+Copyright (c) 2013 Karl Dubost
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--- a/test/cmark.py
+++ b/test/cmark.py
@ -0,0 +1,40 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+from ctypes import CDLL, c_char_p, c_long
+from subprocess import *
+import platform
+import os
+
+def pipe_through_prog(prog, text):
+    p1 = Popen(prog.split(), stdout=PIPE, stdin=PIPE, stderr=PIPE)
+    [result, err] = p1.communicate(input=text.encode('utf-8'))
+    return [p1.returncode, result.decode('utf-8'), err]
+
+def use_library(lib, text):
+    textbytes = text.encode('utf-8')
+    textlen = len(textbytes)
+    return [0, lib(textbytes, textlen, 0).decode('utf-8'), '']
+
+class CMark:
+    def __init__(self, prog=None, library_dir=None):
+        self.prog = prog
+        if prog:
+            self.to_html = lambda x: pipe_through_prog(prog, x)
+        else:
+            sysname = platform.system()
+            if sysname == 'Darwin':
+                libname = "libcmark.dylib"
+            elif sysname == 'Windows':
+                libname = "cmark.dll"
+            else:
+                libname = "libcmark.so"
+            if library_dir:
+                libpath = os.path.join(library_dir, libname)
+            else:
+                libpath = os.path.join("build", "src", libname)
+            cmark = CDLL(libpath)
+            markdown = cmark.cmark_markdown_to_html
+            markdown.restype = c_char_p
+            markdown.argtypes = [c_char_p, c_long]
+            self.to_html = lambda x: use_library(markdown, x)
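The `pipe_through_prog()` helper above is the path the test scripts use when given a program such as `md2html`. A minimal, self-contained sketch of the same piping technique follows. `pipe_through_argv` is a hypothetical variant that takes an argument list instead of splitting a string, and a child Python process stands in for `md2html` so the example runs without a build.

```python
import sys
from subprocess import PIPE, Popen


def pipe_through_argv(argv, text):
    """Feed text to a child process on stdin and collect what it prints."""
    proc = Popen(argv, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate(input=text.encode("utf-8"))
    return [proc.returncode, out.decode("utf-8"), err.decode("utf-8")]


# Stand-in for md2html: a child Python process that upper-cases its input.
UPPERCASE = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]

code, out, err = pipe_through_argv(UPPERCASE, "hello")
print(code, repr(out))
```

Passing a list avoids `prog.split()`, so a program path containing spaces would survive; that is a design choice of this sketch, not of the original helper.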
--- a/test/coverage.txt
+++ b/test/coverage.txt
@ -0,0 +1,464 @@
+
+# Coverage
+
+This file is just a collection of unit tests not covered elsewhere.
+
+Most notably, regression tests, tests improving code coverage and other useful
+things may be dropped here.
+
+(However, any tests requiring an additional command line option, like enabling
+an extension, must be included in their respective files.)
+
+
+## GitHub Issues
+
+### [Issue 2](https://github.com/mity/md4c/issues/2)
+
+Raw HTML block:
+
+```````````````````````````````` example
+<gi att1=tok1 att2=tok2>
+.
+<gi att1=tok1 att2=tok2>
+````````````````````````````````
+
+Inline:
+
+```````````````````````````````` example
+foo <gi att1=tok1 att2=tok2> bar
+.
+<p>foo <gi att1=tok1 att2=tok2> bar</p>
+````````````````````````````````
+
+Inline with a line break:
+
+```````````````````````````````` example
+foo <gi att1=tok1
+att2=tok2> bar
+.
+<p>foo <gi att1=tok1
+att2=tok2> bar</p>
+````````````````````````````````
+
+
+### [Issue 4](https://github.com/mity/md4c/issues/4)
+
+```````````````````````````````` example
+![alt text with *entity* &copy;](img.png 'title')
+.
+<p><img src="img.png" alt="alt text with entity ©" title="title"></p>
+````````````````````````````````
+
+
+### [Issue 9](https://github.com/mity/md4c/issues/9)
+
+```````````````````````````````` example
+> [foo
+> bar]: /url
+>
+> [foo bar]
+.
+<blockquote>
+<p><a href="/url">foo
+bar</a></p>
+</blockquote>
+````````````````````````````````
+
+
+### [Issue 10](https://github.com/mity/md4c/issues/10)
+
+```````````````````````````````` example
+[x]:
+x
+- <?
+
+  x
+.
+<ul>
+<li><?
+
+x
+</li>
+</ul>
+````````````````````````````````
+
+
+### [Issue 11](https://github.com/mity/md4c/issues/11)
+
+```````````````````````````````` example
+x [link](/url "foo &ndash; bar") x
+.
+<p>x <a href="/url" title="foo – bar">link</a> x</p>
+````````````````````````````````
+
+
+### [Issue 14](https://github.com/mity/md4c/issues/14)
+
+```````````````````````````````` example
+a***b* c*
+.
+<p>a*<em><em>b</em> c</em></p>
+````````````````````````````````
+
+
+### [Issue 15](https://github.com/mity/md4c/issues/15)
+
+```````````````````````````````` example
+***b* c*
+.
+<p>*<em><em>b</em> c</em></p>
+````````````````````````````````
+
+
+### [Issue 21](https://github.com/mity/md4c/issues/21)
+
+```````````````````````````````` example
+a*b**c*
+.
+<p>a<em>b**c</em></p>
+````````````````````````````````
+
+
+### [Issue 33](https://github.com/mity/md4c/issues/33)
+
+```````````````````````````````` example
+```&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;
+.
+<pre><code class="language-&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;"></code></pre>
+````````````````````````````````
+
+
+### [Issue 36](https://github.com/mity/md4c/issues/36)
+
+```````````````````````````````` example
+__x_ _x___
+.
+<p><em><em>x</em> <em>x</em></em>_</p>
+````````````````````````````````
+
+
+### [Issue 39](https://github.com/mity/md4c/issues/39)
+
+```````````````````````````````` example
+[\\]: x
+.
+````````````````````````````````
+
+
+### [Issue 40](https://github.com/mity/md4c/issues/40)
+
+```````````````````````````````` example
+[x](url
+'title'
+)x
+.
+<p><a href="url" title="title">x</a>x</p>
+````````````````````````````````
+
+
+### [Issue 65](https://github.com/mity/md4c/issues/65)
+
+```````````````````````````````` example
+`
+.
+<p>`</p>
+````````````````````````````````
+
+
+### [Issue 74](https://github.com/mity/md4c/issues/74)
+
+```````````````````````````````` example
+[f]:
+-
+    xx
+-
+.
+<pre><code>xx
+</code></pre>
+<ul>
+<li></li>
+</ul>
+````````````````````````````````
+
+
+### [Issue 78](https://github.com/mity/md4c/issues/78)
+
+```````````````````````````````` example
+[SS ẞ]: /url
+[ẞ SS]
+.
+<p><a href="/url">ẞ SS</a></p>
+````````````````````````````````
+
+
+### [Issue 83](https://github.com/mity/md4c/issues/83)
+
+```````````````````````````````` example
+foo
+>
+.
+<p>foo</p>
+<blockquote>
+</blockquote>
+
+````````````````````````````````
+
+
+### [Issue 95](https://github.com/mity/md4c/issues/95)
+
+```````````````````````````````` example
+. foo
+.
+<p>. foo</p>
+````````````````````````````````
+
+
+### [Issue 96](https://github.com/mity/md4c/issues/96)
+
+```````````````````````````````` example
+[ab]: /foo
+[a] [ab] [abc]
+.
+<p>[a] <a href="/foo">ab</a> [abc]</p>
+````````````````````````````````
+
+```````````````````````````````` example
+[a b]: /foo
+[a   b]
+.
+<p><a href="/foo">a   b</a></p>
+````````````````````````````````
+
+
+### [Issue 97](https://github.com/mity/md4c/issues/97)
+
+```````````````````````````````` example
+*a **b c* d**
+.
+<p><em>a <em><em>b c</em> d</em></em></p>
+
+````````````````````````````````
+
+
+### [Issue 100](https://github.com/mity/md4c/issues/100)
+
+```````````````````````````````` example
+<foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123>
+.
+<p><a href="mailto:foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123">foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123</a></p>
+````````````````````````````````
+
+```````````````````````````````` example
+<foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123>
+.
+<p>&lt;foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123&gt;</p>
+````````````````````````````````
+(Note the `x` here, which pushes the label over the maximum allowed length.)
+
+
+### [Issue 107](https://github.com/mity/md4c/issues/107)
+
+```````````````````````````````` example
+***foo *bar baz***
+.
+<p>*<strong>foo <em>bar baz</em></strong></p>
+
+````````````````````````````````
+
+
+## Code coverage
+
+### `md_is_unicode_whitespace__()`
+
+Unicode whitespace (here U+2000) forms a word boundary, so these cannot be
+resolved as an emphasis span because there is no closer mark.
+
+```````````````````````````````` example
+*foo *bar
+.
+<p>*foo *bar</p>
+````````````````````````````````
+
+
+### `md_is_unicode_punct__()`
+
+Ditto for Unicode punctuation (here U+00A1).
+
+```````````````````````````````` example
+*foo¡*bar
+.
+<p>*foo¡*bar</p>
+````````````````````````````````
+
+
+### `md_get_unicode_fold_info()`
+
+```````````````````````````````` example
+[Příliš žluťoučký kůň úpěl ďábelské ódy.]
+
+[PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY.]: /url
+.
+<p><a href="/url">Příliš žluťoučký kůň úpěl ďábelské ódy.</a></p>
+````````````````````````````````
+
+
+### `md_decode_utf8__()` and `md_decode_utf8_before__()`
+
+```````````````````````````````` example
+á*Á (U+00E1, i.e. two byte UTF-8 sequence)
+ *  (U+2000, i.e. three byte UTF-8 sequence)
+.
+<p>á*Á (U+00E1, i.e. two byte UTF-8 sequence)
+ *  (U+2000, i.e. three byte UTF-8 sequence)</p>
+````````````````````````````````
+
+
+### `md_is_link_destination_A()`
+
+```````````````````````````````` example
+[link](</url\.with\.escape>)
+.
+<p><a href="/url.with.escape">link</a></p>
+````````````````````````````````
+
+
+### `md_link_label_eq()`
+
+```````````````````````````````` example
+[foo bar]
+
+[foo bar]: /url
+.
+<p><a href="/url">foo bar</a></p>
+````````````````````````````````
+
+
+### `md_is_inline_link_spec()`
+
+```````````````````````````````` example
+> [link](/url 'foo
+> bar')
+.
+<blockquote>
+<p><a href="/url" title="foo
+bar">link</a></p>
+</blockquote>
+````````````````````````````````
+
+
+### `md_build_ref_def_hashtable()`
+
+All link labels in the following example have the same FNV1a hash (after
+normalization of the label, which means after converting to a vector of Unicode
+codepoints and lowercase folding).
+
+So the example triggers quite complex code paths which are not otherwise easily
+tested.
+
+```````````````````````````````` example
+[foo]: /foo
+[qnptgbh]: /qnptgbh
+[abgbrwcv]: /abgbrwcv
+[abgbrwcv]: /abgbrwcv2
+[abgbrwcv]: /abgbrwcv3
+[abgbrwcv]: /abgbrwcv4
+[alqadfgn]: /alqadfgn
+
+[foo]
+[qnptgbh]
+[abgbrwcv]
+[alqadfgn]
+[axgydtdu]
+.
+<p><a href="/foo">foo</a>
+<a href="/qnptgbh">qnptgbh</a>
+<a href="/abgbrwcv">abgbrwcv</a>
+<a href="/alqadfgn">alqadfgn</a>
+[axgydtdu]</p>
+````````````````````````````````
+
+For the sake of completeness, the following C program was used to find the hash
+collisions by brute force:
+
+~~~
+
+#include <stdio.h>
+#include <string.h>
+
+
+static unsigned etalon;
+
+
+
+#define MD_FNV1A_BASE       2166136261
+#define MD_FNV1A_PRIME      16777619
+
+static inline unsigned
+fnv1a(unsigned base, const void* data, size_t n)
+{
+    const unsigned char* buf = (const unsigned char*) data;
+    unsigned hash = base;
+    size_t i;
+
+    for(i = 0; i < n; i++) {
+        hash ^= buf[i];
+        hash *= MD_FNV1A_PRIME;
+    }
+
+    return hash;
+}
+
+
+static unsigned
+unicode_hash(const char* data, size_t n)
+{
+    unsigned value;
+    unsigned hash = MD_FNV1A_BASE;
+    size_t i;
+
+    for(i = 0; i < n; i++) {
+        value = data[i];
+        hash = fnv1a(hash, &value, sizeof(unsigned));
+    }
+
+    return hash;
+}
+
+
+static void
+recurse(char* buffer, size_t off, size_t len)
+{
+    int ch;
+
+    if(off < len - 1) {
+        for(ch = 'a'; ch <= 'z'; ch++) {
+            buffer[off] = ch;
+            recurse(buffer, off+1, len);
+        }
+    } else {
+        for(ch = 'a'; ch <= 'z'; ch++) {
+            buffer[off] = ch;
+            if(unicode_hash(buffer, len) == etalon) {
+                printf("Dup: %.*s\n", (int)len, buffer);
+            }
+        }
+    }
+}
+
+int
+main(int argc, char** argv)
+{
+    char buffer[32];
+    int len;
+
+    if(argc < 2)
+        etalon = unicode_hash("foo", 3);
+    else
+        etalon = unicode_hash(argv[1], strlen(argv[1]));
+
+    for(len = 1; len <= sizeof(buffer); len++)
+        recurse(buffer, 0, len);
+
+    return 0;
+}
+~~~
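For readers who would rather check individual labels than rerun the brute-force search, the hash from the C program above can be sketched in Python. This assumes that `unsigned` is 32 bits wide and stored little-endian, as on common desktop platforms, so that each character contributes four bytes to the hash; on other platforms the C program could produce different values.

```python
MD_FNV1A_BASE = 2166136261
MD_FNV1A_PRIME = 16777619


def fnv1a(base, data):
    """32-bit FNV-1a over a bytes object, starting from `base`."""
    h = base
    for byte in data:
        h ^= byte
        h = (h * MD_FNV1A_PRIME) & 0xFFFFFFFF  # emulate 32-bit unsigned overflow
    return h


def unicode_hash(label):
    """Hash each character as one 4-byte little-endian unsigned value.

    Assumes the C program ran with a 32-bit little-endian `unsigned`.
    """
    h = MD_FNV1A_BASE
    for ch in label:
        h = fnv1a(h, ord(ch).to_bytes(4, "little"))
    return h


# Labels from the example above; print their hashes side by side.
for label in ("foo", "qnptgbh", "abgbrwcv", "alqadfgn", "axgydtdu"):
    print("{:08x} {}".format(unicode_hash(label), label))
```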
--- a/test/fuzz-input/commonmark.md
+++ b/test/fuzz-input/commonmark.md
@ -0,0 +1,41 @@
+
+# h1
+## h2
+### h3
+#### h4
+##### h5
+###### h6
+
+h1
+==
+
+h2
+--
+
+--------------------
+
+    indented code
+
+```
+fenced code
+```
+
+<tag attr='val' attr2="val2">
+
+> quote
+
+* list item
+1. list item
+
+[ref]: /url
+
+paragraph
+&copy; &#1234; &#xabcd;
+`code`
+*emph* **strong** ***strong emph***
+_emph_ __strong__ ___strong emph___
+[ref] [ref][] [link](/url)
+![ref] ![ref][] ![img](/url)
+<http://example.com> <doe@example.com>
+www.example.com doe@example.com
+\\ \* \. \` \
--- a/test/fuzz-input/gfm.md
+++ b/test/fuzz-input/gfm.md
@ -0,0 +1,8 @@
+* [ ] unchecked
+* [x] checked
+
+ A | B | C
+---|--:|:-:
+aaa|bbb|ccc
+
+~del~ ~~del~~
--- a/test/fuzz-input/latex-math.md
+++ b/test/fuzz-input/latex-math.md
@ -0,0 +1 @@
+$a^2+b^2=c^2$ $$a^2+b^2=c^2$$
--- a/test/fuzz-input/wiki.md
+++ b/test/fuzz-input/wiki.md
@ -0,0 +1 @@
+[[wiki]] [[wiki|label]]
--- a/test/latex-math.txt
+++ b/test/latex-math.txt
@ -0,0 +1,39 @@
+
+# LaTeX Math
+
+With the flag `MD_FLAG_LATEXMATHSPANS`, MD4C enables an extension for
+recognition of LaTeX style math spans.
+
+A math span is any text wrapped in dollars or double dollars (`$...$` or
+`$$...$$`).
+
+```````````````````````````````` example
+$a+b=c$ Hello, world!
+.
+<p><x-equation>a+b=c</x-equation> Hello, world!</p>
+````````````````````````````````
+
+If the double dollar sign is used, the math span is a display math span.
+
+```````````````````````````````` example
+This is a display equation: $$\int_a^b x dx$$.
+.
+<p>This is a display equation: <x-equation type="display">\int_a^b x dx</x-equation>.</p>
+````````````````````````````````
+
+Math spans may span multiple lines as they are normal spans:
+
+```````````````````````````````` example
+$$
+\int_a^b
+f(x) dx
+$$
+.
+<p><x-equation type="display">\int_a^b f(x) dx </x-equation></p>
+````````````````````````````````
+
+Note though that many (simple) renderers may output the math spans just as
+verbatim text. (This includes the HTML renderer used by the `md2html` utility.)
+
+Only advanced renderers which implement LaTeX math syntax can be expected to
+provide better results.
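The delimiter convention described in `latex-math.txt` can be mimicked with a naive regular expression. This is a sketch only and is not how MD4C parses math spans: it ignores backslash escapes, code spans and every other rule a real Markdown parser applies. The names `MATH_SPAN` and `render_math` are hypothetical; the `<x-equation>` tags are the ones shown in the examples above.

```python
import re

# `$$...$$` is tried first so a display span is not read as two inline ones.
MATH_SPAN = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def render_math(text):
    """Wrap dollar-delimited spans in <x-equation> tags."""
    def replace(match):
        display, inline = match.group(1), match.group(2)
        if display is not None:
            return '<x-equation type="display">' + display + "</x-equation>"
        return "<x-equation>" + inline + "</x-equation>"
    return MATH_SPAN.sub(replace, text)


print(render_math("$a+b=c$ Hello, world!"))
print(render_math(r"This is a display equation: $$\int_a^b x dx$$."))
```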
--- a/test/normalize.py
+++ b/test/normalize.py
@ -0,0 +1,194 @@
+# -*- coding: utf-8 -*-
+from html.parser import HTMLParser
+import urllib
+
+try:
+    from html.parser import HTMLParseError
+except ImportError:
+    # HTMLParseError was removed in Python 3.5. It could never be
+    # thrown, so we define a placeholder instead.
+    class HTMLParseError(Exception):
+        pass
+
+from html.entities import name2codepoint
+import sys
+import re
+import cgi
+
+# Normalization code, adapted from
+# https://github.com/karlcow/markdown-testsuite/
+significant_attrs = ["alt", "href", "src", "title"]
+whitespace_re = re.compile(r'\s+')
+class MyHTMLParser(HTMLParser):
+    def __init__(self):
+        HTMLParser.__init__(self)
+        self.convert_charrefs = False
+        self.last = "starttag"
+        self.in_pre = False
+        self.output = ""
+        self.last_tag = ""
+    def handle_data(self, data):
+        after_tag = self.last == "endtag" or self.last == "starttag"
+        after_block_tag = after_tag and self.is_block_tag(self.last_tag)
+        if after_tag and self.last_tag == "br":
+            data = data.lstrip('\n')
+        if not self.in_pre:
+            data = whitespace_re.sub(' ', data)
+        if after_block_tag and not self.in_pre:
+            if self.last == "starttag":
+                data = data.lstrip()
+            elif self.last == "endtag":
+                data = data.strip()
+        self.output += data
+        self.last = "data"
+    def handle_endtag(self, tag):
+        if tag == "pre":
+            self.in_pre = False
+        elif self.is_block_tag(tag):
+            self.output = self.output.rstrip()
+        self.output += "</" + tag + ">"
+        self.last_tag = tag
+        self.last = "endtag"
+    def handle_starttag(self, tag, attrs):
+        if tag == "pre":
+            self.in_pre = True
+        if self.is_block_tag(tag):
+            self.output = self.output.rstrip()
+        self.output += "<" + tag
+        # For now we don't strip out 'extra' attributes, because of
+        # raw HTML test cases.
+        # attrs = filter(lambda attr: attr[0] in significant_attrs, attrs)
+        if attrs:
+            attrs.sort()
+            for (k,v) in attrs:
+                self.output += " " + k
+                if v in ['href','src']:
+                    self.output += ("=" + '"' +
+                            urllib.quote(urllib.unquote(v), safe='/') + '"')
+                elif v != None:
+                    self.output += ("=" + '"' + cgi.escape(v,quote=True) + '"')
+        self.output += ">"
+        self.last_tag = tag
+        self.last = "starttag"
+    def handle_startendtag(self, tag, attrs):
+        """Ignore closing tag for self-closing """
+        self.handle_starttag(tag, attrs)
+        self.last_tag = tag
+        self.last = "endtag"
+    def handle_comment(self, data):
+        self.output += '<!--' + data + '-->'
+        self.last = "comment"
+    def handle_decl(self, data):
+        self.output += '<!' + data + '>'
+        self.last = "decl"
+    def unknown_decl(self, data):
+        self.output += '<!' + data + '>'
+        self.last = "decl"
+    def handle_pi(self,data):
+        self.output += '<?' + data + '>'
+        self.last = "pi"
+    def handle_entityref(self, name):
+        try:
+            c = chr(name2codepoint[name])
+        except KeyError:
+            c = None
+        self.output_char(c, '&' + name + ';')
+        self.last = "ref"
+    def handle_charref(self, name):
+        try:
+            if name.startswith("x"):
+                c = chr(int(name[1:], 16))
+            else:
+                c = chr(int(name))
+        except ValueError:
+                c = None
+        self.output_char(c, '&' + name + ';')
+        self.last = "ref"
+    # Helpers.
+    def output_char(self, c, fallback):
+        if c == '<':
+            self.output += "&lt;"
+        elif c == '>':
+            self.output += "&gt;"
+        elif c == '&':
+            self.output += "&amp;"
+        elif c == '"':
+            self.output += "&quot;"
+        elif c is None:
+            self.output += fallback
+        else:
+            self.output += c
+
+    def is_block_tag(self,tag):
+        return (tag in ['article', 'header', 'aside', 'hgroup', 'blockquote',
+            'hr', 'iframe', 'body', 'li', 'map', 'button', 'object', 'canvas',
+            'ol', 'caption', 'output', 'col', 'p', 'colgroup', 'pre', 'dd',
+            'progress', 'div', 'section', 'dl', 'table', 'td', 'dt',
+            'tbody', 'embed', 'textarea', 'fieldset', 'tfoot', 'figcaption',
+            'th', 'figure', 'thead', 'footer', 'tr', 'form', 'ul',
+            'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'video', 'script', 'style'])
+
+def normalize_html(html):
+    r"""
+    Return normalized form of HTML which ignores insignificant output
+    differences:
+
+    * Multiple inner whitespace characters are collapsed to a single space
+      (except in pre tags).
+
+        >>> normalize_html("<p>a  \t b</p>")
+        '<p>a b</p>'
+
+        >>> normalize_html("<p>a  \t\nb</p>")
+        '<p>a b</p>'
+
+    * Whitespace surrounding block-level tags is removed.
+
+        >>> normalize_html("<p>a  b</p>")
+        '<p>a b</p>'
+
+        >>> normalize_html(" <p>a  b</p>")
+        '<p>a b</p>'
+
+        >>> normalize_html("<p>a  b</p> ")
+        '<p>a b</p>'
+
+        >>> normalize_html("\n\t<p>\n\t\ta  b\t\t</p>\n\t")
+        '<p>a b</p>'
+
+        >>> normalize_html("<i>a  b</i> ")
+        '<i>a b</i> '
+
+    * Self-closing tags are converted to open tags.
+
+        >>> normalize_html("<br />")
+        '<br>'
+
+    * Attributes are sorted and lowercased.
+
+        >>> normalize_html('<a title="bar" HREF="foo">x</a>')
+        '<a href="foo" title="bar">x</a>'
+
+    * References are converted to unicode, except that '<', '>', '&', and
+      '"' are rendered using entities.
+
+        >>> normalize_html("&forall;&amp;&gt;&lt;&quot;")
+        '\u2200&amp;&gt;&lt;&quot;'
+
+    """
+    html_chunk_re = re.compile(r"(<!\[CDATA\[.*?\]\]>|<[^>]*>|[^<]+)")
+    try:
+        parser = MyHTMLParser()
+        # We work around HTMLParser's limitations parsing CDATA
+        # by breaking the input into chunks and passing CDATA chunks
+        # through verbatim.
+        for chunk in re.finditer(html_chunk_re, html):
+            if chunk.group(0)[:8] == "<![CDATA":
+                parser.output += chunk.group(0)
+            else:
+                parser.feed(chunk.group(0))
+        parser.close()
+        return parser.output
+    except HTMLParseError as e:
+        sys.stderr.write("Normalization error: " + e.msg + "\n")
+        return html  # on error, return unnormalized HTML
--- a/test/pathological_tests.py
+++ b/test/pathological_tests.py
@ -0,0 +1,122 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+import re
+import argparse
+import sys
+import platform
+from cmark import CMark
+from timeit import default_timer as timer
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Run cmark tests.')
+    parser.add_argument('-p', '--program', dest='program', nargs='?', default=None,
+            help='program to test')
+    parser.add_argument('--library-dir', dest='library_dir', nargs='?',
+            default=None, help='directory containing dynamic library')
+    args = parser.parse_args(sys.argv[1:])
+
+cmark = CMark(prog=args.program, library_dir=args.library_dir)
+
+# Maps a test description to a pair of (input, regex that must match the output).
+pathological = {
+    # note - some pythons have limit of 65535 for {num-matches} in re.
+    "nested strong emph":
+                (("*a **a " * 65000) + "b" + (" a** a*" * 65000),
+                 re.compile("(<em>a <strong>a ){65000}b( a</strong> a</em>){65000}")),
+    "many emph closers with no openers":
+                 (("a_ " * 65000),
+                  re.compile("(a[_] ){64999}a_")),
+    "many emph openers with no closers":
+                 (("_a " * 65000),
+                  re.compile("(_a ){64999}_a")),
+    "many 3-emph openers with no closers":
+                 (("a***" * 65000),
+                  re.compile("(a<em><strong>a</strong></em>){32500}")),
+    "many link closers with no openers":
+                 (("a]" * 65000),
+                  re.compile(r"(a\]){65000}")),
+    "many link openers with no closers":
+                 (("[a" * 65000),
+                  re.compile(r"(\[a){65000}")),
+    "mismatched openers and closers":
+                 (("*a_ " * 50000),
+                  re.compile("([*]a[_] ){49999}[*]a_")),
+    "openers and closers multiple of 3":
+                 (("a**b" + ("c* " * 50000)),
+                  re.compile("a[*][*]b(c[*] ){49999}c[*]")),
+    "link openers and emph closers":
+                 (("[ a_" * 50000),
+                  re.compile(r"(\[ a_){50000}")),
+    "hard link/emph case":
+                 ("**x [a*b**c*](d)",
+                  re.compile("\\*\\*x <a href=\"d\">a<em>b\\*\\*c</em></a>")),
+    "nested brackets":
+                 (("[" * 50000) + "a" + ("]" * 50000),
+                  re.compile(r"\[{50000}a\]{50000}")),
+    "nested block quotes":
+                 ((("> " * 50000) + "a"),
+                  re.compile("(<blockquote>\r?\n){50000}")),
+    "U+0000 in input":
+                 ("abc\u0000de\u0000",
+                  re.compile("abc\ufffd?de\ufffd?")),
+    "backticks":
+                 ("".join(map(lambda x: ("e" + "`" * x), range(1,1000))),
+                  re.compile("^<p>[e`]*</p>\r?\n$")),
+    "many links":
+                 ("[t](/u) " * 50000,
+                  re.compile("(<a href=\"/u\">t</a> ?){50000}")),
+    "many references":
+                 ("".join(map(lambda x: ("[" + str(x) + "]: u\n"), range(1,20000 * 16))) + "[0] " * 20000,
+                  re.compile(r"(\[0\] ){19999}")),
+    "deeply nested lists":
+                 ("".join(map(lambda x: ("  " * x + "* a\n"), range(0,1000))),
+                  re.compile("<ul>\r?\n(<li>a<ul>\r?\n){999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){999}")),
+    "many html openers and closers":
+                 (("<>" * 50000),
+                  re.compile("(&lt;&gt;){50000}")),
+    "many html proc. inst. openers":
+                 (("x" + "<?" * 50000),
+                  re.compile("x(&lt;\\?){50000}")),
+    "many html CDATA openers":
+                 (("x" + "<![CDATA[" * 50000),
+                  re.compile("x(&lt;!\\[CDATA\\[){50000}")),
+    "many backticks and escapes":
+                 (("\\``" * 50000),
+                  re.compile("(``){50000}")),
+    "many broken link titles":
+                 (("[ (](" * 50000),
+                  re.compile(r"(\[ \(\]\(){50000}")),
+    "broken thematic break":
+                 (("* " * 50000 + "a"),
+                  re.compile("<ul>\r?\n(<li><ul>\r?\n){49999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){49999}"))
+    }
+
+whitespace_re = re.compile(r'\s+')
+passed = 0
+errored = 0
+failed = 0
+
+#print("Testing pathological cases:")
+for description in pathological:
+    (inp, regex) = pathological[description]
+    start = timer()
+    [rc, actual, err] = cmark.to_html(inp)
+    end = timer()
+    if rc != 0:
+        errored += 1
+        print('{:35} [ERRORED (return code {:d})]'.format(description, rc))
+        print(err)
+    elif regex.search(actual):
+        print('{:35} [PASSED] {:.3f} secs'.format(description, end-start))
+        passed += 1
+    else:
+        print('{:35} [FAILED]'.format(description))
+        print(repr(actual))
+        failed += 1
+
+print("%d passed, %d failed, %d errored" % (passed, failed, errored))
+if failed == 0 and errored == 0:
+    sys.exit(0)
+else:
+    sys.exit(1)
--- a/test/permissive-email-autolinks.txt
+++ b/test/permissive-email-autolinks.txt
@ -0,0 +1,50 @@
+
+# Permissive E-mail Autolinks
+
+With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, MD4C enables more permissive
+recognition of e-mail addresses and transforms them to autolinks, even if they
+do not exactly follow the autolink syntax specified in the CommonMark
+specification.
+
+This is a standard CommonMark e-mail autolink:
+
+```````````````````````````````` example
+E-mail: <mailto:john.doe@gmail.com>
+.
+<p>E-mail: <a href="mailto:john.doe@gmail.com">mailto:john.doe@gmail.com</a></p>
+````````````````````````````````
+
+With the permissive autolinks enabled, this is sufficient:
+
+```````````````````````````````` example
+E-mail: john.doe@gmail.com
+.
+<p>E-mail: <a href="mailto:john.doe@gmail.com">john.doe@gmail.com</a></p>
+````````````````````````````````
+
+`+` can occur before the `@`, but not after.
+
+```````````````````````````````` example
+hello@mail+xyz.example isn't valid, but hello+xyz@mail.example is.
+.
+<p>hello@mail+xyz.example isn't valid, but <a href="mailto:hello+xyz@mail.example">hello+xyz@mail.example</a> is.</p>
+````````````````````````````````
+
+`.`, `-`, and `_` can occur on both sides of the `@`, but only `.` may occur at
+the end of the email address, in which case it will not be considered part of
+the address:
+
+```````````````````````````````` example
+a.b-c_d@a.b
+
+a.b-c_d@a.b.
+
+a.b-c_d@a.b-
+
+a.b-c_d@a.b_
+.
+<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a></p>
+<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a>.</p>
+<p>a.b-c_d@a.b-</p>
+<p>a.b-c_d@a.b_</p>
+````````````````````````````````
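The rules described in test/permissive-email-autolinks.txt can be approximated with a short Python sketch. This is not MD4C's implementation: the `email_autolink` helper and its regular expression are hypothetical, handle ASCII only, and check a single whitespace-delimited token rather than scanning inline text.

```python
import re

# Hypothetical approximation of the rules described above (not MD4C's code):
#  - before '@': alphanumerics plus '.', '-', '_' and '+'
#  - after '@':  alphanumerics plus '.', '-' and '_' (no '+')
#  - the address must end with an alphanumeric; one trailing '.' is tolerated
#    but is left outside the address
EMAIL_RE = re.compile(r"([A-Za-z0-9._+-]+@[A-Za-z0-9._-]*[A-Za-z0-9])(\.?)")


def email_autolink(token):
    """Return the address part of `token`, or None if it is not recognized."""
    m = EMAIL_RE.fullmatch(token)
    return m.group(1) if m else None


if __name__ == "__main__":
    for token in ["a.b-c_d@a.b", "a.b-c_d@a.b.", "a.b-c_d@a.b-",
                  "hello@mail+xyz.example", "hello+xyz@mail.example"]:
        print(token, "->", email_autolink(token))
```

Matching whole tokens keeps the sketch small; a real inline scanner would also have to decide where a candidate address starts and stops inside running text.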
--- a/test/permissive-url-autolinks.txt
+++ b/test/permissive-url-autolinks.txt
@ -0,0 +1,92 @@
+
+# Permissive URL Autolinks
+
+With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, MD4C enables more permissive recognition
+of URLs and transforms them to autolinks, even if they do not exactly follow the
+autolink syntax specified in the CommonMark specification.
+
+This is a standard CommonMark autolink:
+
+```````````````````````````````` example
+Homepage: <https://github.com/mity/md4c>
+.
+<p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p>
+````````````````````````````````
+
+With the permissive autolinks enabled, this is sufficient:
+
+```````````````````````````````` example
+Homepage: https://github.com/mity/md4c
+.
+<p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p>
+````````````````````````````````
+
+However, this permissive autolink feature works only for a few very widely
+used URL schemes: in alphabetical order, `ftp:`, `http:` and `https:`.
+
+That's why this is not a permissive autolink:
+
+```````````````````````````````` example
+ssh://root@example.com
+.
+<p>ssh://root@example.com</p>
+````````````````````````````````
+
+The same rules for path validation as for permissive WWW autolinks apply.
+Therefore the final question mark here is not part of the autolink:
+
+```````````````````````````````` example
+Have you ever visited http://www.zombo.com?
+.
+<p>Have you ever visited <a href="http://www.zombo.com">http://www.zombo.com</a>?</p>
+````````````````````````````````
+
+But in contrast, in this example it is:
+
+```````````````````````````````` example
+http://www.bing.com/search?q=md4c
+.
+<p><a href="http://www.bing.com/search?q=md4c">http://www.bing.com/search?q=md4c</a></p>
+````````````````````````````````
+
+And finally one complex example:
+
+```````````````````````````````` example
+http://commonmark.org
+
+(Visit https://encrypted.google.com/search?q=Markup+(business))
+
+Anonymous FTP is available at ftp://foo.bar.baz.
+.
+<p><a href="http://commonmark.org">http://commonmark.org</a></p>
+<p>(Visit <a href="https://encrypted.google.com/search?q=Markup+(business)">https://encrypted.google.com/search?q=Markup+(business)</a>)</p>
+<p>Anonymous FTP is available at <a href="ftp://foo.bar.baz">ftp://foo.bar.baz</a>.</p>
+````````````````````````````````
+
+
+## GitHub Issues
+
+### [Issue 53](https://github.com/mity/md4c/issues/53)
+
+```````````````````````````````` example
+This is [link](http://github.com/).
+.
+<p>This is <a href="http://github.com/">link</a>.</p>
+````````````````````````````````
+
+```````````````````````````````` example
+This is [link](http://github.com/)X
+.
+<p>This is <a href="http://github.com/">link</a>X</p>
+````````````````````````````````
+
+
+### [Issue 76](https://github.com/mity/md4c/issues/76)
+
+```````````````````````````````` example
+*(http://example.com)*
+.
+<p><em>(<a href="http://example.com">http://example.com</a>)</em></p>
+````````````````````````````````
+
+
--- a/test/permissive-www-autolinks.txt
+++ b/test/permissive-www-autolinks.txt
@ -0,0 +1,107 @@
+
+# Permissive WWW Autolinks
+
+With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, MD4C enables recognition of
+autolinks starting with `www.`, even if they do not exactly follow the
+autolink syntax specified in the CommonMark specification.
+
+These do not have to be enclosed in `<` and `>`, and they do not even need
+any preceding scheme specification.
+
+The WWW autolink will be recognized when a valid domain is found.
+
+A valid domain consists of the text `www.`, followed by alphanumeric characters,
+underscores (`_`), hyphens (`-`) and periods (`.`). There must be at least one
+period, and no underscores may be present in the last two segments of the domain.
+
+The scheme `http` will be inserted automatically:
+
+```````````````````````````````` example
+www.commonmark.org
+.
+<p><a href="http://www.commonmark.org">www.commonmark.org</a></p>
+````````````````````````````````
+
+After a valid domain, zero or more non-space non-`<` characters may follow:
+
+```````````````````````````````` example
+Visit www.commonmark.org/help for more information.
+.
+<p>Visit <a href="http://www.commonmark.org/help">www.commonmark.org/help</a> for more information.</p>
+````````````````````````````````
+
+We then apply extended autolink path validation as follows:
+
+Trailing punctuation (specifically, `?`, `!`, `.`, `,`, `:`, `*`, `_`, and `~`)
+will not be considered part of the autolink, though they may be included in the
+interior of the link:
+
+```````````````````````````````` example
+Visit www.commonmark.org.
+
+Visit www.commonmark.org/a.b.
+.
+<p>Visit <a href="http://www.commonmark.org">www.commonmark.org</a>.</p>
+<p>Visit <a href="http://www.commonmark.org/a.b">www.commonmark.org/a.b</a>.</p>
+````````````````````````````````
+
+When an autolink ends in `)`, we scan the entire autolink for the total number
+of parentheses.  If there is a greater number of closing parentheses than
+opening ones, we don't consider the last character part of the autolink, in
+order to facilitate including an autolink inside a parenthesis:
+
+```````````````````````````````` example
+www.google.com/search?q=Markup+(business)
+
+(www.google.com/search?q=Markup+(business))
+.
+<p><a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a></p>
+<p>(<a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a>)</p>
+````````````````````````````````
+
+This check is only done when the link ends in a closing parenthesis `)`, so if
+the only parentheses are in the interior of the autolink, no special rules are
+applied:
+
+```````````````````````````````` example
+www.google.com/search?q=(business)+ok
+.
+<p><a href="http://www.google.com/search?q=(business)+ok">www.google.com/search?q=(business)+ok</a></p>
+````````````````````````````````
+
+If an autolink ends in a semicolon (`;`), we check whether it appears to
+resemble an entity reference, i.e. whether the preceding text is `&` followed
+by one or more alphanumeric characters.  If so, the apparent entity reference
+is excluded from the autolink:
+
+```````````````````````````````` example
+www.google.com/search?q=commonmark&hl=en
+
+www.google.com/search?q=commonmark&hl;
+.
+<p><a href="http://www.google.com/search?q=commonmark&amp;hl=en">www.google.com/search?q=commonmark&amp;hl=en</a></p>
+<p><a href="http://www.google.com/search?q=commonmark">www.google.com/search?q=commonmark</a>&amp;hl;</p>
+````````````````````````````````
+
+`<` immediately ends an autolink.
+
+```````````````````````````````` example
+www.commonmark.org/he<lp
+.
+<p><a href="http://www.commonmark.org/he">www.commonmark.org/he</a>&lt;lp</p>
+````````````````````````````````
+
+
+## GitHub Issues
+
+### [Issue 53](https://github.com/mity/md4c/issues/53)
+```````````````````````````````` example
+This is [link](www.github.com/).
+.
+<p>This is <a href="www.github.com/">link</a>.</p>
+````````````````````````````````
+```````````````````````````````` example
+This is [link](www.github.com/)X
+.
+<p>This is <a href="www.github.com/">link</a>X</p>
+````````````````````````````````
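The path-validation rules described in test/permissive-www-autolinks.txt (trailing punctuation, unbalanced closing parentheses, a trailing entity-like `&name;`, and `<` as a hard stop) can be sketched in Python as follows. The `trim_autolink` helper is hypothetical and is not how MD4C implements the check; it assumes the candidate has already been cut at the first whitespace and that domain validation happened elsewhere.

```python
# Trailing characters that are never part of the autolink (from the text above).
TRAILING_PUNCT = "?!.,:*_~"


def trim_autolink(candidate):
    """Return the part of `candidate` that would be kept as the autolink."""
    # '<' immediately ends an autolink.
    cut = candidate.find("<")
    link = candidate if cut < 0 else candidate[:cut]

    while link:
        last = link[-1]
        if last in TRAILING_PUNCT:
            link = link[:-1]
        elif last == ")" and link.count(")") > link.count("("):
            # More closers than openers: the final ')' belongs to the text.
            link = link[:-1]
        elif last == ";":
            # Drop a trailing "&name;" that looks like an entity reference.
            amp = link.rfind("&")
            name = link[amp + 1:-1] if amp >= 0 else ""
            if name and name.isalnum():
                link = link[:amp]
            else:
                break
        else:
            break
    return link


if __name__ == "__main__":
    for text in ["www.commonmark.org/a.b.",
                 "www.google.com/search?q=Markup+(business))",
                 "www.google.com/search?q=commonmark&hl;",
                 "www.commonmark.org/he<lp"]:
        print(text, "->", trim_autolink(text))
```

The loop re-checks the last character after every removal, so combinations such as a closing parenthesis followed by a period are handled without special cases.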
--- a/test/spec.txt
+++ b/test/spec.txt
--- a/test/spec_tests.py
+++ b/test/spec_tests.py
@ -0,0 +1,144 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+import sys
+from difflib import unified_diff
+import argparse
+import re
+import json
+from cmark import CMark
+from normalize import normalize_html
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description='Run cmark tests.')
+    parser.add_argument('-p', '--program', dest='program', nargs='?', default=None,
+            help='program to test')
+    parser.add_argument('-s', '--spec', dest='spec', nargs='?', default='spec.txt',
+            help='path to spec')
+    parser.add_argument('-P', '--pattern', dest='pattern', nargs='?',
+            default=None, help='limit to sections matching regex pattern')
+    parser.add_argument('--library-dir', dest='library_dir', nargs='?',
+            default=None, help='directory containing dynamic library')
+    parser.add_argument('--no-normalize', dest='normalize',
+            action='store_const', const=False, default=True,
+            help='do not normalize HTML')
+    parser.add_argument('-d', '--dump-tests', dest='dump_tests',
+            action='store_const', const=True, default=False,
+            help='dump tests in JSON format')
+    parser.add_argument('--debug-normalization', dest='debug_normalization',
+            action='store_const', const=True,
+            default=False, help='filter stdin through normalizer for testing')
+    parser.add_argument('-n', '--number', type=int, default=None,
+            help='only consider the test with the given number')
+    args = parser.parse_args(sys.argv[1:])
+
+def out(text):
+    sys.stdout.buffer.write(text.encode('utf-8'))
+
+def print_test_header(headertext, example_number, start_line, end_line):
+    out("Example %d (lines %d-%d) %s\n" % (example_number,start_line,end_line,headertext))
+
+def do_test(test, normalize, result_counts):
+    [retcode, actual_html, err] = cmark.to_html(test['markdown'])
+    if retcode == 0:
+        expected_html = test['html']
+        unicode_error = None
+        if normalize:
+            try:
+                passed = normalize_html(actual_html) == normalize_html(expected_html)
+            except UnicodeDecodeError as e:
+                unicode_error = e
+                passed = False
+        else:
+            passed = actual_html == expected_html
+        if passed:
+            result_counts['pass'] += 1
+        else:
+            print_test_header(test['section'], test['example'], test['start_line'], test['end_line'])
+            out(test['markdown'] + '\n')
+            if unicode_error:
+                out("Unicode error: " + str(unicode_error) + '\n')
+                out("Expected: " + repr(expected_html) + '\n')
+                out("Got:      " + repr(actual_html) + '\n')
+            else:
+                expected_html_lines = expected_html.splitlines(True)
+                actual_html_lines = actual_html.splitlines(True)
+                for diffline in unified_diff(expected_html_lines, actual_html_lines,
+                                "expected HTML", "actual HTML"):
+                    out(diffline)
+            out('\n')
+            result_counts['fail'] += 1
+    else:
+        print_test_header(test['section'], test['example'], test['start_line'], test['end_line'])
+        out("program returned error code %d\n" % retcode)
+        sys.stdout.buffer.write(err)
+        result_counts['error'] += 1
+
+def get_tests(specfile):
+    line_number = 0
+    start_line = 0
+    end_line = 0
+    example_number = 0
+    markdown_lines = []
+    html_lines = []
+    state = 0  # 0 regular text, 1 markdown example, 2 html output
+    headertext = ''
+    tests = []
+
+    header_re = re.compile('#+ ')
+
+    with open(specfile, 'r', encoding='utf-8', newline='\n') as specf:
+        for line in specf:
+            line_number = line_number + 1
+            l = line.strip()
+            #if l == "`" * 32 + " example":
+            if re.match("`{32} example( [a-z]{1,})?", l):
+                state = 1
+            elif state == 2 and l == "`" * 32:
+                state = 0
+                example_number = example_number + 1
+                end_line = line_number
+                tests.append({
+                    "markdown":''.join(markdown_lines).replace('→',"\t"),
+                    "html":''.join(html_lines).replace('→',"\t"),
+                    "example": example_number,
+                    "start_line": start_line,
+                    "end_line": end_line,
+                    "section": headertext})
+                start_line = 0
+                markdown_lines = []
+                html_lines = []
+            elif l == ".":
+                state = 2
+            elif state == 1:
+                if start_line == 0:
+                    start_line = line_number - 1
+                markdown_lines.append(line)
+            elif state == 2:
+                html_lines.append(line)
+            elif state == 0 and re.match(header_re, line):
+                headertext = header_re.sub('', line).strip()
+    return tests
+
+if __name__ == "__main__":
+    if args.debug_normalization:
+        out(normalize_html(sys.stdin.read()))
+        sys.exit(0)
+
+    all_tests = get_tests(args.spec)
+    if args.pattern:
+        pattern_re = re.compile(args.pattern, re.IGNORECASE)
+    else:
+        pattern_re = re.compile('.')
+    tests = [ test for test in all_tests if re.search(pattern_re, test['section']) and (not args.number or test['example'] == args.number) ]
+    if args.dump_tests:
+        out(json.dumps(tests, ensure_ascii=False, indent=2))
+        sys.exit(0)
+    else:
+        skipped = len(all_tests) - len(tests)
+        cmark = CMark(prog=args.program, library_dir=args.library_dir)
+        result_counts = {'pass': 0, 'fail': 0, 'error': 0, 'skip': skipped}
+        for test in tests:
+            do_test(test, args.normalize, result_counts)
+        out("{pass} passed, {fail} failed, {error} errored, {skip} skipped\n".format(**result_counts))
+        sys.exit(result_counts['fail'] + result_counts['error'])
--- a/test/strikethrough.txt
+++ b/test/strikethrough.txt
@ -0,0 +1,75 @@
+
+# Strike-Through
+
+With the flag `MD_FLAG_STRIKETHROUGH`, MD4C enables an extension for
+recognizing strike-through spans.
+
+Strike-through text is any text wrapped in one or two tildes (`~`).
+
+```````````````````````````````` example
+~Hi~ Hello, world!
+.
+<p><del>Hi</del> Hello, world!</p>
+````````````````````````````````
+
+If the lengths of the opener and the closer don't match, the strike-through is
+not recognized.
+
+```````````````````````````````` example
+This ~text~~ is curious.
+.
+<p>This ~text~~ is curious.</p>
+````````````````````````````````
+
+A tilde sequence that is too long won't be recognized:
+
+```````````````````````````````` example
+foo ~~~bar~~~
+.
+<p>foo ~~~bar~~~</p>
+````````````````````````````````
+
+Also note that the markers cannot open a strike-through span if they are
+followed by whitespace; similarly, they cannot close the span if they are
+preceded by whitespace:
+
+```````````````````````````````` example
+~foo ~bar
+.
+<p>~foo ~bar</p>
+````````````````````````````````
+
+
+As with regular emphasis delimiters, a new paragraph ends the parsing of
+a strike-through span:
+
+```````````````````````````````` example
+This ~~has a
+
+new paragraph~~.
+.
+<p>This ~~has a</p>
+<p>new paragraph~~.</p>
+````````````````````````````````
+
+
+## GitHub Issues
+
+### [Issue 69](https://github.com/mity/md4c/issues/69)
+```````````````````````````````` example
+~`foo`~
+.
+<p><del><code>foo</code></del></p>
+````````````````````````````````
+
+```````````````````````````````` example
+~*foo*~
+.
+<p><del><em>foo</em></del></p>
+````````````````````````````````
+
+```````````````````````````````` example
+*~foo~*
+.
+<p><em><del>foo</del></em></p>
+````````````````````````````````
--- a/test/tables.txt
+++ b/test/tables.txt
@ -0,0 +1,363 @@
+
+# Tables
+
+With the flag `MD_FLAG_TABLES`, MD4C enables an extension for recognizing
+tables.
+
+A basic example of a table with two columns and three rows (not counting
+the header) is as follows:
+
+```````````````````````````````` example
+| Column 1 | Column 2 |
+|----------|----------|
+| foo      | bar      |
+| baz      | qux      |
+| quux     | quuz     |
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td>foo</td><td>bar</td></tr>
+<tr><td>baz</td><td>qux</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+The leading and trailing pipe characters (`|`) on each line are optional:
+
+```````````````````````````````` example
+Column 1 | Column 2 |
+---------|--------- |
+foo      | bar      |
+baz      | qux      |
+quux     | quuz     |
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td>foo</td><td>bar</td></tr>
+<tr><td>baz</td><td>qux</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+```````````````````````````````` example
+| Column 1 | Column 2
+|----------|---------
+| foo      | bar
+| baz      | qux
+| quux     | quuz
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td>foo</td><td>bar</td></tr>
+<tr><td>baz</td><td>qux</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+```````````````````````````````` example
+Column 1 | Column 2
+---------|---------
+foo      | bar
+baz      | qux
+quux     | quuz
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td>foo</td><td>bar</td></tr>
+<tr><td>baz</td><td>qux</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+However, for a one-column table, at least one pipe has to be used in the table
+header underline; otherwise it would be parsed as a Setext heading followed by
+a paragraph.
+
+```````````````````````````````` example
+Column 1
+--------
+foo
+baz
+quux
+.
+<h2>Column 1</h2>
+<p>foo
+baz
+quux</p>
+````````````````````````````````
+
+Leading and trailing whitespace in a table cell is ignored and the columns do
+not need to be aligned.
+
+```````````````````````````````` example
+Column 1 |Column 2
+---|---
+foo | bar
+baz| qux
+quux|quuz
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td>foo</td><td>bar</td></tr>
+<tr><td>baz</td><td>qux</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+A table cannot interrupt a paragraph.
+
+```````````````````````````````` example
+Lorem ipsum dolor sit amet.
+| Column 1 | Column 2
+| ---------|---------
+| foo      | bar
+| baz      | qux
+| quux     | quuz
+.
+<p>Lorem ipsum dolor sit amet.
+| Column 1 | Column 2
+| ---------|---------
+| foo      | bar
+| baz      | qux
+| quux     | quuz</p>
+````````````````````````````````
+
+Similarly, a paragraph cannot interrupt a table:
+
+```````````````````````````````` example
+Column 1 | Column 2
+---------|---------
+foo      | bar
+baz      | qux
+quux     | quuz
+Lorem ipsum dolor sit amet.
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td>foo</td><td>bar</td></tr>
+<tr><td>baz</td><td>qux</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+<tr><td>Lorem ipsum dolor sit amet.</td><td></td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+The underline of the table is crucial for recognizing the table, the count of
+its columns and their alignment: the line has to contain at least one pipe,
+and it has to provide at least three dash (`-`) characters for every column in
+the table.
+
+Thus this is not a table because there are too few dashes for Column 2.
+
+```````````````````````````````` example
+| Column 1 | Column 2
+| ---------|--
+| foo      | bar
+| baz      | qux
+| quux     | quuz
+.
+<p>| Column 1 | Column 2
+| ---------|--
+| foo      | bar
+| baz      | qux
+| quux     | quuz</p>
+````````````````````````````````
+
+The first dash, the last dash, or both in each column underline can be
+replaced with a colon (`:`) to request left, right or center alignment of
+the respective column:
+
+```````````````````````````````` example
+| Column 1 | Column 2 | Column 3 | Column 4 |
+|----------|:---------|:--------:|---------:|
+| default  | left     | center   | right    |
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th align="left">Column 2</th><th align="center">Column 3</th><th align="right">Column 4</th></tr>
+</thead>
+<tbody>
+<tr><td>default</td><td align="left">left</td><td align="center">center</td><td align="right">right</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+To include a literal pipe character in any cell, it has to be escaped.
+
+```````````````````````````````` example
+Column 1 | Column 2
+---------|---------
+foo      | bar
+baz      | qux \| xyzzy
+quux     | quuz
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td>foo</td><td>bar</td></tr>
+<tr><td>baz</td><td>qux | xyzzy</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+The contents of each cell are parsed as inline text, which may contain any
+inline Markdown spans such as emphasis, strong emphasis, links etc.
+
+```````````````````````````````` example
+Column 1 | Column 2
+---------|---------
+*foo*    | bar
+**baz**  | [qux]
+quux     | [quuz](/url2)
+
+[qux]: /url
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td><em>foo</em></td><td>bar</td></tr>
+<tr><td><strong>baz</strong></td><td><a href="/url">qux</a></td></tr>
+<tr><td>quux</td><td><a href="/url2">quuz</a></td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+However, pipes which are inside a code span are not recognized as cell
+boundaries.
+
+```````````````````````````````` example
+Column 1 | Column 2
+---------|---------
+`foo     | bar`
+baz      | qux
+quux     | quuz
+.
+<table>
+<thead>
+<tr><th>Column 1</th><th>Column 2</th></tr>
+</thead>
+<tbody>
+<tr><td><code>foo     | bar</code></td><td></td></tr>
+<tr><td>baz</td><td>qux</td></tr>
+<tr><td>quux</td><td>quuz</td></tr>
+</tbody>
+</table>
+````````````````````````````````
+
+
+## GitHub Issues
+
+### [Issue 41](https://github.com/mity/md4c/issues/41)
+```````````````````````````````` example
+* x|x
+---|---
+.
+<ul>
+<li>x|x
+---|---</li>
+</ul>
+````````````````````````````````
+(Not a table, because the underline has the wrong indentation and is not part
+of the list item.)
+
+```````````````````````````````` example
+* x|x
+  ---|---
+x|x
+.
+<ul>
+<li><table>
+<thead>
+<tr>
+<th>x</th>
+<th>x</th>
+</tr>
+</thead>
+<tbody>
+</tbody>
+</table>
+</li>
+</ul>
+<p>x|x</p>
+````````````````````````````````
+(Here the underline has the right indentation, so the table is detected.
+But the last line is not part of it due to its indentation.)
+
+
+### [Issue 42](https://github.com/mity/md4c/issues/42)
+
+```````````````````````````````` example
+] http://x.x *x*
+
+|x|x|
+|---|---|
+|x|
+.
+<p>] http://x.x <em>x</em></p>
+<table>
+<thead>
+<tr>
+<th>x</th>
+<th>x</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>x</td>
+<td></td>
+</tr>
+</tbody>
+</table>
+````````````````````````````````
+
+
+### [Issue 104](https://github.com/mity/md4c/issues/104)
+
+```````````````````````````````` example
+A | B
+--- | ---
+[x](url)
+.
+<table>
+<thead>
+<tr>
+<th>A</th>
+<th>B</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><a href="url">x</a></td>
+<td></td>
+</tr>
+</tbody>
+</table>
+````````````````````````````````
--- a/test/tasklists.txt
+++ b/test/tasklists.txt
@ -0,0 +1,117 @@
+
+# Tasklists
+
+With the flag `MD_FLAG_TASKLISTS`, MD4C enables an extension for recognizing
+task lists.
+
+A basic task list may look as follows:
+
+```````````````````````````````` example
+ * [x] foo
+ * [X] bar
+ * [ ] baz
+.
+<ul>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
+</ul>
+````````````````````````````````
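As a rough illustration of the mapping shown above, the following sketch detects a task mark at the start of a list item's text and produces the same `<li>` markup. It is not MD4C's renderer: the name `render_item` is hypothetical, the requirement that the mark be followed by whitespace or the end of the text is an assumption, and HTML escaping is omitted for brevity.

```python
import re

# A task mark is "[ ]", "[x]" or "[X]" at the very start of the item text.
# Assumption: it must be followed by whitespace or the end of the text.
TASK_MARK = re.compile(r"\[([ xX])\](?:[ \t]+|$)")

def render_item(text):
    """Render the text of one list item as an <li> element (sketch)."""
    m = TASK_MARK.match(text)
    if m is None:
        return "<li>" + text + "</li>"   # ordinary list item
    checked = " checked" if m.group(1) in "xX" else ""
    return (
        '<li class="task-list-item">'
        '<input type="checkbox" class="task-list-item-checkbox" disabled'
        + checked + ">" + text[m.end():] + "</li>"
    )
```

With this sketch, `render_item("[ ] baz")` yields the unchecked variant, while an item without a mark falls back to a plain `<li>`.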
+
+Task lists can also be in ordered lists:
+
+```````````````````````````````` example
+ 1. [x] foo
+ 2. [X] bar
+ 3. [ ] baz
+.
+<ol>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
+</ol>
+````````````````````````````````
+
+Task lists can also be nested in ordinary lists:
+
+```````````````````````````````` example
+ * xxx:
+   * [x] foo
+   * [x] bar
+   * [ ] baz
+ * yyy:
+   * [ ] qux
+   * [x] quux
+   * [ ] quuz
+.
+<ul>
+<li>xxx:
+<ul>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
+</ul></li>
+<li>yyy:
+<ul>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li>
+</ul></li>
+</ul>
+````````````````````````````````
+
+Or in a parent task list:
+
+```````````````````````````````` example
+ 1. [x] xxx:
+    * [x] foo
+    * [x] bar
+    * [ ] baz
+ 2. [ ] yyy:
+    * [ ] qux
+    * [x] quux
+    * [ ] quuz
+.
+<ol>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx:
+<ul>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
+</ul></li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy:
+<ul>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li>
+</ul></li>
+</ol>
+````````````````````````````````
+
+Also, ordinary lists can be nested in the task lists.
+
+```````````````````````````````` example
+ * [x] xxx:
+   * foo
+   * bar
+   * baz
+ * [ ] yyy:
+   * qux
+   * quux
+   * quuz
+.
+<ul>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx:
+<ul>
+<li>foo</li>
+<li>bar</li>
+<li>baz</li>
+</ul></li>
+<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy:
+<ul>
+<li>qux</li>
+<li>quux</li>
+<li>quuz</li>
+</ul></li>
+</ul>
+````````````````````````````````
--- a/test/underline.txt
+++ b/test/underline.txt
@ -0,0 +1,39 @@
+
+# Underline
+
+With the flag `MD_FLAG_UNDERLINE`, MD4C treats the underscore `_` as a mark
+denoting an underlined span rather than ordinary emphasis (or strong
+emphasis).
+
+```````````````````````````````` example
+_foo_
+.
+<p><u>foo</u></p>
+````````````````````````````````
+
+In sequences of multiple underscores, each single one translates into an
+underline span mark.
+
+```````````````````````````````` example
+___foo___
+.
+<p><u><u><u>foo</u></u></u></p>
+````````````````````````````````
+
+Intra-word underscores are not recognized as underline marks:
+
+```````````````````````````````` example
+foo_bar_baz
+.
+<p>foo_bar_baz</p>
+````````````````````````````````
+
+The parser also follows the standard rules for when an underscore can or
+cannot open or close a span. Therefore there is no underline in the following
+example, because no underscore can be seen as a closing mark.
+
+```````````````````````````````` example
+_foo _bar
+.
+<p>_foo _bar</p>
+````````````````````````````````
--- a/test/wiki-links.txt
+++ b/test/wiki-links.txt
@ -0,0 +1,232 @@
+
+# Wiki Links
+
+With the flag `MD_FLAG_WIKILINKS`, MD4C recognizes wiki links.
+
+A simple wiki-link is a wiki-link destination enclosed between `[[` and
+`]]`.
+
+```````````````````````````````` example
+[[foo]]
+.
+<p><x-wikilink data-target="foo">foo</x-wikilink></p>
+````````````````````````````````
+
+A wiki-link may also contain an explicit label, separated from the destination
+by `|`.
+
+```````````````````````````````` example
+[[foo|bar]]
+.
+<p><x-wikilink data-target="foo">bar</x-wikilink></p>
+````````````````````````````````
+
+A wiki-link destination cannot be empty.
+
+```````````````````````````````` example
+[[]]
+.
+<p>[[]]</p>
+````````````````````````````````
+
+```````````````````````````````` example
+[[|foo]]
+.
+<p>[[|foo]]</p>
+````````````````````````````````
+
+
+The wiki-link destination cannot contain a new line.
+
+```````````````````````````````` example
+[[foo
+bar]]
+.
+<p>[[foo
+bar]]</p>
+````````````````````````````````
+
+```````````````````````````````` example
+[[foo
+bar|baz]]
+.
+<p>[[foo
+bar|baz]]</p>
+````````````````````````````````
+
+The wiki-link destination is rendered verbatim; inline markup in it is not
+recognized.
+
+```````````````````````````````` example
+[[*foo*]]
+.
+<p><x-wikilink data-target="*foo*">*foo*</x-wikilink></p>
+````````````````````````````````
+
+```````````````````````````````` example
+[[foo|![bar](bar.jpg)]]
+.
+<p><x-wikilink data-target="foo"><img src="bar.jpg" alt="bar"></x-wikilink></p>
+````````````````````````````````
+
+With multiple `|` delimiters, only the first one is recognized and the other
+ones are part of the label.
+
+```````````````````````````````` example
+[[foo|bar|baz]]
+.
+<p><x-wikilink data-target="foo">bar|baz</x-wikilink></p>
+````````````````````````````````
+
+However, the delimiter `|` can be escaped with `\`.
+
+```````````````````````````````` example
+[[foo\|bar|baz]]
+.
+<p><x-wikilink data-target="foo|bar">baz</x-wikilink></p>
+````````````````````````````````
+
+The label can contain inline elements.
+
+```````````````````````````````` example
+[[foo|*bar*]]
+.
+<p><x-wikilink data-target="foo"><em>bar</em></x-wikilink></p>
+````````````````````````````````
+
+An empty explicit label is the same as using the implicit label; i.e. the verbatim
+destination string is used as the label.
+
+```````````````````````````````` example
+[[foo|]]
+.
+<p><x-wikilink data-target="foo">foo</x-wikilink></p>
+````````````````````````````````
+
+The label can span multiple lines.
+
+```````````````````````````````` example
+[[foo|foo
+bar
+baz]]
+.
+<p><x-wikilink data-target="foo">foo
+bar
+baz</x-wikilink></p>
+````````````````````````````````
+
+Wiki-links have higher priority than links.
+
+```````````````````````````````` example
+[[foo]](foo.jpg)
+.
+<p><x-wikilink data-target="foo">foo</x-wikilink>(foo.jpg)</p>
+````````````````````````````````
+
+```````````````````````````````` example
+[foo]: /url
+
+[[foo]]
+.
+<p><x-wikilink data-target="foo">foo</x-wikilink></p>
+````````````````````````````````
+
+Wiki-links can be used inside tables.
+
+```````````````````````````````` example
+| A                | B   |
+|------------------|-----|
+| [[foo|*bar*]]    | baz |
+.
+<table>
+<thead>
+<tr>
+<th>A</th>
+<th>B</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><x-wikilink data-target="foo"><em>bar</em></x-wikilink></td>
+<td>baz</td>
+</tr>
+</tbody>
+</table>
+````````````````````````````````
+
+Wiki-links are not prioritized over images.
+
+```````````````````````````````` example
+![[foo]](foo.jpg)
+.
+<p><img src="foo.jpg" alt="[foo]"></p>
+````````````````````````````````
+
+Something that may look like a wiki-link at first, but turns out not to be,
+is recognized as a normal link.
+
+```````````````````````````````` example
+[[foo]
+
+[foo]: /url
+.
+<p>[<a href="/url">foo</a></p>
+````````````````````````````````
+
+Escaping the opening `[` escapes only that one character, not the whole `[[`
+opener:
+
+```````````````````````````````` example
+\[[foo]]
+
+[foo]: /url
+.
+<p>[<a href="/url">foo</a>]</p>
+````````````````````````````````
+
+As with other inline links, the innermost wiki-link is preferred.
+
+```````````````````````````````` example
+[[foo[[bar]]]]
+.
+<p>[[foo<x-wikilink data-target="bar">bar</x-wikilink>]]</p>
+````````````````````````````````
+
+There is a limit of 100 characters for the wiki-link destination.
+
+```````````````````````````````` example
+[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]]
+[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]]
+.
+<p>[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]]
+[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]]</p>
+````````````````````````````````
+
+A wiki-link destination of exactly 100 characters works.
+
+```````````````````````````````` example
+[[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890]]
+[[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890|foo]]
+.
+<p><x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890</x-wikilink>
+<x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">foo</x-wikilink></p>
+````````````````````````````````
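The destination and label rules exercised so far (split on the first unescaped `|`, non-empty destination, no newline in the destination, at most 100 characters, empty label falling back to the destination) can be summarized in a small sketch. This is not MD4C's code: `parse_wikilink` and `MAX_DEST` are hypothetical names, the label is returned as raw text without inline parsing, and counting the limit on the raw (still escaped) destination text is an assumption.

```python
MAX_DEST = 100  # destination length limit described in the text above

def parse_wikilink(inner):
    """Parse the text between `[[` and `]]`.

    Returns (destination, label), or None if it is not a valid wiki-link.
    """
    dest_chars = []
    label = None
    i, n = 0, len(inner)
    while i < n:
        ch = inner[i]
        if ch == "\\" and i + 1 < n and inner[i + 1] == "|":
            dest_chars.append("|")   # escaped pipe stays in the destination
            i += 2
            continue
        if ch == "|":
            label = inner[i + 1:]    # only the first unescaped pipe splits
            break
        dest_chars.append(ch)
        i += 1

    raw_len = i                      # length of the raw destination text
    dest = "".join(dest_chars)
    if not dest or "\n" in dest or raw_len > MAX_DEST:
        return None
    if not label:                    # missing or empty label: reuse destination
        label = dest
    return dest, label
```

For example, `parse_wikilink("foo|bar|baz")` gives `("foo", "bar|baz")`, and a 101-character destination is rejected so the text would be left as ordinary brackets.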
+
+The limit on link content does not include any characters belonging to a block
+quote, if the label spans multiple lines contained in a block quote.
+
+```````````````````````````````` example
+> [[12345678901234567890123456789012345678901234567890|1234567890
+> 1234567890
+> 1234567890
+> 1234567890
+> 123456789]]
+.
+<blockquote>
+<p><x-wikilink data-target="12345678901234567890123456789012345678901234567890">1234567890
+1234567890
+1234567890
+1234567890
+123456789</x-wikilink></p>
+</blockquote>
+````````````````````````````````