Import Upstream version 0.4.3

2022-06-02 17:41:00 +08:00 · commit e73391db57
47 changed files with 29103 additions and 0 deletions
--- a/.travis.yml
+++ b/.travis.yml
@ -0,0 +1,34 @@
 # YAML definition for travis-ci.com continuous integration.
 # See https://docs.travis-ci.com/user/languages/c
 language: c
 dist: bionic
 compiler:
    - gcc
 addons:
    apt:
        packages:
            - python3   # for running tests
            - lcov      # for generating code coverage report
 before_script:
    - mkdir build
    - cd build
    # We enforce -Wdeclaration-after-statement because Qt project needs to
    # build MD4C with Integrity compiler which chokes whenever a declaration
    # is not at the beginning of a block.
    - CFLAGS='--coverage -g -O0 -Wall -Wdeclaration-after-statement -Werror' cmake -DCMAKE_BUILD_TYPE=Debug -G 'Unix Makefiles' ..
 script:
    - make VERBOSE=1
 after_success:
    - ../scripts/run-tests.sh
    # Creating report
    - lcov --directory . --capture --output-file coverage.info # capture coverage info
    - lcov --remove coverage.info '/usr/*' --output-file coverage.info # filter out system
    - lcov --list coverage.info # debug info
    # Uploading report to CodeCov
    - bash <(curl -s https://codecov.io/bash) || echo "Codecov did not collect coverage reports"
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -0,0 +1,268 @@
 # MD4C Change Log
 ## Version 0.4.3
 New features:
 * With `MD_FLAG_UNDERLINE`, spans enclosed in underscore (`_foo_`) are seen
   as underline (`MD_SPAN_UNDERLINE`) rather than an ordinary emphasis or
   strong emphasis.
 Changes:
 * The implementation of wiki-links extension (with `MD_FLAG_WIKILINKS`) has
   been simplified.
    - A noticeable increase of MD4C's memory footprint introduced by the
      extension implementation in 0.4.0 has been removed.
    - The priority handling towards other inline elements has been unified.
      (This affects an obscure case where image syntax in place of the
      wiki-link destination made the wiki-link invalid. Now *all* inline spans
      in the wiki-link destination, including images, are suppressed.)
    - The length limitation of 100 characters now always applies to wiki-link
      destination.
 * Recognition of strike-through spans (with the flag `MD_FLAG_STRIKETHROUGH`)
   has become much stricter and, arguably, more reasonable.
    - Only single tildes (`~`) and double tildes (`~~`) are recognized as
      strike-through marks. Longer runs are no longer recognized.
    - The opener and closer marks have to be of the same length.
    - The tildes cannot open a strike-through span if a whitespace follows.
    - The tildes cannot close a strike-through span if a whitespace precedes.
   This change follows the changes of behavior in cmark-gfm some time ago, so
   it is also beneficial from a compatibility point of view.
 * When building MD4C by hand instead of using its CMake-based build, the UTF-8
   support used to be disabled by default, unless explicitly asked for by
   defining the preprocessor macro `MD4C_USE_UTF8`.
   This has been changed and the UTF-8 mode is now the default, no matter
   how `md4c.c` is compiled. If you need to disable it and use the ASCII-only
   mode, you have to explicitly define the macro `MD4C_USE_ASCII` when
   compiling it.
   (The CMake-based build as provided in our repository explicitly asked for
   the UTF-8 support with `-DMD4C_USE_UTF8`. I.e. if you are using MD4C library
   built with our vanilla `CMakeLists.txt` files, this change should not affect
   you.)
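The stricter strike-through rules listed above can be sketched in C roughly as follows. This is only an illustration of the rules as stated in this changelog, not MD4C's actual implementation: all function names are hypothetical, and treating the start or end of the input like whitespace is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helpers illustrating the rules; not part of the MD4C API. */

/* Count the consecutive tildes starting at text[pos]. */
size_t tilde_run_length(const char *text, size_t len, size_t pos)
{
    size_t n = 0;

    while (pos + n < len && text[pos + n] == '~')
        ++n;
    return n;
}

/* A run may open a span only if it is one or two tildes long and is not
 * followed by whitespace (or, by assumption, by the end of the input). */
int tilde_can_open(const char *text, size_t len, size_t pos, size_t run_len)
{
    if (run_len < 1 || run_len > 2)
        return 0;
    if (pos + run_len >= len)
        return 0;
    return !isspace((unsigned char)text[pos + run_len]);
}

/* A run may close a span only if it is one or two tildes long and is not
 * preceded by whitespace (or, by assumption, by the start of the input). */
int tilde_can_close(const char *text, size_t pos, size_t run_len)
{
    if (run_len < 1 || run_len > 2)
        return 0;
    if (pos == 0)
        return 0;
    return !isspace((unsigned char)text[pos - 1]);
}

/* The opener and the closer must have the same length. */
int tilde_marks_match(size_t opener_len, size_t closer_len)
{
    return opener_len == closer_len;
}
```

For `~~foo~~`, the run at offset 0 may open and the run at offset 5 may close, and both have length 2, so the sketch accepts the span; `~ foo~` is rejected because whitespace follows the opener.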
 Fixes:
 * Fixed some string length handling in the special `MD4C_USE_UTF16` build.
   (This does not affect you unless you are on Windows and explicitly define
   the macro when building MD4C.)
 * [#100](https://github.com/mity/md4c/issues/100):
   Fixed an off-by-one error in the maximal length limit of some segments
   of e-mail addresses used in autolinks.
 * [#107](https://github.com/mity/md4c/issues/107):
   Fix mis-detection of asterisk-encoded emphasis in some corner cases when
   length of the opener and closer differs, as in `***foo *bar baz***`.
 ## Version 0.4.2
 Fixes:
 * [#98](https://github.com/mity/md4c/issues/98):
   Fix mis-detection of asterisk-encoded emphasis in some corner cases when
   length of the opener and closer differs, as in `**a *b c** d*`.
 ## Version 0.4.1
 Unfortunately, 0.4.0 was released with a badly updated changelog. Fixing
 this is the only change in 0.4.1.
 ## Version 0.4.0
 New features:
 * With `MD_FLAG_LATEXMATHSPANS`, LaTeX math spans (`$...$`) and LaTeX display
   math spans (`$$...$$`) are now recognized. (Note though that the HTML
   renderer outputs them verbatim in a custom `<x-equation>` tag.)
   Contributed by [Tilman Roeder](https://github.com/dyedgreen).
 * With `MD_FLAG_WIKILINKS`, Wiki-style links (`[[...]]`) are now recognized.
   (Note though that the HTML renderer renders them as a custom `<x-wikilink>`
   tag.)
   Contributed by [Nils Blomqvist](https://github.com/niblo).
 Changes:
 * Parsing of tables (with `MD_FLAG_TABLES`) is now closer to the way
   cmark-gfm parses tables, as we no longer require every row of the table to
   contain a pipe `|`.
   As a consequence, paragraphs now cannot interrupt tables. A paragraph which
   follows the table has to be delimited with a blank line.
 Fixes:
 * [#94](https://github.com/mity/md4c/issues/94):
   `md_build_ref_def_hashtable()`: Do not allocate more memory than strictly
   needed.
 * [#95](https://github.com/mity/md4c/issues/95):
   `md_is_container_mark()`: Ordered list mark requires at least one digit.
 * [#96](https://github.com/mity/md4c/issues/96):
   Some fixes for link label comparison.
 ## Version 0.3.4
 Changes:
 * Make Unicode-specific code compliant to Unicode 12.1.
 * Structure `MD_BLOCK_CODE_DETAIL` got new member `fenced_char`. Application
   can use it to detect character used to form the block fences (`` ` `` or
   `~`). In the case of indented code block, it is set to zero.
 Fixes:
 * [#77](https://github.com/mity/md4c/issues/77):
   Fix maximal count of digits for numerical character references, as requested
   by CommonMark specification 0.29.
 * [#78](https://github.com/mity/md4c/issues/78):
   Fix link reference definition label matching for Unicode characters where
   the folding mapping leads to multiple codepoints, as e.g. in `ẞ` -> `SS`.
 * [#83](https://github.com/mity/md4c/issues/83):
   Fix recognition of an empty blockquote which interrupts a paragraph.
 ## Version 0.3.3
 Changes:
 * Make permissive URL autolink and permissive WWW autolink extensions stricter.
   This brings the behavior closer to GFM and mitigates risk of false positives.
   In particular, the domain has to contain at least one dot and parentheses
   can be part of the link destination only if `(` and `)` are balanced.
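The two constraints named above (a dot in the domain, balanced parentheses in the destination) could be checked along these lines. This is a simplified sketch under stated assumptions, not MD4C's autolink code: the function names are hypothetical, and the real extension applies further rules that are not shown here.

```c
#include <stddef.h>

/* Hypothetical helpers; not part of the MD4C API. */

/* Returns 1 if the candidate domain contains at least one dot. */
int domain_has_dot(const char *domain, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        if (domain[i] == '.')
            return 1;
    }
    return 0;
}

/* Returns 1 if every ')' closes an earlier '(' and no '(' stays open. */
int parens_balanced(const char *dest, size_t len)
{
    size_t i;
    size_t depth = 0;

    for (i = 0; i < len; ++i) {
        if (dest[i] == '(') {
            ++depth;
        } else if (dest[i] == ')') {
            if (depth == 0)
                return 0;   /* closer without a matching opener */
            --depth;
        }
    }
    return depth == 0;
}
```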
 Fixes:
 * [#73](https://github.com/mity/md4c/issues/73):
   Some raw HTML inputs could lead to quadratic parsing times.
 * [#74](https://github.com/mity/md4c/issues/74):
   Fix input leading to a crash. Found by fuzzing.
 * [#76](https://github.com/mity/md4c/issues/76):
   Fix handling of parentheses in some corner cases of permissive URL autolink
   and permissive WWW autolink extensions.
 ## Version 0.3.2
 Changes:
 * Changes mandated by CommonMark specification 0.29.
   Most importantly, the white-space trimming rules for code spans have changed.
   At most one space/newline is trimmed from beginning/end of the code span
   (if the code span contains some non-space contents, and if it begins and
   ends with space at the same time). In all other cases the spaces in the code
   span are now left intact.
   Other changes in behavior are in corner cases only. Refer to [CommonMark
   0.29 notes](https://github.com/commonmark/commonmark-spec/releases/tag/0.29)
   for more info.
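The code span trimming rule described above can be illustrated with a small C helper. This is a sketch of the rule as worded in this changelog entry, not the code MD4C uses; the function name and its offset/length interface are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper; not part of the MD4C API.
 *
 * Given the raw contents s[0..len) of a code span, return the length of the
 * trimmed contents and store the offset of the first kept character in
 * *offset. Exactly one space/newline is removed from each end, and only if
 * the span both begins and ends with one and is not made of spaces only. */
size_t trim_code_span(const char *s, size_t len, size_t *offset)
{
    size_t i;
    int has_non_space = 0;

    *offset = 0;
    for (i = 0; i < len; ++i) {
        if (s[i] != ' ' && s[i] != '\n') {
            has_non_space = 1;
            break;
        }
    }
    if (!has_non_space)
        return len;     /* empty or all spaces: left intact */

    if ((s[0] == ' ' || s[0] == '\n') &&
        (s[len - 1] == ' ' || s[len - 1] == '\n')) {
        *offset = 1;
        return len - 2;
    }
    return len;         /* space on one side only: left intact */
}
```

With this sketch, the contents ` foo ` shrink to `foo`, while a span with two spaces on each side loses only one from each end.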
 Fixes:
 * [#68](https://github.com/mity/md4c/issues/68):
   Some specific HTML blocks were not recognized when EOF follows without any
   end-of-line character.
 * [#69](https://github.com/mity/md4c/issues/69):
   Strike-through span not working correctly when its opener mark is directly
   followed by other opener mark; or when other closer mark directly precedes
   its closer mark.
 ## Version 0.3.1
 Fixes:
 * [#58](https://github.com/mity/md4c/issues/58),
   [#59](https://github.com/mity/md4c/issues/59),
   [#60](https://github.com/mity/md4c/issues/60),
   [#63](https://github.com/mity/md4c/issues/63),
   [#66](https://github.com/mity/md4c/issues/66):
   Some inputs could lead to quadratic parsing times. Thanks to Anders Kaseorg
   for finding all those issues.
 * [#61](https://github.com/mity/md4c/issues/61):
   Flag `MD_FLAG_NOHTMLSPANS` erroneously affected also recognition of
   CommonMark autolinks.
 ## Version 0.3.0
 New features:
 * Add extension for GitHub-style task lists:
   ```
    * [x] foo
    * [x] bar
    * [ ] baz
   ```
   (It has to be explicitly enabled with `MD_FLAG_TASKLISTS`.)
 * Added support for building as a shared library. On non-Windows platforms,
   this is now default behavior; on Windows static library is still the default.
   The CMake option `BUILD_SHARED_LIBS` can be used to request one or the other
   explicitly.
   Contributed by Lisandro Damián Nicanor Pérez Meyer.
 * Renamed structure `MD_RENDERER` to `MD_PARSER` and refactored its contents
   a little bit. Note this is source-level incompatible and initialization code
   in apps may need to be updated.
   The aim of the change is to be more friendly for long-term ABI compatibility
   we shall maintain, starting with this release.
 * Added `CHANGELOG.md` (this file).
 * Make sure `md_process_table_row()` reports the same count of table cells for
   all table rows, no matter how broken the input is. The cell count is derived
   from table underline line. Bogus cells in other rows are silently ignored.
   Missing cells in other rows are reported as empty ones.
 Fixes:
 * CID 1475544:
   Calling `md_free_attribute()` on uninitialized data.
 * [#47](https://github.com/mity/md4c/issues/47):
   Using bad offsets in `md_is_entity_str()`, in some cases leading to buffer
   overflow.
 * [#51](https://github.com/mity/md4c/issues/51):
   Segfault in `md_process_table_cell()`.
 * [#53](https://github.com/mity/md4c/issues/53):
   With `MD_FLAG_PERMISSIVEURLAUTOLINKS` or `MD_FLAG_PERMISSIVEWWWAUTOLINKS`
   we could generate bad output for ordinary Markdown links, if a non-space
   character immediately follows like e.g. in `[link](http://github.com)X`.
 ## Version 0.2.7
 This was the last version before the changelog was added.
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -0,0 +1,56 @@
 cmake_minimum_required(VERSION 3.4)
 project(MD4C C)
 set(MD_VERSION_MAJOR 0)
 set(MD_VERSION_MINOR 4)
 set(MD_VERSION_RELEASE 3)
 set(MD_VERSION "${MD_VERSION_MAJOR}.${MD_VERSION_MINOR}.${MD_VERSION_RELEASE}")
 if(WIN32)
    # On Windows, given there is no standard lib install dir etc., we rather
    # by default build static lib.
    option(BUILD_SHARED_LIBS "help string describing option" OFF)
 else()
    # On Linux, MD4C is slowly being added to some distros which prefer
    # shared lib.
    option(BUILD_SHARED_LIBS "help string describing option" ON)
 endif()
 add_definitions(
    -DMD_VERSION_MAJOR=${MD_VERSION_MAJOR}
    -DMD_VERSION_MINOR=${MD_VERSION_MINOR}
    -DMD_VERSION_RELEASE=${MD_VERSION_RELEASE}
 )
 set(CMAKE_CONFIGURATION_TYPES Debug Release RelWithDebInfo MinSizeRel)
 if("${CMAKE_BUILD_TYPE}" STREQUAL "")
    set(CMAKE_BUILD_TYPE $ENV{CMAKE_BUILD_TYPE})
    if("${CMAKE_BUILD_TYPE}" STREQUAL "")
        set(CMAKE_BUILD_TYPE "Release")
    endif()
 endif()
 if(${CMAKE_C_COMPILER_ID} MATCHES GNU|Clang)
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
 elseif(MSVC)
    # Disable warnings about the so-called unsecured functions:
    add_definitions(/D_CRT_SECURE_NO_WARNINGS)
    # Specify proper C runtime library:
    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE}")
    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO}")
    string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL}")
    set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /MTd")
    set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /MT")
    set(CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO} /MT")
    set(CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL} /MT")
 endif()
 include(GNUInstallDirs)
 add_subdirectory(md4c)
 add_subdirectory(md2html)
--- a/LICENSE.md
+++ b/LICENSE.md
@ -0,0 +1,22 @@
 # The MIT License (MIT)
 Copyright © 2016-2020 Martin Mitáš
 Permission is hereby granted, free of charge, to any person obtaining a
 copy of this software and associated documentation files (the “Software”),
 to deal in the Software without restriction, including without limitation
 the rights to use, copy, modify, merge, publish, distribute, sublicense,
 and/or sell copies of the Software, and to permit persons to whom the
 Software is furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included
 in all copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS
 OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 IN THE SOFTWARE.
--- a/README.md
+++ b/README.md
@ -0,0 +1,286 @@
 [![Linux Build Status (travis-ci.com)](https://img.shields.io/travis/mity/md4c/master.svg?logo=linux&label=linux%20build)](https://travis-ci.org/mity/md4c)
 [![Windows Build Status (appveyor.com)](https://img.shields.io/appveyor/ci/mity/md4c/master.svg?logo=windows&label=windows%20build)](https://ci.appveyor.com/project/mity/md4c/branch/master)
 [![Code Coverage Status (codecov.io)](https://img.shields.io/codecov/c/github/mity/md4c/master.svg?logo=codecov&label=code%20coverage)](https://codecov.io/github/mity/md4c)
 [![Coverity Scan Status](https://img.shields.io/coverity/scan/mity-md4c.svg?label=coverity%20scan)](https://scan.coverity.com/projects/mity-md4c)
 # MD4C Readme
 * Home: http://github.com/mity/md4c
 * Wiki: http://github.com/mity/md4c/wiki
 * Issue tracker: http://github.com/mity/md4c/issues
 MD4C stands for "Markdown for C" and that's exactly what this project is about.
 ## What is Markdown
 In short, Markdown is the markup language this `README.md` file is written in.
 The following resources can explain more if you are unfamiliar with it:
 * [Wikipedia article](http://en.wikipedia.org/wiki/Markdown)
 * [CommonMark site](http://commonmark.org)
 ## What is MD4C
 MD4C is a C Markdown parser with the following features:
 * **Compliance:** Generally MD4C aims to be compliant to the latest version of
  [CommonMark specification](http://spec.commonmark.org/). Currently, we are
  fully compliant to CommonMark 0.29.
 * **Extensions:** MD4C supports some commonly requested and accepted extensions.
  See below.
 * **Compactness:** MD4C is implemented in one source file and one header file.
  There are no dependencies other than the standard C library.
 * **Embedding:** MD4C is easy to reuse in other projects, its API is very
  straightforward: There is actually just one function, `md_parse()`.
 * **Push model:** MD4C parses the complete document and calls a few callback
  functions provided by the application to inform it about the start/end of
  every block, the start/end of every span, and any textual contents.
 * **Portability:** MD4C builds and works on Windows and POSIX-compliant OSes.
  (It should be simple to make it run also on most other platforms, at least as
  long as the platform provides C standard library, including a heap memory
  management.)
 * **Encoding:** MD4C can be compiled to recognize ASCII-only control characters,
  UTF-8 and, on Windows, also UTF-16 (i.e. what is on Windows commonly called
  just "Unicode"). See more details below.
 * **Permissive license:** MD4C is available under the MIT license.
 * **Performance:** MD4C is [very fast](https://talk.commonmark.org/t/2520).
 ## Using MD4C
 An application has to include the header `md4c.h` and link against the MD4C
 library; alternatively, it may include `md4c.h` and `md4c.c` directly in its
 source base, as the parser is implemented in a single C source file.
 The main provided function is `md_parse()`. It takes a text in the Markdown
 syntax and a pointer to a structure which provides pointers to several callback
 functions.
 As `md_parse()` processes the input, it calls the callbacks (when entering or
 leaving any Markdown block or span; and when outputting any textual content of
 the document), allowing application to convert it into another format or render
 it onto the screen.
 An example implementation of simple renderer is available in the `md2html`
 directory which implements a conversion utility from Markdown to HTML.
 ## Markdown Extensions
 The default behavior is to recognize only Markdown syntax defined by the
 [CommonMark specification](http://spec.commonmark.org/).
 However with appropriate flags, the behavior can be tuned to enable some
 additional extensions:
 * With the flag `MD_FLAG_COLLAPSEWHITESPACE`, a non-trivial whitespace is
  collapsed into a single space.
 * With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported.
 * With the flag `MD_FLAG_TASKLISTS`, GitHub-style task lists are supported.
 * With the flag `MD_FLAG_STRIKETHROUGH`, strike-through spans are enabled
  (text enclosed in tilde marks, e.g. `~foo bar~`).
 * With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS` permissive URL autolinks
  (not enclosed in `<` and `>`) are supported.
 * With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, permissive e-mail
  autolinks (not enclosed in `<` and `>`) are supported.
 * With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS` permissive WWW autolinks
  without any scheme specified (e.g. `www.example.com`) are supported. MD4C
  then assumes `http:` scheme.
 * With the flag `MD_FLAG_LATEXMATHSPANS` LaTeX math spans (`$...$`) and
  LaTeX display math spans (`$$...$$`) are supported. (Note though that the
  HTML renderer outputs them verbatim in a custom tag `<x-equation>`.)
 * With the flag `MD_FLAG_WIKILINKS`, wiki-style links (`[[link label]]` and
  `[[target article|link label]]`) are supported. (Note that the HTML renderer
  outputs them in a custom tag `<x-wikilink>`.)
 * With the flag `MD_FLAG_UNDERLINE`, underscore (`_`) denotes an underline
  instead of an ordinary emphasis or strong emphasis.
 A few features of CommonMark (those some people see as mis-features) may be
 disabled:
 * With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTMLBLOCKS`, raw inline
  HTML or raw HTML blocks respectively are disabled.
 * With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are
  disabled.
 ## Input/Output Encoding
 The CommonMark specification generally assumes UTF-8 input, but under closer
 inspection, Unicode plays a role in only a few very specific situations when
 parsing Markdown documents:
 1. For detection of word boundaries when processing emphasis and strong
   emphasis, some classification of Unicode characters (whether it is
   a whitespace or a punctuation) is needed.
 2. For (case-insensitive) matching of a link reference label with the
   corresponding link reference definition, Unicode case folding is used.
 3. For translating HTML entities (e.g. `&amp;`) and numeric character
   references (e.g. `&#35;` or `&#xcab;`) into their Unicode equivalents.
   However, MD4C leaves this translation to the renderer/application, as the
   renderer is supposed to really know the output encoding and whether it really
   needs to perform this kind of translation. (For example, when the renderer
   outputs HTML, it may leave the entities untranslated and defer the work to
   a web browser.)
 MD4C relies on this property of the CommonMark and the implementation is, to
 a large degree, encoding-agnostic. Most of MD4C code only assumes that the
 encoding of your choice is compatible with ASCII, i.e. that the codepoints
 below 128 have the same numeric values as ASCII.
 Any input MD4C does not understand is simply seen as part of the document text
 and sent to the renderer's callback functions unchanged.
 The two situations (word boundary detection and link reference matching) where
 MD4C has to understand Unicode are handled as specified by the following rules:
 * If preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8 for the
  word boundary detection and for the case-insensitive matching of link labels.
  When none of these macros is explicitly used, this is the default behavior.
 * On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses
  `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.
  (UTF-16 is what Windows developers usually call just "Unicode" and what
  Win32API generally works with.)
  Note that because this macro affects also the types in `md4c.h`, you have
  to define the macro both when building MD4C as well as when including
  `md4c.h`.
  Also note this is only supported in the parser (`md4c.[hc]`). The HTML
  renderer does not support this and you will have to write your own custom
  renderer to use this feature.
 * If preprocessor macro `MD4C_USE_ASCII` is defined, MD4C assumes nothing but
  an ASCII input.
  That effectively means that non-ASCII whitespace or punctuation characters
  won't be recognized as such and that link reference matching will work in
  a case-insensitive way only for ASCII letters (`[a-zA-Z]`).
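To make the `MD4C_USE_ASCII` behavior described above more concrete, the following sketch compares two labels while folding only the ASCII letters. It is an illustration under stated assumptions, not MD4C's code: the function names are hypothetical, and the whitespace normalization that CommonMark also applies to link labels is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical helpers; not part of the MD4C API. */

/* Fold ASCII upper-case letters to lower case; leave all other bytes alone. */
static unsigned char ascii_fold(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c + ('a' - 'A'));
    return c;
}

/* Returns 1 if both labels are equal when only [a-zA-Z] is compared
 * case-insensitively; any non-ASCII byte must match exactly. */
int ascii_label_equal(const char *a, size_t a_len,
                      const char *b, size_t b_len)
{
    size_t i;

    if (a_len != b_len)
        return 0;
    for (i = 0; i < a_len; ++i) {
        if (ascii_fold((unsigned char)a[i]) != ascii_fold((unsigned char)b[i]))
            return 0;
    }
    return 1;
}
```

In such a mode, `Foo` and `fOO` match, but the UTF-8 encodings of `Ä` and `ä` do not, because their bytes differ and are never folded.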
 ## Documentation
 The API is quite well documented in the comments in the `md4c.h` header.
 There is also a [project wiki](http://github.com/mity/md4c/wiki) which provides
 some more comprehensive documentation. However, note that it is incomplete and
 some details may be a little bit outdated.
 ## FAQ
 **Q: In my code, I need to convert Markdown to HTML. How?**
 **A:** Indeed the API, as provided by `md4c.h`, is just a SAX-like Markdown
 parser. Nothing more and nothing less.
 That said, there is a complete HTML generator built on top of the parser in the
 directory `md2html` (the files `render_html.[hc]` and `entity.[hc]`). At this
 time, you have to directly reuse that code in your project.
 There is [some discussion](https://github.com/mity/md4c/issues/82) whether this
 should be changed (and how) in the future.
 **Q: How does MD4C compare to a parser XY?**
 **A:** Some other implementations combine Markdown parser and HTML generator
 into a single entangled code hidden behind an interface which just allows the
 conversion from Markdown to HTML, and they are unusable if you want to process
 the input in any other way.
 Even when the parsing is available as a standalone feature, most parsers (if
 not all of them; at least within the scope of C/C++ language) are full DOM-like
 parsers: They construct abstract syntax tree (AST) representation of the whole
 Markdown document. That takes time and it leads to bigger memory footprint.
 It's completely fine as long as you really need it. If you don't need the full
 AST, there is very high chance that using MD4C will be faster and much less
 memory-hungry.
 Last but not least, some Markdown parsers are implemented in a naive way. When
 fed with a [smartly crafted input pattern](test/pathological_tests.py), they
 may exhibit quadratic (or even worse) parsing times. What MD4C can still parse
 in a fraction of second may turn into long minutes or possibly hours with them.
 Hence, when such a naive parser is used to process an input from an untrusted
 source, the possibility of denial-of-service attacks becomes a real danger.
 A lot of our effort went into providing linear parsing times no matter what
 kind of crazy input MD4C parser is fed with. (If you encounter an input pattern
 which leads to super-linear parsing times, please do not hesitate to report it
 as a bug.)
 **Q: Does MD4C perform any input validation?**
 **A:** No.
 CommonMark specification declares that any sequence of (Unicode) characters is
 a valid Markdown document; i.e. that it does not matter whether some Markdown
 syntax is in some way broken or not. If it is broken, it will simply not be
 recognized and the parser should see the broken syntax construction just as a
 verbatim text.
 MD4C takes this a step further. It sees any sequence of bytes as a valid input,
 following completely the GIGO philosophy (garbage in, garbage out).
 If you need to validate that the input is, say, a valid UTF-8 document, you
 have to do it on your own. You can simply validate the whole Markdown document
 before passing it to the MD4C parser.
 Alternatively, you may perform the validation on the fly during the parsing,
 in the `MD_PARSER::text()` callback. (Given how MD4C works internally, it will
 never break a sequence of bytes into multiple calls of `MD_PARSER::text()`,
 unless that sequence is already broken to multiple pieces in the input by some
 whitespace, new line character(s) and/or any Markdown syntax construction.)
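As a starting point for validating the whole document before parsing, a UTF-8 check might look like the sketch below. The function `is_valid_utf8()` is a hypothetical helper, not something MD4C provides; it rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates and code points above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper; not part of the MD4C API.
 * Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const char *buf, size_t len)
{
    const unsigned char *s = (const unsigned char *)buf;
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* number of continuation bytes expected */
        size_t k;
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {
            ++i;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return 0;       /* sequence is truncated */

        for (k = 1; k <= n; ++k) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFFUL || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;       /* overlong, out of range, or surrogate */

        i += n + 1;
    }
    return 1;
}
```

An application could call such a function on the input buffer and only hand the buffer to `md_parse()` when it returns 1.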
 ## License
 MD4C is covered by the MIT license, see the file `LICENSE.md`.
 ## Links to Related Projects
 Ports and bindings to other languages:
 * [commonmark-d](https://github.com/AuburnSounds/commonmark-d):
  Port of MD4C to D language.
 * [markdown-wasm](https://github.com/rsms/markdown-wasm):
  Markdown parser and HTML generator for WebAssembly, based on MD4C.
 Software using MD4C:
 * [Qt](https://www.qt.io/):
  Cross-platform C++ GUI framework.
 * [Textosaurus](https://github.com/martinrotter/textosaurus):
  Cross-platform text editor based on Qt and Scintilla.
 * [8th](https://8th-dev.com/):
  Cross-platform concatenative programming language.
--- a/appveyor.yml
+++ b/appveyor.yml
@ -0,0 +1,29 @@
 # YAML definition for Appveyor.com continuous integration.
 # See http://www.appveyor.com/docs/appveyor-yml
 version: '{branch}-{build}'
 before_build:
  - 'cmake --version'
  - 'if "%PLATFORM%"=="x64" cmake -G "Visual Studio 12 Win64" .'
  - 'if not "%PLATFORM%"=="x64" cmake -G "Visual Studio 12" .'
 build:
  project: md4c.sln
  verbosity: detailed
 skip_tags: true
 os:
  - Windows Server 2012 R2
 configuration:
  - Debug
  - Release
 platform:
  - x64    # 64-bit build
  - win32  # 32-bit build
 artifacts:
  - path: $(configuration)/md2html/md2html.exe
--- a/codecov.yml
+++ b/codecov.yml
@ -0,0 +1,4 @@
 # YAML definition for codecov.io code coverage reports.
 ignore:
    - "md2html"
--- a/md2html/CMakeLists.txt
+++ b/md2html/CMakeLists.txt
@ -0,0 +1,15 @@
 include_directories("${PROJECT_SOURCE_DIR}/md4c")
 add_executable(md2html cmdline.c cmdline.h entity.c entity.h md2html.c render_html.c render_html.h)
 target_link_libraries(md2html md4c)
 install(
    TARGETS md2html
    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
    LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
    RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
    PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
 )
 install(FILES "md2html.1" DESTINATION "${CMAKE_INSTALL_MANDIR}/man1")
--- a/md2html/cmdline.c
+++ b/md2html/cmdline.c
@ -0,0 +1,296 @@
 /* cmdline.c: a reentrant version of getopt(). Written 2006 by Brian
 * Raiter. This code is in the public domain.
 */
 #include	<stdio.h>
 #include	<stdlib.h>
 #include	<string.h>
 #include	<ctype.h>
 #include	"cmdline.h"
 #define	docallback(opt, val) \
 	    do { if ((r = callback(opt, val, data)) != 0) return r; } while (0)
 /* Parse the given cmdline arguments.
 */
 int readoptions(option const* list, int argc, char **argv,
 		int (*callback)(int, char const*, void*), void *data)
 {
    char		argstring[] = "--";
    option const       *opt;
    char const	       *val;
    char const	       *p;
    int			stop = 0;
    int			argi, len, r;
    if (!list || !callback)
 	return -1;
    for (argi = 1 ; argi < argc ; ++argi)
    {
 	/* First, check for "--", which forces all remaining arguments
 	 * to be treated as non-options.
 	 */
 	if (!stop && argv[argi][0] == '-' && argv[argi][1] == '-'
 					  && argv[argi][2] == '\0') {
 	    stop = 1;
 	    continue;
 	}
 	/* Arguments that do not begin with '-' (or are only "-") are
 	 * not options.
 	 */
 	if (stop || argv[argi][0] != '-' || argv[argi][1] == '\0') {
 	    docallback(0, argv[argi]);
 	    continue;
 	}
 	if (argv[argi][1] == '-')
 	{
 	    /* Arguments that begin with a double-dash are long
 	     * options.
 	     */
 	    p = argv[argi] + 2;
 	    val = strchr(p, '=');
 	    if (val)
 		len = val++ - p;
 	    else
 		len = strlen(p);
 	    /* Is it on the list of valid options? If so, does it
 	     * expect a parameter?
 	     */
 	    for (opt = list ; opt->optval ; ++opt)
 		if (opt->name && !strncmp(p, opt->name, len)
 			      && !opt->name[len])
 		    break;
 	    if (!opt->optval) {
 		docallback('?', argv[argi]);
 	    } else if (!val && opt->arg == 1) {
 		docallback(':', argv[argi]);
 	    } else if (val && opt->arg == 0) {
 		docallback('=', argv[argi]);
 	    } else {
 		docallback(opt->optval, val);
 	    }
 	}
 	else
 	{
 	    /* Arguments that begin with a single dash contain one or
 	     * more short options. Each character in the argument is
 	     * examined in turn, unless a parameter consumes the rest
 	     * of the argument (or possibly even the following
 	     * argument).
 	     */
 	    for (p = argv[argi] + 1 ; *p ; ++p) {
 		for (opt = list ; opt->optval ; ++opt)
 		    if (opt->chname == *p)
 			break;
 		if (!opt->optval) {
 		    argstring[1] = *p;
 		    docallback('?', argstring);
 		    continue;
 		} else if (opt->arg == 0) {
 		    docallback(opt->optval, NULL);
 		    continue;
 		} else if (p[1]) {
 		    docallback(opt->optval, p + 1);
 		    break;
 		} else if (argi + 1 < argc && strcmp(argv[argi + 1], "--")) {
 		    ++argi;
 		    docallback(opt->optval, argv[argi]);
 		    break;
 		} else if (opt->arg == 2) {
 		    docallback(opt->optval, NULL);
 		    continue;
 		} else {
 		    argstring[1] = *p;
 		    docallback(':', argstring);
 		    break;
 		}
 	    }
 	}
    }
    return 0;
 }
 /* Verify that str points to an ASCII zero or one (optionally with
 * whitespace) and return the value present, or -1 if str's contents
 * are anything else.
 */
 static int readboolvalue(char const *str)
 {
    char	d;
    while (isspace(*str))
 	++str;
    if (!*str)
 	return -1;
    d = *str++;
    while (isspace(*str))
 	++str;
    if (*str)
 	return -1;
    if (d == '0')
 	return 0;
    else if (d == '1')
 	return 1;
    else
 	return -1;
 }
 /* Parse a configuration file.
 */
 int readcfgfile(option const* list, FILE *fp,
 		int (*callback)(int, char const*, void*), void *data)
 {
    char		buf[1024];
    option const       *opt;
    char	       *name, *val, *p;
    int			len, f, r;
    while (fgets(buf, sizeof buf, fp) != NULL)
    {
 	/* Strip off the trailing newline and any leading whitespace.
 	 * If the line begins with a hash sign, skip it entirely.
 	 */
 	len = strlen(buf);
 	if (len && buf[len - 1] == '\n')
 	    buf[--len] = '\0';
 	for (p = buf ; isspace(*p) ; ++p) ;
 	if (!*p || *p == '#')
 	    continue;
 	/* Find the end of the option's name and the beginning of the
 	 * parameter, if any.
 	 */
 	for (name = p ; *p && *p != '=' && !isspace(*p) ; ++p) ;
 	len = p - name;
 	for ( ; *p == '=' || isspace(*p) ; ++p) ;
 	val = p;
 	/* Is it on the list of valid options? Does it take a
 	 * full parameter, or just an optional boolean?
 	 */
 	for (opt = list ; opt->optval ; ++opt)
 	    if (opt->name && !strncmp(name, opt->name, len)
 			  && !opt->name[len])
 		    break;
 	if (!opt->optval) {
 	    docallback('?', name);
 	} else if (!*val && opt->arg == 1) {
 	    docallback(':', name);
 	} else if (*val && opt->arg == 0) {
 	    f = readboolvalue(val);
 	    if (f < 0)
 		docallback('=', name);
 	    else if (f == 1)
 		docallback(opt->optval, NULL);
 	} else {
 	    docallback(opt->optval, val);
 	}
    }
    return ferror(fp) ? -1 : 0;
 }
 /* Turn a string containing a cmdline into an argc-argv pair.
 */
 int makecmdline(char const *cmdline, int *argcp, char ***argvp)
 {
    char      **argv;
    int		argc;
    char const *s;
    int		n, quoted;
    if (!cmdline)
 	return 0;
    /* Calculate argc by counting the number of "clumps" of non-spaces.
     */
    for (s = cmdline ; isspace(*s) ; ++s) ;
    if (!*s) {
 	*argcp = 1;
 	if (argvp) {
 	    *argvp = malloc(2 * sizeof(char*));
 	    if (!*argvp)
 		return 0;
 	    (*argvp)[0] = NULL;
 	    (*argvp)[1] = NULL;
 	}
 	return 1;
    }
    for (argc = 2, quoted = 0 ; *s ; ++s) {
 	if (quoted == '"') {
 	    if (*s == '"')
 		quoted = 0;
 	    else if (*s == '\\' && s[1])
 		++s;
 	} else if (quoted == '\'') {
 	    if (*s == '\'')
 		quoted = 0;
 	} else {
 	    if (isspace(*s)) {
 		for ( ; isspace(s[1]) ; ++s) ;
 		if (!s[1])
 		    break;
 		++argc;
 	    } else if (*s == '"' || *s == '\'') {
 		quoted = *s;
 	    }
 	}
    }
    *argcp = argc;
    if (!argvp)
 	return 1;
    /* Allocate space for all the arguments and their pointers.
     */
    argv = malloc((argc + 1) * sizeof(char*) + strlen(cmdline) + 1);
    *argvp = argv;
    if (!argv)
 	return 0;
    argv[0] = NULL;
    argv[1] = (char*)(argv + argc + 1);
    /* Copy the string into the allocated memory immediately after the
     * argv array. Where a space immediately follows a nonspace,
     * replace it with a \0. Where a nonspace immediately follows
     * spaces, store a pointer to it. (Except, of course, when the
     * space-nonspace transitions occur within quotes.)
     */
    for (s = cmdline ; isspace(*s) ; ++s) ;
    for (argc = 1, n = 0, quoted = 0 ; *s ; ++s) {
 	if (quoted == '"') {
 	    if (*s == '"') {
 		quoted = 0;
 	    } else {
 		if (*s == '\\' && s[1])
 		    ++s;
 		argv[argc][n++] = *s;
 	    }
 	} else if (quoted == '\'') {
 	    if (*s == '\'')
 		quoted = 0;
 	    else
 		argv[argc][n++] = *s;
 	} else {
 	    if (isspace(*s)) {
 		argv[argc][n] = '\0';
 		for ( ; isspace(s[1]) ; ++s) ;
 		if (!s[1])
 		    break;
 		argv[argc + 1] = argv[argc] + n + 1;
 		++argc;
 		n = 0;
 	    } else {
 		if (*s == '"' || *s == '\'')
 		    quoted = *s;
 		else
 		    argv[argc][n++] = *s;
 	    }
 	}
    }
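    /* Terminate the final argument; the loop above only writes a \0
     * when an argument is followed by whitespace.
     */
    argv[argc][n] = '\0';
    argv[argc + 1] = NULL;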
    return 1;
 }
--- a/md2html/cmdline.h
+++ b/md2html/cmdline.h
@ -0,0 +1,86 @@
 /* cmdline.h: a reentrant version of getopt(). Written 2006 by Brian
 * Raiter. This code is in the public domain.
 */
 #ifndef	_cmdline_h_
 #define	_cmdline_h_
 /* The information specifying a single cmdline option.
 */
 typedef struct option {
    char const *name;		/* the option's long name, or "" if none */
    char	chname;		/* a single-char name, or zero if none */
    int		optval;		/* a unique value representing this option */
    int		arg;		/* 0 = no arg, 1 = arg req'd, 2 = optional */
 } option;
 /* Parse the given cmdline arguments. list is an array of option
 * structs, each entry specifying a valid option. The last struct in
 * the array must have name set to NULL. argc and argv give the
 * cmdline to parse. callback is the function to call for each option
 * and non-option found on the cmdline. data is a pointer that is
 * passed to each invocation of callback. The return value of callback
 * should be zero to continue processing the cmdline, or any other
 * value to abort. The return value of readoptions() is the value
 * returned from the last callback, or zero if no arguments were
 * found, or -1 if an error occurred.
 *
 * When readoptions() encounters a regular cmdline argument (i.e. a
 * non-option argument), callback() is invoked with opt equal to zero
 * and val pointing to the argument. When an option is found,
 * callback() is invoked with opt equal to the optval field in the
 * option struct corresponding to that option, and val points to the
 * option's parameter, or is NULL if the option does not take a
 * parameter. If readoptions() finds an option that does not appear in
 * the list of valid options, callback() is invoked with opt equal to
 * '?'. If readoptions() encounters an option that is missing its
 * required parameter, callback() is invoked with opt equal to ':'. If
 * readoptions() finds a parameter on a long option that does not
 * admit a parameter, callback() is invoked with opt equal to '='. In
 * each of these cases, val will point to the erroneous option
 * argument.
 */
 extern int readoptions(option const* list, int argc, char **argv,
 		       int (*callback)(int opt, char const *val, void *data),
 		       void *data);
 /* Parse the given file. list is an array of option structs, in the
 * same form as taken by readoptions(). fp is a pointer to an open
 * text file. callback is the function to call for each line found in
 * the configuration file. data is a pointer that is passed to each
 * invocation of callback. The return value of readcfgfile() is the
 * value returned from the last callback, or zero if no arguments were
 * found, or -1 if an error occurred while reading the file.
 *
 * The function will ignore lines that contain only whitespace, or
 * lines that begin with a hash sign. All other lines should be of the
 * form "OPTION=VALUE", where OPTION is one of the long options in
 * list. Whitespace around the equal sign is permitted. An option that
 * takes no arguments can either have a VALUE of 0 or 1, or omit the
 * "=VALUE" entirely. (A VALUE of 0 will behave the same as if the
 * line was not present.)
 */
 extern int readcfgfile(option const* list, FILE *fp,
 		       int (*callback)(int opt, char const *val, void *data),
 		       void *data);
 /* Create an argc-argv pair from a string containing a command line.
 * cmdline is the string to be parsed. argcp points to the variable to
 * receive the argc value, and argvp points to the variable to receive
 * the argv value. argvp can be NULL if the caller just wants to get
 * argc. Zero is returned on failure. This function allocates memory
 * on behalf of the caller. The memory is allocated as a single block,
 * so it is sufficient to simply free() the pointer returned through
 * argvp. Note that argv[0] will always be initialized to NULL; the
 * first argument will be stored in argv[1]. The string is parsed by
 * separating arguments on whitespace boundaries. Space within
 * substrings enclosed in single-quotes is ignored. A substring
 * enclosed in double-quotes is treated the same, except that the
 * backslash is recognized as an escape character within such a
 * substring. Enclosing quotes and escaping backslashes are not copied
 * into the argv values.
 */
 extern int makecmdline(char const *cmdline, int *argcp, char ***argvp);
 #endif
--- a/md2html/entity.c
+++ b/md2html/entity.c
--- a/md2html/entity.h
+++ b/md2html/entity.h
@ -0,0 +1,42 @@
 /*
 * MD4C: Markdown parser for C
 * (http://github.com/mity/md4c)
 *
 * Copyright (c) 2016-2017 Martin Mitas
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */
 #ifndef MD2HTML_ENTITY_H
 #define MD2HTML_ENTITY_H
 #include <stdlib.h>
 /* Most entities consist of a single Unicode codepoint; a few consist of two.
 * Single-codepoint entities have codepoints[1] set to zero. */
 struct entity {
    const char* name;
    unsigned codepoints[2];
 };
 const struct entity* entity_lookup(const char* name, size_t name_size);
 #endif  /* MD2HTML_ENTITY_H */
--- a/md2html/md2html.1
+++ b/md2html/md2html.1
@ -0,0 +1,113 @@
 .TH MD2HTML 1 "June 2019" "" "General Commands Manual"
 .nh
 .ad l
 .
 .SH NAME
 .
 md2html \- convert Markdown to HTML
 .
 .SH SYNOPSIS
 .
 .B md2html
 .RI [ OPTION ]...\&
 .RI [ FILE ]
 .
 .SH OPTIONS
 .
 .SS General options:
 .
 .TP
 .BR -o ", " --output= \fIOUTFILE\fR
 Write output to \fIOUTFILE\fR instead of \fBstdout\fR(3)
 .
 .TP
 .BR -f ", " --full-html
 Generate full HTML document, including header
 .
 .TP
 .BR -s ", " --stat
 Measure time of input parsing
 .
 .TP
 .BR -h ", " --help
 Display help and exit
 .
 .TP
 .BR -v ", " --version
 Display version and exit
 .
 .SS Markdown dialect options:
 .
 .TP
 .B --commonmark
 CommonMark (the default)
 .
 .TP
 .B --github
 GitHub Flavored Markdown
 .
 .PP
 Note: dialect options are equivalent to some combination of flags below.
 .
 .SS Markdown extension options:
 .
 .TP
 .B --fcollapse-whitespace
 Collapse non-trivial whitespace
 .
 .TP
 .B --fverbatim-entities
 Do not translate entities
 .
 .TP
 .B --fpermissive-atx-headers
 Allow ATX headers without delimiting space
 .
 .TP
 .B --fpermissive-url-autolinks
 Allow URL autolinks without "<" and ">" delimiters
 .
 .TP
 .B --fpermissive-www-autolinks
 Allow WWW autolinks without any scheme (e.g. "www.example.com")
 .
 .TP
 .B --fpermissive-email-autolinks
 Allow e-mail autolinks without "<", ">" and "mailto:"
 .
 .TP
 .B --fpermissive-autolinks
 Enable all 3 of the above permissive autolinks options
 .
 .TP
 .B --fno-indented-code
 Disable indented code blocks
 .
 .TP
 .B --fno-html-blocks
 Disable raw HTML blocks
 .
 .TP
 .B --fno-html-spans
 Disable raw HTML spans
 .
 .TP
 .B --fno-html
 Same as \fB--fno-html-blocks --fno-html-spans\fR
 .
 .TP
 .B --ftables
 Enable tables
 .
 .TP
 .B --fstrikethrough
 Enable strikethrough spans
 .
 .TP
 .B --ftasklists
 Enable task lists
 .
 .SH SEE ALSO
 .
 https://github.com/mity/md4c
 .
--- a/md2html/md2html.c
+++ b/md2html/md2html.c
@ -0,0 +1,371 @@
 /*
 * MD4C: Markdown parser for C
 * (http://github.com/mity/md4c)
 *
 * Copyright (c) 2016-2017 Martin Mitas
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <time.h>
 #include "render_html.h"
 #include "cmdline.h"
 /* Global options. */
 static unsigned parser_flags = 0;
 static unsigned renderer_flags = MD_RENDER_FLAG_DEBUG;
 static int want_fullhtml = 0;
 static int want_stat = 0;
 /*********************************
 ***  Simple grow-able buffer  ***
 *********************************/
 /* We render to a memory buffer instead of writing the output directly,
 * so that this utility can be used to evaluate the performance of MD4C
 * (--stat option): it lets us measure only the time spent in the parser,
 * without the I/O.
 */
 struct membuffer {
    char* data;
    size_t asize;
    size_t size;
 };
 static void
 membuf_init(struct membuffer* buf, MD_SIZE new_asize)
 {
    buf->size = 0;
    buf->asize = new_asize;
    buf->data = malloc(buf->asize);
    if(buf->data == NULL) {
        fprintf(stderr, "membuf_init: malloc() failed.\n");
        exit(1);
    }
 }
 static void
 membuf_fini(struct membuffer* buf)
 {
    if(buf->data)
        free(buf->data);
 }
 static void
 membuf_grow(struct membuffer* buf, size_t new_asize)
 {
    buf->data = realloc(buf->data, new_asize);
    if(buf->data == NULL) {
        fprintf(stderr, "membuf_grow: realloc() failed.\n");
        exit(1);
    }
    buf->asize = new_asize;
 }
 static void
 membuf_append(struct membuffer* buf, const char* data, MD_SIZE size)
 {
    if(buf->asize < buf->size + size)
        membuf_grow(buf, buf->size + buf->size / 2 + size);
    memcpy(buf->data + buf->size, data, size);
    buf->size += size;
 }
 /**********************
 ***  Main program  ***
 **********************/
 static void
 process_output(const MD_CHAR* text, MD_SIZE size, void* userdata)
 {
    membuf_append((struct membuffer*) userdata, text, size);
 }
 static int
 process_file(FILE* in, FILE* out)
 {
    MD_SIZE n;
    struct membuffer buf_in = {0};
    struct membuffer buf_out = {0};
    int ret = -1;
    clock_t t0, t1;
    membuf_init(&buf_in, 32 * 1024);
    /* Read the input file into a buffer. */
    while(1) {
        if(buf_in.size >= buf_in.asize)
            membuf_grow(&buf_in, buf_in.asize + buf_in.asize / 2);
        n = fread(buf_in.data + buf_in.size, 1, buf_in.asize - buf_in.size, in);
        if(n == 0)
            break;
        buf_in.size += n;
    }
    /* Input size is a good estimate of the output size. Add some reserve
     * to deal with the HTML header/footer and tags. */
    membuf_init(&buf_out, buf_in.size + buf_in.size/8 + 64);
    /* Parse the document. The renderer passes the generated HTML to our
     * process_output() callback, which appends it to buf_out. */
    t0 = clock();
    ret = md_render_html(buf_in.data, buf_in.size, process_output,
                (void*) &buf_out, parser_flags, renderer_flags);
    t1 = clock();
    if(ret != 0) {
        fprintf(stderr, "Parsing failed.\n");
        goto out;
    }
    /* Write down the document in the HTML format. */
    if(want_fullhtml) {
        fprintf(out, "<html>\n");
        fprintf(out, "<head>\n");
        fprintf(out, "<title></title>\n");
        fprintf(out, "<meta name=\"generator\" content=\"md2html\">\n");
        fprintf(out, "</head>\n");
        fprintf(out, "<body>\n");
    }
    fwrite(buf_out.data, 1, buf_out.size, out);
    if(want_fullhtml) {
        fprintf(out, "</body>\n");
        fprintf(out, "</html>\n");
    }
    if(want_stat) {
        if(t0 != (clock_t)-1  &&  t1 != (clock_t)-1) {
            double elapsed = (double)(t1 - t0) / CLOCKS_PER_SEC;
            if (elapsed < 1)
                fprintf(stderr, "Time spent on parsing: %7.2f ms.\n", elapsed*1e3);
            else
                fprintf(stderr, "Time spent on parsing: %6.3f s.\n", elapsed);
        }
    }
    /* Success if we have reached here. */
    ret = 0;
 out:
    membuf_fini(&buf_in);
    membuf_fini(&buf_out);
    return ret;
 }
 #define OPTION_ARG_NONE         0
 #define OPTION_ARG_REQUIRED     1
 #define OPTION_ARG_OPTIONAL     2
 static const option cmdline_options[] = {
    { "output",                     'o', 'o', OPTION_ARG_REQUIRED },
    { "full-html",                  'f', 'f', OPTION_ARG_NONE },
    { "stat",                       's', 's', OPTION_ARG_NONE },
    { "help",                       'h', 'h', OPTION_ARG_NONE },
    { "version",                    'v', 'v', OPTION_ARG_NONE },
    { "commonmark",                  0,  'c', OPTION_ARG_NONE },
    { "github",                      0,  'g', OPTION_ARG_NONE },
    { "fcollapse-whitespace",        0,  'W', OPTION_ARG_NONE },
    { "flatex-math",                 0,  'L', OPTION_ARG_NONE },
    { "fpermissive-atx-headers",     0,  'A', OPTION_ARG_NONE },
    { "fpermissive-autolinks",       0,  'V', OPTION_ARG_NONE },
    { "fpermissive-email-autolinks", 0,  '@', OPTION_ARG_NONE },
    { "fpermissive-url-autolinks",   0,  'U', OPTION_ARG_NONE },
    { "fpermissive-www-autolinks",   0,  '.', OPTION_ARG_NONE },
    { "fstrikethrough",              0,  'S', OPTION_ARG_NONE },
    { "ftables",                     0,  'T', OPTION_ARG_NONE },
    { "ftasklists",                  0,  'X', OPTION_ARG_NONE },
    { "funderline",                  0,  '_', OPTION_ARG_NONE },
    { "fverbatim-entities",          0,  'E', OPTION_ARG_NONE },
    { "fwiki-links",                 0,  'K', OPTION_ARG_NONE },
    { "fno-html-blocks",             0,  'F', OPTION_ARG_NONE },
    { "fno-html-spans",              0,  'G', OPTION_ARG_NONE },
    { "fno-html",                    0,  'H', OPTION_ARG_NONE },
    { "fno-indented-code",           0,  'I', OPTION_ARG_NONE },
    { 0 }
 };
 static void
 usage(void)
 {
    printf(
        "Usage: md2html [OPTION]... [FILE]\n"
        "Convert input FILE (or standard input) in Markdown format to HTML.\n"
        "\n"
        "General options:\n"
        "  -o, --output=FILE    Output file (default is standard output)\n"
        "  -f, --full-html      Generate full HTML document, including header\n"
        "  -s, --stat           Measure time of input parsing\n"
        "  -h, --help           Display this help and exit\n"
        "  -v, --version        Display version and exit\n"
        "\n"
        "Markdown dialect options:\n"
        "(note these are equivalent to some combinations of the flags below)\n"
        "      --commonmark     CommonMark (this is default)\n"
        "      --github         GitHub Flavored Markdown\n"
        "\n"
        "Markdown extension options:\n"
        "      --fcollapse-whitespace\n"
        "                       Collapse non-trivial whitespace\n"
        "      --flatex-math    Enable LaTeX style mathematics spans\n"
        "      --fpermissive-atx-headers\n"
        "                       Allow ATX headers without delimiting space\n"
        "      --fpermissive-url-autolinks\n"
        "                       Allow URL autolinks without '<', '>'\n"
        "      --fpermissive-www-autolinks\n"
        "                       Allow WWW autolinks without any scheme (e.g. 'www.example.com')\n"
        "      --fpermissive-email-autolinks\n"
        "                       Allow e-mail autolinks without '<', '>' and 'mailto:'\n"
        "      --fpermissive-autolinks\n"
        "                       Same as --fpermissive-url-autolinks --fpermissive-www-autolinks\n"
        "                       --fpermissive-email-autolinks\n"
        "      --fstrikethrough Enable strike-through spans\n"
        "      --ftables        Enable tables\n"
        "      --ftasklists     Enable task lists\n"
        "      --funderline     Enable underline spans\n"
        "      --fwiki-links    Enable wiki links\n"
        "\n"
        "Markdown suppression options:\n"
        "      --fno-html-blocks\n"
        "                       Disable raw HTML blocks\n"
        "      --fno-html-spans\n"
        "                       Disable raw HTML spans\n"
        "      --fno-html       Same as --fno-html-blocks --fno-html-spans\n"
        "      --fno-indented-code\n"
        "                       Disable indented code blocks\n"
        "\n"
        "HTML generator options:\n"
        "      --fverbatim-entities\n"
        "                       Do not translate entities\n"
        "\n"
    );
 }
 static void
 version(void)
 {
    printf("%d.%d.%d\n", MD_VERSION_MAJOR, MD_VERSION_MINOR, MD_VERSION_RELEASE);
 }
 static const char* input_path = NULL;
 static const char* output_path = NULL;
 static int
 cmdline_callback(int opt, char const* value, void* data)
 {
    switch(opt) {
        case 0:
            if(input_path) {
                fprintf(stderr, "Too many arguments. Only one input file can be specified.\n");
                fprintf(stderr, "Use --help for more info.\n");
                exit(1);
            }
            input_path = value;
            break;
        case 'o':   output_path = value; break;
        case 'f':   want_fullhtml = 1; break;
        case 's':   want_stat = 1; break;
        case 'h':   usage(); exit(0); break;
        case 'v':   version(); exit(0); break;
        case 'c':   parser_flags = MD_DIALECT_COMMONMARK; break;
        case 'g':   parser_flags = MD_DIALECT_GITHUB; break;
        case 'E':   renderer_flags |= MD_RENDER_FLAG_VERBATIM_ENTITIES; break;
        case 'A':   parser_flags |= MD_FLAG_PERMISSIVEATXHEADERS; break;
        case 'I':   parser_flags |= MD_FLAG_NOINDENTEDCODEBLOCKS; break;
        case 'F':   parser_flags |= MD_FLAG_NOHTMLBLOCKS; break;
        case 'G':   parser_flags |= MD_FLAG_NOHTMLSPANS; break;
        case 'H':   parser_flags |= MD_FLAG_NOHTML; break;
        case 'W':   parser_flags |= MD_FLAG_COLLAPSEWHITESPACE; break;
        case 'U':   parser_flags |= MD_FLAG_PERMISSIVEURLAUTOLINKS; break;
        case '.':   parser_flags |= MD_FLAG_PERMISSIVEWWWAUTOLINKS; break;
        case '@':   parser_flags |= MD_FLAG_PERMISSIVEEMAILAUTOLINKS; break;
        case 'V':   parser_flags |= MD_FLAG_PERMISSIVEAUTOLINKS; break;
        case 'T':   parser_flags |= MD_FLAG_TABLES; break;
        case 'S':   parser_flags |= MD_FLAG_STRIKETHROUGH; break;
        case 'L':   parser_flags |= MD_FLAG_LATEXMATHSPANS; break;
        case 'K':   parser_flags |= MD_FLAG_WIKILINKS; break;
        case 'X':   parser_flags |= MD_FLAG_TASKLISTS; break;
        case '_':   parser_flags |= MD_FLAG_UNDERLINE; break;
        default:
            fprintf(stderr, "Illegal option: %s\n", value);
            fprintf(stderr, "Use --help for more info.\n");
            exit(1);
            break;
    }
    return 0;
 }
 int
 main(int argc, char** argv)
 {
    FILE* in = stdin;
    FILE* out = stdout;
    int ret = 0;
    if(readoptions(cmdline_options, argc, argv, cmdline_callback, NULL) < 0) {
        usage();
        exit(1);
    }
    if(input_path != NULL && strcmp(input_path, "-") != 0) {
        in = fopen(input_path, "rb");
        if(in == NULL) {
            fprintf(stderr, "Cannot open %s.\n", input_path);
            exit(1);
        }
    }
    if(output_path != NULL && strcmp(output_path, "-") != 0) {
        out = fopen(output_path, "wt");
        if(out == NULL) {
            fprintf(stderr, "Cannot open %s.\n", output_path);
            exit(1);
        }
    }
    ret = process_file(in, out);
    if(in != stdin)
        fclose(in);
    if(out != stdout)
        fclose(out);
    return ret;
 }
--- a/md2html/render_html.c
+++ b/md2html/render_html.c
@ -0,0 +1,561 @@
 /*
 * MD4C: Markdown parser for C
 * (http://github.com/mity/md4c)
 *
 * Copyright (c) 2016-2019 Martin Mitas
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */
 #include <stdio.h>
 #include <string.h>
 #include "render_html.h"
 #include "entity.h"
 #if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
    /* C89/90 or old compilers in general may not understand "inline". */
    #if defined __GNUC__
        #define inline __inline__
    #elif defined _MSC_VER
        #define inline __inline
    #else
        #define inline
    #endif
 #endif
 #ifdef _WIN32
    #define snprintf _snprintf
 #endif
 typedef struct MD_RENDER_HTML_tag MD_RENDER_HTML;
 struct MD_RENDER_HTML_tag {
    void (*process_output)(const MD_CHAR*, MD_SIZE, void*);
    void* userdata;
    unsigned flags;
    int image_nesting_level;
    char escape_map[256];
 };
 #define NEED_HTML_ESC_FLAG   0x1
 #define NEED_URL_ESC_FLAG    0x2
 /*****************************************
 ***  HTML rendering helper functions  ***
 *****************************************/
 #define ISDIGIT(ch)     ('0' <= (ch) && (ch) <= '9')
 #define ISLOWER(ch)     ('a' <= (ch) && (ch) <= 'z')
 #define ISUPPER(ch)     ('A' <= (ch) && (ch) <= 'Z')
 #define ISALNUM(ch)     (ISLOWER(ch) || ISUPPER(ch) || ISDIGIT(ch))
 static inline void
 render_verbatim(MD_RENDER_HTML* r, const MD_CHAR* text, MD_SIZE size)
 {
    r->process_output(text, size, r->userdata);
 }
 /* Keep this as a macro. Most compilers should then be smart enough to replace
 * the strlen() call with a compile-time constant if the string is a C literal. */
 #define RENDER_VERBATIM(r, verbatim)                                    \
        render_verbatim((r), (verbatim), (MD_SIZE) (strlen(verbatim)))
 static void
 render_html_escaped(MD_RENDER_HTML* r, const MD_CHAR* data, MD_SIZE size)
 {
    MD_OFFSET beg = 0;
    MD_OFFSET off = 0;
    /* Some characters need to be escaped in normal HTML text. */
    #define NEED_HTML_ESC(ch)   (r->escape_map[(unsigned char)(ch)] & NEED_HTML_ESC_FLAG)
    while(1) {
        /* Optimization: Use some loop unrolling. */
        while(off + 3 < size  &&  !NEED_HTML_ESC(data[off+0])  &&  !NEED_HTML_ESC(data[off+1])
                              &&  !NEED_HTML_ESC(data[off+2])  &&  !NEED_HTML_ESC(data[off+3]))
            off += 4;
        while(off < size  &&  !NEED_HTML_ESC(data[off]))
            off++;
        if(off > beg)
            render_verbatim(r, data + beg, off - beg);
        if(off < size) {
            switch(data[off]) {
                case '&':   RENDER_VERBATIM(r, "&amp;"); break;
                case '<':   RENDER_VERBATIM(r, "&lt;"); break;
                case '>':   RENDER_VERBATIM(r, "&gt;"); break;
                case '"':   RENDER_VERBATIM(r, "&quot;"); break;
            }
            off++;
        } else {
            break;
        }
        beg = off;
    }
 }
 static void
 render_url_escaped(MD_RENDER_HTML* r, const MD_CHAR* data, MD_SIZE size)
 {
    static const MD_CHAR hex_chars[] = "0123456789ABCDEF";
    MD_OFFSET beg = 0;
    MD_OFFSET off = 0;
    /* Some characters need to be escaped in URL attributes. */
    #define NEED_URL_ESC(ch)    (r->escape_map[(unsigned char)(ch)] & NEED_URL_ESC_FLAG)
    while(1) {
        while(off < size  &&  !NEED_URL_ESC(data[off]))
            off++;
        if(off > beg)
            render_verbatim(r, data + beg, off - beg);
        if(off < size) {
            char hex[3];
            switch(data[off]) {
                case '&':   RENDER_VERBATIM(r, "&amp;"); break;
                default:
                    hex[0] = '%';
                    hex[1] = hex_chars[((unsigned)data[off] >> 4) & 0xf];
                    hex[2] = hex_chars[((unsigned)data[off] >> 0) & 0xf];
                    render_verbatim(r, hex, 3);
                    break;
            }
            off++;
        } else {
            break;
        }
        beg = off;
    }
 }
 static unsigned
 hex_val(char ch)
 {
    if('0' <= ch && ch <= '9')
        return ch - '0';
    if('A' <= ch && ch <= 'Z')
        return ch - 'A' + 10;
    else
        return ch - 'a' + 10;
 }
 static void
 render_utf8_codepoint(MD_RENDER_HTML* r, unsigned codepoint,
                      void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
 {
    static const MD_CHAR utf8_replacement_char[] = { 0xef, 0xbf, 0xbd };
    unsigned char utf8[4];
    size_t n;
    if(codepoint <= 0x7f) {
        n = 1;
        utf8[0] = codepoint;
    } else if(codepoint <= 0x7ff) {
        n = 2;
        utf8[0] = 0xc0 | ((codepoint >>  6) & 0x1f);
        utf8[1] = 0x80 + ((codepoint >>  0) & 0x3f);
    } else if(codepoint <= 0xffff) {
        n = 3;
        utf8[0] = 0xe0 | ((codepoint >> 12) & 0xf);
        utf8[1] = 0x80 + ((codepoint >>  6) & 0x3f);
        utf8[2] = 0x80 + ((codepoint >>  0) & 0x3f);
    } else {
        n = 4;
        utf8[0] = 0xf0 | ((codepoint >> 18) & 0x7);
        utf8[1] = 0x80 + ((codepoint >> 12) & 0x3f);
        utf8[2] = 0x80 + ((codepoint >>  6) & 0x3f);
        utf8[3] = 0x80 + ((codepoint >>  0) & 0x3f);
    }
    if(0 < codepoint  &&  codepoint <= 0x10ffff)
        fn_append(r, (char*)utf8, n);
    else
        fn_append(r, utf8_replacement_char, 3);
 }
 /* Translate entity to its UTF-8 equivalent, or output the verbatim one
 * if such entity is unknown (or if the translation is disabled). */
 static void
 render_entity(MD_RENDER_HTML* r, const MD_CHAR* text, MD_SIZE size,
              void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
 {
    if(r->flags & MD_RENDER_FLAG_VERBATIM_ENTITIES) {
        fn_append(r, text, size);
        return;
    }
    /* We assume UTF-8 output is what is desired. */
    if(size > 3 && text[1] == '#') {
        unsigned codepoint = 0;
        if(text[2] == 'x' || text[2] == 'X') {
            /* Hexadecimal entity (e.g. "&#x1234abcd;"). */
            MD_SIZE i;
            for(i = 3; i < size-1; i++)
                codepoint = 16 * codepoint + hex_val(text[i]);
        } else {
            /* Decimal entity (e.g. "&#1234;"). */
            MD_SIZE i;
            for(i = 2; i < size-1; i++)
                codepoint = 10 * codepoint + (text[i] - '0');
        }
        render_utf8_codepoint(r, codepoint, fn_append);
        return;
    } else {
        /* Named entity (e.g. "&nbsp;"). */
        const struct entity* ent;
        ent = entity_lookup(text, size);
        if(ent != NULL) {
            render_utf8_codepoint(r, ent->codepoints[0], fn_append);
            if(ent->codepoints[1])
                render_utf8_codepoint(r, ent->codepoints[1], fn_append);
            return;
        }
    }
    fn_append(r, text, size);
 }
 static void
 render_attribute(MD_RENDER_HTML* r, const MD_ATTRIBUTE* attr,
                 void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
 {
    int i;
    for(i = 0; attr->substr_offsets[i] < attr->size; i++) {
        MD_TEXTTYPE type = attr->substr_types[i];
        MD_OFFSET off = attr->substr_offsets[i];
        MD_SIZE size = attr->substr_offsets[i+1] - off;
        const MD_CHAR* text = attr->text + off;
        switch(type) {
            case MD_TEXT_NULLCHAR:  render_utf8_codepoint(r, 0x0000, render_verbatim); break;
            case MD_TEXT_ENTITY:    render_entity(r, text, size, fn_append); break;
            default:                fn_append(r, text, size); break;
        }
    }
 }
 static void
 render_open_ol_block(MD_RENDER_HTML* r, const MD_BLOCK_OL_DETAIL* det)
 {
    char buf[64];
    if(det->start == 1) {
        RENDER_VERBATIM(r, "<ol>\n");
        return;
    }
    snprintf(buf, sizeof(buf), "<ol start=\"%u\">\n", det->start);
    RENDER_VERBATIM(r, buf);
 }
 static void
 render_open_li_block(MD_RENDER_HTML* r, const MD_BLOCK_LI_DETAIL* det)
 {
    if(det->is_task) {
        RENDER_VERBATIM(r, "<li class=\"task-list-item\">"
                          "<input type=\"checkbox\" class=\"task-list-item-checkbox\" disabled");
        if(det->task_mark == 'x' || det->task_mark == 'X')
            RENDER_VERBATIM(r, " checked");
        RENDER_VERBATIM(r, ">");
    } else {
        RENDER_VERBATIM(r, "<li>");
    }
 }
 static void
 render_open_code_block(MD_RENDER_HTML* r, const MD_BLOCK_CODE_DETAIL* det)
 {
    RENDER_VERBATIM(r, "<pre><code");
    /* If known, output the HTML 5 attribute class="language-LANGNAME". */
    if(det->lang.text != NULL) {
        RENDER_VERBATIM(r, " class=\"language-");
        render_attribute(r, &det->lang, render_html_escaped);
        RENDER_VERBATIM(r, "\"");
    }
    RENDER_VERBATIM(r, ">");
 }
 static void
 render_open_td_block(MD_RENDER_HTML* r, const MD_CHAR* cell_type, const MD_BLOCK_TD_DETAIL* det)
 {
    RENDER_VERBATIM(r, "<");
    RENDER_VERBATIM(r, cell_type);
    switch(det->align) {
        case MD_ALIGN_LEFT:     RENDER_VERBATIM(r, " align=\"left\">"); break;
        case MD_ALIGN_CENTER:   RENDER_VERBATIM(r, " align=\"center\">"); break;
        case MD_ALIGN_RIGHT:    RENDER_VERBATIM(r, " align=\"right\">"); break;
        default:                RENDER_VERBATIM(r, ">"); break;
    }
 }
 static void
 render_open_a_span(MD_RENDER_HTML* r, const MD_SPAN_A_DETAIL* det)
 {
    RENDER_VERBATIM(r, "<a href=\"");
    render_attribute(r, &det->href, render_url_escaped);
    if(det->title.text != NULL) {
        RENDER_VERBATIM(r, "\" title=\"");
        render_attribute(r, &det->title, render_html_escaped);
    }
    RENDER_VERBATIM(r, "\">");
 }
 static void
 render_open_img_span(MD_RENDER_HTML* r, const MD_SPAN_IMG_DETAIL* det)
 {
    RENDER_VERBATIM(r, "<img src=\"");
    render_attribute(r, &det->src, render_url_escaped);
    RENDER_VERBATIM(r, "\" alt=\"");
    r->image_nesting_level++;
 }
 static void
 render_close_img_span(MD_RENDER_HTML* r, const MD_SPAN_IMG_DETAIL* det)
 {
    if(det->title.text != NULL) {
        RENDER_VERBATIM(r, "\" title=\"");
        render_attribute(r, &det->title, render_html_escaped);
    }
    RENDER_VERBATIM(r, "\">");
    r->image_nesting_level--;
 }
 static void
 render_open_wikilink_span(MD_RENDER_HTML* r, const MD_SPAN_WIKILINK_DETAIL* det)
 {
    RENDER_VERBATIM(r, "<x-wikilink data-target=\"");
    render_attribute(r, &det->target, render_html_escaped);
    RENDER_VERBATIM(r, "\">");
 }
 /**************************************
 ***  HTML renderer implementation  ***
 **************************************/
 static int
 enter_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
 {
    static const MD_CHAR* head[6] = { "<h1>", "<h2>", "<h3>", "<h4>", "<h5>", "<h6>" };
    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
    switch(type) {
        case MD_BLOCK_DOC:      /* noop */ break;
        case MD_BLOCK_QUOTE:    RENDER_VERBATIM(r, "<blockquote>\n"); break;
        case MD_BLOCK_UL:       RENDER_VERBATIM(r, "<ul>\n"); break;
        case MD_BLOCK_OL:       render_open_ol_block(r, (const MD_BLOCK_OL_DETAIL*)detail); break;
        case MD_BLOCK_LI:       render_open_li_block(r, (const MD_BLOCK_LI_DETAIL*)detail); break;
        case MD_BLOCK_HR:       RENDER_VERBATIM(r, "<hr>\n"); break;
        case MD_BLOCK_H:        RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break;
        case MD_BLOCK_CODE:     render_open_code_block(r, (const MD_BLOCK_CODE_DETAIL*) detail); break;
        case MD_BLOCK_HTML:     /* noop */ break;
        case MD_BLOCK_P:        RENDER_VERBATIM(r, "<p>"); break;
        case MD_BLOCK_TABLE:    RENDER_VERBATIM(r, "<table>\n"); break;
        case MD_BLOCK_THEAD:    RENDER_VERBATIM(r, "<thead>\n"); break;
        case MD_BLOCK_TBODY:    RENDER_VERBATIM(r, "<tbody>\n"); break;
        case MD_BLOCK_TR:       RENDER_VERBATIM(r, "<tr>\n"); break;
        case MD_BLOCK_TH:       render_open_td_block(r, "th", (MD_BLOCK_TD_DETAIL*)detail); break;
        case MD_BLOCK_TD:       render_open_td_block(r, "td", (MD_BLOCK_TD_DETAIL*)detail); break;
    }
    return 0;
 }
 static int
 leave_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
 {
    static const MD_CHAR* head[6] = { "</h1>\n", "</h2>\n", "</h3>\n", "</h4>\n", "</h5>\n", "</h6>\n" };
    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
    switch(type) {
        case MD_BLOCK_DOC:      /*noop*/ break;
        case MD_BLOCK_QUOTE:    RENDER_VERBATIM(r, "</blockquote>\n"); break;
        case MD_BLOCK_UL:       RENDER_VERBATIM(r, "</ul>\n"); break;
        case MD_BLOCK_OL:       RENDER_VERBATIM(r, "</ol>\n"); break;
        case MD_BLOCK_LI:       RENDER_VERBATIM(r, "</li>\n"); break;
        case MD_BLOCK_HR:       /*noop*/ break;
        case MD_BLOCK_H:        RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break;
        case MD_BLOCK_CODE:     RENDER_VERBATIM(r, "</code></pre>\n"); break;
        case MD_BLOCK_HTML:     /* noop */ break;
        case MD_BLOCK_P:        RENDER_VERBATIM(r, "</p>\n"); break;
        case MD_BLOCK_TABLE:    RENDER_VERBATIM(r, "</table>\n"); break;
        case MD_BLOCK_THEAD:    RENDER_VERBATIM(r, "</thead>\n"); break;
        case MD_BLOCK_TBODY:    RENDER_VERBATIM(r, "</tbody>\n"); break;
        case MD_BLOCK_TR:       RENDER_VERBATIM(r, "</tr>\n"); break;
        case MD_BLOCK_TH:       RENDER_VERBATIM(r, "</th>\n"); break;
        case MD_BLOCK_TD:       RENDER_VERBATIM(r, "</td>\n"); break;
    }
    return 0;
 }
 static int
 enter_span_callback(MD_SPANTYPE type, void* detail, void* userdata)
 {
    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
    if(r->image_nesting_level > 0) {
        /* We are inside a Markdown image label. Markdown allows emphasis
         * and other rich content in that context, just as in any link
         * label.
         *
         * However, unlike links (where the content becomes the content of
         * the <a>...</a> tag), the content of an image label is supposed to
         * go into the alt attribute: <img alt="...">.
         *
         * In that context we naturally cannot output nested HTML tags, so
         * we suppress them and output only the plain text (i.e. whatever
         * arrives through the text() callback).
         *
         * This make-it-plain-text approach is the practice recommended by
         * the CommonMark specification (for HTML output).
         */
        return 0;
    }
    switch(type) {
        case MD_SPAN_EM:                RENDER_VERBATIM(r, "<em>"); break;
        case MD_SPAN_STRONG:            RENDER_VERBATIM(r, "<strong>"); break;
        case MD_SPAN_U:                 RENDER_VERBATIM(r, "<u>"); break;
        case MD_SPAN_A:                 render_open_a_span(r, (MD_SPAN_A_DETAIL*) detail); break;
        case MD_SPAN_IMG:               render_open_img_span(r, (MD_SPAN_IMG_DETAIL*) detail); break;
        case MD_SPAN_CODE:              RENDER_VERBATIM(r, "<code>"); break;
        case MD_SPAN_DEL:               RENDER_VERBATIM(r, "<del>"); break;
        case MD_SPAN_LATEXMATH:         RENDER_VERBATIM(r, "<x-equation>"); break;
        case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "<x-equation type=\"display\">"); break;
        case MD_SPAN_WIKILINK:          render_open_wikilink_span(r, (MD_SPAN_WIKILINK_DETAIL*) detail); break;
    }
    return 0;
 }
 static int
 leave_span_callback(MD_SPANTYPE type, void* detail, void* userdata)
 {
    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
    if(r->image_nesting_level > 0) {
        /* Ditto as in enter_span_callback(), except we have to allow the
         * end of the <img> tag. */
        if(r->image_nesting_level == 1  &&  type == MD_SPAN_IMG)
            render_close_img_span(r, (MD_SPAN_IMG_DETAIL*) detail);
        return 0;
    }
    switch(type) {
        case MD_SPAN_EM:                RENDER_VERBATIM(r, "</em>"); break;
        case MD_SPAN_STRONG:            RENDER_VERBATIM(r, "</strong>"); break;
        case MD_SPAN_U:                 RENDER_VERBATIM(r, "</u>"); break;
        case MD_SPAN_A:                 RENDER_VERBATIM(r, "</a>"); break;
        case MD_SPAN_IMG:               /*noop, handled above*/ break;
        case MD_SPAN_CODE:              RENDER_VERBATIM(r, "</code>"); break;
        case MD_SPAN_DEL:               RENDER_VERBATIM(r, "</del>"); break;
        case MD_SPAN_LATEXMATH:         /*fall through*/
        case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "</x-equation>"); break;
        case MD_SPAN_WIKILINK:          RENDER_VERBATIM(r, "</x-wikilink>"); break;
    }
    return 0;
 }
 static int
 text_callback(MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size, void* userdata)
 {
    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
    switch(type) {
        case MD_TEXT_NULLCHAR:  render_utf8_codepoint(r, 0x0000, render_verbatim); break;
        case MD_TEXT_BR:        RENDER_VERBATIM(r, (r->image_nesting_level == 0 ? "<br>\n" : " ")); break;
        case MD_TEXT_SOFTBR:    RENDER_VERBATIM(r, (r->image_nesting_level == 0 ? "\n" : " ")); break;
        case MD_TEXT_HTML:      render_verbatim(r, text, size); break;
        case MD_TEXT_ENTITY:    render_entity(r, text, size, render_html_escaped); break;
        default:                render_html_escaped(r, text, size); break;
    }
    return 0;
 }
 static void
 debug_log_callback(const char* msg, void* userdata)
 {
    MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
    if(r->flags & MD_RENDER_FLAG_DEBUG)
        fprintf(stderr, "MD4C: %s\n", msg);
 }
 int
 md_render_html(const MD_CHAR* input, MD_SIZE input_size,
               void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
               void* userdata, unsigned parser_flags, unsigned renderer_flags)
 {
    MD_RENDER_HTML render = { process_output, userdata, renderer_flags, 0, { 0 } };
    int i;
    MD_PARSER parser = {
        0,
        parser_flags,
        enter_block_callback,
        leave_block_callback,
        enter_span_callback,
        leave_span_callback,
        text_callback,
        debug_log_callback,
        NULL
    };
    /* Build map of characters which need escaping. */
    for(i = 0; i < 256; i++) {
        unsigned char ch = (unsigned char) i;
        if(strchr("\"&<>", ch) != NULL)
            render.escape_map[i] |= NEED_HTML_ESC_FLAG;
        if(!ISALNUM(ch)  &&  strchr("-_.+!*(),%#@?=;:/,+$", ch) == NULL)
            render.escape_map[i] |= NEED_URL_ESC_FLAG;
    }
    return md_parse(input, input_size, &parser, (void*) &render);
 }
--- a/md2html/render_html.h
+++ b/md2html/render_html.h
@ -0,0 +1,66 @@
 /*
 * MD4C: Markdown parser for C
 * (http://github.com/mity/md4c)
 *
 * Copyright (c) 2016-2017 Martin Mitas
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */
 #ifndef MD4C_RENDER_HTML_H
 #define MD4C_RENDER_HTML_H
 #include "md4c.h"
 #ifdef __cplusplus
    extern "C" {
 #endif
 /* If set, debug output from md_parse() is sent to stderr. */
 #define MD_RENDER_FLAG_DEBUG                0x0001
 /* If set, entities are not translated into the corresponding UTF-8
 * characters. */
 #define MD_RENDER_FLAG_VERBATIM_ENTITIES    0x0002
 /* Render Markdown into HTML.
 *
 * Note that only the contents of the <body> tag are generated. The caller
 * must generate the HTML header/footer manually before/after calling
 * md_render_html().
 *
 * Params input and input_size specify the Markdown input.
 * Callback process_output() gets called with chunks of HTML output.
 * (A typical implementation may just write the bytes to a file or append
 * them to some buffer.)
 * Param userdata is just propagated back to the process_output() callback.
 * Param parser_flags are flags from md4c.h propagated to md_parse().
 * Param renderer_flags is a bitmask of MD_RENDER_FLAG_xxxx.
 *
 * Returns -1 on error (if md_parse() fails).
 * Returns 0 on success.
 */
 int md_render_html(const MD_CHAR* input, MD_SIZE input_size,
                   void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
                   void* userdata, unsigned parser_flags, unsigned renderer_flags);
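 /* Usage sketch (illustrative only, not part of the upstream header).
 *
 * It assumes the default build where MD_CHAR is char, and a hypothetical
 * callback named write_chunk() which forwards every chunk of HTML to the
 * FILE* passed as userdata:
 *
 *     #include <stdio.h>
 *     #include <string.h>
 *     #include "render_html.h"
 *
 *     static void
 *     write_chunk(const MD_CHAR* data, MD_SIZE size, void* userdata)
 *     {
 *         fwrite(data, 1, size, (FILE*) userdata);
 *     }
 *
 *     int
 *     main(void)
 *     {
 *         const char* md = "# Hello\n\nSome *Markdown* text.\n";
 *         return md_render_html(md, (MD_SIZE) strlen(md), write_chunk,
 *                               stdout, MD_DIALECT_GITHUB, 0) != 0;
 *     }
 *
 * The output chunks are not zero-terminated, hence the use of fwrite()
 * with an explicit size.
 */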
 #ifdef __cplusplus
    }  /* extern "C" { */
 #endif
 #endif  /* MD4C_RENDER_HTML_H */
--- a/md4c/CMakeLists.txt
+++ b/md4c/CMakeLists.txt
@ -0,0 +1,32 @@
 # Be sure to export all symbols in Windows.
 set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS 1)
 set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUG")
 set(md4c_src
    md4c.c
 )
 add_library(md4c ${md4c_src})
 set_target_properties(md4c PROPERTIES
    VERSION ${MD_VERSION}
    SOVERSION ${MD_VERSION_MAJOR}
    PUBLIC_HEADER md4c.h
 )
 install(
    TARGETS md4c
    EXPORT md4cConfig
    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
    LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
    RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
    PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
 )
 # Create a pkg-config file
 configure_file(md4c.pc.in md4c.pc @ONLY)
 install(FILES ${CMAKE_BINARY_DIR}/md4c/md4c.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig)
 # And a CMake file
 install(EXPORT md4cConfig DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/md4c/)
--- a/md4c/md4c.c
+++ b/md4c/md4c.c
--- a/md4c/md4c.h
+++ b/md4c/md4c.h
@ -0,0 +1,388 @@
 /*
 * MD4C: Markdown parser for C
 * (http://github.com/mity/md4c)
 *
 * Copyright (c) 2016-2020 Martin Mitas
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */
 #ifndef MD4C_MARKDOWN_H
 #define MD4C_MARKDOWN_H
 #ifdef __cplusplus
    extern "C" {
 #endif
 #if defined MD4C_USE_UTF16
    /* Magic to support UTF-16. Note that in order to use it, you have to define
     * the macro MD4C_USE_UTF16 both when building MD4C as well as when
     * including this header in your code. */
    #ifdef _WIN32
        #include <windows.h>
        typedef WCHAR       MD_CHAR;
    #else
        #error MD4C_USE_UTF16 is only supported on Windows.
    #endif
 #else
    typedef char            MD_CHAR;
 #endif
 typedef unsigned MD_SIZE;
 typedef unsigned MD_OFFSET;
 /* Block represents a part of document hierarchy structure like a paragraph
 * or list item.
 */
 typedef enum MD_BLOCKTYPE {
    /* <body>...</body> */
    MD_BLOCK_DOC = 0,
    /* <blockquote>...</blockquote> */
    MD_BLOCK_QUOTE,
    /* <ul>...</ul>
     * Detail: Structure MD_BLOCK_UL_DETAIL. */
    MD_BLOCK_UL,
    /* <ol>...</ol>
     * Detail: Structure MD_BLOCK_OL_DETAIL. */
    MD_BLOCK_OL,
    /* <li>...</li>
     * Detail: Structure MD_BLOCK_LI_DETAIL. */
    MD_BLOCK_LI,
    /* <hr> */
    MD_BLOCK_HR,
    /* <h1>...</h1> (for levels up to 6)
     * Detail: Structure MD_BLOCK_H_DETAIL. */
    MD_BLOCK_H,
    /* <pre><code>...</code></pre>
     * Note the text lines within code blocks are terminated with '\n'
     * instead of explicit MD_TEXT_BR. */
    MD_BLOCK_CODE,
    /* Raw HTML block. This itself does not correspond to any particular HTML
     * tag. The contents of it _is_ raw HTML source intended to be put
     * in verbatim form to the HTML output. */
    MD_BLOCK_HTML,
    /* <p>...</p> */
    MD_BLOCK_P,
    /* <table>...</table> and its contents.
     * Detail: Structure MD_BLOCK_TD_DETAIL (used with MD_BLOCK_TH and MD_BLOCK_TD)
     * Note all of these are used only if extension MD_FLAG_TABLES is enabled. */
    MD_BLOCK_TABLE,
    MD_BLOCK_THEAD,
    MD_BLOCK_TBODY,
    MD_BLOCK_TR,
    MD_BLOCK_TH,
    MD_BLOCK_TD
 } MD_BLOCKTYPE;
 /* Span represents an in-line piece of a document which should be rendered with
 * the same font, color and other attributes. A sequence of spans forms a block
 * like paragraph or list item. */
 typedef enum MD_SPANTYPE {
    /* <em>...</em> */
    MD_SPAN_EM,
    /* <strong>...</strong> */
    MD_SPAN_STRONG,
    /* <a href="xxx">...</a>
     * Detail: Structure MD_SPAN_A_DETAIL. */
    MD_SPAN_A,
    /* <img src="xxx" alt="...">
     * Detail: Structure MD_SPAN_IMG_DETAIL.
     * Note: Image text can contain nested spans and even nested images.
     * If rendered into the ALT attribute of an HTML <IMG> tag, it is the
     * responsibility of the renderer to deal with it.
     */
    MD_SPAN_IMG,
    /* <code>...</code> */
    MD_SPAN_CODE,
    /* <del>...</del>
     * Note: Recognized only when MD_FLAG_STRIKETHROUGH is enabled.
     */
    MD_SPAN_DEL,
    /* For recognizing inline ($) and display ($$) equations
     * Note: Recognized only when MD_FLAG_LATEXMATHSPANS is enabled.
     */
    MD_SPAN_LATEXMATH,
    MD_SPAN_LATEXMATH_DISPLAY,
    /* Wiki links
     * Note: Recognized only when MD_FLAG_WIKILINKS is enabled.
     */
    MD_SPAN_WIKILINK,
    /* <u>...</u>
     * Note: Recognized only when MD_FLAG_UNDERLINE is enabled. */
    MD_SPAN_U
 } MD_SPANTYPE;
 /* Text is the actual textual contents of span. */
 typedef enum MD_TEXTTYPE {
    /* Normal text. */
    MD_TEXT_NORMAL = 0,
    /* NULL character. CommonMark requires replacing NULL character with
     * the replacement char U+FFFD, so this allows caller to do that easily. */
    MD_TEXT_NULLCHAR,
    /* Line breaks.
     * Note these are not sent from blocks with verbatim output (MD_BLOCK_CODE
     * or MD_BLOCK_HTML). In such cases, '\n' is part of the text itself. */
    MD_TEXT_BR,         /* <br> (hard break) */
    MD_TEXT_SOFTBR,     /* '\n' in source text where it is not semantically meaningful (soft break) */
    /* Entity.
     * (a) Named entity, e.g. &nbsp; 
     *     (Note MD4C does not have a list of known entities.
     *     Anything matching the regexp /&[A-Za-z][A-Za-z0-9]{1,47};/ is
     *     treated as a named entity.)
     * (b) Numerical entity, e.g. &#1234;
     * (c) Hexadecimal entity, e.g. &#x12AB;
     *
     * As MD4C is mostly encoding agnostic, the application gets the verbatim
     * entity text in the MD_PARSER::text() callback. */
    MD_TEXT_ENTITY,
    /* Text in a code block (inside MD_BLOCK_CODE) or inlined code (`code`).
     * If it is inside MD_BLOCK_CODE, it includes spaces for indentation and
     * '\n' for new lines. MD_TEXT_BR and MD_TEXT_SOFTBR are not sent for this
     * kind of text. */
    MD_TEXT_CODE,
    /* Text is a raw HTML. If it is contents of a raw HTML block (i.e. not
     * an inline raw HTML), then MD_TEXT_BR and MD_TEXT_SOFTBR are not used.
     * The text contains verbatim '\n' for the new lines. */
    MD_TEXT_HTML,
    /* Text is inside an equation. This is processed the same way as inlined code
     * spans (`code`). */
    MD_TEXT_LATEXMATH
 } MD_TEXTTYPE;
 /* Alignment enumeration. */
 typedef enum MD_ALIGN {
    MD_ALIGN_DEFAULT = 0,   /* When unspecified. */
    MD_ALIGN_LEFT,
    MD_ALIGN_CENTER,
    MD_ALIGN_RIGHT
 } MD_ALIGN;
 /* String attribute.
 *
 * This wraps strings which are outside of a normal text flow and which are
 * propagated within various detailed structures, but which still may contain
 * string portions of different types like e.g. entities.
 *
 * So, for example, let's consider an image with a title attribute string
 * set to "foo &quot; bar". (Note the string size is 14.)
 *
 * Then the attribute MD_SPAN_IMG_DETAIL::title shall provide the following:
 *  -- [0]: "foo "   (substr_types[0] == MD_TEXT_NORMAL; substr_offsets[0] == 0)
 *  -- [1]: "&quot;" (substr_types[1] == MD_TEXT_ENTITY; substr_offsets[1] == 4)
 *  -- [2]: " bar"   (substr_types[2] == MD_TEXT_NORMAL; substr_offsets[2] == 10)
 *  -- [3]: (n/a)    (n/a                              ; substr_offsets[3] == 14)
 *
 * Note that these conditions are guaranteed:
 *  -- substr_offsets[0] == 0
 *  -- substr_offsets[LAST+1] == size
 *  -- Only MD_TEXT_NORMAL, MD_TEXT_ENTITY, MD_TEXT_NULLCHAR substrings can appear.
 */
 typedef struct MD_ATTRIBUTE {
    const MD_CHAR* text;
    MD_SIZE size;
    const MD_TEXTTYPE* substr_types;
    const MD_OFFSET* substr_offsets;
 } MD_ATTRIBUTE;
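 /* Illustrative sketch (not part of the upstream header) of walking the
 * substrings of an attribute. The loop mirrors render_attribute() in
 * md2html/render_html.c and relies on the guarantee above that the last
 * offset equals the attribute size; handle_substr() is a hypothetical
 * application function:
 *
 *     int i;
 *     for(i = 0; attr->substr_offsets[i] < attr->size; i++) {
 *         MD_OFFSET off = attr->substr_offsets[i];
 *         MD_SIZE size = attr->substr_offsets[i+1] - off;
 *         handle_substr(attr->substr_types[i], attr->text + off, size);
 *     }
 *
 * For the title "foo &quot; bar" from the example above, the loop body
 * runs three times, with sizes 4, 6 and 4.
 */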
 /* Detailed info for MD_BLOCK_UL. */
 typedef struct MD_BLOCK_UL_DETAIL {
    int is_tight;           /* Non-zero if tight list, zero if loose. */
    MD_CHAR mark;           /* Item bullet character in Markdown source of the list, e.g. '-', '+', '*'. */
 } MD_BLOCK_UL_DETAIL;
 /* Detailed info for MD_BLOCK_OL. */
 typedef struct MD_BLOCK_OL_DETAIL {
    unsigned start;         /* Start index of the ordered list. */
    int is_tight;           /* Non-zero if tight list, zero if loose. */
    MD_CHAR mark_delimiter; /* Character delimiting the item marks in Markdown source, e.g. '.' or ')' */
 } MD_BLOCK_OL_DETAIL;
 /* Detailed info for MD_BLOCK_LI. */
 typedef struct MD_BLOCK_LI_DETAIL {
    int is_task;            /* Can be non-zero only with MD_FLAG_TASKLISTS */
    MD_CHAR task_mark;      /* If is_task, then one of 'x', 'X' or ' '. Undefined otherwise. */
    MD_OFFSET task_mark_offset;  /* If is_task, then offset in the input of the char between '[' and ']'. */
 } MD_BLOCK_LI_DETAIL;
 /* Detailed info for MD_BLOCK_H. */
 typedef struct MD_BLOCK_H_DETAIL {
    unsigned level;         /* Header level (1 - 6) */
 } MD_BLOCK_H_DETAIL;
 /* Detailed info for MD_BLOCK_CODE. */
 typedef struct MD_BLOCK_CODE_DETAIL {
    MD_ATTRIBUTE info;
    MD_ATTRIBUTE lang;
    MD_CHAR fence_char;     /* The character used for fenced code block; or zero for indented code block. */
 } MD_BLOCK_CODE_DETAIL;
 /* Detailed info for MD_BLOCK_TH and MD_BLOCK_TD. */
 typedef struct MD_BLOCK_TD_DETAIL {
    MD_ALIGN align;
 } MD_BLOCK_TD_DETAIL;
 /* Detailed info for MD_SPAN_A. */
 typedef struct MD_SPAN_A_DETAIL {
    MD_ATTRIBUTE href;
    MD_ATTRIBUTE title;
 } MD_SPAN_A_DETAIL;
 /* Detailed info for MD_SPAN_IMG. */
 typedef struct MD_SPAN_IMG_DETAIL {
    MD_ATTRIBUTE src;
    MD_ATTRIBUTE title;
 } MD_SPAN_IMG_DETAIL;
 /* Detailed info for MD_SPAN_WIKILINK. */
 typedef struct MD_SPAN_WIKILINK {
    MD_ATTRIBUTE target;
 } MD_SPAN_WIKILINK_DETAIL;
 /* Flags specifying extensions/deviations from CommonMark specification.
 *
 * By default (when MD_PARSER::flags == 0), we follow the CommonMark
 * specification. The following flags enable extensions or deviations from it.
 */
 #define MD_FLAG_COLLAPSEWHITESPACE          0x0001  /* In MD_TEXT_NORMAL, collapse non-trivial whitespace into single ' ' */
 #define MD_FLAG_PERMISSIVEATXHEADERS        0x0002  /* Do not require space in ATX headers ( ###header ) */
 #define MD_FLAG_PERMISSIVEURLAUTOLINKS      0x0004  /* Recognize URLs as autolinks even without '<', '>' */
 #define MD_FLAG_PERMISSIVEEMAILAUTOLINKS    0x0008  /* Recognize e-mails as autolinks even without '<', '>' and 'mailto:' */
 #define MD_FLAG_NOINDENTEDCODEBLOCKS        0x0010  /* Disable indented code blocks. (Only fenced code works.) */
 #define MD_FLAG_NOHTMLBLOCKS                0x0020  /* Disable raw HTML blocks. */
 #define MD_FLAG_NOHTMLSPANS                 0x0040  /* Disable raw HTML (inline). */
 #define MD_FLAG_TABLES                      0x0100  /* Enable tables extension. */
 #define MD_FLAG_STRIKETHROUGH               0x0200  /* Enable strikethrough extension. */
 #define MD_FLAG_PERMISSIVEWWWAUTOLINKS      0x0400  /* Enable WWW autolinks (even without any scheme prefix, if they begin with 'www.') */
 #define MD_FLAG_TASKLISTS                   0x0800  /* Enable task list extension. */
 #define MD_FLAG_LATEXMATHSPANS              0x1000  /* Enable $ and $$ containing LaTeX equations. */
 #define MD_FLAG_WIKILINKS                   0x2000  /* Enable wiki links extension. */
 #define MD_FLAG_UNDERLINE                   0x4000  /* Enable underline extension (and disables '_' for normal emphasis). */
 #define MD_FLAG_PERMISSIVEAUTOLINKS         (MD_FLAG_PERMISSIVEEMAILAUTOLINKS | MD_FLAG_PERMISSIVEURLAUTOLINKS | MD_FLAG_PERMISSIVEWWWAUTOLINKS)
 #define MD_FLAG_NOHTML                      (MD_FLAG_NOHTMLBLOCKS | MD_FLAG_NOHTMLSPANS)
 /* Convenient sets of flags corresponding to well-known Markdown dialects.
 *
 * Note we may only support a subset of features of the referred dialect.
 * The constant just enables those extensions which bring us as close as
 * possible given what features we implement.
 *
 * ABI compatibility note: Meaning of these can change in time as new
 * extensions, bringing the dialect closer to the original, are implemented.
 */
 #define MD_DIALECT_COMMONMARK               0
 #define MD_DIALECT_GITHUB                   (MD_FLAG_PERMISSIVEAUTOLINKS | MD_FLAG_TABLES | MD_FLAG_STRIKETHROUGH | MD_FLAG_TASKLISTS)
 /* Parser structure.
 */
 typedef struct MD_PARSER {
    /* Reserved. Set to zero.
     */
    unsigned abi_version;
    /* Dialect options. Bitmask of MD_FLAG_xxxx values.
     */
    unsigned flags;
    /* Caller-provided rendering callbacks.
     *
     * For some block/span types, more detailed information is provided in a
     * type-specific structure pointed by the argument 'detail'.
     *
     * The last argument of all callbacks, 'userdata', is just propagated from
     * md_parse() and is available for any use by the application.
     *
     * Note any strings provided to the callbacks as their arguments or as
     * members of any detail structure are generally not zero-terminated.
     * The application has to take the respective size information into account.
     *
     * Callbacks may abort further parsing of the document by returning non-zero.
     */
    int (*enter_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*leave_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*enter_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*leave_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*text)(MD_TEXTTYPE /*type*/, const MD_CHAR* /*text*/, MD_SIZE /*size*/, void* /*userdata*/);
    /* Debug callback. Optional (may be NULL).
     *
     * If provided and something goes wrong, this function gets called.
     * This is intended for debugging and problem diagnosis for developers;
     * it is not intended to provide any errors suitable for displaying to an
     * end user.
     */
    void (*debug_log)(const char* /*msg*/, void* /*userdata*/);
    /* Reserved. Set to NULL.
     */
    void (*syntax)(void);
 } MD_PARSER;
 /* For backward compatibility. Do not use in new code. */
 typedef MD_PARSER MD_RENDERER;
 /* Parse the Markdown document stored in the string 'text' of size 'size'.
 * The parser structure provides callbacks to be called during the parsing so
 * the caller can render the document on the screen or convert the Markdown
 * to another format.
 *
 * Zero is returned on success. If a runtime error occurs (e.g. a memory
 * allocation fails), -1 is returned. If the processing is aborted due to any
 * callback returning non-zero, the return value of that callback is returned.
 */
 int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);
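 /* Illustrative sketch (not part of the upstream header): counting the
 * headings of a document. All function names below are hypothetical, and
 * the sketch assumes every non-optional callback should be provided, so
 * the callbacks that have nothing to do are no-ops:
 *
 *     static int
 *     count_h(MD_BLOCKTYPE type, void* detail, void* userdata)
 *     {
 *         (void) detail;
 *         if(type == MD_BLOCK_H)
 *             (*(unsigned*) userdata)++;
 *         return 0;
 *     }
 *
 *     static int
 *     noop_block(MD_BLOCKTYPE type, void* detail, void* userdata)
 *     {
 *         (void) type; (void) detail; (void) userdata;
 *         return 0;
 *     }
 *
 *     static int
 *     noop_span(MD_SPANTYPE type, void* detail, void* userdata)
 *     {
 *         (void) type; (void) detail; (void) userdata;
 *         return 0;
 *     }
 *
 *     static int
 *     noop_text(MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size, void* userdata)
 *     {
 *         (void) type; (void) text; (void) size; (void) userdata;
 *         return 0;
 *     }
 *
 *     unsigned n_headings = 0;
 *     MD_PARSER parser = {
 *         0, MD_DIALECT_COMMONMARK,
 *         count_h, noop_block, noop_span, noop_span, noop_text,
 *         NULL, NULL
 *     };
 *     int ret = md_parse(text, size, &parser, &n_headings);
 *
 * Here 'text' and 'size' stand for the caller's input buffer. A non-zero
 * 'ret' means the parsing failed or a callback aborted it.
 */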
 #ifdef __cplusplus
    }  /* extern "C" { */
 #endif
 #endif  /* MD4C_MARKDOWN_H */
--- a/md4c/md4c.pc.in
+++ b/md4c/md4c.pc.in
@ -0,0 +1,12 @@
 prefix=@CMAKE_INSTALL_PREFIX@
 exec_prefix=@CMAKE_INSTALL_PREFIX@
 libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@
 includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@
 Name: @PROJECT_NAME@
 Description: @PROJECT_DESCRIPTION@
 Version: @PROJECT_VERSION@
 Requires:
 Libs: -L${libdir} -lmd4c
 Cflags: -I${includedir}
--- a/scripts/build_folding_map.py
+++ b/scripts/build_folding_map.py
@ -0,0 +1,118 @@
 #!/usr/bin/env python3
 import os
 import sys
 import textwrap
 self_path = os.path.dirname(os.path.realpath(__file__))
 f = open(self_path + "/unicode/CaseFolding.txt", "r")
 status_list = [ "C", "F" ]
 folding_list = [ dict(), dict(), dict() ]
 # Filter the foldings for "full" folding.
 for line in f:
    comment_off = line.find("#")
    if comment_off >= 0:
        line = line[:comment_off]
    line = line.strip()
    if not line:
        continue
    raw_codepoint, status, raw_mapping, ignored_tail = line.split(";", 3)
    if status.strip() not in status_list:
        continue
    codepoint = int(raw_codepoint.strip(), 16)
    mapping = [int(it, 16) for it in raw_mapping.strip().split(" ")]
    mapping_len = len(mapping)
    if mapping_len in range(1, 4):
        folding_list[mapping_len-1][codepoint] = mapping
    else:
        assert(False)
 f.close()
 # If we assume that range (index0 ... index-1) makes a range, check that index
 # is compatible with it too.
 #
 # We can handle ranges which:
 #
 # (1) either form consecutive sequence of codepoints and which map that range
 #     to other consecutive range of codepoints;
 #
 # (2) or consecutive range of codepoints with step 2 where each codepoint
 #     CP is mapped to the next codepoint CP+1
 #     (e.g. 0x1234 -> 0x1235; 0x1236 -> 0x1237; ...).
 #
 # (If the mappings have multiple codepoints, only the 1st mapped codepoint is
 # considered and all the other ones have to be the same for the whole range.)
 def is_range_compatible(folding, codepoint_list, index0, index):
    N = index - index0
    codepoint0 = codepoint_list[index0]
    codepoint1 = codepoint_list[index0+1]
    codepointN = codepoint_list[index]
    mapping0 = folding[codepoint0]
    mapping1 = folding[codepoint1]
    mappingN = folding[codepointN]
    # Check the range type (1):
    if codepoint1 - codepoint0 == 1 and codepointN - codepoint0 == N                \
            and mapping1[0] - mapping0[0] == 1 and mapping1[1:] == mapping0[1:]     \
            and mappingN[0] - mapping0[0] == N and mappingN[1:] == mapping0[1:]:
        return True
    # Check the range type (2):
    if codepoint1 - codepoint0 == 2 and codepointN - codepoint0 == 2 * N            \
            and mapping0[0] - codepoint0 == 1                                       \
            and mapping1[0] - codepoint1 == 1 and mapping1[1:] == mapping0[1:]      \
            and mappingN[0] - codepointN == 1 and mappingN[1:] == mapping0[1:]:
        return True
    return False
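 # Illustration of the two range types (an added sketch, not used by the
 # script). The small dictionaries are hand-written stand-ins for the
 # foldings read from CaseFolding.txt:
 #
 #     # Type (1): 'A'..'C' fold to 'a'..'c', both sides consecutive.
 #     folding = {0x0041: [0x0061], 0x0042: [0x0062], 0x0043: [0x0063]}
 #     is_range_compatible(folding, [0x0041, 0x0042, 0x0043], 0, 2)
 #
 #     # Type (2): step 2, every codepoint folds to its successor.
 #     folding = {0x0100: [0x0101], 0x0102: [0x0103], 0x0104: [0x0105]}
 #     is_range_compatible(folding, [0x0100, 0x0102, 0x0104], 0, 2)
 #
 # Both calls return True. Mixing the two dictionaries into one codepoint
 # list would end the range at the boundary between them.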
 def mapping_str(list, mapping):
    return ",".join("0x{:04x}".format(x) for x in mapping)
 for mapping_len in range(1, 4):
    folding = folding_list[mapping_len-1]
    codepoint_list = list(folding)
    index0 = 0
    count = len(folding)
    records = list()
    data_records = list()
    while index0 < count:
        index1 = index0 + 1
        while index1 < count and is_range_compatible(folding, codepoint_list, index0, index1):
            index1 += 1
        if index1 - index0 > 2:
            # Range of codepoints
            records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
            data_records.append(mapping_str(data_records, folding[codepoint_list[index0]]))
            data_records.append(mapping_str(data_records, folding[codepoint_list[index1-1]]))
        else:
            # Single codepoint
            records.append("S(0x{:04x})".format(codepoint_list[index0]))
            data_records.append(mapping_str(data_records, folding[codepoint_list[index0]]))
        index0 = index1
    sys.stdout.write("static const unsigned FOLD_MAP_{}[] = {{\n".format(mapping_len))
    sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
                        initial_indent = "    ", subsequent_indent="    ")))
    sys.stdout.write("\n};\n")
    sys.stdout.write("static const unsigned FOLD_MAP_{}_DATA[] = {{\n".format(mapping_len))
    sys.stdout.write("\n".join(textwrap.wrap(", ".join(data_records), 110,
                        initial_indent = "    ", subsequent_indent="    ")))
    sys.stdout.write("\n};\n")
--- a/scripts/build_punct_map.py
+++ b/scripts/build_punct_map.py
@ -0,0 +1,66 @@
 #!/usr/bin/env python3
 import os
 import sys
 import textwrap
 self_path = os.path.dirname(os.path.realpath(__file__))
 f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r")
 codepoint_list = []
 category_list = [ "Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps" ]
 # Filter codepoints falling in the right category:
 for line in f:
    comment_off = line.find("#")
    if comment_off >= 0:
        line = line[:comment_off]
    line = line.strip()
    if not line:
        continue
    char_range, category = line.split(";")
    char_range = char_range.strip()
    category = category.strip()
    if category not in category_list:
        continue
    delim_off = char_range.find("..")
    if delim_off >= 0:
        codepoint0 = int(char_range[:delim_off], 16)
        codepoint1 = int(char_range[delim_off+2:], 16)
        for codepoint in range(codepoint0, codepoint1 + 1):
            codepoint_list.append(codepoint)
    else:
        codepoint = int(char_range, 16)
        codepoint_list.append(codepoint)
 f.close()
 codepoint_list.sort()
 index0 = 0
 count = len(codepoint_list)
 records = list()
 while index0 < count:
    index1 = index0 + 1
    while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1:
        index1 += 1
    if index1 - index0 > 1:
        # Range of codepoints
        records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
    else:
        # Single codepoint
        records.append("S(0x{:04x})".format(codepoint_list[index0]))
    index0 = index1
 sys.stdout.write("static const unsigned PUNCT_MAP[] = {\n")
 sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
                    initial_indent = "    ", subsequent_indent="    ")))
 sys.stdout.write("\n};\n\n")
--- a/scripts/build_whitespace_map.py
+++ b/scripts/build_whitespace_map.py
@ -0,0 +1,66 @@
 #!/usr/bin/env python3
 import os
 import sys
 import textwrap
 self_path = os.path.dirname(os.path.realpath(__file__))
 f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r")
 codepoint_list = []
 category_list = [ "Zs" ]
 # Filter codepoints falling in the right category:
 for line in f:
    comment_off = line.find("#")
    if comment_off >= 0:
        line = line[:comment_off]
    line = line.strip()
    if not line:
        continue
    char_range, category = line.split(";")
    char_range = char_range.strip()
    category = category.strip()
    if category not in category_list:
        continue
    delim_off = char_range.find("..")
    if delim_off >= 0:
        codepoint0 = int(char_range[:delim_off], 16)
        codepoint1 = int(char_range[delim_off+2:], 16)
        for codepoint in range(codepoint0, codepoint1 + 1):
            codepoint_list.append(codepoint)
    else:
        codepoint = int(char_range, 16)
        codepoint_list.append(codepoint)
 f.close()
 codepoint_list.sort()
 index0 = 0
 count = len(codepoint_list)
 records = list()
 while index0 < count:
    index1 = index0 + 1
    while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1:
        index1 += 1
    if index1 - index0 > 1:
        # Range of codepoints
        records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
    else:
        # Single codepoint
        records.append("S(0x{:04x})".format(codepoint_list[index0]))
    index0 = index1
 sys.stdout.write("static const unsigned WHITESPACE_MAP[] = {\n")
 sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
                    initial_indent = "    ", subsequent_indent="    ")))
 sys.stdout.write("\n};\n\n")
--- a/scripts/coverity.sh
+++ b/scripts/coverity.sh
@ -0,0 +1,70 @@
 #!/bin/sh
 #
 # This script attempts to build the project via the cov-build utility and
 # prepares a package for uploading to the Coverity Scan service.
 #
 # (See http://scan.coverity.com for more info.)
 set -e
 # Check presence of coverity static analyzer.
 if ! which cov-build; then
    echo "Utility cov-build not found in PATH."
    exit 1
 fi
 # Choose a build system (ninja or GNU make).
 if which ninja; then
    BUILD_TOOL=ninja
    GENERATOR=Ninja
 elif which make; then
    BUILD_TOOL=make
    GENERATOR="MSYS Makefiles"
 else
    echo "No suitable build system found."
    exit 1
 fi
 # Choose a zip tool.
 if which 7za; then
    MKZIP="7za a -r -mx9"
 elif which 7z; then
    MKZIP="7z a -r -mx9"
 elif which zip; then
    MKZIP="zip -r"
 else
    echo "No suitable zip utility found"
    exit 1
 fi
 # Change dir to project root.
 cd `dirname "$0"`/..
 CWD=`pwd`
 ROOT_DIR="$CWD"
 BUILD_DIR="$CWD/coverity"
 OUTPUT="$CWD/cov-int.zip"
 # Sanity checks.
 if [ ! -x "$ROOT_DIR/scripts/coverity.sh" ]; then
    echo "There is some path mismatch."
    exit 1
 fi
 if [ -e "$BUILD_DIR" ]; then
    echo "Path $BUILD_DIR already exists. Delete it and retry."
    exit 1
 fi
 if [ -e "$OUTPUT" ]; then
    echo "Path $OUTPUT already exists. Delete it and retry."
    exit 1
 fi
 # Build the project with the Coverity analysis enabled.
 mkdir -p "$BUILD_DIR"
 cd "$BUILD_DIR"
 cmake -G "$GENERATOR" "$ROOT_DIR"
 cov-build --dir cov-int "$BUILD_TOOL"
 $MKZIP "$OUTPUT" "cov-int"
 cd "$ROOT_DIR"
 rm -rf "$BUILD_DIR"
--- a/scripts/run-tests.sh
+++ b/scripts/run-tests.sh
@ -0,0 +1,75 @@
 #!/bin/sh
 #
 # Run this script from build directory.
 #set -e
 SELF_DIR=`dirname "$0"`
 PROJECT_DIR="$SELF_DIR/.."
 TEST_DIR="$PROJECT_DIR/test"
 PROGRAM="md2html/md2html"
 if [ ! -x "$PROGRAM" ]; then
    echo "Cannot find the $PROGRAM." >&2
    echo "You have to run this script from the build directory." >&2
    exit 1
 fi
 if which py >>/dev/null 2>&1; then
    PYTHON=py
 elif which python3 >>/dev/null 2>&1; then
    PYTHON=python3
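 elif which python >>/dev/null 2>&1; then
    # Python 2 prints its version to stderr, so capture both streams.
    if [ `python --version 2>&1 | awk '{print $2}' | cut -d. -f1` -ge 3 ]; then
        PYTHON=python
    fi
 fi
 if [ -z "$PYTHON" ]; then
    echo "Cannot find a Python 3 interpreter." >&2
    exit 1
 fi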
 echo
 echo "CommonMark specification:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/spec.txt" -p "$PROGRAM"
 echo
 echo "Code coverage & regressions:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/coverage.txt" -p "$PROGRAM"
 echo
 echo "Permissive e-mail autolinks extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-email-autolinks.txt" -p "$PROGRAM --fpermissive-email-autolinks"
 echo
 echo "Permissive URL autolinks extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-url-autolinks.txt" -p "$PROGRAM --fpermissive-url-autolinks"
 echo
 echo "WWW autolinks extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-www-autolinks.txt" -p "$PROGRAM --fpermissive-www-autolinks"
 echo
 echo "Tables extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tables.txt" -p "$PROGRAM --ftables"
 echo
 echo "Strikethrough extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/strikethrough.txt" -p "$PROGRAM --fstrikethrough"
 echo
 echo "Task lists extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tasklists.txt" -p "$PROGRAM --ftasklists"
 echo
 echo "LaTeX extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/latex-math.txt" -p "$PROGRAM --flatex-math"
 echo
 echo "Wiki links extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/wiki-links.txt" -p "$PROGRAM --fwiki-links --ftables"
 echo
 echo "Underline extension:"
 $PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/underline.txt" -p "$PROGRAM --funderline"
 echo
 echo "Pathological input:"
 $PYTHON "$TEST_DIR/pathological_tests.py" -p "$PROGRAM"
--- a/scripts/unicode/CaseFolding.txt
+++ b/scripts/unicode/CaseFolding.txt
--- a/scripts/unicode/DerivedGeneralCategory.txt
+++ b/scripts/unicode/DerivedGeneralCategory.txt
--- a/test/LICENSE
+++ b/test/LICENSE
@ -0,0 +1,64 @@
 The CommonMark spec (spec.txt) and DTD (CommonMark.dtd) are
 Copyright (C) 2014-16 John MacFarlane
 Released under the Creative Commons CC-BY-SA 4.0 license:
 <http://creativecommons.org/licenses/by-sa/4.0/>.
 ---
 The test software in test/ and the programs in tools/ are
 Copyright (c) 2014, John MacFarlane
 All rights reserved.
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 ---
 The normalization code in runtests.py was derived from the
 markdowntest project, Copyright 2013 Karl Dubost:
 The MIT License (MIT)
 Copyright (c) 2013 Karl Dubost
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
 LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--- a/test/cmark.py
+++ b/test/cmark.py
@ -0,0 +1,40 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 from ctypes import CDLL, c_char_p, c_long
 from subprocess import Popen, PIPE
 import platform
 import os
 def pipe_through_prog(prog, text):
    p1 = Popen(prog.split(), stdout=PIPE, stdin=PIPE, stderr=PIPE)
    [result, err] = p1.communicate(input=text.encode('utf-8'))
    return [p1.returncode, result.decode('utf-8'), err]
 def use_library(lib, text):
    textbytes = text.encode('utf-8')
    textlen = len(textbytes)
    return [0, lib(textbytes, textlen, 0).decode('utf-8'), '']
 class CMark:
    def __init__(self, prog=None, library_dir=None):
        self.prog = prog
        if prog:
            self.to_html = lambda x: pipe_through_prog(prog, x)
        else:
            sysname = platform.system()
            if sysname == 'Darwin':
                libname = "libcmark.dylib"
            elif sysname == 'Windows':
                libname = "cmark.dll"
            else:
                libname = "libcmark.so"
            if library_dir:
                libpath = os.path.join(library_dir, libname)
            else:
                libpath = os.path.join("build", "src", libname)
            cmark = CDLL(libpath)
            markdown = cmark.cmark_markdown_to_html
            markdown.restype = c_char_p
            markdown.argtypes = [c_char_p, c_long]
            self.to_html = lambda x: use_library(markdown, x)
--- a/test/coverage.txt
+++ b/test/coverage.txt
@ -0,0 +1,464 @@
 # Coverage
 This file is a collection of unit tests not covered elsewhere: most notably
 regression tests, tests improving code coverage, and other useful cases.
 (However, any test requiring an additional command line option, such as one
 enabling an extension, must be included in its respective file.)
 ## GitHub Issues
 ### [Issue 2](https://github.com/mity/md4c/issues/2)
 Raw HTML block:
 ```````````````````````````````` example
 <gi att1=tok1 att2=tok2>
 .
 <gi att1=tok1 att2=tok2>
 ````````````````````````````````
 Inline:
 ```````````````````````````````` example
 foo <gi att1=tok1 att2=tok2> bar
 .
 <p>foo <gi att1=tok1 att2=tok2> bar</p>
 ````````````````````````````````
 Inline with a line break:
 ```````````````````````````````` example
 foo <gi att1=tok1
 att2=tok2> bar
 .
 <p>foo <gi att1=tok1
 att2=tok2> bar</p>
 ````````````````````````````````
 ### [Issue 4](https://github.com/mity/md4c/issues/4)
 ```````````````````````````````` example
 ![alt text with *entity* &copy;](img.png 'title')
 .
 <p><img src="img.png" alt="alt text with entity ©" title="title"></p>
 ````````````````````````````````
 ### [Issue 9](https://github.com/mity/md4c/issues/9)
 ```````````````````````````````` example
 > [foo
 > bar]: /url
 >
 > [foo bar]
 .
 <blockquote>
 <p><a href="/url">foo
 bar</a></p>
 </blockquote>
 ````````````````````````````````
 ### [Issue 10](https://github.com/mity/md4c/issues/10)
 ```````````````````````````````` example
 [x]:
 x
 - <?
  x
 .
 <ul>
 <li><?
 x
 </li>
 </ul>
 ````````````````````````````````
 ### [Issue 11](https://github.com/mity/md4c/issues/11)
 ```````````````````````````````` example
 x [link](/url "foo &ndash; bar") x
 .
 <p>x <a href="/url" title="foo – bar">link</a> x</p>
 ````````````````````````````````
 ### [Issue 14](https://github.com/mity/md4c/issues/14)
 ```````````````````````````````` example
 a***b* c*
 .
 <p>a*<em><em>b</em> c</em></p>
 ````````````````````````````````
 ### [Issue 15](https://github.com/mity/md4c/issues/15)
 ```````````````````````````````` example
 ***b* c*
 .
 <p>*<em><em>b</em> c</em></p>
 ````````````````````````````````
 ### [Issue 21](https://github.com/mity/md4c/issues/21)
 ```````````````````````````````` example
 a*b**c*
 .
 <p>a<em>b**c</em></p>
 ````````````````````````````````
 ### [Issue 33](https://github.com/mity/md4c/issues/33)
 ```````````````````````````````` example
 ```&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;
 .
 <pre><code class="language-&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;"></code></pre>
 ````````````````````````````````
 ### [Issue 36](https://github.com/mity/md4c/issues/36)
 ```````````````````````````````` example
 __x_ _x___
 .
 <p><em><em>x</em> <em>x</em></em>_</p>
 ````````````````````````````````
 ### [Issue 39](https://github.com/mity/md4c/issues/39)
 ```````````````````````````````` example
 [\\]: x
 .
 ````````````````````````````````
 ### [Issue 40](https://github.com/mity/md4c/issues/40)
 ```````````````````````````````` example
 [x](url
 'title'
 )x
 .
 <p><a href="url" title="title">x</a>x</p>
 ````````````````````````````````
 ### [Issue 65](https://github.com/mity/md4c/issues/65)
 ```````````````````````````````` example
 `
 .
 <p>`</p>
 ````````````````````````````````
 ### [Issue 74](https://github.com/mity/md4c/issues/74)
 ```````````````````````````````` example
 [f]:
 -
    xx
 -
 .
 <pre><code>xx
 </code></pre>
 <ul>
 <li></li>
 </ul>
 ````````````````````````````````
 ### [Issue 78](https://github.com/mity/md4c/issues/78)
 ```````````````````````````````` example
 [SS ẞ]: /url
 [ẞ SS]
 .
 <p><a href="/url">ẞ SS</a></p>
 ````````````````````````````````
 ### [Issue 83](https://github.com/mity/md4c/issues/83)
 ```````````````````````````````` example
 foo
 >
 .
 <p>foo</p>
 <blockquote>
 </blockquote>
 ````````````````````````````````
 ### [Issue 95](https://github.com/mity/md4c/issues/95)
 ```````````````````````````````` example
 . foo
 .
 <p>. foo</p>
 ````````````````````````````````
 ### [Issue 96](https://github.com/mity/md4c/issues/96)
 ```````````````````````````````` example
 [ab]: /foo
 [a] [ab] [abc]
 .
 <p>[a] <a href="/foo">ab</a> [abc]</p>
 ````````````````````````````````
 ```````````````````````````````` example
 [a b]: /foo
 [a   b]
 .
 <p><a href="/foo">a   b</a></p>
 ````````````````````````````````
 ### [Issue 97](https://github.com/mity/md4c/issues/97)
 ```````````````````````````````` example
 *a **b c* d**
 .
 <p><em>a <em><em>b c</em> d</em></em></p>
 ````````````````````````````````
 ### [Issue 100](https://github.com/mity/md4c/issues/100)
 ```````````````````````````````` example
 <foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123>
 .
 <p><a href="mailto:foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123">foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123</a></p>
 ````````````````````````````````
 ```````````````````````````````` example
 <foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123>
 .
 <p>&lt;foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123&gt;</p>
 ````````````````````````````````
 (Note the `x` here: it makes the domain label 64 characters long, one more
 than the maximum allowed length of 63.)
 ### [Issue 107](https://github.com/mity/md4c/issues/107)
 ```````````````````````````````` example
 ***foo *bar baz***
 .
 <p>*<strong>foo <em>bar baz</em></strong></p>
 ````````````````````````````````
 ## Code coverage
 ### `md_is_unicode_whitespace__()`
 Unicode whitespace (here U+2000) forms a word boundary, so the asterisks
 cannot be resolved as an emphasis span: there is no closer mark.
 ```````````````````````````````` example
 *foo *bar
 .
 <p>*foo *bar</p>
 ````````````````````````````````
 ### `md_is_unicode_punct__()`
 Ditto for Unicode punctuation (here U+00A1).
 ```````````````````````````````` example
 *foo¡*bar
 .
 <p>*foo¡*bar</p>
 ````````````````````````````````
 ### `md_get_unicode_fold_info()`
 ```````````````````````````````` example
 [Příliš žluťoučký kůň úpěl ďábelské ódy.]
 [PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY.]: /url
 .
 <p><a href="/url">Příliš žluťoučký kůň úpěl ďábelské ódy.</a></p>
 ````````````````````````````````
 ### `md_decode_utf8__()` and `md_decode_utf8_before__()`
 ```````````````````````````````` example
 á*Á (U+00E1, i.e. two byte UTF-8 sequence)
  *  (U+2000, i.e. three byte UTF-8 sequence)
 .
 <p>á*Á (U+00E1, i.e. two byte UTF-8 sequence)
 *  (U+2000, i.e. three byte UTF-8 sequence)</p>
 ````````````````````````````````
 ### `md_is_link_destination_A()`
 ```````````````````````````````` example
 [link](</url\.with\.escape>)
 .
 <p><a href="/url.with.escape">link</a></p>
 ````````````````````````````````
 ### `md_link_label_eq()`
 ```````````````````````````````` example
 [foo bar]
 [foo bar]: /url
 .
 <p><a href="/url">foo bar</a></p>
 ````````````````````````````````
 ### `md_is_inline_link_spec()`
 ```````````````````````````````` example
 > [link](/url 'foo
 > bar')
 .
 <blockquote>
 <p><a href="/url" title="foo
 bar">link</a></p>
 </blockquote>
 ````````````````````````````````
 ### `md_build_ref_def_hashtable()`
 All link labels in the following example have the same FNV1a hash (after
 normalization of the label, which means after converting to a vector of Unicode
 codepoints and lowercase folding).
 So the example triggers quite complex code paths which are not otherwise easily
 tested.
 ```````````````````````````````` example
 [foo]: /foo
 [qnptgbh]: /qnptgbh
 [abgbrwcv]: /abgbrwcv
 [abgbrwcv]: /abgbrwcv2
 [abgbrwcv]: /abgbrwcv3
 [abgbrwcv]: /abgbrwcv4
 [alqadfgn]: /alqadfgn
 [foo]
 [qnptgbh]
 [abgbrwcv]
 [alqadfgn]
 [axgydtdu]
 .
 <p><a href="/foo">foo</a>
 <a href="/qnptgbh">qnptgbh</a>
 <a href="/abgbrwcv">abgbrwcv</a>
 <a href="/alqadfgn">alqadfgn</a>
 [axgydtdu]</p>
 ````````````````````````````````
 For the sake of completeness, the following C program was used to find the hash
 collisions by brute force:
 ~~~
 #include <stdio.h>
 #include <string.h>
 static unsigned etalon;
 #define MD_FNV1A_BASE       2166136261
 #define MD_FNV1A_PRIME      16777619
 static inline unsigned
 fnv1a(unsigned base, const void* data, size_t n)
 {
    const unsigned char* buf = (const unsigned char*) data;
    unsigned hash = base;
    size_t i;
    for(i = 0; i < n; i++) {
        hash ^= buf[i];
        hash *= MD_FNV1A_PRIME;
    }
    return hash;
 }
 static unsigned
 unicode_hash(const char* data, size_t n)
 {
    unsigned value;
    unsigned hash = MD_FNV1A_BASE;
    int i;
    for(i = 0; i < n; i++) {
        value = data[i];
        hash = fnv1a(hash, &value, sizeof(unsigned));
    }
    return hash;
 }
 static void
 recurse(char* buffer, size_t off, size_t len)
 {
    int ch;
    if(off < len - 1) {
        for(ch = 'a'; ch <= 'z'; ch++) {
            buffer[off] = ch;
            recurse(buffer, off+1, len);
        }
    } else {
        for(ch = 'a'; ch <= 'z'; ch++) {
            buffer[off] = ch;
            if(unicode_hash(buffer, len) == etalon) {
                printf("Dup: %.*s\n", (int)len, buffer);
            }
        }
    }
 }
 int
 main(int argc, char** argv)
 {
    char buffer[32];
    int len;
    if(argc < 2)
        etalon = unicode_hash("foo", 3);
    else
        etalon = unicode_hash(argv[1], strlen(argv[1]));
    for(len = 1; len <= sizeof(buffer); len++)
        recurse(buffer, 0, len);
    return 0;
 }
 ~~~
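As a cross-check, the same hash can be sketched in Python. This is only a
sketch under stated assumptions: it presumes that `unsigned` is 32 bits wide
and stored little-endian (as on typical x86 builds), and the names `fnv1a` and
`unicode_hash` merely mirror the C program above; they are not part of MD4C's
API.

```python
FNV1A_BASE = 2166136261
FNV1A_PRIME = 16777619
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned overflow (assumption)


def fnv1a(base, data):
    """32-bit FNV-1a over a bytes object, continuing from `base`."""
    value = base
    for byte in data:
        value ^= byte
        value = (value * FNV1A_PRIME) & MASK32
    return value


def unicode_hash(label):
    """Hash every codepoint as a 4-byte little-endian value (assumption)."""
    value = FNV1A_BASE
    for ch in label:
        value = fnv1a(value, ord(ch).to_bytes(4, "little"))
    return value


def all_collide(labels):
    """Return True if all the labels hash to the same value."""
    return len({unicode_hash(label) for label in labels}) == 1


if __name__ == "__main__":
    labels = ["foo", "qnptgbh", "abgbrwcv", "alqadfgn", "axgydtdu"]
    for label in labels:
        print("{:10s} 0x{:08x}".format(label, unicode_hash(label)))
    print("all collide:", all_collide(labels))
```

If the assumptions hold, the printed hashes for the labels used in the example
above should all be equal; a differing value would suggest that the integer
width or the byte order does not match the platform the C program ran on.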
--- a/test/fuzz-input/commonmark.md
+++ b/test/fuzz-input/commonmark.md
@ -0,0 +1,41 @@
 # h1
 ## h2
 ### h3
 #### h4
 ##### h5
 ###### h6
 h1
 ==
 h2
 --
 --------------------
    indented code
 ```
 fenced code
 ```
 <tag attr='val' attr2="val2">
 > quote
 * list item
 1. list item
 [ref]: /url
 paragraph
 &copy; &#1234; &#xabcd;
 `code`
 *emph* **strong** ***strong emph***
 _emph_ __strong__ ___strong emph___
 [ref] [ref][] [link](/url)
 ![ref] ![ref][] ![img](/url)
 <http://example.com> <doe@example.com>
 www.example.com doe@example.com
 \\ \* \. \` \
--- a/test/fuzz-input/gfm.md
+++ b/test/fuzz-input/gfm.md
@ -0,0 +1,8 @@
 * [ ] unchecked
 * [x] checked
 A | B | C
 ---|--:|:-:
 aaa|bbb|ccc
 ~del~ ~~del~~
--- a/test/fuzz-input/latex-math.md
+++ b/test/fuzz-input/latex-math.md
@ -0,0 +1 @@
 $a^2+b^2=c^2$ $$a^2+b^2=c^2$$
--- a/test/fuzz-input/wiki.md
+++ b/test/fuzz-input/wiki.md
@ -0,0 +1 @@
 [[wiki]] [[wiki|label]]
--- a/test/latex-math.txt
+++ b/test/latex-math.txt
@ -0,0 +1,39 @@
 # LaTeX Math
 With the flag `MD_FLAG_LATEXMATHSPANS`, MD4C enables an extension which
 recognizes LaTeX-style math spans.
 A math span is any text wrapped in dollars or double dollars (`$...$` or
 `$$...$$`).
 ```````````````````````````````` example
 $a+b=c$ Hello, world!
 .
 <p><x-equation>a+b=c</x-equation> Hello, world!</p>
 ````````````````````````````````
 If the double dollar sign is used, the math span is a display math span.
 ```````````````````````````````` example
 This is a display equation: $$\int_a^b x dx$$.
 .
 <p>This is a display equation: <x-equation type="display">\int_a^b x dx</x-equation>.</p>
 ````````````````````````````````
 Math spans may span multiple lines as they are normal spans:
 ```````````````````````````````` example
 $$
 \int_a^b
 f(x) dx
 $$
 .
 <p><x-equation type="display">\int_a^b f(x) dx </x-equation></p>
 ````````````````````````````````
 Note though that many (simple) renderers may output the math spans just as
 verbatim text. (This includes the HTML renderer used by the `md2html` utility.)
 Only advanced renderers which implement LaTeX math syntax can be expected to
 provide better results.
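To illustrate the mapping shown in the examples above, here is a minimal
Python sketch of how a simple renderer might wrap single-line math spans in
`<x-equation>` tags. It is a rough regular-expression approximation, not
MD4C's actual parser: the helper name `render_math_spans` is hypothetical, and
the sketch ignores escaping, code spans and the whitespace handling of
multi-line spans.

```python
import re

# Hypothetical pattern: try the `$$...$$` form first, then `$...$`.
MATH_SPAN_RE = re.compile(
    r"\$\$(?P<display>.+?)\$\$|\$(?P<inline>.+?)\$", re.DOTALL)


def render_math_spans(text):
    """Wrap math spans in <x-equation> tags, leaving their body verbatim."""
    def replace(match):
        display = match.group("display")
        if display is not None:
            return '<x-equation type="display">' + display + "</x-equation>"
        return "<x-equation>" + match.group("inline") + "</x-equation>"
    return MATH_SPAN_RE.sub(replace, text)


if __name__ == "__main__":
    print(render_math_spans("$a+b=c$ Hello, world!"))
    print(render_math_spans(r"This is a display equation: $$\int_a^b x dx$$."))
```

The display form is tried first so that `$$...$$` is not mistaken for two
empty inline spans.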
--- a/test/normalize.py
+++ b/test/normalize.py
@ -0,0 +1,194 @@
 # -*- coding: utf-8 -*-
 from html.parser import HTMLParser
 import urllib
 try:
    from html.parser import HTMLParseError
 except ImportError:
    # HTMLParseError was removed in Python 3.5. It could never be
    # thrown, so we define a placeholder instead.
    class HTMLParseError(Exception):
        pass
 from html.entities import name2codepoint
 import sys
 import re
 import html
 # Normalization code, adapted from
 # https://github.com/karlcow/markdown-testsuite/
 significant_attrs = ["alt", "href", "src", "title"]
 whitespace_re = re.compile(r'\s+')
 class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.convert_charrefs = False
        self.last = "starttag"
        self.in_pre = False
        self.output = ""
        self.last_tag = ""
    def handle_data(self, data):
        after_tag = self.last == "endtag" or self.last == "starttag"
        after_block_tag = after_tag and self.is_block_tag(self.last_tag)
        if after_tag and self.last_tag == "br":
            data = data.lstrip('\n')
        if not self.in_pre:
            data = whitespace_re.sub(' ', data)
        if after_block_tag and not self.in_pre:
            if self.last == "starttag":
                data = data.lstrip()
            elif self.last == "endtag":
                data = data.strip()
        self.output += data
        self.last = "data"
    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False
        elif self.is_block_tag(tag):
            self.output = self.output.rstrip()
        self.output += "</" + tag + ">"
        self.last_tag = tag
        self.last = "endtag"
    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True
        if self.is_block_tag(tag):
            self.output = self.output.rstrip()
        self.output += "<" + tag
        # For now we don't strip out 'extra' attributes, because of
        # raw HTML test cases.
        # attrs = filter(lambda attr: attr[0] in significant_attrs, attrs)
        if attrs:
            attrs.sort()
            for (k,v) in attrs:
                self.output += " " + k
                if v in ['href','src']:
                    self.output += ("=" + '"' +
                            urllib.quote(urllib.unquote(v), safe='/') + '"')
                elif v is not None:
                    # html.escape() replaces cgi.escape(), which was removed
                    # in Python 3.8.
                    self.output += ("=" + '"' + html.escape(v,quote=True) + '"')
        self.output += ">"
        self.last_tag = tag
        self.last = "starttag"
    def handle_startendtag(self, tag, attrs):
        """Ignore closing tag for self-closing """
        self.handle_starttag(tag, attrs)
        self.last_tag = tag
        self.last = "endtag"
    def handle_comment(self, data):
        self.output += '<!--' + data + '-->'
        self.last = "comment"
    def handle_decl(self, data):
        self.output += '<!' + data + '>'
        self.last = "decl"
    def unknown_decl(self, data):
        self.output += '<!' + data + '>'
        self.last = "decl"
    def handle_pi(self,data):
        self.output += '<?' + data + '>'
        self.last = "pi"
    def handle_entityref(self, name):
        try:
            c = chr(name2codepoint[name])
        except KeyError:
            c = None
        self.output_char(c, '&' + name + ';')
        self.last = "ref"
    def handle_charref(self, name):
        try:
            if name.startswith("x"):
                c = chr(int(name[1:], 16))
            else:
                c = chr(int(name))
        except ValueError:
            c = None
        self.output_char(c, '&' + name + ';')
        self.last = "ref"
    # Helpers.
    def output_char(self, c, fallback):
        if c == '<':
            self.output += "&lt;"
        elif c == '>':
            self.output += "&gt;"
        elif c == '&':
            self.output += "&amp;"
        elif c == '"':
            self.output += "&quot;"
        elif c == None:
            self.output += fallback
        else:
            self.output += c
    def is_block_tag(self,tag):
        return (tag in ['article', 'header', 'aside', 'hgroup', 'blockquote',
            'hr', 'iframe', 'body', 'li', 'map', 'button', 'object', 'canvas',
            'ol', 'caption', 'output', 'col', 'p', 'colgroup', 'pre', 'dd',
            'progress', 'div', 'section', 'dl', 'table', 'td', 'dt',
            'tbody', 'embed', 'textarea', 'fieldset', 'tfoot', 'figcaption',
            'th', 'figure', 'thead', 'footer', 'tr', 'form', 'ul',
            'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'video', 'script', 'style'])
 def normalize_html(html):
    r"""
    Return normalized form of HTML which ignores insignificant output
    differences:
    Multiple inner whitespaces are collapsed to a single space (except
    in pre tags):
        >>> normalize_html("<p>a  \t b</p>")
        '<p>a b</p>'
        >>> normalize_html("<p>a  \t\nb</p>")
        '<p>a b</p>'
    * Whitespace surrounding block-level tags is removed.
        >>> normalize_html("<p>a  b</p>")
        '<p>a b</p>'
        >>> normalize_html(" <p>a  b</p>")
        '<p>a b</p>'
        >>> normalize_html("<p>a  b</p> ")
        '<p>a b</p>'
        >>> normalize_html("\n\t<p>\n\t\ta  b\t\t</p>\n\t")
        '<p>a b</p>'
        >>> normalize_html("<i>a  b</i> ")
        '<i>a b</i> '
    * Self-closing tags are converted to open tags.
        >>> normalize_html("<br />")
        '<br>'
    * Attributes are sorted and lowercased.
        >>> normalize_html('<a title="bar" HREF="foo">x</a>')
        '<a href="foo" title="bar">x</a>'
    * References are converted to unicode, except that '<', '>', '&', and
      '"' are rendered using entities.
        >>> normalize_html("&forall;&amp;&gt;&lt;&quot;")
        '\u2200&amp;&gt;&lt;&quot;'
    """
    html_chunk_re = re.compile(r"(\<!\[CDATA\[.*?\]\]\>|\<[^>]*\>|[^<]+)")
    try:
        parser = MyHTMLParser()
        # We work around HTMLParser's limitations parsing CDATA
        # by breaking the input into chunks and passing CDATA chunks
        # through verbatim.
        for chunk in re.finditer(html_chunk_re, html):
            if chunk.group(0)[:8] == "<![CDATA":
                parser.output += chunk.group(0)
            else:
                parser.feed(chunk.group(0))
        parser.close()
        return parser.output
    except HTMLParseError as e:
        sys.stderr.write("Normalization error: " + e.msg + "\n")
        return html  # on error, return unnormalized HTML
--- a/test/pathological_tests.py
+++ b/test/pathological_tests.py
@ -0,0 +1,122 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 import re
 import argparse
 import sys
 import platform
 from cmark import CMark
 from timeit import default_timer as timer
 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Run cmark tests.')
    parser.add_argument('-p', '--program', dest='program', nargs='?', default=None,
            help='program to test')
    parser.add_argument('--library-dir', dest='library_dir', nargs='?',
            default=None, help='directory containing dynamic library')
    args = parser.parse_args(sys.argv[1:])
 cmark = CMark(prog=args.program, library_dir=args.library_dir)
 # Maps a test description to a pair of (input, regex that must match the output).
 pathological = {
    # note - some pythons have limit of 65535 for {num-matches} in re.
    "nested strong emph":
                (("*a **a " * 65000) + "b" + (" a** a*" * 65000),
                 re.compile("(<em>a <strong>a ){65000}b( a</strong> a</em>){65000}")),
    "many emph closers with no openers":
                 (("a_ " * 65000),
                  re.compile("(a[_] ){64999}a_")),
    "many emph openers with no closers":
                 (("_a " * 65000),
                  re.compile("(_a ){64999}_a")),
    "many 3-emph openers with no closers":
                 (("a***" * 65000),
                  re.compile("(a<em><strong>a</strong></em>){32500}")),
    "many link closers with no openers":
                 (("a]" * 65000),
                  re.compile(r"(a\]){65000}")),
    "many link openers with no closers":
                 (("[a" * 65000),
                  re.compile(r"(\[a){65000}")),
    "mismatched openers and closers":
                 (("*a_ " * 50000),
                  re.compile("([*]a[_] ){49999}[*]a_")),
    "openers and closers multiple of 3":
                 (("a**b" + ("c* " * 50000)),
                  re.compile("a[*][*]b(c[*] ){49999}c[*]")),
    "link openers and emph closers":
                 (("[ a_" * 50000),
                  re.compile(r"(\[ a_){50000}")),
    "hard link/emph case":
                 ("**x [a*b**c*](d)",
                  re.compile("\\*\\*x <a href=\"d\">a<em>b\\*\\*c</em></a>")),
    "nested brackets":
                 (("[" * 50000) + "a" + ("]" * 50000),
                  re.compile(r"\[{50000}a\]{50000}")),
    "nested block quotes":
                 ((("> " * 50000) + "a"),
                  re.compile("(<blockquote>\r?\n){50000}")),
    "U+0000 in input":
                 ("abc\u0000de\u0000",
                  re.compile("abc\ufffd?de\ufffd?")),
    "backticks":
                 ("".join(map(lambda x: ("e" + "`" * x), range(1,1000))),
                  re.compile("^<p>[e`]*</p>\r?\n$")),
    "many links":
                 ("[t](/u) " * 50000,
                  re.compile("(<a href=\"/u\">t</a> ?){50000}")),
    "many references":
                 ("".join(map(lambda x: ("[" + str(x) + "]: u\n"), range(1,20000 * 16))) + "[0] " * 20000,
                  re.compile(r"(\[0\] ){19999}")),
    "deeply nested lists":
                 ("".join(map(lambda x: ("  " * x + "* a\n"), range(0,1000))),
                  re.compile("<ul>\r?\n(<li>a<ul>\r?\n){999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){999}")),
    "many html openers and closers":
                 (("<>" * 50000),
                  re.compile("(&lt;&gt;){50000}")),
    "many html proc. inst. openers":
                 (("x" + "<?" * 50000),
                  re.compile("x(&lt;\\?){50000}")),
    "many html CDATA openers":
                 (("x" + "<![CDATA[" * 50000),
                  re.compile("x(&lt;!\\[CDATA\\[){50000}")),
    "many backticks and escapes":
                 (("\\``" * 50000),
                  re.compile("(``){50000}")),
    "many broken link titles":
                 (("[ (](" * 50000),
                  re.compile(r"(\[ \(\]\(){50000}")),
    "broken thematic break":
                 (("* " * 50000 + "a"),
                  re.compile("<ul>\r?\n(<li><ul>\r?\n){49999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){49999}"))
    }
 whitespace_re = re.compile(r'\s+')
 passed = 0
 errored = 0
 failed = 0
 #print("Testing pathological cases:")
 for description in pathological:
    (inp, regex) = pathological[description]
    start = timer()
    [rc, actual, err] = cmark.to_html(inp)
    end = timer()
    if rc != 0:
        errored += 1
        print('{:35} [ERRORED (return code {:d})]'.format(description, rc))
        print(err)
    elif regex.search(actual):
        print('{:35} [PASSED] {:.3f} secs'.format(description, end-start))
        passed += 1
    else:
        print('{:35} [FAILED]'.format(description))
        print(repr(actual))
        failed += 1
 print("%d passed, %d failed, %d errored" % (passed, failed, errored))
 if (failed == 0 and errored == 0):
    exit(0)
 else:
    exit(1)
--- a/test/permissive-email-autolinks.txt
+++ b/test/permissive-email-autolinks.txt
@ -0,0 +1,50 @@
 # Permissive E-mail Autolinks
 With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, MD4C enables more permissive
 recognition of e-mail addresses and transforms them to autolinks, even if they
 do not exactly follow the autolink syntax defined in the CommonMark
 specification.
 This is a standard CommonMark e-mail autolink:
 ```````````````````````````````` example
 E-mail: <mailto:john.doe@gmail.com>
 .
 <p>E-mail: <a href="mailto:john.doe@gmail.com">mailto:john.doe@gmail.com</a></p>
 ````````````````````````````````
 With the permissive autolinks enabled, this is sufficient:
 ```````````````````````````````` example
 E-mail: john.doe@gmail.com
 .
 <p>E-mail: <a href="mailto:john.doe@gmail.com">john.doe@gmail.com</a></p>
 ````````````````````````````````
 `+` can occur before the `@`, but not after.
 ```````````````````````````````` example
 hello@mail+xyz.example isn't valid, but hello+xyz@mail.example is.
 .
 <p>hello@mail+xyz.example isn't valid, but <a href="mailto:hello+xyz@mail.example">hello+xyz@mail.example</a> is.</p>
 ````````````````````````````````
 `.`, `-`, and `_` can occur on both sides of the `@`, but only `.` may occur at
 the end of the email address, in which case it will not be considered part of
 the address:
 ```````````````````````````````` example
 a.b-c_d@a.b
 a.b-c_d@a.b.
 a.b-c_d@a.b-
 a.b-c_d@a.b_
 .
 <p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a></p>
 <p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a>.</p>
 <p>a.b-c_d@a.b-</p>
 <p>a.b-c_d@a.b_</p>
 ````````````````````````````````
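The rules above can be approximated in a few lines of Python. This is only a sketch and not MD4C's implementation (which is written in C): the name `find_email_autolinks` is hypothetical, and the requirement that the domain contains a period is an assumption borrowed from the GFM extended e-mail autolink rules, not something stated in this file.

```python
import re

EMAIL_CANDIDATE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9._-]+")


def find_email_autolinks(text):
    """Return the e-mail addresses in `text` that this sketch would autolink."""
    links = []
    for match in EMAIL_CANDIDATE.finditer(text):
        # A trailing period is allowed but is not part of the address.
        candidate = match.group(0).rstrip(".")
        _, _, domain = candidate.partition("@")
        # Assumption (from GFM): the domain needs at least one period.
        if not domain or "." not in domain:
            continue
        # `-` and `_` may not end the address.
        if candidate[-1] in "-_":
            continue
        links.append(candidate)
    return links
```

Note that `+` appears only in the character class for the part before the `@`, which is why `hello@mail+xyz.example` is rejected here.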
--- a/test/permissive-url-autolinks.txt
+++ b/test/permissive-url-autolinks.txt
@ -0,0 +1,92 @@
 # Permissive URL Autolinks
 With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, MD4C enables more permissive recognition
 of URLs and transforms them to autolinks, even if they do not exactly follow the
 autolink syntax defined in the CommonMark specification.
 This is a standard CommonMark autolink:
 ```````````````````````````````` example
 Homepage: <https://github.com/mity/md4c>
 .
 <p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p>
 ````````````````````````````````
 With the permissive autolinks enabled, this is sufficient:
 ```````````````````````````````` example
 Homepage: https://github.com/mity/md4c
 .
 <p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p>
 ````````````````````````````````
 However, this permissive autolink feature works only for a few very widely
 used URL schemes: `ftp:`, `http:` and `https:`.
 That's why this is not a permissive autolink:
 ```````````````````````````````` example
 ssh://root@example.com
 .
 <p>ssh://root@example.com</p>
 ````````````````````````````````
 The same rules for path validation as for permissive WWW autolinks apply.
 Therefore the final question mark here is not part of the autolink:
 ```````````````````````````````` example
 Have you ever visited http://www.zombo.com?
 .
 <p>Have you ever visited <a href="http://www.zombo.com">http://www.zombo.com</a>?</p>
 ````````````````````````````````
 But in contrast, in this example it is:
 ```````````````````````````````` example
 http://www.bing.com/search?q=md4c
 .
 <p><a href="http://www.bing.com/search?q=md4c">http://www.bing.com/search?q=md4c</a></p>
 ````````````````````````````````
 And finally one complex example:
 ```````````````````````````````` example
 http://commonmark.org
 (Visit https://encrypted.google.com/search?q=Markup+(business))
 Anonymous FTP is available at ftp://foo.bar.baz.
 .
 <p><a href="http://commonmark.org">http://commonmark.org</a></p>
 <p>(Visit <a href="https://encrypted.google.com/search?q=Markup+(business)">https://encrypted.google.com/search?q=Markup+(business)</a>)</p>
 <p>Anonymous FTP is available at <a href="ftp://foo.bar.baz">ftp://foo.bar.baz</a>.</p>
 ````````````````````````````````
 ## GitHub Issues
 ### [Issue 53](https://github.com/mity/md4c/issues/53)
 ```````````````````````````````` example
 This is [link](http://github.com/).
 .
 <p>This is <a href="http://github.com/">link</a>.</p>
 ````````````````````````````````
 ```````````````````````````````` example
 This is [link](http://github.com/)X
 .
 <p>This is <a href="http://github.com/">link</a>X</p>
 ````````````````````````````````
 ### [Issue 76](https://github.com/mity/md4c/issues/76)
 ```````````````````````````````` example
 *(http://example.com)*
 .
 <p><em>(<a href="http://example.com">http://example.com</a>)</em></p>
 ````````````````````````````````
--- a/test/permissive-www-autolinks.txt
+++ b/test/permissive-www-autolinks.txt
@ -0,0 +1,107 @@
 # Permissive WWW Autolinks
 With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, MD4C enables recognition of
 autolinks starting with `www.`, even if they do not exactly follow the
 autolink syntax defined in the CommonMark specification.
 These do not have to be enclosed in `<` and `>`, and they do not even need
 any preceding scheme specification.
 The WWW autolink will be recognized when a valid domain is found.
 A valid domain consists of the text `www.`, followed by alphanumeric characters,
 underscores (`_`), hyphens (`-`) and periods (`.`). There must be at least one
 period, and no underscores may be present in the last two segments of the domain.
 The scheme `http` will be inserted automatically:
 ```````````````````````````````` example
 www.commonmark.org
 .
 <p><a href="http://www.commonmark.org">www.commonmark.org</a></p>
 ````````````````````````````````
 After a valid domain, zero or more non-space non-`<` characters may follow:
 ```````````````````````````````` example
 Visit www.commonmark.org/help for more information.
 .
 <p>Visit <a href="http://www.commonmark.org/help">www.commonmark.org/help</a> for more information.</p>
 ````````````````````````````````
 We then apply extended autolink path validation as follows:
 Trailing punctuation (specifically, `?`, `!`, `.`, `,`, `:`, `*`, `_`, and `~`)
 will not be considered part of the autolink, though they may be included in the
 interior of the link:
 ```````````````````````````````` example
 Visit www.commonmark.org.
 Visit www.commonmark.org/a.b.
 .
 <p>Visit <a href="http://www.commonmark.org">www.commonmark.org</a>.</p>
 <p>Visit <a href="http://www.commonmark.org/a.b">www.commonmark.org/a.b</a>.</p>
 ````````````````````````````````
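The trailing-punctuation rule can be illustrated with a short Python sketch. It is not MD4C's C code; the helper name `split_trailing_punctuation` is hypothetical, and stripping a whole run of trailing punctuation (rather than a single character) is an assumption of this sketch.

```python
TRAILING_PUNCTUATION = "?!.,:*_~"


def split_trailing_punctuation(candidate):
    """Split `candidate` into (autolink, punctuation left as plain text)."""
    # Only the end is trimmed; punctuation in the interior is kept.
    link = candidate.rstrip(TRAILING_PUNCTUATION)
    return link, candidate[len(link):]
```

The second element of the returned pair is the text that stays outside the `<a>` element.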
 When an autolink ends in `)`, we scan the entire autolink for the total number
 of parentheses.  If there is a greater number of closing parentheses than
 opening ones, we don't consider the last character part of the autolink, in
 order to facilitate including an autolink inside a parenthesis:
 ```````````````````````````````` example
 www.google.com/search?q=Markup+(business)
 (www.google.com/search?q=Markup+(business))
 .
 <p><a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a></p>
 <p>(<a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a>)</p>
 ````````````````````````````````
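A minimal Python sketch of the parenthesis check described above, assuming the candidate autolink has already been cut at the first whitespace. The function name `trim_unbalanced_paren` is hypothetical and the code is illustrative only, not taken from MD4C.

```python
def trim_unbalanced_paren(candidate):
    """Drop a final `)` when the candidate has more closers than openers."""
    if candidate.endswith(")") and candidate.count(")") > candidate.count("("):
        return candidate[:-1]
    return candidate
```

Because the check only runs when the candidate ends in `)`, parentheses that appear solely in the interior never change the result.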
 This check is only done when the link ends in a closing parenthesis `)`, so if
 the only parentheses are in the interior of the autolink, no special rules are
 applied:
 ```````````````````````````````` example
 www.google.com/search?q=(business)+ok
 .
 <p><a href="http://www.google.com/search?q=(business)+ok">www.google.com/search?q=(business)+ok</a></p>
 ````````````````````````````````
 If an autolink ends in a semicolon (`;`), we check whether it resembles an
 [entity reference][entity references], that is, whether the preceding text is
 `&` followed by one or more alphanumeric characters.  If so, it is excluded
 from the autolink:
 ```````````````````````````````` example
 www.google.com/search?q=commonmark&hl=en
 www.google.com/search?q=commonmark&hl;
 .
 <p><a href="http://www.google.com/search?q=commonmark&amp;hl=en">www.google.com/search?q=commonmark&amp;hl=en</a></p>
 <p><a href="http://www.google.com/search?q=commonmark">www.google.com/search?q=commonmark</a>&amp;hl;</p>
 ````````````````````````````````
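The entity-like suffix check can be sketched in Python as follows. The name `split_entity_suffix` is hypothetical, "alphanumeric" is taken to mean ASCII letters and digits (an assumption), and the code is only an illustration of the rule, not MD4C's implementation.

```python
import re

ENTITY_LIKE_SUFFIX = re.compile(r"&[A-Za-z0-9]+;$")


def split_entity_suffix(candidate):
    """Split `candidate` into (autolink, entity-like suffix left as text)."""
    match = ENTITY_LIKE_SUFFIX.search(candidate)
    if match is None:
        return candidate, ""
    return candidate[:match.start()], candidate[match.start():]
```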
 `<` immediately ends an autolink.
 ```````````````````````````````` example
 www.commonmark.org/he<lp
 .
 <p><a href="http://www.commonmark.org/he">www.commonmark.org/he</a>&lt;lp</p>
 ````````````````````````````````
 ## GitHub Issues
 ### [Issue 53](https://github.com/mity/md4c/issues/53)
 ```````````````````````````````` example
 This is [link](www.github.com/).
 .
 <p>This is <a href="www.github.com/">link</a>.</p>
 ````````````````````````````````
 ```````````````````````````````` example
 This is [link](www.github.com/)X
 .
 <p>This is <a href="www.github.com/">link</a>X</p>
 ````````````````````````````````
--- a/test/spec.txt
+++ b/test/spec.txt
--- a/test/spec_tests.py
+++ b/test/spec_tests.py
@ -0,0 +1,144 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 import sys
 from difflib import unified_diff
 import argparse
 import re
 import json
 from cmark import CMark
 from normalize import normalize_html
 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Run cmark tests.')
    parser.add_argument('-p', '--program', dest='program', nargs='?', default=None,
            help='program to test')
    parser.add_argument('-s', '--spec', dest='spec', nargs='?', default='spec.txt',
            help='path to spec')
    parser.add_argument('-P', '--pattern', dest='pattern', nargs='?',
            default=None, help='limit to sections matching regex pattern')
    parser.add_argument('--library-dir', dest='library_dir', nargs='?',
            default=None, help='directory containing dynamic library')
    parser.add_argument('--no-normalize', dest='normalize',
            action='store_const', const=False, default=True,
            help='do not normalize HTML')
    parser.add_argument('-d', '--dump-tests', dest='dump_tests',
            action='store_const', const=True, default=False,
            help='dump tests in JSON format')
    parser.add_argument('--debug-normalization', dest='debug_normalization',
            action='store_const', const=True,
            default=False, help='filter stdin through normalizer for testing')
    parser.add_argument('-n', '--number', type=int, default=None,
            help='only consider the test with the given number')
    args = parser.parse_args(sys.argv[1:])
 def out(str):
    sys.stdout.buffer.write(str.encode('utf-8')) 
 def print_test_header(headertext, example_number, start_line, end_line):
    out("Example %d (lines %d-%d) %s\n" % (example_number,start_line,end_line,headertext))
 def do_test(test, normalize, result_counts):
    [retcode, actual_html, err] = cmark.to_html(test['markdown'])
    if retcode == 0:
        expected_html = test['html']
        unicode_error = None
        if normalize:
            try:
                passed = normalize_html(actual_html) == normalize_html(expected_html)
            except UnicodeDecodeError as e:
                unicode_error = e
                passed = False
        else:
            passed = actual_html == expected_html
        if passed:
            result_counts['pass'] += 1
        else:
            print_test_header(test['section'], test['example'], test['start_line'], test['end_line'])
            out(test['markdown'] + '\n')
            if unicode_error:
                out("Unicode error: " + str(unicode_error) + '\n')
                out("Expected: " + repr(expected_html) + '\n')
                out("Got:      " + repr(actual_html) + '\n')
            else:
                expected_html_lines = expected_html.splitlines(True)
                actual_html_lines = actual_html.splitlines(True)
                for diffline in unified_diff(expected_html_lines, actual_html_lines,
                                "expected HTML", "actual HTML"):
                    out(diffline)
            out('\n')
            result_counts['fail'] += 1
    else:
        print_test_header(test['section'], test['example'], test['start_line'], test['end_line'])
        out("program returned error code %d\n" % retcode)
        sys.stdout.buffer.write(err)
        result_counts['error'] += 1
 def get_tests(specfile):
    line_number = 0
    start_line = 0
    end_line = 0
    example_number = 0
    markdown_lines = []
    html_lines = []
    state = 0  # 0 regular text, 1 markdown example, 2 html output
    headertext = ''
    tests = []
    header_re = re.compile('#+ ')
    with open(specfile, 'r', encoding='utf-8', newline='\n') as specf:
        for line in specf:
            line_number = line_number + 1
            l = line.strip()
            #if l == "`" * 32 + " example":
            if re.match("`{32} example( [a-z]{1,})?", l):
                state = 1
            elif state == 2 and l == "`" * 32:
                state = 0
                example_number = example_number + 1
                end_line = line_number
                tests.append({
                    "markdown":''.join(markdown_lines).replace('→',"\t"),
                    "html":''.join(html_lines).replace('→',"\t"),
                    "example": example_number,
                    "start_line": start_line,
                    "end_line": end_line,
                    "section": headertext})
                start_line = 0
                markdown_lines = []
                html_lines = []
            elif l == ".":
                state = 2
            elif state == 1:
                if start_line == 0:
                    start_line = line_number - 1
                markdown_lines.append(line)
            elif state == 2:
                html_lines.append(line)
            elif state == 0 and re.match(header_re, line):
                headertext = header_re.sub('', line).strip()
    return tests
 if __name__ == "__main__":
    if args.debug_normalization:
        out(normalize_html(sys.stdin.read()))
        exit(0)
    all_tests = get_tests(args.spec)
    if args.pattern:
        pattern_re = re.compile(args.pattern, re.IGNORECASE)
    else:
        pattern_re = re.compile('.')
    tests = [ test for test in all_tests if re.search(pattern_re, test['section']) and (not args.number or test['example'] == args.number) ]
    if args.dump_tests:
        out(json.dumps(tests, ensure_ascii=False, indent=2))
        exit(0)
    else:
        skipped = len(all_tests) - len(tests)
        cmark = CMark(prog=args.program, library_dir=args.library_dir)
        result_counts = {'pass': 0, 'fail': 0, 'error': 0, 'skip': skipped}
        for test in tests:
            do_test(test, args.normalize, result_counts)
        out("{pass} passed, {fail} failed, {error} errored, {skip} skipped\n".format(**result_counts))
        exit(result_counts['fail'] + result_counts['error'])
--- a/test/strikethrough.txt
+++ b/test/strikethrough.txt
@ -0,0 +1,75 @@
 # Strike-Through
 With the flag `MD_FLAG_STRIKETHROUGH`, MD4C enables extension for recognition
 of strike-through spans.
 Strike-through text is any text wrapped in one or two tildes (`~`).
 ```````````````````````````````` example
 ~Hi~ Hello, world!
 .
 <p><del>Hi</del> Hello, world!</p>
 ````````````````````````````````
 If the length of the opener and closer doesn't match, the strike-through is
 not recognized.
 ```````````````````````````````` example
 This ~text~~ is curious.
 .
 <p>This ~text~~ is curious.</p>
 ````````````````````````````````
 A tilde sequence that is too long won't be recognized:
 ```````````````````````````````` example
 foo ~~~bar~~~
 .
 <p>foo ~~~bar~~~</p>
 ````````````````````````````````
 Also note that the markers cannot open a strike-through span if they are
 followed by whitespace; similarly, they cannot close the span if they are
 preceded by whitespace:
 ```````````````````````````````` example
 ~foo ~bar
 .
 <p>~foo ~bar</p>
 ````````````````````````````````
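The rules so far (one or two tildes, matching opener and closer length, no whitespace right after an opener or right before a closer) can be sketched in Python. This is a deliberately simplified illustration for a single paragraph of plain text: `render_strikethrough` is a hypothetical name, and the sketch ignores code spans, nesting with other inline marks and HTML escaping, all of which the real parser has to handle.

```python
import re

TILDE_RUN = re.compile(r"~+")


def render_strikethrough(text):
    """Wrap matched one- or two-tilde spans of a single paragraph in <del>."""
    runs = [(m.start(), m.end()) for m in TILDE_RUN.finditer(text)]
    out = []
    pos = 0  # start of the text not yet copied to `out`
    i = 0
    while i < len(runs):
        start, end = runs[i]
        length = end - start
        # An opener is at most two tildes and is not followed by whitespace.
        can_open = length <= 2 and end < len(text) and not text[end].isspace()
        if can_open:
            for j in range(i + 1, len(runs)):
                close_start, close_end = runs[j]
                # A closer has the same length and is not preceded by whitespace.
                if (close_end - close_start == length
                        and not text[close_start - 1].isspace()):
                    out.append(text[pos:start])
                    out.append("<del>" + text[end:close_start] + "</del>")
                    pos = close_end
                    i = j
                    break
        i += 1
    out.append(text[pos:])
    return "".join(out)
```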
 As with regular emphasis delimiters, a new paragraph will cause the cessation
 of parsing a strike-through:
 ```````````````````````````````` example
 This ~~has a
 new paragraph~~.
 .
 <p>This ~~has a</p>
 <p>new paragraph~~.</p>
 ````````````````````````````````
 ## GitHub Issues
 ### [Issue 69](https://github.com/mity/md4c/issues/69)
 ```````````````````````````````` example
 ~`foo`~
 .
 <p><del><code>foo</code></del></p>
 ````````````````````````````````
 ```````````````````````````````` example
 ~*foo*~
 .
 <p><del><em>foo</em></del></p>
 ````````````````````````````````
 ```````````````````````````````` example
 *~foo~*
 .
 <p><em><del>foo</del></em></p>
 ````````````````````````````````
--- a/test/tables.txt
+++ b/test/tables.txt
@ -0,0 +1,363 @@
 # Tables
 With the flag `MD_FLAG_TABLES`, MD4C enables extension for recognition of
 tables.
 A basic example of a table with two columns and three rows (not counting
 the header) is as follows:
 ```````````````````````````````` example
 | Column 1 | Column 2 |
 |----------|----------|
 | foo      | bar      |
 | baz      | qux      |
 | quux     | quuz     |
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td>foo</td><td>bar</td></tr>
 <tr><td>baz</td><td>qux</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 The leading and trailing pipe characters (`|`) on each line are optional:
 ```````````````````````````````` example
 Column 1 | Column 2 |
 ---------|--------- |
 foo      | bar      |
 baz      | qux      |
 quux     | quuz     |
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td>foo</td><td>bar</td></tr>
 <tr><td>baz</td><td>qux</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 ```````````````````````````````` example
 | Column 1 | Column 2
 |----------|---------
 | foo      | bar
 | baz      | qux
 | quux     | quuz
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td>foo</td><td>bar</td></tr>
 <tr><td>baz</td><td>qux</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 ```````````````````````````````` example
 Column 1 | Column 2
 ---------|---------
 foo      | bar
 baz      | qux
 quux     | quuz
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td>foo</td><td>bar</td></tr>
 <tr><td>baz</td><td>qux</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 However, for a one-column table, at least one pipe has to be used in the table
 header underline; otherwise it would be parsed as a Setext heading followed by
 a paragraph.
 ```````````````````````````````` example
 Column 1
 --------
 foo
 baz
 quux
 .
 <h2>Column 1</h2>
 <p>foo
 baz
 quux</p>
 ````````````````````````````````
 Leading and trailing whitespace in a table cell is ignored and the columns do
 not need to be aligned.
 ```````````````````````````````` example
 Column 1 |Column 2
 ---|---
 foo | bar
 baz| qux
 quux|quuz
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td>foo</td><td>bar</td></tr>
 <tr><td>baz</td><td>qux</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 The table cannot interrupt a paragraph.
 ```````````````````````````````` example
 Lorem ipsum dolor sit amet.
 | Column 1 | Column 2
 | ---------|---------
 | foo      | bar
 | baz      | qux
 | quux     | quuz
 .
 <p>Lorem ipsum dolor sit amet.
 | Column 1 | Column 2
 | ---------|---------
 | foo      | bar
 | baz      | qux
 | quux     | quuz</p>
 ````````````````````````````````
 Similarly, a paragraph cannot interrupt a table:
 ```````````````````````````````` example
 Column 1 | Column 2
 ---------|---------
 foo      | bar
 baz      | qux
 quux     | quuz
 Lorem ipsum dolor sit amet.
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td>foo</td><td>bar</td></tr>
 <tr><td>baz</td><td>qux</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 <tr><td>Lorem ipsum dolor sit amet.</td><td></td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 The underline of the table is crucial for recognition of the table, the count
 of its columns and their alignment: the line has to contain at least one pipe,
 and it has to provide at least three dash (`-`) characters for every column in
 the table.
 Thus this is not a table, because there are too few dashes for Column 2.
 ```````````````````````````````` example
 | Column 1 | Column 2
 | ---------|--
 | foo      | bar
 | baz      | qux
 | quux     | quuz
 .
 <p>| Column 1 | Column 2
 | ---------|--
 | foo      | bar
 | baz      | qux
 | quux     | quuz</p>
 ````````````````````````````````
 The first dash, the last dash, or both the first and the last dash in each
 column underline can be replaced with a colon (`:`) to request left, right or
 center alignment, respectively, of that column:
 ```````````````````````````````` example
 | Column 1 | Column 2 | Column 3 | Column 4 |
 |----------|:---------|:--------:|---------:|
 | default  | left     | center   | right    |
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th align="left">Column 2</th><th align="center">Column 3</th><th align="right">Column 4</th></tr>
 </thead>
 <tbody>
 <tr><td>default</td><td align="left">left</td><td align="center">center</td><td align="right">right</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
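The underline rules (at least one pipe, at least three characters per column, optional colons for alignment) can be sketched in Python. The function name `parse_table_underline` is hypothetical, and counting a colon that replaces a dash toward the minimum of three is this sketch's reading of the text above, not a confirmed detail of MD4C.

```python
import re

UNDERLINE_CELL = re.compile(r":?-+:?")


def parse_table_underline(line):
    """Return one alignment per column, or None if `line` is not an underline."""
    text = line.strip()
    if "|" not in text:
        return None  # without a pipe the line could be a Setext underline
    # Leading and trailing pipes are optional.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    alignments = []
    for cell in text.split("|"):
        cell = cell.strip()
        if len(cell) < 3 or UNDERLINE_CELL.fullmatch(cell) is None:
            return None
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            alignments.append("center")
        elif left:
            alignments.append("left")
        elif right:
            alignments.append("right")
        else:
            alignments.append(None)  # default alignment
    return alignments
```

The length of the returned list is the column count, so a single call yields both pieces of information the underline carries.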
 To include a literal pipe character in any cell, it has to be escaped.
 ```````````````````````````````` example
 Column 1 | Column 2
 ---------|---------
 foo      | bar
 baz      | qux \| xyzzy
 quux     | quuz
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td>foo</td><td>bar</td></tr>
 <tr><td>baz</td><td>qux | xyzzy</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
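Splitting a row into cells while honoring escaped pipes can be sketched like this in Python. `split_table_row` is a hypothetical helper, not an MD4C function; the sketch trims whitespace around each cell and treats `\|` as a literal pipe, but it does not handle pipes inside code spans or any other inline syntax.

```python
def split_table_row(line):
    """Split a table row into trimmed cell strings."""
    text = line.strip()
    cells = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == "|":
            current.append("|")  # escaped pipe: literal character
            i += 2
            continue
        if ch == "|":
            cells.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    cells.append("".join(current).strip())
    # Optional leading and trailing pipes yield empty edge cells; drop them.
    if text.startswith("|"):
        cells = cells[1:]
    if text.endswith("|") and not text.endswith("\\|"):
        cells = cells[:-1]
    return cells
```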
 The contents of each cell are parsed as inline text, which may contain any
 inline Markdown spans such as emphasis, strong emphasis, links etc.
 ```````````````````````````````` example
 Column 1 | Column 2
 ---------|---------
 *foo*    | bar
 **baz**  | [qux]
 quux     | [quuz](/url2)
 [qux]: /url
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td><em>foo</em></td><td>bar</td></tr>
 <tr><td><strong>baz</strong></td><td><a href="/url">qux</a></td></tr>
 <tr><td>quux</td><td><a href="/url2">quuz</a></td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 However, pipes inside a code span are not recognized as cell boundaries.
 ```````````````````````````````` example
 Column 1 | Column 2
 ---------|---------
 `foo     | bar`
 baz      | qux
 quux     | quuz
 .
 <table>
 <thead>
 <tr><th>Column 1</th><th>Column 2</th></tr>
 </thead>
 <tbody>
 <tr><td><code>foo     | bar</code></td><td></td></tr>
 <tr><td>baz</td><td>qux</td></tr>
 <tr><td>quux</td><td>quuz</td></tr>
 </tbody>
 </table>
 ````````````````````````````````
 ## GitHub Issues
 ### [Issue 41](https://github.com/mity/md4c/issues/41)
 ```````````````````````````````` example
 * x|x
 ---|---
 .
 <ul>
 <li>x|x
 ---|---</li>
 </ul>
 ````````````````````````````````
 (Not a table, because the underline has wrong indentation and is not part of the
 list item.)
 ```````````````````````````````` example
 * x|x
  ---|---
 x|x
 .
 <ul>
 <li><table>
 <thead>
 <tr>
 <th>x</th>
 <th>x</th>
 </tr>
 </thead>
 <tbody>
 </tbody>
 </table>
 </li>
 </ul>
 <p>x|x</p>
 ````````````````````````````````
 (Here the underline has the right indentation, so the table is detected.
 But the last line is not part of it due to its indentation.)
 ### [Issue 42](https://github.com/mity/md4c/issues/42)
 ```````````````````````````````` example
 ] http://x.x *x*
 |x|x|
 |---|---|
 |x|
 .
 <p>] http://x.x <em>x</em></p>
 <table>
 <thead>
 <tr>
 <th>x</th>
 <th>x</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>x</td>
 <td></td>
 </tr>
 </tbody>
 </table>
 ````````````````````````````````
 ### [Issue 104](https://github.com/mity/md4c/issues/104)
 ```````````````````````````````` example
 A | B
 --- | ---
 [x](url)
 .
 <table>
 <thead>
 <tr>
 <th>A</th>
 <th>B</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td><a href="url">x</a></td>
 <td></td>
 </tr>
 </tbody>
 </table>
 ````````````````````````````````
--- a/test/tasklists.txt
+++ b/test/tasklists.txt
@ -0,0 +1,117 @@
 # Tasklists
 With the flag `MD_FLAG_TASKLISTS`, MD4C enables extension for recognition of
 task lists.
 A basic task list may look as follows:
 ```````````````````````````````` example
 * [x] foo
 * [X] bar
 * [ ] baz
 .
 <ul>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
 </ul>
 ````````````````````````````````
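How a task-list marker maps to the HTML shown above can be sketched in Python. The helper `render_list_item` is hypothetical; it handles a single-line item, does no HTML escaping, and assumes the marker must be followed by whitespace, so treat it as an illustration only.

```python
import re

TASK_MARK = re.compile(r"\[([ xX])\]\s+(.*)")


def render_list_item(item_text):
    """Render one single-line list item, detecting a leading task marker."""
    match = TASK_MARK.fullmatch(item_text)
    if match is None:
        return "<li>" + item_text + "</li>"
    # Both `x` and `X` mark the task as done; a space leaves it unchecked.
    checked = " checked" if match.group(1) in "xX" else ""
    return ('<li class="task-list-item">'
            '<input type="checkbox" class="task-list-item-checkbox" disabled'
            + checked + ">" + match.group(2) + "</li>")
```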
 Task lists can also be in ordered lists:
 ```````````````````````````````` example
 1. [x] foo
 2. [X] bar
 3. [ ] baz
 .
 <ol>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
 </ol>
 ````````````````````````````````
 Task lists can also be nested in ordinary lists:
 ```````````````````````````````` example
 * xxx:
   * [x] foo
   * [x] bar
   * [ ] baz
 * yyy:
   * [ ] qux
   * [x] quux
   * [ ] quuz
 .
 <ul>
 <li>xxx:
 <ul>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
 </ul></li>
 <li>yyy:
 <ul>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li>
 </ul></li>
 </ul>
 ````````````````````````````````
 Or in a parent task list:
 ```````````````````````````````` example
 1. [x] xxx:
    * [x] foo
    * [x] bar
    * [ ] baz
 2. [ ] yyy:
    * [ ] qux
    * [x] quux
    * [ ] quuz
 .
 <ol>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx:
 <ul>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
 </ul></li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy:
 <ul>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li>
 </ul></li>
 </ol>
 ````````````````````````````````
 Ordinary lists can also be nested in task lists:
 ```````````````````````````````` example
 * [x] xxx:
   * foo
   * bar
   * baz
 * [ ] yyy:
   * qux
   * quux
   * quuz
 .
 <ul>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx:
 <ul>
 <li>foo</li>
 <li>bar</li>
 <li>baz</li>
 </ul></li>
 <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy:
 <ul>
 <li>qux</li>
 <li>quux</li>
 <li>quuz</li>
 </ul></li>
 </ul>
 ````````````````````````````````
--- a/test/underline.txt
+++ b/test/underline.txt
@ -0,0 +1,39 @@
 # Underline
 With the flag `MD_FLAG_UNDERLINE`, MD4C treats the underscore `_` as a mark
 denoting an underlined span rather than an ordinary emphasis (or a strong
 emphasis).
 ```````````````````````````````` example
 _foo_
 .
 <p><u>foo</u></p>
 ````````````````````````````````
 In sequences of multiple underscores, each single one translates into an
 underline span mark.
 ```````````````````````````````` example
 ___foo___
 .
 <p><u><u><u>foo</u></u></u></p>
 ````````````````````````````````
 Intra-word underscores are not recognized as underline marks:
 ```````````````````````````````` example
 foo_bar_baz
 .
 <p>foo_bar_baz</p>
 ````````````````````````````````
 The parser also follows the standard rules for when an underscore can or
 cannot open or close a span. Therefore there is no underline in the following
 example, because neither underscore can be seen as a closing mark.
 ```````````````````````````````` example
 _foo _bar
 .
 <p>_foo _bar</p>
 ````````````````````````````````
--- a/test/wiki-links.txt
+++ b/test/wiki-links.txt
@ -0,0 +1,232 @@
 # Wiki Links
 With the flag `MD_FLAG_WIKILINKS`, MD4C recognizes wiki links.
 The simplest wiki-link is a wiki-link destination enclosed between `[[` and
 `]]`.
 ```````````````````````````````` example
 [[foo]]
 .
 <p><x-wikilink data-target="foo">foo</x-wikilink></p>
 ````````````````````````````````
 A wiki-link may also contain an explicit label, separated from the destination
 with `|`.
 ```````````````````````````````` example
 [[foo|bar]]
 .
 <p><x-wikilink data-target="foo">bar</x-wikilink></p>
 ````````````````````````````````
 A wiki-link destination cannot be empty.
 ```````````````````````````````` example
 [[]]
 .
 <p>[[]]</p>
 ````````````````````````````````
 ```````````````````````````````` example
 [[|foo]]
 .
 <p>[[|foo]]</p>
 ````````````````````````````````
 The wiki-link destination cannot contain a new line.
 ```````````````````````````````` example
 [[foo
 bar]]
 .
 <p>[[foo
 bar]]</p>
 ````````````````````````````````
 ```````````````````````````````` example
 [[foo
 bar|baz]]
 .
 <p>[[foo
 bar|baz]]</p>
 ````````````````````````````````
 The wiki-link destination is rendered verbatim; inline markup in it is not
 recognized.
 ```````````````````````````````` example
 [[*foo*]]
 .
 <p><x-wikilink data-target="*foo*">*foo*</x-wikilink></p>
 ````````````````````````````````
 ```````````````````````````````` example
 [[foo|![bar](bar.jpg)]]
 .
 <p><x-wikilink data-target="foo"><img src="bar.jpg" alt="bar"></x-wikilink></p>
 ````````````````````````````````
 With multiple `|` delimiters, only the first one is recognized and the other
 ones are part of the label.
 ```````````````````````````````` example
 [[foo|bar|baz]]
 .
 <p><x-wikilink data-target="foo">bar|baz</x-wikilink></p>
 ````````````````````````````````
 However, the delimiter `|` can be escaped with `\`.
 ```````````````````````````````` example
 [[foo\|bar|baz]]
 .
 <p><x-wikilink data-target="foo|bar">baz</x-wikilink></p>
 ````````````````````````````````
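 The delimiter rule from the two examples above can be sketched in C. This is
 only an illustration of the rule as stated here, not MD4C's actual
 implementation; the function name `find_label_delimiter` is hypothetical and
 not part of the MD4C API.
 ```c
 #include <stddef.h>
 /* Hypothetical helper (not an MD4C function): returns the offset of the
  * first '|' in s[0..len) that is not escaped with a backslash, or -1 if
  * there is no such delimiter. */
 long find_label_delimiter(const char *s, size_t len)
 {
     size_t i;
     for (i = 0; i < len; i++) {
         if (s[i] == '\\' && i + 1 < len) {
             i++;    /* Skip the escaped character, whatever it is. */
             continue;
         }
         if (s[i] == '|')
             return (long) i;
     }
     return -1;
 }
 ```
 For the content `foo|bar|baz` the sketch returns offset 3, so the destination
 is `foo` and the label is `bar|baz`. For `foo\|bar|baz` it returns offset 8,
 so the escaped `|` stays in the destination.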
 The label can contain inline elements.
 ```````````````````````````````` example
 [[foo|*bar*]]
 .
 <p><x-wikilink data-target="foo"><em>bar</em></x-wikilink></p>
 ````````````````````````````````
 An empty explicit label is the same as using the implicit label; i.e. the
 verbatim destination string is used as the label.
 ```````````````````````````````` example
 [[foo|]]
 .
 <p><x-wikilink data-target="foo">foo</x-wikilink></p>
 ````````````````````````````````
 The label can span multiple lines.
 ```````````````````````````````` example
 [[foo|foo
 bar
 baz]]
 .
 <p><x-wikilink data-target="foo">foo
 bar
 baz</x-wikilink></p>
 ````````````````````````````````
 Wiki-links have higher priority than links.
 ```````````````````````````````` example
 [[foo]](foo.jpg)
 .
 <p><x-wikilink data-target="foo">foo</x-wikilink>(foo.jpg)</p>
 ````````````````````````````````
 ```````````````````````````````` example
 [foo]: /url
 [[foo]]
 .
 <p><x-wikilink data-target="foo">foo</x-wikilink></p>
 ````````````````````````````````
 Wiki-links can be used inside table cells.
 ```````````````````````````````` example
 | A                | B   |
 |------------------|-----|
 | [[foo|*bar*]]    | baz |
 .
 <table>
 <thead>
 <tr>
 <th>A</th>
 <th>B</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td><x-wikilink data-target="foo"><em>bar</em></x-wikilink></td>
 <td>baz</td>
 </tr>
 </tbody>
 </table>
 ````````````````````````````````
 Wiki-links are not prioritized over images.
 ```````````````````````````````` example
 ![[foo]](foo.jpg)
 .
 <p><img src="foo.jpg" alt="[foo]"></p>
 ````````````````````````````````
 Something that may look like a wiki-link at first, but turns out not to be,
 is recognized as a normal link.
 ```````````````````````````````` example
 [[foo]
 [foo]: /url
 .
 <p>[<a href="/url">foo</a></p>
 ````````````````````````````````
 Escaping the opening `[` escapes only that one character, not the whole `[[`
 opener:
 ```````````````````````````````` example
 \[[foo]]
 [foo]: /url
 .
 <p>[<a href="/url">foo</a>]</p>
 ````````````````````````````````
 As with other inline links, the innermost wiki-link is preferred.
 ```````````````````````````````` example
 [[foo[[bar]]]]
 .
 <p>[[foo<x-wikilink data-target="bar">bar</x-wikilink>]]</p>
 ````````````````````````````````
 There is a limit of 100 characters for the wiki-link destination.
 ```````````````````````````````` example
 [[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]]
 [[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]]
 .
 <p>[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]]
 [[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]]</p>
 ````````````````````````````````
 A wiki-link destination of exactly 100 characters works.
 ```````````````````````````````` example
 [[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890]]
 [[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890|foo]]
 .
 <p><x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890</x-wikilink>
 <x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">foo</x-wikilink></p>
 ````````````````````````````````
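 The destination rules stated in this file (not empty, no new line, at most
 100 characters) can be summarized in a small C sketch. It is an illustration
 under assumptions, not MD4C's code: the names `WIKILINK_DEST_MAX` and
 `wikilink_dest_is_valid` are hypothetical, the length is counted in `char`
 units, and a carriage return is treated as a new line.
 ```c
 #include <stddef.h>
 #define WIKILINK_DEST_MAX 100   /* The limit stated in this document. */
 /* Hypothetical helper (not an MD4C function): returns 1 if dest[0..len)
  * is non-empty, at most WIKILINK_DEST_MAX chars long and contains no
  * new line; returns 0 otherwise. */
 int wikilink_dest_is_valid(const char *dest, size_t len)
 {
     size_t i;
     if (len == 0 || len > WIKILINK_DEST_MAX)
         return 0;
     for (i = 0; i < len; i++) {
         /* Assumption: both '\n' and '\r' count as a new line. */
         if (dest[i] == '\n' || dest[i] == '\r')
             return 0;
     }
     return 1;
 }
 ```
 Under these assumptions a 100-character destination is accepted, while a
 101-character, empty, or multi-line destination is rejected, which matches
 the examples above.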
 If the label spans multiple lines contained in a block quote, the limit on
 link content does not include any characters belonging to the block quote.
 ```````````````````````````````` example
 > [[12345678901234567890123456789012345678901234567890|1234567890
 > 1234567890
 > 1234567890
 > 1234567890
 > 123456789]]
 .
 <blockquote>
 <p><x-wikilink data-target="12345678901234567890123456789012345678901234567890">1234567890
 1234567890
 1234567890
 1234567890
 123456789</x-wikilink></p>
 </blockquote>
 ````````````````````````````````