Import Upstream version 0.4.3
commit e73391db57
|
@ -0,0 +1,34 @@
|
|||
# YAML definition for travis-ci.com continuous integration.
|
||||
# See https://docs.travis-ci.com/user/languages/c
|
||||
|
||||
language: c
|
||||
dist: bionic
|
||||
|
||||
compiler:
|
||||
- gcc
|
||||
|
||||
addons:
|
||||
apt:
|
||||
packages:
|
||||
- python3 # for running tests
|
||||
- lcov # for generating code coverage report
|
||||
|
||||
before_script:
|
||||
- mkdir build
|
||||
- cd build
|
||||
# We enforce -Wdeclaration-after-statement because Qt project needs to
|
||||
# build MD4C with Integrity compiler which chokes whenever a declaration
|
||||
# is not at the beginning of a block.
|
||||
- CFLAGS='--coverage -g -O0 -Wall -Wdeclaration-after-statement -Werror' cmake -DCMAKE_BUILD_TYPE=Debug -G 'Unix Makefiles' ..
|
||||
|
||||
script:
|
||||
- make VERBOSE=1
|
||||
|
||||
after_success:
|
||||
- ../scripts/run-tests.sh
|
||||
# Creating report
|
||||
- lcov --directory . --capture --output-file coverage.info # capture coverage info
|
||||
- lcov --remove coverage.info '/usr/*' --output-file coverage.info # filter out system
|
||||
- lcov --list coverage.info # debug info
|
||||
# Uploading report to CodeCov
|
||||
- bash <(curl -s https://codecov.io/bash) || echo "Codecov did not collect coverage reports"
|
|
@ -0,0 +1,268 @@
|
|||
|
||||
# MD4C Change Log
|
||||
|
||||
|
||||
## Version 0.4.3
|
||||
|
||||
New features:
|
||||
|
||||
* With `MD_FLAG_UNDERLINE`, spans enclosed in underscore (`_foo_`) are seen
|
||||
as underline (`MD_SPAN_UNDERLINE`) rather than an ordinary emphasis or
|
||||
strong emphasis.
|
||||
|
||||
Changes:
|
||||
|
||||
* The implementation of wiki-links extension (with `MD_FLAG_WIKILINKS`) has
|
||||
been simplified.
|
||||
|
||||
- A noticeable increase of MD4C's memory footprint introduced by the
|
||||
extension implementation in 0.4.0 has been removed.
|
||||
- The priority handling towards other inline elements has been unified.
|
||||
(This affects an obscure case where image syntax in place of the
|
||||
wiki-link destination made the wiki-link invalid. Now *all* inline spans
|
||||
in the wiki-link destination, including images, are suppressed.)
|
||||
- The length limitation of 100 characters now always applies to wiki-link
|
||||
destination.
|
||||
|
||||
* Recognition of strike-through spans (with the flag `MD_FLAG_STRIKETHROUGH`)
|
||||
has become much stricter and, arguably, more reasonable.
|
||||
|
||||
- Only single tildes (`~`) and double tildes (`~~`) are recognized as
|
||||
strike-through marks. Longer ones are not anymore.
|
||||
- The length of the opener and closer marks has to be the same.
|
||||
- The tildes cannot open a strike-through span if a whitespace follows.
|
||||
- The tildes cannot close a strike-through span if a whitespace precedes.
|
||||
|
||||
This change follows the changes of behavior in cmark-gfm some time ago, so
|
||||
it is also beneficial from a compatibility point of view.
|
||||
|
||||
* When building MD4C by hand instead of using its CMake-based build, the UTF-8
|
||||
support was by default disabled, unless explicitly asked for by defining
|
||||
a preprocessor macro `MD4C_USE_UTF8`.
|
||||
|
||||
This has been changed and the UTF-8 mode now becomes the default, no matter
|
||||
how `md4c.c` is compiled. If you need to disable it and use the ASCII-only
|
||||
mode, you have to explicitly define the macro `MD4C_USE_ASCII` when compiling it.
|
||||
|
||||
(The CMake-based build as provided in our repository explicitly asked for
|
||||
the UTF-8 support with `-DMD4C_USE_UTF8`. I.e. if you are using MD4C library
|
||||
built with our vanilla `CMakeLists.txt` files, this change should not affect
|
||||
you.)
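
The stricter strike-through rules listed above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not MD4C's actual implementation: the helper names (`tilde_run_len`, `can_open_strike`, `can_close_strike`) are hypothetical, `pos` must point at the first tilde of a run, and a run at the very end of the input is assumed to be unable to open a span.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of MD4C): length of the tilde run that
 * starts at text[pos]. pos must point at the first tilde of the run. */
size_t tilde_run_len(const char *text, size_t pos)
{
    size_t n = 0;

    assert(text[pos] == '~');
    assert(pos == 0 || text[pos - 1] != '~');

    while (text[pos + n] == '~')
        n++;
    return n;
}

/* A run may open a span only if it is 1 or 2 tildes long and is not
 * followed by whitespace. Assumption: a run at the end of the input
 * cannot open anything. */
int can_open_strike(const char *text, size_t pos)
{
    size_t n = tilde_run_len(text, pos);
    char next;

    if (n > 2)
        return 0;
    next = text[pos + n];
    return next != '\0' && !isspace((unsigned char) next);
}

/* A run may close a span only if it is 1 or 2 tildes long and is not
 * preceded by whitespace (nor located at the start of the input). */
int can_close_strike(const char *text, size_t pos)
{
    size_t n = tilde_run_len(text, pos);

    if (n > 2 || pos == 0)
        return 0;
    return !isspace((unsigned char) text[pos - 1]);
}
```

An opener and a closer found this way would additionally have to satisfy `tilde_run_len(text, opener_pos) == tilde_run_len(text, closer_pos)` to form a span.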
|
||||
|
||||
Fixes:
|
||||
|
||||
* Fixed some string length handling in the special `MD4C_USE_UTF16` build.
|
||||
|
||||
(This does not affect you unless you are on Windows and explicitly define
|
||||
the macro when building MD4C.)
|
||||
|
||||
* [#100](https://github.com/mity/md4c/issues/100):
|
||||
Fixed an off-by-one error in the maximal length limit of some segments
|
||||
of e-mail addresses used in autolinks.
|
||||
|
||||
* [#107](https://github.com/mity/md4c/issues/107):
|
||||
Fix mis-detection of asterisk-encoded emphasis in some corner cases when
|
||||
length of the opener and closer differs, as in `***foo *bar baz***`.
|
||||
|
||||
|
||||
## Version 0.4.2
|
||||
|
||||
Fixes:
|
||||
|
||||
* [#98](https://github.com/mity/md4c/issues/98):
|
||||
Fix mis-detection of asterisk-encoded emphasis in some corner cases when
|
||||
length of the opener and closer differs, as in `**a *b c** d*`.
|
||||
|
||||
|
||||
## Version 0.4.1
|
||||
|
||||
Unfortunately, 0.4.0 has been released with badly updated ChangeLog. Fixing
|
||||
this is the only change in 0.4.1.
|
||||
|
||||
|
||||
## Version 0.4.0
|
||||
|
||||
New features:
|
||||
|
||||
* With `MD_FLAG_LATEXMATHSPANS`, LaTeX math spans (`$...$`) and LaTeX display
|
||||
math spans (`$$...$$`) are now recognized. (Note though that the HTML
|
||||
renderer outputs them verbatim in a custom `<x-equation>` tag.)
|
||||
|
||||
Contributed by [Tilman Roeder](https://github.com/dyedgreen).
|
||||
|
||||
* With `MD_FLAG_WIKILINKS`, Wiki-style links (`[[...]]`) are now recognized.
|
||||
(Note though that the HTML renderer renders them as a custom `<x-wikilink>`
|
||||
tag.)
|
||||
|
||||
Contributed by [Nils Blomqvist](https://github.com/niblo).
|
||||
|
||||
Changes:
|
||||
|
||||
* Parsing of tables (with `MD_FLAG_TABLES`) is now closer to the way
|
||||
cmark-gfm parses tables as we do not require every row of the table to
|
||||
contain a pipe `|` anymore.
|
||||
|
||||
As a consequence, paragraphs now cannot interrupt tables. A paragraph which
|
||||
follows the table has to be delimited with a blank line.
|
||||
|
||||
Fixes:
|
||||
|
||||
* [#94](https://github.com/mity/md4c/issues/94):
|
||||
`md_build_ref_def_hashtable()`: Do not allocate more memory than strictly
|
||||
needed.
|
||||
|
||||
* [#95](https://github.com/mity/md4c/issues/95):
|
||||
`md_is_container_mark()`: Ordered list mark requires at least one digit.
|
||||
|
||||
* [#96](https://github.com/mity/md4c/issues/96):
|
||||
Some fixes for link label comparison.
|
||||
|
||||
|
||||
## Version 0.3.4
|
||||
|
||||
Changes:
|
||||
|
||||
* Make Unicode-specific code compliant to Unicode 12.1.
|
||||
|
||||
* Structure `MD_BLOCK_CODE_DETAIL` got new member `fenced_char`. Application
|
||||
can use it to detect character used to form the block fences (`` ` `` or
|
||||
`~`). In the case of indented code block, it is set to zero.
|
||||
|
||||
Fixes:
|
||||
|
||||
* [#77](https://github.com/mity/md4c/issues/77):
|
||||
Fix maximal count of digits for numerical character references, as requested
|
||||
by CommonMark specification 0.29.
|
||||
|
||||
* [#78](https://github.com/mity/md4c/issues/78):
|
||||
Fix link reference definition label matching for Unicode characters where
|
||||
the folding mapping leads to multiple codepoints, as e.g. in `ẞ` -> `SS`.
|
||||
|
||||
* [#83](https://github.com/mity/md4c/issues/83):
|
||||
Fix recognition of an empty blockquote which interrupts a paragraph.
|
||||
|
||||
|
||||
## Version 0.3.3
|
||||
|
||||
Changes:
|
||||
|
||||
* Make permissive URL autolink and permissive WWW autolink extensions stricter.
|
||||
|
||||
This brings the behavior closer to GFM and mitigates risk of false positives.
|
||||
In particular, the domain has to contain at least one dot and parentheses
|
||||
can be part of the link destination only if `(` and `)` are balanced.
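
The two conditions above can be sketched as small C predicates. This is a simplified illustration, not the code MD4C uses: the function names `domain_has_dot` and `parens_balanced` are hypothetical, and the sketch assumes the domain and the link destination have already been isolated as NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: the domain must contain at least one dot. */
int domain_has_dot(const char *domain)
{
    assert(domain != NULL);
    return strchr(domain, '.') != NULL;
}

/* Hypothetical check: every ')' must close an earlier '(' and no '('
 * may be left open at the end of the destination. */
int parens_balanced(const char *dest)
{
    int depth = 0;

    assert(dest != NULL);
    for (; *dest != '\0'; dest++) {
        if (*dest == '(') {
            depth++;
        } else if (*dest == ')') {
            if (depth == 0)
                return 0;
            depth--;
        }
    }
    return depth == 0;
}
```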
|
||||
|
||||
Fixes:
|
||||
|
||||
* [#73](https://github.com/mity/md4c/issues/73):
|
||||
Some raw HTML inputs could lead to quadratic parsing times.
|
||||
|
||||
* [#74](https://github.com/mity/md4c/issues/74):
|
||||
Fix input leading to a crash. Found by fuzzing.
|
||||
|
||||
* [#76](https://github.com/mity/md4c/issues/76):
|
||||
Fix handling of parentheses in some corner cases of permissive URL autolink
|
||||
and permissive WWW autolink extensions.
|
||||
|
||||
|
||||
## Version 0.3.2
|
||||
|
||||
Changes:
|
||||
|
||||
* Changes mandated by CommonMark specification 0.29.
|
||||
|
||||
Most importantly, the white-space trimming rules for code spans have changed.
|
||||
At most one space/newline is trimmed from beginning/end of the code span
|
||||
(if the code span contains some non-space contents, and if it begins and
|
||||
ends with space at the same time). In all other cases the spaces in the code
|
||||
span are now left intact.
|
||||
|
||||
Other changes in behavior are in corner cases only. Refer to [CommonMark
|
||||
0.29 notes](https://github.com/commonmark/commonmark-spec/releases/tag/0.29)
|
||||
for more info.
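
The code span trimming rule described above can be sketched as follows. This is a minimal illustration, not MD4C's implementation: the names `is_space_or_newline` and `trim_code_span` are hypothetical, and the span contents are assumed to be given as an offset and a length into a text buffer.

```c
#include <assert.h>
#include <stddef.h>

static int is_space_or_newline(char ch)
{
    return ch == ' ' || ch == '\n';
}

/* Hypothetical helper: strips exactly one space/newline from each end of
 * the span text[*start .. *start + *len), but only if the span both
 * begins and ends with one and contains some non-space content.
 * In all other cases the span is left intact. */
void trim_code_span(const char *text, size_t *start, size_t *len)
{
    size_t i;
    int has_non_space = 0;

    assert(text != NULL && start != NULL && len != NULL);

    if (*len == 0)
        return;
    if (!is_space_or_newline(text[*start]) ||
        !is_space_or_newline(text[*start + *len - 1]))
        return;

    for (i = 0; i < *len; i++) {
        if (!is_space_or_newline(text[*start + i])) {
            has_non_space = 1;
            break;
        }
    }
    if (!has_non_space)
        return;

    /* Both ends are spaces and there is other content, so *len >= 3. */
    (*start)++;
    *len -= 2;
}
```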
|
||||
|
||||
Fixes:
|
||||
|
||||
* [#68](https://github.com/mity/md4c/issues/68):
|
||||
Some specific HTML blocks were not recognized when EOF follows without any
|
||||
end-of-line character.
|
||||
|
||||
* [#69](https://github.com/mity/md4c/issues/69):
|
||||
Strike-through span not working correctly when its opener mark is directly
|
||||
followed by other opener mark; or when other closer mark directly precedes
|
||||
its closer mark.
|
||||
|
||||
|
||||
## Version 0.3.1
|
||||
|
||||
Fixes:
|
||||
|
||||
* [#58](https://github.com/mity/md4c/issues/58),
|
||||
[#59](https://github.com/mity/md4c/issues/59),
|
||||
[#60](https://github.com/mity/md4c/issues/60),
|
||||
[#63](https://github.com/mity/md4c/issues/63),
|
||||
[#66](https://github.com/mity/md4c/issues/66):
|
||||
Some inputs could lead to quadratic parsing times. Thanks to Anders Kaseorg
|
||||
for finding all those issues.
|
||||
|
||||
* [#61](https://github.com/mity/md4c/issues/61):
|
||||
Flag `MD_FLAG_NOHTMLSPANS` erroneously affected also recognition of
|
||||
CommonMark autolinks.
|
||||
|
||||
|
||||
## Version 0.3.0
|
||||
|
||||
New features:
|
||||
|
||||
* Add extension for GitHub-style task lists:
|
||||
|
||||
```
|
||||
* [x] foo
|
||||
* [x] bar
|
||||
* [ ] baz
|
||||
```
|
||||
|
||||
(It has to be explicitly enabled with `MD_FLAG_TASKLISTS`.)
|
||||
|
||||
* Added support for building as a shared library. On non-Windows platforms,
|
||||
this is now default behavior; on Windows static library is still the default.
|
||||
The CMake option `BUILD_SHARED_LIBS` can be used to request one or the other
|
||||
explicitly.
|
||||
|
||||
Contributed by Lisandro Damián Nicanor Pérez Meyer.
|
||||
|
||||
* Renamed structure `MD_RENDERER` to `MD_PARSER` and refactored its contents
|
||||
a little bit. Note this is source-level incompatible and initialization code
|
||||
in apps may need to be updated.
|
||||
|
||||
The aim of the change is to be more friendly for long-term ABI compatibility
|
||||
we shall maintain, starting with this release.
|
||||
|
||||
* Added `CHANGELOG.md` (this file).
|
||||
|
||||
* Make sure `md_process_table_row()` reports the same count of table cells for
|
||||
all table rows, no matter how broken the input is. The cell count is derived
|
||||
from table underline line. Bogus cells in other rows are silently ignored.
|
||||
Missing cells in other rows are reported as empty ones.
|
||||
|
||||
Fixes:
|
||||
|
||||
* CID 1475544:
|
||||
Calling `md_free_attribute()` on uninitialized data.
|
||||
|
||||
* [#47](https://github.com/mity/md4c/issues/47):
|
||||
Using bad offsets in `md_is_entity_str()`, in some cases leading to buffer
|
||||
overflow.
|
||||
|
||||
* [#51](https://github.com/mity/md4c/issues/51):
|
||||
Segfault in `md_process_table_cell()`.
|
||||
|
||||
* [#53](https://github.com/mity/md4c/issues/53):
|
||||
With `MD_FLAG_PERMISSIVEURLAUTOLINKS` or `MD_FLAG_PERMISSIVEWWWAUTOLINKS`
|
||||
we could generate bad output for ordinary Markdown links, if a non-space
|
||||
character immediately follows like e.g. in `[link](http://github.com)X`.
|
||||
|
||||
|
||||
## Version 0.2.7
|
||||
|
||||
This was the last version before the changelog was added.
|
|
@ -0,0 +1,56 @@
|
|||
|
||||
cmake_minimum_required(VERSION 3.4)
|
||||
project(MD4C C)
|
||||
|
||||
set(MD_VERSION_MAJOR 0)
|
||||
set(MD_VERSION_MINOR 4)
|
||||
set(MD_VERSION_RELEASE 3)
|
||||
set(MD_VERSION "${MD_VERSION_MAJOR}.${MD_VERSION_MINOR}.${MD_VERSION_RELEASE}")
|
||||
|
||||
if(WIN32)
|
||||
# On Windows, given there is no standard lib install dir etc., we rather
|
||||
# by default build static lib.
|
||||
option(BUILD_SHARED_LIBS "Build shared libraries instead of static ones" OFF)
|
||||
else()
|
||||
# On Linux, MD4C is slowly being added to some distros which prefer
|
||||
# shared lib.
|
||||
option(BUILD_SHARED_LIBS "Build shared libraries instead of static ones" ON)
|
||||
endif()
|
||||
|
||||
add_definitions(
|
||||
-DMD_VERSION_MAJOR=${MD_VERSION_MAJOR}
|
||||
-DMD_VERSION_MINOR=${MD_VERSION_MINOR}
|
||||
-DMD_VERSION_RELEASE=${MD_VERSION_RELEASE}
|
||||
)
|
||||
|
||||
set(CMAKE_CONFIGURATION_TYPES Debug Release RelWithDebInfo MinSizeRel)
|
||||
if("${CMAKE_BUILD_TYPE}" STREQUAL "")
|
||||
set(CMAKE_BUILD_TYPE $ENV{CMAKE_BUILD_TYPE})
|
||||
|
||||
if("${CMAKE_BUILD_TYPE}" STREQUAL "")
|
||||
set(CMAKE_BUILD_TYPE "Release")
|
||||
endif()
|
||||
endif()
|
||||
|
||||
|
||||
if("${CMAKE_C_COMPILER_ID}" MATCHES "GNU|Clang")
|
||||
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
|
||||
elseif(MSVC)
|
||||
# Disable warnings about the so-called unsecured functions:
|
||||
add_definitions(/D_CRT_SECURE_NO_WARNINGS)
|
||||
|
||||
# Specify proper C runtime library:
|
||||
string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
|
||||
string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE}")
|
||||
string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO}")
|
||||
string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL}")
|
||||
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /MTd")
|
||||
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /MT")
|
||||
set(CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELWITHDEBINFO} /MT")
|
||||
set(CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL} /MT")
|
||||
endif()
|
||||
|
||||
include(GNUInstallDirs)
|
||||
|
||||
add_subdirectory(md4c)
|
||||
add_subdirectory(md2html)
|
|
@ -0,0 +1,22 @@
|
|||
|
||||
# The MIT License (MIT)
|
||||
|
||||
Copyright © 2016-2020 Martin Mitáš
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a
|
||||
copy of this software and associated documentation files (the “Software”),
|
||||
to deal in the Software without restriction, including without limitation
|
||||
the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
and/or sell copies of the Software, and to permit persons to whom the
|
||||
Software is furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included
|
||||
in all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
IN THE SOFTWARE.
|
|
@ -0,0 +1,286 @@
|
|||
[![Linux Build Status (travis-ci.com)](https://img.shields.io/travis/mity/md4c/master.svg?logo=linux&label=linux%20build)](https://travis-ci.org/mity/md4c)
|
||||
[![Windows Build Status (appveyor.com)](https://img.shields.io/appveyor/ci/mity/md4c/master.svg?logo=windows&label=windows%20build)](https://ci.appveyor.com/project/mity/md4c/branch/master)
|
||||
[![Code Coverage Status (codecov.io)](https://img.shields.io/codecov/c/github/mity/md4c/master.svg?logo=codecov&label=code%20coverage)](https://codecov.io/github/mity/md4c)
|
||||
[![Coverity Scan Status](https://img.shields.io/coverity/scan/mity-md4c.svg?label=coverity%20scan)](https://scan.coverity.com/projects/mity-md4c)
|
||||
|
||||
|
||||
# MD4C Readme
|
||||
|
||||
* Home: http://github.com/mity/md4c
|
||||
* Wiki: http://github.com/mity/md4c/wiki
|
||||
* Issue tracker: http://github.com/mity/md4c/issues
|
||||
|
||||
MD4C stands for "Markdown for C" and that's exactly what this project is about.
|
||||
|
||||
|
||||
## What is Markdown
|
||||
|
||||
In short, Markdown is the markup language this `README.md` file is written in.
|
||||
|
||||
The following resources can explain more if you are unfamiliar with it:
|
||||
* [Wikipedia article](http://en.wikipedia.org/wiki/Markdown)
|
||||
* [CommonMark site](http://commonmark.org)
|
||||
|
||||
|
||||
## What is MD4C
|
||||
|
||||
MD4C is a C Markdown parser with the following features:
|
||||
|
||||
* **Compliance:** Generally, MD4C aims to be compliant with the latest version of the
|
||||
[CommonMark specification](http://spec.commonmark.org/). Currently, we are
|
||||
fully compliant with CommonMark 0.29.
|
||||
|
||||
* **Extensions:** MD4C supports some commonly requested and accepted extensions.
|
||||
See below.
|
||||
|
||||
* **Compactness:** MD4C is implemented in one source file and one header file.
|
||||
There are no dependencies other than the standard C library.
|
||||
|
||||
* **Embedding:** MD4C is easy to reuse in other projects; its API is very
|
||||
straightforward: There is actually just one function, `md_parse()`.
|
||||
|
||||
* **Push model:** MD4C parses the complete document and calls a few callback
|
||||
functions provided by the application to inform it about a start/end of
|
||||
every block, a start/end of every span, and about any textual contents.
|
||||
|
||||
* **Portability:** MD4C builds and works on Windows and POSIX-compliant OSes.
|
||||
(It should be simple to make it run also on most other platforms, at least as
|
||||
long as the platform provides C standard library, including a heap memory
|
||||
management.)
|
||||
|
||||
* **Encoding:** MD4C can be compiled to recognize ASCII-only control characters,
|
||||
UTF-8 and, on Windows, also UTF-16 (i.e. what is on Windows commonly called
|
||||
just "Unicode"). See more details below.
|
||||
|
||||
* **Permissive license:** MD4C is available under the MIT license.
|
||||
|
||||
* **Performance:** MD4C is [very fast](https://talk.commonmark.org/t/2520).
|
||||
|
||||
|
||||
## Using MD4C
|
||||
|
||||
An application has to include the header `md4c.h` and link against the MD4C library;
|
||||
or alternatively it may include `md4c.h` and `md4c.c` directly into its source
|
||||
base as the parser is only implemented in the single C source file.
|
||||
|
||||
The main provided function is `md_parse()`. It takes a text in the Markdown
|
||||
syntax and a pointer to a structure which provides pointers to several callback
|
||||
functions.
|
||||
|
||||
As `md_parse()` processes the input, it calls the callbacks (when entering or
|
||||
leaving any Markdown block or span; and when outputting any textual content of
|
||||
the document), allowing the application to convert it into another format or render
|
||||
it onto the screen.
|
||||
|
||||
An example implementation of a simple renderer is available in the `md2html`
|
||||
directory which implements a conversion utility from Markdown to HTML.
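
To give a feel for the callback-driven (push) model without depending on the library, here is a self-contained toy sketch. It is not MD4C's API: `TOY_PARSER`, `toy_parse()`, `OUT` and `render_paragraphs()` are hypothetical names invented for this illustration, the toy treats every non-empty line as one paragraph block, and it performs no HTML escaping. For the real callback signatures, refer to the comments in `md4c.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; loosely modeled on the push model only. */
typedef struct {
    int (*enter_block)(void *userdata);
    int (*leave_block)(void *userdata);
    int (*text)(const char *text, size_t size, void *userdata);
} TOY_PARSER;

/* Toy parser: each non-empty line is reported as one block. Returns 0 on
 * success or the first non-zero value returned by a callback. */
int toy_parse(const char *input, size_t size, const TOY_PARSER *parser,
              void *userdata)
{
    size_t off = 0;

    assert(input != NULL && parser != NULL);

    while (off < size) {
        size_t end = off;
        int ret;

        while (end < size && input[end] != '\n')
            end++;
        if (end > off) {
            if ((ret = parser->enter_block(userdata)) != 0) return ret;
            if ((ret = parser->text(input + off, end - off, userdata)) != 0) return ret;
            if ((ret = parser->leave_block(userdata)) != 0) return ret;
        }
        off = end + 1;
    }
    return 0;
}

/* A tiny renderer built on the callbacks: collects HTML in a buffer. */
typedef struct { char buf[256]; size_t len; } OUT;

static int append(OUT *out, const char *s, size_t n)
{
    if (out->len + n >= sizeof(out->buf))
        return -1;                      /* buffer full: abort parsing */
    memcpy(out->buf + out->len, s, n);
    out->len += n;
    out->buf[out->len] = '\0';
    return 0;
}

static int on_enter(void *ud) { return append((OUT *) ud, "<p>", 3); }
static int on_leave(void *ud) { return append((OUT *) ud, "</p>\n", 5); }
static int on_text(const char *t, size_t n, void *ud) { return append((OUT *) ud, t, n); }

int render_paragraphs(const char *input, OUT *out)
{
    TOY_PARSER parser = { on_enter, on_leave, on_text };

    out->len = 0;
    out->buf[0] = '\0';
    return toy_parse(input, strlen(input), &parser, out);
}
```

The point of the design is that the parser never builds a document tree; the renderer only keeps whatever state it needs in its own `userdata`.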
|
||||
|
||||
|
||||
## Markdown Extensions
|
||||
|
||||
The default behavior is to recognize only Markdown syntax defined by the
|
||||
[CommonMark specification](http://spec.commonmark.org/).
|
||||
|
||||
However with appropriate flags, the behavior can be tuned to enable some
|
||||
additional extensions:
|
||||
|
||||
* With the flag `MD_FLAG_COLLAPSEWHITESPACE`, a non-trivial whitespace is
|
||||
collapsed into a single space.
|
||||
|
||||
* With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported.
|
||||
|
||||
* With the flag `MD_FLAG_TASKLISTS`, GitHub-style task lists are supported.
|
||||
|
||||
* With the flag `MD_FLAG_STRIKETHROUGH`, strike-through spans are enabled
|
||||
(text enclosed in tilde marks, e.g. `~foo bar~`).
|
||||
|
||||
* With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS` permissive URL autolinks
|
||||
(not enclosed in `<` and `>`) are supported.
|
||||
|
||||
* With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, permissive e-mail
|
||||
autolinks (not enclosed in `<` and `>`) are supported.
|
||||
|
||||
* With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS` permissive WWW autolinks
|
||||
without any scheme specified (e.g. `www.example.com`) are supported. MD4C
|
||||
then assumes `http:` scheme.
|
||||
|
||||
* With the flag `MD_FLAG_LATEXMATHSPANS` LaTeX math spans (`$...$`) and
|
||||
LaTeX display math spans (`$$...$$`) are supported. (Note though that the
|
||||
HTML renderer outputs them verbatim in a custom tag `<x-equation>`.)
|
||||
|
||||
* With the flag `MD_FLAG_WIKILINKS`, wiki-style links (`[[link label]]` and
|
||||
`[[target article|link label]]`) are supported. (Note that the HTML renderer
|
||||
outputs them in a custom tag `<x-wikilink>`.)
|
||||
|
||||
* With the flag `MD_FLAG_UNDERLINE`, underscore (`_`) denotes an underline
|
||||
instead of an ordinary emphasis or strong emphasis.
|
||||
|
||||
A few features of CommonMark (those some people see as mis-features) may be
|
||||
disabled:
|
||||
|
||||
* With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTMLBLOCKS`, raw inline
|
||||
HTML or raw HTML blocks respectively are disabled.
|
||||
|
||||
* With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are
|
||||
disabled.
|
||||
|
||||
|
||||
## Input/Output Encoding
|
||||
|
||||
The CommonMark specification generally assumes UTF-8 input, but under closer
|
||||
inspection, Unicode plays a role only in a few very specific situations when parsing
|
||||
Markdown documents:
|
||||
|
||||
1. For detection of word boundaries when processing emphasis and strong
|
||||
emphasis, some classification of Unicode characters (whether it is
|
||||
a whitespace or a punctuation) is needed.
|
||||
|
||||
2. For (case-insensitive) matching of a link reference label with the
|
||||
corresponding link reference definition, Unicode case folding is used.
|
||||
|
||||
3. For translating HTML entities (e.g. `&amp;`) and numeric character
|
||||
references (e.g. `&#35;` or `&#xcab;`) into their Unicode equivalents.
|
||||
|
||||
However, MD4C leaves this translation to the renderer/application, as the
|
||||
renderer is supposed to really know output encoding and whether it really
|
||||
needs to perform this kind of translation. (For example, when the renderer
|
||||
outputs HTML, it may leave the entities untranslated and defer the work to
|
||||
a web browser.)
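
For a renderer that does want to translate numeric character references itself, the core step is turning a code point (for example U+0023 or U+0CAB) into bytes of the output encoding. A minimal sketch for UTF-8 output follows; `encode_utf8` is a hypothetical helper, not something MD4C provides, and parsing the digits of the reference is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes the code point cp as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is not a
 * valid Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t encode_utf8(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```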
|
||||
|
||||
MD4C relies on this property of the CommonMark and the implementation is, to
|
||||
a large degree, encoding-agnostic. Most of MD4C code only assumes that the
|
||||
encoding of your choice is compatible with ASCII, i.e. that the codepoints
|
||||
below 128 have the same numeric values as ASCII.
|
||||
|
||||
Any input MD4C does not understand is simply seen as part of the document text
|
||||
and sent to the renderer's callback functions unchanged.
|
||||
|
||||
The two situations (word boundary detection and link reference matching) where
|
||||
MD4C has to understand Unicode are handled as specified by the following rules:
|
||||
|
||||
* If preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8 for the
|
||||
word boundary detection and for the case-insensitive matching of link labels.
|
||||
|
||||
When none of these macros is explicitly used, this is the default behavior.
|
||||
|
||||
* On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses
|
||||
`WCHAR` instead of `char` and assumes UTF-16 encoding in those situations.
|
||||
(UTF-16 is what Windows developers usually call just "Unicode" and what
|
||||
Win32API generally works with.)
|
||||
|
||||
Note that because this macro affects also the types in `md4c.h`, you have
|
||||
to define the macro both when building MD4C as well as when including
|
||||
`md4c.h`.
|
||||
|
||||
Also note this is only supported in the parser (`md4c.[hc]`). The HTML
|
||||
renderer does not support this and you will have to write your own custom
|
||||
renderer to use this feature.
|
||||
|
||||
* If preprocessor macro `MD4C_USE_ASCII` is defined, MD4C assumes nothing but
|
||||
an ASCII input.
|
||||
|
||||
That effectively means that non-ASCII whitespace or punctuation characters
|
||||
won't be recognized as such and that link reference matching will work in
|
||||
a case-insensitive way only for ASCII letters (`[a-zA-Z]`).
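
What ASCII-only case-insensitive matching means in practice can be sketched like this. The functions `ascii_fold` and `ascii_label_equal` are hypothetical and only illustrate the comparison itself; any other normalization applied to link labels is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Folds only the ASCII letters A-Z to lower case; every other byte,
 * including all bytes of non-ASCII characters, is left untouched. */
static char ascii_fold(char ch)
{
    return (ch >= 'A' && ch <= 'Z') ? (char) (ch - 'A' + 'a') : ch;
}

/* Hypothetical comparison of two labels given as pointer + length. */
int ascii_label_equal(const char *a, size_t a_len,
                      const char *b, size_t b_len)
{
    size_t i;

    assert(a != NULL && b != NULL);

    if (a_len != b_len)
        return 0;
    for (i = 0; i < a_len; i++) {
        if (ascii_fold(a[i]) != ascii_fold(b[i]))
            return 0;
    }
    return 1;
}
```

With this kind of comparison, `Foo` matches `fOO`, but the UTF-8 encoded `Ä` does not match `ä`, because their bytes differ and are not ASCII letters.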
|
||||
|
||||
|
||||
## Documentation
|
||||
|
||||
The API is quite well documented in the comments in the `md4c.h` header.
|
||||
|
||||
There is also a [project wiki](http://github.com/mity/md4c/wiki) which provides
|
||||
some more comprehensive documentation. However note it is incomplete and some
|
||||
details may be a little bit outdated.
|
||||
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: In my code, I need to convert Markdown to HTML. How?**
|
||||
|
||||
**A:** Indeed the API, as provided by `md4c.h`, is just a SAX-like Markdown
|
||||
parser. Nothing more and nothing less.
|
||||
|
||||
That said, there is a complete HTML generator built on top of the parser in the
|
||||
directory `md2html` (the files `render_html.[hc]` and `entity.[hc]`). At this
|
||||
time, you have to directly reuse that code in your project.
|
||||
|
||||
There is [some discussion](https://github.com/mity/md4c/issues/82) whether this
|
||||
should be changed (and how) in the future.
|
||||
|
||||
**Q: How does MD4C compare to a parser XY?**
|
||||
|
||||
**A:** Some other implementations combine Markdown parser and HTML generator
|
||||
into a single entangled code hidden behind an interface which just allows the
|
||||
conversion from Markdown to HTML, and they are unusable if you want to process
|
||||
the input in any other way.
|
||||
|
||||
Even when the parsing is available as a standalone feature, most parsers (if
|
||||
not all of them; at least within the scope of C/C++ language) are full DOM-like
|
||||
parsers: They construct abstract syntax tree (AST) representation of the whole
|
||||
Markdown document. That takes time and it leads to bigger memory footprint.
|
||||
|
||||
It's completely fine as long as you really need it. If you don't need the full
|
||||
AST, there is a very high chance that using MD4C will be faster and much less
|
||||
memory-hungry.
|
||||
|
||||
Last but not least, some Markdown parsers are implemented in a naive way. When
|
||||
fed with a [smartly crafted input pattern](test/pathological_tests.py), they
|
||||
may exhibit quadratic (or even worse) parsing times. What MD4C can still parse
|
||||
in a fraction of a second may turn into long minutes or possibly hours with them.
|
||||
Hence, when such a naive parser is used to process an input from an untrusted
|
||||
source, the possibility of denial-of-service attacks becomes a real danger.
|
||||
|
||||
A lot of our effort went into providing linear parsing times no matter what
|
||||
kind of crazy input MD4C parser is fed with. (If you encounter an input pattern
|
||||
which leads to super-linear parsing times, please do not hesitate and report it
|
||||
as a bug.)
|
||||
|
||||
**Q: Does MD4C perform any input validation?**
|
||||
|
||||
**A:** No.
|
||||
|
||||
CommonMark specification declares that any sequence of (Unicode) characters is
|
||||
a valid Markdown document; i.e. that it does not matter whether some Markdown
|
||||
syntax is in some way broken or not. If it is broken, it will simply not be
|
||||
recognized and the parser should see the broken syntax construction just as a
|
||||
verbatim text.
|
||||
|
||||
MD4C takes this a step further. It sees any sequence of bytes as a valid input,
|
||||
following completely the GIGO philosophy (garbage in, garbage out).
|
||||
|
||||
If you need to validate that the input is, say, a valid UTF-8 document, you
|
||||
have to do it on your own. You can simply validate the whole Markdown document
|
||||
before passing it to the MD4C parser.
|
||||
|
||||
Alternatively, you may perform the validation on the fly during the parsing,
|
||||
in the `MD_PARSER::text()` callback. (Given how MD4C works internally, it will
|
||||
never break a sequence of bytes into multiple calls of `MD_PARSER::text()`,
|
||||
unless that sequence is already broken to multiple pieces in the input by some
|
||||
whitespace, new line character(s) and/or any Markdown syntax construction.)
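
Validating the whole document before parsing, as suggested above, could look roughly like the following. This is a sketch of a strict UTF-8 check written for this FAQ, not a function provided by MD4C; the name `is_valid_utf8` is hypothetical. It rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates and code points above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator: returns 1 if text[0 .. size) is well-formed
 * UTF-8, 0 otherwise. */
int is_valid_utf8(const char *text, size_t size)
{
    const unsigned char *s = (const unsigned char *) text;
    size_t i = 0;

    assert(text != NULL || size == 0);

    while (i < size) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t need, k;

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or bad lead byte */
        }

        if (size - i <= need)
            return 0;                   /* sequence truncated by end of input */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                   /* overlong, out of range, or surrogate */

        i += need + 1;
    }
    return 1;
}
```

An application would call this once on the input buffer and only hand the buffer to the parser if the check succeeds.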
|
||||
|
||||
|
||||
## License
|
||||
|
||||
MD4C is covered by the MIT license, see the file `LICENSE.md`.
|
||||
|
||||
|
||||
## Links to Related Projects
|
||||
|
||||
Ports and bindings to other languages:
|
||||
|
||||
* [commonmark-d](https://github.com/AuburnSounds/commonmark-d):
|
||||
Port of MD4C to D language.
|
||||
|
||||
* [markdown-wasm](https://github.com/rsms/markdown-wasm):
|
||||
Markdown parser and HTML generator for WebAssembly, based on MD4C.
|
||||
|
||||
Software using MD4C:
|
||||
|
||||
* [Qt](https://www.qt.io/):
|
||||
Cross-platform C++ GUI framework.
|
||||
|
||||
* [Textosaurus](https://github.com/martinrotter/textosaurus):
|
||||
Cross-platform text editor based on Qt and Scintilla.
|
||||
|
||||
* [8th](https://8th-dev.com/):
|
||||
Cross-platform concatenative programming language.
|
|
@ -0,0 +1,29 @@
|
|||
# YAML definition for Appveyor.com continuous integration.
|
||||
# See http://www.appveyor.com/docs/appveyor-yml
|
||||
|
||||
version: '{branch}-{build}'
|
||||
|
||||
before_build:
|
||||
- 'cmake --version'
|
||||
- 'if "%PLATFORM%"=="x64" cmake -G "Visual Studio 12 Win64" .'
|
||||
- 'if not "%PLATFORM%"=="x64" cmake -G "Visual Studio 12" .'
|
||||
|
||||
build:
|
||||
project: md4c.sln
|
||||
verbosity: detailed
|
||||
|
||||
skip_tags: true
|
||||
|
||||
os:
|
||||
- Windows Server 2012 R2
|
||||
|
||||
configuration:
|
||||
- Debug
|
||||
- Release
|
||||
|
||||
platform:
|
||||
- x64 # 64-bit build
|
||||
- win32 # 32-bit build
|
||||
|
||||
artifacts:
|
||||
- path: $(configuration)/md2html/md2html.exe
|
|
@ -0,0 +1,4 @@
|
|||
# YAML definition for codecov.io code coverage reports.
|
||||
|
||||
ignore:
|
||||
- "md2html"
|
|
@ -0,0 +1,15 @@
|
|||
|
||||
include_directories("${PROJECT_SOURCE_DIR}/md4c")
|
||||
|
||||
add_executable(md2html cmdline.c cmdline.h entity.c entity.h md2html.c render_html.c render_html.h)
|
||||
target_link_libraries(md2html md4c)
|
||||
|
||||
install(
|
||||
TARGETS md2html
|
||||
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
|
||||
PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
|
||||
)
|
||||
|
||||
install(FILES "md2html.1" DESTINATION "${CMAKE_INSTALL_MANDIR}/man1")
|
|
@ -0,0 +1,296 @@
|
|||
/* cmdline.c: a reentrant version of getopt(). Written 2006 by Brian
|
||||
* Raiter. This code is in the public domain.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <ctype.h>
|
||||
#include "cmdline.h"
|
||||
|
||||
#define docallback(opt, val) \
|
||||
do { if ((r = callback(opt, val, data)) != 0) return r; } while (0)
|
||||
|
||||
/* Parse the given cmdline arguments.
|
||||
*/
|
||||
int readoptions(option const* list, int argc, char **argv,
|
||||
int (*callback)(int, char const*, void*), void *data)
|
||||
{
|
||||
char argstring[] = "--";
|
||||
option const *opt;
|
||||
char const *val;
|
||||
char const *p;
|
||||
int stop = 0;
|
||||
int argi, len, r;
|
||||
|
||||
if (!list || !callback)
|
||||
return -1;
|
||||
|
||||
for (argi = 1 ; argi < argc ; ++argi)
|
||||
{
|
||||
/* First, check for "--", which forces all remaining arguments
|
||||
* to be treated as non-options.
|
||||
*/
|
||||
if (!stop && argv[argi][0] == '-' && argv[argi][1] == '-'
|
||||
&& argv[argi][2] == '\0') {
|
||||
stop = 1;
|
||||
continue;
|
||||
}
|
||||
|
||||
/* Arguments that do not begin with '-' (or are only "-") are
|
||||
* not options.
|
||||
*/
|
||||
if (stop || argv[argi][0] != '-' || argv[argi][1] == '\0') {
|
||||
docallback(0, argv[argi]);
|
||||
continue;
|
||||
}
|
||||
|
||||
if (argv[argi][1] == '-')
|
||||
{
|
||||
/* Arguments that begin with a double-dash are long
|
||||
* options.
|
||||
*/
|
||||
p = argv[argi] + 2;
|
||||
val = strchr(p, '=');
|
||||
if (val)
|
||||
len = val++ - p;
|
||||
else
|
||||
len = strlen(p);
|
||||
|
||||
/* Is it on the list of valid options? If so, does it
|
||||
* expect a parameter?
|
||||
*/
|
||||
for (opt = list ; opt->optval ; ++opt)
|
||||
if (opt->name && !strncmp(p, opt->name, len)
|
||||
&& !opt->name[len])
|
||||
break;
|
||||
if (!opt->optval) {
|
||||
docallback('?', argv[argi]);
|
||||
} else if (!val && opt->arg == 1) {
|
||||
docallback(':', argv[argi]);
|
||||
} else if (val && opt->arg == 0) {
|
||||
docallback('=', argv[argi]);
|
||||
} else {
|
||||
docallback(opt->optval, val);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
/* Arguments that begin with a single dash contain one or
|
||||
* more short options. Each character in the argument is
|
||||
* examined in turn, unless a parameter consumes the rest
|
||||
* of the argument (or possibly even the following
|
||||
* argument).
|
||||
*/
|
||||
for (p = argv[argi] + 1 ; *p ; ++p) {
|
||||
for (opt = list ; opt->optval ; ++opt)
|
||||
if (opt->chname == *p)
|
||||
break;
|
||||
if (!opt->optval) {
|
||||
argstring[1] = *p;
|
||||
docallback('?', argstring);
|
||||
continue;
|
||||
} else if (opt->arg == 0) {
|
||||
docallback(opt->optval, NULL);
|
||||
continue;
|
||||
} else if (p[1]) {
|
||||
docallback(opt->optval, p + 1);
|
||||
break;
|
||||
} else if (argi + 1 < argc && strcmp(argv[argi + 1], "--")) {
|
||||
++argi;
|
||||
docallback(opt->optval, argv[argi]);
|
||||
break;
|
||||
} else if (opt->arg == 2) {
|
||||
docallback(opt->optval, NULL);
|
||||
continue;
|
||||
} else {
|
||||
argstring[1] = *p;
|
||||
docallback(':', argstring);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Verify that str points to an ASCII zero or one (optionally with
|
||||
* whitespace) and return the value present, or -1 if str's contents
|
||||
* are anything else.
|
||||
*/
|
||||
static int readboolvalue(char const *str)
|
||||
{
|
||||
char d;
|
||||
|
||||
while (isspace((unsigned char)*str))
|
||||
++str;
|
||||
if (!*str)
|
||||
return -1;
|
||||
d = *str++;
|
||||
while (isspace((unsigned char)*str))
|
||||
++str;
|
||||
if (*str)
|
||||
return -1;
|
||||
if (d == '0')
|
||||
return 0;
|
||||
else if (d == '1')
|
||||
return 1;
|
||||
else
|
||||
return -1;
|
||||
}
|
||||
|
||||
/* Parse a configuration file.
|
||||
*/
|
||||
int readcfgfile(option const* list, FILE *fp,
|
||||
int (*callback)(int, char const*, void*), void *data)
|
||||
{
|
||||
char buf[1024];
|
||||
option const *opt;
|
||||
char *name, *val, *p;
|
||||
int len, f, r;
|
||||
|
||||
while (fgets(buf, sizeof buf, fp) != NULL)
|
||||
{
|
||||
/* Strip off the trailing newline and any leading whitespace.
|
||||
* If the line begins with a hash sign, skip it entirely.
|
||||
*/
|
||||
len = strlen(buf);
|
||||
if (len && buf[len - 1] == '\n')
|
||||
buf[--len] = '\0';
|
||||
for (p = buf ; isspace((unsigned char)*p) ; ++p) ;
|
||||
if (!*p || *p == '#')
|
||||
continue;
|
||||
|
||||
/* Find the end of the option's name and the beginning of the
|
||||
* parameter, if any.
|
||||
*/
|
||||
for (name = p ; *p && *p != '=' && !isspace((unsigned char)*p) ; ++p) ;
|
||||
len = p - name;
|
||||
for ( ; *p == '=' || isspace((unsigned char)*p) ; ++p) ;
|
||||
val = p;
|
||||
|
||||
/* Is it on the list of valid options? Does it take a
|
||||
* full parameter, or just an optional boolean?
|
||||
*/
|
||||
for (opt = list ; opt->optval ; ++opt)
|
||||
if (opt->name && !strncmp(name, opt->name, len)
|
||||
&& !opt->name[len])
|
||||
break;
|
||||
if (!opt->optval) {
|
||||
docallback('?', name);
|
||||
} else if (!*val && opt->arg == 1) {
|
||||
docallback(':', name);
|
||||
} else if (*val && opt->arg == 0) {
|
||||
f = readboolvalue(val);
|
||||
if (f < 0)
|
||||
docallback('=', name);
|
||||
else if (f == 1)
|
||||
docallback(opt->optval, NULL);
|
||||
} else {
|
||||
docallback(opt->optval, val);
|
||||
}
|
||||
}
|
||||
return ferror(fp) ? -1 : 0;
|
||||
}
|
||||
|
||||
/* Turn a string containing a cmdline into an argc-argv pair.
|
||||
*/
|
||||
int makecmdline(char const *cmdline, int *argcp, char ***argvp)
|
||||
{
|
||||
char **argv;
|
||||
int argc;
|
||||
char const *s;
|
||||
int n, quoted;
|
||||
|
||||
if (!cmdline)
|
||||
return 0;
|
||||
|
||||
/* Calculate argc by counting the number of "clumps" of non-spaces.
|
||||
*/
|
||||
for (s = cmdline ; isspace((unsigned char)*s) ; ++s) ;
|
||||
if (!*s) {
|
||||
*argcp = 1;
|
||||
if (argvp) {
|
||||
*argvp = malloc(2 * sizeof(char*));
|
||||
if (!*argvp)
|
||||
return 0;
|
||||
(*argvp)[0] = NULL;
|
||||
(*argvp)[1] = NULL;
|
||||
}
|
||||
return 1;
|
||||
}
|
||||
for (argc = 2, quoted = 0 ; *s ; ++s) {
|
||||
if (quoted == '"') {
|
||||
if (*s == '"')
|
||||
quoted = 0;
|
||||
else if (*s == '\\' && s[1])
|
||||
++s;
|
||||
} else if (quoted == '\'') {
|
||||
if (*s == '\'')
|
||||
quoted = 0;
|
||||
} else {
|
||||
if (isspace((unsigned char)*s)) {
|
||||
for ( ; isspace((unsigned char)s[1]) ; ++s) ;
|
||||
if (!s[1])
|
||||
break;
|
||||
++argc;
|
||||
} else if (*s == '"' || *s == '\'') {
|
||||
quoted = *s;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
*argcp = argc;
|
||||
if (!argvp)
|
||||
return 1;
|
||||
|
||||
/* Allocate space for all the arguments and their pointers.
|
||||
*/
|
||||
argv = malloc((argc + 1) * sizeof(char*) + strlen(cmdline) + 1);
|
||||
*argvp = argv;
|
||||
if (!argv)
|
||||
return 0;
|
||||
argv[0] = NULL;
|
||||
argv[1] = (char*)(argv + argc + 1);
|
||||
|
||||
/* Copy the string into the allocated memory immediately after the
|
||||
* argv array. Where a space immediately follows a nonspace,
|
||||
* replace it with a \0. Where a nonspace immediately follows
|
||||
* spaces, store a pointer to it. (Except, of course, when the
|
||||
* space-nonspace transitions occur within quotes.)
|
||||
*/
|
||||
for (s = cmdline ; isspace((unsigned char)*s) ; ++s) ;
|
||||
for (argc = 1, n = 0, quoted = 0 ; *s ; ++s) {
|
||||
if (quoted == '"') {
|
||||
if (*s == '"') {
|
||||
quoted = 0;
|
||||
} else {
|
||||
if (*s == '\\' && s[1])
|
||||
++s;
|
||||
argv[argc][n++] = *s;
|
||||
}
|
||||
} else if (quoted == '\'') {
|
||||
if (*s == '\'')
|
||||
quoted = 0;
|
||||
else
|
||||
argv[argc][n++] = *s;
|
||||
} else {
|
||||
if (isspace((unsigned char)*s)) {
|
||||
argv[argc][n] = '\0';
|
||||
for ( ; isspace((unsigned char)s[1]) ; ++s) ;
|
||||
if (!s[1])
|
||||
break;
|
||||
argv[argc + 1] = argv[argc] + n + 1;
|
||||
++argc;
|
||||
n = 0;
|
||||
} else {
|
||||
if (*s == '"' || *s == '\'')
|
||||
quoted = *s;
|
||||
else
|
||||
argv[argc][n++] = *s;
|
||||
}
|
||||
}
|
||||
}
|
||||
argv[argc + 1] = NULL;
|
||||
return 1;
|
||||
}
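
The quoting rules that makecmdline() applies can be illustrated in isolation. The following is a minimal sketch, not part of cmdline.c: the helper name next_arg() and its caller-supplied output buffer are assumptions made for illustration. It extracts one argument at a time, treating whitespace as a separator outside quotes, copying single-quoted text literally, and honoring backslash escapes only inside double quotes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not in cmdline.c): copy the next argument found at
 * *sp into out, which has room for cap bytes (cap must be at least 1).
 * Enclosing quotes and escaping backslashes are not copied. Characters that
 * do not fit are dropped. Returns 1 if an argument was produced, or 0 when
 * only whitespace remains. On return *sp points just past the argument. */
static int next_arg(const char **sp, char *out, size_t cap)
{
    const char *s = *sp;
    size_t n = 0;
    int quoted = 0;

    while (isspace((unsigned char)*s))
        ++s;
    if (!*s) {
        *sp = s;
        return 0;
    }

    for ( ; *s ; ++s) {
        if (quoted == '"') {
            if (*s == '"') {
                quoted = 0;
                continue;
            }
            if (*s == '\\' && s[1])
                ++s;                    /* keep the escaped character */
        } else if (quoted == '\'') {
            if (*s == '\'') {
                quoted = 0;
                continue;
            }
        } else {
            if (isspace((unsigned char)*s))
                break;                  /* unquoted space ends the argument */
            if (*s == '"' || *s == '\'') {
                quoted = *s;
                continue;
            }
        }
        if (n + 1 < cap)
            out[n++] = *s;
    }
    out[n] = '\0';
    *sp = s;
    return 1;
}
```

Unlike makecmdline(), this sketch does not allocate memory, so it only suits callers that can bound the argument length up front.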
|
|
@ -0,0 +1,86 @@
|
|||
/* cmdline.h: a reentrant version of getopt(). Written 2006 by Brian
|
||||
* Raiter. This code is in the public domain.
|
||||
*/
|
||||
|
||||
#ifndef _cmdline_h_
|
||||
#define _cmdline_h_
|
||||
|
||||
/* The information specifying a single cmdline option.
|
||||
*/
|
||||
typedef struct option {
|
||||
char const *name; /* the option's long name, or "" if none */
|
||||
char chname; /* a single-char name, or zero if none */
|
||||
int optval; /* a unique value representing this option */
|
||||
int arg; /* 0 = no arg, 1 = arg req'd, 2 = optional */
|
||||
} option;
|
||||
|
||||
/* Parse the given cmdline arguments. list is an array of option
|
||||
* structs, each entry specifying a valid option. The last struct in
|
||||
* the array must have optval set to zero. argc and argv give the
|
||||
* cmdline to parse. callback is the function to call for each option
|
||||
* and non-option found on the cmdline. data is a pointer that is
|
||||
* passed to each invocation of callback. The return value of callback
|
||||
* should be zero to continue processing the cmdline, or any other
|
||||
* value to abort. The return value of readoptions() is the value
|
||||
* returned from the last callback, or zero if no arguments were
|
||||
* found, or -1 if an error occurred.
|
||||
*
|
||||
* When readoptions() encounters a regular cmdline argument (i.e. a
|
||||
* non-option argument), callback() is invoked with opt equal to zero
|
||||
* and val pointing to the argument. When an option is found,
|
||||
* callback() is invoked with opt equal to the optval field in the
|
||||
* option struct corresponding to that option, and val points to the
|
||||
* option's parameter, or is NULL if the option does not take a
|
||||
* parameter. If readoptions() finds an option that does not appear in
|
||||
* the list of valid options, callback() is invoked with opt equal to
|
||||
* '?'. If readoptions() encounters an option that is missing its
|
||||
* required parameter, callback() is invoked with opt equal to ':'. If
|
||||
* readoptions() finds a parameter on a long option that does not
|
||||
* admit a parameter, callback() is invoked with opt equal to '='. In
|
||||
* each of these cases, val will point to the erroneous option
|
||||
* argument.
|
||||
*/
|
||||
extern int readoptions(option const* list, int argc, char **argv,
|
||||
int (*callback)(int opt, char const *val, void *data),
|
||||
void *data);
|
||||
|
||||
/* Parse the given file. list is an array of option structs, in the
|
||||
* same form as taken by readoptions(). fp is a pointer to an open
|
||||
* text file. callback is the function to call for each line found in
|
||||
* the configuration file. data is a pointer that is passed to each
|
||||
* invocation of callback. The return value of readcfgfile() is the
|
||||
* value returned from the last callback, or zero if no arguments were
|
||||
* found, or -1 if an error occurred while reading the file.
|
||||
*
|
||||
* The function will ignore lines that contain only whitespace, or
|
||||
* lines that begin with a hash sign. All other lines should be of the
|
||||
* form "OPTION=VALUE", where OPTION is one of the long options in
|
||||
* list. Whitespace around the equal sign is permitted. An option that
|
||||
* takes no arguments can either have a VALUE of 0 or 1, or omit the
|
||||
* "=VALUE" entirely. (A VALUE of 0 will behave the same as if the
|
||||
* line was not present.)
|
||||
*/
|
||||
extern int readcfgfile(option const* list, FILE *fp,
|
||||
int (*callback)(int opt, char const *val, void *data),
|
||||
void *data);
|
||||
|
||||
|
||||
/* Create an argc-argv pair from a string containing a command line.
|
||||
* cmdline is the string to be parsed. argcp points to the variable to
|
||||
* receive the argc value, and argvp points to the variable to receive
|
||||
* the argv value. argvp can be NULL if the caller just wants to get
|
||||
* argc. Zero is returned on failure. This function allocates memory
|
||||
* on behalf of the caller. The memory is allocated as a single block,
|
||||
* so it is sufficient to simply free() the pointer returned through
|
||||
* argvp. Note that argv[0] will always be initialized to NULL; the
|
||||
* first argument will be stored in argv[1]. The string is parsed by
|
||||
* separating arguments on whitespace boundaries. Space within
|
||||
* substrings enclosed in single-quotes is ignored. A substring
|
||||
* enclosed in double-quotes is treated the same, except that the
|
||||
* backslash is recognized as an escape character within such a
|
||||
* substring. Enclosing quotes and escaping backslashes are not copied
|
||||
* into the argv values.
|
||||
*/
|
||||
extern int makecmdline(char const *cmdline, int *argcp, char ***argvp);
|
||||
|
||||
#endif
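
The callback protocol described in the comments above may be easier to follow with a concrete callback. The sketch below is an illustration only: struct demo_settings and demo_callback() are hypothetical names, and the option values 'o' and 'f' are assumed to be the optval fields of two entries in the caller's option list. It shows the three kinds of invocation: a non-option argument (opt equal to zero), a recognized option, and the error codes '?', ':' and '='.

```c
#include <stddef.h>

/* Hypothetical record that the callback fills in through its data pointer. */
struct demo_settings {
    const char *input;      /* first non-option argument */
    const char *output;     /* parameter of the 'o' option */
    int full_html;          /* set by the 'f' option */
    int errors;             /* count of malformed options seen */
};

/* Has the same signature as the callback parameter of readoptions().
 * Returning zero continues parsing; any other value aborts it. */
static int demo_callback(int opt, char const *val, void *data)
{
    struct demo_settings *s = data;

    switch (opt) {
    case 0:                 /* regular (non-option) argument */
        s->input = val;
        break;
    case 'o':               /* option with a required parameter */
        s->output = val;
        break;
    case 'f':               /* option without a parameter; val is NULL */
        s->full_html = 1;
        break;
    case '?':               /* unknown option */
    case ':':               /* missing required parameter */
    case '=':               /* parameter given to an option that takes none */
    default:
        s->errors++;
        return 1;
    }
    return 0;
}
```

A caller would pass demo_callback and a pointer to a zero-initialized struct demo_settings as the last two arguments of readoptions().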
|
File diff suppressed because it is too large
|
@ -0,0 +1,42 @@
|
|||
/*
|
||||
* MD4C: Markdown parser for C
|
||||
* (http://github.com/mity/md4c)
|
||||
*
|
||||
* Copyright (c) 2016-2017 Martin Mitas
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
* IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#ifndef MD2HTML_ENTITY_H
|
||||
#define MD2HTML_ENTITY_H
|
||||
|
||||
#include <stdlib.h>
|
||||
|
||||
|
||||
/* Most entities are formed by a single Unicode codepoint, a few by two codepoints.
|
||||
* Single-codepoint entities have codepoints[1] set to zero. */
|
||||
struct entity {
|
||||
const char* name;
|
||||
unsigned codepoints[2];
|
||||
};
|
||||
|
||||
const struct entity* entity_lookup(const char* name, size_t name_size);
|
||||
|
||||
|
||||
#endif /* MD2HTML_ENTITY_H */
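
Since the diff of entity.c is not shown here, the following is only a sketch of how a lookup with the entity_lookup() signature could work, assuming a table sorted by name and a binary search. The names demo_entity, demo_table and demo_lookup are hypothetical, and the five-entry table stands in for the full list of HTML named character references. The name argument is not assumed to be NUL-terminated, which is why the length is passed separately.

```c
#include <stdlib.h>
#include <string.h>

struct demo_entity {
    const char *name;
    unsigned codepoints[2];     /* codepoints[1] is zero for single-codepoint entities */
};

/* Illustrative subset, sorted by name in byte order as strcmp() would sort. */
static const struct demo_entity demo_table[] = {
    { "&NotEqualTilde;", { 0x2242, 0x0338 } },
    { "&amp;",           { 0x26, 0 } },
    { "&gt;",            { 0x3E, 0 } },
    { "&lt;",            { 0x3C, 0 } },
    { "&quot;",          { 0x22, 0 } }
};

struct demo_key {
    const char *name;
    size_t size;
};

/* bsearch() comparator: key (name, size) against a table entry. */
static int demo_cmp(const void *k, const void *e)
{
    const struct demo_key *key = k;
    const struct demo_entity *ent = e;
    int c = strncmp(key->name, ent->name, key->size);

    if (c != 0)
        return c;
    /* Equal over key->size bytes: an exact match only if the entry's name
     * ends there, otherwise the key is a proper prefix and sorts first. */
    return strlen(ent->name) == key->size ? 0 : -1;
}

static const struct demo_entity *demo_lookup(const char *name, size_t name_size)
{
    struct demo_key key;

    key.name = name;
    key.size = name_size;
    return bsearch(&key, demo_table,
                   sizeof demo_table / sizeof demo_table[0],
                   sizeof demo_table[0], demo_cmp);
}
```

A binary search keeps each lookup logarithmic in the table size, which matters when the table holds the complete entity list.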
|
|
@ -0,0 +1,113 @@
|
|||
.TH MD2HTML 1 "June 2019" "" "General Commands Manual"
|
||||
.nh
|
||||
.ad l
|
||||
.
|
||||
.SH NAME
|
||||
.
|
||||
md2html \- convert Markdown to HTML
|
||||
.
|
||||
.SH SYNOPSIS
|
||||
.
|
||||
.B md2html
|
||||
.RI [ OPTION ]...\&
|
||||
.RI [ FILE ]
|
||||
.
|
||||
.SH OPTIONS
|
||||
.
|
||||
.SS General options:
|
||||
.
|
||||
.TP
|
||||
.BR -o ", " --output= \fIOUTFILE\fR
|
||||
Write output to \fIOUTFILE\fR instead of \fBstdout\fR(3)
|
||||
.
|
||||
.TP
|
||||
.BR -f ", " --full-html
|
||||
Generate full HTML document, including header
|
||||
.
|
||||
.TP
|
||||
.BR -s ", " --stat
|
||||
Measure time of input parsing
|
||||
.
|
||||
.TP
|
||||
.BR -h ", " --help
|
||||
Display help and exit
|
||||
.
|
||||
.TP
|
||||
.BR -v ", " --version
|
||||
Display version and exit
|
||||
.
|
||||
.SS Markdown dialect options:
|
||||
.
|
||||
.TP
|
||||
.B --commonmark
|
||||
CommonMark (the default)
|
||||
.
|
||||
.TP
|
||||
.B --github
|
||||
GitHub Flavored Markdown
|
||||
.
|
||||
.PP
|
||||
Note: dialect options are equivalent to some combination of flags below.
|
||||
.
|
||||
.SS Markdown extension options:
|
||||
.
|
||||
.TP
|
||||
.B --fcollapse-whitespace
|
||||
Collapse non-trivial whitespace
|
||||
.
|
||||
.TP
|
||||
.B --fverbatim-entities
|
||||
Do not translate entities
|
||||
.
|
||||
.TP
|
||||
.B --fpermissive-atx-headers
|
||||
Allow ATX headers without delimiting space
|
||||
.
|
||||
.TP
|
||||
.B --fpermissive-url-autolinks
|
||||
Allow URL autolinks without "<" and ">" delimiters
|
||||
.
|
||||
.TP
|
||||
.B --fpermissive-www-autolinks
|
||||
Allow WWW autolinks without any scheme (e.g. "www.example.com")
|
||||
.
|
||||
.TP
|
||||
.B --fpermissive-email-autolinks
|
||||
Allow e-mail autolinks without "<", ">" and "mailto:"
|
||||
.
|
||||
.TP
|
||||
.B --fpermissive-autolinks
|
||||
Enable all 3 of the above permissive autolinks options
|
||||
.
|
||||
.TP
|
||||
.B --fno-indented-code
|
||||
Disable indented code blocks
|
||||
.
|
||||
.TP
|
||||
.B --fno-html-blocks
|
||||
Disable raw HTML blocks
|
||||
.
|
||||
.TP
|
||||
.B --fno-html-spans
|
||||
Disable raw HTML spans
|
||||
.
|
||||
.TP
|
||||
.B --fno-html
|
||||
Same as \fB--fno-html-blocks --fno-html-spans\fR
|
||||
.
|
||||
.TP
|
||||
.B --ftables
|
||||
Enable tables
|
||||
.
|
||||
.TP
|
||||
.B --fstrikethrough
|
||||
Enable strikethrough spans
|
||||
.
|
||||
.TP
|
||||
.B --ftasklists
|
||||
Enable task lists
|
||||
.
|
||||
.SH SEE ALSO
|
||||
.
|
||||
https://github.com/mity/md4c
|
||||
.
|
|
@ -0,0 +1,371 @@
|
|||
/*
|
||||
* MD4C: Markdown parser for C
|
||||
* (http://github.com/mity/md4c)
|
||||
*
|
||||
* Copyright (c) 2016-2017 Martin Mitas
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
* IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <time.h>
|
||||
|
||||
#include "render_html.h"
|
||||
#include "cmdline.h"
|
||||
|
||||
|
||||
|
||||
/* Global options. */
|
||||
static unsigned parser_flags = 0;
|
||||
static unsigned renderer_flags = MD_RENDER_FLAG_DEBUG;
|
||||
static int want_fullhtml = 0;
|
||||
static int want_stat = 0;
|
||||
|
||||
|
||||
/*********************************
|
||||
*** Simple grow-able buffer ***
|
||||
*********************************/
|
||||
|
||||
/* We render to a memory buffer instead of directly outputting the rendered
|
||||
* documents, as this allows using this utility for evaluating performance
|
||||
* of MD4C (--stat option). This allows us to measure just time of the parser,
|
||||
* without the I/O.
|
||||
*/
|
||||
|
||||
struct membuffer {
|
||||
char* data;
|
||||
size_t asize;
|
||||
size_t size;
|
||||
};
|
||||
|
||||
static void
|
||||
membuf_init(struct membuffer* buf, MD_SIZE new_asize)
|
||||
{
|
||||
buf->size = 0;
|
||||
buf->asize = new_asize;
|
||||
buf->data = malloc(buf->asize);
|
||||
if(buf->data == NULL) {
|
||||
fprintf(stderr, "membuf_init: malloc() failed.\n");
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
membuf_fini(struct membuffer* buf)
|
||||
{
|
||||
if(buf->data)
|
||||
free(buf->data);
|
||||
}
|
||||
|
||||
static void
|
||||
membuf_grow(struct membuffer* buf, size_t new_asize)
|
||||
{
|
||||
buf->data = realloc(buf->data, new_asize);
|
||||
if(buf->data == NULL) {
|
||||
fprintf(stderr, "membuf_grow: realloc() failed.\n");
|
||||
exit(1);
|
||||
}
|
||||
buf->asize = new_asize;
|
||||
}
|
||||
|
||||
static void
|
||||
membuf_append(struct membuffer* buf, const char* data, MD_SIZE size)
|
||||
{
|
||||
if(buf->asize < buf->size + size)
|
||||
membuf_grow(buf, buf->size + buf->size / 2 + size);
|
||||
memcpy(buf->data + buf->size, data, size);
|
||||
buf->size += size;
|
||||
}
|
||||
|
||||
|
||||
/**********************
|
||||
*** Main program ***
|
||||
**********************/
|
||||
|
||||
static void
|
||||
process_output(const MD_CHAR* text, MD_SIZE size, void* userdata)
|
||||
{
|
||||
membuf_append((struct membuffer*) userdata, text, size);
|
||||
}
|
||||
|
||||
static int
|
||||
process_file(FILE* in, FILE* out)
|
||||
{
|
||||
MD_SIZE n;
|
||||
struct membuffer buf_in = {0};
|
||||
struct membuffer buf_out = {0};
|
||||
int ret = -1;
|
||||
clock_t t0, t1;
|
||||
|
||||
membuf_init(&buf_in, 32 * 1024);
|
||||
|
||||
/* Read the input file into a buffer. */
|
||||
while(1) {
|
||||
if(buf_in.size >= buf_in.asize)
|
||||
membuf_grow(&buf_in, buf_in.asize + buf_in.asize / 2);
|
||||
|
||||
n = fread(buf_in.data + buf_in.size, 1, buf_in.asize - buf_in.size, in);
|
||||
if(n == 0)
|
||||
break;
|
||||
buf_in.size += n;
|
||||
}
|
||||
|
||||
/* Input size is a good estimate of the output size. Add some more reserve to
|
||||
* deal with the HTML header/footer and tags. */
|
||||
membuf_init(&buf_out, buf_in.size + buf_in.size/8 + 64);
|
||||
|
||||
/* Parse the document. The renderer passes each chunk of generated HTML to
|
||||
* our process_output() callback, which appends it to buf_out. */
|
||||
t0 = clock();
|
||||
|
||||
ret = md_render_html(buf_in.data, buf_in.size, process_output,
|
||||
(void*) &buf_out, parser_flags, renderer_flags);
|
||||
|
||||
t1 = clock();
|
||||
if(ret != 0) {
|
||||
fprintf(stderr, "Parsing failed.\n");
|
||||
goto out;
|
||||
}
|
||||
|
||||
/* Write down the document in the HTML format. */
|
||||
if(want_fullhtml) {
|
||||
fprintf(out, "<html>\n");
|
||||
fprintf(out, "<head>\n");
|
||||
fprintf(out, "<title></title>\n");
|
||||
fprintf(out, "<meta name=\"generator\" content=\"md2html\">\n");
|
||||
fprintf(out, "</head>\n");
|
||||
fprintf(out, "<body>\n");
|
||||
}
|
||||
|
||||
fwrite(buf_out.data, 1, buf_out.size, out);
|
||||
|
||||
if(want_fullhtml) {
|
||||
fprintf(out, "</body>\n");
|
||||
fprintf(out, "</html>\n");
|
||||
}
|
||||
|
||||
if(want_stat) {
|
||||
if(t0 != (clock_t)-1 && t1 != (clock_t)-1) {
|
||||
double elapsed = (double)(t1 - t0) / CLOCKS_PER_SEC;
|
||||
if (elapsed < 1)
|
||||
fprintf(stderr, "Time spent on parsing: %7.2f ms.\n", elapsed*1e3);
|
||||
else
|
||||
fprintf(stderr, "Time spent on parsing: %6.3f s.\n", elapsed);
|
||||
}
|
||||
}
|
||||
|
||||
/* Success if we have reached here. */
|
||||
ret = 0;
|
||||
|
||||
out:
|
||||
membuf_fini(&buf_in);
|
||||
membuf_fini(&buf_out);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
|
||||
#define OPTION_ARG_NONE 0
|
||||
#define OPTION_ARG_REQUIRED 1
|
||||
#define OPTION_ARG_OPTIONAL 2
|
||||
|
||||
static const option cmdline_options[] = {
|
||||
{ "output", 'o', 'o', OPTION_ARG_REQUIRED },
|
||||
{ "full-html", 'f', 'f', OPTION_ARG_NONE },
|
||||
{ "stat", 's', 's', OPTION_ARG_NONE },
|
||||
{ "help", 'h', 'h', OPTION_ARG_NONE },
|
||||
{ "version", 'v', 'v', OPTION_ARG_NONE },
|
||||
|
||||
{ "commonmark", 0, 'c', OPTION_ARG_NONE },
|
||||
{ "github", 0, 'g', OPTION_ARG_NONE },
|
||||
|
||||
{ "fcollapse-whitespace", 0, 'W', OPTION_ARG_NONE },
|
||||
{ "flatex-math", 0, 'L', OPTION_ARG_NONE },
|
||||
{ "fpermissive-atx-headers", 0, 'A', OPTION_ARG_NONE },
|
||||
{ "fpermissive-autolinks", 0, 'V', OPTION_ARG_NONE },
|
||||
{ "fpermissive-email-autolinks", 0, '@', OPTION_ARG_NONE },
|
||||
{ "fpermissive-url-autolinks", 0, 'U', OPTION_ARG_NONE },
|
||||
{ "fpermissive-www-autolinks", 0, '.', OPTION_ARG_NONE },
|
||||
{ "fstrikethrough", 0, 'S', OPTION_ARG_NONE },
|
||||
{ "ftables", 0, 'T', OPTION_ARG_NONE },
|
||||
{ "ftasklists", 0, 'X', OPTION_ARG_NONE },
|
||||
{ "funderline", 0, '_', OPTION_ARG_NONE },
|
||||
{ "fverbatim-entities", 0, 'E', OPTION_ARG_NONE },
|
||||
{ "fwiki-links", 0, 'K', OPTION_ARG_NONE },
|
||||
|
||||
{ "fno-html-blocks", 0, 'F', OPTION_ARG_NONE },
|
||||
{ "fno-html-spans", 0, 'G', OPTION_ARG_NONE },
|
||||
{ "fno-html", 0, 'H', OPTION_ARG_NONE },
|
||||
{ "fno-indented-code", 0, 'I', OPTION_ARG_NONE },
|
||||
|
||||
{ 0 }
|
||||
};
|
||||
|
||||
static void
|
||||
usage(void)
|
||||
{
|
||||
printf(
|
||||
"Usage: md2html [OPTION]... [FILE]\n"
|
||||
"Convert input FILE (or standard input) in Markdown format to HTML.\n"
|
||||
"\n"
|
||||
"General options:\n"
|
||||
" -o --output=FILE Output file (default is standard output)\n"
|
||||
" -f, --full-html Generate full HTML document, including header\n"
|
||||
" -s, --stat Measure time of input parsing\n"
|
||||
" -h, --help Display this help and exit\n"
|
||||
" -v, --version Display version and exit\n"
|
||||
"\n"
|
||||
"Markdown dialect options:\n"
|
||||
"(note these are equivalent to some combinations of the flags below)\n"
|
||||
" --commonmark CommonMark (this is default)\n"
|
||||
" --github Github Flavored Markdown\n"
|
||||
"\n"
|
||||
"Markdown extension options:\n"
|
||||
" --fcollapse-whitespace\n"
|
||||
" Collapse non-trivial whitespace\n"
|
||||
" --flatex-math Enable LaTeX style mathematics spans\n"
|
||||
" --fpermissive-atx-headers\n"
|
||||
" Allow ATX headers without delimiting space\n"
|
||||
" --fpermissive-url-autolinks\n"
|
||||
" Allow URL autolinks without '<', '>'\n"
|
||||
" --fpermissive-www-autolinks\n"
|
||||
" Allow WWW autolinks without any scheme (e.g. 'www.example.com')\n"
|
||||
" --fpermissive-email-autolinks \n"
|
||||
" Allow e-mail autolinks without '<', '>' and 'mailto:'\n"
|
||||
" --fpermissive-autolinks\n"
|
||||
" Same as --fpermissive-url-autolinks --fpermissive-www-autolinks\n"
|
||||
" --fpermissive-email-autolinks\n"
|
||||
" --fstrikethrough Enable strike-through spans\n"
|
||||
" --ftables Enable tables\n"
|
||||
" --ftasklists Enable task lists\n"
|
||||
" --funderline Enable underline spans\n"
|
||||
" --fwiki-links Enable wiki links\n"
|
||||
"\n"
|
||||
"Markdown suppression options:\n"
|
||||
" --fno-html-blocks\n"
|
||||
" Disable raw HTML blocks\n"
|
||||
" --fno-html-spans\n"
|
||||
" Disable raw HTML spans\n"
|
||||
" --fno-html Same as --fno-html-blocks --fno-html-spans\n"
|
||||
" --fno-indented-code\n"
|
||||
" Disable indented code blocks\n"
|
||||
"\n"
|
||||
"HTML generator options:\n"
|
||||
" --fverbatim-entities\n"
|
||||
" Do not translate entities\n"
|
||||
"\n"
|
||||
);
|
||||
}
|
||||
|
||||
static void
|
||||
version(void)
|
||||
{
|
||||
printf("%d.%d.%d\n", MD_VERSION_MAJOR, MD_VERSION_MINOR, MD_VERSION_RELEASE);
|
||||
}
|
||||
|
||||
static const char* input_path = NULL;
|
||||
static const char* output_path = NULL;
|
||||
|
||||
static int
|
||||
cmdline_callback(int opt, char const* value, void* data)
|
||||
{
|
||||
switch(opt) {
|
||||
case 0:
|
||||
if(input_path) {
|
||||
fprintf(stderr, "Too many arguments. Only one input file can be specified.\n");
|
||||
fprintf(stderr, "Use --help for more info.\n");
|
||||
exit(1);
|
||||
}
|
||||
input_path = value;
|
||||
break;
|
||||
|
||||
case 'o': output_path = value; break;
|
||||
case 'f': want_fullhtml = 1; break;
|
||||
case 's': want_stat = 1; break;
|
||||
case 'h': usage(); exit(0); break;
|
||||
case 'v': version(); exit(0); break;
|
||||
|
||||
case 'c': parser_flags = MD_DIALECT_COMMONMARK; break;
|
||||
case 'g': parser_flags = MD_DIALECT_GITHUB; break;
|
||||
|
||||
case 'E': renderer_flags |= MD_RENDER_FLAG_VERBATIM_ENTITIES; break;
|
||||
case 'A': parser_flags |= MD_FLAG_PERMISSIVEATXHEADERS; break;
|
||||
case 'I': parser_flags |= MD_FLAG_NOINDENTEDCODEBLOCKS; break;
|
||||
case 'F': parser_flags |= MD_FLAG_NOHTMLBLOCKS; break;
|
||||
case 'G': parser_flags |= MD_FLAG_NOHTMLSPANS; break;
|
||||
case 'H': parser_flags |= MD_FLAG_NOHTML; break;
|
||||
case 'W': parser_flags |= MD_FLAG_COLLAPSEWHITESPACE; break;
|
||||
case 'U': parser_flags |= MD_FLAG_PERMISSIVEURLAUTOLINKS; break;
|
||||
case '.': parser_flags |= MD_FLAG_PERMISSIVEWWWAUTOLINKS; break;
|
||||
case '@': parser_flags |= MD_FLAG_PERMISSIVEEMAILAUTOLINKS; break;
|
||||
case 'V': parser_flags |= MD_FLAG_PERMISSIVEAUTOLINKS; break;
|
||||
case 'T': parser_flags |= MD_FLAG_TABLES; break;
|
||||
case 'S': parser_flags |= MD_FLAG_STRIKETHROUGH; break;
|
||||
case 'L': parser_flags |= MD_FLAG_LATEXMATHSPANS; break;
|
||||
case 'K': parser_flags |= MD_FLAG_WIKILINKS; break;
|
||||
case 'X': parser_flags |= MD_FLAG_TASKLISTS; break;
|
||||
case '_': parser_flags |= MD_FLAG_UNDERLINE; break;
|
||||
|
||||
default:
|
||||
fprintf(stderr, "Illegal option: %s\n", value);
|
||||
fprintf(stderr, "Use --help for more info.\n");
|
||||
exit(1);
|
||||
break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int
|
||||
main(int argc, char** argv)
|
||||
{
|
||||
FILE* in = stdin;
|
||||
FILE* out = stdout;
|
||||
int ret = 0;
|
||||
|
||||
if(readoptions(cmdline_options, argc, argv, cmdline_callback, NULL) < 0) {
|
||||
usage();
|
||||
exit(1);
|
||||
}
|
||||
|
||||
if(input_path != NULL && strcmp(input_path, "-") != 0) {
|
||||
in = fopen(input_path, "rb");
|
||||
if(in == NULL) {
|
||||
fprintf(stderr, "Cannot open %s.\n", input_path);
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
if(output_path != NULL && strcmp(output_path, "-") != 0) {
|
||||
out = fopen(output_path, "wt");
|
||||
if(out == NULL) {
|
||||
fprintf(stderr, "Cannot open %s.\n", output_path);
|
||||
exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
ret = process_file(in, out);
|
||||
if(in != stdin)
|
||||
fclose(in);
|
||||
if(out != stdout)
|
||||
fclose(out);
|
||||
|
||||
return ret;
|
||||
}
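
The grow-able buffer used by md2html.c exits the process when allocation fails. For callers that cannot exit, the same growth policy (grow to the current size plus half of it plus the appended size) can be sketched with an error return instead. The names demo_buf and demo_buf_append are hypothetical and not part of MD4C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counterpart of struct membuffer. A zero-initialized struct
 * is a valid empty buffer; release it with free(b->data). */
struct demo_buf {
    char *data;
    size_t asize;   /* allocated bytes */
    size_t size;    /* used bytes */
};

/* Append n bytes. Returns 0 on success, or -1 if realloc() fails, in which
 * case the buffer is left unchanged and still owned by the caller. */
static int demo_buf_append(struct demo_buf *b, const char *data, size_t n)
{
    if (n == 0)
        return 0;

    if (b->asize < b->size + n) {
        size_t new_asize = b->size + b->size / 2 + n;
        char *p = realloc(b->data, new_asize);

        if (p == NULL)
            return -1;      /* keep the old block so it is not leaked */
        b->data = p;
        b->asize = new_asize;
    }
    memcpy(b->data + b->size, data, n);
    b->size += n;
    return 0;
}
```

Assigning the result of realloc() to a temporary first is what allows the error path to keep the original block intact.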
|
|
@ -0,0 +1,561 @@
|
|||
/*
|
||||
* MD4C: Markdown parser for C
|
||||
* (http://github.com/mity/md4c)
|
||||
*
|
||||
* Copyright (c) 2016-2019 Martin Mitas
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
* IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
|
||||
#include "render_html.h"
|
||||
#include "entity.h"
|
||||
|
||||
|
||||
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
|
||||
/* C89/90 or old compilers in general may not understand "inline". */
|
||||
#if defined __GNUC__
|
||||
#define inline __inline__
|
||||
#elif defined _MSC_VER
|
||||
#define inline __inline
|
||||
#else
|
||||
#define inline
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifdef _WIN32
|
||||
#define snprintf _snprintf
|
||||
#endif
|
||||
|
||||
|
||||
|
||||
typedef struct MD_RENDER_HTML_tag MD_RENDER_HTML;
|
||||
struct MD_RENDER_HTML_tag {
|
||||
void (*process_output)(const MD_CHAR*, MD_SIZE, void*);
|
||||
void* userdata;
|
||||
unsigned flags;
|
||||
int image_nesting_level;
|
||||
char escape_map[256];
|
||||
};
|
||||
|
||||
#define NEED_HTML_ESC_FLAG 0x1
|
||||
#define NEED_URL_ESC_FLAG 0x2
|
||||
|
||||
|
||||
/*****************************************
|
||||
*** HTML rendering helper functions ***
|
||||
*****************************************/
|
||||
|
||||
#define ISDIGIT(ch) ('0' <= (ch) && (ch) <= '9')
|
||||
#define ISLOWER(ch) ('a' <= (ch) && (ch) <= 'z')
|
||||
#define ISUPPER(ch) ('A' <= (ch) && (ch) <= 'Z')
|
||||
#define ISALNUM(ch) (ISLOWER(ch) || ISUPPER(ch) || ISDIGIT(ch))
|
||||
|
||||
|
||||
static inline void
|
||||
render_verbatim(MD_RENDER_HTML* r, const MD_CHAR* text, MD_SIZE size)
|
||||
{
|
||||
r->process_output(text, size, r->userdata);
|
||||
}
|
||||
|
||||
/* Keep this as a macro. Most compilers should then be smart enough to replace
|
||||
* the strlen() call with a compile-time constant if the string is a C literal. */
|
||||
#define RENDER_VERBATIM(r, verbatim) \
|
||||
render_verbatim((r), (verbatim), (MD_SIZE) (strlen(verbatim)))
|
||||
|
||||
|
||||
static void
|
||||
render_html_escaped(MD_RENDER_HTML* r, const MD_CHAR* data, MD_SIZE size)
|
||||
{
|
||||
MD_OFFSET beg = 0;
|
||||
MD_OFFSET off = 0;
|
||||
|
||||
/* Some characters need to be escaped in normal HTML text. */
|
||||
#define NEED_HTML_ESC(ch) (r->escape_map[(unsigned char)(ch)] & NEED_HTML_ESC_FLAG)
|
||||
|
||||
while(1) {
|
||||
/* Optimization: Use some loop unrolling. */
|
||||
while(off + 3 < size && !NEED_HTML_ESC(data[off+0]) && !NEED_HTML_ESC(data[off+1])
|
||||
&& !NEED_HTML_ESC(data[off+2]) && !NEED_HTML_ESC(data[off+3]))
|
||||
off += 4;
|
||||
while(off < size && !NEED_HTML_ESC(data[off]))
|
||||
off++;
|
||||
|
||||
if(off > beg)
|
||||
render_verbatim(r, data + beg, off - beg);
|
||||
|
||||
if(off < size) {
|
||||
switch(data[off]) {
|
||||
case '&': RENDER_VERBATIM(r, "&amp;"); break;
|
||||
case '<': RENDER_VERBATIM(r, "&lt;"); break;
|
||||
case '>': RENDER_VERBATIM(r, "&gt;"); break;
|
||||
case '"': RENDER_VERBATIM(r, "&quot;"); break;
|
||||
}
|
||||
off++;
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
beg = off;
|
||||
}
|
||||
}
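The escaping rule used by render_html_escaped() can be tried in isolation. The following is only a minimal sketch: `sketch_html_escape` is a hypothetical helper, not part of MD4C, and it writes into a caller-supplied buffer instead of calling `process_output()`. It also skips the `escape_map` lookup and the loop unrolling, so it shows the substitution rule rather than the optimization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption of this sketch, not an MD4C function).
 * Escapes `size` bytes of `src` for use in HTML text and stores the result
 * in `dst` as a NUL-terminated string of at most `cap` bytes.
 * Returns the length the complete output needs, excluding the NUL, so the
 * caller can detect truncation the same way as with snprintf(). */
static size_t
sketch_html_escape(const char* src, size_t size, char* dst, size_t cap)
{
    size_t written = 0;     /* bytes actually stored in dst */
    size_t needed = 0;      /* bytes the full output requires */
    size_t i;

    assert(src != NULL && dst != NULL && cap > 0);

    for(i = 0; i < size; i++) {
        /* By default the character is copied through unchanged. */
        const char* rep = src + i;
        size_t rep_len = 1;

        switch(src[i]) {
            case '&':  rep = "&amp;";  rep_len = 5; break;
            case '<':  rep = "&lt;";   rep_len = 4; break;
            case '>':  rep = "&gt;";   rep_len = 4; break;
            case '"':  rep = "&quot;"; rep_len = 6; break;
            default:   break;
        }

        /* Store only while nothing has been dropped yet and there is still
         * room for the replacement plus the terminating NUL. */
        if(written == needed && written + rep_len < cap) {
            memcpy(dst + written, rep, rep_len);
            written += rep_len;
        }
        needed += rep_len;
    }

    dst[written] = '\0';
    return needed;
}
```

A return value greater than or equal to `cap` means the output was truncated at a replacement boundary, so a partial entity is never emitted.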
|
||||
|
||||
static void
|
||||
render_url_escaped(MD_RENDER_HTML* r, const MD_CHAR* data, MD_SIZE size)
|
||||
{
|
||||
static const MD_CHAR hex_chars[] = "0123456789ABCDEF";
|
||||
MD_OFFSET beg = 0;
|
||||
MD_OFFSET off = 0;
|
||||
|
||||
/* Some characters need to be escaped in URL attributes. */
|
||||
#define NEED_URL_ESC(ch) (r->escape_map[(unsigned char)(ch)] & NEED_URL_ESC_FLAG)
|
||||
|
||||
while(1) {
|
||||
while(off < size && !NEED_URL_ESC(data[off]))
|
||||
off++;
|
||||
if(off > beg)
|
||||
render_verbatim(r, data + beg, off - beg);
|
||||
|
||||
if(off < size) {
|
||||
char hex[3];
|
||||
|
||||
switch(data[off]) {
|
||||
case '&': RENDER_VERBATIM(r, "&amp;"); break;
|
||||
default:
|
||||
hex[0] = '%';
|
||||
hex[1] = hex_chars[((unsigned)data[off] >> 4) & 0xf];
|
||||
hex[2] = hex_chars[((unsigned)data[off] >> 0) & 0xf];
|
||||
render_verbatim(r, hex, 3);
|
||||
break;
|
||||
}
|
||||
off++;
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
|
||||
beg = off;
|
||||
}
|
||||
}
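The percent-escaping done by render_url_escaped() can be sketched as a standalone function. This is an illustration under stated assumptions: `sketch_url_char_is_safe` and `sketch_url_escape` are hypothetical names, and the allow-list is copied from the string used when the renderer builds its escape map. One deliberate difference: this sketch treats a NUL byte as unsafe, whereas a bare strchr() test also matches the string terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: non-zero if `ch` may appear unescaped in a URL
 * attribute. Alphanumerics plus the punctuation allow-list pass through. */
static int
sketch_url_char_is_safe(unsigned char ch)
{
    if(('0' <= ch && ch <= '9') || ('a' <= ch && ch <= 'z') || ('A' <= ch && ch <= 'Z'))
        return 1;
    /* strchr() also finds the terminating NUL, hence the explicit check. */
    return ch != '\0' && strchr("-_.+!*(),%#@?=;:/$", ch) != NULL;
}

/* Hypothetical helper: escape `size` bytes of `src` into `dst` (capacity
 * `cap`, always NUL-terminated). Returns the length the full output needs. */
static size_t
sketch_url_escape(const char* src, size_t size, char* dst, size_t cap)
{
    static const char hex_chars[] = "0123456789ABCDEF";
    size_t written = 0;
    size_t needed = 0;
    size_t i;

    assert(src != NULL && dst != NULL && cap > 0);

    for(i = 0; i < size; i++) {
        unsigned char ch = (unsigned char) src[i];
        char rep[5];
        size_t rep_len;

        if(sketch_url_char_is_safe(ch)) {
            rep[0] = (char) ch;
            rep_len = 1;
        } else if(ch == '&') {
            /* '&' stays a literal ampersand in the URL but must be written
             * as an HTML entity because the URL sits inside an attribute. */
            memcpy(rep, "&amp;", 5);
            rep_len = 5;
        } else {
            rep[0] = '%';
            rep[1] = hex_chars[(ch >> 4) & 0xf];
            rep[2] = hex_chars[ch & 0xf];
            rep_len = 3;
        }

        if(written == needed && written + rep_len < cap) {
            memcpy(dst + written, rep, rep_len);
            written += rep_len;
        }
        needed += rep_len;
    }

    dst[written] = '\0';
    return needed;
}
```

Working on `unsigned char` avoids depending on whether plain `char` is signed when bytes above 0x7f (for example UTF-8 sequences) are converted to hex.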
|
||||
|
||||
static unsigned
|
||||
hex_val(char ch)
|
||||
{
|
||||
if('0' <= ch && ch <= '9')
|
||||
return ch - '0';
|
||||
if('A' <= ch && ch <= 'Z')
|
||||
return ch - 'A' + 10;
|
||||
else
|
||||
return ch - 'a' + 10;
|
||||
}
|
||||
|
||||
static void
|
||||
render_utf8_codepoint(MD_RENDER_HTML* r, unsigned codepoint,
|
||||
void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
|
||||
{
|
||||
static const MD_CHAR utf8_replacement_char[] = { 0xef, 0xbf, 0xbd };
|
||||
|
||||
unsigned char utf8[4];
|
||||
size_t n;
|
||||
|
||||
if(codepoint <= 0x7f) {
|
||||
n = 1;
|
||||
utf8[0] = codepoint;
|
||||
} else if(codepoint <= 0x7ff) {
|
||||
n = 2;
|
||||
utf8[0] = 0xc0 | ((codepoint >> 6) & 0x1f);
|
||||
utf8[1] = 0x80 + ((codepoint >> 0) & 0x3f);
|
||||
} else if(codepoint <= 0xffff) {
|
||||
n = 3;
|
||||
utf8[0] = 0xe0 | ((codepoint >> 12) & 0xf);
|
||||
utf8[1] = 0x80 + ((codepoint >> 6) & 0x3f);
|
||||
utf8[2] = 0x80 + ((codepoint >> 0) & 0x3f);
|
||||
} else {
|
||||
n = 4;
|
||||
utf8[0] = 0xf0 | ((codepoint >> 18) & 0x7);
|
||||
utf8[1] = 0x80 + ((codepoint >> 12) & 0x3f);
|
||||
utf8[2] = 0x80 + ((codepoint >> 6) & 0x3f);
|
||||
utf8[3] = 0x80 + ((codepoint >> 0) & 0x3f);
|
||||
}
|
||||
|
||||
if(0 < codepoint && codepoint <= 0x10ffff)
|
||||
fn_append(r, (char*)utf8, n);
|
||||
else
|
||||
fn_append(r, utf8_replacement_char, 3);
|
||||
}
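The bit layout used by render_utf8_codepoint() is easier to check against known byte sequences when it is separated from the output callback. A minimal sketch follows; `sketch_encode_utf8` is a hypothetical helper that returns the bytes instead of appending them. Like the function above, it maps U+0000 and anything beyond U+10FFFF to the replacement character U+FFFD and does not special-case surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MD4C function): encode `codepoint` as UTF-8
 * into out[0..3] and return the number of bytes used (1 to 4). */
static size_t
sketch_encode_utf8(unsigned long codepoint, unsigned char out[4])
{
    assert(out != NULL);

    /* Zero and out-of-range values become U+FFFD (EF BF BD). */
    if(codepoint == 0 || codepoint > 0x10fffful)
        codepoint = 0xfffdul;

    if(codepoint <= 0x7f) {
        out[0] = (unsigned char) codepoint;
        return 1;
    }
    if(codepoint <= 0x7ff) {
        out[0] = (unsigned char) (0xc0 | ((codepoint >> 6) & 0x1f));
        out[1] = (unsigned char) (0x80 | (codepoint & 0x3f));
        return 2;
    }
    if(codepoint <= 0xffff) {
        out[0] = (unsigned char) (0xe0 | ((codepoint >> 12) & 0x0f));
        out[1] = (unsigned char) (0x80 | ((codepoint >> 6) & 0x3f));
        out[2] = (unsigned char) (0x80 | (codepoint & 0x3f));
        return 3;
    }
    out[0] = (unsigned char) (0xf0 | ((codepoint >> 18) & 0x07));
    out[1] = (unsigned char) (0x80 | ((codepoint >> 12) & 0x3f));
    out[2] = (unsigned char) (0x80 | ((codepoint >> 6) & 0x3f));
    out[3] = (unsigned char) (0x80 | (codepoint & 0x3f));
    return 4;
}
```

The sketch takes `unsigned long` rather than `unsigned` so that the 21-bit range is representable even where `unsigned` has only 16 bits; that is a choice made here, not something the source requires.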
|
||||
|
||||
/* Translate entity to its UTF-8 equivalent, or output the verbatim one
|
||||
* if such entity is unknown (or if the translation is disabled). */
|
||||
static void
|
||||
render_entity(MD_RENDER_HTML* r, const MD_CHAR* text, MD_SIZE size,
|
||||
void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
|
||||
{
|
||||
if(r->flags & MD_RENDER_FLAG_VERBATIM_ENTITIES) {
|
||||
fn_append(r, text, size);
|
||||
return;
|
||||
}
|
||||
|
||||
/* We assume UTF-8 output is what is desired. */
|
||||
if(size > 3 && text[1] == '#') {
|
||||
unsigned codepoint = 0;
|
||||
|
||||
if(text[2] == 'x' || text[2] == 'X') {
|
||||
/* Hexadecimal entity (e.g. "&#x1234abcd;"). */
|
||||
MD_SIZE i;
|
||||
for(i = 3; i < size-1; i++)
|
||||
codepoint = 16 * codepoint + hex_val(text[i]);
|
||||
} else {
|
||||
/* Decimal entity (e.g. "&#1234;") */
|
||||
MD_SIZE i;
|
||||
for(i = 2; i < size-1; i++)
|
||||
codepoint = 10 * codepoint + (text[i] - '0');
|
||||
}
|
||||
|
||||
render_utf8_codepoint(r, codepoint, fn_append);
|
||||
return;
|
||||
} else {
|
||||
/* Named entity (e.g. "&nbsp;"). */
|
||||
const struct entity* ent;
|
||||
|
||||
ent = entity_lookup(text, size);
|
||||
if(ent != NULL) {
|
||||
render_utf8_codepoint(r, ent->codepoints[0], fn_append);
|
||||
if(ent->codepoints[1])
|
||||
render_utf8_codepoint(r, ent->codepoints[1], fn_append);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
fn_append(r, text, size);
|
||||
}
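render_entity() accumulates the digits of a numeric entity without checking them, presumably because the parser has already recognized the text as an entity. When the same conversion is needed on input that has not been validated, a defensive variant may be preferable. The sketch below assumes exactly that situation; `sketch_parse_numeric_entity` is a hypothetical helper, and returning 0 for malformed or out-of-range input is a convention chosen here because the renderer above turns codepoint 0 into U+FFFD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an MD4C function): convert a numeric entity such
 * as "&#1234;" or "&#x12ab;" to its codepoint. Returns 0 if the text is not
 * a well-formed numeric entity or the value exceeds U+10FFFF. */
static unsigned long
sketch_parse_numeric_entity(const char* text, size_t size)
{
    unsigned long codepoint = 0;
    unsigned long base;
    size_t i;

    assert(text != NULL);

    /* Shortest valid form is "&#N;" (4 bytes). */
    if(size < 4 || text[0] != '&' || text[1] != '#' || text[size - 1] != ';')
        return 0;

    if(text[2] == 'x' || text[2] == 'X') {
        base = 16;
        i = 3;
    } else {
        base = 10;
        i = 2;
    }

    /* Reject entities with no digits, e.g. "&#x;". */
    if(i >= size - 1)
        return 0;

    for(; i < size - 1; i++) {
        char ch = text[i];
        unsigned long digit;

        if('0' <= ch && ch <= '9')
            digit = (unsigned long) (ch - '0');
        else if(base == 16 && 'a' <= ch && ch <= 'f')
            digit = (unsigned long) (ch - 'a') + 10;
        else if(base == 16 && 'A' <= ch && ch <= 'F')
            digit = (unsigned long) (ch - 'A') + 10;
        else
            return 0;

        codepoint = codepoint * base + digit;

        /* Checking on every step keeps the accumulator from overflowing. */
        if(codepoint > 0x10fffful)
            return 0;
    }

    return codepoint;
}
```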
|
||||
|
||||
static void
|
||||
render_attribute(MD_RENDER_HTML* r, const MD_ATTRIBUTE* attr,
|
||||
void (*fn_append)(MD_RENDER_HTML*, const MD_CHAR*, MD_SIZE))
|
||||
{
|
||||
int i;
|
||||
|
||||
for(i = 0; attr->substr_offsets[i] < attr->size; i++) {
|
||||
MD_TEXTTYPE type = attr->substr_types[i];
|
||||
MD_OFFSET off = attr->substr_offsets[i];
|
||||
MD_SIZE size = attr->substr_offsets[i+1] - off;
|
||||
const MD_CHAR* text = attr->text + off;
|
||||
|
||||
switch(type) {
|
||||
case MD_TEXT_NULLCHAR: render_utf8_codepoint(r, 0x0000, render_verbatim); break;
|
||||
case MD_TEXT_ENTITY: render_entity(r, text, size, fn_append); break;
|
||||
default: fn_append(r, text, size); break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
static void
|
||||
render_open_ol_block(MD_RENDER_HTML* r, const MD_BLOCK_OL_DETAIL* det)
|
||||
{
|
||||
char buf[64];
|
||||
|
||||
if(det->start == 1) {
|
||||
RENDER_VERBATIM(r, "<ol>\n");
|
||||
return;
|
||||
}
|
||||
|
||||
snprintf(buf, sizeof(buf), "<ol start=\"%u\">\n", det->start);
|
||||
RENDER_VERBATIM(r, buf);
|
||||
}
|
||||
|
||||
static void
|
||||
render_open_li_block(MD_RENDER_HTML* r, const MD_BLOCK_LI_DETAIL* det)
|
||||
{
|
||||
if(det->is_task) {
|
||||
RENDER_VERBATIM(r, "<li class=\"task-list-item\">"
|
||||
"<input type=\"checkbox\" class=\"task-list-item-checkbox\" disabled");
|
||||
if(det->task_mark == 'x' || det->task_mark == 'X')
|
||||
RENDER_VERBATIM(r, " checked");
|
||||
RENDER_VERBATIM(r, ">");
|
||||
} else {
|
||||
RENDER_VERBATIM(r, "<li>");
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
render_open_code_block(MD_RENDER_HTML* r, const MD_BLOCK_CODE_DETAIL* det)
|
||||
{
|
||||
RENDER_VERBATIM(r, "<pre><code");
|
||||
|
||||
/* If known, output the HTML 5 attribute class="language-LANGNAME". */
|
||||
if(det->lang.text != NULL) {
|
||||
RENDER_VERBATIM(r, " class=\"language-");
|
||||
render_attribute(r, &det->lang, render_html_escaped);
|
||||
RENDER_VERBATIM(r, "\"");
|
||||
}
|
||||
|
||||
RENDER_VERBATIM(r, ">");
|
||||
}
|
||||
|
||||
static void
|
||||
render_open_td_block(MD_RENDER_HTML* r, const MD_CHAR* cell_type, const MD_BLOCK_TD_DETAIL* det)
|
||||
{
|
||||
RENDER_VERBATIM(r, "<");
|
||||
RENDER_VERBATIM(r, cell_type);
|
||||
|
||||
switch(det->align) {
|
||||
case MD_ALIGN_LEFT: RENDER_VERBATIM(r, " align=\"left\">"); break;
|
||||
case MD_ALIGN_CENTER: RENDER_VERBATIM(r, " align=\"center\">"); break;
|
||||
case MD_ALIGN_RIGHT: RENDER_VERBATIM(r, " align=\"right\">"); break;
|
||||
default: RENDER_VERBATIM(r, ">"); break;
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
render_open_a_span(MD_RENDER_HTML* r, const MD_SPAN_A_DETAIL* det)
|
||||
{
|
||||
RENDER_VERBATIM(r, "<a href=\"");
|
||||
render_attribute(r, &det->href, render_url_escaped);
|
||||
|
||||
if(det->title.text != NULL) {
|
||||
RENDER_VERBATIM(r, "\" title=\"");
|
||||
render_attribute(r, &det->title, render_html_escaped);
|
||||
}
|
||||
|
||||
RENDER_VERBATIM(r, "\">");
|
||||
}
|
||||
|
||||
static void
|
||||
render_open_img_span(MD_RENDER_HTML* r, const MD_SPAN_IMG_DETAIL* det)
|
||||
{
|
||||
RENDER_VERBATIM(r, "<img src=\"");
|
||||
render_attribute(r, &det->src, render_url_escaped);
|
||||
|
||||
RENDER_VERBATIM(r, "\" alt=\"");
|
||||
|
||||
r->image_nesting_level++;
|
||||
}
|
||||
|
||||
static void
|
||||
render_close_img_span(MD_RENDER_HTML* r, const MD_SPAN_IMG_DETAIL* det)
|
||||
{
|
||||
if(det->title.text != NULL) {
|
||||
RENDER_VERBATIM(r, "\" title=\"");
|
||||
render_attribute(r, &det->title, render_html_escaped);
|
||||
}
|
||||
|
||||
RENDER_VERBATIM(r, "\">");
|
||||
|
||||
r->image_nesting_level--;
|
||||
}
|
||||
|
||||
static void
|
||||
render_open_wikilink_span(MD_RENDER_HTML* r, const MD_SPAN_WIKILINK_DETAIL* det)
|
||||
{
|
||||
RENDER_VERBATIM(r, "<x-wikilink data-target=\"");
|
||||
render_attribute(r, &det->target, render_html_escaped);
|
||||
|
||||
RENDER_VERBATIM(r, "\">");
|
||||
}
|
||||
|
||||
|
||||
/**************************************
|
||||
*** HTML renderer implementation ***
|
||||
**************************************/
|
||||
|
||||
static int
|
||||
enter_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
|
||||
{
|
||||
static const MD_CHAR* head[6] = { "<h1>", "<h2>", "<h3>", "<h4>", "<h5>", "<h6>" };
|
||||
MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
|
||||
|
||||
switch(type) {
|
||||
case MD_BLOCK_DOC: /* noop */ break;
|
||||
case MD_BLOCK_QUOTE: RENDER_VERBATIM(r, "<blockquote>\n"); break;
|
||||
case MD_BLOCK_UL: RENDER_VERBATIM(r, "<ul>\n"); break;
|
||||
case MD_BLOCK_OL: render_open_ol_block(r, (const MD_BLOCK_OL_DETAIL*)detail); break;
|
||||
case MD_BLOCK_LI: render_open_li_block(r, (const MD_BLOCK_LI_DETAIL*)detail); break;
|
||||
case MD_BLOCK_HR: RENDER_VERBATIM(r, "<hr>\n"); break;
|
||||
case MD_BLOCK_H: RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break;
|
||||
case MD_BLOCK_CODE: render_open_code_block(r, (const MD_BLOCK_CODE_DETAIL*) detail); break;
|
||||
case MD_BLOCK_HTML: /* noop */ break;
|
||||
case MD_BLOCK_P: RENDER_VERBATIM(r, "<p>"); break;
|
||||
case MD_BLOCK_TABLE: RENDER_VERBATIM(r, "<table>\n"); break;
|
||||
case MD_BLOCK_THEAD: RENDER_VERBATIM(r, "<thead>\n"); break;
|
||||
case MD_BLOCK_TBODY: RENDER_VERBATIM(r, "<tbody>\n"); break;
|
||||
case MD_BLOCK_TR: RENDER_VERBATIM(r, "<tr>\n"); break;
|
||||
case MD_BLOCK_TH: render_open_td_block(r, "th", (MD_BLOCK_TD_DETAIL*)detail); break;
|
||||
case MD_BLOCK_TD: render_open_td_block(r, "td", (MD_BLOCK_TD_DETAIL*)detail); break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int
|
||||
leave_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
|
||||
{
|
||||
static const MD_CHAR* head[6] = { "</h1>\n", "</h2>\n", "</h3>\n", "</h4>\n", "</h5>\n", "</h6>\n" };
|
||||
MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
|
||||
|
||||
switch(type) {
|
||||
case MD_BLOCK_DOC: /*noop*/ break;
|
||||
case MD_BLOCK_QUOTE: RENDER_VERBATIM(r, "</blockquote>\n"); break;
|
||||
case MD_BLOCK_UL: RENDER_VERBATIM(r, "</ul>\n"); break;
|
||||
case MD_BLOCK_OL: RENDER_VERBATIM(r, "</ol>\n"); break;
|
||||
case MD_BLOCK_LI: RENDER_VERBATIM(r, "</li>\n"); break;
|
||||
case MD_BLOCK_HR: /*noop*/ break;
|
||||
case MD_BLOCK_H: RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break;
|
||||
case MD_BLOCK_CODE: RENDER_VERBATIM(r, "</code></pre>\n"); break;
|
||||
case MD_BLOCK_HTML: /* noop */ break;
|
||||
case MD_BLOCK_P: RENDER_VERBATIM(r, "</p>\n"); break;
|
||||
case MD_BLOCK_TABLE: RENDER_VERBATIM(r, "</table>\n"); break;
|
||||
case MD_BLOCK_THEAD: RENDER_VERBATIM(r, "</thead>\n"); break;
|
||||
case MD_BLOCK_TBODY: RENDER_VERBATIM(r, "</tbody>\n"); break;
|
||||
case MD_BLOCK_TR: RENDER_VERBATIM(r, "</tr>\n"); break;
|
||||
case MD_BLOCK_TH: RENDER_VERBATIM(r, "</th>\n"); break;
|
||||
case MD_BLOCK_TD: RENDER_VERBATIM(r, "</td>\n"); break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int
|
||||
enter_span_callback(MD_SPANTYPE type, void* detail, void* userdata)
|
||||
{
|
||||
MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
|
||||
|
||||
if(r->image_nesting_level > 0) {
|
||||
/* We are inside a Markdown image label. Markdown allows the use of any
|
||||
* emphasis and other rich contents in that context similarly as in
|
||||
* any link label.
|
||||
*
|
||||
* However, unlike in the case of links (where those contents become the
|
||||
* contents of the <a>...</a> tag), in the case of images the contents
|
||||
* are supposed to fall into the attribute alt: <img alt="...">.
|
||||
*
|
||||
* In that context we naturally cannot output nested HTML tags. So let's
|
||||
* suppress them and only output the plain text (i.e. what falls into
|
||||
* text() callback).
|
||||
*
|
||||
* This make-it-a-plain-text approach is the recommended practice by
|
||||
* CommonMark specification (for HTML output).
|
||||
*/
|
||||
return 0;
|
||||
}
|
||||
|
||||
switch(type) {
|
||||
case MD_SPAN_EM: RENDER_VERBATIM(r, "<em>"); break;
|
||||
case MD_SPAN_STRONG: RENDER_VERBATIM(r, "<strong>"); break;
|
||||
case MD_SPAN_U: RENDER_VERBATIM(r, "<u>"); break;
|
||||
case MD_SPAN_A: render_open_a_span(r, (MD_SPAN_A_DETAIL*) detail); break;
|
||||
case MD_SPAN_IMG: render_open_img_span(r, (MD_SPAN_IMG_DETAIL*) detail); break;
|
||||
case MD_SPAN_CODE: RENDER_VERBATIM(r, "<code>"); break;
|
||||
case MD_SPAN_DEL: RENDER_VERBATIM(r, "<del>"); break;
|
||||
case MD_SPAN_LATEXMATH: RENDER_VERBATIM(r, "<x-equation>"); break;
|
||||
case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "<x-equation type=\"display\">"); break;
|
||||
case MD_SPAN_WIKILINK: render_open_wikilink_span(r, (MD_SPAN_WIKILINK_DETAIL*) detail); break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int
|
||||
leave_span_callback(MD_SPANTYPE type, void* detail, void* userdata)
|
||||
{
|
||||
MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
|
||||
|
||||
if(r->image_nesting_level > 0) {
|
||||
/* Ditto as in enter_span_callback(), except we have to allow the
|
||||
* end of the <img> tag. */
|
||||
if(r->image_nesting_level == 1 && type == MD_SPAN_IMG)
|
||||
render_close_img_span(r, (MD_SPAN_IMG_DETAIL*) detail);
|
||||
return 0;
|
||||
}
|
||||
|
||||
switch(type) {
|
||||
case MD_SPAN_EM: RENDER_VERBATIM(r, "</em>"); break;
|
||||
case MD_SPAN_STRONG: RENDER_VERBATIM(r, "</strong>"); break;
|
||||
case MD_SPAN_U: RENDER_VERBATIM(r, "</u>"); break;
|
||||
case MD_SPAN_A: RENDER_VERBATIM(r, "</a>"); break;
|
||||
case MD_SPAN_IMG: /*noop, handled above*/ break;
|
||||
case MD_SPAN_CODE: RENDER_VERBATIM(r, "</code>"); break;
|
||||
case MD_SPAN_DEL: RENDER_VERBATIM(r, "</del>"); break;
|
||||
case MD_SPAN_LATEXMATH: /*fall through*/
|
||||
case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "</x-equation>"); break;
|
||||
case MD_SPAN_WIKILINK: RENDER_VERBATIM(r, "</x-wikilink>"); break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int
|
||||
text_callback(MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size, void* userdata)
|
||||
{
|
||||
MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
|
||||
|
||||
switch(type) {
|
||||
case MD_TEXT_NULLCHAR: render_utf8_codepoint(r, 0x0000, render_verbatim); break;
|
||||
case MD_TEXT_BR: RENDER_VERBATIM(r, (r->image_nesting_level == 0 ? "<br>\n" : " ")); break;
|
||||
case MD_TEXT_SOFTBR: RENDER_VERBATIM(r, (r->image_nesting_level == 0 ? "\n" : " ")); break;
|
||||
case MD_TEXT_HTML: render_verbatim(r, text, size); break;
|
||||
case MD_TEXT_ENTITY: render_entity(r, text, size, render_html_escaped); break;
|
||||
default: render_html_escaped(r, text, size); break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void
|
||||
debug_log_callback(const char* msg, void* userdata)
|
||||
{
|
||||
MD_RENDER_HTML* r = (MD_RENDER_HTML*) userdata;
|
||||
if(r->flags & MD_RENDER_FLAG_DEBUG)
|
||||
fprintf(stderr, "MD4C: %s\n", msg);
|
||||
}
|
||||
|
||||
int
|
||||
md_render_html(const MD_CHAR* input, MD_SIZE input_size,
|
||||
void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
|
||||
void* userdata, unsigned parser_flags, unsigned renderer_flags)
|
||||
{
|
||||
MD_RENDER_HTML render = { process_output, userdata, renderer_flags, 0, { 0 } };
|
||||
int i;
|
||||
|
||||
MD_PARSER parser = {
|
||||
0,
|
||||
parser_flags,
|
||||
enter_block_callback,
|
||||
leave_block_callback,
|
||||
enter_span_callback,
|
||||
leave_span_callback,
|
||||
text_callback,
|
||||
debug_log_callback,
|
||||
NULL
|
||||
};
|
||||
|
||||
/* Build map of characters which need escaping. */
|
||||
for(i = 0; i < 256; i++) {
|
||||
unsigned char ch = (unsigned char) i;
|
||||
|
||||
if(strchr("\"&<>", ch) != NULL)
|
||||
render.escape_map[i] |= NEED_HTML_ESC_FLAG;
|
||||
|
||||
if(!ISALNUM(ch) && strchr("-_.+!*(),%#@?=;:/,+$", ch) == NULL)
|
||||
render.escape_map[i] |= NEED_URL_ESC_FLAG;
|
||||
}
|
||||
|
||||
return md_parse(input, input_size, &parser, (void*) &render);
|
||||
}
|
||||
|
|
@ -0,0 +1,66 @@
|
|||
/*
|
||||
* MD4C: Markdown parser for C
|
||||
* (http://github.com/mity/md4c)
|
||||
*
|
||||
* Copyright (c) 2016-2017 Martin Mitas
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
* IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#ifndef MD4C_RENDER_HTML_H
|
||||
#define MD4C_RENDER_HTML_H
|
||||
|
||||
#include "md4c.h"
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
|
||||
/* If set, debug output from md_parse() is sent to stderr. */
|
||||
#define MD_RENDER_FLAG_DEBUG 0x0001
|
||||
#define MD_RENDER_FLAG_VERBATIM_ENTITIES 0x0002
|
||||
|
||||
|
||||
/* Render Markdown into HTML.
|
||||
*
|
||||
* Note that only the contents of the <body> tag are generated. Caller must generate
|
||||
* HTML header/footer manually before/after calling md_render_html().
|
||||
*
|
||||
* Params input and input_size specify the Markdown input.
|
||||
* Callback process_output() gets called with chunks of HTML output.
|
||||
* (Typical implementation may just output the bytes to file or append to
|
||||
* some buffer).
|
||||
* Param userdata is just propagated back to process_output() callback.
|
||||
* Param parser_flags are flags from md4c.h propagated to md_parse().
|
||||
* Param renderer_flags is a bitmask of MD_RENDER_FLAG_xxxx.
|
||||
*
|
||||
* Returns -1 on error (if md_parse() fails.)
|
||||
* Returns 0 on success.
|
||||
*/
|
||||
int md_render_html(const MD_CHAR* input, MD_SIZE input_size,
|
||||
void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
|
||||
void* userdata, unsigned parser_flags, unsigned renderer_flags);
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
} /* extern "C" { */
|
||||
#endif
|
||||
|
||||
#endif /* MD4C_RENDER_HTML_H */
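The header comment above says a typical process_output() implementation may append to a buffer. A minimal sketch of such a callback follows. It assumes the default build, where `MD_CHAR` is `char` and `MD_SIZE` is `unsigned`, and spells those types out so that it compiles without md4c.h. `sketch_buffer` and `sketch_append_output` are hypothetical names, not part of MD4C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer (an assumption of this sketch). */
typedef struct sketch_buffer {
    char* data;         /* NUL-terminated once something was appended */
    size_t size;        /* bytes stored, excluding the NUL */
    size_t capacity;    /* bytes allocated */
    int failed;         /* non-zero after an allocation failure */
} sketch_buffer;

/* Same shape as the process_output parameter of md_render_html():
 * (const MD_CHAR* text, MD_SIZE size, void* userdata). */
static void
sketch_append_output(const char* text, unsigned size, void* userdata)
{
    sketch_buffer* buf = (sketch_buffer*) userdata;

    assert(buf != NULL);

    /* The callback cannot report errors, so remember the failure and let
     * the caller inspect it after rendering. */
    if(buf->failed)
        return;

    if(buf->size + size + 1 > buf->capacity) {
        size_t new_cap = (buf->capacity > 0) ? buf->capacity : 64;
        char* new_data;

        while(new_cap < buf->size + size + 1)
            new_cap *= 2;

        new_data = (char*) realloc(buf->data, new_cap);
        if(new_data == NULL) {
            buf->failed = 1;
            return;
        }
        buf->data = new_data;
        buf->capacity = new_cap;
    }

    if(size > 0)
        memcpy(buf->data + buf->size, text, size);
    buf->size += size;
    buf->data[buf->size] = '\0';
}
```

With MD4C linked in, `sketch_append_output` and a pointer to a zero-initialized `sketch_buffer` would be passed as the `process_output` and `userdata` arguments of md_render_html(); the caller then checks `failed` and eventually calls free() on `data`.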
|
|
@ -0,0 +1,32 @@
|
|||
# Be sure to export all symbols in Windows.
|
||||
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS 1)
|
||||
|
||||
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUG")
|
||||
|
||||
set(md4c_src
|
||||
md4c.c
|
||||
)
|
||||
|
||||
add_library(md4c ${md4c_src})
|
||||
|
||||
set_target_properties(md4c PROPERTIES
|
||||
VERSION ${MD_VERSION}
|
||||
SOVERSION ${MD_VERSION_MAJOR}
|
||||
PUBLIC_HEADER md4c.h
|
||||
)
|
||||
|
||||
install(
|
||||
TARGETS md4c
|
||||
EXPORT md4cConfig
|
||||
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
|
||||
PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
|
||||
)
|
||||
|
||||
# Create a pkg-config file
|
||||
configure_file(md4c.pc.in md4c.pc @ONLY)
|
||||
install(FILES ${CMAKE_BINARY_DIR}/md4c/md4c.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig)
|
||||
|
||||
# And a CMake file
|
||||
install(EXPORT md4cConfig DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/md4c/)
|
File diff suppressed because it is too large
|
@ -0,0 +1,388 @@
|
|||
/*
|
||||
* MD4C: Markdown parser for C
|
||||
* (http://github.com/mity/md4c)
|
||||
*
|
||||
* Copyright (c) 2016-2020 Martin Mitas
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice shall be included in
|
||||
* all copies or substantial portions of the Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
|
||||
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
* IN THE SOFTWARE.
|
||||
*/
|
||||
|
||||
#ifndef MD4C_MARKDOWN_H
|
||||
#define MD4C_MARKDOWN_H
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#if defined MD4C_USE_UTF16
|
||||
/* Magic to support UTF-16. Note that in order to use it, you have to define
|
||||
* the macro MD4C_USE_UTF16 both when building MD4C as well as when
|
||||
* including this header in your code. */
|
||||
#ifdef _WIN32
|
||||
#include <windows.h>
|
||||
typedef WCHAR MD_CHAR;
|
||||
#else
|
||||
#error MD4C_USE_UTF16 is only supported on Windows.
|
||||
#endif
|
||||
#else
|
||||
typedef char MD_CHAR;
|
||||
#endif
|
||||
|
||||
typedef unsigned MD_SIZE;
|
||||
typedef unsigned MD_OFFSET;
|
||||
|
||||
|
||||
/* Block represents a part of document hierarchy structure like a paragraph
|
||||
* or list item.
|
||||
*/
|
||||
typedef enum MD_BLOCKTYPE {
|
||||
/* <body>...</body> */
|
||||
MD_BLOCK_DOC = 0,
|
||||
|
||||
/* <blockquote>...</blockquote> */
|
||||
MD_BLOCK_QUOTE,
|
||||
|
||||
/* <ul>...</ul>
|
||||
* Detail: Structure MD_BLOCK_UL_DETAIL. */
|
||||
MD_BLOCK_UL,
|
||||
|
||||
/* <ol>...</ol>
|
||||
* Detail: Structure MD_BLOCK_OL_DETAIL. */
|
||||
MD_BLOCK_OL,
|
||||
|
||||
/* <li>...</li>
|
||||
* Detail: Structure MD_BLOCK_LI_DETAIL. */
|
||||
MD_BLOCK_LI,
|
||||
|
||||
/* <hr> */
|
||||
MD_BLOCK_HR,
|
||||
|
||||
/* <h1>...</h1> (for levels up to 6)
|
||||
* Detail: Structure MD_BLOCK_H_DETAIL. */
|
||||
MD_BLOCK_H,
|
||||
|
||||
/* <pre><code>...</code></pre>
|
||||
* Note the text lines within code blocks are terminated with '\n'
|
||||
* instead of explicit MD_TEXT_BR. */
|
||||
MD_BLOCK_CODE,
|
||||
|
||||
/* Raw HTML block. This itself does not correspond to any particular HTML
|
||||
* tag. The contents of it _is_ raw HTML source intended to be put
|
||||
* in verbatim form to the HTML output. */
|
||||
MD_BLOCK_HTML,
|
||||
|
||||
/* <p>...</p> */
|
||||
MD_BLOCK_P,
|
||||
|
||||
/* <table>...</table> and its contents.
|
||||
* Detail: Structure MD_BLOCK_TD_DETAIL (used with MD_BLOCK_TH and MD_BLOCK_TD)
|
||||
* Note all of these are used only if extension MD_FLAG_TABLES is enabled. */
|
||||
MD_BLOCK_TABLE,
|
||||
MD_BLOCK_THEAD,
|
||||
MD_BLOCK_TBODY,
|
||||
MD_BLOCK_TR,
|
||||
MD_BLOCK_TH,
|
||||
MD_BLOCK_TD
|
||||
} MD_BLOCKTYPE;
|
||||
|
||||
/* Span represents an in-line piece of a document which should be rendered with
|
||||
* the same font, color and other attributes. A sequence of spans forms a block
|
||||
* like paragraph or list item. */
|
||||
typedef enum MD_SPANTYPE {
|
||||
/* <em>...</em> */
|
||||
MD_SPAN_EM,
|
||||
|
||||
/* <strong>...</strong> */
|
||||
MD_SPAN_STRONG,
|
||||
|
||||
/* <a href="xxx">...</a>
|
||||
* Detail: Structure MD_SPAN_A_DETAIL. */
|
||||
MD_SPAN_A,
|
||||
|
||||
/* <img src="xxx" alt="...">
|
||||
* Detail: Structure MD_SPAN_IMG_DETAIL.
|
||||
* Note: Image text can contain nested spans and even nested images.
|
||||
* If rendered into the ALT attribute of an HTML <IMG> tag, it is the responsibility
|
||||
* of the renderer to deal with it.
|
||||
*/
|
||||
MD_SPAN_IMG,
|
||||
|
||||
/* <code>...</code> */
|
||||
MD_SPAN_CODE,
|
||||
|
||||
/* <del>...</del>
|
||||
* Note: Recognized only when MD_FLAG_STRIKETHROUGH is enabled.
|
||||
*/
|
||||
MD_SPAN_DEL,
|
||||
|
||||
/* For recognizing inline ($) and display ($$) equations
|
||||
* Note: Recognized only when MD_FLAG_LATEXMATHSPANS is enabled.
|
||||
*/
|
||||
MD_SPAN_LATEXMATH,
|
||||
MD_SPAN_LATEXMATH_DISPLAY,
|
||||
|
||||
/* Wiki links
|
||||
* Note: Recognized only when MD_FLAG_WIKILINKS is enabled.
|
||||
*/
|
||||
MD_SPAN_WIKILINK,
|
||||
|
||||
/* <u>...</u>
|
||||
* Note: Recognized only when MD_FLAG_UNDERLINE is enabled. */
|
||||
MD_SPAN_U
|
||||
} MD_SPANTYPE;
|
||||
|
||||
/* Text is the actual textual contents of span. */
|
||||
typedef enum MD_TEXTTYPE {
|
||||
/* Normal text. */
|
||||
MD_TEXT_NORMAL = 0,
|
||||
|
||||
/* NULL character. CommonMark requires replacing NULL character with
|
||||
* the replacement char U+FFFD, so this allows caller to do that easily. */
|
||||
MD_TEXT_NULLCHAR,
|
||||
|
||||
/* Line breaks.
|
||||
* Note these are not sent from blocks with verbatim output (MD_BLOCK_CODE
|
||||
* or MD_BLOCK_HTML). In such cases, '\n' is part of the text itself. */
|
||||
MD_TEXT_BR, /* <br> (hard break) */
|
||||
MD_TEXT_SOFTBR, /* '\n' in source text where it is not semantically meaningful (soft break) */
|
||||
|
||||
/* Entity.
|
||||
* (a) Named entity, e.g. &nbsp;
|
||||
* (Note MD4C does not have a list of known entities.
|
||||
* Anything matching the regexp /&[A-Za-z][A-Za-z0-9]{1,47};/ is
|
||||
* treated as a named entity.)
|
||||
* (b) Numerical entity, e.g. &#1234;
|
||||
* (c) Hexadecimal entity, e.g. &#x12AB;
|
||||
*
|
||||
* As MD4C is mostly encoding agnostic, application gets the verbatim
|
||||
* entity text into the MD_PARSER::text_callback(). */
|
||||
MD_TEXT_ENTITY,
|
||||
|
||||
/* Text in a code block (inside MD_BLOCK_CODE) or inlined code (`code`).
|
||||
* If it is inside MD_BLOCK_CODE, it includes spaces for indentation and
|
||||
* '\n' for new lines. MD_TEXT_BR and MD_TEXT_SOFTBR are not sent for this
|
||||
* kind of text. */
|
||||
MD_TEXT_CODE,
|
||||
|
||||
/* Text is a raw HTML. If it is contents of a raw HTML block (i.e. not
|
||||
* an inline raw HTML), then MD_TEXT_BR and MD_TEXT_SOFTBR are not used.
|
||||
* The text contains verbatim '\n' for the new lines. */
|
||||
MD_TEXT_HTML,
|
||||
|
||||
/* Text is inside an equation. This is processed the same way as inlined code
|
||||
* spans (`code`). */
|
||||
MD_TEXT_LATEXMATH
|
||||
} MD_TEXTTYPE;
|
||||
|
||||
|
||||
/* Alignment enumeration. */
|
||||
typedef enum MD_ALIGN {
|
||||
MD_ALIGN_DEFAULT = 0, /* When unspecified. */
|
||||
MD_ALIGN_LEFT,
|
||||
MD_ALIGN_CENTER,
|
||||
MD_ALIGN_RIGHT
|
||||
} MD_ALIGN;
|
||||
|
||||
|
||||
/* String attribute.
|
||||
*
|
||||
* This wraps strings which are outside of a normal text flow and which are
|
||||
* propagated within various detailed structures, but which still may contain
|
||||
* string portions of different types like e.g. entities.
|
||||
*
|
||||
* So, for example, let's consider an image that has a title attribute string
|
||||
* set to "foo &quot; bar". (Note the string size is 14.)
|
||||
*
|
||||
* Then the attribute MD_SPAN_IMG_DETAIL::title shall provide the following:
|
||||
* -- [0]: "foo " (substr_types[0] == MD_TEXT_NORMAL; substr_offsets[0] == 0)
|
||||
* -- [1]: "&quot;" (substr_types[1] == MD_TEXT_ENTITY; substr_offsets[1] == 4)
|
||||
* -- [2]: " bar" (substr_types[2] == MD_TEXT_NORMAL; substr_offsets[2] == 10)
|
||||
* -- [3]: (n/a) (n/a ; substr_offsets[3] == 14)
|
||||
*
|
||||
* Note that these conditions are guaranteed:
|
||||
* -- substr_offsets[0] == 0
|
||||
* -- substr_offsets[LAST+1] == size
|
||||
* -- Only MD_TEXT_NORMAL, MD_TEXT_ENTITY, MD_TEXT_NULLCHAR substrings can appear.
|
||||
*/
|
||||
typedef struct MD_ATTRIBUTE {
|
||||
const MD_CHAR* text;
|
||||
MD_SIZE size;
|
||||
const MD_TEXTTYPE* substr_types;
|
||||
const MD_OFFSET* substr_offsets;
|
||||
} MD_ATTRIBUTE;
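The guarantees listed in the comment above (first offset is 0, the offset after the last substring equals `size`) are what make a simple loop over `substr_offsets` safe. The sketch below walks an attribute that way. So that it compiles without md4c.h, it uses local stand-in types with an `sk_` prefix that only mirror the fields of MD_ATTRIBUTE; those names, and the two helper functions, are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring MD_TEXTTYPE and MD_ATTRIBUTE (hypothetical
 * names; the enumerator values here are not meant to match MD4C's). */
typedef enum sk_texttype {
    SK_TEXT_NORMAL = 0,
    SK_TEXT_NULLCHAR,
    SK_TEXT_ENTITY
} sk_texttype;

typedef struct sk_attribute {
    const char* text;
    unsigned size;
    const sk_texttype* substr_types;
    const unsigned* substr_offsets;
} sk_attribute;

/* Size of the i-th substring: distance to the next offset. This relies on
 * the guarantee that the offset following the last substring equals size. */
static unsigned
sk_substring_size(const sk_attribute* attr, int i)
{
    assert(attr != NULL && i >= 0);
    return attr->substr_offsets[i + 1] - attr->substr_offsets[i];
}

/* Count the substrings of a given type. The loop ends at the sentinel
 * offset, which is the first one that is not smaller than attr->size. */
static unsigned
sk_count_substrings(const sk_attribute* attr, sk_texttype type)
{
    unsigned count = 0;
    int i;

    assert(attr != NULL);

    for(i = 0; attr->substr_offsets[i] < attr->size; i++) {
        if(attr->substr_types[i] == type)
            count++;
    }
    return count;
}
```

Note that `substr_types` has one element fewer than `substr_offsets`: the sentinel offset has no type, so the loop must test the offset before it reads the type.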
|
||||
|
||||
|
||||
/* Detailed info for MD_BLOCK_UL. */
|
||||
typedef struct MD_BLOCK_UL_DETAIL {
|
||||
int is_tight; /* Non-zero if tight list, zero if loose. */
|
||||
MD_CHAR mark; /* Item bullet character in MarkDown source of the list, e.g. '-', '+', '*'. */
|
||||
} MD_BLOCK_UL_DETAIL;
|
||||
|
||||
/* Detailed info for MD_BLOCK_OL. */
|
||||
typedef struct MD_BLOCK_OL_DETAIL {
|
||||
unsigned start; /* Start index of the ordered list. */
|
||||
int is_tight; /* Non-zero if tight list, zero if loose. */
|
||||
MD_CHAR mark_delimiter; /* Character delimiting the item marks in MarkDown source, e.g. '.' or ')' */
|
||||
} MD_BLOCK_OL_DETAIL;
|
||||
|
||||
/* Detailed info for MD_BLOCK_LI. */
|
||||
typedef struct MD_BLOCK_LI_DETAIL {
|
||||
int is_task; /* Can be non-zero only with MD_FLAG_TASKLISTS */
|
||||
MD_CHAR task_mark; /* If is_task, then one of 'x', 'X' or ' '. Undefined otherwise. */
|
||||
MD_OFFSET task_mark_offset; /* If is_task, then offset in the input of the char between '[' and ']'. */
|
||||
} MD_BLOCK_LI_DETAIL;
|
||||
|
||||
/* Detailed info for MD_BLOCK_H. */
|
||||
typedef struct MD_BLOCK_H_DETAIL {
|
||||
unsigned level; /* Header level (1 - 6) */
|
||||
} MD_BLOCK_H_DETAIL;
|
||||
|
||||
/* Detailed info for MD_BLOCK_CODE. */
|
||||
typedef struct MD_BLOCK_CODE_DETAIL {
|
||||
MD_ATTRIBUTE info;
|
||||
MD_ATTRIBUTE lang;
|
||||
MD_CHAR fence_char; /* The character used for fenced code block; or zero for indented code block. */
|
||||
} MD_BLOCK_CODE_DETAIL;
|
||||
|
||||
/* Detailed info for MD_BLOCK_TH and MD_BLOCK_TD. */
|
||||
typedef struct MD_BLOCK_TD_DETAIL {
|
||||
MD_ALIGN align;
|
||||
} MD_BLOCK_TD_DETAIL;
|
||||
|
||||
/* Detailed info for MD_SPAN_A. */
|
||||
typedef struct MD_SPAN_A_DETAIL {
|
||||
MD_ATTRIBUTE href;
|
||||
MD_ATTRIBUTE title;
|
||||
} MD_SPAN_A_DETAIL;
|
||||
|
||||
/* Detailed info for MD_SPAN_IMG. */
|
||||
typedef struct MD_SPAN_IMG_DETAIL {
|
||||
MD_ATTRIBUTE src;
|
||||
MD_ATTRIBUTE title;
|
||||
} MD_SPAN_IMG_DETAIL;
|
||||
|
||||
/* Detailed info for MD_SPAN_WIKILINK. */
|
||||
typedef struct MD_SPAN_WIKILINK {
|
||||
MD_ATTRIBUTE target;
|
||||
} MD_SPAN_WIKILINK_DETAIL;
|
||||
|
||||
/* Flags specifying extensions/deviations from CommonMark specification.
|
||||
*
|
||||
* By default (when MD_PARSER::flags == 0), we follow CommonMark specification.
|
||||
* The following flags may allow some extensions or deviations from it.
|
||||
*/
|
||||
#define MD_FLAG_COLLAPSEWHITESPACE 0x0001 /* In MD_TEXT_NORMAL, collapse non-trivial whitespace into single ' ' */
|
||||
#define MD_FLAG_PERMISSIVEATXHEADERS 0x0002 /* Do not require space in ATX headers ( ###header ) */
|
||||
#define MD_FLAG_PERMISSIVEURLAUTOLINKS 0x0004 /* Recognize URLs as autolinks even without '<', '>' */
|
||||
#define MD_FLAG_PERMISSIVEEMAILAUTOLINKS 0x0008 /* Recognize e-mails as autolinks even without '<', '>' and 'mailto:' */
|
||||
#define MD_FLAG_NOINDENTEDCODEBLOCKS 0x0010 /* Disable indented code blocks. (Only fenced code works.) */
|
||||
#define MD_FLAG_NOHTMLBLOCKS 0x0020 /* Disable raw HTML blocks. */
|
||||
#define MD_FLAG_NOHTMLSPANS 0x0040 /* Disable raw HTML (inline). */
|
||||
#define MD_FLAG_TABLES 0x0100 /* Enable tables extension. */
|
||||
#define MD_FLAG_STRIKETHROUGH 0x0200 /* Enable strikethrough extension. */
|
||||
#define MD_FLAG_PERMISSIVEWWWAUTOLINKS 0x0400 /* Enable WWW autolinks (even without any scheme prefix, if they begin with 'www.') */
|
||||
#define MD_FLAG_TASKLISTS 0x0800 /* Enable task list extension. */
|
||||
#define MD_FLAG_LATEXMATHSPANS 0x1000 /* Enable $ and $$ containing LaTeX equations. */
|
||||
#define MD_FLAG_WIKILINKS 0x2000 /* Enable wiki links extension. */
|
||||
#define MD_FLAG_UNDERLINE 0x4000 /* Enable underline extension (and disables '_' for normal emphasis). */
|
||||
|
||||
#define MD_FLAG_PERMISSIVEAUTOLINKS (MD_FLAG_PERMISSIVEEMAILAUTOLINKS | MD_FLAG_PERMISSIVEURLAUTOLINKS | MD_FLAG_PERMISSIVEWWWAUTOLINKS)
|
||||
#define MD_FLAG_NOHTML (MD_FLAG_NOHTMLBLOCKS | MD_FLAG_NOHTMLSPANS)
|
||||
|
||||
/* Convenient sets of flags corresponding to well-known Markdown dialects.
|
||||
*
|
||||
* Note we may only support a subset of the features of the referenced dialect.
|
||||
* The constant just enables those extensions which bring us as close as
|
||||
* possible given what features we implement.
|
||||
*
|
||||
* ABI compatibility note: Meaning of these can change in time as new
|
||||
* extensions, bringing the dialect closer to the original, are implemented.
|
||||
*/
|
||||
#define MD_DIALECT_COMMONMARK 0
|
||||
#define MD_DIALECT_GITHUB (MD_FLAG_PERMISSIVEAUTOLINKS | MD_FLAG_TABLES | MD_FLAG_STRIKETHROUGH | MD_FLAG_TASKLISTS)
|
||||
|
||||
/* Parser structure.
|
||||
*/
|
||||
typedef struct MD_PARSER {
|
||||
/* Reserved. Set to zero.
|
||||
*/
|
||||
unsigned abi_version;
|
||||
|
||||
/* Dialect options. Bitmask of MD_FLAG_xxxx values.
|
||||
*/
|
||||
unsigned flags;
|
||||
|
||||
/* Caller-provided rendering callbacks.
|
||||
*
|
||||
* For some block/span types, more detailed information is provided in a
|
||||
* type-specific structure pointed to by the argument 'detail'.
|
||||
*
|
||||
* The last argument of all callbacks, 'userdata', is just propagated from
|
||||
* md_parse() and is available for any use by the application.
|
||||
*
|
||||
* Note any strings provided to the callbacks as their arguments or as
|
||||
* members of any detail structure are generally not zero-terminated.
|
||||
* Application has to take the respective size information into account.
|
||||
*
|
||||
* Callbacks may abort further parsing of the document by returning non-zero.
|
||||
*/
|
||||
int (*enter_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
|
||||
int (*leave_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
|
||||
|
||||
int (*enter_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
|
||||
int (*leave_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
|
||||
|
||||
int (*text)(MD_TEXTTYPE /*type*/, const MD_CHAR* /*text*/, MD_SIZE /*size*/, void* /*userdata*/);
|
||||
|
||||
/* Debug callback. Optional (may be NULL).
|
||||
*
|
||||
* If provided and something goes wrong, this function gets called.
|
||||
* This is intended for debugging and problem diagnosis for developers;
|
||||
* it is not intended to provide any errors suitable for displaying to an
|
||||
* end user.
|
||||
*/
|
||||
void (*debug_log)(const char* /*msg*/, void* /*userdata*/);
|
||||
|
||||
/* Reserved. Set to NULL.
|
||||
*/
|
||||
void (*syntax)(void);
|
||||
} MD_PARSER;
|
||||
|
||||
|
||||
/* For backward compatibility. Do not use in new code. */
|
||||
typedef MD_PARSER MD_RENDERER;
|
||||
|
||||
|
||||
/* Parse the Markdown document stored in the string 'text' of size 'size'.
|
||||
* The 'parser' structure provides callbacks to be called during the parsing so the
|
||||
* caller can render the document on the screen or convert the Markdown
|
||||
* to another format.
|
||||
*
|
||||
* Zero is returned on success. If a runtime error occurs (e.g. a memory allocation
|
||||
* fails), -1 is returned. If the processing is aborted due to any callback
|
||||
* returning non-zero, the return value of the callback is returned.
|
||||
*/
|
||||
int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
} /* extern "C" { */
|
||||
#endif
|
||||
|
||||
#endif /* MD4C_MARKDOWN_H */
|
|
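The `MD_FLAG_xxxx` values in `md4c.h` are plain bit masks, so the composite macros can be checked with a little arithmetic. The sketch below mirrors a few of the constants in Python purely for illustration; `has_flag` is a hypothetical helper, not part of MD4C.

```python
# Values copied from the MD_FLAG_xxxx macros in md4c.h (0.4.3).
MD_FLAG_PERMISSIVEURLAUTOLINKS = 0x0004
MD_FLAG_PERMISSIVEEMAILAUTOLINKS = 0x0008
MD_FLAG_TABLES = 0x0100
MD_FLAG_STRIKETHROUGH = 0x0200
MD_FLAG_PERMISSIVEWWWAUTOLINKS = 0x0400
MD_FLAG_TASKLISTS = 0x0800
MD_FLAG_UNDERLINE = 0x4000

# Composite masks, built the same way as in the header.
MD_FLAG_PERMISSIVEAUTOLINKS = (MD_FLAG_PERMISSIVEEMAILAUTOLINKS
                               | MD_FLAG_PERMISSIVEURLAUTOLINKS
                               | MD_FLAG_PERMISSIVEWWWAUTOLINKS)
MD_DIALECT_GITHUB = (MD_FLAG_PERMISSIVEAUTOLINKS | MD_FLAG_TABLES
                     | MD_FLAG_STRIKETHROUGH | MD_FLAG_TASKLISTS)


def has_flag(flags, flag):
    """Hypothetical helper: True if every bit of 'flag' is set in 'flags'."""
    return (flags & flag) == flag


# GitHub dialect plus the underline extension.
flags = MD_DIALECT_GITHUB | MD_FLAG_UNDERLINE
print(hex(MD_DIALECT_GITHUB))  # 0xf0c
```

Because `MD_DIALECT_GITHUB` is only a convenience mask, an application can OR further flags onto it, as done for `MD_FLAG_UNDERLINE` here.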
@ -0,0 +1,12 @@
|
|||
prefix=@CMAKE_INSTALL_PREFIX@
|
||||
exec_prefix=@CMAKE_INSTALL_PREFIX@
|
||||
libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@
|
||||
includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@
|
||||
|
||||
Name: @PROJECT_NAME@
|
||||
Description: @PROJECT_DESCRIPTION@
|
||||
Version: @PROJECT_VERSION@
|
||||
|
||||
Requires:
|
||||
Libs: -L${libdir} -lmd4c
|
||||
Cflags: -I${includedir}
|
|
@ -0,0 +1,118 @@
|
|||
#!/usr/bin/env python3
|
||||
|
||||
import os
|
||||
import sys
|
||||
import textwrap
|
||||
|
||||
|
||||
self_path = os.path.dirname(os.path.realpath(__file__))
|
||||
f = open(self_path + "/unicode/CaseFolding.txt", "r")
|
||||
|
||||
status_list = [ "C", "F" ]
|
||||
|
||||
folding_list = [ dict(), dict(), dict() ]
|
||||
|
||||
# Filter the foldings for "full" folding.
|
||||
for line in f:
|
||||
comment_off = line.find("#")
|
||||
if comment_off >= 0:
|
||||
line = line[:comment_off]
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
|
||||
raw_codepoint, status, raw_mapping, ignored_tail = line.split(";", 3)
|
||||
if not status.strip() in status_list:
|
||||
continue
|
||||
codepoint = int(raw_codepoint.strip(), 16)
|
||||
mapping = [int(it, 16) for it in raw_mapping.strip().split(" ")]
|
||||
mapping_len = len(mapping)
|
||||
|
||||
if mapping_len in range(1, 4):
|
||||
folding_list[mapping_len-1][codepoint] = mapping
|
||||
else:
|
||||
assert(False)
|
||||
f.close()
|
||||
|
||||
|
||||
# If we assume that range (index0 ... index-1) makes a range, check that index
|
||||
# is compatible with it too.
|
||||
#
|
||||
# We can handle ranges which:
|
||||
#
|
||||
# (1) either form consecutive sequence of codepoints and which map that range
|
||||
# to other consecutive range of codepoints;
|
||||
#
|
||||
# (2) or consecutive range of codepoints with step 2 where each codepoint
|
||||
# CP is mapped to the next codepoint CP+1
|
||||
# (e.g. 0x1234 -> 0x1235; 0x1236 -> 0x1237; ...).
|
||||
#
|
||||
# (If the mappings have multiple codepoints, only the 1st mapped codepoint is
|
||||
# considered and all the other ones have to be the same for the whole range.)
|
||||
def is_range_compatible(folding, codepoint_list, index0, index):
|
||||
N = index - index0
|
||||
codepoint0 = codepoint_list[index0]
|
||||
codepoint1 = codepoint_list[index0+1]
|
||||
codepointN = codepoint_list[index]
|
||||
mapping0 = folding[codepoint0]
|
||||
mapping1 = folding[codepoint1]
|
||||
mappingN = folding[codepointN]
|
||||
|
||||
# Check the range type (1):
|
||||
if codepoint1 - codepoint0 == 1 and codepointN - codepoint0 == N \
|
||||
and mapping1[0] - mapping0[0] == 1 and mapping1[1:] == mapping0[1:] \
|
||||
and mappingN[0] - mapping0[0] == N and mappingN[1:] == mapping0[1:]:
|
||||
return True
|
||||
|
||||
# Check the range type (2):
|
||||
if codepoint1 - codepoint0 == 2 and codepointN - codepoint0 == 2 * N \
|
||||
and mapping0[0] - codepoint0 == 1 \
|
||||
and mapping1[0] - codepoint1 == 1 and mapping1[1:] == mapping0[1:] \
|
||||
and mappingN[0] - codepointN == 1 and mappingN[1:] == mapping0[1:]:
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
|
||||
def mapping_str(list, mapping):
|
||||
return ",".join("0x{:04x}".format(x) for x in mapping)
|
||||
|
||||
for mapping_len in range(1, 4):
|
||||
folding = folding_list[mapping_len-1]
|
||||
codepoint_list = list(folding)
|
||||
|
||||
index0 = 0
|
||||
count = len(folding)
|
||||
|
||||
records = list()
|
||||
data_records = list()
|
||||
|
||||
while index0 < count:
|
||||
index1 = index0 + 1
|
||||
while index1 < count and is_range_compatible(folding, codepoint_list, index0, index1):
|
||||
index1 += 1
|
||||
|
||||
if index1 - index0 > 2:
|
||||
# Range of codepoints
|
||||
records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
|
||||
data_records.append(mapping_str(data_records, folding[codepoint_list[index0]]))
|
||||
data_records.append(mapping_str(data_records, folding[codepoint_list[index1-1]]))
|
||||
else:
|
||||
# Single codepoint
|
||||
records.append("S(0x{:04x})".format(codepoint_list[index0]))
|
||||
data_records.append(mapping_str(data_records, folding[codepoint_list[index0]]))
|
||||
|
||||
index0 = index1
|
||||
|
||||
sys.stdout.write("static const unsigned FOLD_MAP_{}[] = {{\n".format(mapping_len))
|
||||
sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
|
||||
initial_indent = " ", subsequent_indent=" ")))
|
||||
sys.stdout.write("\n};\n")
|
||||
|
||||
sys.stdout.write("static const unsigned FOLD_MAP_{}_DATA[] = {{\n".format(mapping_len))
|
||||
sys.stdout.write("\n".join(textwrap.wrap(", ".join(data_records), 110,
|
||||
initial_indent = " ", subsequent_indent=" ")))
|
||||
sys.stdout.write("\n};\n")
|
||||
|
||||
|
||||
|
|
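The two range shapes described in the comment above `is_range_compatible()` can be illustrated on single-codepoint mappings. This is a simplified sketch, not the script's own logic: `range_kind` is a hypothetical name, it inspects a whole run at once instead of growing it index by index, and it ignores multi-codepoint mappings.

```python
def range_kind(pairs):
    """Classify a run of (codepoint, folded) pairs sorted by codepoint.

    Returns 1 for consecutive codepoints mapped to consecutive codepoints,
    2 for codepoints with step 2 where each CP maps to CP+1, else None.
    """
    cp0, m0 = pairs[0]

    # Type (1): CP0+i -> M0+i for every i.
    if all(cp == cp0 + i and m == m0 + i for i, (cp, m) in enumerate(pairs)):
        return 1

    # Type (2): CP0+2i -> CP0+2i+1 for every i.
    if all(cp == cp0 + 2 * i and m == cp + 1 for i, (cp, m) in enumerate(pairs)):
        return 2

    return None


# 'A'..'C' fold to 'a'..'c': a type (1) range.
latin = [(0x41, 0x61), (0x42, 0x62), (0x43, 0x63)]
# U+0100 -> U+0101, U+0102 -> U+0103, ...: a type (2) range.
extended = [(0x100, 0x101), (0x102, 0x103), (0x104, 0x105)]
print(range_kind(latin), range_kind(extended))
```

Storing only the first and last mapping of such a run is what lets the generated `FOLD_MAP_*` tables stay small.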
@ -0,0 +1,66 @@
|
|||
#!/usr/bin/env python3
|
||||
|
||||
import os
|
||||
import sys
|
||||
import textwrap
|
||||
|
||||
|
||||
self_path = os.path.dirname(os.path.realpath(__file__))
|
||||
f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r")
|
||||
|
||||
codepoint_list = []
|
||||
category_list = [ "Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps" ]
|
||||
|
||||
# Filter codepoints falling in the right category:
|
||||
for line in f:
|
||||
comment_off = line.find("#")
|
||||
if comment_off >= 0:
|
||||
line = line[:comment_off]
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
|
||||
char_range, category = line.split(";")
|
||||
char_range = char_range.strip()
|
||||
category = category.strip()
|
||||
|
||||
if not category in category_list:
|
||||
continue
|
||||
|
||||
delim_off = char_range.find("..")
|
||||
if delim_off >= 0:
|
||||
codepoint0 = int(char_range[:delim_off], 16)
|
||||
codepoint1 = int(char_range[delim_off+2:], 16)
|
||||
for codepoint in range(codepoint0, codepoint1 + 1):
|
||||
codepoint_list.append(codepoint)
|
||||
else:
|
||||
codepoint = int(char_range, 16)
|
||||
codepoint_list.append(codepoint)
|
||||
f.close()
|
||||
|
||||
|
||||
codepoint_list.sort()
|
||||
|
||||
|
||||
index0 = 0
|
||||
count = len(codepoint_list)
|
||||
|
||||
records = list()
|
||||
while index0 < count:
|
||||
index1 = index0 + 1
|
||||
while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1:
|
||||
index1 += 1
|
||||
|
||||
if index1 - index0 > 1:
|
||||
# Range of codepoints
|
||||
records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
|
||||
else:
|
||||
# Single codepoint
|
||||
records.append("S(0x{:04x})".format(codepoint_list[index0]))
|
||||
|
||||
index0 = index1
|
||||
|
||||
sys.stdout.write("static const unsigned PUNCT_MAP[] = {\n")
|
||||
sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
|
||||
initial_indent = " ", subsequent_indent=" ")))
|
||||
sys.stdout.write("\n};\n\n")
|
|
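The record-building loop at the end of the script above collapses runs of consecutive codepoints into `R(first,last)` records and leaves isolated ones as `S(cp)`. A minimal sketch of that step as a reusable function follows; the name `collapse_ranges` is an assumption made for illustration.

```python
def collapse_ranges(codepoints):
    """Turn codepoints into 'R(lo,hi)' / 'S(cp)' record strings."""
    codepoints = sorted(codepoints)
    count = len(codepoints)
    records = []
    index0 = 0
    while index0 < count:
        # Extend the run while the next codepoint is exactly one higher.
        index1 = index0 + 1
        while index1 < count and codepoints[index1] == codepoints[index1 - 1] + 1:
            index1 += 1

        if index1 - index0 > 1:
            # Range of codepoints
            records.append("R(0x{:04x},0x{:04x})".format(
                codepoints[index0], codepoints[index1 - 1]))
        else:
            # Single codepoint
            records.append("S(0x{:04x})".format(codepoints[index0]))

        index0 = index1
    return records


sample = collapse_ranges([0x21, 0x22, 0x23, 0x25, 0x28, 0x29])
print(", ".join(sample))
```

For the sample input this yields one range for `0x21..0x23`, a single record for `0x25`, and another range for `0x28..0x29`.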
@ -0,0 +1,66 @@
|
|||
#!/usr/bin/env python3
|
||||
|
||||
import os
|
||||
import sys
|
||||
import textwrap
|
||||
|
||||
|
||||
self_path = os.path.dirname(os.path.realpath(__file__))
|
||||
f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r")
|
||||
|
||||
codepoint_list = []
|
||||
category_list = [ "Zs" ]
|
||||
|
||||
# Filter codepoints falling in the right category:
|
||||
for line in f:
|
||||
comment_off = line.find("#")
|
||||
if comment_off >= 0:
|
||||
line = line[:comment_off]
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
|
||||
char_range, category = line.split(";")
|
||||
char_range = char_range.strip()
|
||||
category = category.strip()
|
||||
|
||||
if not category in category_list:
|
||||
continue
|
||||
|
||||
delim_off = char_range.find("..")
|
||||
if delim_off >= 0:
|
||||
codepoint0 = int(char_range[:delim_off], 16)
|
||||
codepoint1 = int(char_range[delim_off+2:], 16)
|
||||
for codepoint in range(codepoint0, codepoint1 + 1):
|
||||
codepoint_list.append(codepoint)
|
||||
else:
|
||||
codepoint = int(char_range, 16)
|
||||
codepoint_list.append(codepoint)
|
||||
f.close()
|
||||
|
||||
|
||||
codepoint_list.sort()
|
||||
|
||||
|
||||
index0 = 0
|
||||
count = len(codepoint_list)
|
||||
|
||||
records = list()
|
||||
while index0 < count:
|
||||
index1 = index0 + 1
|
||||
while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1:
|
||||
index1 += 1
|
||||
|
||||
if index1 - index0 > 1:
|
||||
# Range of codepoints
|
||||
records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
|
||||
else:
|
||||
# Single codepoint
|
||||
records.append("S(0x{:04x})".format(codepoint_list[index0]))
|
||||
|
||||
index0 = index1
|
||||
|
||||
sys.stdout.write("static const unsigned WHITESPACE_MAP[] = {\n")
|
||||
sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
|
||||
initial_indent = " ", subsequent_indent=" ")))
|
||||
sys.stdout.write("\n};\n\n")
|
|
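Both category scripts parse `DerivedGeneralCategory.txt` the same way: strip the trailing comment, split on `;`, and expand an optional `first..last` range. The sketch below isolates that parsing for one line; `parse_category_line` is a hypothetical helper and the sample lines follow the layout of the Unicode data file.

```python
def parse_category_line(line, wanted=("Zs",)):
    """Return the list of codepoints on 'line' whose category is wanted."""
    # Drop the trailing comment, if any.
    comment_off = line.find("#")
    if comment_off >= 0:
        line = line[:comment_off]
    line = line.strip()
    if not line:
        return []

    char_range, category = (part.strip() for part in line.split(";"))
    if category not in wanted:
        return []

    # Either a single codepoint or an inclusive 'first..last' range.
    first, sep, last = char_range.partition("..")
    lo = int(first, 16)
    hi = int(last, 16) if sep else lo
    return list(range(lo, hi + 1))


spaces = parse_category_line("2000..200A    ; Zs # [11] EN QUAD..HAIR SPACE")
print(len(spaces))  # 11
```

Passing `wanted=("Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps")` would select the punctuation categories used for `PUNCT_MAP` instead.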
@ -0,0 +1,70 @@
|
|||
#!/bin/sh
|
||||
#
|
||||
# This script attempts to build the project via the cov-build utility and prepare
|
||||
# a package for uploading to the coverity scan service.
|
||||
#
|
||||
# (See http://scan.coverity.com for more info.)
|
||||
|
||||
set -e
|
||||
|
||||
# Check presence of coverity static analyzer.
|
||||
if ! which cov-build; then
|
||||
echo "Utility cov-build not found in PATH."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Choose a build system (ninja or GNU make).
|
||||
if which ninja; then
|
||||
BUILD_TOOL=ninja
|
||||
GENERATOR=Ninja
|
||||
elif which make; then
|
||||
BUILD_TOOL=make
|
||||
GENERATOR="MSYS Makefiles"
|
||||
else
|
||||
echo "No suitable build system found."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Choose a zip tool.
|
||||
if which 7za; then
|
||||
MKZIP="7za a -r -mx9"
|
||||
elif which 7z; then
|
||||
MKZIP="7z a -r -mx9"
|
||||
elif which zip; then
|
||||
MKZIP="zip -r"
|
||||
else
|
||||
echo "No suitable zip utility found"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Change dir to project root.
|
||||
cd `dirname "$0"`/..
|
||||
|
||||
CWD=`pwd`
|
||||
ROOT_DIR="$CWD"
|
||||
BUILD_DIR="$CWD/coverity"
|
||||
OUTPUT="$CWD/cov-int.zip"
|
||||
|
||||
# Sanity checks.
|
||||
if [ ! -x "$ROOT_DIR/scripts/coverity.sh" ]; then
|
||||
echo "There is some path mismatch."
|
||||
exit 1
|
||||
fi
|
||||
if [ -e "$BUILD_DIR" ]; then
|
||||
echo "Path $BUILD_DIR already exists. Delete it and retry."
|
||||
exit 1
|
||||
fi
|
||||
if [ -e "$OUTPUT" ]; then
|
||||
echo "Path $OUTPUT already exists. Delete it and retry."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Build the project with the Coverity analysis enabled.
|
||||
mkdir -p "$BUILD_DIR"
|
||||
cd "$BUILD_DIR"
|
||||
cmake -G "$GENERATOR" "$ROOT_DIR"
|
||||
cov-build --dir cov-int "$BUILD_TOOL"
|
||||
$MKZIP "$OUTPUT" "cov-int"
|
||||
cd "$ROOT_DIR"
|
||||
rm -rf "$BUILD_DIR"
|
||||
|
|
@ -0,0 +1,75 @@
|
|||
#!/bin/sh
|
||||
#
|
||||
# Run this script from build directory.
|
||||
|
||||
#set -e
|
||||
|
||||
SELF_DIR=`dirname $0`
|
||||
PROJECT_DIR="$SELF_DIR/.."
|
||||
TEST_DIR="$PROJECT_DIR/test"
|
||||
|
||||
|
||||
PROGRAM="md2html/md2html"
|
||||
if [ ! -x "$PROGRAM" ]; then
|
||||
echo "Cannot find the $PROGRAM." >&2
|
||||
echo "You have to run this script from the build directory." >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if which py >>/dev/null 2>&1; then
|
||||
PYTHON=py
|
||||
elif which python3 >>/dev/null 2>&1; then
|
||||
PYTHON=python3
|
||||
elif which python >>/dev/null 2>&1; then
|
||||
if [ `python --version | awk '{print $2}' | cut -d. -f1` -ge 3 ]; then
|
||||
PYTHON=python
|
||||
fi
|
||||
fi
|
||||
|
||||
echo
|
||||
echo "CommonMark specification:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/spec.txt" -p "$PROGRAM"
|
||||
|
||||
echo
|
||||
echo "Code coverage & regressions:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/coverage.txt" -p "$PROGRAM"
|
||||
|
||||
echo
|
||||
echo "Permissive e-mail autolinks extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-email-autolinks.txt" -p "$PROGRAM --fpermissive-email-autolinks"
|
||||
|
||||
echo
|
||||
echo "Permissive URL autolinks extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-url-autolinks.txt" -p "$PROGRAM --fpermissive-url-autolinks"
|
||||
|
||||
echo
|
||||
echo "WWW autolinks extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-www-autolinks.txt" -p "$PROGRAM --fpermissive-www-autolinks"
|
||||
|
||||
echo
|
||||
echo "Tables extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tables.txt" -p "$PROGRAM --ftables"
|
||||
|
||||
echo
|
||||
echo "Strikethrough extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/strikethrough.txt" -p "$PROGRAM --fstrikethrough"
|
||||
|
||||
echo
|
||||
echo "Task lists extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tasklists.txt" -p "$PROGRAM --ftasklists"
|
||||
|
||||
echo
|
||||
echo "LaTeX extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/latex-math.txt" -p "$PROGRAM --flatex-math"
|
||||
|
||||
echo
|
||||
echo "Wiki links extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/wiki-links.txt" -p "$PROGRAM --fwiki-links --ftables"
|
||||
|
||||
echo
|
||||
echo "Underline extension:"
|
||||
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/underline.txt" -p "$PROGRAM --funderline"
|
||||
|
||||
echo
|
||||
echo "Pathological input:"
|
||||
$PYTHON "$TEST_DIR/pathological_tests.py" -p "$PROGRAM"
|
File diff suppressed because it is too large
File diff suppressed because it is too large
|
@ -0,0 +1,64 @@
|
|||
The CommonMark spec (spec.txt) and DTD (CommonMark.dtd) are
|
||||
|
||||
Copyright (C) 2014-16 John MacFarlane
|
||||
|
||||
Released under the Creative Commons CC-BY-SA 4.0 license:
|
||||
<http://creativecommons.org/licenses/by-sa/4.0/>.
|
||||
|
||||
---
|
||||
|
||||
The test software in test/ and the programs in tools/ are
|
||||
|
||||
Copyright (c) 2014, John MacFarlane
|
||||
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
* Redistributions in binary form must reproduce the above
|
||||
copyright notice, this list of conditions and the following
|
||||
disclaimer in the documentation and/or other materials provided
|
||||
with the distribution.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
---
|
||||
|
||||
The normalization code in runtests.py was derived from the
|
||||
markdowntest project, Copyright 2013 Karl Dubost:
|
||||
|
||||
The MIT License (MIT)
|
||||
|
||||
Copyright (c) 2013 Karl Dubost
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in
|
||||
all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
||||
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
||||
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
||||
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
||||
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
@ -0,0 +1,40 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
from ctypes import CDLL, c_char_p, c_long
|
||||
from subprocess import *
|
||||
import platform
|
||||
import os
|
||||
|
||||
def pipe_through_prog(prog, text):
|
||||
p1 = Popen(prog.split(), stdout=PIPE, stdin=PIPE, stderr=PIPE)
|
||||
[result, err] = p1.communicate(input=text.encode('utf-8'))
|
||||
return [p1.returncode, result.decode('utf-8'), err]
|
||||
|
||||
def use_library(lib, text):
|
||||
textbytes = text.encode('utf-8')
|
||||
textlen = len(textbytes)
|
||||
return [0, lib(textbytes, textlen, 0).decode('utf-8'), '']
|
||||
|
||||
class CMark:
|
||||
def __init__(self, prog=None, library_dir=None):
|
||||
self.prog = prog
|
||||
if prog:
|
||||
self.to_html = lambda x: pipe_through_prog(prog, x)
|
||||
else:
|
||||
sysname = platform.system()
|
||||
if sysname == 'Darwin':
|
||||
libname = "libcmark.dylib"
|
||||
elif sysname == 'Windows':
|
||||
libname = "cmark.dll"
|
||||
else:
|
||||
libname = "libcmark.so"
|
||||
if library_dir:
|
||||
libpath = os.path.join(library_dir, libname)
|
||||
else:
|
||||
libpath = os.path.join("build", "src", libname)
|
||||
cmark = CDLL(libpath)
|
||||
markdown = cmark.cmark_markdown_to_html
|
||||
markdown.restype = c_char_p
|
||||
markdown.argtypes = [c_char_p, c_long]
|
||||
self.to_html = lambda x: use_library(markdown, x)
|
|
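`pipe_through_prog()` above feeds the Markdown text to the program's standard input and returns `[returncode, stdout, stderr]`. The sketch below shows the same pattern with one deliberate difference: it takes an argument list instead of splitting a command string, so arguments containing spaces survive. The name `pipe_through_argv` and the upper-casing child process are illustrative assumptions; in the test suite the program would be `md2html`.

```python
import sys
from subprocess import PIPE, Popen


def pipe_through_argv(argv, text):
    """Send 'text' to the stdin of 'argv'; return [returncode, stdout, stderr]."""
    proc = Popen(argv, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate(input=text.encode("utf-8"))
    # stdout is decoded for comparison; stderr is left as raw bytes.
    return [proc.returncode, out.decode("utf-8"), err]


# Stand-in for a converter: a child Python process that upper-cases stdin.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
result = pipe_through_argv(upper, "hello")
print(result[0], result[1])
```

A non-zero first element signals that the program itself failed, which is different from producing unexpected HTML.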
@ -0,0 +1,464 @@
|
|||
|
||||
# Coverage
|
||||
|
||||
This file is just a collection of unit tests not covered elsewhere.
|
||||
|
||||
Most notably regression tests, tests improving code coverage and other useful
|
||||
things may drop here.
|
||||
|
||||
(However, any tests requiring an additional command line option, like enabling
|
||||
an extension, must be included in their respective files.)
|
||||
|
||||
|
||||
## GitHub Issues
|
||||
|
||||
### [Issue 2](https://github.com/mity/md4c/issues/2)
|
||||
|
||||
Raw HTML block:
|
||||
|
||||
```````````````````````````````` example
|
||||
<gi att1=tok1 att2=tok2>
|
||||
.
|
||||
<gi att1=tok1 att2=tok2>
|
||||
````````````````````````````````
|
||||
|
||||
Inline:
|
||||
|
||||
```````````````````````````````` example
|
||||
foo <gi att1=tok1 att2=tok2> bar
|
||||
.
|
||||
<p>foo <gi att1=tok1 att2=tok2> bar</p>
|
||||
````````````````````````````````
|
||||
|
||||
Inline with a line break:
|
||||
|
||||
```````````````````````````````` example
|
||||
foo <gi att1=tok1
|
||||
att2=tok2> bar
|
||||
.
|
||||
<p>foo <gi att1=tok1
|
||||
att2=tok2> bar</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 4](https://github.com/mity/md4c/issues/4)
|
||||
|
||||
```````````````````````````````` example
|
||||
![alt text with *entity* ©](img.png 'title')
|
||||
.
|
||||
<p><img src="img.png" alt="alt text with entity ©" title="title"></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 9](https://github.com/mity/md4c/issues/9)
|
||||
|
||||
```````````````````````````````` example
|
||||
> [foo
|
||||
> bar]: /url
|
||||
>
|
||||
> [foo bar]
|
||||
.
|
||||
<blockquote>
|
||||
<p><a href="/url">foo
|
||||
bar</a></p>
|
||||
</blockquote>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 10](https://github.com/mity/md4c/issues/10)
|
||||
|
||||
```````````````````````````````` example
|
||||
[x]:
|
||||
x
|
||||
- <?
|
||||
|
||||
x
|
||||
.
|
||||
<ul>
|
||||
<li><?
|
||||
|
||||
x
|
||||
</li>
|
||||
</ul>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 11](https://github.com/mity/md4c/issues/11)
|
||||
|
||||
```````````````````````````````` example
|
||||
x [link](/url "foo – bar") x
|
||||
.
|
||||
<p>x <a href="/url" title="foo – bar">link</a> x</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 14](https://github.com/mity/md4c/issues/14)
|
||||
|
||||
```````````````````````````````` example
|
||||
a***b* c*
|
||||
.
|
||||
<p>a*<em><em>b</em> c</em></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 15](https://github.com/mity/md4c/issues/15)
|
||||
|
||||
```````````````````````````````` example
|
||||
***b* c*
|
||||
.
|
||||
<p>*<em><em>b</em> c</em></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 21](https://github.com/mity/md4c/issues/21)
|
||||
|
||||
```````````````````````````````` example
|
||||
a*b**c*
|
||||
.
|
||||
<p>a<em>b**c</em></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 33](https://github.com/mity/md4c/issues/33)
|
||||
|
||||
```````````````````````````````` example
|
||||
```&&&&&&&&
|
||||
.
|
||||
<pre><code class="language-&&&&&&&&"></code></pre>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 36](https://github.com/mity/md4c/issues/36)
|
||||
|
||||
```````````````````````````````` example
|
||||
__x_ _x___
|
||||
.
|
||||
<p><em><em>x</em> <em>x</em></em>_</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 39](https://github.com/mity/md4c/issues/39)
|
||||
|
||||
```````````````````````````````` example
|
||||
[\\]: x
|
||||
.
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 40](https://github.com/mity/md4c/issues/40)
|
||||
|
||||
```````````````````````````````` example
|
||||
[x](url
|
||||
'title'
|
||||
)x
|
||||
.
|
||||
<p><a href="url" title="title">x</a>x</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 65](https://github.com/mity/md4c/issues/65)
|
||||
|
||||
```````````````````````````````` example
|
||||
`
|
||||
.
|
||||
<p>`</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 74](https://github.com/mity/md4c/issues/74)
|
||||
|
||||
```````````````````````````````` example
|
||||
[f]:
|
||||
-
|
||||
xx
|
||||
-
|
||||
.
|
||||
<pre><code>xx
|
||||
</code></pre>
|
||||
<ul>
|
||||
<li></li>
|
||||
</ul>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 78](https://github.com/mity/md4c/issues/78)
|
||||
|
||||
```````````````````````````````` example
|
||||
[SS ẞ]: /url
|
||||
[ẞ SS]
|
||||
.
|
||||
<p><a href="/url">ẞ SS</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 83](https://github.com/mity/md4c/issues/83)
|
||||
|
||||
```````````````````````````````` example
|
||||
foo
|
||||
>
|
||||
.
|
||||
<p>foo</p>
|
||||
<blockquote>
|
||||
</blockquote>
|
||||
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 95](https://github.com/mity/md4c/issues/95)
|
||||
|
||||
```````````````````````````````` example
|
||||
. foo
|
||||
.
|
||||
<p>. foo</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 96](https://github.com/mity/md4c/issues/96)
|
||||
|
||||
```````````````````````````````` example
|
||||
[ab]: /foo
|
||||
[a] [ab] [abc]
|
||||
.
|
||||
<p>[a] <a href="/foo">ab</a> [abc]</p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
[a b]: /foo
|
||||
[a b]
|
||||
.
|
||||
<p><a href="/foo">a b</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 97](https://github.com/mity/md4c/issues/97)
|
||||
|
||||
```````````````````````````````` example
|
||||
*a **b c* d**
|
||||
.
|
||||
<p><em>a <em><em>b c</em> d</em></em></p>
|
||||
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 100](https://github.com/mity/md4c/issues/100)
|
||||
|
||||
```````````````````````````````` example
|
||||
<foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123>
|
||||
.
|
||||
<p><a href="mailto:foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123">foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
<foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123>
|
||||
.
|
||||
<p><foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123></p>
|
||||
````````````````````````````````
|
||||
(Note the `x` here, which pushes the first domain label over the maximum allowed length.)
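The two Issue 100 examples differ only in the length of the first domain label: 63 characters in the first one, 64 in the second. The sketch below checks that length rule alone, assuming the 63-character per-label cap from the CommonMark e-mail autolink grammar; `labels_within_limit` is a hypothetical helper and does not validate anything else about the address.

```python
MAX_LABEL_LEN = 63  # assumed per-label cap from the CommonMark autolink grammar


def labels_within_limit(domain):
    """True if every dot-separated label has between 1 and 63 characters."""
    return all(1 <= len(label) <= MAX_LABEL_LEN for label in domain.split("."))


label63 = "1234567890" * 6 + "123"        # 63 characters, as in the tests
ok_domain = label63 + "." + label63       # first example: still an autolink
bad_domain = label63 + "x." + label63     # second example: one char too long

print(labels_within_limit(ok_domain), labels_within_limit(bad_domain))
```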
|
||||
|
||||
|
||||
### [Issue 107](https://github.com/mity/md4c/issues/107)
|
||||
|
||||
```````````````````````````````` example
|
||||
***foo *bar baz***
|
||||
.
|
||||
<p>*<strong>foo <em>bar baz</em></strong></p>
|
||||
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
## Code coverage
|
||||
|
||||
### `md_is_unicode_whitespace__()`
|
||||
|
||||
Unicode whitespace (here U+2000) forms a word boundary so these cannot be
|
||||
resolved as emphasis span because there is no closer mark.
|
||||
|
||||
```````````````````````````````` example
|
||||
*foo *bar
|
||||
.
|
||||
<p>*foo *bar</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### `md_is_unicode_punct__()`
|
||||
|
||||
Ditto for Unicode punctuation (here U+00A1).
|
||||
|
||||
```````````````````````````````` example
|
||||
*foo¡*bar
|
||||
.
|
||||
<p>*foo¡*bar</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### `md_get_unicode_fold_info()`
|
||||
|
||||
```````````````````````````````` example
|
||||
[Příliš žluťoučký kůň úpěl ďábelské ódy.]
|
||||
|
||||
[PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY.]: /url
|
||||
.
|
||||
<p><a href="/url">Příliš žluťoučký kůň úpěl ďábelské ódy.</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### `md_decode_utf8__()` and `md_decode_utf8_before__()`
|
||||
|
||||
```````````````````````````````` example
|
||||
á*Á (U+00E1, i.e. two byte UTF-8 sequence)
|
||||
* (U+2000, i.e. three byte UTF-8 sequence)
|
||||
.
|
||||
<p>á*Á (U+00E1, i.e. two byte UTF-8 sequence)
|
||||
* (U+2000, i.e. three byte UTF-8 sequence)</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### `md_is_link_destination_A()`
|
||||
|
||||
```````````````````````````````` example
|
||||
[link](</url\.with\.escape>)
|
||||
.
|
||||
<p><a href="/url.with.escape">link</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### `md_link_label_eq()`
|
||||
|
||||
```````````````````````````````` example
|
||||
[foo bar]
|
||||
|
||||
[foo bar]: /url
|
||||
.
|
||||
<p><a href="/url">foo bar</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### `md_is_inline_link_spec()`
|
||||
|
||||
```````````````````````````````` example
|
||||
> [link](/url 'foo
|
||||
> bar')
|
||||
.
|
||||
<blockquote>
|
||||
<p><a href="/url" title="foo
|
||||
bar">link</a></p>
|
||||
</blockquote>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### `md_build_ref_def_hashtable()`
|
||||
|
||||
All link labels in the following example have the same FNV1a hash (after
|
||||
normalization of the label, which means after converting to a vector of Unicode
|
||||
codepoints and lowercase folding).
|
||||
|
||||
So the example triggers quite complex code paths which are not otherwise easily
|
||||
tested.
|
||||
|
||||
```````````````````````````````` example
|
||||
[foo]: /foo
|
||||
[qnptgbh]: /qnptgbh
|
||||
[abgbrwcv]: /abgbrwcv
|
||||
[abgbrwcv]: /abgbrwcv2
|
||||
[abgbrwcv]: /abgbrwcv3
|
||||
[abgbrwcv]: /abgbrwcv4
|
||||
[alqadfgn]: /alqadfgn
|
||||
|
||||
[foo]
|
||||
[qnptgbh]
|
||||
[abgbrwcv]
|
||||
[alqadfgn]
|
||||
[axgydtdu]
|
||||
.
|
||||
<p><a href="/foo">foo</a>
|
||||
<a href="/qnptgbh">qnptgbh</a>
|
||||
<a href="/abgbrwcv">abgbrwcv</a>
|
||||
<a href="/alqadfgn">alqadfgn</a>
|
||||
[axgydtdu]</p>
|
||||
````````````````````````````````
|
||||
|
||||
For the sake of completeness, the following C program was used to find the hash
|
||||
collisions by brute force:
|
||||
|
||||
~~~
|
||||
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
|
||||
|
||||
static unsigned etalon;
|
||||
|
||||
|
||||
|
||||
#define MD_FNV1A_BASE 2166136261
|
||||
#define MD_FNV1A_PRIME 16777619
|
||||
|
||||
static inline unsigned
|
||||
fnv1a(unsigned base, const void* data, size_t n)
|
||||
{
|
||||
const unsigned char* buf = (const unsigned char*) data;
|
||||
unsigned hash = base;
|
||||
size_t i;
|
||||
|
||||
for(i = 0; i < n; i++) {
|
||||
hash ^= buf[i];
|
||||
hash *= MD_FNV1A_PRIME;
|
||||
}
|
||||
|
||||
return hash;
|
||||
}
|
||||
|
||||
|
||||
static unsigned
|
||||
unicode_hash(const char* data, size_t n)
|
||||
{
|
||||
unsigned value;
|
||||
unsigned hash = MD_FNV1A_BASE;
|
||||
int i;
|
||||
|
||||
for(i = 0; i < n; i++) {
|
||||
value = data[i];
|
||||
hash = fnv1a(hash, &value, sizeof(unsigned));
|
||||
}
|
||||
|
||||
return hash;
|
||||
}
|
||||
|
||||
|
||||
static void
|
||||
recurse(char* buffer, size_t off, size_t len)
|
||||
{
|
||||
int ch;
|
||||
|
||||
if(off < len - 1) {
|
||||
for(ch = 'a'; ch <= 'z'; ch++) {
|
||||
buffer[off] = ch;
|
||||
recurse(buffer, off+1, len);
|
||||
}
|
||||
} else {
|
||||
for(ch = 'a'; ch <= 'z'; ch++) {
|
||||
buffer[off] = ch;
|
||||
if(unicode_hash(buffer, len) == etalon) {
|
||||
printf("Dup: %.*s\n", (int)len, buffer);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
main(int argc, char** argv)
|
||||
{
|
||||
char buffer[32];
|
||||
int len;
|
||||
|
||||
if(argc < 2)
|
||||
etalon = unicode_hash("foo", 3);
|
||||
else
|
||||
etalon = unicode_hash(argv[1], strlen(argv[1]));
|
||||
|
||||
for(len = 1; len <= sizeof(buffer); len++)
|
||||
recurse(buffer, 0, len);
|
||||
|
||||
return 0;
|
||||
}
|
||||
~~~
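
The hashing done by the C program above can also be sketched in Python. This is a minimal port under stated assumptions: `unsigned` is 32 bits wide and the machine is little-endian, so every character is fed to FNV-1a as four bytes. The names `fnv1a` and `unicode_hash` simply mirror the C functions and are not part of any MD4C API.

```python
FNV1A_BASE = 2166136261
FNV1A_PRIME = 16777619
MASK32 = 0xFFFFFFFF


def fnv1a(base, data):
    """32-bit FNV-1a over a bytes object, starting from `base`."""
    h = base
    for byte in data:
        h ^= byte
        h = (h * FNV1A_PRIME) & MASK32
    return h


def unicode_hash(label):
    """Hash a label one codepoint at a time, as the C program does.

    Assumption: each codepoint is widened to a 32-bit little-endian value.
    """
    h = FNV1A_BASE
    for ch in label:
        h = fnv1a(h, ord(ch).to_bytes(4, "little"))
    return h


if __name__ == "__main__":
    # Labels taken from the example above.
    for label in ("foo", "qnptgbh", "abgbrwcv", "alqadfgn", "axgydtdu"):
        print(label, hex(unicode_hash(label)))
```

If the port is faithful, the labels from the example should all print the same value; on a big-endian machine the C program would feed the bytes in a different order, so the results there may differ.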
|
|
@ -0,0 +1,41 @@
|
|||
|
||||
# h1
|
||||
## h2
|
||||
### h3
|
||||
#### h4
|
||||
##### h5
|
||||
###### h6
|
||||
|
||||
h1
|
||||
==
|
||||
|
||||
h2
|
||||
--
|
||||
|
||||
--------------------
|
||||
|
||||
indented code
|
||||
|
||||
```
|
||||
fenced code
|
||||
```
|
||||
|
||||
<tag attr='val' attr2="val2">
|
||||
|
||||
> quote
|
||||
|
||||
* list item
|
||||
1. list item
|
||||
|
||||
[ref]: /url
|
||||
|
||||
paragraph
|
||||
&copy; &#1234; &#xabcd;
|
||||
`code`
|
||||
*emph* **strong** ***strong emph***
|
||||
_emph_ __strong__ ___strong emph___
|
||||
[ref] [ref][] [link](/url)
|
||||
![ref] ![ref][] ![img](/url)
|
||||
<http://example.com> <doe@example.com>
|
||||
www.example.com doe@example.com
|
||||
\\ \* \. \` \
|
|
@ -0,0 +1,8 @@
|
|||
* [ ] unchecked
|
||||
* [x] checked
|
||||
|
||||
A | B | C
|
||||
---|--:|:-:
|
||||
aaa|bbb|ccc
|
||||
|
||||
~del~ ~~del~~
|
|
@ -0,0 +1 @@
|
|||
$a^2+b^2=c^2$ $$a^2+b^2=c^2$$
|
|
@ -0,0 +1 @@
|
|||
[[wiki]] [[wiki|label]]
|
|
@ -0,0 +1,39 @@
|
|||
|
||||
# LaTeX Math
|
||||
|
||||
With the flag `MD_FLAG_LATEXMATHSPANS`, MD4C enables extension for recognition
|
||||
of LaTeX style math spans.
|
||||
|
||||
A math span is any text wrapped in dollars or double dollars (`$...$` or
|
||||
`$$...$$`).
|
||||
|
||||
```````````````````````````````` example
|
||||
$a+b=c$ Hello, world!
|
||||
.
|
||||
<p><x-equation>a+b=c</x-equation> Hello, world!</p>
|
||||
````````````````````````````````
|
||||
|
||||
If the double dollar sign is used, the math span is a display math span.
|
||||
|
||||
```````````````````````````````` example
|
||||
This is a display equation: $$\int_a^b x dx$$.
|
||||
.
|
||||
<p>This is a display equation: <x-equation type="display">\int_a^b x dx</x-equation>.</p>
|
||||
````````````````````````````````
|
||||
|
||||
Math spans may span multiple lines as they are normal spans:
|
||||
|
||||
```````````````````````````````` example
|
||||
$$
|
||||
\int_a^b
|
||||
f(x) dx
|
||||
$$
|
||||
.
|
||||
<p><x-equation type="display">\int_a^b f(x) dx </x-equation></p>
|
||||
````````````````````````````````
|
||||
|
||||
Note though that many (simple) renderers may output the math spans just as
|
||||
verbatim text. (This includes the HTML renderer used by the `md2html` utility.)
|
||||
|
||||
Only advanced renderers which implement LaTeX math syntax can be expected to
|
||||
provide better results.
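
As a rough illustration of the span syntax described above, the following Python sketch wraps `$...$` and `$$...$$` in the tags used by the expected outputs. It is a deliberate simplification, not MD4C's parser: the names `MATH_RE` and `render_math` are hypothetical, and escapes, code spans and the handling of line breaks inside a span are all ignored.

```python
import re

# The double-dollar form is tried first so that `$$...$$` is not read as
# two inline spans.
MATH_RE = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def render_math(text):
    """Replace math spans with <x-equation> elements (simplified)."""
    def repl(m):
        if m.group(1) is not None:
            return '<x-equation type="display">' + m.group(1) + "</x-equation>"
        return "<x-equation>" + m.group(2) + "</x-equation>"

    return MATH_RE.sub(repl, text)
```

For the first example in this file, `render_math("$a+b=c$ Hello, world!")` yields the inline form, while a span delimited by double dollars gets the `type="display"` attribute.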
|
|
@ -0,0 +1,194 @@
|
|||
# -*- coding: utf-8 -*-
|
||||
from html.parser import HTMLParser
|
||||
import urllib
|
||||
|
||||
try:
|
||||
from html.parser import HTMLParseError
|
||||
except ImportError:
|
||||
# HTMLParseError was removed in Python 3.5. It could never be
|
||||
# thrown, so we define a placeholder instead.
|
||||
class HTMLParseError(Exception):
|
||||
pass
|
||||
|
||||
from html.entities import name2codepoint
|
||||
import sys
|
||||
import re
|
||||
import cgi
|
||||
|
||||
# Normalization code, adapted from
|
||||
# https://github.com/karlcow/markdown-testsuite/
|
||||
significant_attrs = ["alt", "href", "src", "title"]
|
||||
whitespace_re = re.compile(r'\s+')
|
||||
class MyHTMLParser(HTMLParser):
|
||||
def __init__(self):
|
||||
HTMLParser.__init__(self)
|
||||
self.convert_charrefs = False
|
||||
self.last = "starttag"
|
||||
self.in_pre = False
|
||||
self.output = ""
|
||||
self.last_tag = ""
|
||||
def handle_data(self, data):
|
||||
after_tag = self.last == "endtag" or self.last == "starttag"
|
||||
after_block_tag = after_tag and self.is_block_tag(self.last_tag)
|
||||
if after_tag and self.last_tag == "br":
|
||||
data = data.lstrip('\n')
|
||||
if not self.in_pre:
|
||||
data = whitespace_re.sub(' ', data)
|
||||
if after_block_tag and not self.in_pre:
|
||||
if self.last == "starttag":
|
||||
data = data.lstrip()
|
||||
elif self.last == "endtag":
|
||||
data = data.strip()
|
||||
self.output += data
|
||||
self.last = "data"
|
||||
def handle_endtag(self, tag):
|
||||
if tag == "pre":
|
||||
self.in_pre = False
|
||||
elif self.is_block_tag(tag):
|
||||
self.output = self.output.rstrip()
|
||||
self.output += "</" + tag + ">"
|
||||
self.last_tag = tag
|
||||
self.last = "endtag"
|
||||
def handle_starttag(self, tag, attrs):
|
||||
if tag == "pre":
|
||||
self.in_pre = True
|
||||
if self.is_block_tag(tag):
|
||||
self.output = self.output.rstrip()
|
||||
self.output += "<" + tag
|
||||
# For now we don't strip out 'extra' attributes, because of
|
||||
# raw HTML test cases.
|
||||
# attrs = filter(lambda attr: attr[0] in significant_attrs, attrs)
|
||||
if attrs:
|
||||
attrs.sort()
|
||||
for (k,v) in attrs:
|
||||
self.output += " " + k
|
||||
if v in ['href','src']:
|
||||
self.output += ("=" + '"' +
|
||||
urllib.quote(urllib.unquote(v), safe='/') + '"')
|
||||
elif v != None:
|
||||
self.output += ("=" + '"' + cgi.escape(v,quote=True) + '"')
|
||||
self.output += ">"
|
||||
self.last_tag = tag
|
||||
self.last = "starttag"
|
||||
def handle_startendtag(self, tag, attrs):
|
||||
"""Ignore closing tag for self-closing """
|
||||
self.handle_starttag(tag, attrs)
|
||||
self.last_tag = tag
|
||||
self.last = "endtag"
|
||||
def handle_comment(self, data):
|
||||
self.output += '<!--' + data + '-->'
|
||||
self.last = "comment"
|
||||
def handle_decl(self, data):
|
||||
self.output += '<!' + data + '>'
|
||||
self.last = "decl"
|
||||
def unknown_decl(self, data):
|
||||
self.output += '<!' + data + '>'
|
||||
self.last = "decl"
|
||||
def handle_pi(self,data):
|
||||
self.output += '<?' + data + '>'
|
||||
self.last = "pi"
|
||||
def handle_entityref(self, name):
|
||||
try:
|
||||
c = chr(name2codepoint[name])
|
||||
except KeyError:
|
||||
c = None
|
||||
self.output_char(c, '&' + name + ';')
|
||||
self.last = "ref"
|
||||
def handle_charref(self, name):
|
||||
try:
|
||||
if name.startswith("x"):
|
||||
c = chr(int(name[1:], 16))
|
||||
else:
|
||||
c = chr(int(name))
|
||||
except ValueError:
|
||||
c = None
|
||||
self.output_char(c, '&' + name + ';')
|
||||
self.last = "ref"
|
||||
# Helpers.
|
||||
def output_char(self, c, fallback):
|
||||
if c == '<':
|
||||
self.output += "&lt;"
|
||||
elif c == '>':
|
||||
self.output += "&gt;"
|
||||
elif c == '&':
|
||||
self.output += "&amp;"
|
||||
elif c == '"':
|
||||
self.output += "&quot;"
|
||||
elif c == None:
|
||||
self.output += fallback
|
||||
else:
|
||||
self.output += c
|
||||
|
||||
def is_block_tag(self,tag):
|
||||
return (tag in ['article', 'header', 'aside', 'hgroup', 'blockquote',
|
||||
'hr', 'iframe', 'body', 'li', 'map', 'button', 'object', 'canvas',
|
||||
'ol', 'caption', 'output', 'col', 'p', 'colgroup', 'pre', 'dd',
|
||||
'progress', 'div', 'section', 'dl', 'table', 'td', 'dt',
|
||||
'tbody', 'embed', 'textarea', 'fieldset', 'tfoot', 'figcaption',
|
||||
'th', 'figure', 'thead', 'footer', 'tr', 'form', 'ul',
|
||||
'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'video', 'script', 'style'])
|
||||
|
||||
def normalize_html(html):
|
||||
r"""
|
||||
Return normalized form of HTML which ignores insignificant output
|
||||
differences:
|
||||
|
||||
Multiple inner whitespaces are collapsed to a single space (except
|
||||
in pre tags):
|
||||
|
||||
>>> normalize_html("<p>a \t b</p>")
|
||||
'<p>a b</p>'
|
||||
|
||||
>>> normalize_html("<p>a \t\nb</p>")
|
||||
'<p>a b</p>'
|
||||
|
||||
* Whitespace surrounding block-level tags is removed.
|
||||
|
||||
>>> normalize_html("<p>a b</p>")
|
||||
'<p>a b</p>'
|
||||
|
||||
>>> normalize_html(" <p>a b</p>")
|
||||
'<p>a b</p>'
|
||||
|
||||
>>> normalize_html("<p>a b</p> ")
|
||||
'<p>a b</p>'
|
||||
|
||||
>>> normalize_html("\n\t<p>\n\t\ta b\t\t</p>\n\t")
|
||||
'<p>a b</p>'
|
||||
|
||||
>>> normalize_html("<i>a b</i> ")
|
||||
'<i>a b</i> '
|
||||
|
||||
* Self-closing tags are converted to open tags.
|
||||
|
||||
>>> normalize_html("<br />")
|
||||
'<br>'
|
||||
|
||||
* Attributes are sorted and lowercased.
|
||||
|
||||
>>> normalize_html('<a title="bar" HREF="foo">x</a>')
|
||||
'<a href="foo" title="bar">x</a>'
|
||||
|
||||
* References are converted to unicode, except that '<', '>', '&', and
|
||||
'"' are rendered using entities.
|
||||
|
||||
>>> normalize_html("&forall;&amp;&gt;&lt;&quot;")
|
||||
'\u2200&amp;&gt;&lt;&quot;'
|
||||
|
||||
"""
|
||||
html_chunk_re = re.compile("(\<!\[CDATA\[.*?\]\]\>|\<[^>]*\>|[^<]+)")
|
||||
try:
|
||||
parser = MyHTMLParser()
|
||||
# We work around HTMLParser's limitations parsing CDATA
|
||||
# by breaking the input into chunks and passing CDATA chunks
|
||||
# through verbatim.
|
||||
for chunk in re.finditer(html_chunk_re, html):
|
||||
if chunk.group(0)[:8] == "<![CDATA":
|
||||
parser.output += chunk.group(0)
|
||||
else:
|
||||
parser.feed(chunk.group(0))
|
||||
parser.close()
|
||||
return parser.output
|
||||
except HTMLParseError as e:
|
||||
sys.stderr.write("Normalization error: " + e.msg + "\n")
|
||||
return html # on error, return unnormalized HTML
|
|
@ -0,0 +1,122 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
import re
|
||||
import argparse
|
||||
import sys
|
||||
import platform
|
||||
from cmark import CMark
|
||||
from timeit import default_timer as timer
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description='Run cmark tests.')
|
||||
parser.add_argument('-p', '--program', dest='program', nargs='?', default=None,
|
||||
help='program to test')
|
||||
parser.add_argument('--library-dir', dest='library_dir', nargs='?',
|
||||
default=None, help='directory containing dynamic library')
|
||||
args = parser.parse_args(sys.argv[1:])
|
||||
|
||||
cmark = CMark(prog=args.program, library_dir=args.library_dir)
|
||||
|
||||
# list of pairs consisting of input and a regex that must match the output.
|
||||
pathological = {
|
||||
# note - some pythons have limit of 65535 for {num-matches} in re.
|
||||
"nested strong emph":
|
||||
(("*a **a " * 65000) + "b" + (" a** a*" * 65000),
|
||||
re.compile("(<em>a <strong>a ){65000}b( a</strong> a</em>){65000}")),
|
||||
"many emph closers with no openers":
|
||||
(("a_ " * 65000),
|
||||
re.compile("(a[_] ){64999}a_")),
|
||||
"many emph openers with no closers":
|
||||
(("_a " * 65000),
|
||||
re.compile("(_a ){64999}_a")),
|
||||
"many 3-emph openers with no closers":
|
||||
(("a***" * 65000),
|
||||
re.compile("(a<em><strong>a</strong></em>){32500}")),
|
||||
"many link closers with no openers":
|
||||
(("a]" * 65000),
|
||||
re.compile("(a\]){65000}")),
|
||||
"many link openers with no closers":
|
||||
(("[a" * 65000),
|
||||
re.compile("(\[a){65000}")),
|
||||
"mismatched openers and closers":
|
||||
(("*a_ " * 50000),
|
||||
re.compile("([*]a[_] ){49999}[*]a_")),
|
||||
"openers and closers multiple of 3":
|
||||
(("a**b" + ("c* " * 50000)),
|
||||
re.compile("a[*][*]b(c[*] ){49999}c[*]")),
|
||||
"link openers and emph closers":
|
||||
(("[ a_" * 50000),
|
||||
re.compile("(\[ a_){50000}")),
|
||||
"hard link/emph case":
|
||||
("**x [a*b**c*](d)",
|
||||
re.compile("\\*\\*x <a href=\"d\">a<em>b\\*\\*c</em></a>")),
|
||||
"nested brackets":
|
||||
(("[" * 50000) + "a" + ("]" * 50000),
|
||||
re.compile("\[{50000}a\]{50000}")),
|
||||
"nested block quotes":
|
||||
((("> " * 50000) + "a"),
|
||||
re.compile("(<blockquote>\r?\n){50000}")),
|
||||
"U+0000 in input":
|
||||
("abc\u0000de\u0000",
|
||||
re.compile("abc\ufffd?de\ufffd?")),
|
||||
"backticks":
|
||||
("".join(map(lambda x: ("e" + "`" * x), range(1,1000))),
|
||||
re.compile("^<p>[e`]*</p>\r?\n$")),
|
||||
"many links":
|
||||
("[t](/u) " * 50000,
|
||||
re.compile("(<a href=\"/u\">t</a> ?){50000}")),
|
||||
"many references":
|
||||
("".join(map(lambda x: ("[" + str(x) + "]: u\n"), range(1,20000 * 16))) + "[0] " * 20000,
|
||||
re.compile("(\[0\] ){19999}")),
|
||||
"deeply nested lists":
|
||||
("".join(map(lambda x: (" " * x + "* a\n"), range(0,1000))),
|
||||
re.compile("<ul>\r?\n(<li>a<ul>\r?\n){999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){999}")),
|
||||
"many html openers and closers":
|
||||
(("<>" * 50000),
|
||||
re.compile("(&lt;&gt;){50000}")),
|
||||
"many html proc. inst. openers":
|
||||
(("x" + "<?" * 50000),
|
||||
re.compile("x(&lt;\\?){50000}")),
|
||||
"many html CDATA openers":
|
||||
(("x" + "<![CDATA[" * 50000),
|
||||
re.compile("x(&lt;!\\[CDATA\\[){50000}")),
|
||||
"many backticks and escapes":
|
||||
(("\\``" * 50000),
|
||||
re.compile("(``){50000}")),
|
||||
"many broken link titles":
|
||||
(("[ (](" * 50000),
|
||||
re.compile("(\[ \(\]\(){50000}")),
|
||||
"broken thematic break":
|
||||
(("* " * 50000 + "a"),
|
||||
re.compile("<ul>\r?\n(<li><ul>\r?\n){49999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){49999}"))
|
||||
}
|
||||
|
||||
whitespace_re = re.compile(r'\s+')
|
||||
passed = 0
|
||||
errored = 0
|
||||
failed = 0
|
||||
|
||||
#print("Testing pathological cases:")
|
||||
for description in pathological:
|
||||
(inp, regex) = pathological[description]
|
||||
start = timer()
|
||||
[rc, actual, err] = cmark.to_html(inp)
|
||||
end = timer()
|
||||
if rc != 0:
|
||||
errored += 1
|
||||
print('{:35} [ERRORED (return code {:d})]'.format(description, rc))
|
||||
print(err)
|
||||
elif regex.search(actual):
|
||||
print('{:35} [PASSED] {:.3f} secs'.format(description, end-start))
|
||||
passed += 1
|
||||
else:
|
||||
print('{:35} [FAILED]'.format(description))
|
||||
print(repr(actual))
|
||||
failed += 1
|
||||
|
||||
print("%d passed, %d failed, %d errored" % (passed, failed, errored))
|
||||
if (failed == 0 and errored == 0):
|
||||
exit(0)
|
||||
else:
|
||||
exit(1)
|
|
@ -0,0 +1,50 @@
|
|||
|
||||
# Permissive E-mail Autolinks
|
||||
|
||||
With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, MD4C enables more permissive
|
||||
recognition of e-mail addresses and transforms them to autolinks, even if they
|
||||
do not exactly follow the syntax of autolink as specified in CommonMark
|
||||
specification.
|
||||
|
||||
This is a standard CommonMark e-mail autolink:
|
||||
|
||||
```````````````````````````````` example
|
||||
E-mail: <mailto:john.doe@gmail.com>
|
||||
.
|
||||
<p>E-mail: <a href="mailto:john.doe@gmail.com">mailto:john.doe@gmail.com</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
With the permissive autolinks enabled, this is sufficient:
|
||||
|
||||
```````````````````````````````` example
|
||||
E-mail: john.doe@gmail.com
|
||||
.
|
||||
<p>E-mail: <a href="mailto:john.doe@gmail.com">john.doe@gmail.com</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
`+` can occur before the `@`, but not after.
|
||||
|
||||
```````````````````````````````` example
|
||||
hello@mail+xyz.example isn't valid, but hello+xyz@mail.example is.
|
||||
.
|
||||
<p>hello@mail+xyz.example isn't valid, but <a href="mailto:hello+xyz@mail.example">hello+xyz@mail.example</a> is.</p>
|
||||
````````````````````````````````
|
||||
|
||||
`.`, `-`, and `_` can occur on both sides of the `@`, but only `.` may occur at
|
||||
the end of the email address, in which case it will not be considered part of
|
||||
the address:
|
||||
|
||||
```````````````````````````````` example
|
||||
a.b-c_d@a.b
|
||||
|
||||
a.b-c_d@a.b.
|
||||
|
||||
a.b-c_d@a.b-
|
||||
|
||||
a.b-c_d@a.b_
|
||||
.
|
||||
<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a></p>
|
||||
<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a>.</p>
|
||||
<p>a.b-c_d@a.b-</p>
|
||||
<p>a.b-c_d@a.b_</p>
|
||||
````````````````````````````````
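
The rules illustrated above can be approximated with a short Python sketch. It is only an approximation under stated assumptions, not MD4C's implementation: the names `CANDIDATE_RE` and `find_email_autolink` are hypothetical, only ASCII letters and digits are considered, and the surrounding context (for example what may precede the address) is not checked.

```python
import re

# The local part may contain `+`; the domain part may not.
CANDIDATE_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9._-]+")


def find_email_autolink(text):
    """Return the first permissive e-mail autolink in `text`, or None."""
    for m in CANDIDATE_RE.finditer(text):
        # A `+` directly after the domain means the domain runs into a
        # character that is not allowed there, so skip this candidate.
        if text[m.end():m.end() + 1] == "+":
            continue
        # Trailing periods are not considered part of the address.
        addr = m.group(0).rstrip(".")
        # The address must end in a letter or digit (not `-`, `_` or `@`).
        if not addr[-1].isalnum():
            continue
        return addr
    return None
```

With the inputs from the last example, `a.b-c_d@a.b.` gives `a.b-c_d@a.b`, while `a.b-c_d@a.b-` and `a.b-c_d@a.b_` give `None`.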
|
|
@ -0,0 +1,92 @@
|
|||
|
||||
# Permissive URL Autolinks
|
||||
|
||||
With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, MD4C enables more permissive recognition
|
||||
of URLs and transforms them to autolinks, even if they do not exactly follow the syntax
|
||||
of autolink as specified in CommonMark specification.
|
||||
|
||||
This is a standard CommonMark autolink:
|
||||
|
||||
```````````````````````````````` example
|
||||
Homepage: <https://github.com/mity/md4c>
|
||||
.
|
||||
<p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
With the permissive autolinks enabled, this is sufficient:
|
||||
|
||||
```````````````````````````````` example
|
||||
Homepage: https://github.com/mity/md4c
|
||||
.
|
||||
<p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
But this permissive autolink feature can work only for very widely used URL
|
||||
schemes, in alphabetical order `ftp:`, `http:`, `https:`.
|
||||
|
||||
That's why this is not a permissive autolink:
|
||||
|
||||
```````````````````````````````` example
|
||||
ssh://root@example.com
|
||||
.
|
||||
<p>ssh://root@example.com</p>
|
||||
````````````````````````````````
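
The scheme restriction amounts to a simple prefix check, sketched here in Python. The names `PERMISSIVE_SCHEMES` and `has_permissive_scheme` are hypothetical, and the check is case-sensitive purely as an assumption of this sketch; it says nothing about how MD4C treats upper-case schemes.

```python
# Schemes listed in the text above, in alphabetical order.
PERMISSIVE_SCHEMES = ("ftp:", "http:", "https:")


def has_permissive_scheme(url):
    """True if `url` starts with one of the widely used schemes."""
    return url.startswith(PERMISSIVE_SCHEMES)
```

Under this check `https://github.com/mity/md4c` qualifies and `ssh://root@example.com` does not, matching the two examples above.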
|
||||
|
||||
The same rules for path validation as for permissive WWW autolinks apply.
|
||||
Therefore the final question mark here is not part of the autolink:
|
||||
|
||||
```````````````````````````````` example
|
||||
Have you ever visited http://www.zombo.com?
|
||||
.
|
||||
<p>Have you ever visited <a href="http://www.zombo.com">http://www.zombo.com</a>?</p>
|
||||
````````````````````````````````
|
||||
|
||||
But in contrast, in this example it is:
|
||||
|
||||
```````````````````````````````` example
|
||||
http://www.bing.com/search?q=md4c
|
||||
.
|
||||
<p><a href="http://www.bing.com/search?q=md4c">http://www.bing.com/search?q=md4c</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
And finally one complex example:
|
||||
|
||||
```````````````````````````````` example
|
||||
http://commonmark.org
|
||||
|
||||
(Visit https://encrypted.google.com/search?q=Markup+(business))
|
||||
|
||||
Anonymous FTP is available at ftp://foo.bar.baz.
|
||||
.
|
||||
<p><a href="http://commonmark.org">http://commonmark.org</a></p>
|
||||
<p>(Visit <a href="https://encrypted.google.com/search?q=Markup+(business)">https://encrypted.google.com/search?q=Markup+(business)</a>)</p>
|
||||
<p>Anonymous FTP is available at <a href="ftp://foo.bar.baz">ftp://foo.bar.baz</a>.</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
## GitHub Issues
|
||||
|
||||
### [Issue 53](https://github.com/mity/md4c/issues/53)
|
||||
|
||||
```````````````````````````````` example
|
||||
This is [link](http://github.com/).
|
||||
.
|
||||
<p>This is <a href="http://github.com/">link</a>.</p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
This is [link](http://github.com/)X
|
||||
.
|
||||
<p>This is <a href="http://github.com/">link</a>X</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 76](https://github.com/mity/md4c/issues/76)
|
||||
|
||||
```````````````````````````````` example
|
||||
*(http://example.com)*
|
||||
.
|
||||
<p><em>(<a href="http://example.com">http://example.com</a>)</em></p>
|
||||
````````````````````````````````
|
||||
|
||||
|
|
@ -0,0 +1,107 @@
|
|||
|
||||
# Permissive WWW Autolinks
|
||||
|
||||
With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, MD4C enables recognition of
|
||||
autolinks starting with `www.`, even if they do not exactly follow the syntax
|
||||
of autolink as specified in CommonMark specification.
|
||||
|
||||
These do not have to be enclosed in `<` and `>`, and they do not even need
|
||||
any preceding scheme specification.
|
||||
|
||||
The WWW autolink will be recognized when a valid domain is found.
|
||||
|
||||
A valid domain consists of the text `www.`, followed by alphanumeric characters,
|
||||
underscores (`_`), hyphens (`-`) and periods (`.`). There must be at least one
|
||||
period, and no underscores may be present in the last two segments of the domain.
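
The domain rule above can be sketched in Python as follows. This is a minimal reading of the rule, not MD4C's code: `DOMAIN_RE` and `is_valid_www_domain` are hypothetical names, only ASCII alphanumerics are accepted, and the "at least one period" requirement is taken to be satisfied by the `www.` prefix itself.

```python
import re

# `www.` followed by alphanumerics, underscores, hyphens and periods.
DOMAIN_RE = re.compile(r"www\.[A-Za-z0-9_.-]+")


def is_valid_www_domain(domain):
    """Check the domain part of a permissive WWW autolink (simplified)."""
    if not DOMAIN_RE.fullmatch(domain):
        return False
    # No underscore may appear in the last two period-separated segments.
    last_two = domain.split(".")[-2:]
    return all("_" not in segment for segment in last_two)
```

So `www.my_site.example.org` would be accepted because the underscore sits outside the last two segments, whereas `www.example.my_site` would be rejected.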
|
||||
|
||||
The scheme `http` will be inserted automatically:
|
||||
|
||||
```````````````````````````````` example
|
||||
www.commonmark.org
|
||||
.
|
||||
<p><a href="http://www.commonmark.org">www.commonmark.org</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
After a valid domain, zero or more non-space non-`<` characters may follow:
|
||||
|
||||
```````````````````````````````` example
|
||||
Visit www.commonmark.org/help for more information.
|
||||
.
|
||||
<p>Visit <a href="http://www.commonmark.org/help">www.commonmark.org/help</a> for more information.</p>
|
||||
````````````````````````````````
|
||||
|
||||
We then apply extended autolink path validation as follows:
|
||||
|
||||
Trailing punctuation (specifically, `?`, `!`, `.`, `,`, `:`, `*`, `_`, and `~`)
|
||||
will not be considered part of the autolink, though they may be included in the
|
||||
interior of the link:
|
||||
|
||||
```````````````````````````````` example
|
||||
Visit www.commonmark.org.
|
||||
|
||||
Visit www.commonmark.org/a.b.
|
||||
.
|
||||
<p>Visit <a href="http://www.commonmark.org">www.commonmark.org</a>.</p>
|
||||
<p>Visit <a href="http://www.commonmark.org/a.b">www.commonmark.org/a.b</a>.</p>
|
||||
````````````````````````````````
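
A minimal Python sketch of the trailing punctuation rule is shown below. The names `TRAILING_PUNCT` and `strip_trailing_punctuation` are hypothetical, and the sketch assumes that a whole run of trailing punctuation is dropped, which the text above does not state explicitly.

```python
# Characters that are not considered part of the autolink when trailing.
TRAILING_PUNCT = "?!.,:*_~"


def strip_trailing_punctuation(link):
    """Drop trailing punctuation; interior punctuation is left alone."""
    return link.rstrip(TRAILING_PUNCT)
```

Applied to `www.commonmark.org/a.b.`, only the final period is removed; the period inside `a.b` stays part of the link.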
|
||||
|
||||
When an autolink ends in `)`, we scan the entire autolink for the total number
|
||||
of parentheses. If there is a greater number of closing parentheses than
|
||||
opening ones, we don't consider the last character part of the autolink, in
|
||||
order to facilitate including an autolink inside a parenthesis:
|
||||
|
||||
```````````````````````````````` example
|
||||
www.google.com/search?q=Markup+(business)
|
||||
|
||||
(www.google.com/search?q=Markup+(business))
|
||||
.
|
||||
<p><a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a></p>
|
||||
<p>(<a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a>)</p>
|
||||
````````````````````````````````
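
The parenthesis check can be sketched in a few lines of Python. The function name `trim_unbalanced_paren` is hypothetical; following the wording above, the sketch removes at most the last character and makes no claim about how MD4C handles several surplus closing parentheses.

```python
def trim_unbalanced_paren(link):
    """Drop a final `)` if the link has more closing than opening parens."""
    if link.endswith(")") and link.count(")") > link.count("("):
        return link[:-1]
    return link
```

A link that does not end in `)` is returned unchanged, whatever parentheses it contains in its interior.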
|
||||
|
||||
This check is only done when the link ends in a closing parenthesis `)`, so if
|
||||
the only parentheses are in the interior of the autolink, no special rules are
|
||||
applied:
|
||||
|
||||
```````````````````````````````` example
|
||||
www.google.com/search?q=(business)+ok
|
||||
.
|
||||
<p><a href="http://www.google.com/search?q=(business)+ok">www.google.com/search?q=(business)+ok</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
If an autolink ends in a semicolon (`;`), we check to see if it appears to
|
||||
resemble an [entity reference][entity references]; if the preceding text is `&`
|
||||
followed by one or more alphanumeric characters. If so, it is excluded from
|
||||
the autolink:
|
||||
|
||||
```````````````````````````````` example
|
||||
www.google.com/search?q=commonmark&hl=en
|
||||
|
||||
www.google.com/search?q=commonmark&hl;
|
||||
.
|
||||
<p><a href="http://www.google.com/search?q=commonmark&amp;hl=en">www.google.com/search?q=commonmark&amp;hl=en</a></p>
|
||||
<p><a href="http://www.google.com/search?q=commonmark">www.google.com/search?q=commonmark</a>&amp;hl;</p>
|
||||
````````````````````````````````
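
The semicolon rule can be sketched in Python like this. `ENTITY_SUFFIX_RE` and `trim_entity_suffix` are hypothetical names, and "alphanumeric" is taken to mean ASCII letters and digits, which is an assumption of the sketch.

```python
import re

# `&`, one or more alphanumerics, and a final `;` at the end of the link.
ENTITY_SUFFIX_RE = re.compile(r"&[A-Za-z0-9]+;\Z")


def trim_entity_suffix(link):
    """Exclude a trailing entity-like suffix such as `&hl;` from the link."""
    m = ENTITY_SUFFIX_RE.search(link)
    if m:
        return link[:m.start()]
    return link
```

The link `www.google.com/search?q=commonmark&hl=en` is left as it is because it does not end in a semicolon.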
|
||||
|
||||
`<` immediately ends an autolink.
|
||||
|
||||
```````````````````````````````` example
|
||||
www.commonmark.org/he<lp
|
||||
.
|
||||
<p><a href="http://www.commonmark.org/he">www.commonmark.org/he</a>&lt;lp</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
## GitHub Issues
|
||||
|
||||
### [Issue 53](https://github.com/mity/md4c/issues/53)
|
||||
```````````````````````````````` example
|
||||
This is [link](www.github.com/).
|
||||
.
|
||||
<p>This is <a href="www.github.com/">link</a>.</p>
|
||||
````````````````````````````````
|
||||
```````````````````````````````` example
|
||||
This is [link](www.github.com/)X
|
||||
.
|
||||
<p>This is <a href="www.github.com/">link</a>X</p>
|
||||
````````````````````````````````
|
File diff suppressed because it is too large
|
@ -0,0 +1,144 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
import sys
|
||||
from difflib import unified_diff
|
||||
import argparse
|
||||
import re
|
||||
import json
|
||||
from cmark import CMark
|
||||
from normalize import normalize_html
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser(description='Run cmark tests.')
|
||||
parser.add_argument('-p', '--program', dest='program', nargs='?', default=None,
|
||||
help='program to test')
|
||||
parser.add_argument('-s', '--spec', dest='spec', nargs='?', default='spec.txt',
|
||||
help='path to spec')
|
||||
parser.add_argument('-P', '--pattern', dest='pattern', nargs='?',
|
||||
default=None, help='limit to sections matching regex pattern')
|
||||
parser.add_argument('--library-dir', dest='library_dir', nargs='?',
|
||||
default=None, help='directory containing dynamic library')
|
||||
parser.add_argument('--no-normalize', dest='normalize',
|
||||
action='store_const', const=False, default=True,
|
||||
help='do not normalize HTML')
|
||||
parser.add_argument('-d', '--dump-tests', dest='dump_tests',
|
||||
action='store_const', const=True, default=False,
|
||||
help='dump tests in JSON format')
|
||||
parser.add_argument('--debug-normalization', dest='debug_normalization',
|
||||
action='store_const', const=True,
|
||||
default=False, help='filter stdin through normalizer for testing')
|
||||
parser.add_argument('-n', '--number', type=int, default=None,
|
||||
help='only consider the test with the given number')
|
||||
args = parser.parse_args(sys.argv[1:])
|
||||
|
||||
def out(str):
|
||||
sys.stdout.buffer.write(str.encode('utf-8'))
|
||||
|
||||
def print_test_header(headertext, example_number, start_line, end_line):
|
||||
out("Example %d (lines %d-%d) %s\n" % (example_number,start_line,end_line,headertext))
|
||||
|
||||
def do_test(test, normalize, result_counts):
|
||||
[retcode, actual_html, err] = cmark.to_html(test['markdown'])
|
||||
if retcode == 0:
|
||||
expected_html = test['html']
|
||||
unicode_error = None
|
||||
if normalize:
|
||||
try:
|
||||
passed = normalize_html(actual_html) == normalize_html(expected_html)
|
||||
except UnicodeDecodeError as e:
|
||||
unicode_error = e
|
||||
passed = False
|
||||
else:
|
||||
passed = actual_html == expected_html
|
||||
if passed:
|
||||
result_counts['pass'] += 1
|
||||
else:
|
||||
print_test_header(test['section'], test['example'], test['start_line'], test['end_line'])
|
||||
out(test['markdown'] + '\n')
|
||||
if unicode_error:
|
||||
out("Unicode error: " + str(unicode_error) + '\n')
|
||||
out("Expected: " + repr(expected_html) + '\n')
|
||||
out("Got: " + repr(actual_html) + '\n')
|
||||
else:
|
||||
expected_html_lines = expected_html.splitlines(True)
|
||||
actual_html_lines = actual_html.splitlines(True)
|
||||
for diffline in unified_diff(expected_html_lines, actual_html_lines,
|
||||
"expected HTML", "actual HTML"):
|
||||
out(diffline)
|
||||
out('\n')
|
||||
result_counts['fail'] += 1
|
||||
else:
|
||||
print_test_header(test['section'], test['example'], test['start_line'], test['end_line'])
|
||||
out("program returned error code %d\n" % retcode)
|
||||
sys.stdout.buffer.write(err)
|
||||
result_counts['error'] += 1
|
||||
|
||||
def get_tests(specfile):
|
||||
line_number = 0
|
||||
start_line = 0
|
||||
end_line = 0
|
||||
example_number = 0
|
||||
markdown_lines = []
|
||||
html_lines = []
|
||||
state = 0 # 0 regular text, 1 markdown example, 2 html output
|
||||
headertext = ''
|
||||
tests = []
|
||||
|
||||
header_re = re.compile('#+ ')
|
||||
|
||||
with open(specfile, 'r', encoding='utf-8', newline='\n') as specf:
|
||||
for line in specf:
|
||||
line_number = line_number + 1
|
||||
l = line.strip()
|
||||
#if l == "`" * 32 + " example":
|
||||
if re.match("`{32} example( [a-z]{1,})?", l):
|
||||
state = 1
|
||||
elif state == 2 and l == "`" * 32:
|
||||
state = 0
|
||||
example_number = example_number + 1
|
||||
end_line = line_number
|
||||
tests.append({
|
||||
"markdown":''.join(markdown_lines).replace('→',"\t"),
|
||||
"html":''.join(html_lines).replace('→',"\t"),
|
||||
"example": example_number,
|
||||
"start_line": start_line,
|
||||
"end_line": end_line,
|
||||
"section": headertext})
|
||||
start_line = 0
|
||||
markdown_lines = []
|
||||
html_lines = []
|
||||
elif l == ".":
|
||||
state = 2
|
||||
elif state == 1:
|
||||
if start_line == 0:
|
||||
start_line = line_number - 1
|
||||
markdown_lines.append(line)
|
||||
elif state == 2:
|
||||
html_lines.append(line)
|
||||
elif state == 0 and re.match(header_re, line):
|
||||
headertext = header_re.sub('', line).strip()
|
||||
return tests
|
||||
|
||||
if __name__ == "__main__":
    if args.debug_normalization:
        out(normalize_html(sys.stdin.read()))
        exit(0)

    all_tests = get_tests(args.spec)
    if args.pattern:
        pattern_re = re.compile(args.pattern, re.IGNORECASE)
    else:
        pattern_re = re.compile('.')
    tests = [ test for test in all_tests if re.search(pattern_re, test['section']) and (not args.number or test['example'] == args.number) ]
    if args.dump_tests:
        out(json.dumps(tests, ensure_ascii=False, indent=2))
        exit(0)
    else:
        skipped = len(all_tests) - len(tests)
        cmark = CMark(prog=args.program, library_dir=args.library_dir)
        result_counts = {'pass': 0, 'fail': 0, 'error': 0, 'skip': skipped}
        for test in tests:
            do_test(test, args.normalize, result_counts)
        out("{pass} passed, {fail} failed, {error} errored, {skip} skipped\n".format(**result_counts))
        exit(result_counts['fail'] + result_counts['error'])
|
|
@ -0,0 +1,75 @@
|
|||
|
||||
# Strike-Through
|
||||
|
||||
With the flag `MD_FLAG_STRIKETHROUGH`, MD4C enables the extension for
recognizing strike-through spans.
|
||||
|
||||
Strike-through text is any text wrapped in one or two tildes (`~`).
|
||||
|
||||
```````````````````````````````` example
|
||||
~Hi~ Hello, world!
|
||||
.
|
||||
<p><del>Hi</del> Hello, world!</p>
|
||||
````````````````````````````````
|
||||
|
||||
If the lengths of the opener and the closer do not match, the strike-through is
|
||||
not recognized.
|
||||
|
||||
```````````````````````````````` example
|
||||
This ~text~~ is curious.
|
||||
.
|
||||
<p>This ~text~~ is curious.</p>
|
||||
````````````````````````````````
|
||||
|
||||
A tilde sequence that is too long is not recognized:
|
||||
|
||||
```````````````````````````````` example
|
||||
foo ~~~bar~~~
|
||||
.
|
||||
<p>foo ~~~bar~~~</p>
|
||||
````````````````````````````````
|
||||
|
||||
Also note that the markers cannot open a strike-through span if they are
followed by whitespace; similarly, they cannot close the span if they are
preceded by whitespace:
|
||||
|
||||
```````````````````````````````` example
|
||||
~foo ~bar
|
||||
.
|
||||
<p>~foo ~bar</p>
|
||||
````````````````````````````````
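
The rules above can be approximated with a regular expression. The following is a minimal sketch, not MD4C's actual algorithm: the helper name `render_strikethrough` is hypothetical, it only looks at a single line of text, and it ignores interactions with other inline elements such as code spans.

```python
import re

# Hypothetical helper, not part of MD4C: a regex approximation of the
# strike-through rules for a single line of text.
_STRIKE = re.compile(
    r"(?<!~)(~{1,2})(?=[^\s~])"  # opener: a run of 1 or 2 tildes, not followed by whitespace
    r"(.+?)"                     # span contents
    r"(?<=[^\s~])\1(?!~)"        # closer: same length, not preceded by whitespace
)


def render_strikethrough(line):
    """Replace recognized strike-through spans with <del>...</del>."""
    return _STRIKE.sub(r"<del>\2</del>", line)


print(render_strikethrough("~Hi~ Hello, world!"))
print(render_strikethrough("This ~text~~ is curious."))
```

Runs of three or more tildes never match because the opener must not be adjacent to another tilde, and the back-reference `\1` forces the closer to have the same length as the opener.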
|
||||
|
||||
|
||||
As with regular emphasis delimiters, a new paragraph ends the parsing of a
strike-through span:
|
||||
|
||||
```````````````````````````````` example
|
||||
This ~~has a
|
||||
|
||||
new paragraph~~.
|
||||
.
|
||||
<p>This ~~has a</p>
|
||||
<p>new paragraph~~.</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
## GitHub Issues
|
||||
|
||||
### [Issue 69](https://github.com/mity/md4c/issues/69)
|
||||
```````````````````````````````` example
|
||||
~`foo`~
|
||||
.
|
||||
<p><del><code>foo</code></del></p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
~*foo*~
|
||||
.
|
||||
<p><del><em>foo</em></del></p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
*~foo~*
|
||||
.
|
||||
<p><em><del>foo</del></em></p>
|
||||
````````````````````````````````
|
|
@ -0,0 +1,363 @@
|
|||
|
||||
# Tables
|
||||
|
||||
With the flag `MD_FLAG_TABLES`, MD4C enables the extension for recognizing
tables.
|
||||
|
||||
A basic example of a table with two columns and three rows (not counting the
header) is as follows:
|
||||
|
||||
```````````````````````````````` example
|
||||
| Column 1 | Column 2 |
|
||||
|----------|----------|
|
||||
| foo | bar |
|
||||
| baz | qux |
|
||||
| quux | quuz |
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>foo</td><td>bar</td></tr>
|
||||
<tr><td>baz</td><td>qux</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
The leading and trailing pipe characters (`|`) on each line are optional:
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1 | Column 2 |
|
||||
---------|--------- |
|
||||
foo | bar |
|
||||
baz | qux |
|
||||
quux | quuz |
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>foo</td><td>bar</td></tr>
|
||||
<tr><td>baz</td><td>qux</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
| Column 1 | Column 2
|
||||
|----------|---------
|
||||
| foo | bar
|
||||
| baz | qux
|
||||
| quux | quuz
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>foo</td><td>bar</td></tr>
|
||||
<tr><td>baz</td><td>qux</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1 | Column 2
|
||||
---------|---------
|
||||
foo | bar
|
||||
baz | qux
|
||||
quux | quuz
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>foo</td><td>bar</td></tr>
|
||||
<tr><td>baz</td><td>qux</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
However, for a one-column table, at least one pipe has to be used in the table
header underline; otherwise it would be parsed as a Setext heading followed by
a paragraph.
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1
|
||||
--------
|
||||
foo
|
||||
baz
|
||||
quux
|
||||
.
|
||||
<h2>Column 1</h2>
|
||||
<p>foo
|
||||
baz
|
||||
quux</p>
|
||||
````````````````````````````````
|
||||
|
||||
Leading and trailing whitespace in a table cell is ignored and the columns do
|
||||
not need to be aligned.
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1 |Column 2
|
||||
---|---
|
||||
foo | bar
|
||||
baz| qux
|
||||
quux|quuz
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>foo</td><td>bar</td></tr>
|
||||
<tr><td>baz</td><td>qux</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
A table cannot interrupt a paragraph.
|
||||
|
||||
```````````````````````````````` example
|
||||
Lorem ipsum dolor sit amet.
|
||||
| Column 1 | Column 2
|
||||
| ---------|---------
|
||||
| foo | bar
|
||||
| baz | qux
|
||||
| quux | quuz
|
||||
.
|
||||
<p>Lorem ipsum dolor sit amet.
|
||||
| Column 1 | Column 2
|
||||
| ---------|---------
|
||||
| foo | bar
|
||||
| baz | qux
|
||||
| quux | quuz</p>
|
||||
````````````````````````````````
|
||||
|
||||
Similarly, a paragraph cannot interrupt a table:
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1 | Column 2
|
||||
---------|---------
|
||||
foo | bar
|
||||
baz | qux
|
||||
quux | quuz
|
||||
Lorem ipsum dolor sit amet.
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>foo</td><td>bar</td></tr>
|
||||
<tr><td>baz</td><td>qux</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
<tr><td>Lorem ipsum dolor sit amet.</td><td></td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
The underline of the table is crucial for recognizing the table, the count of
its columns and their alignment: the line has to contain at least one pipe,
and it has to provide at least three dash (`-`) characters for every column in
the table.
|
||||
|
||||
Thus, the following is not a table, because there are too few dashes for Column 2.
|
||||
|
||||
```````````````````````````````` example
|
||||
| Column 1 | Column 2
|
||||
| ---------|--
|
||||
| foo | bar
|
||||
| baz | qux
|
||||
| quux | quuz
|
||||
.
|
||||
<p>| Column 1 | Column 2
|
||||
| ---------|--
|
||||
| foo | bar
|
||||
| baz | qux
|
||||
| quux | quuz</p>
|
||||
````````````````````````````````
|
||||
|
||||
The first, the last or both the first and the last dash in each column
|
||||
underline can be replaced with a colon (`:`) to request left, right or center
|
||||
alignment of the respective column:
|
||||
|
||||
```````````````````````````````` example
|
||||
| Column 1 | Column 2 | Column 3 | Column 4 |
|
||||
|----------|:---------|:--------:|---------:|
|
||||
| default | left | center | right |
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th align="left">Column 2</th><th align="center">Column 3</th><th align="right">Column 4</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>default</td><td align="left">left</td><td align="center">center</td><td align="right">right</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
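
The underline rules described above (at least one pipe, at least three characters per column, optional colons for alignment) can be sketched as a small validator. This is an illustration only: `parse_underline` is a hypothetical name, and the sketch assumes that a colon counts towards the three-character minimum because it replaces a dash.

```python
import re

# One column underline: optional colon, one or more dashes, optional colon.
_CELL = re.compile(r":?-+:?")


def parse_underline(line):
    """Return a list of column alignments, or None if `line` is not a
    valid table underline (hypothetical helper, not MD4C's API)."""
    s = line.strip()
    if "|" not in s:
        return None  # without a pipe it would be a Setext underline
    # The leading and trailing pipes are optional.
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    aligns = []
    for cell in s.split("|"):
        cell = cell.strip()
        if len(cell) < 3 or not _CELL.fullmatch(cell):
            return None
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            aligns.append("center")
        elif left:
            aligns.append("left")
        elif right:
            aligns.append("right")
        else:
            aligns.append("default")
    return aligns


print(parse_underline("|----------|:---------|:--------:|---------:|"))
print(parse_underline("| ---------|--"))
```

Returning `None` for an invalid underline lets a caller fall back to treating the lines as an ordinary paragraph, as in the example with too few dashes.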
|
||||
|
||||
A literal pipe character in a cell has to be escaped with a backslash.
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1 | Column 2
|
||||
---------|---------
|
||||
foo | bar
|
||||
baz | qux \| xyzzy
|
||||
quux | quuz
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td>foo</td><td>bar</td></tr>
|
||||
<tr><td>baz</td><td>qux | xyzzy</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
The contents of each cell are parsed as inline text, which may contain any
inline Markdown spans such as emphasis, strong emphasis, links, etc.
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1 | Column 2
|
||||
---------|---------
|
||||
*foo* | bar
|
||||
**baz** | [qux]
|
||||
quux | [quuz](/url2)
|
||||
|
||||
[qux]: /url
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td><em>foo</em></td><td>bar</td></tr>
|
||||
<tr><td><strong>baz</strong></td><td><a href="/url">qux</a></td></tr>
|
||||
<tr><td>quux</td><td><a href="/url2">quuz</a></td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
However, pipes inside a code span are not recognized as cell
boundaries.
|
||||
|
||||
```````````````````````````````` example
|
||||
Column 1 | Column 2
|
||||
---------|---------
|
||||
`foo | bar`
|
||||
baz | qux
|
||||
quux | quuz
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr><th>Column 1</th><th>Column 2</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr><td><code>foo | bar</code></td><td></td></tr>
|
||||
<tr><td>baz</td><td>qux</td></tr>
|
||||
<tr><td>quux</td><td>quuz</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
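
The cell-splitting behavior shown in the last few examples (escaped pipes, pipes inside code spans, optional outer pipes, ignored cell padding) can be sketched as a small scanner. This is a simplified illustration, not MD4C's implementation: `split_row` is a hypothetical name, and the sketch assumes code spans are delimited by single backticks only.

```python
def split_row(row):
    """Split one table row into trimmed cell strings (hypothetical helper)."""
    s = row.strip()
    # The leading and trailing pipes are optional.
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|") and not s.endswith("\\|"):
        s = s[:-1]

    cells, buf = [], []
    in_code = False
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\" and not in_code and i + 1 < len(s) and s[i + 1] == "|":
            buf.append("|")  # an escaped pipe is a literal pipe
            i += 2
            continue
        if ch == "`":
            in_code = not in_code  # assumption: single-backtick code spans
        if ch == "|" and not in_code:
            cells.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    cells.append("".join(buf).strip())
    return cells


print(split_row("baz      | qux \\| xyzzy"))
print(split_row("`foo | bar`"))
```

A row that yields fewer cells than the header has columns would then be padded with empty cells, which is what the expected HTML for the code span example shows.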
|
||||
|
||||
|
||||
## GitHub Issues
|
||||
|
||||
### [Issue 41](https://github.com/mity/md4c/issues/41)
|
||||
```````````````````````````````` example
|
||||
* x|x
|
||||
---|---
|
||||
.
|
||||
<ul>
|
||||
<li>x|x
|
||||
---|---</li>
|
||||
</ul>
|
||||
````````````````````````````````
|
||||
(Not a table, because the underline has the wrong indentation and is not part
of the list item.)
|
||||
|
||||
```````````````````````````````` example
|
||||
* x|x
|
||||
  ---|---
|
||||
x|x
|
||||
.
|
||||
<ul>
|
||||
<li><table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>x</th>
|
||||
<th>x</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
</tbody>
|
||||
</table>
|
||||
</li>
|
||||
</ul>
|
||||
<p>x|x</p>
|
||||
````````````````````````````````
|
||||
(Here the underline has the right indentation so the table is detected.
|
||||
But the last line is not part of it due to its indentation.)
|
||||
|
||||
|
||||
### [Issue 42](https://github.com/mity/md4c/issues/42)
|
||||
|
||||
```````````````````````````````` example
|
||||
] http://x.x *x*
|
||||
|
||||
|x|x|
|
||||
|---|---|
|
||||
|x|
|
||||
.
|
||||
<p>] http://x.x <em>x</em></p>
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>x</th>
|
||||
<th>x</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>x</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
### [Issue 104](https://github.com/mity/md4c/issues/104)
|
||||
|
||||
```````````````````````````````` example
|
||||
A | B
|
||||
--- | ---
|
||||
[x](url)
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>A</th>
|
||||
<th>B</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><a href="url">x</a></td>
|
||||
<td></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
|
@ -0,0 +1,117 @@
|
|||
|
||||
# Tasklists
|
||||
|
||||
With the flag `MD_FLAG_TASKLISTS`, MD4C enables the extension for recognizing
task lists.
|
||||
|
||||
A basic task list may look as follows:
|
||||
|
||||
```````````````````````````````` example
|
||||
* [x] foo
|
||||
* [X] bar
|
||||
* [ ] baz
|
||||
.
|
||||
<ul>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
|
||||
</ul>
|
||||
````````````````````````````````
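
The mapping from a task-list marker to the rendered checkbox can be sketched as follows. This is a rough illustration under stated assumptions: `render_list_item` is a hypothetical helper, it receives the item text with the list marker already removed, and it does no HTML escaping or inline parsing.

```python
import re

# `[x]`, `[X]` or `[ ]` at the start of the item, followed by whitespace.
_TASK = re.compile(r"\[([ xX])\]\s+(.*)", re.DOTALL)


def render_list_item(text):
    """Render one list item, adding a checkbox if it is a task item."""
    m = _TASK.match(text)
    if m is None:
        return "<li>" + text + "</li>"
    checked = " checked" if m.group(1) in "xX" else ""
    return (
        '<li class="task-list-item">'
        '<input type="checkbox" class="task-list-item-checkbox" disabled'
        + checked + ">" + m.group(2) + "</li>"
    )


for item in ("[x] foo", "[X] bar", "[ ] baz"):
    print(render_list_item(item))
```

Both `x` and `X` produce a checked box, matching the expected output above.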
|
||||
|
||||
Task lists can also be in ordered lists:
|
||||
|
||||
```````````````````````````````` example
|
||||
1. [x] foo
|
||||
2. [X] bar
|
||||
3. [ ] baz
|
||||
.
|
||||
<ol>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
|
||||
</ol>
|
||||
````````````````````````````````
|
||||
|
||||
Task lists can also be nested in ordinary lists:
|
||||
|
||||
```````````````````````````````` example
|
||||
* xxx:
  * [x] foo
  * [x] bar
  * [ ] baz
* yyy:
  * [ ] qux
  * [x] quux
  * [ ] quuz
|
||||
.
|
||||
<ul>
|
||||
<li>xxx:
|
||||
<ul>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
|
||||
</ul></li>
|
||||
<li>yyy:
|
||||
<ul>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
````````````````````````````````
|
||||
|
||||
Or in a parent task list:
|
||||
|
||||
```````````````````````````````` example
|
||||
1. [x] xxx:
   * [x] foo
   * [x] bar
   * [ ] baz
2. [ ] yyy:
   * [ ] qux
   * [x] quux
   * [ ] quuz
|
||||
.
|
||||
<ol>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx:
|
||||
<ul>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li>
|
||||
</ul></li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy:
|
||||
<ul>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li>
|
||||
</ul></li>
|
||||
</ol>
|
||||
````````````````````````````````
|
||||
|
||||
Also, ordinary lists can be nested in the task lists.
|
||||
|
||||
```````````````````````````````` example
|
||||
* [x] xxx:
  * foo
  * bar
  * baz
* [ ] yyy:
  * qux
  * quux
  * quuz
|
||||
.
|
||||
<ul>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx:
|
||||
<ul>
|
||||
<li>foo</li>
|
||||
<li>bar</li>
|
||||
<li>baz</li>
|
||||
</ul></li>
|
||||
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy:
|
||||
<ul>
|
||||
<li>qux</li>
|
||||
<li>quux</li>
|
||||
<li>quuz</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
````````````````````````````````
|
|
@ -0,0 +1,39 @@
|
|||
|
||||
# Underline
|
||||
|
||||
With the flag `MD_FLAG_UNDERLINE`, MD4C treats the underscore (`_`) as a mark
denoting an underlined span rather than ordinary emphasis (or strong
emphasis).
|
||||
|
||||
```````````````````````````````` example
|
||||
_foo_
|
||||
.
|
||||
<p><u>foo</u></p>
|
||||
````````````````````````````````
|
||||
|
||||
In sequences of multiple underscores, each single one translates into an
|
||||
underline span mark.
|
||||
|
||||
```````````````````````````````` example
|
||||
___foo___
|
||||
.
|
||||
<p><u><u><u>foo</u></u></u></p>
|
||||
````````````````````````````````
|
||||
|
||||
Intra-word underscores are not recognized as underline marks:
|
||||
|
||||
```````````````````````````````` example
|
||||
foo_bar_baz
|
||||
.
|
||||
<p>foo_bar_baz</p>
|
||||
````````````````````````````````
|
||||
|
||||
The parser also follows the standard rules for when an underscore can or
cannot open or close a span. Therefore, there is no underline in the following
example, because neither underscore can be seen as a closing mark.
|
||||
|
||||
```````````````````````````````` example
|
||||
_foo _bar
|
||||
.
|
||||
<p>_foo _bar</p>
|
||||
````````````````````````````````
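
The examples in this file can be reproduced with a regular expression that treats every underscore in a run as its own underline mark. This is only a sketch: `render_underline` is a hypothetical helper, it requires the opening and closing runs to have the same length, and it does not implement the full delimiter-run rules of the parser.

```python
import re

_UNDERLINE = re.compile(
    r"(?<!\w)(_+)(?=[^\s_])"  # opener: not intra-word, not followed by whitespace
    r"(.+?)"                  # span contents
    r"(?<=[^\s_])\1(?!\w)"    # closer: same length, not preceded by whitespace
)


def render_underline(line):
    """Wrap the contents in one <u> element per underscore of the run."""
    def repl(m):
        n = len(m.group(1))
        return "<u>" * n + m.group(2) + "</u>" * n
    return _UNDERLINE.sub(repl, line)


print(render_underline("___foo___"))
print(render_underline("foo_bar_baz"))
```

Because `\w` includes letters and digits, an underscore with a word character directly before the opener or after the closer is left alone, which covers the intra-word case.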
|
|
@ -0,0 +1,232 @@
|
|||
|
||||
# Wiki Links
|
||||
|
||||
With the flag `MD_FLAG_WIKILINKS`, MD4C recognizes wiki links.
|
||||
|
||||
The simplest wiki-link is a wiki-link destination enclosed between `[[` and
`]]`.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo">foo</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
However, a wiki-link may contain an explicit label, separated from the
destination by `|`.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo|bar]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo">bar</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
A wiki-link destination cannot be empty.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[]]
|
||||
.
|
||||
<p>[[]]</p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
[[|foo]]
|
||||
.
|
||||
<p>[[|foo]]</p>
|
||||
````````````````````````````````
|
||||
|
||||
|
||||
The wiki-link destination cannot contain a new line.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo
|
||||
bar]]
|
||||
.
|
||||
<p>[[foo
|
||||
bar]]</p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo
|
||||
bar|baz]]
|
||||
.
|
||||
<p>[[foo
|
||||
bar|baz]]</p>
|
||||
````````````````````````````````
|
||||
|
||||
The wiki-link destination is rendered verbatim; inline markup in it is not
|
||||
recognized.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[*foo*]]
|
||||
.
|
||||
<p><x-wikilink data-target="*foo*">*foo*</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo|![bar](bar.jpg)]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo"><img src="bar.jpg" alt="bar"></x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
With multiple `|` delimiters, only the first one is recognized and the other
|
||||
ones are part of the label.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo|bar|baz]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo">bar|baz</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
However, the delimiter `|` can be escaped with `\`.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo\|bar|baz]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo|bar">baz</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
The label can contain inline elements.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo|*bar*]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo"><em>bar</em></x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
An empty explicit label is the same as using the implicit label; i.e. the verbatim
|
||||
destination string is used as the label.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo|]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo">foo</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
The label can span multiple lines.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo|foo
|
||||
bar
|
||||
baz]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo">foo
|
||||
bar
|
||||
baz</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
Wiki-links have higher priority than links.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo]](foo.jpg)
|
||||
.
|
||||
<p><x-wikilink data-target="foo">foo</x-wikilink>(foo.jpg)</p>
|
||||
````````````````````````````````
|
||||
|
||||
```````````````````````````````` example
|
||||
[foo]: /url
|
||||
|
||||
[[foo]]
|
||||
.
|
||||
<p><x-wikilink data-target="foo">foo</x-wikilink></p>
|
||||
````````````````````````````````
|
||||
|
||||
Wiki-links can be used inside table cells.
|
||||
|
||||
```````````````````````````````` example
|
||||
| A | B |
|
||||
|------------------|-----|
|
||||
| [[foo|*bar*]] | baz |
|
||||
.
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>A</th>
|
||||
<th>B</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><x-wikilink data-target="foo"><em>bar</em></x-wikilink></td>
|
||||
<td>baz</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
````````````````````````````````
|
||||
|
||||
Wiki-links are not prioritized over images.
|
||||
|
||||
```````````````````````````````` example
|
||||
![[foo]](foo.jpg)
|
||||
.
|
||||
<p><img src="foo.jpg" alt="[foo]"></p>
|
||||
````````````````````````````````
|
||||
|
||||
Something that may look like a wiki-link at first, but turns out not to be,
|
||||
is recognized as a normal link.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo]
|
||||
|
||||
[foo]: /url
|
||||
.
|
||||
<p>[<a href="/url">foo</a></p>
|
||||
````````````````````````````````
|
||||
|
||||
Escaping the opening `[` escapes only that one character, not the whole `[[`
|
||||
opener:
|
||||
|
||||
```````````````````````````````` example
|
||||
\[[foo]]
|
||||
|
||||
[foo]: /url
|
||||
.
|
||||
<p>[<a href="/url">foo</a>]</p>
|
||||
````````````````````````````````
|
||||
|
||||
As with other inline links, the innermost wiki-link is preferred.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[foo[[bar]]]]
|
||||
.
|
||||
<p>[[foo<x-wikilink data-target="bar">bar</x-wikilink>]]</p>
|
||||
````````````````````````````````
|
||||
|
||||
There is a limit of 100 characters for the wiki-link destination.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]]
|
||||
[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]]
|
||||
.
|
||||
<p>[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]]
|
||||
[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]]</p>
|
||||
````````````````````````````````
|
||||
|
||||
A wiki-link destination of exactly 100 characters works.
|
||||
|
||||
```````````````````````````````` example
|
||||
[[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890]]
|
||||
[[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890|foo]]
|
||||
.
|
||||
<p><x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890</x-wikilink>
|
||||
<x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">foo</x-wikilink></p>
|
||||
````````````````````````````````
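
The destination and label rules from this file (non-empty single-line destination, first unescaped `|` as the delimiter, empty label falling back to the destination, 100-character limit) can be collected in one small function. This is a sketch under assumptions, not MD4C's implementation: `parse_wikilink` is a hypothetical helper, it receives exactly one candidate `[[...]]` string, it does not handle nesting or an escaped backslash before `|`, and it assumes the limit applies to the destination after unescaping.

```python
import re

MAX_TARGET_LEN = 100  # limit stated in the text above


def parse_wikilink(text):
    """Return (destination, raw_label) or None if `text` is not a wiki-link."""
    if len(text) < 4 or not (text.startswith("[[") and text.endswith("]]")):
        return None
    inner = text[2:-2]

    # Only the first unescaped `|` separates the destination from the label.
    m = re.search(r"(?<!\\)\|", inner)
    if m:
        raw_target, label = inner[:m.start()], inner[m.end():]
    else:
        raw_target, label = inner, ""

    target = raw_target.replace("\\|", "|")
    if not target or "\n" in target or len(target) > MAX_TARGET_LEN:
        return None
    # An empty or missing label falls back to the verbatim destination.
    return target, (label or target)


print(parse_wikilink("[[foo|bar|baz]]"))
print(parse_wikilink("[[foo\\|bar|baz]]"))
print(parse_wikilink("[[|foo]]"))
```

The returned label is the raw text; a renderer would still have to run inline parsing on an explicit label, since labels may contain emphasis, images and line breaks.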
|
||||
|
||||
The limit on link content does not include any characters belonging to a block
|
||||
quote, if the label spans multiple lines contained in a block quote.
|
||||
|
||||
```````````````````````````````` example
|
||||
> [[12345678901234567890123456789012345678901234567890|1234567890
|
||||
> 1234567890
|
||||
> 1234567890
|
||||
> 1234567890
|
||||
> 123456789]]
|
||||
.
|
||||
<blockquote>
|
||||
<p><x-wikilink data-target="12345678901234567890123456789012345678901234567890">1234567890
|
||||
1234567890
|
||||
1234567890
|
||||
1234567890
|
||||
123456789</x-wikilink></p>
|
||||
</blockquote>
|
||||
````````````````````````````````
|