[svn r37984] Moved all the apigen information into one document and adjusted to match the current situation.

--HG--
branch : trunk
guido 2007-02-05 23:37:23 +01:00
parent 1c287a2546
commit 102409a846
5 changed files with 144 additions and 317 deletions

@ -1,36 +1,144 @@
===========================================
apigen - API documentation generation tool apigen - API documentation generation tool
=========================================== ===========================================
What is it? What is it?
------------ ===========
Apigen is a tool for automatically generating API reference documentation for Apigen is a tool for automatically generating API reference documentation for
Python projects. It works by examining code at runtime rather than at Python projects. It works by examining code at runtime rather than at compile
compile time. This way it is capable of displaying information time. This way it is capable of displaying information about the code base
about the code base after initialization. A drawback is that after initialization. A drawback is that you cannot easily document source code
you cannot easily document source code that automatically that automatically starts server processes or has some other irreversible
starts server processes or has some other irreversible effects upon getting imported. effects upon getting imported.
The apigen functionality can either be used from code, or from The apigen functionality is normally triggered from :api:`py.test`, and while
py.test, in the latter case it will gather information about running the tests it gathers information such as code paths, arguments and
modules, classes to export explicitely by using provided script. return values of callables, and exceptions that can be raised while the code
runs (XXX not yet!) to include in the documentation. It's also possible to
run the tracer (which collects the data) in other code if your project
does not use :api:`py.test` but still wants to collect the runtime information
and build the docs.
Please note that apigen is currently geared towards documenting the Apigen is written for the :api:`py` lib, but can be used to build documentation
py library itself, making it nicely work for other projects for any project: there are hooks in py.test to, by providing a simple script,
may still require a bit of adaption and refinement work. build api documentation for the tested project when running py.test. Of course
this does imply :api:`py.test` is actually used: if few or no tests are
actually run, the additional information (code paths, arguments and return
values, and exceptions) cannot be gathered, and apigen will then have less of
an advantage over other solutions.
Using from code Features
---------------- ========
The library provides a simple API to generate a py.rest.rst tree (which Some features were mentioned above already, but here's a complete list of all
represents a ReStructuredText document), along with some helper classes to the niceties apigen has to offer:
control the output. The most important objects are the Tracer, which traces
code execution by registering itself with sys.settrace, the DocStorage class, * source documents
that stores Tracer information, and the RestGen class which creates a ReST
tree (see py.rest.rst). Apigen not only builds the API documentation, but also a tree of
syntax-colored source files, with links from the API docs to the source
files.
* abundance of information
compared to other documentation generation tools, apigen produces an
abundant amount of information: it provides syntax-colored code snippets,
code path traces, etc.
* linking
besides links to the source files, apigen provides links all across the
documentation: callable arguments and return values link to their
definition (if part of the documented code), class definition to their
base classes (again, if they're part of the documented code), and
everywhere are links to the source files (including in traces)
* (hopefully) improves testing
because the documentation is built partially from test results, developers
may (especially if they're using the documentation themselves) be more
aware of untested parts of the code, or of parts that could use more tests
or need attention
Using apigen
============
To trigger apigen, all you need to do is run the :source:`py/bin/py.test` tool
with an --apigen argument, as such::
$ py.test --apigen=<path>
where <path> is a path to a script containing some special hooks to build
the documents (see below). The script to build the documents for the :api:`py`
lib can be found in :source:`py/apigen/apigen.py`, so building those documents
can be done by cd'ing to the 'py' directory, and executing::
$ py.test --apigen=apigen/apigen.py
The documents will by default be built in the *parent directory* of the
*package dir* (in this case the 'py' directory). Be careful that you don't
overwrite anything!
Other projects
==============
To use apigen from another project, there are three things that you need to do:
Use :api:`py.test` for unit tests
---------------------------------
This is a good idea anyway... ;) The more tests, the more tracing information
and such can be built, so it makes sense to have good test coverage when using
this tool.
Provide :api:`py.test` hooks
----------------------------
To hook into the unit testing framework, you will need to write a script with
two functions. The first should be called 'get_documentable_items', gets a
package dir (the root of the project) as argument, and should return a tuple
with the package name as first element, and a dict as second. The dict should
contain, for all the to-be-documented items, a dotted name as key and a
reference to the item as value.
The second function should be called 'build'. It also gets the package dir as
argument, along with a reference to a DocStorageAccessor, which contains
information gathered by the tracer, and a reference to a
:api:`py.io.StdCaptureFD` instance that is used to capture stdout and stderr,
and allows writing to them, when the docs are built.
This 'build' function is responsible for actually building the documentation,
and, depending on your needs, can be used to control each aspect of it. In most
situations you will just copy the code from :source:`py/apigen/apigen.py`'s
build() function, but if you want you can choose to build entirely different
output formats by directly accessing the DocStorageAccessor class.
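A minimal sketch of such a hook script could look like the following. The two
function names and their arguments follow the description above; using the
standard library's json module as the documented 'project' and leaving build()
empty are assumptions made purely for illustration, and are not taken from
:source:`py/apigen/apigen.py`.

```python
import json  # stand-in for the package you want to document (assumption)


def get_documentable_items(pkgdir):
    """Return (package name, {dotted name: object}) for the items to document.

    'pkgdir' is the package dir handed in by py.test; this sketch ignores it
    and simply lists a few objects by hand.
    """
    names = ["dumps", "loads", "JSONDecoder"]
    items = dict(("json." + name, getattr(json, name)) for name in names)
    return "json", items


def build(pkgdir, dsa, capture):
    """Build the documentation from the traced data.

    'dsa' is the DocStorageAccessor and 'capture' the py.io.StdCaptureFD
    instance. A real script would generate output here; this placeholder
    deliberately does nothing.
    """
    pass
```

A script like this would then be passed to py.test through the --apigen
argument described earlier.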
Provide layout
--------------
For the :api:`py` lib tests, the 'LayoutPage' class found in
:source:`py/apigen/layout.py` is used, which produces HTML specific for that
particular library (with a menubar, etc.). To customize this, you will need to
provide a similar class, most probably using the Page base class from
:source:`py/doc/confrest.py`. Note that this step depends on how heavily the
previous step is customized: if you decide to directly use the
DocStorageAccessor rather than let the code in :source:`py/apigen/htmlgen.py`
build HTML for you, this can be skipped.
Using apigen from code
======================
If you want to avoid using :api:`py.test`, or have another idea of how to best
collect information while running code, the apigen functionality can be
directly accessed. The most important classes are the Tracer class found in
:source:`py/apigen/tracer/tracer.py`, which holds the information gathered
during the tests, and the DocStorage and DocStorageAccessor classes from
:source:`py/apigen/tracer/docstorage.py`, which (respectively) store the data,
and make it accessible.
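To illustrate the kind of runtime information that can be collected this way,
here is a minimal, self-contained sketch built on the standard sys.settrace
hook. It is not apigen's Tracer and does not use DocStorage: the helper
collect_calls() and the functions add() and demo() are hypothetical names
chosen for this example only.

```python
import sys


def collect_calls(func, *args, **kwargs):
    """Run func under a trace hook; record argument and return type names."""
    seen = {}

    def tracer(frame, event, arg):
        code = frame.f_code
        if event == "call":
            # At call time the positional arguments are already in f_locals.
            names = code.co_varnames[:code.co_argcount]
            argtypes = tuple(type(frame.f_locals[n]).__name__ for n in names)
            entry = seen.setdefault(code.co_name,
                                    {"args": set(), "returns": set()})
            entry["args"].add(argtypes)
        elif event == "return":
            seen[code.co_name]["returns"].add(type(arg).__name__)
        # Returning the tracer keeps it installed as the frame's local hook.
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the previous hook
    return seen


def add(a, b):
    return a + b


def demo():
    add(1, 2)
    add("x", "y")


info = collect_calls(demo)
```

After the run, info maps each traced function name to the sets of argument
type tuples and return type names that were observed while it was called.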
Gathering information Gathering information
++++++++++++++++++++++ ---------------------
To gather information about documentation, you will first need to tell the tool To gather information about documentation, you will first need to tell the tool
what objects it should investigate. Only information for registered objects what objects it should investigate. Only information for registered objects
@ -50,71 +158,24 @@ will be stored. An example::
>>> t.end_tracing() >>> t.end_tracing()
Now the 'ds' variable should contain all kinds of information about both the Now the 'ds' variable should contain all kinds of information about both the
py.path.local and the py.path.svnwc class (it will walk through 'toregister' to :api:`py.path.local` and the :api:`py.path.svnwc` classes, and things like call
find information about all it contains), and things like call stacks, and stacks, possible argument types, etc. as additional information about
possible argument types, etc. as additional information about :api:`py.path.local.check()` (since it was called from the traced code).
py.path.local.check() (since it was called from the traced code).
Viewing information Using the information
++++++++++++++++++++ ---------------------
Viewing the information stored in the DocStorage instance isn't very hard To use the information, we need to get a DocStorageAccessor instance to
either. As explained there is a RestGen class that creates a py.rest.rst tree, provide access to the data stored in the DocStorage object::
which can directly be serialized to ReStructuredText, which can in turn be
converted to other formats. Also the py.rest.rst tree can be manipulated
directly, or converted to formats other than ReST (currently only HTML) using
special transformers.
There are several helper classes available that wrap the output format >>> dsa = DocStorageAccessor(ds)
generation. There are two types of helper classes, 'LinkWriters' and 'Writers'.
The first are responsible for generating external links (for viewing source),
the second for generating the actual output from the py.rest.rst tree, and
for generating internal links (which is directly related to generating output).
Instances of these classes are passed to the RestGen class as arguments on
instantiation.
An example of creating a directory with seperate ReST files (using DirWriter) Currently there is no API reference available for this object, so you'll have
from the 'ds' DocumentStorage instance we created below, without any external to read the source (:source:`py/apigen/tracer/docstorage.py`) to see what
links (using DirectPaste). functionality it offers.
::
>>> from py.__.apigen.rest.genrest import RestGen, DirectPaste, DirWriter
>>> # create a temp dir in /tmp/pytest-<userid>
>>> tempdir = py.test.ensuretemp('apigen_example')
>>> rg = RestGen(DocStorageAccessor(ds), DirectPaste(), DirWriter(tempdir))
>>> rg.write()
An example of a somewhat more 'real-life' use case, writing to a directory of
HTML files (this uses py.rest.transform), generating links to ViewVC source
views::
>>> from py.__.apigen.rest.genrest import ViewVC, HTMLDirWriter
>>> from py.__.apigen.rest.htmlhandlers import HTMLHandler
>>> tempdir = py.test.ensuretemp('apigen_example_2')
>>> rg = RestGen(DocStorageAccessor(ds), ViewVC('http://some.host.com/viewvc/myproj/trunk/'),
... HTMLDirWriter(HTMLHandler, HTMLHandler, tempdir))
>>> rg.write()
Using from py.test
-------------------
Running unit tests forms an ideal opportunity for apigen to find out about what
happens when code is executed (assuming you have proper test coverage ;). There
are hooks built into py.test that allow you to do that:
* Write down a python script which contains at least two functions
- `get_documentable_items() -> {}` - function which will return dictionary
of name to object of exported items
- `build(pkgpath, docstorageaccessor)` - function which will be invoked afterwards
with DocStorageAccessor instance as an argument (you should read DocStorageAccessor
interface to know how you can access it)
XXX: Write down some example usage after guido implement the script
Comparison with other documentation generation tools Comparison with other documentation generation tools
---------------------------------------------------- ====================================================
Apigen is of course not the only documentation generation tool available for Apigen is of course not the only documentation generation tool available for
Python. Although we knew in advance that our tool had certain features the Python. Although we knew in advance that our tool had certain features the
@ -122,7 +183,7 @@ others do not offer, we decided to investigate a bit so that we could do a
proper comparison. proper comparison.
Tools examined Tools examined
++++++++++++++ --------------
After some 'googling around', it turned out that the amount of documentation After some 'googling around', it turned out that the amount of documentation
generation tools available was surprisingly low. There were only 5 packages generation tools available was surprisingly low. There were only 5 packages
@ -179,7 +240,7 @@ Quick overview:
* written for Twisted, but quite nice output with other applications * written for Twisted, but quite nice output with other applications
Quick overview lists of the other tools Quick overview lists of the other tools
+++++++++++++++++++++++++++++++++++++++ ---------------------------------------
HappyDoc HappyDoc
~~~~~~~~ ~~~~~~~~
@ -217,7 +278,7 @@ https://svn.enthought.com/enthought/wiki/EndoHowTo
widely used it can not be ignored... widely used it can not be ignored...
Questions, remarks, etc. Questions, remarks, etc.
------------------------- ========================
For more information, questions, remarks, etc. see http://codespeak.net/py. For more information, questions, remarks, etc. see http://codespeak.net/py.
This website also contains links to mailing list and IRC channel. This website also contains links to mailing list and IRC channel.

@ -1,116 +0,0 @@
Automatic API documentation generation
======================================
Motivation
----------
* We want better automatic hyperlinking within the documentation.
* Test driven documentation generation. Get more semantic info from the tests (return values, exceptions raised, etc). Also, expose tests in the documentation itself as usage examples.
* Should work with generated code, even without source files.
* We want to expose some workflow/context information around the API. What are the inputs, and where do they come from.
* Ease of use, setup, perhaps an option to py.test to generate the docs.
* The generator itself should be nicely unit tested small modules.
Related projects
----------------
* pydoc
* epydoc
* eric3-doc/eric3-api
* doxygen
* pydoctor
* PyUMLGraph (source page down? http://www.python.org/pypi/PyUMLGraph/0.1.4)
* ...
Proposed features
-----------------
* Minimal dependencies
* Easy to setup, easy to use.
* Frontend independent, with multiple frontends. HTML, HTML + javascript, pygame.
* Whenever possible the documentation is hyperlinked to relevant info.
* The documents contain links to the source code.
* Can look at running code and generate docs from it.
* Can generate some sort of static document output which could be easily distributed in a zip/tarball.
* Works with generated code, even code without source files.
* Collected data should be usable for other purposes, for example websites, IDE code completion, etc.
* By default, only document the exposed API, don't dig too deep inside the objects.
* Try to learn additional semantic information from the tests. What types are input and returned from functions? What order are functions usually called in? Tests can become usage examples in the documentation. There is a nice social aspect to getting data from the tests, because it encourages the developer to create more complete tests.
* Report code coverage in the tests? (does this really belong in py.test?)
* Possibly create some kind of diagrams/graphs.
* Possibly use the same introspection techniques to produce very rich state information after a crash.
Which things to implement first
-------------------------------
* Create simple ReST output, just documenting the API.
Implementation ideas
--------------------
* Somehow stop a running application and introspect it.
* Use a tracehook while running tests to collect information.
* Use pypy to solve all problems. ;)
Automatic API documentation generation - architecture overview
==============================================================
Abstraction layers:
-------------------
* ``extractor`` - library API extraction
* ``presenter`` - presentation backend (pygame, html, http + ajax)
* ``test runner`` - just `py.test`_ (modified or not)
* ``tracer`` - middleware responsible for generating necessary
annotations
* ``reporter`` - object API for presenter which gets info from tracer and runner
Extractor:
----------
Right now it's quite easy - we'll just take pylib ``__package__`` semantics,
which allows us not to try to guess what's API and what is internal. (I
believe that having an explicit split is better than trying to guess it.)
Presenter:
----------
XXX: Whatever fits here (use weird imagination)
Test runner:
------------
For the test runner we'll use `py.test`_ with some hooks from ``tracer`` to
perform the stuff we want it to do. Basically we'll run all of the tests
which cover some amount of API (specified explicitly) and gather as much
information as possible. Probably this will need some support for special
calls like ``py.test.fail`` or ``py.test.raises`` to support specific
situations explicitly.
Tracer:
-------
This is the most involved part of the application (and probably the one
requiring the most work). It should support as many operations as possible to
make the information meaningful. Mostly:
* Tracing the information of argument and result types from things
that are performed by API-defined functions (by debugger hook)
* Tracing variables that come along to make crosslinking between API
possible.
* Various other (possibly endless) stuff that might be performed by
tracing flow graphs.
Reporter:
---------
Reporter is the crucial glue between components. It gathers all the
information coming from ``extractor`` and ``tracer`` and tries to
present it in an object-oriented way for any possible backend which
might want to read it.
.. _`py.test`: http://codespeak.net/py/test/

@ -1,40 +0,0 @@
Automatic API generation assumptions
====================================
This document tries to document what assumptions/needs are essential
for automatic generation of API based on unittests.
XXX This document overlaps a little bit with api-docs.txt, but its
purpose is slightly different.
* We use a debugger hook to trace certain function calls/returns. Basically
we install a global tracing function at the point where we call the test and
we catch all the calls to functions which we need.
* API is explicitly fixed (the formats of input should differ, i.e. pylib
uses py.__pkg__, while pypy would have some explicit text files/source files
which define the API of certain functionality)
* Unittests used for doc generation are fixed per API (for pylib it's very
easy - we just specify directory, or collect all the data which py.test
gathers).
* We have to agree on some format for __doc__ specification. ReST seems
obvious for ReST backend, while different backends might reinterpret it
or even leave it as is.
* The events about seen stuff (like FunctionCall(args)) go to an object
flexible enough to support possibly any backend.
* We can use pypy annotation type system to track most-general type
of call, or better steal it as much as possible and define some
on our own. (Like usually it's better to track interface for some
object which was really used, than least basic superclass of arguments
that went inside).
Milestone 1:
------------
Date is set till sprint in Duesseldorf.

@ -1,34 +0,0 @@
Automatic API documentation generation - first outcome
======================================================
After the first implementation of the API generation tool,
I've realised that there are several shortcomings in the current
attempt.
First of all, we need to define our type system. The one presented
in PyPy is just too weak for our purpose. This will not be a very easy
issue anyway. The basic ideas are very much like the PyPy ones, but
from the beginning we want to support the whole rich Python type system, not
only a subset of it. So we need to provide information which is
valuable for the end user (quite a rich type system) and can always
work. I don't think that actually tracking all possible values of
objects makes sense. The user might see them in call sites if he really wants
to.
The second thing is that we need some kind of structure (we lack such
an attempt now), which can group several classes/functions/objects into
a module (split by '.' or whatever).
Another thing is that we need to support any possible objects which
are actually exported (well, we might assume that objects which
are exported are to some extent constants).
We need to track somehow several objects which are not entirely
Python user-built objects. This probably means: builtin functions,
classes with builtin __init__, etc. etc.
We also need to implement stuff like c_call, c_return and such.
I guess that keeping track of side effects might happen at some point
in the future, but it's not *now*.

@ -1,44 +0,0 @@
=======================
Source viewer for pylib
=======================
Purpose:
--------
As usual, the main driving force to develop something new is the lack of
several features in existing solutions. The major missing features are:
* Impossible to link to a certain function like http://filename#function_name
* Impossible to properly link - most information coming from the AST
* We want this to nicely integrate with apigen - so crosslinking from
one to another makes sense (also backwards - like info for a function)
Idea:
-----
The basic idea is to take a module as a py.path object, compile it (using the
compiler module), then try to get some information and eventually import it
and get even more information. Importing is optional and can be skipped
altogether, but::
    if 1:
        def f(x):
            pass
    if 0:
        def g(x):
            pass
can only be handled properly when the module is imported. There are also plans
for integrating more features, i.e. caching it by code and attaching a name to
code generated by some magic functions.
Status:
-------
Right now there is a ready `server`_ along with an `API viewer`_. The next step
is to improve the look & feel of the API viewer and to link one to another.
.. _`server`: http://johnnydebris.net:8000/
.. _`API viewer`: http://johnnydebris.net/pyapi