[svn r38516] majorly refactor future chapter, mentioning
APIgen and other more current ideas --HG-- branch : trunk
parent
790c9bbb88
commit
97aab00607
|
@ -9,321 +9,62 @@ This document tries to describe directions and guiding ideas
|
|||
for the near-future development of the py lib. *Note that all
|
||||
statements within this document - even if they sound factual -
|
||||
mostly just express thoughts and ideas. They do not always refer to
|
||||
real code so read with some caution. This is not a reference guide
|
||||
(tm). Moreover, the order in which topics appear in this file does
|
||||
not reflect the order in which they may be implemented.*
|
||||
real code so read with some caution.*
|
||||
|
||||
.. _`general-path`:
|
||||
.. _`a more general view on path objects`:
|
||||
|
||||
A more general view on ``py.path`` objects
|
||||
==========================================
|
||||
|
||||
Seen from a more general perspective, the current ``py.path.extpy`` path
|
||||
offers a way to go from a file to the structured content of
|
||||
a file, namely a python object. The ``extpy`` path retains some
|
||||
common ``path`` operations and semantics but offers additional
|
||||
methods, e.g. ``resolve()`` gets you a true python object.
|
||||
|
||||
But apart from python files there are many other examples
|
||||
of structured content like xml documents or INI-style
|
||||
config files. While some tasks will only be convenient
|
||||
to perform in a domain specific manner (e.g. applying xslt
|
||||
etc.pp) ``py.path`` offers a common behaviour for
|
||||
structured content paths. So far only ``py.path.extpy``
|
||||
is implemented and used by py.test to address tests
|
||||
and traverse into test files.
|
||||
|
||||
*You are in a maze of twisty passages, all alike*
|
||||
-------------------------------------------------
|
||||
|
||||
Now, for the sake of finding out a good direction,
|
||||
let's consider some code that wants to find all
|
||||
*sections* which have a certain *option* value
|
||||
within some given ``startpath``::
|
||||
|
||||
    def find_option(startpath, optionname):
        for section in startpath.listdir(dir=1):
            opt = section.join(optionname)
            if opt.check(): # does the option exist here?
                print("%s found: %s" % (section.basename, opt.read()))
|
||||
|
||||
Now the point is that ``find_option()`` would obviously work
|
||||
when ``startpath`` is a filesystem-like path like a local
|
||||
filesystem path or a subversion URL path. It would then see
|
||||
directories as sections and files as option-names and the
|
||||
content of the file as values.
|
||||
|
||||
But it also works (today) for ``extpy`` paths if you put the following
|
||||
python code in a file::
|
||||
|
||||
    class Section1:
        someoption = "i am an option value"

    class Section2:
        someoption = "i am another option value"
|
||||
|
||||
An ``extpy()`` path maps classes and modules to directories and
|
||||
name-value bindings to file/read() operations.
|
||||
|
||||
And it could also work for 'xml' paths if you put
|
||||
the following xml string in a file::
|
||||
|
||||
    <xml ...>
    <root>
        <section1>
            <someoption>value</someoption></section1>
        <section2>
            <someoption>value</someoption></section2></root>
|
||||
|
||||
where tags containing non-text tags map to directories
|
||||
and tags with just text-children map to files (which
|
||||
upon read() return the joined content of the text
|
||||
tags, possibly as unicode).
|
||||
|
||||
Now, to complete the picture, we could make Config-Parser
|
||||
*ini-style* config files also available::
|
||||
|
||||
[section1]
|
||||
name = value
|
||||
|
||||
[section2]
|
||||
othername = value
|
||||
|
||||
where sections map to directories and name=value mappings
|
||||
to file/contents.
|
||||
|
||||
So it seems that our above ``find_option()`` function would
|
||||
work nicely on all these *mappings*.
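
As a rough illustration of the ini-style mapping, here is a minimal sketch that uses only the standard library. ``IniPath`` is a hypothetical class invented for this example and is not part of ``py.path``; it implements just the few methods that ``find_option()`` touches, and ``find_option()`` returns its hits instead of printing them so the result can be inspected.

```python
import configparser


class IniPath:
    """Hypothetical path-like view of an INI file (not part of py.path)."""

    def __init__(self, parser, section=None, option=None):
        self.parser = parser
        self.section = section
        self.option = option

    @property
    def basename(self):
        return self.option or self.section or ""

    def listdir(self, dir=None):
        # the root lists sections ("directories");
        # a section lists its options ("files") unless only dirs are wanted
        if self.section is None:
            return [IniPath(self.parser, name) for name in self.parser.sections()]
        if dir or self.option is not None:
            return []
        return [IniPath(self.parser, self.section, name)
                for name in self.parser.options(self.section)]

    def join(self, name):
        if self.section is None:
            return IniPath(self.parser, name)
        return IniPath(self.parser, self.section, name)

    def check(self):
        if self.section is None:
            return True
        if self.option is None:
            return self.parser.has_section(self.section)
        return self.parser.has_option(self.section, self.option)

    def read(self):
        return self.parser.get(self.section, self.option)


def find_option(startpath, optionname):
    # same traversal as above, but collecting results instead of printing
    found = []
    for section in startpath.listdir(dir=1):
        opt = section.join(optionname)
        if opt.check():
            found.append((section.basename, opt.read()))
    return found


parser = configparser.ConfigParser()
parser.read_string("[section1]\nname = value\n\n[section2]\nothername = value\n")
root = IniPath(parser)
```

With the ini example from above, ``find_option(root, "name")`` only reports ``section1``. How much of the real path API such a mapping should support is exactly the open question.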
|
||||
|
||||
Of course, the somewhat open question is how to make the
|
||||
transition from a filesystem path to structured content
|
||||
useful and unified, as much as possible without overdoing it.
|
||||
|
||||
Again, there are tasks that will need fully domain specific
|
||||
solutions (DOM/XSLT/...) but i think the above view warrants
|
||||
some experiments and refactoring. The degree of uniformity
|
||||
still needs to be determined and thought about.
|
||||
|
||||
path objects should be stackable
|
||||
--------------------------------
|
||||
|
||||
Oh, and btw, a ``py.path.extpy`` file could live on top of a
|
||||
'py.path.xml' path as well, i.e. take::
|
||||
|
||||
    <xml ...>
    <code>
        <py>
            <magic>
                <assertion>
                    import py
                    ... </assertion>
                <exprinfo>
                    def getmsg(x): pass </exprinfo></magic></py></code>
|
||||
|
||||
and use it to have an ``extpy`` path living on it::
|
||||
|
||||
p = py.path.local(xmlfilename)
|
||||
xmlp = py.path.extxml(p, 'py/magic/exprinfo')
|
||||
p = py.path.extpy(xmlp, 'getmsg')
|
||||
|
||||
assert p.check(func=1, basename='getmsg')
|
||||
getmsg = p.resolve()
|
||||
# we now have a *live* getmsg() function taken and compiled from
|
||||
# the above xml fragment
|
||||
|
||||
There could be generic converters which convert between
|
||||
different content formats ... allowing configuration files to e.g.
|
||||
be in XML/Ini/python or filesystem-format with some common way
|
||||
to find and iterate values.
|
||||
|
||||
*After all the unix filesystem and the python namespaces are
|
||||
two honking great ideas, why not do more of them? :-)*
|
||||
|
||||
|
||||
.. _importexport:
|
||||
|
||||
Revising and improving the import/export system
|
||||
===============================================
|
||||
|
||||
or let's wrap the world all around
|
||||
|
||||
the export/import interface
|
||||
---------------------------
|
||||
|
||||
The py lib already incorporates a mechanism to select which
|
||||
namespaces and names get exposed to a user of the library.
|
||||
Apart from reducing the complexity of the externally visible
namespaces, this allows us to quickly rename and refactor the
implementation without affecting the caller side. This export
|
||||
control can be used by other python packages as well.
|
||||
|
||||
However, all is not fine as the import/export has a
|
||||
few major deficiencies and shortcomings:
|
||||
|
||||
- it doesn't allow specifying docstrings
|
||||
- it is a bit hackish (see py/initpkg.py)
|
||||
- it doesn't present a complete and consistent view of the API.
|
||||
- ``help(constructed_namespace)`` doesn't work for the root
|
||||
package namespace
|
||||
- when the py lib implementation accesses parts of itself
|
||||
it uses the native python import mechanism which is
|
||||
limiting in some respects. Especially for distributed
|
||||
programs as encouraged by `py.execnet`_ it is not clear
|
||||
how the mechanism can nicely integrate to support remote
|
||||
lazy importing.
|
||||
|
||||
Discussions have been going on for a while but it is
|
||||
still not clear how to best tackle the problem. Personally,
|
||||
i believe the main missing thing for the first major release
|
||||
is the docstring one. The current specification
|
||||
of exported names is dictionary based. It would be
|
||||
better to declare it in terms of Objects.
|
||||
|
||||
|
||||
Example sketch for a new export specification
|
||||
---------------------------------------------
|
||||
|
||||
Here is a sketch of what the py lib's ``__init__.py`` file
might or should look like::
|
||||
|
||||
    """
    the py lib version 1.0
    http://codespeak.net/py/1.0
    """

    from py import pkg
    pkg.export(__name__,
        pkg.Module('path',
            '''provides path objects for local filesystem,
            subversion url and working copy, and extension paths.
            ''',
            pkg.Class('local', '''
                the local filesystem path offering a single
                point of interaction for many purposes.
            ''', extpy='./path/local.LocalPath'),

            pkg.Class('svnurl', '''
                the subversion url path.
            ''', extpy='./path/local/svn/urlcommand.SvnUrlPath'),
        ),
        # it goes on ...
    )
|
||||
|
||||
The current ``initpkg.py`` code can be cleaned up to support
|
||||
this new more explicit style of stating things. Note that
|
||||
in principle there is nothing that stops us from retrieving
|
||||
implementations over the network, e.g. a subversion repository.
|
||||
|
||||
|
||||
Let there be alternatives
|
||||
-------------------------
|
||||
|
||||
We could also specify alternative implementations easily::
|
||||
|
||||
    pkg.Class('svnwc', '''
        the subversion working copy.
    ''', extpy=('./path/local/svn/urlbinding.SvnUrlPath',
                './path/local/svn/urlcommand.SvnUrlPath',)
    )
|
||||
|
||||
This would prefer the python binding based implementation over
|
||||
the one working through the 'svn' command line utility. And
|
||||
of course, it could uniformly signal if no implementation is
|
||||
available at all.
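
The "first available implementation wins" idea can be sketched with the standard library alone. This is only an illustration under assumptions: ``resolve_first`` and the ``module:attribute`` spelling are invented here and are not the ``extpy`` notation used above, and ``no_such_binding_module`` simply stands in for an optional binding that is not installed.

```python
import importlib


def resolve_first(candidates):
    """Return the first object that can be imported from 'module:attr' specs."""
    errors = []
    for spec in candidates:
        modname, _, attr = spec.partition(":")
        try:
            return getattr(importlib.import_module(modname), attr)
        except (ImportError, AttributeError) as exc:
            # remember why this candidate failed and try the next one
            errors.append("%s: %s" % (spec, exc))
    # uniform signal that no implementation is available at all
    raise ImportError("no implementation available:\n" + "\n".join(errors))


# the preferred (missing) candidate is skipped, the fallback is used
join = resolve_first(["no_such_binding_module:join", "posixpath:join"])
```

Collecting the individual failure reasons keeps the final error informative when every alternative is missing.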
|
||||
|
||||
|
||||
Problems problems
|
||||
-----------------
|
||||
|
||||
Now there are reasons there isn't a clear conclusion so far.
|
||||
For example, the above approach has some implications, the
|
||||
main one being that implementation classes like
|
||||
``py/path/local.LocalPath`` are visible to the caller side but
|
||||
this presents an inconsistency because the user started out with
|
||||
``py.path.local`` and expects that the two classes are really much
|
||||
the same. We have the same problem today, of course.
|
||||
|
||||
The naive solution strategy of wrapping the "implementation
|
||||
level" objects into their exported representations may remind one
|
||||
of the `wrapping techniques PyPy uses`_. But it
|
||||
*may* result in a slightly heavyweight mechanism that affects
|
||||
runtime speed. However, I guess that this standard strategy
|
||||
is probably the cleanest.
|
||||
|
||||
|
||||
Every problem can be solved with another level ...
|
||||
--------------------------------------------------
|
||||
|
||||
The wrapping of implementation level classes in their export
|
||||
representation objects adds another level of indirection.
|
||||
But this indirection would have interesting advantages:
|
||||
|
||||
- we could easily present a consistent view of the library
|
||||
- it could take care of exceptions as well
|
||||
- it provides natural interception points for logging
|
||||
- it enables remote lazy loading of implementations
|
||||
or certain versions of interfaces
|
||||
|
||||
And quite likely the extra indirection wouldn't hurt so much
|
||||
as it is not much more than a function call, and if we cared
|
||||
we could even generate some c-code (with PyPy :-) to speed
|
||||
it up.
|
||||
|
||||
But it can lead to new problems ...
|
||||
-----------------------------------
|
||||
|
||||
However, it is critical to avoid burdening the implementation
code with being aware of its wrapping. This is what we have
|
||||
to do in PyPy but the import/export mechanism works at
|
||||
a higher level of the language, i think.
|
||||
|
||||
Oh, and we didn't talk about bootstrapping :-)
|
||||
|
||||
.. _`py.execnet`: ../execnet.html
|
||||
.. _`wrapping techniques PyPy uses`: http://codespeak.net/pypy/index.cgi?doc/wrapping.html
|
||||
.. _`lightweight xml generation`:
|
||||
|
||||
Extension of py.path.local.sysexec()
|
||||
====================================
|
||||
|
||||
The `sysexec mechanism`_ allows you to directly execute
|
||||
binaries on your system. Especially after we'll have this
|
||||
nicely integrated into Win32 we may also want to run python
|
||||
scripts both locally and from the net::
|
||||
|
||||
vadm = py.path.svnurl('http://codespeak.net/svn/vadm/dist/vadm/cmdline.py')
|
||||
stdoutput = vadm.execute('diff')
|
||||
|
||||
To be able to execute this code fragment, we need some or all of
|
||||
|
||||
- an improved import system that allows remote imports
|
||||
|
||||
- a way to specify what the "necessary" python import
  directories are. For example, the above scriptlet will
  require a certain root to be included in the python module search
  path in order to execute something like "import vadm".
|
||||
|
||||
- a way to specify dependencies ... which opens up another
|
||||
interesting can of worms, suitable for another chapter
|
||||
in the neverending `future book`_.
|
||||
|
||||
.. _`sysexec mechanism`: ../misc.html#sysexec
|
||||
.. _`compile-on-the-fly`:
|
||||
|
||||
we need a persistent storage for the py lib
|
||||
-------------------------------------------
|
||||
|
||||
A somewhat open question is where to store the underlying
|
||||
generated pyc-files and other files generated on the fly
|
||||
with `CPython's distutils`_. We want to have a
|
||||
*persistent location* in order to avoid runtime-penalties
|
||||
when switching python versions and platforms (think NFS).
|
||||
|
||||
A *persistent location* for the py lib would be a good idea
|
||||
maybe also for other reasons. We could cache some of the
|
||||
expensive test setups, like the multi-revision subversion
|
||||
repository that is created for each run of the tests.
|
||||
Distribute tests ad-hoc across multiple platforms
|
||||
======================================================
|
||||
|
||||
After some more refactoring and unification of
|
||||
the current testing and distribution support code
|
||||
we'd like to be able to run tests on multiple
|
||||
platforms simultaneously and allow for interaction
|
||||
and introspection into the (remote) failures.
|
||||
|
||||
|
||||
Make APIGEN useful for more projects
|
||||
================================================
|
||||
|
||||
The new APIGEN tool offers rich information
|
||||
derived from running tests against an application:
|
||||
argument types and callsites, i.e. it shows
|
||||
the places where a particular API is used.
|
||||
In its first incarnation, there are still
|
||||
some specialties that likely prevent it
|
||||
from documenting APIs for other projects.
|
||||
We'd like to evolve to a `py.apigen` tool
|
||||
that can make use of information provided
|
||||
by a py.test run.
|
||||
|
||||
Distribute channels/programs across networks
|
||||
================================================
|
||||
|
||||
Apart from stabilizing setup/teardown procedures
|
||||
for `py.execnet`_, we'd like to generalize its
|
||||
implementation to allow connecting two programs
|
||||
across multiple hosts, i.e. we'd like to arbitrarily
|
||||
send "channels" across the network. Likely this
|
||||
will be done by using the "pipe" model, i.e.
|
||||
that each channel is actually a pair of endpoints,
|
||||
both of which can be independently transported
|
||||
across the network. The programs that "own"
|
||||
these endpoints remain connected.
|
||||
|
||||
.. _`py.execnet`: ../execnet.html
|
||||
|
||||
Benchmarking and persistent storage
|
||||
=========================================
|
||||
|
||||
For storing test results, but also benchmarking
|
||||
and other information, we need a solid way
|
||||
to store all kinds of information from test runs.
|
||||
We'd like to generate statistics or html-overview
|
||||
out of it, but also use such information to determine when
|
||||
a certain test broke, or when its performance
|
||||
decreased considerably.
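
As one possible shape for such a store, here is a small sketch on top of ``sqlite3`` from the standard library. The table layout, the ``first_failure`` helper and the sample rows are all made up for illustration; nothing here reflects an actual py lib design.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE results "
    "(revision INTEGER, test TEXT, passed INTEGER, seconds REAL)"
)
# illustrative sample data: one test recorded over four revisions
db.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        (1, "test_visit", 1, 0.10),
        (2, "test_visit", 1, 0.11),
        (3, "test_visit", 0, 0.12),
        (4, "test_visit", 0, 0.12),
    ],
)


def first_failure(db, test):
    """Return the first failing revision after the last passing one, or None."""
    row = db.execute(
        "SELECT MIN(revision) FROM results "
        "WHERE test = ? AND passed = 0 AND revision > COALESCE("
        "  (SELECT MAX(revision) FROM results WHERE test = ? AND passed = 1), -1)",
        (test, test),
    ).fetchone()
    return row[0]
```

The same table could feed statistics or an html overview, and the ``seconds`` column is where a performance regression would show up.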
|
||||
|
||||
.. _`CPython's distutils`: http://www.python.org/dev/doc/devel/lib/module-distutils.html
|
||||
|
||||
|
@ -364,59 +105,12 @@ is a can of subsequent worms).
|
|||
.. _`reiserfs v4 features`: http://www.namesys.com/v4/v4.html
|
||||
|
||||
|
||||
Improve and unify Path API
|
||||
==========================
|
||||
|
||||
visit() grows depth control
|
||||
---------------------------
|
||||
Consider more features
|
||||
==================================
|
||||
|
||||
Add a ``maxdepth`` argument (spelled ``stop`` in the examples below) to the path.visit() method,
which will limit how deep the traversal descends into subdirectories. Example::
|
||||
|
||||
    p = py.path.local.get_tmproot()
    for x in p.visit('bin', stop=N):
        ...
|
||||
|
||||
This will yield all file or directory paths whose basename
|
||||
is 'bin', depending on the values of ``stop``::
|
||||
|
||||
p # stop == 0 or higher (and p.basename == 'bin')
|
||||
p / bin # stop == 1 or higher
|
||||
p / ... / bin # stop == 2 or higher
|
||||
p / ... / ... / bin # stop == 3 or higher
|
||||
|
||||
The default for stop would be `255`.
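
The intended depth semantics can be sketched without the py lib, using ``os`` only. This is an assumption-laden illustration: the free function ``visit`` below is not the ``path.visit()`` method, it matches plain basenames only, it does not handle negative ``stop`` values, and it does not guard against symlink loops.

```python
import os
import tempfile


def visit(root, basename, stop=255):
    """Yield paths named `basename` found at most `stop` levels below `root`."""
    if os.path.basename(root) == basename:
        yield root
    if stop <= 0 or not os.path.isdir(root):
        return
    for name in sorted(os.listdir(root)):
        # each level we descend uses up one unit of the depth budget
        for hit in visit(os.path.join(root, name), basename, stop - 1):
            yield hit


# small throwaway tree: <tmp>/bin and <tmp>/a/bin
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "bin"))
os.makedirs(os.path.join(tmp, "a", "bin"))

shallow = [os.path.relpath(p, tmp) for p in visit(tmp, "bin", stop=1)]
deep = [os.path.relpath(p, tmp) for p in visit(tmp, "bin", stop=2)]
```

Here ``shallow`` only contains the direct ``bin`` child, while ``deep`` also contains ``a/bin``.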
|
||||
|
||||
But what if `stop < 0`? We could let that mean to go upwards::
|
||||
|
||||
    for x in p.visit('py/bin', stop=-255):
        # will yield all parent directories which have a
        # py/bin subpath
        ...
|
||||
|
||||
visit() returning a lazy list?
|
||||
------------------------------
|
||||
|
||||
There is a very nice "no-API" `lazy list`_ implementation from
|
||||
Armin Rigo which presents a complete list interface, given some
|
||||
iterable. The iterable is consumed only on demand and retains
|
||||
memory efficiency as much as possible. The lazy list
|
||||
provides a number of advantages in addition to the fact that
|
||||
a list interface is nicer to deal with than an iterator.
|
||||
For example it lets you do::
|
||||
|
||||
    for x in p1.visit('*.cfg') + p2.visit('*.cfg'):
        # will iterate through all results
        ...
|
||||
|
||||
Here the for-iter expression will retain all laziness (with
the result of adding lazy lists being another lazy
|
||||
list) by internally concatenating the underlying
|
||||
lazylists/iterators. Moreover, the lazylist implementation
|
||||
will know that there are no references left to the lazy list
|
||||
and throw away iterated elements. This makes the iteration
|
||||
over the sum of the two visit()s as efficient as if we had
|
||||
used iterables to begin with!
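
The lazy concatenation can be illustrated with a deliberately tiny stand-in. The ``LazyList`` class below is an assumption made for this example and is far simpler than the implementation referred to above: it supports only iteration and ``+``, not the complete list interface, and it can be iterated once.

```python
import itertools


class LazyList:
    """Minimal stand-in for a lazy list: supports iteration and + only."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self._it

    def __add__(self, other):
        # chaining keeps both operands unconsumed until they are iterated
        return LazyList(itertools.chain(self, other))


produced = []  # records which elements were actually generated


def numbers(start, stop):
    for i in range(start, stop):
        produced.append(i)
        yield i


both = LazyList(numbers(0, 3)) + LazyList(numbers(10, 13))
```

Nothing is generated when the two lazy lists are added. Elements are produced only as the loop (or an ``itertools.islice``) asks for them.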
|
||||
|
||||
For this, we would like to move the lazy list into the
|
||||
py lib's namespace, most probably at `py.builtin.lazylist`.
|
||||
There are many more features and useful classes
|
||||
that might be nice to integrate. For example, we might put
|
||||
Armin's `lazy list`_ implementation into the py lib.
|
||||
|
||||
.. _`lazy list`: http://codespeak.net/svn/user/arigo/hack/misc/collect.py
|
||||
|
|