=======================================================
Visions and ideas for further development of the py lib
=======================================================

.. contents::
.. sectnum::

This document tries to describe directions and guiding ideas
for the near-future development of the py lib. *Note that all
statements within this document - even if they sound factual -
mostly just express thoughts and ideas. They do not always refer to
real code, so read with some caution. This is not a reference guide
(tm). Moreover, the order in which the ideas appear in this file does
not reflect the order in which they may be implemented.*

.. _`general-path`:
.. _`a more general view on path objects`:

A more general view on ``py.path`` objects
==========================================

Seen from a more general perspective, the current ``py.path.extpy`` path
offers a way to go from a file to the structured content of
that file, namely a python object. The ``extpy`` path retains some
common ``path`` operations and semantics but offers additional
methods, e.g. ``resolve()`` gets you a true python object.

But apart from python files there are many other examples
of structured content, like xml documents or INI-style
config files. While some tasks will only be convenient
to perform in a domain specific manner (e.g. applying xslt
etc.), ``py.path`` offers a common behaviour for
structured content paths. So far only ``py.path.extpy``
is implemented; it is used by py.test to address tests
and traverse into test files.

*You are in a maze of twisty passages, all alike*
-------------------------------------------------

Now, for the sake of finding out a good direction,
let's consider some code that wants to find all
*sections* which have a certain *option* value
within some given ``startpath``::

    def find_option(startpath, optionname):
        for section in startpath.listdir(dir=1):
            opt = section.join(optionname)
            if opt.check(): # does the option exist here?
                print(section.basename, "found:", opt.read())

Now the point is that ``find_option()`` would obviously work
when ``startpath`` is a filesystem-like path such as a local
filesystem path or a subversion URL path. It would then see
directories as sections, files as option names and the
content of each file as the option value.

But it also works (today) for ``extpy`` paths if you put the following
python code in a file::

    class Section1:
        someoption = "i am an option value"

    class Section2:
        someoption = "i am another option value"

An ``extpy()`` path maps classes and modules to directories and
name-value bindings to file/read() operations.

And it could also work for 'xml' paths if you put
the following xml string in a file::

    <xml ...>
    <root>
      <section1>
         <someoption>value</someoption></section1>
      <section2>
         <someoption>value</someoption></section2></root>

where tags containing non-text tags map to directories
and tags with just text-children map to files (which
upon read() return the joined content of the text
tags, possibly as unicode).

Now, to complete the picture, we could make Config-Parser
*ini-style* config files also available::

    [section1]
    name = value

    [section2]
    othername = value

where sections map to directories and name=value mappings
to file/contents.

So it seems that our above ``find_option()`` function would
work nicely on all these *mappings*.

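
To make the ini-style mapping a bit more tangible, here is a minimal sketch
built on the standard library ``configparser`` module. The ``IniPath`` class
is purely hypothetical (nothing like it exists in ``py.path``), and
``find_option()`` is changed to return its findings instead of printing them
so that the result can be checked:

```python
import configparser


class IniPath:
    """Hypothetical path-like view on a parsed INI file (not part of py.path)."""

    def __init__(self, parser, parts=()):
        self.parser = parser
        self.parts = tuple(parts)   # () is the root, then (section,), (section, option)

    @property
    def basename(self):
        return self.parts[-1] if self.parts else ""

    def join(self, name):
        return IniPath(self.parser, self.parts + (name,))

    def listdir(self, dir=0):
        # sections play the role of directories, options the role of files
        if len(self.parts) == 0:
            return [self.join(name) for name in self.parser.sections()]
        if len(self.parts) == 1 and not dir:
            return [self.join(name) for name in self.parser.options(self.parts[0])]
        return []

    def check(self):
        if len(self.parts) == 0:
            return True
        if len(self.parts) == 1:
            return self.parser.has_section(self.parts[0])
        if len(self.parts) == 2:
            return self.parser.has_option(*self.parts)
        return False

    def read(self):
        section, option = self.parts
        return self.parser.get(section, option)


def find_option(startpath, optionname):
    found = []
    for section in startpath.listdir(dir=1):
        opt = section.join(optionname)
        if opt.check():             # does the option exist here?
            found.append((section.basename, opt.read()))
    return found


parser = configparser.ConfigParser()
parser.read_string("[section1]\nname = value\n\n[section2]\nothername = value\n")
result = find_option(IniPath(parser), "name")
print(result)
```

The body of ``find_option()`` only relies on ``listdir()``, ``join()``,
``check()``, ``read()`` and ``basename``, which is the whole point of the
mapping idea.
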

Of course, the somewhat open question is how to make the
transition from a filesystem path to structured content
useful and unified, as much as possible without overdoing it.

Again, there are tasks that will need fully domain specific
solutions (DOM/XSLT/...) but I think the above view warrants
some experiments and refactoring. The degree of uniformity
still needs to be determined and thought about.

path objects should be stackable
--------------------------------

Oh, and btw, a ``py.path.extpy`` path could live on top of a
'py.path.extxml' path as well, i.e. take::

    <xml ...>
    <code>
        <py>
            <magic>
                <assertion>
                    import py
                    ... </assertion>
                <exprinfo>
                    def getmsg(x): pass </exprinfo></magic></py></code>

and use it to have an ``extpy`` path living on it::

    p = py.path.local(xmlfilename)
    xmlp = py.path.extxml(p, 'py/magic/exprinfo')
    p = py.path.extpy(xmlp, 'getmsg')

    assert p.check(func=1, basename='getmsg')
    getmsg = p.resolve()
    # we now have a *live* getmsg() function taken and compiled from
    # the above xml fragment

There could be generic converters which convert between
different content formats ... allowing configuration files to e.g.
be in XML/Ini/python or filesystem-format with some common way
to find and iterate values.

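
The stacking idea can be tried out today with plain standard library tools.
The following is only a sketch: ``xml_read()`` and ``py_resolve()`` are
hypothetical helper names standing in for the two path layers, and the xml
fragment is a simplified variant of the one above:

```python
import textwrap
import xml.etree.ElementTree as ET

XML_SOURCE = """\
<code>
  <py>
    <magic>
      <exprinfo>
        def getmsg(x):
            return "message for %r" % (x,)
      </exprinfo>
    </magic>
  </py>
</code>
"""


def xml_read(xmltext, relpath):
    """Layer 1: return the joined text content of the element at relpath."""
    root = ET.fromstring(xmltext)
    element = root.find(relpath)        # ElementTree understands 'a/b/c'
    if element is None:
        raise LookupError(relpath)
    return "".join(element.itertext())


def py_resolve(source, name):
    """Layer 2: compile the source text and return the object called name."""
    namespace = {}
    code = compile(textwrap.dedent(source), "<xml fragment>", "exec")
    exec(code, namespace)
    return namespace[name]


getmsg = py_resolve(xml_read(XML_SOURCE, "py/magic/exprinfo"), "getmsg")
print(getmsg(42))
```

Each layer only needs the ``read()``-like result of the layer below it, so
the xml layer could just as well sit on a subversion URL instead of a string.
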

*After all the unix filesystem and the python namespaces are
two honking great ideas, why not do more of them? :-)*


.. _importexport:

Revising and improving the import/export system
===============================================

or let's wrap the world all around

the export/import interface
---------------------------

The py lib already incorporates a mechanism to select which
namespaces and names get exposed to a user of the library.
Apart from reducing the complexity of the externally visible
namespaces, this allows us to quickly rename and refactor things in the
implementation without affecting the caller side. This export
control can be used by other python packages as well.

However, all is not fine, as the import/export mechanism has a
few major deficiencies and shortcomings:

- it doesn't allow specifying doc-strings
- it is a bit hackish (see py/initpkg.py)
- it doesn't present a complete and consistent view of the API.
- ``help(constructed_namespace)`` doesn't work for the root
  package namespace
- when the py lib implementation accesses parts of itself
  it uses the native python import mechanism, which is
  limiting in some respects. Especially for distributed
  programs as encouraged by `py.execnet`_ it is not clear
  how the mechanism can nicely integrate to support remote
  lazy importing.

Discussions have been going on for a while but it is
still not clear how to best tackle the problem. Personally,
I believe the main missing thing for the first major release
is the docstring one. The current specification
of exported names is dictionary based. It would be
better to declare it in terms of objects.


Example sketch for a new export specification
---------------------------------------------

Here is a sketch of what the py lib's ``__init__.py`` file
might or should look like::

    """
    the py lib version 0.8
    http://codespeak.net/py/0.8
    """

    from py import pkg
    pkg.export(__name__,
        pkg.Module('path',
            '''provides path objects for local filesystem,
            subversion url and working copy, and extension paths.
            ''',
            pkg.Class('local', '''
                the local filesystem path offering a single
                point of interaction for many purposes.
            ''', extpy='./path/local.LocalPath'),

            pkg.Class('svnurl', '''
                the subversion url path.
            ''', extpy='./path/local/svn/urlcommand.SvnUrlPath'),
        ),
        # it goes on ...
    )

The current ``initpkg.py`` code can be cleaned up to support
this new, more explicit style of stating things. Note that
in principle there is nothing that stops us from retrieving
implementations over the network, e.g. from a subversion repository.

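
To get a feeling for how such object based declarations could behave, here
is a small self-contained sketch. It is not the ``pkg`` API from above:
``Class``, ``Module`` and ``describe()`` are made-up names, the
``'module:attribute'`` notation replaces the ``extpy`` strings, and classes
from the standard library serve as stand-ins for real implementations:

```python
import importlib
import types


class Class:
    """Hypothetical declaration of one exported name."""

    def __init__(self, name, doc, impl):
        self.name = name
        self.doc = doc
        self.impl = impl            # 'module:attribute', an assumed notation


class Module(types.ModuleType):
    """Namespace whose exported names are imported on first access."""

    def __init__(self, name, doc, *exports):
        super().__init__(name, doc)
        self._exports = {export.name: export for export in exports}

    def __getattr__(self, name):
        # only reached when the normal attribute lookup has failed
        exports = self.__dict__.get("_exports", {})
        if name not in exports:
            raise AttributeError(name)
        modname, attr = exports[name].impl.split(":")
        obj = getattr(importlib.import_module(modname), attr)
        setattr(self, name, obj)    # cache, so later lookups are direct
        return obj

    def describe(self):
        """Return the declared docstrings without importing anything."""
        return {name: export.doc for name, export in self._exports.items()}


path = Module(
    "path", "provides path objects (stand-in demo)",
    Class("local", "the local filesystem path", "pathlib:Path"),
    Class("pure", "a path without filesystem access", "pathlib:PurePath"),
)
print(path.describe())
print(path.local)
```

Because the docstrings live in the declaration, they are available before
any implementation module has been imported.
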


Let there be alternatives
-------------------------

We could also specify alternative implementations easily::

    pkg.Class('svnwc', '''
        the subversion working copy.
    ''', extpy=('./path/local/svn/urlbinding.SvnUrlPath',
                './path/local/svn/urlcommand.SvnUrlPath',)
    )

This would prefer the python binding based implementation over
the one working through the 'svn' command line utility. And
of course, it could uniformly signal if no implementation is
available at all.

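
A sketch of the "first available implementation wins" rule, again with
invented names (``resolve_first()``, ``NoImplementationError``) and with the
standard ``json`` module standing in for a real implementation:

```python
import importlib


class NoImplementationError(ImportError):
    """Raised when none of the declared alternatives can be loaded."""


def resolve_first(*candidates):
    """Return the first loadable object from 'module:attribute' specs."""
    failures = []
    for spec in candidates:
        modname, attr = spec.split(":")
        try:
            return getattr(importlib.import_module(modname), attr)
        except (ImportError, AttributeError) as exc:
            failures.append("%s (%s)" % (spec, exc))
    raise NoImplementationError(
        "no implementation available: " + "; ".join(failures))


# the first module does not exist, so the second candidate is chosen
loads = resolve_first("no_such_fast_json_binding:loads", "json:loads")
print(loads('{"answer": 42}'))
```

The single exception type is what would allow the export layer to signal
uniformly that nothing usable was found.
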


Problems problems
-----------------

Now there are reasons there isn't a clear conclusion so far.
For example, the above approach has some implications, the
main one being that implementation classes like
``py/path/local.LocalPath`` are visible to the caller side.
This presents an inconsistency because the user started out with
``py.path.local`` and expects that the two classes are really much
the same. We have the same problem today, of course.

The naive solution strategy of wrapping the "implementation
level" objects into their exported representations may remind
you of the `wrapping techniques PyPy uses`_. But it
*may* result in a slightly heavyweight mechanism that affects
runtime speed. However, I guess that this standard strategy
is probably the cleanest.


Every problem can be solved with another level ...
--------------------------------------------------

The wrapping of implementation level classes in their export
representation objects adds another level of indirection.
But this indirection would have interesting advantages:

- we could easily present a consistent view of the library
- it could take care of exceptions as well
- it provides natural interception points for logging
- it enables remote lazy loading of implementations
  or certain versions of interfaces

And quite likely the extra indirection wouldn't hurt so much,
as it is not much more than a function call, and if we cared
we could even generate some c-code (with PyPy :-) to speed
it up.

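
Two of the listed advantages, logging and exception handling, are easy to
illustrate. In this sketch ``Exported``, ``ExportedError`` and
``LocalPathImpl`` are all hypothetical names; the implementation class does
not know that it is being wrapped:

```python
import functools


class ExportedError(Exception):
    """Single error type presented to callers (hypothetical)."""


class LocalPathImpl:
    """Stand-in implementation class, unaware of any wrapping."""

    def __init__(self, path):
        self.path = path

    def join(self, name):
        return self.path.rstrip("/") + "/" + name

    def read(self):
        with open(self.path) as f:
            return f.read()


class Exported:
    """Export-level wrapper: one extra function call per method call."""

    def __init__(self, impl, log):
        self._impl = impl
        self._log = log

    def __getattr__(self, name):
        attr = getattr(self._impl, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def call(*args, **kwargs):
            self._log.append(name)              # interception point for logging
            try:
                return attr(*args, **kwargs)
            except OSError as exc:              # translate implementation errors
                raise ExportedError(str(exc)) from exc
        return call


log = []
p = Exported(LocalPathImpl("/tmp"), log)
joined = p.join("x")
print(joined, log)
```
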

But it can lead to new problems ...
-----------------------------------

However, it is critical to avoid burdening the implementation
code with being aware of its wrapping. This is what we have
to do in PyPy but the import/export mechanism works at
a higher level of the language, I think.

Oh, and we didn't talk about bootstrapping :-)

.. _`py.execnet`: ../execnet.html
.. _`wrapping techniques PyPy uses`: http://codespeak.net/pypy/index.cgi?doc/wrapping.html
.. _`lightweight xml generation`:

Extension of py.path.local.sysexec()
====================================

The `sysexec mechanism`_ allows you to directly execute
binaries on your system. Especially once we have this
nicely integrated into Win32, we may also want to run python
scripts both locally and from the net::

    vadm = py.path.svnurl('http://codespeak.net/svn/vadm/dist/vadm/cmdline.py')
    stdoutput = vadm.execute('diff')

To be able to execute this code fragment, we need one or more of
the following:

- an improved import system that allows remote imports

- a way to specify what the "necessary" python import
  directories are. For example, the above scriptlet will
  require a certain root to be included in the python module
  search path in order to execute something like "import vadm".

- a way to specify dependencies ... which opens up another
  interesting can of worms, suitable for another chapter
  in the neverending `future book`_.

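
The second point, specifying import directories, can be approximated for
local scripts with the standard ``subprocess`` module. This is only a sketch
under assumptions: the ``execute()`` function and its ``import_roots``
parameter are invented here, remote scripts are not handled at all, and the
module and script are throwaway files created for the demonstration:

```python
import os
import subprocess
import sys
import tempfile


def execute(script, *args, import_roots=()):
    """Run script with the current interpreter and return its stdout.

    import_roots are put in front of PYTHONPATH for the child process.
    """
    env = dict(os.environ)
    roots = list(import_roots)
    if env.get("PYTHONPATH"):
        roots.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(roots)
    result = subprocess.run([sys.executable, script, *args], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as libdir, \
        tempfile.TemporaryDirectory() as bindir:
    # a tiny module that lives outside of the script's own directory
    with open(os.path.join(libdir, "demo_pkg_for_sketch.py"), "w") as f:
        f.write("def run(cmd):\n    return 'ran ' + cmd\n")
    script = os.path.join(bindir, "cmdline.py")
    with open(script, "w") as f:
        f.write("import sys, demo_pkg_for_sketch\n"
                "print(demo_pkg_for_sketch.run(sys.argv[1]))\n")
    stdoutput = execute(script, "diff", import_roots=[libdir])
print(stdoutput)
```
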

.. _`sysexec mechanism`: ../misc.html#sysexec
.. _`compile-on-the-fly`:

we need a persistent storage for the py lib
-------------------------------------------

A somewhat open question is where to store the underlying
generated pyc-files and other files generated on the fly
with `CPython's distutils`_. We want to have a
*persistent location* in order to avoid runtime penalties
when switching python versions and platforms (think NFS).

A *persistent location* for the py lib would maybe be a good idea
for other reasons as well. We could cache some of the
expensive test setups, like the multi-revision subversion
repository that is created for each run of the tests.

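
One way to keep python versions and platforms from stepping on each other
is to make both part of the directory name. A minimal sketch, where
``persistent_dir()`` is a made-up helper and a temporary directory stands in
for whatever base location would eventually be chosen:

```python
import os
import sys
import sysconfig
import tempfile


def persistent_dir(base, *parts):
    """Return (and create) a directory below base that is specific to
    the running python version and platform."""
    tag = "py%d.%d-%s" % (sys.version_info[0], sys.version_info[1],
                          sysconfig.get_platform())
    path = os.path.join(base, tag, *parts)
    os.makedirs(path, exist_ok=True)
    return path


base = tempfile.mkdtemp()       # stand-in for a real per-user location
cache = persistent_dir(base, "testsetups", "svnrepo")
print(cache)
```

Calling the helper again with the same arguments returns the same
directory, so an expensive setup would only have to be created once.
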

.. _`CPython's distutils`: http://www.python.org/dev/doc/devel/lib/module-distutils.html

.. _`getting started`: ../getting-started.html
.. _`restructured text`: http://docutils.sourceforge.net/docs/user/rst/quickref.html
.. _`python standard library`: http://www.python.org/doc/2.3.4/lib/lib.html
.. _`xpython EuroPython 2004 talk`: http://codespeak.net/svn/user/hpk/talks/xpython-talk.txt
.. _`under the xpy tree`: http://codespeak.net/svn/user/hpk/xpy/xml.py
.. _`future book`: future.html
.. _`PEP-324 subprocess module`: http://www.python.org/peps/pep-0324.html
.. _`subprocess implementation`: http://www.lysator.liu.se/~astrand/popen5/
.. _`py.test`: ../test.html

Refactor path implementations to use a Filesystem Abstraction
=============================================================

It seems like a good idea to refactor all path implementations to
use an internal Filesystem abstraction. The current code base
would be transformed to have Filesystem implementations for e.g.
local, subversion and subversion "working copy" filesystems. Today
the corresponding code is scattered throughout the path-handling code.

On a related note, Armin Rigo has hacked `pylufs`_ which allows you to
implement kernel-level linux filesystems in pure python. Now
the idea is that the mentioned filesystem implementations would
be directly usable for such linux-filesystem glue code.

In other words, implementing a `memoryfs`_ or a `dictfs`_ would
give you two things for free: a filesystem mountable at kernel level
as well as a uniform "path" object allowing you to access your
filesystem in convenient ways. (At some point it might
even become interesting to think about interfacing to
`reiserfs v4 features`_ at the Filesystem level but that
is a can of subsequent worms).

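
The split between a filesystem object and a generic path object might look
roughly like this. It is a sketch only: ``DictFS`` and ``FSPath`` are
invented for illustration and are unrelated to the linked ``dictfs`` and
``memoryfs`` code, and the kernel-level mounting part is left out entirely:

```python
class DictFS:
    """Hypothetical minimal filesystem: nested dicts are directories,
    strings are file contents."""

    def __init__(self, tree):
        self.tree = tree

    def lookup(self, parts):
        node = self.tree
        for part in parts:
            if not isinstance(node, dict) or part not in node:
                raise KeyError("/".join(parts))
            node = node[part]
        return node

    def isdir(self, parts):
        return isinstance(self.lookup(parts), dict)

    def listdir(self, parts):
        return sorted(self.lookup(parts))

    def read(self, parts):
        node = self.lookup(parts)
        if isinstance(node, dict):
            raise IsADirectoryError("/".join(parts))
        return node


class FSPath:
    """Path object that delegates every operation to a filesystem object."""

    def __init__(self, fs, parts=()):
        self.fs = fs
        self.parts = tuple(parts)

    @property
    def basename(self):
        return self.parts[-1] if self.parts else ""

    def join(self, *names):
        return FSPath(self.fs, self.parts + names)

    def check(self, dir=None):
        try:
            isdir = self.fs.isdir(self.parts)
        except KeyError:
            return False                # path does not exist
        return True if dir is None else bool(dir) == isdir

    def listdir(self):
        return [self.join(name) for name in self.fs.listdir(self.parts)]

    def read(self):
        return self.fs.read(self.parts)


root = FSPath(DictFS({"etc": {"hosts": "127.0.0.1 localhost\n"}, "README": "hi"}))
print([p.basename for p in root.listdir()])
print(root.join("etc", "hosts").read())
```

Any other object offering ``isdir()``, ``listdir()`` and ``read()`` could be
plugged into ``FSPath`` without touching the path class.
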

.. _`memoryfs`: http://codespeak.net/svn/user/arigo/hack/pyfuse/memoryfs.py
.. _`dictfs`: http://codespeak.net/pipermail/py-dev/2005-January/000191.html
.. _`pylufs`: http://codespeak.net/svn/user/arigo/hack/pylufs/
.. _`reiserfs v4 features`: http://www.namesys.com/v4/v4.html


Improve and unify Path API
==========================

visit() grows depth control
---------------------------

Add a ``maxdepth``-style argument (spelled ``stop`` in the examples
below) to the ``path.visit()`` method, which limits how many directory
levels the traversal descends. Example::

    p = py.path.local.get_tmproot()
    for x in p.visit('bin', stop=N):
        ...

This will yield all file or directory paths whose basename
is 'bin', depending on the value of ``stop``::

    p                    # stop == 0 or higher (and p.basename == 'bin')
    p / bin              # stop == 1 or higher
    p / ... / bin        # stop == 2 or higher
    p / ... / ... / bin  # stop == 3 or higher

The default for ``stop`` would be ``255``.

But what if ``stop < 0``? We could let that mean to go upwards::

    for x in p.visit('py/bin', stop=-255):
        # will yield all parent directories which have a
        # py/bin subpath
        ...

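
For the non-negative case the proposed semantics are easy to prototype on
top of ``os.listdir()``. The function below is a stand-alone sketch and not
the ``py.path`` method; it works on plain strings and ignores the upward
(``stop < 0``) variant:

```python
import fnmatch
import os
import tempfile


def visit(root, pattern, stop=255):
    """Yield root and the paths below it whose basename matches pattern,
    descending at most stop directory levels."""
    def walk(path, depth):
        if fnmatch.fnmatch(os.path.basename(path), pattern):
            yield path
        if depth < stop and os.path.isdir(path) and not os.path.islink(path):
            for name in sorted(os.listdir(path)):
                yield from walk(os.path.join(path, name), depth + 1)
    return walk(root, 0)


tmp = tempfile.mkdtemp()
for sub in ("bin", os.path.join("a", "bin"), os.path.join("a", "b", "bin")):
    os.makedirs(os.path.join(tmp, sub))

for stop in (0, 1, 2, 3):
    found = [os.path.relpath(x, tmp) for x in visit(tmp, "bin", stop=stop)]
    print(stop, found)
```
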

visit() returning a lazy list?
------------------------------

There is a very nice "no-API" `lazy list`_ implementation from
Armin Rigo which presents a complete list interface, given some
iterable. The iterable is consumed only on demand and retains
memory efficiency as much as possible. The lazy list
provides a number of advantages in addition to the fact that
a list interface is nicer to deal with than an iterator.
For example it lets you do::

    for x in p1.visit('*.cfg') + p2.visit('*.cfg'):
        # will iterate through all results
        ...

Here the for-iter expression will retain all laziness (with
the result of adding lazy lists being another lazy
list) by internally concatenating the underlying
lazylists/iterators. Moreover, the lazylist implementation
will know that there are no references left to the lazy list
and throw away iterated elements. This makes the iteration
over the sum of the two visit()s as efficient as if we had
used iterables to begin with!

For this, we would like to move the lazy list into the
py lib's namespace, most probably at ``py.builtin.lazylist``.

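
The following is a much simplified sketch of the idea and not the
implementation linked here. The ``LazyList`` class is written from
scratch for illustration, supports only indexing, iteration and ``+``, and
keeps every element it has produced instead of throwing iterated elements
away:

```python
import itertools


class LazyList:
    """Simplified lazy list: consumes its iterable only on demand."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []

    def _fill(self, count):
        """Make sure at least count elements are cached, if available."""
        needed = count - len(self._cache)
        if needed > 0:
            self._cache.extend(itertools.islice(self._source, needed))

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes are not supported in this sketch")
        self._fill(index + 1)
        return self._cache[index]       # raises IndexError past the end

    def __iter__(self):
        index = 0
        while True:
            self._fill(index + 1)
            if index >= len(self._cache):
                return
            yield self._cache[index]
            index += 1

    def __add__(self, other):
        # concatenation stays lazy: nothing is consumed at this point
        return LazyList(itertools.chain(self, other))


consumed = []

def numbers(start, stop):
    for i in range(start, stop):
        consumed.append(i)              # record what has really been produced
        yield i

both = LazyList(numbers(0, 3)) + LazyList(numbers(10, 13))
consumed_after_add = list(consumed)
first = both[0]
consumed_after_first = list(consumed)
everything = list(both)
print(consumed_after_add, first, consumed_after_first, everything)
```
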

.. _`lazy list`: http://codespeak.net/svn/user/arigo/hack/misc/collect.py