PyTables implements several classes to represent the different nodes in the object tree. They are named File, Group, Leaf, Table, Array, CArray, EArray, VLArray and UnImplemented. Another one allows the user to complement the information on these different objects; its name is AttributeSet. Finally, another important class called IsDescription allows you to build a Table record description by declaring a subclass of it. Many other classes are defined in PyTables, but they can be regarded as helpers whose main goal is to declare the data type properties of the different first class objects; they are described at the end of this chapter as well.
An important function, called openFile(), is responsible for creating, opening and appending to files. In addition, a few utility functions are defined to guess whether the user-supplied file is a PyTables or HDF5 file. These are called isPyTablesFile() and isHDF5File(), respectively. There is also a function called whichLibVersion() that reports the versions of the underlying C libraries (for example, HDF5 or Zlib) and another called print_versions() that prints all the versions of the software that PyTables relies on. Finally, test() lets you run the complete test suite from a Python console interactively.
Let's start discussing the first-level variables and functions available to the user, then the different classes defined in PyTables.
The PyTables version number.
The underlying HDF5 library version number.
True for PyTables Professional edition, false otherwise.
An easy way of copying one PyTables file to another.
This function allows you to copy an existing PyTables file named srcfilename to another file called dstfilename. The source file must exist and be readable. The destination file can be overwritten in place if it already exists by setting the overwrite argument to true.

This function is a shorthand for the File.copyFile() method, which acts on an already opened file. kwargs takes keyword arguments used to customize the copying process. See the documentation of File.copyFile() (see description) for a description of those arguments.
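For instance, a minimal sketch of its use might look like this (the file names here are hypothetical):

import tables
tables.copyFile('source.h5', 'backup.h5', overwrite=True)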
Determine whether a file is in the HDF5 format.
When successful, it returns a true value if the file is an HDF5 file, false otherwise. If there were problems identifying the file, an HDF5ExtError is raised.

Determine whether a file is in the PyTables format.

When successful, it returns a true value if the file is a PyTables file, false otherwise. The true value is the format version string of the file. If there were problems identifying the file, an HDF5ExtError is raised.
Iterate over long ranges.
This is similar to xrange(), but it allows 64-bit arguments on all platforms. The results of the iteration are sequentially yielded in the form of numpy.int64 values, but getting random individual items is not supported.

Because of the Python 32-bit limitation on object lengths, the length attribute (which is also a numpy.int64 value) should be used instead of the len() syntax.

Default start and step arguments are supported in the same way as in xrange(). When the standard [x]range() Python objects support 64-bit arguments, this iterator will be deprecated.
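As a sketch, iterating past the 32-bit boundary might look like this (assuming the lrange() function described above is available in the tables namespace):

import tables
count = 0
for i in tables.lrange(0, 2**33, 2**30):  # 64-bit bounds
    count += 1
print count  # 8 iterations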
Open a PyTables (or generic HDF5) file and return a File object.

Arguments:

The name of the file (supports environment variable expansion). It is suggested that file names have any of the .h5, .hdf or .hdf5 extensions, although this is not mandatory.
The mode in which to open the file. It can be one of the following:

'r': Read-only; no data can be modified.
'w': Write; a new file is created (an existing file with the same name would be deleted).
'a': Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
'r+': Similar to 'a', but the file must already exist.
If the file is to be created, a TITLE string attribute will be set on the root group with the given value. Otherwise, the title will be read from disk, and this will not have any effect.
A dictionary to map names in the object tree into different HDF5 names in file. The keys are the Python names, while the values are the HDF5 names. This is useful when you need to name HDF5 nodes with invalid or reserved words in Python and you want to continue using the natural naming facility on the nodes.
The root User Entry Point. This is a group in the HDF5 hierarchy which will be taken as the starting point to create the object tree. It can be any existing group in the file, named by its HDF5 path. If it does not exist, an HDF5ExtError is issued. Use this if you do not want to build the entire object tree, but rather only a subtree of it.

An instance of the Filters (see Section 4.14.1) class that provides information about the desired I/O filters applicable to the leaves that hang directly from the root group, unless other filter properties are specified for these leaves. Besides, if you do not specify filter properties for child groups, they will inherit these, which will in turn propagate to their child nodes.
The number of unreferenced nodes to be kept in memory. Least recently used nodes are unloaded from memory when this number of loaded nodes is reached. To load a node again, simply access it as usual. Nodes referenced by user variables are not taken into account nor unloaded.
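Putting the main parameters together, a brief sketch of opening a file might look like this (the file name and title are hypothetical):

import tables
h5file = tables.openFile('example.h5', mode='a', title='Example file')
print h5file.title  # 'Example file' if the file was just created
h5file.close()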
Disable all flavors except those in keep.

Providing an empty keep sequence implies disabling all flavors (but the internal one). If the sequence is not specified, only optional flavors are disabled.

Important: Once you disable a flavor, it can not be enabled again.
Split a PyTables type into a PyTables kind and an item size.

Returns a tuple of (kind, itemsize). If no item size is present in the type (in the form of a precision), the returned item size is None.
>>> split_type('int32')
('int', 4)
>>> split_type('string')
('string', None)
>>> split_type('int20')
Traceback (most recent call last):
...
ValueError: precision must be a multiple of 8: 20
>>> split_type('foo bar')
Traceback (most recent call last):
...
ValueError: malformed type: 'foo bar'
Run all the tests in the test suite.
If verbose is set, the test suite will emit messages with full verbosity (not recommended unless you are looking into a certain problem).

If heavy is set, the test suite will be run in heavy mode (you should be careful with this because it can take a lot of time and resources from your computer).
Get version information about a C library.
If the library indicated by name is available, this function returns a 3-tuple containing the major library version as an integer, its full version as a string, and the version date as a string. If the library is not available, None is returned.

The currently supported library names are hdf5, zlib, lzo and bzip2. If another name is given, a ValueError is raised.
In-memory representation of a PyTables file.
An instance of this class is returned when a PyTables file is opened with the openFile() (see description) function. It offers methods to manipulate (create, rename, delete...) nodes and handle their attributes, as well as methods to traverse the object tree. The user entry point to the object tree attached to the HDF5 file is represented in the rootUEP attribute. Other attributes are available.

File objects support an Undo/Redo mechanism which can be enabled with the enableUndo() (see description) method. Once the Undo/Redo mechanism is enabled, explicit marks (with an optional unique name) can be set on the state of the database using the mark() (see description) method. There are two implicit marks which are always available: the initial mark (0) and the final mark (-1). Both the identifier of a mark and its name can be used in undo and redo operations.
Hierarchy manipulation operations (node creation, movement and removal) and attribute handling operations (setting and deleting) made after a mark can be undone by using the undo() (see description) method, which returns the database to the state of a past mark. If undo() is not followed by operations that modify the hierarchy or attributes, the redo() (see description) method can be used to return the database to the state of a future mark. Otherwise, future states of the database are forgotten.

Note that data handling operations can not currently be undone or redone. Also, hierarchy manipulation operations on nodes that do not support the Undo/Redo mechanism issue an UndoRedoWarning before changing the database.
The Undo/Redo mechanism is persistent between sessions and can only be disabled by calling the disableUndo() (see description) method.

File objects can also act as context managers when using the with statement introduced in Python 2.5. When exiting a context, the file is automatically closed.
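A brief sketch of the context manager behavior (Python 2.5 needs the __future__ import; the file name is hypothetical):

from __future__ import with_statement
import tables

with tables.openFile('example.h5') as h5file:
    print h5file.root
# The file has been closed automatically at this point.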
The name of the opened file.
The PyTables version number of this file.
True if the underlying file is open, false otherwise.
The mode in which the file was opened.
The title of the root group in the file.
A dictionary that maps node names between PyTables and HDF5 domain names. Its initial values are set from the trMap parameter passed to the openFile() (see description) function. You cannot change its contents after a file is opened.

The UEP (user entry point) group name in the file (see the openFile() function description).

Default filter properties for the root group (see Section 4.14.1).

The root of the object tree hierarchy (a Group instance).
Copy the contents of this file to dstfilename.

dstfilename must be a path string indicating the name of the destination file. If it already exists, the copy will fail with an IOError, unless the overwrite argument is true, in which case the destination file will be overwritten in place. In this last case, the destination file should be closed, or errors will occur.
Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.
Copying a file usually has the beneficial side effect of creating a more compact and cleaner version of the original file.
Return a short string representation of the object tree.
Example of use:
>>> f = tables.openFile('data/test.h5')
>>> print f
data/test.h5 (File) 'Table Benchmark'
Last modif.: 'Mon Sep 20 12:40:47 2004'
Object Tree:
/ (Group) 'Table Benchmark'
/tuple0 (Table(100L,)) 'This is the table title'
/group0 (Group) ''
/group0/tuple1 (Table(100L,)) 'This is the table title'
/group0/group1 (Group) ''
/group0/group1/tuple2 (Table(100L,)) 'This is the table title'
/group0/group1/group2 (Group) ''
Copy the children of a group into another group.
This method copies the nodes hanging from the source group srcgroup into the destination group dstgroup. Existing destination nodes can be replaced by setting the overwrite argument to true. If the recursive argument is true, all descendant nodes of srcnode are recursively copied. If createparents is true, the groups needed for the given destination parent group path to exist will be created.

kwargs takes keyword arguments used to customize the copying process. See the documentation of Group._f_copyChildren() (see description) for a description of those arguments.
Copy the node specified by where and name to newparent/newname.

These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

The destination group that the node will be copied into (a path name or a Group instance). If not specified or None, the current parent group is chosen as the new parent.

The name to be assigned to the new copy in its destination (a string). If it is not specified or None, the current name is chosen as the new name.

Additional keyword arguments may be passed to customize the copying process. The supported arguments depend on the kind of node being copied. See Group._f_copy() (description) and Leaf.copy() (description) for more information on their allowed keyword arguments.

This method returns the newly created copy of the source node (i.e. the destination node). See Node._f_copy() (description) for further details on the semantics of copying nodes.
Create a new array with the given name in the where location. See the Array class (in Section 4.7) for more information on arrays.

The array or scalar to be saved. Accepted types are NumPy arrays and scalars, numarray arrays and string arrays, Numeric arrays and scalars, as well as native Python sequences and scalars, provided that values are regular (i.e. they are not like [[1,2],2]) and homogeneous (i.e. all the elements are of the same type).

Also, objects that have some of their dimensions equal to 0 are not supported (use an EArray node (see Section 4.9) if you want to store an array with one of its dimensions equal to 0).

The byteorder of the data on disk, specified as 'little' or 'big'. If this is not specified, the byteorder is that of the given object.

See File.createTable() (description) for more information on the rest of the parameters.
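A brief sketch of creating an array node from a NumPy array (assuming an open File instance named h5file; node names are hypothetical):

import numpy
data = numpy.arange(10)
arr = h5file.createArray('/', 'myarray', data, title='A sample array')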
Create a new chunked array with the given name in the where location. See the CArray class (in Section 4.8) for more information on chunked arrays.

An Atom (see Section 4.13.3) instance representing the type and shape of the atomic objects to be saved.

The shape of the new array.

The shape of the data chunk to be read or written in a single HDF5 I/O operation. Filters are applied to those chunks of data. The dimensionality of chunkshape must be the same as that of shape. If None, a sensible value is calculated (which is recommended).

See File.createTable() (description) for more information on the rest of the parameters.
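A brief sketch, assuming an open File instance named h5file (node names are hypothetical):

from tables import Float64Atom
carray = h5file.createCArray('/', 'carray', Float64Atom(), shape=(100, 100))
carray[0, :] = range(100)  # write the first row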
Create a new enlargeable array with the given name in the where location. See the EArray class (in Section 4.9) for more information on enlargeable arrays.

An Atom (see Section 4.13.3) instance representing the type and shape of the atomic objects to be saved.

The shape of the new array. One (and only one) of the shape dimensions must be 0. A dimension being 0 means that the resulting EArray object can be extended along it. Multiple enlargeable dimensions are not supported right now.

A user estimate of the number of row elements that will be added to the growable dimension in the EArray node. If not provided, the default value is 1000 rows. If you plan to create either a much smaller or a much bigger array, try providing a guess; this will optimize the HDF5 B-Tree creation and management process time and the amount of memory used. If you want to specify your own chunk size for I/O purposes, see also the chunkshape parameter below.

The shape of the data chunk to be read or written in a single HDF5 I/O operation. Filters are applied to those chunks of data. The dimensionality of chunkshape must be the same as that of shape (beware: no dimension should be 0 this time!). If None, a sensible value is calculated (which is recommended).

The byteorder of the data on disk, specified as 'little' or 'big'. If this is not specified, the byteorder is that of the platform.

See File.createTable() (description) for more information on the rest of the parameters.
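A brief sketch of creating and extending an enlargeable array (assuming an open File instance named h5file; node names are hypothetical):

from tables import Int32Atom
earray = h5file.createEArray('/', 'earray', Int32Atom(), shape=(0, 3),
                             expectedrows=10000)
earray.append([[1, 2, 3], [4, 5, 6]])  # the first dimension grows to 2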
Create a new group with the given name in the where location. See the Group class (in Section 4.4) for more information on groups.

An instance of the Filters class (see Section 4.14.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from this new group (unless other filter properties are specified for these leaves). Besides, if you do not specify filter properties for its child groups, they will inherit these.

See File.createTable() (description) for more information on the rest of the parameters.
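For instance, a sketch of creating a group under the root group (the group name is hypothetical):

group = h5file.createGroup('/', 'detector', title='Detector information')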
Create a new table with the given name in the where location. See the Table class (in Section 4.6) for more information on tables.

The parent group where the new table will hang from. It can be a path string (for example '/level1/leaf5'), or a Group instance (see Section 4.4).

The name of the new table.
This is an object that describes the table, i.e. how many columns it has, their names, types, shapes, etc. It can be any of the following:
A user-defined class: This should inherit from the IsDescription class (see Section 4.13.1), where table fields are specified.

A dictionary: For example, when you do not know beforehand which structure your table will have. See Section 3.4 for an example of using a dictionary to describe a table.

A Description instance: You can use the description attribute of another table to create a new one with the same structure.

A NumPy (record) array instance: You can use a NumPy array, whether nested or not, and its field structure will be reflected in the new Table object. Moreover, if the array has actual data it will be injected into the newly created table. If you are using numarray instead of NumPy, you may use one of the objects below for the same purpose.

A RecArray instance: This object from the numarray package is also accepted, but it does not give you the possibility to create a nested table. Array data is injected into the new table.

A NestedRecArray instance: Finally, if you want to have nested columns in your table and you are using numarray, you can use this object. Array data is injected into the new table. See Appendix C for a description of the NestedRecArray class.
A description for this node (it sets the TITLE HDF5 attribute on disk).

An instance of the Filters class (see Section 4.14.1) that provides information about the desired I/O filters to be applied during the life of this object.

A user estimate of the number of records that will be in the table. If not provided, the default value is appropriate for tables up to 10 MB in size (more or less). If you plan to create a bigger table, try providing a guess; this will optimize the HDF5 B-Tree creation and management process time and memory used. If you want to specify your own chunk size for I/O purposes, see also the chunkshape parameter below.
See Section 5.1 for a discussion on the issue of providing a number of expected rows.
The shape of the data chunk to be read or written in a single HDF5 I/O operation. Filters are applied to those chunks of data. The rank of the chunkshape for tables must be 1. If None, a sensible value is calculated (which is recommended).

The byteorder of data on disk, specified as 'little' or 'big'. If this is not specified, the byteorder is that of the platform, unless you passed an array as the description, in which case its byteorder will be used.
Whether to create the needed groups for the parent path to exist (not done by default).
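Putting it together, a sketch of creating a table from a user-defined IsDescription subclass (assuming an open File instance named h5file; the column layout is hypothetical):

from tables import IsDescription, StringCol, Int32Col

class Particle(IsDescription):
    name   = StringCol(16)  # 16-character string
    number = Int32Col()     # 32-bit integer

table = h5file.createTable('/', 'particles', Particle,
                           title='A particle table')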
Create a new variable-length array with the given name in the where location. See the VLArray class (in Section 4.10) for more information on variable-length arrays.

An Atom (see Section 4.13.3) instance representing the type and shape of the atomic objects to be saved.

A user estimate of the final size (in MB) of the VLArray node. If not provided, the default value is 1 MB. If you plan to create either a much smaller or a much bigger array, try providing a guess; this will optimize the HDF5 B-Tree creation and management process time and the amount of memory used. If you want to specify your own chunk size for I/O purposes, see also the chunkshape parameter below.

The shape of the data chunk to be read or written in a single HDF5 I/O operation. Filters are applied to those chunks of data. The dimensionality of chunkshape must be 1. If None, a sensible value is calculated (which is recommended).

See File.createTable() (description) for more information on the rest of the parameters.
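A brief sketch of storing rows of varying length (assuming an open File instance named h5file; node names are hypothetical):

from tables import Int32Atom
vlarray = h5file.createVLArray('/', 'vlarray', Int32Atom())
vlarray.append([1, 2, 3])  # a row with three elements
vlarray.append([4, 5])     # a row with two elements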
Move the node specified by where and name to newparent/newname.

These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

The destination group the node will be moved into (a path name or a Group instance). If it is not specified or None, the current parent group is chosen as the new parent.

The new name to be assigned to the node in its destination (a string). If it is not specified or None, the current name is chosen as the new name.

The other arguments work as in Node._f_move() (see description).
Remove the node called name under the where location.

These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

If not supplied or false, the node will be removed only if it has no children; if it does, a NodeError will be raised. If supplied with a true value, the node and all its descendants will be completely removed.
Change the name of the node specified by where and name to newname.

These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

The new name to be assigned to the node (a string).

Whether to recursively remove a node with the same newname if it already exists (not done by default).
Get the node under where with the given name.

where can be a Node instance (see Section 4.3) or a path string leading to a node. If no name is specified, that node is returned.

If a name is specified, this must be a string with the name of a node under where. In this case the where argument can only lead to a Group (see Section 4.4) instance (else a TypeError is raised). The node called name under the group where is returned.

In both cases, if the node to be returned does not exist, a NoSuchNodeError is raised. Please note that hidden nodes are also considered.

If the classname argument is specified, it must be the name of a class derived from Node. If the node is found but it is not an instance of that class, a NoSuchNodeError is also raised.
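For instance, retrieving nodes by path might look like this sketch (the node paths are hypothetical):

group = h5file.getNode('/detector')
table = h5file.getNode('/detector', 'readout', classname='Table')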
Is the node under path visible?

If the node does not exist, a NoSuchNodeError is raised.
Iterate over children nodes hanging from where.

This argument works as in File.getNode() (see description), referencing the node to be acted upon.

If the name of a class derived from Node (see Section 4.3) is supplied, only instances of that class (or subclasses of it) will be returned.

The returned nodes are alphanumerically sorted by their name. This is an iterator version of File.listNodes() (see description).

Return a list with children nodes hanging from where.

This is a list-returning version of File.iterNodes() (see description).
Recursively iterate over groups (not leaves) hanging from where.

The where group itself is listed first (preorder), then each of its child groups (following an alphanumerical order) is also traversed, following the same procedure. If where is not supplied, the root group is used.

The where argument can be a path string or a Group instance (see Section 4.4).

Recursively iterate over nodes hanging from where.

If supplied, the iteration starts from (and includes) this group. It can be a path string or a Group instance (see Section 4.4).

If the name of a class derived from Node (see Section 4.3) is supplied, only instances of that class (or subclasses of it) will be returned.
Example of use:
# Recursively print all the nodes hanging from '/detector'.
print "Nodes hanging from group '/detector':"
for node in h5file.walkNodes('/detector', classname='EArray'):
    print node
Is there a node with that path?

Returns True if the file has a node with the given path (a string), False otherwise.

Recursively iterate over the nodes in the tree.

This is equivalent to calling File.walkNodes() (see description) with no arguments.
Example of use:
# Recursively list all the nodes in the object tree.
h5file = tables.openFile('vlarray1.h5')
print "All nodes in the object tree:"
for node in h5file:
    print node
Disable the Undo/Redo mechanism.
Disabling the Undo/Redo mechanism leaves the database in the current state and forgets past and future database states. This makes File.mark() (see description), File.undo() (see description), File.redo() (see description) and other methods fail with an UndoRedoError.

Calling this method when the Undo/Redo mechanism is already disabled raises an UndoRedoError.
Enable the Undo/Redo mechanism.
This operation prepares the database for undoing and redoing modifications in the node hierarchy. This allows File.mark() (see description), File.undo() (see description), File.redo() (see description) and other methods to be called.

The filters argument, when specified, must be an instance of class Filters (see Section 4.14.1) and is meant for setting the compression values for the action log. The default is having compression enabled, as the gains in terms of space can be considerable. You may want to disable compression if you want maximum speed for Undo/Redo operations.

Calling this method when the Undo/Redo mechanism is already enabled raises an UndoRedoError.
Get the identifier of the current mark.
Returns the identifier of the current mark. This can be used to know the state of a database after an application crash, or to get the identifier of the initial implicit mark after a call to File.enableUndo() (see description).

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Go to a specific mark of the database.
Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Is the Undo/Redo mechanism enabled?
Returns True if the Undo/Redo mechanism has been enabled for this file, False otherwise. Please note that this mechanism is persistent, so a newly opened PyTables file may already have Undo/Redo support enabled.
Mark the state of the database.
Creates a mark for the current state of the database. A unique (and immutable) identifier for the mark is returned. An optional name (a string) can be assigned to the mark. Both the identifier of a mark and its name can be used in File.undo() (see description) and File.redo() (see description) operations. When the name has already been used for another mark, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
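A short sketch of the whole Undo/Redo workflow (the group and mark names are hypothetical):

h5file.enableUndo()
h5file.createGroup('/', 'newgroup')
h5file.mark('after-newgroup')
h5file.removeNode('/newgroup')
h5file.undo()  # back to the mark: '/newgroup' exists again
h5file.redo()  # forward again: '/newgroup' is removed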
Go to a future state of the database.
Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the next created mark is used. If there are no future marks, or the specified mark is not newer than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Go to a past state of the database.
Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the last created mark is used. If there are no past marks, or the specified mark is not older than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
Copy PyTables attributes from one node to another.
These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

The destination node where the attributes will be copied to. It can be a path string or a Node instance (see Section 4.3).
Delete a PyTables attribute from the given node.
These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

The name of the attribute to delete. If the named attribute does not exist, an AttributeError is raised.
Get a PyTables attribute from the given node.
These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

The name of the attribute to retrieve. If the named attribute does not exist, an AttributeError is raised.
Set a PyTables attribute for the given node.
These arguments work as in File.getNode() (see description), referencing the node to be acted upon.

The name of the attribute to set.

The value of the attribute to set. Any kind of Python object (like strings, ints, floats, lists, tuples, dicts, small NumPy/Numeric/numarray objects...) can be stored as an attribute. However, if necessary, cPickle is automatically used so as to serialize objects that you might want to save. See the AttributeSet class (in Section 4.12) for details.

If the node already has a large number of attributes, a PerformanceWarning is issued.
Abstract base class for all PyTables nodes.
This is the base class for all nodes in a PyTables hierarchy. It is an abstract class, i.e. it may not be directly instantiated; however, every node in the hierarchy is an instance of this class.
A PyTables node is always hosted in a PyTables file, under a parent group, at a certain depth in the node hierarchy. A node knows its own name in the parent group and its own path name in the file. When using a translation map (see the File class in Section 4.2), its HDF5 name might differ from its PyTables name.
All the previous information is location-dependent, i.e. it may change when moving or renaming a node in the hierarchy. A node also has location-independent information, such as its HDF5 object identifier and its attribute set.
This class gathers the operations and attributes (both location-dependent and independent) which are common to all PyTables nodes, whatever their type is. Nonetheless, due to natural naming restrictions, the names of all of these members start with a reserved prefix (see the Group class in Section 4.4).

Sub-classes with no children (i.e. leaf nodes) may define new methods, attributes and properties to avoid natural naming restrictions. For instance, _v_attrs may be shortened to attrs and _f_rename to rename. However, the original methods and attributes should still be available.
The depth of this node in the tree (a non-negative integer value).
The hosting File instance (see Section 4.2).

The name of this node in the hosting HDF5 file (a string).

The name of this node in its parent group (a string).

The parent Group instance (see Section 4.4).

The path of this node in the tree (a string).

The associated AttributeSet instance (see Section 4.12).
Whether this node is open or not.
A node identifier (may change from run to run).
A description of this node. A shorthand for the TITLE attribute.
Close this node in the tree.
This releases all resources held by the node, so it should not be used again. On nodes with data, it may be flushed to disk.
The closing operation is not recursive, i.e. closing a group does not close its children.
Copy this node and return the new node.

Creates and returns a copy of the node, maybe in a different place in the hierarchy. newparent can be a Group object (see Section 4.4) or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is chosen as the new name. If a recursive copy is stated, all descendants are copied as well. If createparents is true, the groups needed for the given new parent group path to exist will be created.

Copying a node across databases is supported but can not be undone. Copying a node over itself is not allowed, nor is recursively copying a node into itself. These result in a NodeError. Copying over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before copying.
Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. See the documentation for the particular node type.
Using only the first argument is equivalent to copying the node to a new location without changing its name. Using only the second argument is equivalent to making a copy of the node in the same group.
Move or rename this node.

Moves a node into a new parent group, or changes the name of the node. newparent can be a Group object (see Section 4.4) or a pathname in string form. If it is not specified or None, the current parent group is chosen as the new parent. newname must be a string with a new name. If it is not specified or None, the current name is chosen as the new name. If createparents is true, the groups needed for the given new parent group path to exist will be created.

Moving a node across databases is not allowed, nor is moving a node into itself. These result in a NodeError. However, moving a node over itself is allowed and simply does nothing. Moving over another existing node is similarly not allowed, unless the optional overwrite argument is true, in which case that node is recursively removed before moving.
Usually, only the first argument will be used, effectively moving the node to a new location without changing its name. Using only the second argument is equivalent to renaming the node in place.
Remove this node from the hierarchy.
If the node has children, recursive removal must be stated by giving recursive a true value; otherwise, a NodeError will be raised.
Delete a PyTables attribute from this node.

If the named attribute does not exist, an AttributeError is raised.

Get a PyTables attribute from this node.

If the named attribute does not exist, an AttributeError is raised.
Basic PyTables grouping structure.
Instances of this class are grouping structures containing child instances of zero or more groups or leaves, together with supporting metadata. Each group has exactly one parent group.
Working with groups and leaves is similar in many ways to working with directories and files, respectively, in a Unix filesystem. As with Unix directories and files, objects in the object tree are often described by giving their full (or absolute) path names. This full path can be specified either as a string (like in '/group1/group2') or as a complete object path written in natural naming schema (like in file.root.group1.group2).
See Section 1.2 for more information on natural naming.
A collateral effect of the natural naming schema is that the names of members in the Group class and its instances must be carefully chosen to avoid colliding with existing children node names. For this reason and to avoid polluting the children namespace, all members in a Group start with some reserved prefix, like _f_ (for public methods), _g_ (for private ones), _v_ (for instance variables) or _c_ (for class variables). Any attempt to create a new child node whose name starts with one of these prefixes will raise a ValueError exception.
Another effect of natural naming is that children named after Python keywords or having names not valid as Python identifiers (e.g. class, $a or 44) can not be accessed using the node.child syntax. You will be forced to use node._f_getChild(child) to access them (which is recommended for programmatic accesses). You can also make use of the trMap (translation map dictionary) parameter in the openFile() function (see description) in order to translate HDF5 names not suited for natural naming into more convenient ones, so that you can go on using the file.root.group1.group2 syntax or getattr().
You will also need to use _f_getChild() to access an existing child node if you set a Python attribute in the Group with the same name as that node (you will get a NaturalNameWarning when doing this).

The following instance variables are provided in addition to those in Node (see Section 4.3):
The number of children hanging from this group.
Default filter properties for child nodes.
You can (and are encouraged to) use this property to get, set and delete the FILTERS HDF5 attribute of the group, which stores a Filters instance (see Section 4.14.1). When the group has no such attribute, a default Filters instance is used.
Dictionary with all groups hanging from this group.
Dictionary with all hidden nodes hanging from this group.
Dictionary with all leaves hanging from this group.
Dictionary with all nodes hanging from this group.
Caveat: The following methods are documented for completeness, and they can be used without any problem. However, you should use the high-level counterpart methods in the File class (see Section 4.2), because they are the most used in documentation and examples, and are a bit more powerful than those exposed here.
The following methods are provided in addition to those in Node (see Section 4.3):
Close this node in the tree.
This method has the behavior described in Node._f_close() (see description). It should be noted that this operation disables access to nodes descending from this group. Therefore, if you want to explicitly close them, you will need to walk the nodes hanging from this group before closing it.
Copy this node and return the new one.
This method has the behavior described in Node._f_copy() (see description). In addition, it recognizes the following keyword arguments:

The new title for the destination. If omitted or None, the original title is used. This only applies to the topmost node in recursive copies.

Specifying this parameter overrides the original filter properties in the source node. If specified, it must be an instance of the Filters class (see Section 4.14.1). The default is to copy the filter properties from the source node.

You can prevent the user attributes from being copied by setting this parameter to False. The default is to copy them.

This argument may be used to collect statistics on the copy process. When used, it should be a dictionary with keys 'groups', 'leaves' and 'bytes' having a numeric value. Their values will be incremented to reflect the number of groups, leaves and bytes, respectively, that have been copied during the operation.
Copy the children of this group into another group.
Children hanging directly from this group are copied into dstgroup, which can be a Group (see Section 4.4) object or its pathname in string form. If createparents is true, the groups needed for the given destination group path to exist will be created.

The operation will fail with a NodeError if there is a child node in the destination group with the same name as one of the copied children from this one, unless overwrite is true; in this case, the former child node is recursively removed before copying the latter.

By default, nodes descending from children groups of this node are not copied. If the recursive argument is true, all descendant nodes of this node are recursively copied.
Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may be or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.
Get the child called childname of this group.

If the child exists (be it visible or not), it is returned. Else, a NoSuchNodeError is raised.

Using this method is recommended over getattr() when doing programmatic accesses to children if the childname is unknown beforehand or when its name is not a valid Python identifier.
Iterate over children nodes.
Child nodes are yielded alphanumerically sorted by node name. If the name of a class derived from Node (see Section 4.3) is supplied in the classname parameter, only instances of that class (or subclasses of it) will be returned.

This is an iterator version of Group._f_listNodes() (see description).

Return a list with children nodes.

This is a list-returning version of Group._f_iterNodes() (see description).
Recursively iterate over descendant groups (not leaves).
This method starts by yielding self, and then it goes on to recursively iterate over all child groups in alphanumerical order, top to bottom (preorder), following the same procedure.
Iterate over descendant nodes.

This method recursively walks self top to bottom (preorder), iterating over child groups in alphanumerical order, and yielding nodes. If classname is supplied, only instances of the named class are yielded.

If classname is Group, it behaves like Group._f_walkGroups() (see description), yielding only groups. If you don't want a recursive behavior, use Group._f_iterNodes() (see description) instead.
Example of use:
# Recursively print all the arrays hanging from '/'
print "Arrays in the object tree '/':"
for array in h5file.root._f_walkNodes('Array', recursive=True):
    print array
The following sections describe the methods that automatically trigger actions when a Group instance is accessed in a special way.

This class defines the __setattr__, __getattr__ and __delattr__ methods, and they set, get and delete ordinary Python attributes as normally intended. In addition to that, __getattr__ allows getting child nodes by their name for the sake of easy interaction on the command line, as long as there is no Python attribute with the same name. Groups also allow the interactive completion (when using readline) of the names of child nodes. For instance:
nchild = group._v_nchildren  # get a Python attribute

# Add a Table child called 'table' under 'group'.
h5file.createTable(group, 'table', myDescription)
table = group.table  # get the table child instance
group.table = 'foo'  # set a Python attribute
# (PyTables warns you here about using the name of a child node.)
foo = group.table  # get a Python attribute
del group.table  # delete a Python attribute
table = group.table  # get the table child instance again
Is there a child with that name?
Returns a true value if the group has a child node (visible or hidden) with the given name (a string), false otherwise.
Delete a Python attribute called name.

This method deletes an ordinary Python attribute from the object. It does not remove children nodes from this group; for that, use File.removeNode() (see description) or Node._f_remove() (see description). Neither does it delete a PyTables node attribute; for that, use File.delNodeAttr() (see description), Node._f_delAttr() (see description) or Node._v_attrs (see Section 4.3.2).

If there is an attribute and a child node with the same name, the child node will be made accessible again via natural naming.
Get a Python attribute or child node called name.

If the object has a Python attribute called name, its value is returned. Else, if the node has a child node called name, it is returned. Else, an AttributeError is raised.
Iterate over the child nodes hanging directly from the group.
This iterator is not recursive. Example of use:
# Non-recursively list all the nodes hanging from '/detector'
print "Nodes in '/detector' group:"
for node in h5file.root.detector:
    print node
Return a detailed string representation of the group.
Example of use:
>>> f = tables.openFile('data/test.h5')
>>> f.root.group0
/group0 (Group) 'First Group'
  children := ['tuple1' (Table), 'group1' (Group)]
Set a Python attribute called name with the given value.

This method stores an ordinary Python attribute in the object. It does not store new children nodes under this group; for that, use the File.create*() methods (see the File class in Section 4.2). Neither does it store a PyTables node attribute; for that, use File.setNodeAttr() (see description), Node._f_setAttr() (see description) or Node._v_attrs (see Section 4.3.2).

If there is already a child node with the same name, a NaturalNameWarning will be issued and the child node will not be accessible via natural naming nor via getattr(). It will still be available via File.getNode() (see description), Group._f_getChild() (see description) and the children dictionaries in the group (if visible).
Abstract base class for all PyTables leaves.
A leaf is a node (see the Node class in Section 4.3) which hangs from a group (see the Group class in Section 4.4) but, unlike a group, it can not have any further children below it (i.e. it is an end node).

This definition includes all nodes which contain actual data (datasets handled by the Table, Array, CArray, EArray and VLArray classes, described in Sections 4.6 through 4.10) and unsupported nodes (the UnImplemented class, described in Section 4.11); these classes do in fact inherit from Leaf.

These instance variables are provided in addition to those in Node (see Section 4.3):
The byte ordering of the leaf data on disk.
The HDF5 chunk size for chunked leaves (a tuple).
This is read-only because you cannot change the chunk size of a leaf once it has been created.
The index of the enlargeable dimension (-1 if none).
Filter properties for this leaf (see Filters in Section 4.14.1).
The type of data object read from this leaf.
It can be any of 'numpy', 'numarray', 'numeric' or 'python' (the set of supported flavors depends on which packages you have installed on your system).

You can (and are encouraged to) use this property to get, set and delete the FLAVOR HDF5 attribute of the leaf. When the leaf has no such attribute, the default flavor is used.
The dimension along which iterators work.
Its value is 0 (i.e. the first dimension) when the dataset is not extendable, and self.extdim (where available) for extendable ones.
The length of the main dimension of the leaf data.
The number of rows that fit in internal input buffers.
You can change this to fine-tune the speed or memory requirements of your application.
The shape of data in the leaf.
The following are just easier-to-write aliases to their Node (see Section 4.3) counterparts (indicated between parentheses):

The associated AttributeSet instance, see Section 4.12 (Node._v_attrs).

The name of this node in the hosting HDF5 file (Node._v_hdf5name).

The name of this node in its parent group (Node._v_name).

A node identifier, which may change from run to run (Node._v_objectID).

A description for this node (Node._v_title).
Close this node in the tree.
This method is completely equivalent to Leaf._f_close() (see description).
Copy this node and return the new one.
This method has the behavior described in Node._f_copy() (see description). Please note that there is no recursive flag since leaves do not have child nodes. In addition, this method recognizes the following keyword arguments:

The new title for the destination. If omitted or None, the original title is used.

Specifying this parameter overrides the original filter properties in the source node. If specified, it must be an instance of the Filters class (see Section 4.14.1). The default is to copy the filter properties from the source node.

You can prevent the user attributes from being copied by setting this parameter to False. The default is to copy them.

Specify the range of rows to be copied; the default is to copy all the rows.

This argument may be used to collect statistics on the copy process. When used, it should be a dictionary with keys 'groups', 'leaves' and 'bytes' having a numeric value. Their values will be incremented to reflect the number of groups, leaves and bytes, respectively, that have been copied during the operation.
Delete a PyTables attribute from this node.
This method has the behavior described in Node._f_delAttr() (see description).
Flush pending data to disk.
Saves whatever remaining buffered data to disk. It also releases I/O buffers, so if you are filling many datasets in the same PyTables session, call flush() frequently so as to help PyTables keep memory requirements low.
Get a PyTables attribute from this node.
This method has the behavior described in Node._f_getAttr() (see description).

Is this node visible?

This method has the behavior described in Node._f_isVisible() (see description).

Move or rename this node.

This method has the behavior described in Node._f_move() (see description).

Rename this node in place.

This method has the behavior described in Node._f_rename() (see description).
Remove this node from the hierarchy.
This method has the behavior described in Node._f_remove() (see description). Please note that there is no recursive flag since leaves do not have child nodes.
Set a PyTables attribute for this node.
This method has the behavior described in Node._f_setAttr() (see description).

Close this node in the tree.

This method has the behavior described in Node._f_close() (see description). Besides that, the optional argument flush tells whether to flush pending data to disk or not before closing.
This class represents heterogeneous datasets in an HDF5 file.
Tables are leaves (see the Leaf class in Section 4.5) whose data consists of a unidimensional sequence of rows, where each row contains one or more fields. Fields have an associated unique name and position, with the first field having position 0. All rows have the same fields, which are arranged in columns.

Fields can have any type supported by the Col class (see Section 4.13.2) and its descendants, which support multidimensional data. Moreover, a field can be nested (to an arbitrary depth), meaning that it includes further fields inside. A field named x inside a nested field a in a table can be accessed as the field a/x (its path name) from the table.

The structure of a table is declared by its description, which is made available in the Table.description attribute (see Section 4.6.1).
This class provides new methods to read, write and search table data efficiently. It also provides special Python methods to allow accessing the table as a normal sequence or array (with extended slicing supported).
PyTables supports in-kernel searches working simultaneously on several columns using complex conditions. These are faster than selections using Python expressions. See the Table.where() method (see description) for more information on in-kernel searches. See also Section 5.2.1 for a detailed review of the advantages and shortcomings of in-kernel searches.
Non-nested columns can be indexed. Searching an indexed column can be several times faster than searching a non-nested one. Search methods automatically take advantage of indexing where available.
Note: Column indexing is only available in PyTables Pro.
When iterating a table, an object from the Row (see Section 4.6.7) class is used. This object allows you to read and write data one row at a time, as well as to perform queries which are not supported by in-kernel syntax (at a much lower speed, of course).

See the tutorial sections in Chapter 3 on how to use the Row interface.
Objects of this class support access to individual columns via natural naming through the Table.cols accessor (see Section 4.6.1). Nested columns are mapped to Cols instances, and non-nested ones to Column instances. See the Column class in Section 4.6.9 for examples of this feature.

The following instance variables are provided in addition to those in Leaf (see Section 4.5). Please note that there are several col* dictionaries to ease retrieving information about a column directly by its path name, avoiding the need to walk through Table.description or Table.cols.
Automatically keep column indexes up to date?
Setting this value states whether existing indexes should be automatically updated after an append operation or recomputed after an index-invalidating operation (i.e. removal and modification of rows). The default is true.
This value takes effect whenever a column is altered. If you don't have automatic indexing activated and you want to do an immediate update, use Table.flushRowsToIndex() (see description); for immediate reindexing of invalidated indexes, use Table.reIndexDirty() (see description).
This value is persistent.
Note: Column indexing is only available in PyTables Pro.

Maps the name of a column to its Col description (see Section 4.13.2).
Maps the name of a column to its default value.
Maps the name of a column to its NumPy data type.
Is the column whose name is used as a key indexed?

Note: Column indexing is only available in PyTables Pro.

Maps the name of a column to its Column (see Section 4.6.9) or Cols (see Section 4.6.8) instance.
A list containing the names of top-level columns in the table.
A list containing the pathnames of bottom-level columns in the table.
These are the leaf columns obtained when walking the table description left-to-right, bottom-first. Columns inside a nested column have slashes (/) separating name components in their pathname.

A Cols instance that provides natural naming access to non-nested (Column, see Section 4.6.9) and nested (Cols, see Section 4.6.8) columns.
Maps the name of a column to its PyTables data type.
A Description instance (see Section 4.6.6) reflecting the structure of the table.
The index of the enlargeable dimension (always 0 for tables).
Does this table have any indexed columns?
Note: Column indexing is only available in PyTables Pro.

List of the pathnames of indexed columns in the table.

Note: Column indexing is only available in PyTables Pro.
Filters used to compress indexes.
Setting this value to a Filters (see Section 4.14.1) instance determines the compression to be used for indexes. Setting it to None means that no filters will be used for indexes. The default is zlib compression level 1 with shuffling.

This value is used when creating new indexes or recomputing old ones. To apply it to existing indexes, use Table.reIndex() (see description).

This value is persistent.

Note: Column indexing is only available in PyTables Pro.
The current number of rows in the table.
The associated Row instance (see Section 4.6.7).
The size in bytes of each row in the table.
Get a column from the table.
If a column called name exists in the table, it is read and returned as a NumPy object, or as a numarray object (depending on the flavor of the table). If it does not exist, a KeyError is raised.
Example of use:
narray = table.col('var2')
That statement is equivalent to:
narray = table.read(field='var2')
Here you can see how this method can be used as a shorthand for the Table.read() method (see description).
Iterate over the table using a Row instance (see Section 4.6.7).

If a range is not supplied, all the rows in the table are iterated upon; you can also use the Table.__iter__() special method (see description) for that purpose. If you want to iterate over a given range of rows in the table, you may use the start, stop and step parameters, which have the same meaning as in Table.read() (see description).
Example of use:
result = [ row['var2'] for row in table.iterrows(step=5)
           if row['var1'] <= 20 ]
Note: This iterator can be nested.

Warning: When in the middle of a table row iterator, you should not use methods that can change the number of rows in the table.
Iterate over a sequence of row coordinates.

A true value for sort means that the sequence will be sorted so that I/O might perform better. If your sequence is already sorted or you don't want to sort it, leave this parameter as false. The default is not to sort the sequence.
Note: This iterator can be nested.
Get data in the table as a (record) array.
The start, stop and step parameters can be used to select only a range of rows in the table. Their meanings are the same as in the built-in range() Python function, except that negative values of step are not allowed yet. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the table are selected.

If field is supplied, only the named column will be selected. If the column is not nested, an array of the current flavor will be returned; if it is, a record array will be used instead. If no field is specified, all the columns will be returned in a record array of the current flavor. More specifically, when the flavor is 'numarray' and a record array is needed, a NestedRecArray (see Appendix C) will be returned.
Columns under a nested column can be specified in the field parameter by using a slash character (/) as a separator (e.g. 'position/x').
Get a set of rows given their indexes as a (record) array.
This method works much like the read() method (see description), but it uses a sequence (coords) of row indexes to select the wanted columns, instead of a column range.

The selected rows are returned in an array or record array of the current flavor.
Get a row or a range of rows from the table.
If the key argument is an integer, the corresponding table row is returned as a record of the current flavor. If key is a slice, the range of rows determined by it is returned as a record array of the current flavor.
Example of use:
record = table[4]
recarray = table[4:1000:2]
Those statements are equivalent to:
record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
Here you can see how indexing and slicing can be used as
shorthands for the read()
(see description) method.
Iterate over the table using a Row
instance (see Section 4.6.7).
This is equivalent to calling
Table.iterrows()
(see description) with default arguments, i.e. it
iterates over all the rows in the
table.
Example of use:
result = [ row['var2'] for row in table if row['var1'] <= 20 ]
Which is equivalent to:
result = [ row['var2'] for row in table.iterrows() if row['var1'] <= 20 ]
Note: This iterator can be nested (see Table.where() for an example of nested iterators).
Append a sequence of rows
to the end of
the table.
The rows
argument may be any object which
can be converted to a record array compliant with the table
structure (otherwise, a ValueError
is raised).
This includes NumPy record arrays, RecArray
or
NestedRecArray
objects if
numarray
is available, lists of tuples or array
records, and a string or Python buffer.
Example of use:
from tables import *

class Particle(IsDescription):
    name = StringCol(16, pos=1)   # 16-character String
    lati = IntCol(pos=2)          # integer
    longi = IntCol(pos=3)         # integer
    pressure = Float32Col(pos=4)  # float (single-precision)
    temperature = FloatCol(pos=5) # double (double-precision)

fileh = openFile('test4.h5', mode='w')
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle: 10", 10, 0, 10*10, 10**2),
              ("Particle: 11", 11, -1, 11*11, 11**2),
              ("Particle: 12", 12, -2, 12*12, 12**2)])
fileh.close()
See Appendix C if you are using
numarray
and you want to append data to nested
columns.
Modify one single column in the row slice
[start:stop:step]
.
The colname
argument specifies the name
of the column in the table to be modified with the data given in
column
. This method returns the number of rows
modified. Should the modification exceed the length of the table,
an IndexError
is raised before changing
data.
The column
argument may be any object
which can be converted to a (record) array compliant with the
structure of the column to be modified (otherwise, a
ValueError
is raised). This includes NumPy
(record) arrays, NumArray
,
RecArray
or NestedRecArray
objects if numarray
is available, Numeric
arrays if available, lists of scalars, tuples or array records,
and a string or Python buffer.
See Appendix C if you are using
numarray
and you want to modify data in a
nested column.
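As a hedged sketch, assuming a hypothetical pressure column of floats:

# Overwrite rows 0..2 of the ``pressure`` column.
n = table.modifyColumn(start=0, stop=3, colname='pressure',
                       column=[1.0, 2.0, 3.0])
print n  # number of rows modified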
Modify a series of columns in the row slice
[start:stop:step]
.
The names
argument specifies the names of
the columns in the table to be modified with the data given in
columns
. This method returns the number of
rows modified. Should the modification exceed the length of the
table, an IndexError
is raised before changing
data.
The columns
argument may be any object
which can be converted to a record array compliant with the
structure of the columns to be modified (otherwise, a
ValueError
is raised). This includes NumPy
record arrays, RecArray
or
NestedRecArray
objects if
numarray
is available, lists of tuples or array
records, and a string or Python buffer.
See Appendix C if you are using
numarray
and you want to modify data in nested
columns.
Modify a series of rows in the slice
[start:stop:step]
.
The values in the selected rows will be modified with the
data given in rows
. This method returns the
number of rows modified. Should the modification exceed the
length of the table, an IndexError
is raised
before changing data.
The possible values for the rows
argument
are the same as in Table.append()
(see description).
See Appendix C if you are using
numarray
and you want to modify data in nested
columns.
Remove a range of rows in the table.
If only start
is supplied, only this row
is to be deleted. If a range is supplied, i.e. both the
start
and stop
parameters
are passed, all the rows in the range are removed. A
step
parameter is not supported, and it is not
foreseen to be implemented anytime soon.
Sets the starting row to be removed. It accepts negative values meaning that the count starts from the end. A value of 0 means the first row.
Sets the last row to be removed to
stop-1
, i.e. the end point is omitted (in
the Python range()
tradition). Negative
values are also accepted. A special value of
None
(the default) means removing just
the row supplied in start
.
Set a row or a range of rows in the table.
It takes different actions depending on the type of the
key
parameter: if it is an integer, the
corresponding table row is set to value
(a
record or sequence capable of being converted to the table
structure). If the key
is a slice, the row
slice determined by it is set to value
(a
record array or sequence capable of being converted to the table
structure).
Example of use:
# Modify just one existing row
table[2] = [456,'db2',1.2]
# Modify two existing rows
rows = numpy.rec.array([[457,'db1',1.2],[6,'de2',1.3]], formats='i4,a3,f8')
table[1:3:2] = rows
Which is equivalent to:
table.modifyRows(start=2, rows=[456,'db2',1.2])
rows = numpy.rec.array([[457,'db1',1.2],[6,'de2',1.3]], formats='i4,a3,f8')
table.modifyRows(start=1, stop=3, step=2, rows=rows)
See Appendix C if you are using
numarray
and you want to modify data in nested
columns.
Get the row coordinates fulfilling the given
condition
.
The coordinates are returned as a list of the current
flavor. sort
means that you want to retrieve
the coordinates ordered. The default is to not sort them.
The meaning of the other arguments is the same as in the
Table.where()
method (see description).
Read table data fulfilling the given condition.
This method is similar to Table.read() (see description); their common arguments and return values have the same meanings. However, only the rows fulfilling the condition are included in the result.
The meaning of the other arguments is the same as in the
Table.where()
method (see description).
Iterate over values fulfilling a
condition
.
This method returns a Row
iterator (see
Section 4.6.7) which
only selects rows in the table that satisfy the given
condition
(an expression-like string).
For more information on condition syntax, see Appendix B.
The condvars
mapping may be used to
define the variable names appearing in the
condition
. condvars
should
consist of identifier-like strings pointing to
Column
(see Section 4.6.9) instances of this
table, or to other values (which will be converted to
arrays). A default set of condition variables is provided where
each top-level, non-nested column with an identifier-like name
appears. Variables in condvars
override the
default ones.
When condvars
is not provided or
None
, the current local and global namespace is
sought instead of condvars
. The previous
mechanism is mostly intended for interactive usage. To disable it,
just specify a (maybe empty) mapping as
condvars
.
If a range is supplied (by setting some of the
start
, stop
or
step
parameters), only the rows in that range
and
fulfilling the condition
are used. The meaning of the start
,
stop
and step
parameters is
the same as in the range()
Python function,
except that negative values of step
are
not
allowed. Moreover, if only
start
is specified, then
stop
will be set to
start+1
.
When possible, indexed columns participating in the condition will be used to speed up the search. It is recommended that you place the indexed columns as far to the left and as far out in the condition as possible. In any case, this method always performs better than a regular Python selection on the table. Please check Section 5.2 for more information about the performance of the different searching modes.
Note: Column indexing is only available in PyTables Pro.
You can mix this method with regular Python selections in order to support even more complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.
Example of use:
>>> passvalues = [ row['col3'] for row in
...                table.where('(col1 > 0) & (col2 <= 20)', step=5)
...                if your_function(row['col2']) ]
>>> print "Values that pass the cuts:", passvalues
Note that, from PyTables 1.1 on, you can nest several iterators over the same table. For example:
for p in rout.where('pressure < 16'):
    for q in rout.where('pressure < 9'):
        for n in rout.where('energy < 10'):
            print "pressure, energy:", p['pressure'], n['energy']
In this example, iterators returned by
Table.where()
have been used, but you may as
well use any of the other reading iterators that
Table
objects offer. See the file
examples/nested-iter.py
for the full
code.
Warning: When in the middle of a table row iterator, you should not use methods that can change the number of rows in the table (like Table.append() or Table.removeRows()), or unexpected errors will happen.
Append rows fulfilling the condition
to
the dstTable
table.
dstTable
must be capable of taking the
rows resulting from the query, i.e. it must have columns with the
expected names and compatible types. The meaning of the other
arguments is the same as in the Table.where()
method (see description).
The number of rows appended to dstTable
is returned as a result.
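Assuming this method is Table.whereAppend() (its name is not shown above) and dst is a table with a compatible description, a sketch might read:

nappended = table.whereAppend(dst, 'var1 > 0')
print nappended  # number of rows copied into ``dst``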
Will a query for the condition
use
indexing?
The meaning of the condition
and
condvars arguments is the same as in the
Table.where()
method (see description). If the condition
can
use indexing, this method returns the path name of the column
whose index is usable. Otherwise, it returns
None
.
This method is mainly intended for testing. Keep in mind that changing the set of indexed columns or their dirtiness may make this method return different values for the same arguments at different times.
Note: Column indexing is only available in PyTables Pro.
Add remaining rows in buffers to non-dirty indexes.
This can be useful when you have chosen non-automatic
indexing for the table (see the Table.autoIndex
property in Section 4.6.1) and you want to update the indexes
on it.
Note: Column indexing is only available in PyTables Pro.
Get the enumerated type associated with the named column.
If the column named colname
(a string)
exists and is of an enumerated type, the corresponding
Enum
instance (see Section 4.14.3) is
returned. If it is not of an enumerated type, a
TypeError
is raised. If the column does not
exist, a KeyError
is raised.
Recompute all the existing indexes in the table.
This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.
Note: Column indexing is only available in PyTables Pro.
Recompute the existing indexes in the table, if they are dirty.
This can be useful when you have set
Table.autoIndex
(see Section 4.6.1) to false for the table and you want to update the indexes
after an invalidating index operation
(Table.removeRows()
, for example).
Note: Column indexing is only available in PyTables Pro.
This class represents descriptions of the structure of tables.
An instance of this class is automatically bound to
Table
(see Section 4.6) objects when they are created. It
provides a browseable representation of the structure of the table,
made of non-nested (Col
—see Section 4.13.2) and nested
(Description
) columns. It also contains
information that will allow you to build
NestedRecArray
(see Appendix C)
objects suited for the different columns in a table (be they nested
or not).
Column definitions under a description can be accessed as
attributes of it (natural naming). For
instance, if table.description
is a
Description
instance with a column named
col1
under it, the latter can be accessed as
table.description.col1
. If
col1
is nested and contains a
col2
column, this can be accessed as
table.description.col1.col2
. Because of natural
naming, the names of members start with special prefixes, like in
the Group
class (see Section 4.4).
A dictionary mapping the names of the columns hanging
directly from the associated table or nested column to their
respective descriptions (Col
—see Section 4.13.2— or
Description
—see Section 4.6.6— instances).
A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective default values.
The NumPy type which reflects the structure of this
table or nested column. You can use this as the
dtype
argument of NumPy array
factories.
A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective NumPy types.
Whether the associated table or nested column contains further nested columns or not.
The size in bytes of an item in this table or nested column.
The name of this description group. The name of the
root group is '/'
.
A list of the names of the columns hanging directly from the associated table or nested column. The order of the names matches the order of their respective columns in the containing table.
A nested list of pairs of (name,
format)
tuples for all the columns under this
table or nested column. You can use this as the
dtype
and descr
arguments of NumPy array and
NestedRecArray
(see Appendix C) factories, respectively.
A nested list of the NumPy string formats (and shapes)
of all the columns under this table or nested column. You
can use this as the formats
argument of
NumPy array and NestedRecArray
(see Appendix C) factories.
The level of the associated table or nested column in the nested datatype.
A nested list of the names of all the columns under
this table or nested column. You can use this as the
names
argument of NumPy array and
NestedRecArray
(see Appendix C) factories.
A list of the pathnames of all the columns under this
table or nested column (in preorder). If it does not
contain nested columns, this is exactly the same as the
Description._v_names
attribute.
A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective PyTables types.
Table row iterator and field accessor.
Instances of this class are used to fetch and set the values of individual table fields. It works very much like a dictionary, where keys are the pathnames or positions (extended slicing is supported) of the fields in the associated table in a specific row.
This class provides an iterator interface
so that you can use the same Row
instance to
access successive table rows one after the other. There are also
some important methods that are useful for accessing, adding and
modifying values in tables.
The current row number.
This property is useful for knowing which row is being dealt with in the middle of a loop or iterator.
Add a new row of data to the end of the dataset.
Once you have filled the proper fields for the current row, calling this method actually appends the new data to the output buffer (which will eventually be dumped to disk). If you have not set the value of a field, the default value of the column will be used.
Example of use:
row = table.row
for i in xrange(nrows):
    row['col1'] = i-1
    row['col2'] = 'a'
    row['col3'] = -1.0
    row.append()
table.flush()
Warning: After completion of the loop in which Row.append() has been called, it is always convenient to make a call to Table.flush() in order to avoid losing the last rows that may still remain in internal buffers.
Retrieve all the fields in the current row.
Contrary to row[:] (see description), this
returns row data as a NumPy void scalar. For instance:
[row.fetch_all_fields() for row in table.where('col1 < 3')]
will select all the rows that fulfill the given condition as a list of NumPy records.
Change the data of the current row in the dataset.
This method allows you to modify values in a table when
you are in the middle of a table iterator like
Table.iterrows()
(see description) or Table.where()
(see description).
Once you have filled the proper fields for the current row, calling this method actually changes data in the output buffer (which will eventually be dumped to disk). If you have not set the value of a field, its original value will be used.
Examples of use:
for row in table.iterrows(step=10):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()
which modifies every tenth row in table. Or:
for row in table.where('col1 > 3'):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()
which just updates the rows with values bigger than 3 in the first column.
Warning: After completion of the loop in which Row.update() has been called, it is always convenient to make a call to Table.flush() in order to avoid losing changed rows that may still remain in internal buffers.
Get the row field specified by the
key
.
The key
can be a string (the name of
the field), an integer (the position of the field) or a slice
(the range of field positions). When key
is a
slice, the returned value is a tuple
containing the values of the specified fields.
Examples of use:
res = [row['var3'] for row in table.where('var2 < 20')]
which selects the var3
field for all
the rows that fulfill the condition. Or:
res = [row[4] for row in table if row[1] < 20]
which selects the field in the 4th position for all the rows that fulfill the condition. Or:
res = [row[:] for row in table if row['var2'] < 20]
which selects all the fields (in the form of a tuple) for all the rows that fulfill the condition. Or:
res = [row[1::2] for row in table.iterrows(2, 3000, 3)]
which selects all the fields in even positions (in the
form of a tuple) for all the rows in the
slice [2:3000:3]
.
Set the key
row field to the specified
value
.
Differently from its __getitem__()
counterpart, in this case key
can only be a
string (the name of the field). The changes done via
__setitem__()
will not take effect on the
data on disk until any of the Row.append()
(see description) or
Row.update()
(see description) methods are called.
Example of use:
for row in table.iterrows(step=10):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()
which modifies every tenth row in the table.
Container for columns in a table or nested column.
This class is used as an accessor to the
columns in a table or nested column. It supports the
natural naming convention, so that you can
access the different columns as attributes which lead to
Column
instances (for non-nested columns) or
other Cols
instances (for nested columns).
For instance, if table.cols
is a
Cols
instance with a column named
col1
under it, the latter can be accessed as
table.cols.col1
. If col1
is
nested and contains a col2
column, this can be
accessed as table.cols.col1.col2
and so
on. Because of natural naming, the names of members start with
special prefixes, like in the Group
class (see
Section 4.4).
Like the Column
class (see Section 4.6.9),
Cols
supports item access to read and write
ranges of values in the table or nested column.
A list of the names of the columns hanging directly from the associated table or nested column. The order of the names matches the order of their respective columns in the containing table.
A list of the pathnames of all the columns under the
associated table or nested column (in preorder). If it does
not contain nested columns, this is exactly the same as the
Cols._v_colnames
attribute.
The associated Description
instance
(see Section 4.6.6).
The parent Table
instance (see
Section 4.6).
Get an accessor to the column
colname
.
This method returns a Column
instance
(see Section 4.6.9) if the requested column is not nested, and a
Cols
instance (see Section 4.6.8) if it is.
You may use full column pathnames in
colname
.
Calling cols._f_col('col1/col2')
is
equivalent to using cols.col1.col2
. However,
the first syntax is more intended for programmatic use. It is
also better if you want to access columns with names that are
not valid Python identifiers.
Get a row or a range of rows from a table or nested column.
If the key
argument is an integer, the
corresponding nested type row is returned as a record of the
current flavor. If key
is a slice, the range
of rows determined by it is returned as a record array of the
current flavor.
Example of use:
record = table.cols[4]  # equivalent to table[4]
recarray = table.cols.Info[4:1000:2]
Those statements are equivalent to:
nrecord = table.read(start=4)[0]
nrecarray = table.read(start=4, stop=1000, step=2).field('Info')
Here you can see how a mix of natural naming, indexing and
slicing can be used as shorthands for the
Table.read()
(see description) method.
Get the number of elements in the column.
This matches the length in rows of the parent table.
Set a row or a range of rows in a table or nested column.
If the key
argument is an integer, the
corresponding row is set to value
. If
key
is a slice, the range of rows determined
by it is set to value
.
Example of use:
table.cols[4] = record
table.cols.Info[4:1000:2] = recarray
Those statements are equivalent to:
table.modifyRows(4, rows=record)
table.modifyColumn(4, 1000, 2, colname='Info', column=recarray)
Here you can see how a mix of natural naming, indexing and
slicing can be used as shorthands for the
Table.modifyRows()
(see description) and
Table.modifyColumn()
(see description) methods.
Accessor for a non-nested column in a table.
Each instance of this class is associated with one
non-nested column of a table. These instances
are mainly used to read and write data from the table columns using
item access (like the Cols
class —see Section 4.6.8), but there
are a few other associated methods to deal with indexes.
Note: Column indexing is only available in PyTables Pro.
The Description
(see Section 4.6.6) instance of the parent table or nested column.
The NumPy dtype
that most closely
matches this column.
The Index
instance (see Section 4.14.2)
associated with this column (None
if the
column is not indexed).
Note: Column indexing is only available in PyTables Pro.
True if the column is indexed, false otherwise.
Note: Column indexing is only available in PyTables Pro.
The name of the associated column.
The complete pathname of the associated column (the
same as Column.name
if the column is not
inside a nested column).
The parent Table
instance (see
Section 4.6).
The PyTables type of the column (a string).
Create an index for this column.
You can select the optimization level of the index by
setting optlevel
from 0 (no optimization) to
9 (maximum optimization). Higher levels of optimization mean
better chances for reducing the entropy of the index at the
price of using more CPU and I/O resources for creating the
index.
The filters
argument can be used to set
the Filters
(see Section 4.14.1) used
to compress the index. If None
, default
index filters will be used (currently, zlib level 1 with
shuffling).
When optlevel
is greater than 0, a
temporary file is created during the index build process. You
can use the tmp_dir
argument to specify the
directory for this temporary file. The default is to create it
in the same directory as the file containing the original
table.
Note: Column indexing is only available in PyTables Pro.
Recompute the index associated with this column.
This can be useful when you suspect that, for any reason, the index information is no longer valid and you want to rebuild it.
This method does nothing if the column is not indexed.
Note: Column indexing is only available in PyTables Pro.
Recompute the associated index only if it is dirty.
This can be useful when you have set
Table.autoIndex
(see Section 4.6.1) to false for the table and you want to update the column's
index after an invalidating index operation
(like Table.removeRows()
—see description).
This method does nothing if the column is not indexed.
Note: Column indexing is only available in PyTables Pro.
Remove the index associated with this column.
This method does nothing if the column is not indexed. The
removed index can be created again by calling the
Column.createIndex()
method (see description).
Note: Column indexing is only available in PyTables Pro.
Get a row or a range of rows from a column.
If the key
argument is an integer, the
corresponding element in the column is returned as an object of
the current flavor. If key
is a slice, the
range of elements determined by it is returned as an array of
the current flavor.
Example of use:
print "Column handlers:" for name in table.colnames: print table.cols._f_col(name) print "Select table.cols.name[1]-->", table.cols.name[1] print "Select table.cols.name[1:2]-->", table.cols.name[1:2] print "Select table.cols.name[:]-->", table.cols.name[:] print "Select table.cols._f_col('name')[:]-->", table.cols._f_col('name')[:]
The output of this for a certain arbitrary table is:
Column handlers:
/table.cols.name (Column(), string, idx=None)
/table.cols.lati (Column(), int32, idx=None)
/table.cols.longi (Column(), int32, idx=None)
/table.cols.vector (Column(2,), int32, idx=None)
/table.cols.matrix2D (Column(2, 2), float64, idx=None)
Select table.cols.name[1]--> Particle: 11
Select table.cols.name[1:2]--> ['Particle: 11']
Select table.cols.name[:]--> ['Particle: 10' 'Particle: 11' 'Particle: 12' 'Particle: 13' 'Particle: 14']
Select table.cols._f_col('name')[:]--> ['Particle: 10' 'Particle: 11' 'Particle: 12' 'Particle: 13' 'Particle: 14']
See the examples/table2.py
file for a
more complete example.
Get the number of elements in the column.
This matches the length in rows of the parent table.
Set a row or a range of rows in a column.
If the key
argument is an integer, the
corresponding element is set to value
. If
key
is a slice, the range of elements
determined by it is set to value
.
Example of use:
# Modify row 1
table.cols.col1[1] = -1
# Modify rows 1 and 3
table.cols.col1[1::2] = [2,3]
Which is equivalent to:
# Modify row 1
table.modifyColumns(start=1, columns=[[-1]], names=['col1'])
# Modify rows 1 and 3
columns = numpy.rec.fromarrays([[2,3]], formats='i4')
table.modifyColumns(start=1, step=2, columns=columns, names=['col1'])
This class represents homogeneous datasets in an HDF5 file.
This class provides methods to write or read data to or from
array objects in the file. This class does not allow you to
enlarge or compress the datasets on disk; use the
EArray
class (see Section 4.9) if you want enlargeable dataset support
or compression features, or CArray
(see Section 4.8) if you just
want compression.
An interesting property of the Array
class is
that it remembers the flavor of the object that
has been saved so that if you saved, for example, a
list
, you will get a list
during
readings afterwards; if you saved a NumPy array, you will get a NumPy
object, and so forth.
Note that this class inherits all the public attributes and
methods that Leaf
(see Section 4.5) already
provides. However, as Array
instances have no
internal I/O buffers, it is not necessary to use the
flush()
method they inherit from
Leaf
in order to save their internal state to disk.
When a writing method call returns, all the data is already on
disk.
An Atom
(see Section 4.13.3)
instance representing the type and
shape of the atomic objects to be
saved.
The size of the rows in dimensions orthogonal to maindim.
On iterators, this is the index of the current row.
Get the enumerated type associated with this array.
If this array is of an enumerated type, the corresponding
Enum
instance (see Section 4.14.3) is
returned. If it is not of an enumerated type, a
TypeError
is raised.
Iterate over the rows of the array.
This method returns an iterator yielding an object of the current flavor for each selected row in the array. The returned rows are taken from the main dimension.
If a range is not supplied, all the
rows in the array are iterated upon —you can also use
the Array.__iter__()
special method (see description) for that purpose. If you only want
to iterate over a given range of rows in the
array, you may use the start
,
stop
and step
parameters,
which have the same meaning as in Array.read()
(see description).
Example of use:
result = [row for row in arrayInstance.iterrows(step=4)]
Get the next element of the array during an iteration.
The element is returned as an object of the current flavor.
Get data in the array as an object of the current flavor.
The start
, stop
and
step
parameters can be used to select only a
range of rows in the array. Their meanings
are the same as in the built-in range()
Python
function, except that negative values of step
are not allowed yet. Moreover, if only start
is
specified, then stop
will be set to
start+1
. If you specify neither start nor stop, then all the rows in the array are selected.
The following methods automatically trigger actions when an
Array
instance is accessed in a special way
(e.g. array[2:3,...,::2]
will be equivalent to a
call to array.__getitem__((slice(2, 3, None), Ellipsis,
slice(None, None, 2)))
).
Get a row, a range of rows or a slice from the array.
The set of tokens allowed for the key
is
the same as that for extended slicing in Python (including the
Ellipsis
or ...
token). The
result is an object of the current flavor; its shape depends on
the kind of slice used as key
and the shape of
the array itself.
Example of use:
array1 = array[4]         # array1.shape == array.shape[1:]
array2 = array[4:1000:2]  # len(array2.shape) == len(array.shape)
array3 = array[::2, 1:4, :]
array4 = array[1, ..., ::2, 1:4, 4:]  # general slice selection
Iterate over the rows of the array.
This is equivalent to calling
Array.iterrows()
(see description) with default arguments, i.e. it
iterates over all the rows in the
array.
Example of use:
result = [row[2] for row in array]
Which is equivalent to:
result = [row[2] for row in array.iterrows()]
Set a row, a range of rows or a slice in the array.
It takes different actions depending on the type of the
key
parameter: if it is an integer, the
corresponding array row is set to value
(the
value is broadcast when needed). If the key
is
a slice, the row slice determined by it is set to
value
(as usual, if the slice to be updated
exceeds the actual shape of the array, only the values in the
existing range are updated).
If the value
is a multidimensional
object, then its shape must be compatible with the shape
determined by the key
, otherwise, a
ValueError
will be raised.
Example of use:
a1[0] = 333        # assign an integer to an Integer Array row
a2[0] = 'b'        # assign a string to a string Array row
a3[1:4] = 5        # broadcast 5 to slice 1:4
a4[1:4:2] = 'xXx'  # broadcast 'xXx' to slice 1:4:2
# General slice update (a5.shape = (4,3,2,8,5,10)).
a5[1, ..., ::2, 1:4, 4:] = arange(1728, shape=(4,3,2,4,3,6))
This class represents homogeneous datasets in an HDF5 file.
The difference between a CArray
and a normal
Array
(see Section 4.7), from which it inherits, is that a
CArray
has a chunked layout and, as a consequence,
it supports compression. You can use datasets of this class to easily
save or load arrays to or from disk, with compression support
included.
See below a small example of the use of the
CArray
class. The code is available in
examples/carray1.py
:
import numpy
import tables

fileName = 'carray1.h5'
shape = (200, 300)
atom = tables.UInt8Atom()
filters = tables.Filters(complevel=5, complib='zlib')

h5f = tables.openFile(fileName, 'w')
ca = h5f.createCArray(h5f.root, 'carray', atom, shape, filters=filters)
# Fill a hyperslab in ``ca``.
ca[10:60, 20:70] = numpy.ones((50, 50))
h5f.close()

# Re-open and read another hyperslab
h5f = tables.openFile(fileName)
print h5f
print h5f.root.carray[8:12, 18:22]
h5f.close()
The output for the previous script is something like:
carray1.h5 (File) ''
Last modif.: 'Thu Apr 12 10:15:38 2007'
Object Tree:
/ (RootGroup) ''
/carray (CArray(200L, 300L), shuffle, zlib(5)) ''

[[0 0 0 0]
 [0 0 0 0]
 [0 0 1 1]
 [0 0 1 1]]
This class represents extendible, homogeneous datasets in an HDF5 file.
The main difference between an EArray
and a
CArray
(see Section 4.8), from which it inherits, is that the
former can be enlarged along one of its dimensions, the
enlargeable dimension. That means that the
Leaf.extdim
attribute (see Section 4.5.1) of any
EArray
instance will always be non-negative.
Multiple enlargeable dimensions might be supported in the
future.
New rows can be added to the end of an enlargeable array by
using the EArray.append()
method (see the section called “append(sequence)”). The array can also be shrunken along its
enlargeable dimension using the EArray.truncate()
method (see the section called “truncate(size)”).
Add a sequence
of data to the end of the
dataset.
The sequence must have the same type as the array; otherwise
a TypeError
is raised. In the same way, the
dimensions of the sequence
must conform to the
shape of the array, that is, all dimensions must match, with the
exception of the enlargeable dimension, which can be of any length
(even 0!). If the shape of the sequence
is
invalid, a ValueError
is raised.
See below a small example of the use of the
EArray
class. The code is available in
examples/earray1.py
:
import tables
import numpy

fileh = tables.openFile('earray1.h5', mode='w')
a = tables.StringAtom(itemsize=8)
# Use ``a`` as the object type for the enlargeable array.
array_c = fileh.createEArray(fileh.root, 'array_c', a, (0,), "Chars")
array_c.append(numpy.array(['a'*2, 'b'*4], dtype='S8'))
array_c.append(numpy.array(['a'*6, 'b'*8, 'c'*10], dtype='S8'))

# Read the string ``EArray`` we have created on disk.
for s in array_c:
    print 'array_c[%s] => %r' % (array_c.nrow, s)
# Close the file.
fileh.close()
The output for the previous script is something like:
array_c[0] => 'aa'
array_c[1] => 'bbbb'
array_c[2] => 'aaaaaa'
array_c[3] => 'bbbbbbbb'
array_c[4] => 'cccccccc'
This class represents variable length (ragged) arrays in an HDF5 file.
Instances of this class represent array objects in the object
tree with the property that their rows can have a
variable number of homogeneous elements, called
atoms. Like Table
datasets
(see Section 4.6),
variable length arrays can have only one dimension, and the elements
(atoms) of their rows can be fully multidimensional.
VLArray
objects do also support compression.
When reading a range of rows from a VLArray
,
you will always get a Python list of objects of
the current flavor (each of them for a row), which may have different
lengths.
This class provides methods to write or read data to or from
variable length array objects in the file. Note that it also inherits
all the public attributes and methods that Leaf
(see Section 4.5)
already provides.
An Atom
(see Section 4.13.3)
instance representing the type and
shape of the atomic objects to be
saved. You may use a pseudo-atom for
storing a serialized object or variable length string per
row.
The type of data object read from this leaf.
Please note that when reading several rows of
VLArray
data, the flavor only applies to
the components of the returned Python
list, not to the list itself.
On iterators, this is the index of the current row.
Add a sequence
of data to the end of the
dataset.
This method appends the objects in the
sequence
to a single row
in this array. The type and shape of individual objects must be
compliant with the atoms in the array. In the case of serialized
objects and variable length strings, the object or string to
append is itself the sequence
.
Get the enumerated type associated with this array.
If this array is of an enumerated type, the corresponding
Enum
instance (see Section 4.14.3) is
returned. If it is not of an enumerated type, a
TypeError
is raised.
Iterate over the rows of the array.
This method returns an iterator yielding an object of the current flavor for each selected row in the array.
If a range is not supplied, all the
rows in the array are iterated upon —you can also use
the VLArray.__iter__()
(see description) special method for that purpose.
If you only want to iterate over a given range of
rows in the array, you may use the
start
, stop
and
step
parameters, which have the same meaning as
in VLArray.read()
(see description).
Example of use:
for row in vlarray.iterrows(step=4):
    print '%s[%d]--> %s' % (vlarray.name, vlarray.nrow, row)
Get the next element of the array during an iteration.
The element is returned as a list of objects of the current flavor.
Get data in the array as a list of objects of the current flavor.
Please note that, as the lengths of the different rows are variable, the returned value is a Python list (not an array of the current flavor), with as many entries as specified rows in the range parameters.
The start
, stop
and
step
parameters can be used to select only a
range of rows in the array. Their meanings
are the same as in the built-in range()
Python
function, except that negative values of step
are not allowed yet. Moreover, if only start
is
specified, then stop
will be set to
start+1
. If you specify neither start nor stop, then all the rows in the array are selected.
The following methods automatically trigger actions when a
VLArray
instance is accessed in a special way
(e.g., vlarray[2:5]
will be equivalent to a call
to vlarray.__getitem__(slice(2, 5, None)
).
Get a row or a range of rows from the array.
If the key
argument is an integer, the
corresponding array row is returned as an object of the current
flavor. If key
is a slice, the range of rows
determined by it is returned as a list of objects of the current
flavor.
Example of use:
a_row = vlarray[4]
a_list = vlarray[4:1000:2]
Iterate over the rows of the array.
This is equivalent to calling
VLArray.iterrows()
(see description) with default arguments, i.e. it
iterates over all the rows in the
array.
Example of use:
result = [row for row in vlarray]
Which is equivalent to:
result = [row for row in vlarray.iterrows()]
Set a row in the array.
It takes different actions depending on the type of the
key
parameter: if it is an integer, the
corresponding array row is set to value
. If
the key
is a tuple, the first element refers to
the row to be modified, and the second element to the range within
the row to be updated with the value
(so it can
be an integer or a slice).
The type and shape of the value
must be
compatible with the type and shape determined by the
key
, otherwise, a TypeError
or a ValueError
will be raised.
Note: When updating the rows of a VLArray, you can only update values with exactly the same size in bytes as the existing row.
Example of use:
vlarray[0] = vlarray[0] * 2 + 3
vlarray[99, 3:] = arange(96) * 2 + 3
# Negative values for start and stop (but not step) are supported.
vlarray[99, -99:-89:2] = vlarray[5] * 2 + 3
See below a small example of the use of the
VLArray
class. The code is available in
examples/vlarray1.py
:
import tables
from numpy import *

# Create a VLArray:
fileh = tables.openFile('vlarray1.h5', mode='w')
vlarray = fileh.createVLArray(fileh.root, 'vlarray1',
                              tables.Int32Atom(shape=()),
                              "ragged array of ints",
                              filters=tables.Filters(1))
# Append some (variable length) rows:
vlarray.append(array([5, 6]))
vlarray.append(array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])

# Now, read it through an iterator:
print '-->', vlarray.title
for x in vlarray:
    print '%s[%d]--> %s' % (vlarray.name, vlarray.nrow, x)

# Now, do the same with native Python strings.
vlarray2 = fileh.createVLArray(fileh.root, 'vlarray2',
                               tables.StringAtom(itemsize=2),
                               "ragged array of strings",
                               filters=tables.Filters(1))
vlarray2.flavor = 'python'
# Append some (variable length) rows:
print '-->', vlarray2.title
vlarray2.append(['5', '66'])
vlarray2.append(['5', '6', '77'])
vlarray2.append(['5', '6', '9', '88'])

# Now, read it through an iterator:
for x in vlarray2:
    print '%s[%d]--> %s' % (vlarray2.name, vlarray2.nrow, x)

# Close the file.
fileh.close()
The output for the previous script is something like:
--> ragged array of ints
vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]
--> ragged array of strings
vlarray2[0]--> ['5', '66']
vlarray2[1]--> ['5', '6', '77']
vlarray2[2]--> ['5', '6', '9', '88']
This class represents datasets not supported by PyTables in an HDF5 file.
When reading a generic HDF5 file (i.e. one that has not been
created with PyTables, but with some other HDF5 library based tool),
chances are that the specific combination of datatypes or dataspaces
in some dataset might not be supported by PyTables yet. In such a
case, this dataset will be mapped into an
UnImplemented
instance and the user will still be
able to access the complete object tree of the generic HDF5 file. The
user will also be able to read and write the
attributes of the dataset, access some of its
metadata, and perform certain hierarchy
manipulation operations like deleting or moving (but not
copying) the node. Of course, the user will not be able to read the
actual data on it.
This is an elegant way to allow users to work with generic HDF5 files even when some of their datasets are not supported by PyTables. However, if you are really interested in having full access to an unimplemented dataset, please get in contact with the developer team.
This class does not have any public instance variables or
methods, except those inherited from the Leaf
class
(see Section 4.5).
Container for the HDF5 attributes of a Node
(see Section 4.3).
This class provides methods to create new HDF5 node attributes, and to get, rename or delete existing ones.
Like in Group
instances (see Section 4.4),
AttributeSet
instances make use of the
natural naming convention, i.e. you can access
the attributes on disk as if they were normal Python attributes of the
AttributeSet
instance.
This offers the user a very convenient way to access HDF5 node
attributes. However, for this reason and in order not to pollute the
object namespace, one can not assign normal
attributes to AttributeSet
instances, and their
members use names which start with special prefixes, as happens with
Group
objects.
The values of most basic types are saved as HDF5 native data
in the HDF5 file. This includes Python bool
,
int
, float
,
complex
and str
(but not
long
nor unicode
) values, as
well as their NumPy scalar versions and
homogeneous NumPy arrays of them. When read,
these values are always loaded as NumPy scalar or array objects, as
needed.
For that reason, attributes in native HDF5 files will be
always mapped into NumPy objects. Specifically, a multidimensional
attribute will be mapped into a multidimensional
ndarray
and a scalar will be mapped into a NumPy
scalar object (for example, a scalar
H5T_NATIVE_LLONG
will be read and returned as a
numpy.int64
scalar).
However, other kinds of values are serialized using
cPickle
, so you only will be able to correctly
retrieve them using a Python-aware HDF5 library. Thus, if you want
to save Python scalar values and make sure you are able to read them
with generic HDF5 tools, you should make use of scalar or
homogeneous array NumPy objects (for example,
numpy.int64(1)
or numpy.array([1, 2, 3],
dtype='int16')
).
One more piece of advice: because of the various potential
difficulties in restoring a Python object stored in an attribute,
you may end up getting a cPickle
string where a
Python object is expected. If this is the case, you may wish to run
cPickle.loads()
on that string to get an idea of
where things went wrong, as shown in this example:
>>> import os, tempfile
>>> import tables
>>>
>>> class MyClass(object):
...     foo = 'bar'
...
>>> myObject = MyClass()  # save object of custom class in HDF5 attr
>>> h5fname = tempfile.mktemp(suffix='.h5')
>>> h5f = tables.openFile(h5fname, 'w')
>>> h5f.root._v_attrs.obj = myObject  # store the object
>>> print h5f.root._v_attrs.obj.foo  # retrieve it
bar
>>> h5f.close()
>>>
>>> del MyClass, myObject  # delete class of object and reopen file
>>> h5f = tables.openFile(h5fname, 'r')
>>> print repr(h5f.root._v_attrs.obj)
'ccopy_reg\n_reconstructor...
>>> import cPickle  # let's unpickle that to see what went wrong
>>> cPickle.loads(h5f.root._v_attrs.obj)
Traceback (most recent call last):
...
AttributeError: 'module' object has no attribute 'MyClass'
>>> # So the problem was not in the stored object,
... # but in the *environment* where it was restored.
... h5f.close()
>>> os.remove(h5fname)
A list with all attribute names.
A list with system attribute names.
A list with user attribute names.
The Node
instance (see Section 4.3) this
attribute set is associated with.
Note that this class overrides the
__setattr__()
, __getattr__()
and __delattr__()
special methods. This allows
you to read, assign or delete attributes on disk by just using the
following constructs:
leaf.attrs.myattr = 'str attr'    # set a string (native support)
leaf.attrs.myattr2 = 3            # set an integer (native support)
leaf.attrs.myattr3 = [3, (1, 2)]  # a generic object (Pickled)
attrib = leaf.attrs.myattr        # get the attribute ``myattr``
del leaf.attrs.myattr             # delete the attribute ``myattr``
If an attribute is set on a target node that already has a
large number of attributes, a PerformanceWarning
will be issued.
Copy attributes to the where
node.
Copies all user and certain system attributes to the given
where
node (a Node
instance
—see Section 4.3),
replacing the existing ones.
Get a list of attribute names.
The attrset
string selects the attribute
set to be used. A 'user'
value returns only
user attributes (this is the default). A 'sys'
value returns only system attributes. Finally,
'all'
returns both system and user
attributes.
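For example:

print leaf.attrs._f_list()       # user attributes (the default)
print leaf.attrs._f_list('sys')  # system attributes only
print leaf.attrs._f_list('all')  # both system and user attributes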
This section describes a series of classes that are meant to declare the datatypes required for primary PyTables datasets (like Table or VLArray).
Description of the structure of a table or nested column.
This class is designed to be used as an easy, yet meaningful
way to describe the structure of new Table
(see
Section 4.6)
datasets or nested columns through the definition of
derived classes. In order to define such a
class, you must declare it as descendant of
IsDescription
, with as many attributes as columns
you want in your table. The name of each attribute will become the
name of a column, and its value will hold a description of
it.
Ordinary columns can be described using instances of the
Col
class (see Section 4.13.2). Nested columns can be described by
using classes derived from IsDescription
,
instances of it, or name-description dictionaries. Derived classes
can be declared in place (in which case the column takes the name of
the class) or referenced by name.
Nested columns can have a _v_pos
special
attribute which sets the relative position of
the column among sibling columns also having explicit
positions. The pos
constructor
argument of Col
instances is used for the same
purpose. Columns with no explicit position will be placed
afterwards in alphanumeric order.
Once you have created a description object, you can pass it to
the Table
constructor, where all the information
it contains will be used to define the table structure.
See the Section 3.4
for an example on how that works.
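As a brief sketch (the column names are illustrative), a nested description could be declared as:

from tables import IsDescription, StringCol, IntCol, Float32Col

class Particle(IsDescription):
    name = StringCol(16, pos=1)     # 16-character string column
    energy = Float32Col(pos=2)      # single-precision float column
    class position(IsDescription):  # nested column declared in place
        _v_pos = 3
        x = IntCol()
        y = IntCol()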
These are the special attributes that the user can specify
when declaring an
IsDescription
subclass to complement its
metadata.
Sets the position of a possible nested column description among its sibling columns.
The following attributes are automatically
created when an IsDescription
subclass is declared. Please note that declared columns can no
longer be accessed as normal class variables after its
creation.
Maps the name of each column in the description to its own descriptive object.
Defines a non-nested column.
Col
instances are used as a means to
declare the different properties of a non-nested column in a table
or nested column. Col
classes are descendants of
their equivalent Atom
classes (see Section 4.13.3), but their
instances have an additional _v_pos
attribute
that is used to decide the position of the column inside its parent
table or nested column (see the IsDescription
class in Section 4.13.1 for more information on column positions).
In the same fashion as Atom
, you should use
a particular Col
descendant class whenever you
know the exact type you will need when writing your code. Otherwise,
you may use one of the Col.from_*()
factory
methods.
In addition to the variables that they inherit from the
Atom
class, Col
instances
have the following attributes:
The relative position of this column with regard to its column siblings.
Create a Col
definition from a PyTables
atom
.
An optional position may be specified as the
pos
argument.
Create a Col
definition from a NumPy
dtype
.
Optional default value and position may be specified as
the dflt
and pos
arguments, respectively. Information in the
dtype
not represented in a
Col
is ignored.
Create a Col
definition from a PyTables
kind
.
Optional item size, shape, default value and position may
be specified as the itemsize
,
shape
, dflt
and
pos
arguments, respectively. Bear in mind
that not all columns support a default item size.
Create a Col
definition from a NumPy
scalar type sctype
.
Optional shape, default value and position may be
specified as the shape
,
dflt
and pos
arguments,
respectively. Information in the sctype
not
represented in a Col
is ignored.
Defines the type of atomic cells stored in a dataset.
The meaning of atomic is that individual
elements of a cell can not be extracted directly by indexing (i.e.
__getitem__()
) the dataset; e.g. if a dataset has
shape (2, 2) and its atoms have shape (3,), to get the third element
of the cell at (1, 0) one should use
dataset[1,0][2]
instead of
dataset[1,0,2]
.
The Atom
class is meant to declare the
different properties of the base element (also
known as atom) of CArray
,
EArray
and VLArray
datasets,
although they are also used to describe the base elements of
Array
datasets. Atoms have the property that
their length is always the same. However, you can grow datasets
along the extensible dimension in the case of
EArray
or put a variable number of them on a
VLArray
row. Moreover, they are not
restricted to scalar values, and they can be fully
multidimensional objects.
A series of descendant classes are offered in order to make
the use of these element descriptions easier. You should use a
particular Atom
descendant class whenever you
know the exact type you will need when writing your code. Otherwise,
you may use one of the Atom.from_*()
factory
methods.
The default value of the atom.
If the user does not supply a value for an element
while filling a dataset, this default value will be written
to disk. If the user supplies a scalar value for a
multidimensional atom, this value is automatically
broadcast to all the items in the atom
cell. If dflt
is not supplied, an
appropriate zero value (or null string)
will be chosen by default. Please note that default values
are kept internally as NumPy objects.
The NumPy dtype
that most closely
matches this atom.
Size in bytes of a single item in the atom.
Specially useful for atoms of the
string
kind.
The PyTables kind of the atom (a string). For a relation of the data kinds supported by PyTables and more information about them, see Appendix A.
String type to be used in
numpy.rec.array()
.
The shape of the atom (a tuple, ()
for scalar atoms).
Total size in bytes of the atom.
The PyTables type of the atom (a string). For a relation of the data types supported by PyTables and more information about them, see Appendix A.
Atoms can be compared with atoms and other objects for strict (in)equality without having to compare individual attributes:
>>> atom1 = StringAtom(itemsize=10)  # same as ``atom2``
>>> atom2 = Atom.from_kind('string', 10)  # same as ``atom1``
>>> atom3 = IntAtom()
>>> atom1 == 'foo'
False
>>> atom1 == atom2
True
>>> atom2 != atom1
False
>>> atom1 == atom3
False
>>> atom3 != atom2
True
Get a copy of the atom, possibly overriding some arguments.
Constructor arguments to be overridden must be passed as keyword arguments.
>>> atom1 = StringAtom(itemsize=12)
>>> atom2 = atom1.copy()
>>> print atom1
StringAtom(itemsize=12, shape=(), dflt='')
>>> print atom2
StringAtom(itemsize=12, shape=(), dflt='')
>>> atom1 is atom2
False
>>> atom3 = atom1.copy(itemsize=100, shape=(2, 2))
>>> print atom3
StringAtom(itemsize=100, shape=(2, 2), dflt='')
>>> atom1.copy(foobar=42)
Traceback (most recent call last):
...
TypeError: __init__() got an unexpected keyword argument 'foobar'
Create an Atom
from a NumPy
dtype
.
An optional default value may be specified as the
dflt
argument. Information in the
dtype
not represented in an
Atom
is ignored.
>>> import numpy
>>> Atom.from_dtype(numpy.dtype((numpy.int16, (2, 2))))
Int16Atom(shape=(2, 2), dflt=0)
>>> Atom.from_dtype(numpy.dtype('S5'), dflt='hello')
StringAtom(itemsize=5, shape=(), dflt='hello')
>>> Atom.from_dtype(numpy.dtype('Float64'))
Float64Atom(shape=(), dflt=0.0)
Create an Atom
from a PyTables
kind
.
Optional item size, shape and default value may be
specified as the itemsize
,
shape
and dflt
arguments, respectively. Bear in mind that not all atoms support
a default item size.
>>> Atom.from_kind('int', itemsize=2, shape=(2, 2))
Int16Atom(shape=(2, 2), dflt=0)
>>> Atom.from_kind('int', shape=(2, 2))
Int32Atom(shape=(2, 2), dflt=0)
>>> Atom.from_kind('string', itemsize=5, dflt='hello')
StringAtom(itemsize=5, shape=(), dflt='hello')
>>> Atom.from_kind('string', dflt='hello')
Traceback (most recent call last):
...
ValueError: no default item size for kind ``string``
>>> Atom.from_kind('Float')
Traceback (most recent call last):
...
ValueError: unknown kind: 'Float'
Moreover, some kinds with atypical constructor signatures are not supported; you need to use the proper constructor:
>>> Atom.from_kind('enum')
Traceback (most recent call last):
...
ValueError: the ``enum`` kind is not supported...
Create an Atom
from a NumPy scalar type
sctype
.
Optional shape and default value may be specified as the
shape
and dflt
arguments, respectively. Information in the
sctype
not represented in an
Atom
is ignored.
>>> import numpy
>>> Atom.from_sctype(numpy.int16, shape=(2, 2))
Int16Atom(shape=(2, 2), dflt=0)
>>> Atom.from_sctype('S5', dflt='hello')
Traceback (most recent call last):
...
ValueError: unknown NumPy scalar type: 'S5'
>>> Atom.from_sctype('Float64')
Float64Atom(shape=(), dflt=0.0)
Create an Atom
from a PyTables
type
.
Optional shape and default value may be specified as the
shape
and dflt
arguments, respectively.
>>> Atom.from_type('bool')
BoolAtom(shape=(), dflt=False)
>>> Atom.from_type('int16', shape=(2, 2))
Int16Atom(shape=(2, 2), dflt=0)
>>> Atom.from_type('string40', dflt='hello')
Traceback (most recent call last):
...
ValueError: unknown type: 'string40'
>>> Atom.from_type('Float64')
Traceback (most recent call last):
...
ValueError: unknown type: 'Float64'
There are some common arguments for most
Atom
-derived constructors:
For types with a non-fixed size, this sets the size in bytes of individual items in the atom.
Sets the shape of the atom. An integer shape like
2
is equivalent to the tuple
(2,)
.
Sets the default value for the atom.
A relation of the different constructors with their parameters follows.
Defines an atom of type string
.
The item size is the maximum length in characters of strings.
Defines an atom of kind complex
.
Allowed item sizes are 8 (single precision) and 16 (double
precision). This class must be used instead of more concrete
ones to avoid confusions with numarray
-like
precision specifications used in PyTables 1.X.
Defines an atom of time type (time
kind).
There are two distinct supported types of time: a 32 bit integer value and a 64 bit floating point value. Both of them reflect the number of seconds since the Unix epoch. This atom has the property of being stored using the HDF5 time datatypes.
Description of an atom of an enumerated type.
Instances of this class describe the atom type used to
store enumerated values. Those values belong to an enumerated
type, defined by the first argument (enum
) in
the constructor of the atom, which accepts the same kinds of
arguments as the Enum
class (see Section 4.14.3). The
enumerated type is stored in the enum
attribute of the atom.
A default value must be specified as the second argument
(dflt
) in the constructor; it must be the
name (a string) of one of the enumerated
values in the enumerated type. When the atom is created, the
corresponding concrete value is broadcast and stored in the
dflt
attribute (setting different default
values for items in a multidimensional atom is not supported
yet). If the name does not match any value in the enumerated
type, a KeyError
is raised.
Another atom must be specified as the
base
argument in order to determine the base
type used for storing the values of enumerated values in memory
and disk. This storage atom is kept in the
base
attribute of the created atom. As a
shorthand, you may specify a PyTables type instead of the
storage atom, implying that this has a scalar shape.
The storage atom should be able to represent each and
every concrete value in the enumeration. If it is not, a
TypeError
is raised. The default value of the
storage atom is ignored.
The type
attribute of enumerated atoms
is always enum
.
Enumerated atoms also support comparisons with other objects:
>>> enum = ['T0', 'T1', 'T2']
>>> atom1 = EnumAtom(enum, 'T0', 'int8')  # same as ``atom2``
>>> atom2 = EnumAtom(enum, 'T0', Int8Atom())  # same as ``atom1``
>>> atom3 = EnumAtom(enum, 'T0', 'int16')
>>> atom4 = Int8Atom()
>>> atom1 == enum
False
>>> atom1 == atom2
True
>>> atom2 != atom1
False
>>> atom1 == atom3
False
>>> atom1 == atom4
False
>>> atom4 != atom1
True
The following C enum construction:
enum myEnum { T0, T1, T2 };
would correspond to the following PyTables declaration:
>>> myEnumAtom = EnumAtom(['T0', 'T1', 'T2'], 'T0', 'int32')
Please note the dflt argument with a value of 'T0'. Since the concrete value matching T0 is unknown right now (we have not used explicit concrete values), using the name is the only option left for defining a default value for the atom.
The chosen representation of values for this enumerated atom uses unsigned 32-bit integers, which surely wastes quite a lot of memory. Another size could be selected by using the base argument (this time with a full-blown storage atom):
>>> myEnumAtom = EnumAtom(['T0', 'T1', 'T2'], 'T0', UInt8Atom())
You can also define multidimensional arrays for data elements:
>>> myEnumAtom = EnumAtom(
...     ['T0', 'T1', 'T2'], 'T0', base='uint32', shape=(3,2))
This defines 3x2 arrays of uint32.
Now there come three special classes, ObjectAtom, VLStringAtom and VLUnicodeAtom, that actually do not descend from Atom, but whose goal is so similar that they should be described here. Pseudo-atoms can only be used with VLArray datasets (see Section 4.10), and they do not support multidimensional values, nor multiple values per row.
They can be recognised because they also have kind, type and shape attributes, but no size, itemsize or dflt ones. Instead, they have a base atom which defines the elements used for storage.
See examples/vlarray1.py and examples/vlarray2.py for further examples on VLArray datasets, including object serialization and string management.
Defines an atom of type object. This class is meant to fit any kind of Python object in a row of a VLArray dataset by using cPickle behind the scenes. Because you cannot foresee how long the output of the cPickle serialization will be (i.e. the atom already has a variable length), you can only fit one object per row. However, you can still group several objects in a single tuple or list and pass it to the VLArray.append() method (see description).
Object atoms do not accept parameters and they cause the reads of rows to always return Python objects. You can regard object atoms as an easy way to save an arbitrary number of generic Python objects in a VLArray dataset.
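As a minimal sketch (not one of the distributed examples; the file name objects.h5 is made up), serialization works like this:
from tables import openFile, ObjectAtom

fileh = openFile('objects.h5', mode='w')
vla = fileh.createVLArray(fileh.root, 'objs', ObjectAtom(),
                          "Assorted Python objects")
vla.append(42)                   # one object per row...
vla.append({'key': [1, 2, 3]})   # ...but a container counts as one object
print vla[1]                     # rows are read back as Python objects
fileh.close()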
Defines an atom of type vlstring. This class describes a row of the VLArray class, rather than an atom. It differs from the StringAtom class in that you can only add one instance of it to one specific row, i.e. the VLArray.append() method (see description) only accepts one object when the base atom is of this type.
Like StringAtom, this class does not make assumptions on the encoding of the string, and raw bytes are stored as is. Unicode strings are supported as long as no character is out of the ASCII set; otherwise, you will need to explicitly convert them to strings before you can save them. For full Unicode support, using VLUnicodeAtom (see description) is recommended.
Variable-length string atoms do not accept parameters and they cause the reads of rows to always return Python strings. You can regard vlstring atoms as an easy way to save generic variable length strings.
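A minimal sketch (the file name strings.h5 is made up):
from tables import openFile, VLStringAtom

fileh = openFile('strings.h5', mode='w')
vls = fileh.createVLArray(fileh.root, 'strs', VLStringAtom(),
                          "Variable-length strings")
vls.append('short')                          # one string per row
vls.append('a considerably longer string')   # rows may differ in length
print vls[1]
fileh.close()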
Defines an atom of type vlunicode. This class describes a row of the VLArray class, rather than an atom. It is very similar to VLStringAtom (see description), but it stores Unicode strings (using 32-bit characters a la UCS-4, so all strings of the same length also take up the same space).
This class does not make assumptions on the encoding of plain input strings. Plain strings are supported as long as no character is out of the ASCII set; otherwise, you will need to explicitly convert them to Unicode before you can save them.
Variable-length Unicode atoms do not accept parameters and they cause the reads of rows to always return Python Unicode strings. You can regard vlunicode atoms as an easy way to save variable length Unicode strings.
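And a matching sketch for Unicode rows (again with a made-up file name):
from tables import openFile, VLUnicodeAtom

fileh = openFile('unicode.h5', mode='w')
vlu = fileh.createVLArray(fileh.root, 'ustrs', VLUnicodeAtom(),
                          "Variable-length Unicode strings")
vlu.append(u'ni\xf1o')    # non-ASCII characters are fine in Unicode rows
print vlu[0]
fileh.close()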
This section describes some classes that do not fit in any other section and that mainly serve for ancillary purposes.
Container for filter properties.
This class is meant to serve as a container that keeps information about the filter properties associated with the chunked leaves, that is Table, CArray, EArray and VLArray.
Instances of this class can be directly compared for equality.
Whether the Fletcher32 filter is active or not.
The compression level (0 disables compression).
The compression filter used (irrelevant when compression is not enabled).
Whether the Shuffle filter is active or not.
This is a small example of using the Filters class:
import numpy
from tables import *

fileh = openFile('test5.h5', mode='w')
atom = Float32Atom()
filters = Filters(complevel=1, complib='lzo', fletcher32=True)
arr = fileh.createEArray(fileh.root, 'earray', atom, (0,2),
                         "A growable array", filters=filters)
# Append several rows in only one call
arr.append(numpy.array([[1., 2.], [2., 3.], [3., 4.]],
                       dtype=numpy.float32))
# Print information on that enlargeable array
print "Result Array:"
print repr(arr)
fileh.close()
This enforces the use of the LZO library, a compression level of 1 and the Fletcher32 checksum filter. The output of this example follows:
Result Array:
/earray (EArray(3L, 2), fletcher32, shuffle, lzo(1)) 'A growable array'
  type = float32
  shape = (3L, 2)
  itemsize = 4
  nrows = 3
  extdim = 0
  flavor = 'numpy'
  byteorder = 'little'
Create a new Filters instance.
Specifies a compression level for data. The allowed range is 0-9. A value of 0 (the default) disables compression.
Specifies the compression library to be used. Right now, 'zlib' (the default), 'lzo' and 'bzip2' are supported. Specifying a compression library which is not available in the system issues a FiltersWarning and sets the library to the default one.
See Section 5.3 for some advice on which library is better suited to your needs.
Whether or not to use the Shuffle filter in the HDF5 library. This is normally used to improve the compression ratio. A false value disables shuffling and a true one enables it. The default value depends on whether compression is enabled: if compression is enabled, shuffling is enabled by default as well; otherwise it is disabled. Shuffling can only be used when compression is enabled (see the sketch after this list of arguments).
Whether or not to use the Fletcher32 filter in the HDF5 library. This is used to add a checksum on each data chunk. A false value (the default) disables the checksum.
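The shuffle default described above can be checked interactively; a doctest-style sketch, not from the manual's own example set:
>>> from tables import Filters
>>> Filters().shuffle                       # compression disabled
False
>>> Filters(complevel=1).shuffle            # enabled along with compression
True
>>> Filters(complevel=1, shuffle=False).shuffle
False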
Get a copy of the filters, possibly overriding some arguments.
Constructor arguments to be overridden must be passed as keyword arguments.
Using this method is recommended over replacing the attributes of an instance, since instances of this class may become immutable in the future.
>>> filters1 = Filters()
>>> filters2 = filters1.copy()
>>> filters1 == filters2
True
>>> filters1 is filters2
False
>>> filters3 = filters1.copy(complevel=1)
Traceback (most recent call last):
...
ValueError: compression library ``None`` is not supported...
>>> filters3 = filters1.copy(complevel=1, complib='zlib')
>>> print filters1
Filters(complevel=0, shuffle=False, fletcher32=False)
>>> print filters3
Filters(complevel=1, complib='zlib', shuffle=False, fletcher32=False)
>>> filters1.copy(foobar=42)
Traceback (most recent call last):
...
TypeError: __init__() got an unexpected keyword argument 'foobar'
Represents the index of a column in a table.
This class is used to keep the indexing information for columns in a Table dataset (see Section 4.6). It is actually a descendant of the Group class (see Section 4.4), with some added functionality. An Index is always associated with one and only one column in the table.
This class is mainly intended for internal use, but some of its attributes may be interesting for the programmer.
The Column (see Section 4.6.9) instance for the indexed column.
Whether the index is dirty or not.
Dirty indexes are out of sync with column data, so they exist but they are not usable.
Filter properties for this index (see Filters in Section 4.14.1).
The number of currently indexed rows for this column.
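Putting the above together, here is a minimal sketch of inspecting these attributes. It assumes a table with an indexable column and the Column.createIndex() method documented elsewhere in this manual (indexing is a Professional edition feature in this series); the attribute names dirty and filters follow the descriptions above, and the file name indexed.h5 is made up:
from tables import openFile, IsDescription, IntCol

class Particle(IsDescription):
    energy = IntCol()

fileh = openFile('indexed.h5', mode='w')
table = fileh.createTable(fileh.root, 'particles', Particle)
table.append([(i,) for i in range(1000)])
table.flush()
table.cols.energy.createIndex()   # build the index for this column
idx = table.cols.energy.index     # the Index instance
print idx.dirty                   # freshly built, so not dirty
print idx.filters                 # filter properties used by the index
fileh.close()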
Enumerated type.
Each instance of this class represents an enumerated type. The values of the type must be declared exhaustively and named with strings, and they might be given explicit concrete values, though this is not compulsory. Once the type is defined, it cannot be modified.
There are three ways of defining an enumerated type. Each one of them corresponds to the type of the only argument in the constructor of Enum:
Sequence of names: each enumerated value is named using a string, and its order is determined by its position in the sequence; the concrete value is assigned automatically:
>>> boolEnum = Enum(['True', 'False'])
Mapping of names: each enumerated value is named by a string and given an explicit concrete value. All of the concrete values must be different, or a ValueError will be raised.
>>> priority = Enum({'red': 20, 'orange': 10, 'green': 0})
>>> colors = Enum({'red': 1, 'blue': 1})
Traceback (most recent call last):
...
ValueError: enumerated values contain duplicate concrete values: 1
Enumerated type: in this case, a copy of the original enumerated type is created. Both enumerated types are considered equal.
>>> prio2 = Enum(priority)
>>> priority == prio2
True
Please note that names starting with _ are not allowed, since they are reserved for internal usage:
>>> prio2 = Enum(['_xx'])
Traceback (most recent call last):
...
ValueError: name of enumerated value can not start with ``_``: '_xx'
The concrete value of an enumerated value is obtained by getting its name as an attribute of the Enum instance (see __getattr__()) or as an item (see __getitem__()). This allows comparisons between enumerated values and assigning them to ordinary Python variables:
>>> redv = priority.red
>>> redv == priority['red']
True
>>> redv > priority.green
True
>>> priority.red == priority.orange
False
The name of the enumerated value corresponding to a concrete value can also be obtained by using the __call__() method of the enumerated type. In this way you get the symbolic name to use it later with __getitem__():
>>> priority(redv)
'red'
>>> priority.red == priority[priority(priority.red)]
True
(In case you are wondering, the __getitem__() method is not used for this purpose to avoid ambiguity when strings are used as concrete values.)
Get the name of the enumerated value with that concrete value. If there is no value with that concrete value in the enumeration and a second argument is given as a default, this is returned. Otherwise, a ValueError is raised.
This method can be used for checking that a concrete value belongs to the set of concrete values in an enumerated type.
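For instance, reusing the priority enumeration from above (the use of None as the default here is just illustrative):
>>> priority = Enum({'red': 20, 'orange': 10, 'green': 0})
>>> priority(20)
'red'
>>> priority(30, None) is None    # 30 is not a concrete value here
True
>>> priority(30)
Traceback (most recent call last):
...
ValueError: ...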
Is there an enumerated value with that name in the type? If the enumerated type has an enumerated value with that name, True is returned. Otherwise, False is returned. The name must be a string.
This method does not check for concrete values matching a value in an enumerated type. For that, please use the Enum.__call__() method (see description).
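A quick sketch of name lookups with the in operator:
>>> priority = Enum({'red': 20, 'orange': 10, 'green': 0})
>>> 'red' in priority
True
>>> 'purple' in priority
False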
Is the other enumerated type equivalent to this one?
Two enumerated types are equivalent if they have exactly the same enumerated values (i.e. with the same names and concrete values).
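An illustrative, hedged sketch; it assumes, consistent with the sequence example above, that automatically assigned concrete values start at zero and increase by one:
>>> Enum(['T0', 'T1']) == Enum({'T0': 0, 'T1': 1})
True
>>> Enum(['T0', 'T1']) == Enum({'T0': 0, 'T1': 2})
False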
Get the concrete value of the enumerated value with that name. The name of the enumerated value must be a string. If there is no value with that name in the enumeration, an AttributeError is raised.
Get the concrete value of the enumerated value with that name. The name of the enumerated value must be a string. If there is no value with that name in the enumeration, a KeyError is raised.
Iterate over the enumerated values. Enumerated values are returned as (name, value) pairs in no particular order.
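For example, a small sketch reusing the priority enumeration (sorting is applied only to get deterministic output, since no iteration order is guaranteed):
>>> priority = Enum({'red': 20, 'orange': 10, 'green': 0})
>>> for name, value in sorted(priority):
...     print name, value
green 0
orange 10
red 20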