tenFrame Reference Documentation#

tenFrame: 1010data queries with a Pandas-like interface#

The motivation behind this library is to provide users with a way to manipulate 1010data queries in a more familiar (pandas-like) way. The operations and changes are converted into 1010data XML code.

The main classes in tenFrame are TenFrame and TenSeries.

A TenFrame represents a single 1010data query. It also tries to present to the user an interface similar to a Pandas DataFrame.

Just as subscripting a DataFrame in Pandas usually gives you a Series, subscripting a TenFrame yields a TenSeries. Internally, a TenSeries is not much more than a reference to its parent TenFrame and the name of the column it represents, but it is equipped with methods that let it behave somewhat like a Pandas Series. (As with DataFrames, though, a TenFrame may also be subscripted to select rows instead of columns, if the subscript is a boolean column, etc.)

TenFrame really accomplishes two useful tasks: (1) it converts Pandas-style operations (simple math operators, aggregation, etc.) into 1010data XML macro-code, as used by 1010data Queries. (2) Using py1010, it runs these queries and provides access to the results, again with a Pandas-like interface.

It is important to note that the first of these does not depend on the second for its usefulness. It can be convenient to construct a query using familiar Pandas operations even when not actually connected to any 1010data database. Perhaps you want to extract the query thus constructed (using TenFrame.extractXML()) and paste it into some other 1010data access method (a QuickApp, for example). Therefore, it is possible to use tenFrame in “offline mode”.

An “offline” TenFrame will never attempt to access the 1010data server to retrieve information or even metadata about the columns. Attempting to access online data with an offline TenFrame either returns NotImplemented or raises an exception. You make an “offline” TenFrame by passing in None in place of its session in the constructor, or by setting the frame.session instance variable to None. Since the TenFrame has no valid py1010 Session, it cannot contact any 1010data server.

While it is possible to switch between online and offline mode by changing the session attribute of a TenFrame, this is not recommended, and TenFrame was not written to support such alternation. It is also unlikely to be something you would commonly need to do.

Because you cannot determine certain things about a query (or TenFrame) without actually running it, an offline TenFrame has some restrictions compared to an online one. The following is not intended to be an exhaustive list.

  • Obviously, all operations involving fetching data from the results of the query are not available.

  • An offline TenSeries cannot, in general, know what columns it contains. So the .cols member of an offline TenSeries will always return an empty tuple.

  • This limitation also causes issues when doing aggregations on a TenFrame if you do not explicitly specify which columns are to be aggregated.

Module-level Functions#

These are mostly top-level aliases to class methods, to mimic what pandas-users are used to.

tenFrame.melt(frame, *args, **kwargs)#

Call melt on a TenFrame.

tf.melt(frame, *args, **kwargs) is the same as frame.melt(*args, **kwargs). (q.v.)

tenFrame.concat(frames, *args, **kwargs)#

Concatenate a sequence of TenFrames.

Performs the TenFrame.concat() method on the sequence of frames given, passing the args and kwargs to each invocation.

The first element in the sequence must be a TenFrame; the following elements may be the names of tables as strings.

tenFrame.merge(frame, *args, **kwargs)#

Merge a TenFrame with another TenFrame.

tf.merge(frame1, frame2, *args, **kwargs) is the same as frame1.merge(frame2, *args, **kwargs) (q.v.).

tenFrame.wide_to_long(frame, *args, **kwargs)#

Unpivot a TenFrame from wide to long format.

wide_to_long(frame, *args, **kwargs) is the same as frame.wide_to_long(*args, **kwargs) (q.v.).

tenFrame.pivot_table(frame, *args, **kwargs)#

Create a pivot table.

pivot_table(frame, *args, **kwargs) is the same as frame.pivot_table(*args, **kwargs) (q.v.).

tenFrame.qcut(x, q, *args, **kwargs)#

Run the qcut method on column x.

Same as x.qcut(q, *args, **kwargs), for a TenSeries x. See TenSeries.qcut().

tenFrame.cut(x, bins, *args, **kwargs)#

Run the cut method on column x.

Same as x.cut(bins, *args, **kwargs) for a TenSeries x. See TenSeries.cut().

There are a few other convenience functions at the top level.

tenFrame.directory(session, dirname, to_df=True, **kwargs)#

Get a DataFrame with the contents of a 1010data directory.

Fetch the directory listing of a directory in the 1010data object tree (uses the <directory> op). Contains all kinds of metadata.

Parameters:
  • session (py1010.Session) – A py1010 Session object to run the query through.

  • dirname (str) – The name (path) of the folder to get the directory of.

  • to_df (bool) – If True (default) and Pandas is available, convert the result to a Pandas DataFrame. Otherwise, returns a TenFrame.

Return type:

pd.DataFrame or TenFrame, depending on to_df.

tenFrame.islive(session)#

Test if a session is “live”.

Runs a simple query on a session, to test if it is logged in and not yet timed out or otherwise unresponsive.

Parameters:

session (py1010.Session) – py1010 Session object to test.

Returns:

True if the session appears to be live.

Return type:

bool

tenFrame.select(condlist, choicelist, default=0)#

Choose elements from choicelist depending on conditions.

Returns a string with the 1010data expression:

if(COND1; CHOICE1; COND2; CHOICE2; ... ;DEFAULT)

which in 1010data evaluates to CHOICE1 if COND1 is true (1), otherwise to CHOICE2 if COND2 is true, and so on. See the 1010data documentation on the “if” function. The conditions should be TenSeries with fairly simple expressions, and they should all be based on the same underlying TenFrame.

>>> base['leftxjohny'] = tf.select([base.bats=='r', base.firstname=='john'],
...                                ["X", "Y"], "Z")

will set the leftxjohny column to ‘X’ if the bats column is ‘r’; otherwise to ‘Y’ if the firstname column is ‘john’; otherwise to ‘Z’.

Parameters:
  • condlist (list) – List of conditions. These should be TenSeries, or expressions returning TenSeries, e.g., frame.weight > 7 or frame.name == "john".

  • choicelist (list) – List of resulting choices. Must be the same length as condlist. The choice corresponding to the first condition that is true is returned, for each row.

  • default – Default value, to be returned if none of the conditions is true.

Returns:

A string (actually a tenFrame.Estr object) of the form above, which can be assigned into a TenFrame column, etc.

Return type:

Estr
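
The shape of the expression that select() is documented to return can be sketched in plain Python. This is only an illustration of the string format, not tenFrame's implementation: build_if_expression is a hypothetical helper, and it assumes the conditions and choices have already been rendered as 1010data expression strings.

```python
def build_if_expression(conditions, choices, default):
    """Interleave conditions and choices into if(C1;V1;C2;V2;...;DEFAULT).

    Hypothetical helper: every argument is assumed to be an expression
    string that is already valid 1010data code.
    """
    if len(conditions) != len(choices):
        raise ValueError("condlist and choicelist must be the same length")
    parts = []
    for cond, choice in zip(conditions, choices):
        parts.extend([cond, choice])
    parts.append(default)  # the default always comes last
    return "if(" + ";".join(parts) + ")"


expr = build_if_expression(
    ["bats='r'", "firstname='john'"],
    ["'X'", "'Y'"],
    "'Z'",
)
print(expr)
```

Because 1010data evaluates the pairs in order, the first condition in the list takes priority when more than one is true.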

tenFrame.xml2ops(xml)#

Convert an XML string to a list of Ops.

You can use this function to convert XML code you have already written to a TenFrame, by doing frame = tf.TenFrame(session, ops=xml2ops(xmlstring)). You may want to run removeblanks() on the list of ops to remove blank elements, but note that removeblanks() operates destructively and does not return a value.

Parameters:

xml (str) – A string, XML code.

Returns:

A list of Op objects.

tenFrame.removeblanks(oplist)#

Remove whitespace-only #text ops from a list of ops.

Does not return a value; operates on its input destructively.

Parameters:

oplist (List[Op]) – A list of Op objects.

Returns:

None. Mutates the input list instead.
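
The in-place behaviour is easy to trip over, so here is a minimal sketch of the same idea using the standard library's xml.dom.minidom nodes as a stand-in for Op objects. The names parse_nodes and remove_blank_text are hypothetical; the real xml2ops() and removeblanks() work on tenFrame's own Op type.

```python
from xml.dom import minidom


def parse_nodes(xml):
    """Parse an XML fragment into a list of top-level DOM nodes."""
    doc = minidom.parseString("<root>" + xml + "</root>")
    return list(doc.documentElement.childNodes)


def remove_blank_text(nodes):
    """Drop whitespace-only #text nodes from the list, in place."""
    nodes[:] = [
        n for n in nodes
        if not (n.nodeName == "#text" and not n.data.strip())
    ]
    # No return statement: like removeblanks(), this mutates its argument.


nodes = parse_nodes('<base table="default.test.solar"/>\n<sel value="rkm"/>\n')
names_before = [n.nodeName for n in nodes]
result = remove_blank_text(nodes)
names_after = [n.nodeName for n in nodes]
print(names_before, result, names_after)
```

The practical consequence is that writing ops = removeblanks(ops) would leave ops set to None; call it as a statement and keep using the original list.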

tenFrame.oneorlist(arg, sep=',', otherdelims=' ', join=True, none=True, quotes=False, *, frame=None)#

Normalize argument(s) to a comma-separated string.

Utility function for handling an argument that might be a singleton string/TenSeries or a list, or a comma-or-space-separated string. Return a comma-separated string, unless join is False; return None if input is None, unless none is False, in which case return the empty string.

Parameters:
  • arg – Element or list of elements.

  • sep – Separator to interleave between the elements, if join=True. Default ","

  • otherdelims – Ignored

  • join – If False, return a list of the elements (even if only one was given). If True, join them together and return the string. Default True.

  • none – Only relevant when arg is None. If none=True, then return None. If False, then consider arg to be the empty list.

  • quotes – Passed as the quotes parameter to strorname() for each parameter.

  • frame – (keyword-only) Passed as the frame parameter to strorname().

Returns:

A string (if join=True) or a list (if join=False).
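
The normalization rules above can be sketched for plain strings and lists as follows. This is an approximation under stated assumptions, not the library's code: one_or_list is a hypothetical name, it splits string input on commas and whitespace, and it ignores the TenSeries, quotes and frame handling that the real function delegates to strorname().

```python
import re


def one_or_list(arg, sep=",", join=True, none=True):
    """Normalize a string, list, or None as described for oneorlist()."""
    if arg is None:
        if none:
            return None
        arg = []  # none=False: treat None as the empty list
    if isinstance(arg, str):
        # Assumption: commas and whitespace both act as delimiters.
        items = [s for s in re.split(r"[,\s]+", arg) if s]
    else:
        items = [str(a) for a in arg]
    return sep.join(items) if join else items


print(one_or_list("a b, c"))              # joined string
print(one_or_list(["a", "b"], join=False))  # list of elements
print(one_or_list(None, none=False))      # empty string
```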

tenFrame.strorname(c, quotes=False, *, frame=None)#

Convert TenSeries into names, if needed.

Parameters:
  • c – Object to be converted. Expected to be a TenSeries, a str, or a float.

  • quotes – If True and c is a str (but not an Estr object), surround it with single quotes and escape all single quotes inside it.

  • frame – (keyword-only) a TenFrame, in order to convert the title of a column with its name. Optional.

Return type:

str

Classes#

class tenFrame.TenFrame(session, table=None, ops=None, data=None, meta=None, load=None, imports=None, imports_once=None, *, islib=False)#

A 1010data Query, represented in a DataFrame-like form.

A TenFrame is an object that behaves (somewhat) like a DataFrame, but actually represents a 1010data query, so it can involve very large amounts of data.

Examples:

>>> session = py1010.Session(url, user, password, py1010.KILL)
>>> tf = TenFrame(session, table="default.test.solar")
>>> tf["strangestat"] = tf["rkm"] + tf["vol"]
>>> ...

Some TenFrame-specific instance variables. You probably should not set any of these:

Variables:
  • query_ – The py1010 Query object of this TenFrame (if any)

  • session – The py1010 Session object of this TenFrame (if any)

  • dirty – Does the Query for this TenFrame need to be regenerated (e.g., if an operation has been added)?

  • ops (list) – List of Op objects representing the 1010data macro-code operations of this TenFrame.

Some instance variables which serve as interfaces to particular features of a TenFrame:

Variables:
  • lib (TFLib) – Interface to library-related methods, implemented by the TFLib class.

  • iloc (py1010.RowIterator) – The py1010.Query.rows attribute of the underlying py1010 query. Of type py1010.RowIterator, you can iterate over it or subscript it to get an individual row. The rows are returned as tuples.

  • plot (Plotter) – Interface to plotting routines.

Constructor for TenFrame.

Parameters:
  • session (py1010.Session) – A py1010 Session object, or None.

  • table (str) – The name of a 1010data table.

  • ops (list(Op)) – A list of operations (Op objects).

  • data – Table data, as a dictionary of lists or a Pandas DataFrame.

  • meta (list(str)) – Run the meta() method on the newly-constructed TenFrame, passing this list (or string) as its argument(s). This is used to prepend a special <meta> op to the top of the query.

  • load (str) – The name of a 1010data QuickQuery, to be loaded in as a query.

  • imports – A library name or list of library names, to be imported with <import> ops at the start of the query.

  • imports_once – Same as imports, except that the libraries listed here are imported with once="1" in the operator.

If session is a py1010 Session object, frame works in “online” mode; otherwise offline. You must supply at least one of table, ops, data, imports, and imports_once, but not more than one of table, ops, and data. The data can be a Pandas dataframe or a dictionary of lists, which get converted into a table op containing the data at the start of the query.

In order to create a parametric TenFrame (saveable as a library) whose base table is a parameter, you must use:

frame = tf.TenFrame(session, tf.that.Param("table", "sales.info.tablename"))

class Plotter(frame)#

Implements the .plot element of a TenFrame.

Mainly interfaces with Pandas’ DataFrame.plot() method, or with matplotlib.pyplot. Most kinds of plots used by Pandas are supported.

Use frame.plot(*args, **kwargs) just as you would in pandas. Or you can say frame.plot.line(*args, **kwargs), which is the same as frame.plot(*args, kind="line", **kwargs), and so on for any kind of plot (also just like in pandas.)

The following plot kinds are supported by downloading the TenFrame into a DataFrame and just calling the pandas plot() method on it. The keyword parameter maxrows controls the size of the DataFrame that may be created this way. If the number of rows in the TenFrame is greater than maxrows, then an OverflowError is raised. The default value of maxrows is 10000.

  • “line”

  • “bar”

  • “barh”

  • “area”

  • “scatter”

  • “pie”

So:

frame.plot(kind="line", *args, **kwargs)

is essentially the same as:

frame.to_df(maxrows).plot(kind="line", *args, **kwargs)

and is provided simply as a shortcut, handled by the _straightplot() method. If kind= is not specified, it defaults to "hist" (note: in pandas, the default is "line").

Other kinds of plots entail some server-side calculation to reduce the amount of data to manageable levels. Not all are supported, yet.

  • kind="hist" (histogram)

    Supported. A histogram that involves breaks (either with groupby() or by the by= keyword, which are treated the same) will show each break in its own graph. Choose the columns to be graphed either by the column= keyword parameter or by restricting the frame to the columns desired (frame[['col1', 'col2']].plot(kind="hist")). The number of bins used can be specified with the bins= keyword parameter, and defaults to 10.

    If, after computing the histogram, the data still has more than maxrows rows, an OverflowError is raised. Default is 10000. An OverflowError is also raised if the number of graphs to be drawn (number of groups) exceeds the value of the maxplots parameter. Default is 30. Other keyword arguments are passed along to matplotlib.pyplot.hist().

    Handled by the _histo() method, which see for further details.

  • kind="box" (boxplot)

    Supported. Grouping can be indicated by groupby() or by the group= keyword. Each group gets its own plot, with multiple columns shown on the same plot for each. Note that although showfliers=True is the default in Pandas, outliers are not available in tenFrame. This is because of the potentially huge number of outliers and the added complexity of passing them back to the client.

    Box plots raise the same OverflowErrors as histograms, and accept maxrows= and maxplots= parameters in the same way.

    Handled by the _boxplot() method, which see for further details.

  • kind="kde" or kind="density" (kernel density estimator)

    Partly supported, and still experimental. You can only run a kde plot on a single column, and grouping is not supported (groupby() is ignored).

    The kernel density is computed using the scipy.stats.gaussian_kde() function from the scipy library, which must be installed. First, frequency data is computed, using the same machinery as is used for histograms, and then gaussian_kde is called on the resulting column.

    The high and low bounds of the graph can be specified by the bounds=(lo, hi) keyword.

    Handled by the _kde() method, which see for further details.

  • kind="hist2d" (2-dimensional histogram)

    Not exactly the same as pandas kind="hexbin", but does display mostly the same information. As with 1-dimensional histograms, can be grouped with groupby() or with the by= keyword; groups are each shown in their own graph. Columns to plot are selected with columns=[x_colname, y_colname] (required unless there are only two columns in the table apart from grouping columns.) Number of bins can be set for each axis, as bins=[xbins, ybins]. maxrows= and maxplots= and OverflowErrors are the same as with 1-dimensional histograms.

    Handled by the _histo2d() method, which see for further details.

_boxplot(*args, **kwargs)#

Compute and plot a box plot.

Box plots summarize data from the whole dataset, and thus must be computed on the 1010data platform (since the data is presumed to be too large to work with locally). A copy of the current frame is made, and then extended with operations to perform aggregations to compute the minimum, maximum, lower and upper quartiles, and median and mean of all the columns being plotted. This is then downloaded as a pandas DataFrame and further processing is done to compute the placement of the whiskers and prepare the data for plotting.

The parameters listed below must all be passed by keyword, and are removed by tenFrame processing. All other keyword parameters are passed along to the bxp() method on the Axes object in the matplotlib.pyplot module.

Parameters:
  • group – Used for grouping. Should be a column-name or list of column-names. Overrides underlying grouping specified by .groupby(), if such is present.

  • maxrows (int) – As in _straightplot(), specifies the maximum number of rows allowed in the data to be downloaded (after binning and counting) before signalling an OverflowError. Defaults to 10000.

  • maxplots (int) – If the number of plots to be drawn exceeds this number, raise an OverflowError. Defaults to 30.

  • figsize (tuple) – Passed as figsize= parameter to subplots() function of matplotlib.pyplot. Defaults to the default figure.figsize from matplotlib.pyplot, multiplied in the y-direction by the number of plots to be drawn.

Returns:

A plot, if there is no grouping; otherwise, a list of plots.
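
As a rough illustration of the client-side step, the sketch below turns one column's downloaded summary statistics into the dictionary format that matplotlib's Axes.bxp() accepts. The whisker rule is an assumption (the conventional 1.5 × IQR fences, clipped to the observed minimum and maximum); tenFrame's actual whisker computation is not described here and may differ, and bxp_stats is a hypothetical name.

```python
def bxp_stats(label, lo, q1, median, q3, hi, mean=None):
    """Build one bxp()-style stats dict from aggregated summary values.

    Assumption: whiskers sit at the 1.5 * IQR fences, clipped to the
    column's min and max, since individual data points are not available.
    """
    iqr = q3 - q1
    stats = {
        "label": label,
        "q1": q1,
        "med": median,
        "q3": q3,
        "whislo": max(lo, q1 - 1.5 * iqr),
        "whishi": min(hi, q3 + 1.5 * iqr),
        "fliers": [],  # outliers are never downloaded
    }
    if mean is not None:
        stats["mean"] = mean
    return stats


stats = bxp_stats("rkm", lo=1.0, q1=10.0, median=12.0, q3=14.0, hi=40.0, mean=13.0)
print(stats["whislo"], stats["whishi"])
```

A list of such dictionaries, one per column, could then be drawn with ax.bxp(list_of_stats, showfliers=False) on a matplotlib Axes.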

_histo(*args, **kwargs)#

Compute and plot a histogram.

Frequency counting for a histogram must be done on the 1010data platform, since the data is presumably too large to work with locally.

A copy of this frame is made (restricted to the columns named in the columns= keyword parameter, if present), and rows containing NA are dropped (with TenFrame.dropna()). The overall minimum and maximum for the rows under consideration are found using aggregations and 1010data row-functions r_lo() and r_hi(). (Note: in pandas, the min and max values are apparently based on all the columns, not just the ones that you’re actually analyzing.) Further operations are added to the query to perform binning and counting and other adjustments, eventually resulting in a TenFrame with columns for the column-name, the bin, and the frequency (plus the group(s), if relevant). This is then downloaded as a Pandas DataFrame and the plot is based on it.

The parameters listed below must all be passed by keyword, and are removed by tenFrame processing. All other keyword parameters are passed along to the hist() method on the Axes object in the matplotlib.pyplot module.

Parameters:
  • bins (int) – Number of bins in the histogram. Defaults to 10.

  • column – Name of column to histogram, or list of names. Defaults to all (numeric) columns.

  • by – Used for grouping. Should be a column-name or list of column-names. Overrides underlying grouping specified by .groupby(), if such is present.

  • maxrows (int) – As in _straightplot(), specifies the maximum number of rows allowed in the data to be downloaded (after binning and counting) before signalling an OverflowError. Defaults to 10000.

  • maxplots (int) – If the number of plots to be drawn exceeds this number, raise an OverflowError. Defaults to 30.

  • figsize (tuple) – Passed as figsize= parameter to subplots() function of matplotlib.pyplot. Defaults to the default figure.figsize from matplotlib.pyplot, multiplied in the y-direction by the number of plots to be drawn.

Returns:

A list of tuples of DataFrames and plots.
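
To make the binning-and-counting step concrete, here is a local, pure-Python sketch of what the frequency table for one column might look like. It assumes equal-width bins between the overall minimum and maximum, which is a guess at the server-side logic rather than a description of it; bin_counts is a hypothetical name, and in tenFrame this work happens in the 1010data query, not on the client.

```python
from collections import Counter


def bin_counts(values, bins=10):
    """Return (left_bin_edge, frequency) pairs for equal-width bins."""
    data = [v for v in values if v is not None]  # mimic dropping NA rows
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = Counter()
    for v in data:
        # The maximum value falls into the last bin rather than a new one.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, counts.get(i, 0)) for i in range(bins)]


table = bin_counts([0, 1, 2, 3, 4, None, 10], bins=5)
print(table)
```

Only a table of this size (one row per bin, per column, per group) needs to be downloaded, which is why maxrows applies after binning and counting.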

_histo2d(*args, **kwargs)#

Compute and plot a 2D histogram.

Frequency counting for a histogram must be done on the 1010data platform, since the data is presumably too large to work with locally.

The logic used is similar to what is done for 1-dimensional histograms, except that bins are computed separately for each dimension and the final tallying is done grouping on both sets of bins, to get a frequency table for all (occurring) combinations of bin-edge values. This is then downloaded as a Pandas DataFrame and used to plot the graph using matplotlib.

The parameters listed below must all be passed by keyword, and are removed by tenFrame processing. All other keyword parameters are passed along to the hist2d() method on the Axes object in the matplotlib.pyplot module.

Parameters:
  • bins – Number of bins in the histogram. Can be given as a pair of integers [xbins, ybins]; if only one integer is given it is used for both dimensions. Defaults to [10,10].

  • columns – Names of columns to plot, x and y. Should be a list containing exactly two columns. Can be omitted if the table has exactly two columns (apart from the grouping columns).

  • by – Used for grouping. Should be a column-name or list of column-names. Overrides underlying grouping specified by .groupby(), if such is present. Groups are plotted on separate graphs.

  • maxrows (int) – As in _straightplot(), specifies the maximum number of rows allowed in the data to be downloaded (after binning and counting) before signalling an OverflowError. Defaults to 10000.

  • maxplots (int) – If the number of plots to be drawn exceeds this number, raise an OverflowError. Defaults to 30.

  • figsize (tuple) – Passed as figsize= parameter to subplots() function of matplotlib.pyplot. Defaults to the default figure.figsize from matplotlib.pyplot, multiplied in the y-direction by the number of plots to be drawn.

Returns:

A list of tuples of DataFrames and plots.
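
The two-dimensional tally can be sketched locally in the same spirit. The helper names are hypothetical, equal-width bins per axis are assumed, and the real computation runs on the 1010data platform; the sketch only shows how a single bins value expands to both axes and why only occurring bin combinations appear in the result.

```python
from collections import Counter


def normalize_bins(bins=10):
    """Expand a single integer to (xbins, ybins); pass a pair through."""
    if isinstance(bins, int):
        return (bins, bins)
    xbins, ybins = bins
    return (xbins, ybins)


def bin_index(value, lo, hi, nbins):
    """Index of the equal-width bin containing value (max goes in the last bin)."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)


def bin_counts_2d(points, bins=10):
    """Count points per (x_bin, y_bin); empty combinations are simply absent."""
    xbins, ybins = normalize_bins(bins)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xlo, xhi, ylo, yhi = min(xs), max(xs), min(ys), max(ys)
    return Counter(
        (bin_index(x, xlo, xhi, xbins), bin_index(y, ylo, yhi, ybins))
        for x, y in points
    )


freq = bin_counts_2d([(0, 0), (1, 9), (9, 1), (10, 10), (0.5, 0.5)], bins=[2, 2])
print(dict(freq))
```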

_kde(*args, **kwargs)#

Compute and plot a Kernel Density Estimator.

Note

This method is still experimental and lacking in some important features.

Kernel density estimation plots are currently limited to only a single column, and no grouping is performed.

The _histo() method is used to compute frequency data, which is then fed into the gaussian_kde() function of the scipy.stats module (which must be available or an exception will be raised.)

The parameters listed below must all be passed by keyword, and are removed by tenFrame processing.

Parameters:
  • bounds (tuple(float,float)) – The lower and upper bounds of the kernel plot.

  • numpoints (int) – Number of points on which to compute and plot the kernel density. Defaults to 1000.

  • bins – Passed on to _histo().

_straightplot(*args, **kwargs)#

Pass a plot straight through to be done by pandas.

Many plot types are basically point-for-point, and there isn’t anything that can be done on the server side to help. These get passed straight through to pandas, by running to_df() on the TenFrame and then running the .plot() method on the DataFrame.

Arguments and keyword arguments are passed straight through to the pandas DataFrame.plot() method, except as noted here. There are a few TenFrame-specific parameters that are passed in **kwargs, which will get popped out before passing them to pandas.

maxrows

If the data to be pulled down has more than this many rows, raise an OverflowError (which may be caught by the caller in order to try an approximation instead). Default is 10000.

Returns:

Whatever Pandas DataFrame.plot() returns.
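
The pattern of popping TenFrame-specific keywords and guarding on row count can be sketched without pandas. split_plot_kwargs is a hypothetical helper and row_count stands in for however the frame's size is obtained; only the maxrows keyword and the OverflowError come from the description above.

```python
def split_plot_kwargs(kwargs, row_count, default_maxrows=10000):
    """Remove maxrows from kwargs and enforce it before any download.

    Returns the keyword arguments that would be forwarded to pandas.
    """
    passthrough = dict(kwargs)  # do not mutate the caller's dict
    maxrows = passthrough.pop("maxrows", default_maxrows)
    if row_count > maxrows:
        raise OverflowError(f"{row_count} rows exceeds maxrows={maxrows}")
    return passthrough


forwarded = split_plot_kwargs({"kind": "line", "maxrows": 50}, row_count=20)
print(forwarded)

try:
    split_plot_kwargs({"kind": "line", "maxrows": 50}, row_count=100)
except OverflowError as err:
    # A caller could fall back to a server-side plot kind here.
    print("too large:", err)
```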

class TFLib(frame)#

Access information about the TenFrame’s library.

This class implements the .lib member of a TenFrame.

delfun(name)#

Remove a function from the library.

Removes any <def_ufun> or <def_gfun> ops which have the specified name as their “name” parameter, and also any <resource> ops whose “name” parameter starts with the specified name followed by an underscore. It is not an error to try to remove a function which is not in the library.

Note that this has no effect on the functions actually defined locally or remotely, and only really matters when it comes to saving the library.

To remove (or redefine) remotely-defined functions, you need to do a session.clearCache().

Parameters:

name (str) – The name of the function to delete.

getfun(name)#

Get the code for a function from the library.

Returns a string containing the source code of a function defined in this library.

This may not work for libraries/queries not created by tenFrame.

Parameters:

name (str) – The name of the function to get.

gfuns(sigs=False)#

A list of the names of the g_funs defined here.

Parameters:

sigs (bool) – If True, include each function’s arguments in parentheses after its name, e.g., ['g_mon(x)', 'g_dya(x,y)'], etc. Default False.

ufuns(sigs=False)#

A list of the names of u_funs defined here.

Parameters:

sigs (bool) – If True, include each function’s arguments in parentheses after its name, e.g., ['mon(x)', 'dya(x,y)'], etc. Default False.

Param(name, value='', *, quoted=None, separator=',')#

Create a Parameter for this query.

Parameters:
  • name (str) – Name of this parameter. Should be a valid 1010data identifier.

  • value – Default value for this parameter. Will be used when running this query from this TenFrame, and used as the default value for future runs. Can be numeric, string, TenSeries (column) or list.

  • quoted – (keyword-only) Whether or not this parameter should be quoted in the macrocode, using the qv() function. Default is None, meaning it will be quoted if the value given here is a string (or a list with a string as its first element), but not if it is numeric or a TenSeries (or a list with a numeric or TenSeries as its first element). An empty list is presumed to start with a string. You may want to use this when making parameters for table names or column names.

  • separator (str) – (keyword-only) When a parameter has a list value, and is quoted (as defined above), it is substituted into the XML code by stringing its elements together into a string and using the str_to_list() function in the macrocode to split it back up into its parts. The original string is strung together using a delimiter, which is then supplied to str_to_list() to split it back up. The default delimiter is ',', but if you expect your list parameter to have strings with commas in them, you might need to change this. Do not use the single quote (') as your separator!
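
The default quoting rule and the reason the separator matters can both be sketched in plain Python. These helpers are hypothetical and only mirror the rules as documented above; in particular, Python's str.split() is used as a stand-in for the macro-language str_to_list() function, whose exact behaviour is not shown here.

```python
def default_quoted(value):
    """Decide quoting when quoted=None, following the documented rule."""
    if isinstance(value, list):
        if not value:
            return True  # an empty list is presumed to start with a string
        value = value[0]
    # Strings are quoted; numbers and column objects are not.
    return isinstance(value, str)


def join_list(values, separator=","):
    """String the list elements together with the separator."""
    return separator.join(values)


def split_list(joined, separator=","):
    """Stand-in for str_to_list(): split the string back into its parts."""
    return joined.split(separator)


names = ["Smith, John", "Doe, Jane"]
broken = split_list(join_list(names))             # commas inside values collide
intact = split_list(join_list(names, "|"), "|")   # a different separator round-trips
print(default_quoted("john"), default_quoted([1, 2]), len(broken), intact)
```

With the default separator the two names come back as four fragments, which is the situation the separator parameter is meant to avoid.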

Blocks and Block Parameters#

1010data macro language allows the creation of blocks of code that act like a sort of subroutine, which can be passed parameters like a function. See the 1010data documentation on the subject for information on the details. TenFrame provides a concept of “parameters” to help you develop these blocks by using TenFrames.

Note

Unless you’re developing 1010data applications or something, you likely won’t need this feature. It’s probably much simpler to do parameter substitution while constructing the TenFrame in Python.

Essentially, whenever a TenFrame has any parameters associated with it, it is saved not as an ordinary QuickQuery but wrapped in a <block> element inside a <library> element. In another query, even in another session, you can then import the block’s library using the imports or imports_once parameters of the TenFrame constructor, call the block using the call() method, and pass in parameters to replace the default values you may have used when writing the block. The name of the subroutine block can be set in the blockname member of the TenFrame, and defaults to "block".

When a TenFrame with parameters is run, the XML code that makes up its query (except for its <base> element) is wrapped in a <block> element in the TenFrame’s library, and then a <call> element is added, so it acts the same as it normally would, with the side-effect of also defining the block, so you can use call() to call it from other TenFrames within this session even if you don’t save it.

For example:

fr = tf.TenFrame(session, "pub.demo.baseball.master")
fname = fr.Param("fname", "john")
fr['hasname'] = fr.firstname == fname
fr.blockname = 'flagbyfirst'

would result in this XML:

<macro>
    <library>
        <block name="flagbyfirst" fname="john">
            <willbe name="hasname" value="(firstname) = ({qv(@fname)})"/>
        </block>
    </library>
    <base table="pub.demo.baseball.master"/>
    <call block="flagbyfirst" fname="john"/>
</macro>

which we might save with:

fr.save("uploads.flagbyfirst")

Then later on, even in another session, you can create a TenFrame that imports this library and use it, passing a possibly different value for the fname parameter:

fr2 = tf.TenFrame(session, "pub.demo.baseball.master", imports="uploads.flagbyfirst")
fr2 = fr2.call("flagbyfirst", fname="james")
fr2 = fr2[fr2.hasname]

This results in XML:

<macro>
    <base table="pub.demo.baseball.master"/>
    <import path="uploads.flagbyfirst"/>
    <call block="flagbyfirst" fname="james"/>
    <sel value="hasname"/>
</macro>
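
The wrapping that happens when a parametrized TenFrame is run can be imitated with the standard library's xml.etree.ElementTree. This is a sketch of the resulting structure only: wrap_in_block is a hypothetical function, the ops are plain ElementTree elements rather than tenFrame Op objects, and the table-as-parameter special case (where <base> stays inside the block) is not handled.

```python
import xml.etree.ElementTree as ET


def wrap_in_block(ops, blockname, params):
    """Move every op except <base> into <library><block>, then add a <call>."""
    macro = ET.Element("macro")
    library = ET.SubElement(macro, "library")
    block = ET.SubElement(library, "block", {"name": blockname, **params})
    for op in ops:
        if op.tag == "base":
            macro.append(op)   # <base> stays outside the block
        else:
            block.append(op)
    ET.SubElement(macro, "call", {"block": blockname, **params})
    return macro


ops = [
    ET.Element("base", {"table": "pub.demo.baseball.master"}),
    ET.Element("willbe", {"name": "hasname", "value": "(firstname) = ({qv(@fname)})"}),
]
macro = wrap_in_block(ops, "flagbyfirst", {"fname": "john"})
print(ET.tostring(macro, encoding="unicode"))
```

The default parameter values appear twice, once on the <block> definition and once on the <call>, which matches the first XML listing in this section.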

Table as Parameter#

Making the actual base table of a TenFrame a parameter is a common case, but it is exceptional, in that you don’t have the TenFrame object available at construction time to make it an argument of the constructor. In this case, you should use the Param() method of the that object:

fr = tf.TenFrame(session, tf.that.Param("table", "default.test.solar"))

Other special-case treatments apply to this usage as well, viz., that the <base> element will remain inside the block, and the parameter will not get extra quotes even though it is a string.

__bool__()#

Always returns True.

__contains__(key)#

Is the key among the names of the columns in this TenFrame?

__getitem__(columnName)#

Get a column, slice, or selection of a TenFrame.

Returns the TenSeries object corresponding to the named column in the query (if present). Note that accessing a column will always succeed, even if the column is not present in the TenFrame.

If running in online mode and the columns are actually known, can also index by column number.

Subscripting by a list of columns will return a TenFrame which is restricted to the named columns, the others being hidden.

Subscripting can also be used to select rows, as it is in pandas. Subscripting by a slice (between integers) will select the rows in the slice (string slices are not supported.) You can use a boolean TenSeries to select (or an expression that evaluates to one.)

If the subscript is a Callable (function), then it is added to a copy of this TenFrame as a server-side function and the function is used to select rows (as a boolean column), and the copy of the TenFrame is returned. The function should take a single parameter, which will be a pandas DataFrame containing one segment of the table, and it should return a boolean (or integer) numpy array that is the same length. See Server-Side Functions and Type Inference for more information.

__iter__()#

Iterate through column-names OR TenFrames of breaks.

If breaks are not set on this TenFrame (e.g., it isn’t from a groupby() expression), returns an iterator over the names of the columns in the underlying query, as strings.

If this TenFrame has its breaks attribute set (usually, this would be when it is created by groupby()), then returns an iterator which yields tuples of the form (b, fr), where b is a tuple of values for each group (always a tuple, even when there’s only one), and fr is a TenFrame of the rows in that group.

__setitem__(columnName, value)#

Add a new TenSeries to this TenFrame.

The TenSeries to be added must be part of a TenFrame which is “compatible” with this one, that is, one whose ops contain the ops of this frame as a prefix, and which doesn’t have any tabulations in the extension.
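
The compatibility rule can be pictured as a prefix check on the two op lists. The sketch below is a simplification with hypothetical names: ops are represented by their element names only (the real check presumably compares whole ops), and "tabulation" is assumed to mean a <tabu> op.

```python
def is_compatible(frame_ops, series_ops):
    """True if series_ops extends frame_ops without adding a tabulation."""
    n = len(frame_ops)
    if series_ops[:n] != frame_ops:
        return False  # the frame's ops are not a prefix of the series' ops
    extension = series_ops[n:]
    return all(op != "tabu" for op in extension)


base = ["base", "willbe"]
print(is_compatible(base, ["base", "willbe", "willbe"]))  # plain extension
print(is_compatible(base, ["base", "willbe", "tabu"]))    # tabulation in extension
print(is_compatible(base, ["base", "sel"]))               # diverging history
```

Intuitively, a tabulation changes the row structure of the table, so a column computed after one no longer lines up with the rows of the original frame.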

addOp(op)#

Append an op to the end of this TenFrame’s ops.

Alters the current TenFrame in place.

addcode(fun)#

Add <code> to the TenFrame.

This method is EXPERIMENTAL. Its interface and behavior may change, or it may be withdrawn in future versions!

Add a <code> element to the TenFrame’s ops, containing this function definition followed by an invocation of the form:

ops = ten.rebase(pd.DataFrame(function()))

Note that the function should return something that can become a pd.DataFrame (which includes DataFrames!) and should take no arguments.

The <code> object is added when the function is evaluated (on the client side), not when it is defined.

The function returns a modified copy of this TenFrame; it does not alter self.

addparam(param)#

Add a parameter to the TenFrame.

Adds or replaces the default value of a parameter of this TenFrame. You should not have to do this in general, as the Param() method on TenFrame takes care of this for you, but you might want to use it to replace a default value. See Blocks and Block Parameters for an explanation of block parameters.

Parameters:
  • name – Name of the parameter. Should be a string or a Param object. If the latter, the default value of the Param object is used if the value argument is not given.

  • value – Default value to give the parameter. Can be a string or a number. Default is “”.

agg(agg_info, newOp=None, breaks=None, *, columnparams=None, rollup=None, breakelts=None, **kwargs)#

Return a new TenFrame with an aggregating <tabu> op added.

Parameters:
  • agg_info – Details on what to aggregate and how (see below).

  • newOp – An operation to add to the tenFrame just before the tabulation.

  • breaks – If set, override the object’s “breaks” attribute.

  • columnparams (list(dict(str,str))) – (keyword-only) Meta-data information for the columns to be created by the aggregation.

  • rollup (dict(str,str)) – (keyword-only) A dictionary of key=value pairs to put into a <rollup> element inside the <tabu>.

  • breakelts – (keyword-only) A dictionary or list of dictionaries of key=value pairs to put into <break> element(s) inside the <tabu>.

  • adjoin (bool) – (keyword-only, passed in kwargs) Whether the columns created by this aggregation should be adjoined to the table (True) or replace its existing columns (False). Adjoining implies that the resulting table will have the same number of rows as this one, which otherwise might not be the case. Default False.

  • **kwargs – Additional key=value attributes to set on the <tabu> element.

agg_info should be:
  1. A function name, to be applied to all compatible columns, or

  2. A list of function names, to be applied to all compatible columns, or

  3. A dictionary of {column: funs} pairs, where column is the name of a column and funs is a function name or a list of function names.

If newOp (an Op object) is supplied, it is added to the new TenFrame’s ops before the tabulation is added. If breaks is specified in the function call, the value overrides the TenFrame’s breaks attribute.

The optional, keyword-only “columnparams” parameter allows you to set additional meta-data for the aggregation columns. The columnparams parameter should be a list of the form:

[ dict(source="sales", fun="avg", name="salesavg", format=...),
  dict(source="qty", fun="sum", name="totalquant", ...),
  ...]

Each dictionary must contain the “source” and “fun” keys in order to specify a particular column unambiguously (unless there is only one column to begin with). The tabulated columns will be reordered as given in the list; any columns not mentioned come at the end. All the key/value pairs in the dictionary are added to the <tcol> element for that column.
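The matching and reordering rule just described can be pictured with a small stand-alone sketch. It does not use tenFrame: the helper order_tcols and the (source, fun) tuple representation of tabulated columns are hypothetical, chosen only to illustrate that each entry is matched on its “source” and “fun” keys and that unmentioned columns end up last.

```python
def order_tcols(tabulated, columnparams):
    """Order (source, fun) pairs the way columnparams lists them.

    Hypothetical helper for illustration only; not part of tenFrame.
    """
    remaining = list(tabulated)
    ordered = []
    for params in columnparams:
        key = (params["source"], params["fun"])
        if key in remaining:
            remaining.remove(key)
            # Every key/value pair would become an attribute of the <tcol>.
            ordered.append((key, dict(params)))
    # Columns not mentioned in columnparams come last, with no extra attributes.
    ordered.extend((key, {}) for key in remaining)
    return ordered


tabulated = [("sales", "sum"), ("sales", "avg"), ("qty", "sum")]
columnparams = [
    dict(source="qty", fun="sum", name="totalquant"),
    dict(source="sales", fun="avg", name="salesavg"),
]
result = order_tcols(tabulated, columnparams)
names = [key for key, _ in result]
```

Here names lists the two mentioned columns first, in the order given, followed by the unmentioned ("sales", "sum") column.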

Normally, an aggregation will result in a table that only has the group column(s) and the result column (see below), and often has fewer rows than the original table. For example, frame.groupby("month").agg({"sales":"sum"}) would be expected to have only 12 rows. If the keyword-only parameter adjoin is set to True, then the existing columns in the table are not removed, nor are any rows, and the resulting table is the same length as the original table, containing all the columns previously there in addition to those computed by the tabulation. This functionality is not available in pandas, but it can be very useful. So you might say

>>> aggregated_frame = frame.groupby("month").agg({"sales": ["mean", "sum"]}, adjoin=True)

to make a frame like the original one, but having two more columns (sales_mean and sales_sum) containing the average and sum (respectively) of the sales in the same month as the data in each row.
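When requesting several aggregations of the same column, keep in mind that a Python dictionary literal cannot hold the same key twice: the later entry silently replaces the earlier one before agg() ever sees it. A plain-Python check (no tenFrame needed) shows why giving funs as a list is the safe spelling:

```python
# A repeated key in a dict literal keeps only the last value.
repeated = {"sales": "mean", "sales": "sum"}

# One key with a list of function names keeps both aggregations.
listed = {"sales": ["mean", "sum"]}
```

After this runs, repeated contains only the "sum" entry, while listed still names both functions.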

Any key=value pairs supplied as kwargs are added as attributes to the <tabu> op which is added. Possibilities for these include

cbreaks: a comma-separated list of columns for cross-tabulation.

aggregate(agg_info, newOp=None, breaks=None, *, columnparams=None, rollup=None, breakelts=None, **kwargs)#

Return a new TenFrame with an aggregating <tabu> op added.

Parameters:
  • agg_info – Details on what to aggregate and how (see below).

  • newOp – An operation to add to the tenFrame just before the tabulation.

  • breaks – If set, override the object’s “breaks” attribute.

  • columnparams (list(dict(str,str))) – (keyword-only) Meta-data information for the columns to be created by the aggregation.

  • rollup (dict(str,str)) – (keyword-only) A dictionary of key=value pairs to put into a <rollup> element inside the <tabu>.

  • breakelts – (keyword-only) A dictionary or list of dictionaries of key=value pairs to put into <break> element(s) inside the <tabu>.

  • adjoin (bool) – (keyword-only, passed in kwargs) Whether the columns created by this aggregation should be adjoined to the table (True) or replace its existing columns (False). Adjoining implies that the resulting table will have the same number of rows as this one, which otherwise might not be the case. Default False.

  • **kwargs – Additional key=value attributes to set on the <tabu> element.

agg_info should be:
  1. A function name, to be applied to all compatible columns, or

  2. A list of function names, to be applied to all compatible columns, or

  3. A dictionary of {column: funs} pairs, where column is the name of a column and funs is a function name or a list of function names.

If newOp (an Op object) is supplied, it is added to the new TenFrame’s ops before the tabulation is added. If breaks is specified in the function call, the value overrides the TenFrame’s breaks attribute.

The optional, keyword-only “columnparams” parameter allows you to set additional meta-data for the aggregation columns. The columnparams parameter should be a list of the form:

[ dict(source="sales", fun="avg", name="salesavg", format=...),
  dict(source="qty", fun="sum", name="totalquant", ...),
  ...]

Each dictionary must contain the “source” and “fun” keys in order to specify a particular column unambiguously (unless there is only one column to begin with). The tabulated columns will be reordered as given in the list; any columns not mentioned come at the end. All the key/value pairs in the dictionary are added to the <tcol> element for that column.

Normally, an aggregation will result in a table that only has the group column(s) and the result column (see below), and often has fewer rows than the original table. For example, frame.groupby("month").agg({"sales":"sum"}) would be expected to have only 12 rows. If the keyword-only parameter adjoin is set to True, then the existing columns in the table are not removed, nor are any rows, and the resulting table is the same length as the original table, containing all the columns previously there in addition to those computed by the tabulation. This functionality is not available in pandas, but it can be very useful. So you might say

>>> aggregated_frame = frame.groupby("month").agg({"sales": ["mean", "sum"]}, adjoin=True)

to make a frame like the original one, but having two more columns (sales_mean and sales_sum) containing the average and sum (respectively) of the sales in the same month as the data in each row.

Any key=value pairs supplied as kwargs are added as attributes to the <tabu> op which is added. Possibilities for these include

cbreaks: a comma-separated list of columns for cross-tabulation.

apply(func, axis=0, args=(), inplace=False, **kwargs)#

Apply a function to a TenFrame.

The function passed in is added to the query text and run on the accumulator, using the server-side Python feature, by calling the .apply() method of the table as a DataFrame. Please read the documentation on server-side Python to understand the features and limitations associated with it.

Parameters:
  • func (Callable) – Function to apply

  • axis – Axis to apply (passed to apply() method on server side)

  • args (tuple) – Other arguments to pass to the function (passed to apply() method on server side)

  • inplace (bool) – Modify this TenFrame, or return a modified copy (default)

Returns:

A copy of the TenFrame, unless inplace=True.

apply_columnparams(columnparams, tabop)#

Apply specified column info to tabulation operator.

Applies the metadata information specified by columnparams to the given tabulation operation’s <tcol> contents. This operation is DESTRUCTIVE, changing the ops in-place.

See agg() for more information.

assign(**kwargs)#

Assign new columns to a TenFrame.

Returns a new object with all original columns in addition to new ones. Equivalent to tf[k] = v for keyword/value pairs (k, v) in the kwargs. Assignments are done in order, as in Pandas.

Can be used with callables (functions) as the values, in which case the specified function is added to the query text, run on the server using the server-side Python feature, and passed the data as its argument, as a DataFrame. Please read the documentation on server-side Python to understand the features and limitations involved, and see Server-Side Functions and Type Inference for information on type annotation.

Return type:

TenFrame

basic_agg(fun, *args, breaks=None, **kwargs)#

Returns a TenFrame with an aggregating <tabu> op added.

This version only takes a single function and optionally columns. The keyword-only “breaks” parameter can be used to override the TenFrame’s breaks.

bestguesscols(exception=False)#

Returns self.cols if online, otherwise self.offlinecols.

Parameters:

exception (bool) – If True, then if the TenFrame is offline and offlinecols() returns no columns, raise a RuntimeError. Default False, meaning return the empty list [] in such a case.

c0colname(col1=None, op=None, col2=None, **kwargs)#

Compute a name for a new column.

Generates a new column name of the form “c0”, “c1”, “c2”, etc using the smallest number not already present as a column name in the TenFrame. The actual inputs are, at this time, ignored.
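The naming rule amounts to a search for the first unused “c<N>”. A minimal stand-alone sketch, assuming the existing column names are available as a list of strings (the helper next_c_name is hypothetical and is not the library's implementation):

```python
def next_c_name(existing):
    """Return 'c<N>' for the smallest N >= 0 not already used as a column name."""
    taken = set(existing)
    n = 0
    while f"c{n}" in taken:
        n += 1
    return f"c{n}"


first = next_c_name(["store", "sales"])
gap = next_c_name(["c0", "c1", "sales", "c3"])
```

With no “c” columns present the result is c0; when c0, c1 and c3 exist, the gap at c2 is filled first.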

call(blockname, **kwargs)#

Add a call op.

Call a <block> that was defined earlier, usually in a library imported with the imports= keyword to the constructor.

Parameters:
  • blockname (str) – A string, the name of the already-defined block to call.

  • **kwargs – key=value pairs for the parameters of the block.

Returns:

A TenFrame that is a copy of this one with a <call> operator appended to the end of its ops.

classmethod can_tabulate(fun, col)#

True if column col can be tabulated by function fun.

Numeric columns (types ‘i’, ‘j’, ‘f’) can be tabulated by all functions, but string columns (type ‘a’) can only be used with some functions (you can’t “average” a bunch of strings, but you can count them, for example.)

clear_library()#

Clear the TenFrame’s library.

Clear out the “library” of this TenFrame, which holds function definitions, etc.

cnt(*args, **kwargs)#

Perform a count on this TenFrame.

This is purely counting rows, unlike count(), which, like pandas count(), only counts non-null values. If a group= parameter (keyword-only) is supplied (or this TenFrame has breaks set), returns a TenFrame representing the count by breaks. Otherwise returns simply len(self).
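The distinction between the two kinds of counting can be seen on a tiny made-up table, here modelled as a list of dictionaries with None standing in for a missing value. This is only a local illustration of the semantics and does not call tenFrame:

```python
# Made-up rows; None marks a missing (NA) value.
rows = [
    {"store": "A", "sales": 10.0},
    {"store": "A", "sales": None},
    {"store": "B", "sales": 7.5},
]

# What cnt() reports: every row, regardless of content.
row_count = len(rows)

# What count() reports for the "sales" column: non-null values only.
non_null_sales = sum(1 for r in rows if r["sales"] is not None)
```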

colUsages(columnName)#

Find ops that depend on a column.

Parameters:

columnName (str) – The name of the column to search for.

Returns:

ops which appear in any way to depend on the named column. Used for checking if a column is or is not used.

Return type:

list(Op)

compatible(other)#

Is this frame “compatible” with another?

Currently, this means that at least one of them extends the other, AND that there are no tabulations or <code> elements in the extension part.
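As a rough sketch of this rule, suppose each frame's ops are reduced to a list of element names (a simplification made for illustration; real ops are Op objects, and the helper functions below are hypothetical). One frame extends another when the other's ops form a prefix of its own, and compatibility additionally inspects the extra ops:

```python
def extends(ops, other_ops):
    """True if other_ops is a (possibly improper) prefix of ops."""
    return ops[:len(other_ops)] == other_ops


def compatible(ops_a, ops_b, blocking=("tabu", "code")):
    """True if one list extends the other and the extension has no blocking op."""
    if extends(ops_a, ops_b):
        extension = ops_a[len(ops_b):]
    elif extends(ops_b, ops_a):
        extension = ops_b[len(ops_a):]
    else:
        return False
    return not any(op in blocking for op in extension)


base = ["base", "willbe"]
longer = base + ["willbe"]       # extends base with a computed column
tabulated = base + ["tabu"]      # extends base, but with a tabulation
diverged = ["base", "sel"]       # shares only the first op with base
```

In this model longer and base are compatible in either order, while tabulated and diverged are not compatible with base.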

concat(other, axis=0, **kwargs)#

Concatenate this TenFrame with another.

String together the data in this TenFrame and that in another, either appending the rows (axis=0) or the columns (axis=1). If concatenating rows, the two TenFrames should have the same columns.

The library of the other TenFrame is merged with the library of this TenFrame in the result.

Parameters:
  • other – The TenFrame (or name of a table in the 1010data object tree) to concatenate onto this table.

  • axis – Concatenate rows (0) or columns (1). Default 0.

Returns:

A TenFrame representing the concatenated frames.

Warning

When using axis=1 (columns), the other TenFrame must not be a “worksheet”, that is, a TenFrame with any ops apart from <base> and <meta>.

consolidate(inplace=False)#

Add an empty <merge/> operation to a TenFrame.

Returns a copy of this TenFrame (or this frame itself, modified, if inplace is True) with an empty “<merge/>” operation appended, which consolidates the current table to a single segment and loads it into memory in the accumulator.

copy()#

Returns a copy of this TenFrame.

The copy uses the same session and (shallow!) copies of self.breaks and self.ops, etc.

cumcnt(*args, **kwargs)#

Perform a cumulative count on this TenFrame.

Groups by the “group=” parameter (keyword-only), if given, otherwise uses the breaks of this TenFrame, if set.

deepcopy()#

A deep copy of this TenFrame.

Returns a copy whose ops are a “deep” copy of this TenFrame’s ops. Not currently used.

defFun(funtype, fun, name=None, args=None, types=None, *, forcelambda=False, knowncols=None, **kwargs)#

Internal function for implementing defUfun and defGfun.

Users should probably be using the @def_ufun or @def_gfun decorators. This is the internal implementation.

Parameters:
  • funtype (str) – Either ‘u’ or ‘g’.

  • fun – Either a python function or a string.

  • name (str) – The name to use for this function.

  • args (str) – What to put in the “args” attribute of the function.

  • types (str) – What to put in the “types” attribute of the function.

  • **kwargs – Any other key=value pairs for the function tag.

Adds library resources to the query to define a server-side python function and make it possible to call it. This method is used by defUfun() and defGfun(), and (via def_fun()) by the def_ufun() and def_gfun() decorators. Users should normally be calling this function through these routes.

If fun is a string, it gets wrapped in a <code> element (with CDATA quoting), which is wrapped in a <def_ufun> or <def_gfun> element, which is placed in this TenFrame’s <library>. In this case, the name, args, and type information are mandatory. Thus:

>>> frm.defFun('u',
... '''r = 3*x+y''', name="foo", args="x,y", types="f(f;f)")

will yield XML code like:

<library>
 <def_ufun name="foo" args="x,y" types="f(f;f)">
   <code language_="python"><![CDATA[
r = 3*x + y
]]></code></def_ufun></library>

If fun is a python function, it gets added to the library in TWO places:

<library>
  <resource for="python" name="Xabcdef"><![CDATA[
def foo(x:np.ndarray[np.float64], y:np.ndarray[np.float64])->np.ndarray[np.float64]:
    return np.array([xx+yy for xx,yy in zip(x,y)])
]]>
  </resource>
  <def_ufun name="foo" args="x,y" types="f(f;f)">
    <code language_="python">
      <![CDATA[
r = foo(x,y)
]]>
    </code>
  </def_ufun>
</library>

Once as just a resource, defining the function within Python, and once as a <def_ufun> (or <def_gfun>) defining it as a function in the macrocode sense (consisting of just calling the function). The name attribute of the python <resource> is mandatory but otherwise unused, so a random string is supplied for it.

If you pass in a function, tenFrame will attempt to derive the name and args from it, and the types too if it is type-annotated. If types are unspecified or unsupported, they will be set to ‘n’ (any type). Values passed explicitly through the name, args, and types parameters take precedence over inferred ones, though.

You can use kwargs to add other attributes to the <def_ufun>/<def_gfun> element.

Server-Side Functions and Type Inference#

A function defined in a query must specify the names and types of its arguments and its return type (in appropriate syntax); that’s what the args and types parameters of defFun are for. When passing in a Callable object, the names of the arguments will be determined automatically by inspection. In order to specify the types of the arguments, it is best to use Python type annotation for the parameters and the return-type of the function, otherwise the types will default to the most general “unspecified” type, n.

Note that server-side functions are always given numpy arrays (with the exception of dictionaries, see below) of values for each column, never scalars. The only types of parameters that are really available are numpy arrays of ints, 64-bit ints, floats, and strings, or lists of numpy arrays of these types (plus the special case of dictionaries). The only return types that are really available are numpy arrays of the above-mentioned types, except for g_funs, which may return single (scalar) values of the above types.

1010data tables and queries can also contain columns whose contents are “packages” or “models”, essentially dictionaries. These can also be parameters or outputs of user-defined functions, but they are passed slightly differently. Whereas an integer column is passed as a numpy array (of numpy integers), a dictionary column (type d) is passed as a list of dictionaries. Similarly, just as a list of integer columns is passed as a list of numpy arrays, a list of dictionary columns is passed as a list of lists of dictionaries.

This is the list of type annotations which tenFrame recognizes for type inference:

  • np.ndarray[np.int32] (integer column)

  • np.ndarray[np.int64] (64-bit integer column)

  • np.ndarray[np.float64] (floating-point column)

  • np.ndarray[str] (string column)

  • np.ndarray[bytes] (string column)

  • list[dict] (column of dictionaries/packages)

  • list[np.ndarray[np.int32]] (list of integer columns)

  • list[np.ndarray[np.int64]] (list of 64-bit integer columns)

  • list[np.ndarray[np.float64]] (list of floating-point columns)

  • list[np.ndarray[str]] (list of string columns)

  • list[np.ndarray[bytes]] (list of string columns)

  • list[list[dict]] (list of dictionary columns)

  • list (list of unspecified columns)

Scalar types are also recognized, since g_funs may return scalars:

  • np.int32 (integer)

  • np.int64 (64-bit integer)

  • np.float64 (floating-point number)

  • str (string)

  • bytes (string)

  • dict (dictionary)

These values must be typed exactly as shown here; don’t use variables set to these values, etc. You can also use string literals as the types (def func(x)->'np.ndarray[np.int32]':), and you may have to, since the list[] syntax is not valid in versions of python before 3.9. You can use the following string literals as well (for fewer keystrokes):

  • 'intCol' (integer column)

  • 'longCol' (64-bit integer column)

  • 'floatCol' (floating-point column)

  • 'stringCol' (string column)

  • 'dictCol' (dictionary column)

  • 'list[intCol]' (list of integer columns)

  • 'list[longCol]' (list of 64-bit integer columns)

  • 'list[floatCol]' (list of floating-point columns)

  • 'list[stringCol]' (list of string columns)

  • 'list[dictCol]' (list of dictionary columns)

Anything else will be taken as the unspecified type, represented by n in 1010data macro code.
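To make the inference step concrete, here is a minimal sketch of how names and types could be read from a function's annotations using only the standard inspect module. It is not tenFrame's implementation: the helper infer_signature is hypothetical, only the shorthand string literals are handled, and the mapping of shorthands to the type codes i, j, f, a and d is an assumption based on the codes mentioned elsewhere in this document. The types string follows the “f(f;f)” shape shown for defFun().

```python
import inspect

# Assumed mapping from shorthand annotations to 1010data type codes.
TYPE_CODES = {
    "intCol": "i",
    "longCol": "j",
    "floatCol": "f",
    "stringCol": "a",
    "dictCol": "d",
}


def infer_signature(func):
    """Return (name, args, types) inferred from func; unknown types become 'n'."""
    sig = inspect.signature(func)
    arg_names = list(sig.parameters)
    arg_codes = [TYPE_CODES.get(p.annotation, "n") for p in sig.parameters.values()]
    return_code = TYPE_CODES.get(sig.return_annotation, "n")
    types = f"{return_code}({';'.join(arg_codes)})"
    return func.__name__, ",".join(arg_names), types


def foo(x: "floatCol", y: "floatCol") -> "floatCol":
    return 3 * x + y


def bar(x, y: "stringCol"):
    # No annotation on x and no return annotation: both fall back to 'n'.
    return x


foo_info = infer_signature(foo)
bar_info = infer_signature(bar)
```

For foo this yields the name foo, the args string x,y and the types string f(f;f); for bar, the unannotated pieces come out as n.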

Note that the types of the parameters really only matter when defining a u_fun or a g_fun. The other situations that cause functions to be sent to the server-side are the assign() method and selecting rows by a callable, and in those cases there is always only one parameter expected, which will be a Pandas DataFrame.

defGfun(fun, name=None, args=None, types=None, **kwargs)#

Add a g_fun to this TenFrame’s library.

Calls defFun() with an initial argument of ‘g’.

defUfun(fun, name=None, args=None, types=None, **kwargs)#

Add a u_fun to this TenFrame’s library.

Calls defFun() with an initial argument of ‘u’.

def_fun(funtype, **kwargs)#

Internal function implementing def_ufun() and def_gfun().

Passes the function through defFun() and redefines it (on the local, client side) to return a TenSeries of a new column defined by the function.

See also

defFun()

def_gfun(*args, **kwargs)#

Decorator for defining g-functions on the server side.

See def_fun().

You can pass named parameters for defFun() like args and types as arguments to this decorator, if desired (but you can use the decorator without arguments, i.e. @def_gfun and @def_gfun() are both acceptable and work the same.)

See also

defFun()

Use this method as a decorator to define a server-side “g_fun”:

# User-defined g_funs must start with ``g_``.
@frame.def_gfun
def g_mysum(c):
   # c is a numpy array holding one "group" inside one segment
   # of the table.
   return sum(c)

frame['summed'] = frame.groupby("month")['sales'].g_mysum()

def_ufun(*args, **kwargs)#

Decorator for defining u-functions on the server side.

See def_fun().

Use this method as a decorator to define a server-side “u_fun”:

@frame.def_ufun
def sqr(c):
    # c is a numpy array holding one segment of a column
    return c * c

frame['squared'] = frame['length'].sqr()

You can pass named parameters for defFun() like args and types as arguments to this decorator, if desired (but you can use the decorator without arguments, i.e. @def_ufun and @def_ufun() are both acceptable and work the same.)

See also

defFun()

describe(include=None, exclude=None)#

Return a DataFrame with some key statistics.

Returns a Pandas DataFrame showing:

  1. count

  2. mean

  3. std

  4. min

  5. lquart

  6. median

  7. uquart

  8. max

for all numeric columns in the table (except those listed in “exclude”), like the pandas .describe() method.

If the “include” parameter is set to “all”, the following rows are added for all string columns (except those listed in “exclude”), after the “count” row:

  1. count of uniques (string columns only)

  2. mode (string columns only)

  3. frequency of mode (string columns only)

If Pandas is not installed, returns a dictionary whose keys are the names of affected (numeric) columns and whose values are lists of the statistics (listed above), in order, plus a “functions” entry with the names of the statistics.
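When only the dictionary form is available, it can be reshaped into per-column lookups with plain Python. The sketch below uses a made-up return value in the shape described above; the numbers are invented, and the assumption that the “functions” entry holds exactly the statistic names listed earlier is mine:

```python
# Made-up example of the dictionary shape; the values are not real data.
stats = {
    "functions": ["count", "mean", "std", "min", "lquart", "median", "uquart", "max"],
    "sales": [4, 25.0, 12.9, 10.0, 17.5, 25.0, 32.5, 40.0],
    "qty": [4, 2.5, 1.29, 1.0, 1.75, 2.5, 3.25, 4.0],
}

# Pair each statistic name with its value, column by column.
labels = stats["functions"]
by_column = {
    col: dict(zip(labels, values))
    for col, values in stats.items()
    if col != "functions"
}

median_sales = by_column["sales"]["median"]
```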

static dict2table(d)#

Make a table from dictionary or DataFrame.

Convert a dictionary of lists (or a Pandas DataFrame) into a tenFrame table Op representing the data as a table.

static dict2table_XML(d)#

Make a table from dictionary or DataFrame.

Convert a dictionary of lists (or a Pandas DataFrame) into a tenFrame table Op representing the data as a table.

dictslice(start=0, stop=200, step=None)#

Call the dictslice method on the underlying query.

dist_apply(dist_fun, accum_fun, cols, breaks=None, *, inplace=False, as_df=True)#

Server-side tabulation.

This form of server-side tabulation allows you to define a function (dist_fun) to be run on each sub-processor of the query, and another function that takes a list of whatever dist_fun returns and returns a Pandas DataFrame, and use these functions in a tabulation.

Note

Although you can ask for grouping of the input data using groupby() (or the breaks parameter), the server-side computation does not actually do any grouping for you. It only makes the group column(s) available to your server-side code. Actually handling the grouping (i.e., splitting and combining segments based on the groups) has to be done by your own code.

If the table’s segmentation is favorable to the grouping you intend to do, you might consider using the def_gfun() decorator method instead.

For example:

def dist_fun(x):
    # .... function code
    return rv

def accum_fun(y):
    # y is a list of whatever dist_fun returns
    # ... function code
    return pd.DataFrame(whatever)

r = frame.groupby("xx").dist_apply(dist_fun, accum_fun, ["col1", "col2"])

results in a TenFrame (r) with the following appended to the end of its ops:

        <block>
                <tabu>
                        <code language_="python"><![CDATA[def x3ai2j6t():
    def dist_fun(x):
        # .... function code
        return rv
    def accum_fun(y):
        # y is a list of whatever dist_fun returns
        # ... function code
        return pd.DataFrame(whatever)

    def run_with_tf(fun):
        def f(names):
            df = pd.DataFrame({_ : ten.get(_) for _ in names})
            return fun(df)
        return f
    cols = ['col1', 'col2']
    breaks = ['xx']
    subproc_result = ten.dist_apply(breaks + cols, run_with_tf(dist_fun), breaks + cols)
    df = accum_fun(subproc_result)
    tabu = ten.make_table(df)
    return tabu
tabu = x3ai2j6t()
]]></code>
                </tabu>
        </block>

Please refer to 1010data documentation for further information. Make sure you understand table segmentation when writing these functions.
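Because the grouping has to be handled by your own code, it can help to prototype the two functions locally before sending them to the server. The following stand-alone sketch simulates the pattern with made-up data: each segment is a dictionary of plain lists (standing in for the dictionary of numpy arrays that dist_fun receives when as_df is False), and accum_fun returns a dictionary of columns where a real one must return a Pandas DataFrame. The column names month and sales are hypothetical.

```python
from collections import defaultdict


def dist_fun(segment):
    """Partial sums of 'sales' per 'month' within one segment."""
    partial = defaultdict(float)
    for month, sales in zip(segment["month"], segment["sales"]):
        partial[month] += sales
    return dict(partial)


def accum_fun(partials):
    """Merge the per-segment partial sums into one column-oriented result."""
    total = defaultdict(float)
    for partial in partials:
        for month, subtotal in partial.items():
            total[month] += subtotal
    months = sorted(total)
    # A real accum_fun would wrap this in pd.DataFrame(...) before returning.
    return {"month": months, "sales_sum": [total[m] for m in months]}


# Two made-up segments; month 1 is split across both of them.
segments = [
    {"month": [1, 1, 2], "sales": [10.0, 5.0, 7.0]},
    {"month": [1, 3], "sales": [2.5, 4.0]},
]
result = accum_fun([dist_fun(seg) for seg in segments])
```

Month 1 appears in both segments, so neither dist_fun call sees its full total; only the merge in accum_fun produces the correct per-group sums. This is the segmentation issue the note above warns about.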

Parameters:
  • dist_fun (Callable) – A function to be run on each segment of the table. Should take a single parameter, which will be a Pandas DataFrame containing the data (for the columns listed in the cols parameter) for this segment. If as_df is False, the parameter will instead be a dictionary of numpy arrays. (This may be desirable for performance reasons in some cases.)

  • accum_fun (Callable) – A function run on the accumulator to assemble the results from the dist_fun executions. Must take a single parameter, which will be a list of whatever dist_fun returns. Must return a Pandas DataFrame.

  • cols (list(str)) – A list of names specifying which columns will be made available to each instance of dist_fun. This list is accessible within the dist_fun and the accum_fun as a nonlocal variable cols.

  • breaks (list(str)) – A list of names specifying the columns to use to form the groups for this tabulation. Note that this is only meaningful if the code in dist_fun and accum_fun use it; the system will not do any grouping automatically. If not passed, defaults to the grouping of the frame, as set by groupby(). This list is accessible within the dist_fun and the accum_fun as the nonlocal variable breaks.

  • inplace (bool) – (Keyword-only) Modify this frame (and return it) or return a copy? Default False.

  • as_df (bool) – (Keyword-only) If True, the dist_fun is passed its data as a Pandas DataFrame. If False, the data is presented as a dictionary of numpy arrays. Default True.

Returns:

A copy of this TenFrame (or this TenFrame itself, depending on the value of inplace) with a <block> op containing a <tabu> op as described above appended to the end of its ops.

classmethod distribute(fun)#

Define server-side tabulation function (Decorator).

Makes a function which adds the code from fun to a TenFrame’s library, wrapped in a <defop> operator. The operator takes two parameters, columns and breaks; each is split back into a list of names and placed in the variables “cols” and “breaks”, and a line calling the function is appended.

The function being decorated should take no parameters and must return a Pandas DataFrame.

The function that is returned by this decorator takes three parameters: a TenFrame, a list of column names, and a list of breaks. If the column names are not supplied or are None, defaults to all the columns in the table (note that this entails running the query to get the columns.) If the breaks are not supplied or are None, defaults to the breaks on the frame, if any (You should supply them explicitly, though: using frame.groupby() will likely not work the way you want currently.) The <defop> is added to the TenFrame’s library if it is not already there and an invocation of the op is appended to the TenFrame’s operations. Both of these happen when the function is invoked.

The function returns an altered copy of the TenFrame, unless it is invoked with inplace=True.

Example:

>>> @tenFrame.distribute
... def mat_mul():       # Parameters would be ignored!
...     def subproc_function(cols):
...         mat = np.stack([ten.get(z) for z in cols], axis=1)
...         return mat.T.dot(mat)
...     list_of_grams = ten.dist_apply(cols, subproc_function, cols)
...     gram = np.sum(list_of_grams, axis=0)
...     df = pd.DataFrame(gram, columns=['m'+str(i) for i in range(gram.shape[1])])
...     return df
...
>>> ## mat_mul's signature is
>>> ## mat_mul(frame, cols=None, breaks=None, inplace=False)
>>> frame = mat_mul(frame, ["col1", "col2", "col3"])

results in frame having for its XML:

<library>
        <defop name="mat_mul" columns="" breaks="">
                <tabu><![CDATA[cols = '~{@columns}~'.split(',')
breaks = '~{@breaks}~'.split(',')
def mat_mul():
    def subproc_function(cols):
        mat = np.stack([ten.get(z) for z in cols], axis=1)
        return mat.T.dot(mat)
    list_of_grams = ten.dist_apply(cols, subproc_function, cols)
    gram = np.sum(list_of_grams, axis=0)
    df = pd.DataFrame(gram, columns=['m' + str(i) for i in range(gram.shape[1])])
    return df

tabu = mat_mul()
]]></tabu>
        </defop>
</library>
<base table="....."/>
<!-- ... other ops of the frame ... -->
<mat_mul columns="col1,col2,col3" breaks=""/>

For other ways to do distributed aggregation, see the dist_apply() method or the def_gfun() decorator method.

drop(rows, *, inplace=False)#

Drop rows from the TenFrame.

Adds a selection operator to the TenFrame to omit the rows specified. Negative row-indices are not supported.

Warning

When dropping multiple rows, it is very advantageous to put them in a list and drop them all at once, instead of dropping them one (or a few) at a time. Each call to drop() results in one selection operator, so combining them can simplify the query greatly.

Parameters:
  • rows (list(int)) – A list of row-numbers to drop from the TenFrame.

  • inplace (bool) – (keyword-only) Operate in-place on this TenFrame or return a modified copy (default).

Returns:

A copy of this TenFrame or this TenFrame itself, depending on the value of inplace.

dropColumns(columns, inplace=False)#

Remove columns from the TenFrame.

Given a column or column name (or list of column names), remove the named column from the TenFrame, returning a copy with the deletion made, or this TenFrame itself (altered), depending on the value of inplace.

This method will only affect user-defined (<willbe>) columns, usually those assigned as frame['newcol'] = expr. It does not affect columns that come from the base table, are the result of tabulations, come from merges, etc. Also, a column which other columns use in their calculations will not be deleted.

If the column is found, but cannot be deleted because other columns depend on it, raises a PermissionError.

If the column is not found or is not a top-level <willbe> column, raises a KeyError.

If any column in the list raises an error, no changes are made to the TenFrame, even when inplace=True.

Parameters:
  • columns – A column, column name, or list.

  • inplace (bool) – Operate on this TenFrame or on a copy (default False)

Returns:

A copy of this TenFrame or this TenFrame itself, altered, depending on the value of inplace.

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)#

Remove rows (or columns) with NA values.

Strips out rows with missing (NA) values, according to the parameters given. Returns an appropriately modified copy of this TenFrame unless inplace is True, in which case it modifies this TenFrame in-place and returns it.

Parameters:
  • axis – Drop rows (0, default) or drop columns (1). NOTE: currently only axis=0 is supported!

  • how (str) – “any” (default) to drop row (column) if any value therein is NA; “all” to drop only if all values are.

  • thresh (int) – Number of NA values required to drop a row/column (default: 1 for how=”any”; not used for how=”all”).

  • subset (list(str)) – Only look at these columns when counting NA values in rows. (Not used for dropping columns.) Defaults to all columns.

  • inplace (bool) – Modify this TenFrame (True) or return a modified copy (False, default.)

Returns:

A TenFrame, either a copy or the same one, depending on the value of inplace.
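
As an illustration of how the how, thresh, and subset parameters interact, here is a plain-Python model of the row-dropping rule as this page describes it. It is only a sketch: rows are modeled as dictionaries with None standing for NA, and `keep_row` is a hypothetical helper, not part of tenFrame.

```python
def keep_row(row, how="any", thresh=None, subset=None):
    """Decide whether a row survives, following the parameter text above."""
    cols = subset if subset is not None else list(row)
    na_count = sum(1 for c in cols if row[c] is None)
    if how == "all":
        # Drop only when every inspected value is NA; thresh is not used.
        return na_count < len(cols)
    # how == "any": drop once the NA count reaches thresh (default 1).
    limit = 1 if thresh is None else thresh
    return na_count < limit


rows = [
    {"a": 1, "b": None},
    {"a": None, "b": None},
    {"a": 1, "b": 2},
]
kept_any = [keep_row(r) for r in rows]
kept_all = [keep_row(r, how="all") for r in rows]
kept_subset = [keep_row(r, subset=["a"]) for r in rows]
```

Note that thresh here counts NA values needed to drop a row, as the parameter description says, which is not the same convention as the thresh parameter in pandas.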

equals(other)#

Check whether the other TenFrame’s list of ops equals that of this one.

eval(expr, **kwargs)#

Create a column with a given expression.

You can use the .eval() method to create a new TenSeries with a value given by an expression string. Note, though, that the expression will be evaluated with the syntax of 1010data macro code and not via python. Please see the 1010data documentation for information about 1010data expression syntax. The most salient differences are the use of <> for inequality instead of !=, and the use of = as an equality test and not as assignment.

You cannot assign a new column within an eval statement like you can with pandas, and there is no inplace= parameter. You can instead assign it back into the TenFrame externally, for example:

>>> frame['area'] = frame.eval("width * height")

extends(other)#

Does this TenFrame “extend” another?

Parameters:

other (TenFrame) – The other TenFrame.

Returns:

True if other is a TenFrame whose ops equal a prefix (possibly improper) of this TenFrame’s ops.

Return type:

bool
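
The prefix test can be pictured with ordinary lists. This is a minimal sketch in which ops are represented by strings; the function name `ops_extend` and the sample ops are illustrative assumptions, not tenFrame internals.

```python
def ops_extend(ops, other_ops):
    """True if `other_ops` equals a (possibly improper) prefix of `ops`."""
    return ops[:len(other_ops)] == other_ops


base = ["base", "willbe area"]
longer = base + ["sel area>10"]
```

An improper prefix means the two lists may be identical, so a frame also "extends" an exact copy of itself.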

extractXML(nolibrary=False, nocols=False)#

Get the XML for the query represented by this TenFrame.

Parameters:
  • nolibrary (bool) – If True, do not include the <library> of definitions from the query. Default False.

  • nocols (bool) – If True, do not include the <cols> element at the end with column formatting information, etc. Default False.

Returns:

the XML for the query represented by this TenFrame.

Return type:

str

fillna(value=None, *, cols=None, inplace=False)#

Replace NA values with a given value.

Returns a TenFrame (a copy of this one or this one itself, depending on the inplace parameter) in which all the NA values in the columns listed in cols (or all of them, if cols is not specified) are replaced by the value value.

Note that at this time, the value must be a simple scalar value or a TenSeries.

Parameters:
  • value – The value with which to replace the NA values. Must be a constant or a TenSeries (not a function.)

  • cols – (Keyword-only) Only do the replacement in the named columns. Defaults to all columns.

  • inplace (bool) – If False (default), return a copy of this TenFrame with the changes added. If True, modify this TenFrame in-place and return it.

Returns:

A modified copy of this TenFrame or this TenFrame itself, modified, depending on inplace.

filter(items=None, like=None, regex=None, index=None)#

Subset the TenFrame columns according to specified labels.

This method filters the columns (not the rows!) of the TenFrame based on their names (not their contents!). Exactly one of items, like, and regex must be specified or an exception will be raised.

Note that unlike pandas DataFrames, a TenFrame may be subscripted with a “glob” for column names, so this method may not be as necessary in tenFrame. So frame[["birth*"]] will select columns birthyear, birthmonth, and birthday, for example.

Parameters:
  • items (list(str)) – A list of column names (in the order desired) to keep. Equivalent to self[items].

  • like (str) – Keep columns whose names contain the string like. Equivalent to self[['*' + like + '*']] (see above.)

  • regex (str) –

    Keep columns for which re.search(regex, name) is True.

    Note

    Unlike the other two parameters, using regex forces the TenFrame to run its query, in order to find all the column names so it can apply the search to them.

  • index – IGNORED. Filtering is always done on column names.

Returns:

A TenFrame with a <colord> op appended to select only the specified columns.
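
The three selection modes can be modeled on a plain list of column names. This sketch assumes nothing about tenFrame internals: `filter_names` is a hypothetical helper, the ValueError is a placeholder for whatever exception the library actually raises, and the items branch performs no existence check.

```python
import fnmatch
import re


def filter_names(names, items=None, like=None, regex=None):
    """Select column names by exactly one of items, like, or regex."""
    given = [p for p in (items, like, regex) if p is not None]
    if len(given) != 1:
        raise ValueError("specify exactly one of items, like, regex")
    if items is not None:
        return list(items)  # keeps the caller's order
    if like is not None:
        pattern = "*" + like + "*"  # the glob equivalent mentioned above
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    return [n for n in names if re.search(regex, n)]


names = ["birthyear", "birthmonth", "birthday", "name"]
by_like = filter_names(names, like="birth")
by_regex = filter_names(names, regex=r"day$")
```

Only the regex branch needs the full list of names, which is consistent with the note that regex forces the query to run.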

first1(*args, **kwargs)#

Return the first row of the TenFrame, or of each group.

Groups by the “group=” parameter (keyword-only), if given, otherwise uses the breaks of this TenFrame, if set. Performs a tabulation for the first row; returns a TenFrame.

get_dependencies()#

Add a <dependencies/> operation to this TenFrame.

Returns a copy of this TenFrame with an empty “<dependencies/>” operation appended. When run, this yields a table showing all the base tables this TenFrame depends on, and how.

groupby(args, *, inplace=None, cbreaks=None)#

Set breaks and return a wrapper object.

Returns an object of the GroupByFrameWrapper class which has its “breaks” (grouping) attribute set to the given name or list of names. The object is a wrapper around this TenFrame, and behaves exactly like it except for the value of its “breaks” (and “cbreaks”) attributes.

Parameters:
  • args – Column name or list of column names to do the grouping on.

  • inplace (bool) – (keyword-only) If True, set the “breaks” attribute on this TenFrame directly and do not create a wrapper.

  • cbreaks – (keyword-only) Column name or list of column names for the “cbreaks” attribute, used in 1010data cross-tabulations.

Returns:

A wrapper object, or this TenFrame, depending on the value of inplace.

Note

TenFrame was originally written without the concept of “GroupBy” as a class of its own, but rather just setting the groups in an instance variable of the TenFrame. So GroupBy-relevant methods and information are to be found in the TenFrame class, and GroupBy objects will mostly act like normal TenFrames (with the “breaks” instance variable set) in all respects, unlike in Pandas.

growOps(other)#

Replace the ops of this frame with those of an extension.

If the other frame’s ops are longer than this one’s, extend this frame’s ops by whatever is added in the other frame, EXCEPT for colord ops (but colord ops with hide= are included).

The other frame’s block parameters (if any) are also merged into this one’s.

The other frame is presumed to be compatible with this (checked already).
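
A rough model of the rule for which ops are carried over, under stated assumptions: ops are represented as dictionaries with hypothetical "tag" and "attrs" keys, the table name is invented, and block parameters are left out entirely.

```python
def grow_ops(ops, other_ops):
    """Return `ops` extended by the extra ops at the end of `other_ops`."""
    if len(other_ops) <= len(ops):
        return list(ops)
    added = other_ops[len(ops):]
    # Skip plain colord ops, but keep colord ops that carry hide=.
    kept = [
        op for op in added
        if op["tag"] != "colord" or "hide" in op["attrs"]
    ]
    return list(ops) + kept


base = [{"tag": "base", "attrs": {"table": "pub.demo"}}]
longer = base + [
    {"tag": "willbe", "attrs": {"name": "area", "value": "width*height"}},
    {"tag": "colord", "attrs": {"cols": "area"}},
    {"tag": "colord", "attrs": {"hide": "width"}},
]
grown = grow_ops(base, longer)
```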

head(n=20)#

Equivalent to self.slice(0, n)

info(buf=None, show_counts=False, *, outtype=None)#

Print a summary of a TenFrame.

Prints a table showing the columns and their types, optionally showing the count of non-null entries in each column.

Parameters:
  • buf – Print the table to the given output buffer. Defaults to sys.stdout.

  • show_counts (bool) – Whether or not to show the count of non-null entries per column. Unlike pandas, defaults to False, so as to avoid running the query more than necessary unless requested.

  • outtype (str) – Keyword-only, mostly for development. “dict” means to return the information as a dictionary of lists, “str” means to return a string instead of printing it.

Returns:

None, unless outtype is set.

isna()#

Return a boolean (0/1) TenFrame indicating where values are NA.

Returns a boolean TenFrame (i.e. integers 0 and 1; TenFrames don’t use boolean types natively) which is 1 just where the corresponding value in this TenFrame is NA.

isnull()#

Return a boolean (0/1) TenFrame indicating where values are NA.

Returns a boolean TenFrame (i.e. integers 0 and 1; TenFrames don’t use boolean types natively) which is 1 just where the corresponding value in this TenFrame is NA.

last1(*args, **kwargs)#

Return the last row of the TenFrame, or of each group.

Groups by the “group=” parameter (keyword-only), if given, otherwise uses the breaks of this TenFrame, if set. Performs a tabulation for the last row; returns a TenFrame.

libraryXML()#

Returns the XML of the library of this TenFrame.

localpipe(func, *args, **kwargs)#

Apply chainable functions.

Parameters:
  • func – Function to apply to the TenFrame. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple, where data_keyword is a string indicating the keyword of callable that expects the TenFrame.

  • *args – Positional arguments passed into func.

  • **kwargs – Keyword arguments passed into func.

Returns:

The return value of func.
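
The two accepted forms of func can be illustrated with a small stand-alone function. This sketch follows the convention used by pandas' pipe(); `local_pipe` and `describe` are hypothetical names, and a list stands in for the TenFrame.

```python
def local_pipe(obj, func, *args, **kwargs):
    """Call func with obj as first argument, or as a named keyword."""
    if isinstance(func, tuple):
        target, data_keyword = func
        if data_keyword in kwargs:
            raise ValueError(f"{data_keyword} is both the data keyword and a kwarg")
        kwargs[data_keyword] = obj
        return target(*args, **kwargs)
    return func(obj, *args, **kwargs)


def describe(prefix, data=None):
    return f"{prefix}: {data}"


plain = local_pipe([3, 1, 2], sorted, reverse=True)
keyword = local_pipe([1, 2], (describe, "data"), "rows")
```

The tuple form is useful when the function being chained does not take the data as its first positional argument.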

melt(id_vars=None, value_vars=None, var_name='variable', val_name='value', *, out_value_vars=None, onlyvals=False)#

Unpivot a TenFrame from wide to long format.

Attempts to work like the pandas melt() method. Beware of some differences in 1010data. In particular, if your “value_vars” aren’t all of the same type, the “value” column may be missing altogether!

Parameters:
  • id_vars – Columns to use as identifier variables. Default None.

  • value_vars (list(str)) – Columns to unpivot. If not set, attempts to use all columns that are not id_vars, but this is liable to cause problems, since often not all columns are of the same type (see above). Please always specify this parameter explicitly.

  • var_name (str) – Name for the variable column. Default "variable"

  • val_name (str) – Name for the value column. Default "value"

  • out_value_vars (list(str)) – (keyword-only) List of names to use for the value_vars, in order. Must be the same length as value_vars. Defaults to the same as value_vars.

  • onlyvals (bool) – (keyword-only) If True, only hide the columns of the value_vars. By default, hides all columns except the id_vars and the “variable” and “value” columns.

Returns:

“Melted” TenFrame

merge(right, on, right_on=None, how='left', cols=None, **kwargs)#

Merge this TenFrame with another (or a table)

Performs a 1010data “<link>” operation, and returns a new TenFrame with the link operation appended. The “right” parameter can be another TenFrame, or just the name of a table (as a string.)

The “how” parameter can be “left” or “inner”; other pandas options are not supported. “left” corresponds to type="exact" in the 1010data <link> operation; “inner” to type="select".

You can also use 1010data-specific values for how: "include", "exclude", or "asof". See the 1010data documentation for the link op for information.

The library of the TenFrame being linked in is appended to the library of this TenFrame, since libraries cannot be inside the <link> operator.

Parameters:
  • right – The TenFrame (or table name) to be merged with.

  • on (str) – The name of the column in this TenFrame to be matched to one in right.

  • right_on (str) – The name of the column in right to be matched with the column specified by on in this TenFrame. If not specified, taken to be the same name as on.

  • how (str) – Type of merge. Must be "left", "inner", "include", "exclude", or "asof". Defaults to “left”.

  • cols – Name or list of names of the columns of the foreign table to be adjoined to this table in the result.

  • **kwargs – Other key=value pairs will be added to the <link> operator.

Returns:

A copy of this TenFrame, with the <link> operator appended.
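
The correspondence between how and the <link> type can be written as a small lookup. This is a sketch of the mapping stated above only; the function name `link_type` is hypothetical, and passing the 1010data-specific values through unchanged is an assumption.

```python
# pandas-style names and the <link> types this page says they map to
PANDAS_TO_LINK_TYPE = {"left": "exact", "inner": "select"}
# 1010data-specific values, assumed here to be used as the type as given
NATIVE_LINK_TYPES = {"include", "exclude", "asof"}


def link_type(how="left"):
    """Translate a merge `how` value into a <link> type string."""
    if how in PANDAS_TO_LINK_TYPE:
        return PANDAS_TO_LINK_TYPE[how]
    if how in NATIVE_LINK_TYPES:
        return how
    raise ValueError(f"unsupported how: {how!r}")
```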

meta(*params)#

Add a <meta> operator at the start of the query.

Inserts a meta op at the start of the query. This is a 1010data operator for setting some server-side parameters for this query.

fr.meta("alloc_rr=10", "empty", "condense") will result in:

<meta>alloc_rr=10,empty,condense</meta>

placed at the top of the query.

This method alters the current query, and does not return a value.
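
The way the arguments are combined can be reproduced with a one-line helper. This is a stand-alone sketch (`meta_xml` is a hypothetical name) that only rebuilds the text shown in the example above.

```python
def meta_xml(*params):
    """Join the parameters with commas inside a <meta> element."""
    return "<meta>" + ",".join(params) + "</meta>"


text = meta_xml("alloc_rr=10", "empty", "condense")
```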

classmethod newcolname(col1=None, op=None, col2=None, **kwargs)#

Compute a name for a new column.

Takes two “base” column names and the operator being used to join them.

Actually does nothing so elaborate as that. Just makes up a random name using the randcolname() method. The actual inputs are, at this time, ignored.

notna()#

Return a boolean (0/1) TenFrame indicating where values are not NA.

Opposite of isnull().

notnull()#

Return a boolean (0/1) TenFrame indicating where values are not NA.

Opposite of isnull().

numsegs()#

How many segments comprise this table?

Returns an integer, the number of segments that are used to store the table, segmented according to the table’s segmentation at the time it was saved. See 1010data documentation for more information on segmentation. Small tables frequently have only one segment, which means that g_functions can be used on them with any grouping.

online()#

Check if TenFrame is in online mode.

Returns True if this TenFrame is in “online mode,” i.e. with a real Session and capable of running queries at need.

open_table(tablePath)#

Initialize this TenFrame to open the given table.

Parameters:

tablePath (str) – The full table name.

Clears out this TenFrame’s list of ops, replacing them all with the single op <base table="tablePath"/>

pipe(func, *args, **kwargs)#

Apply a function to a TenFrame.

The function passed in is added to the query text and run with the .pipe() method on the accumulator on the table as a DataFrame using the server-side Python feature. Please read the documentation on this server-side python to understand the features and limitations associated with it.

pivot_table(values, index, column, aggfunc='mean', *, rename=True)#

Make a pivot table TenFrame

Compute a pivot table (cross-tabulation) using the given values, index, and columns. In 1010data terms,

>>> frame.pivot_table('col1', 'col2', 'col3', 'sum')

is equivalent to

>>> frame.groupby("col2").agg({"col1":"sum"}, cbreaks="col3")

and maps to the XML code:

<tabu breaks="col2" cbreaks="col3">
    <tcol fun="sum" name="col1_sum" source="col1"/>
</tabu>

Parameters:
  • values – The column to aggregate.

  • index – The column to group by.

  • column – The column whose values to use to define each new column.

  • aggfunc (str) – The aggregating function; a function name such as you would use in agg().

  • rename (bool) – (keyword-only) Whether or not to try to rename the columns (like in pandas) to the values they reflect. Otherwise you’ll get columns named things like “m0” “m1” “m2” etc. Default True.

Returns:

A TenFrame representing the pivot table.
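
To make the mapping from arguments to XML concrete, the following sketch rebuilds the <tabu> element from the example using only the standard library. It is not how tenFrame generates its XML; `pivot_tabu` is a hypothetical helper, and the `<values>_<aggfunc>` naming is taken from the single example above rather than from a documented rule.

```python
import xml.etree.ElementTree as ET


def pivot_tabu(values, index, column, aggfunc):
    """Build a <tabu> element shaped like the pivot_table example."""
    tabu = ET.Element("tabu", {"breaks": index, "cbreaks": column})
    ET.SubElement(
        tabu,
        "tcol",
        {"fun": aggfunc, "name": f"{values}_{aggfunc}", "source": values},
    )
    return ET.tostring(tabu, encoding="unicode")


xml_text = pivot_tabu("col1", "col2", "col3", "sum")
```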

prefiltered(expr, **kwargs)#

Get a pre-filtered copy.

Parameters:
  • expr (str) – A string, the expression for the <sel> operator.

  • **kwargs – Any other key=value pairs for the <sel>.

Returns:

a TenFrame that is a copy of this one, with a <sel> operator inserted at the beginning of its ops, just after the <base> (or <table>).

prettyXML(nolibrary=False, escaped=False, nocols=True)#

Get ‘pretty’ XML for this TenFrame’s query.

Parameters:
  • nolibrary (bool) – If True, do not include the <library> of definitions from the query. Default False.

  • escaped (bool) – If True, return standards-compliant XML, with all necessary characters escaped. Default False.

  • nocols (bool) – If True, do not include the <cols> element at the end with column formatting information, etc. Default True.

Returns:

‘pretty’ XML for this TenFrame’s query.

Return type:

str

If the “escaped” parameter is False (the default), the XML text is converted to a more readable version, with <, >, and & characters in the attribute values not changed into escaped versions (&lt;, &gt;, &amp;). 1010data accepts this slightly looser dialect of XML.
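
The difference between the two dialects can be seen with the standard library alone. In this sketch the expression and the <sel> element are made-up examples; the only library call is `xml.sax.saxutils.escape`, which replaces &, < and >.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

expr = "price<10 & qty>0"          # hypothetical selection expression
escaped = escape(expr)

loose = f'<sel value="{expr}"/>'       # readable form, not strict XML
strict = f'<sel value="{escaped}"/>'   # standards-compliant form

# A strict XML parser recovers the original expression from the escaped form.
round_trip = ET.fromstring(strict).get("value")
```

A standards-compliant parser rejects the loose form, so escaped=True is the safer choice when the XML is destined for a tool other than 1010data.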

prettyprintXML(nolibrary=False, escaped=False, nocols=True)#

Print ‘pretty’ XML for this TenFrame’s query.

Convenience function: just prints out the value of prettyXML().

Parameters:
  • nolibrary (bool) – If True, do not include the <library> of definitions from the query.

  • escaped (bool) – If True, print standards-compliant XML, with all necessary characters escaped.

  • nocols (bool) – If True, do not include the <cols> element at the end with column formatting information, etc.

If the “escaped” parameter is False (the default), the XML text is converted to a more readable version, with <, >, and & characters in the attribute values not changed into escaped versions (&lt;, &gt;, &amp;). 1010data accepts this slightly looser dialect of XML.

printprettyXML(nolibrary=False, escaped=False, nocols=True)#

Print ‘pretty’ XML for this TenFrame’s query.

Convenience function: just prints out the value of prettyXML().

Parameters:
  • nolibrary (bool) – If True, do not include the <library> of definitions from the query.

  • escaped (bool) – If True, print standards-compliant XML, with all necessary characters escaped.

  • nocols (bool) – If True, do not include the <cols> element at the end with column formatting information, etc.

If the “escaped” parameter is False (the default), the XML text is converted to a more readable version, with <, >, and & characters in the attribute values not changed into escaped versions (&lt;, &gt;, &amp;). 1010data accepts this slightly looser dialect of XML.

query(expr, inplace=False, **kwargs)#

Select rows from the table.

Adds a <sel> op to the query with the given expression as its value.

Note that the syntax of the expression needs to be 1010data expression syntax (see the 1010data documentation for details) which is very similar to python syntax but not identical. The most common differences to be aware of are the use of = for equality comparison (instead of ==) and the use of <> for not-equals (instead of !=).

Parameters:
  • expr (str) – A string, the expression for the <sel> operator.

  • inplace (bool) – Return a copy of this TenFrame (False, default) or change this TenFrame in place and return it (True).

  • **kwargs – Any other key=value pairs for the <sel>.

Returns:

a copy of this TenFrame or this TenFrame itself, depending on the value of inplace, with a <sel> operator appended to the end of its ops.

static randcolname(n=7)#

Generate a random string.

The string will be of length n, and made of digits and lowercase letters. An ‘x’ will be prepended to it.

Used for making temporary names for new columns.
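
A name of this shape can be generated as follows. This is a minimal sketch of the description above, not the library's source; `rand_col_name` is a hypothetical name.

```python
import random
import string


def rand_col_name(n=7):
    """Return 'x' followed by n random digits and lowercase letters."""
    alphabet = string.digits + string.ascii_lowercase
    return "x" + "".join(random.choices(alphabet, k=n))


name = rand_col_name()
```

The leading letter guarantees the result never starts with a digit, which matters for identifiers.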

refreshquery()#

(Re)build the query represented by this TenFrame from its ops.

renameColumn(oldname, newname)#

DESTRUCTIVELY rename a column.

Alter this TenFrame, replacing oldname with newname, wherever it appears in the attributes of the ops, descending recursively into the contents of the ops. This could be dangerous if you have a column name that’s also a function or something common, but is used (internally) only for random-named temporary columns.

It is not an error to attempt to rename a column that doesn’t exist.

replace(to_replace, value, inplace=False, *, cols=None, negate=False)#

Replace values where the condition is True.

Where the to_replace condition is False, keep the original value; where it is True, replace with value.

This replacement happens in all the columns of the table. The 1010data server will raise an error if a column’s underlying type is changed as a result of the replacement. Make sure all the columns are of the same type as the replacement, or you might prefer to run the TenSeries.replace() method on an individual column instead.

Parameters:
  • to_replace – Condition to determine whether to replace. Currently, this condition must be a TenSeries or a constant, i.e., it cannot be a function.

  • value – The value to replace with, when the condition is True. Must be a TenSeries or a constant (i.e., not a function).

  • inplace (bool) – If True, modify this TenFrame and return it. If False (default), return a copy of this TenFrame with the changes added.

  • cols – (Keyword-only) Apply the replacement only to the named columns. Defaults to all columns.

  • negate (bool) – (Keyword-only) Negate the condition (default False).

Returns:

A copy of this TenFrame or this TenFrame itself (depending on inplace), with the changes added.

resetbreaks()#

Reset breaks (and cbreaks) and return self.

If run on a GroupBy, clears the breaks of the GroupBy and also the underlying TenFrame, and returns the actual TenFrame.

resource(name=None)#

Decorator to add a resource to the frame’s library.

Wraps the source code of the decorated function or class (CDATA-quoted) in a <resource for="python"> element and places it in this TenFrame’s library. If no name is provided, a random string will be used.

run(force=False)#

Run the query represented by this TenFrame.

The query is run if it can be run (i.e. in online mode) and needs to be run (i.e. changes have been made since it was last run), or if force is True.

sample(frac, **kwargs)#

Add a sampling selection.

Parameters:

frac (float) – A floating-point number used as the “value” for the <sel> operator.

Returns:

a TenFrame that is a copy of this one, with a <sel sample="1"> operator with value set to frac appended to the end of its ops.

save(path, title='', sdesc='', ldesc='', force=False, *, materialize=False, **kwargs)#

Save this query or the results thereof.

Call the py1010 save() method on the query underlying this TenFrame to save it as a QuickQuery (i.e. saving the XML of the query, not the results). If materialize is True, run the query if necessary and call the py1010 saveTableMaterialize() method.

classmethod searchOps(ops, string)#

Search a list of ops for the given string.

Parameters:
  • ops (list(Op)) – List of ops.

  • string (str) – String to search for.

The string may occur anywhere in any value. Returns a list of ops which contain it.

segby()#

Show the segmentation of the current query.

Returns a list of lists. Each element is a list of one or more columns comprising a segmentation of the table. For a multi-segmented table (see numsegs()), computation must be grouped by all the members in at least one of these lists in order to use g_functions. This function is used by segbybreaks() to determine if a given grouping is supported by the segmentation.

If the table has no segmentation, an empty list is returned.

segbybreaks(breaks)#

Is this query segmented by a given set of breaks?

Returns True if computations may be performed using g_functions on this query using the given grouping. This depends on how the table is stored in 1010data’s servers, specifically how it is “segmented.” See the segby() method for a little more explanation, and the 1010data documentation for further details.

Always returns False in offline mode. If a table only has one segment, returns True.

Parameters:

breaks – Name of column (str), TenSeries, or list of names or TenSeries.

Return type:

bool
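
The rule described for segby() and segbybreaks() can be expressed as a pure function. This is a sketch under assumptions: `segmented_by` is a hypothetical name, and the segmentation, segment count, and online flag are passed in explicitly instead of being read from a server.

```python
def segmented_by(segby, breaks, numsegs, online=True):
    """True if grouping by `breaks` covers at least one segmentation list."""
    if not online:
        return False          # stated behavior in offline mode
    if numsegs == 1:
        return True           # a single segment supports any grouping
    if isinstance(breaks, str):
        breaks = [breaks]
    wanted = set(breaks)
    return any(set(seg) <= wanted for seg in segby)


segby = [["store"], ["region", "date"]]
```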

setbreaks(args, **kwargs)#

Set the breaks values and return self.

Same as groupby() with inplace=True.

slice(start=None, stop=None, step=None)#

Returns a TenFrame for a slice of this one.

Parameters:
  • start (int) – Starting index of the slice.

  • stop (int) – Ending index of the slice.

  • step (int) – Ignored.

Returns:

a new TenFrame which is a copy of this frame, with a <sel> op added to the end to limit the range to the specified slice (by selecting on the value of i_()).

Note

The step value is currently ignored.

sort_values(by, ascending=True, inplace=False, na_position='last', **kwargs)#

Return a sorted TenFrame.

Returns a copy of this TenFrame (unless inplace is True) with a <sort> operation added to the end. The by parameter can be a column name, or TenSeries, or a list of names/TenSeries to sort on more than one column. If more than one column is specified, the dir parameter is ignored.

tableMetadata()#

Returns the py1010 MetaData for this frame’s table.

If this TenFrame is offline or there is no table associated with this TenFrame, returns None.

tabu(contents=(), *args, **kwargs)#

Low-level access to 1010data <tabu> operator.

Returns a copy of this TenFrame with a <tabu> operator appended.

Parameters:
  • contents (list(Op)) – A list of Op objects, the contents of the <tabu>

  • breaks – (keyword-only) The “breaks” for the tabulation. Defaults to the breaks of this TenFrame.

  • **kwargs – Other key=value pairs to include in the <tabu> op.

tail(n=20)#

Equivalent to self.slice(-n, None, None).

Returns NotImplemented in offline mode.
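
One plausible reason for the offline limitation is that a negative start can only be resolved once the row count is known. The sketch below illustrates that idea with Python's built-in slice arithmetic; `row_range` is a hypothetical helper, and it does not show how tenFrame actually translates the bounds into a <sel> on i_().

```python
def row_range(start=None, stop=None, nrows=None):
    """Resolve slice bounds to a zero-based, half-open (first, last) pair."""
    needs_count = any(b is not None and b < 0 for b in (start, stop))
    if nrows is None:
        if needs_count:
            return NotImplemented   # negative bounds need the row count
        return (start or 0, stop)
    first, last, _ = slice(start, stop).indices(nrows)
    return (first, last)


head_offline = row_range(0, 20)
tail_offline = row_range(-20, None)
tail_online = row_range(-20, None, nrows=100)
```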

to_df(maxlen=10000, start=0, *, labels=False)#

Convert the results of this TenFrame into a Pandas DataFrame.

Parameters:
  • maxlen (int) – Maximum number of rows to return (default 10000).

  • start (int) – Index of starting row (default 0).

  • labels (bool) – If True, replace the names of the columns in the 1010data query with their “labels”, if possible.

Returns:

A pandas DataFrame of the data in the query.

Return type:

pandas.DataFrame

Raises:

ModuleNotFoundError – If pandas is not available.

transpose(*, promote=True)#

Return transposed frame.

Returns a copy of this TenFrame with a <transpose> op added to the end of its operations. Note that 1010data requires all columns to be of the same type in order to transpose.

trimCols()#

Remove unnecessary columns.

Remove <willbe>s for unneeded columns that are marked as temporary columns autogenerated by expressions.

weighted_agg(agg_info, breaks=None)#

Returns a TenFrame with a <tabu> for weighted aggregation.

Parameters:
  • agg_info – A sequence of tuples: (col, function, weightcol)

  • breaks – If set, overrides the “breaks” attribute on the object.

Many of the tabulation functions take a pair of columns to work on: a source and a “weight”. Examples include weighted sum and also correlation, though “weight” is probably a misnomer there.

agg_info should be a sequence of tuples: (col, fun, weightcol).

weighted_agg_1(source, fun, weight, breaks=None)#

Convenience function for a single tabulation.

where(cond, other, inplace=False, *, cols=None)#

Replace values where the condition is False.

Where cond is True, keep the original value; where False, replace with value other.

This replacement happens in all the columns of the table. The 1010data server will raise an error if a column’s underlying type is changed as a result of the replacement. Make sure all the columns are of the same type as the replacement, or you might prefer to run the TenSeries.where() method on an individual column instead.

Parameters:
  • cond – Condition to determine whether to replace. Currently, this condition must be a TenSeries or a constant, i.e., it cannot be a function. Note that the replacement happens where the condition is False.

  • other – The value to replace with, when the condition is False. Must be a TenSeries or a constant (i.e., not a function).

  • inplace (bool) – If True, modify this TenFrame and return it. If False (default), return a copy of this TenFrame with the changes added.

  • cols – (Keyword-only) Apply the replacement only to the named columns. Defaults to all columns.

Returns:

A copy of this TenFrame or this TenFrame itself (depending on inplace), with the changes added.

Note

Replacement happens where the condition is False.
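
Since where() and replace() act on opposite truth values, a side-by-side model on plain lists may help. This is an illustrative sketch only: the helper names are hypothetical, and conditions are given as 0/1 integers to match the boolean convention used by TenFrames.

```python
def where_model(values, cond, other):
    """Keep values where cond is true; use `other` where it is false."""
    return [v if c else other for v, c in zip(values, cond)]


def replace_model(values, cond, value):
    """Use `value` where cond is true; keep values where it is false."""
    return [value if c else v for v, c in zip(values, cond)]


values = [5, 12, 7]
cond = [1, 0, 1]
kept = where_model(values, cond, 0)
replaced = replace_model(values, cond, 0)
```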

wide_to_long(stubnames, i, j, sep='', suffix='\\d+')#

Unpivot a TenFrame from wide to long format.

Less flexible version of melt()

With stubnames ['A', 'B'], this function expects to find one or more groups of columns with format A-suffix1, A-suffix2,…, B-suffix1, B-suffix2,… You specify what you want to call this suffix in the resulting long format with j (for example j='year').

Each row of these wide variables is assumed to be uniquely identified by i (which can be a single column name or a list of column names).

All remaining columns in the TenFrame are left intact.

Parameters:
  • stubnames – String or list of strings specifying the prefix(es) of the wide format column names.

  • i – Column(s) to use as id variable(s).

  • sep (str) – A character indicating the separation of the variable names in the wide format, to be stripped from the names in the long format. For example, if your column names are A-suffix1, A-suffix2, you can strip the hyphen by specifying sep='-'. Note that 1010data column names are much more restricted in format than Pandas column names, so there are not many sensible choices for this parameter.

  • suffix (str) – A regular expression capturing the wanted suffixes. \d+ (the default) captures numeric suffixes. Suffixes with no numbers could be specified with the negated character class \D+. You can also further disambiguate suffixes, for example, if your wide variables are of the form A-one, B-two,.., and you have an unrelated column A-rating, you can ignore the last one by specifying suffix='(!?one|two)'.

Return type:

TenFrame
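
How stubnames, sep, and suffix pick out columns can be tried with the `re` module. This sketch follows the pattern pandas builds for its own wide_to_long (stub, then sep, then suffix, anchored at both ends); whether tenFrame assembles its pattern the same way is an assumption, and `match_stub_columns` and the column names are hypothetical.

```python
import re


def match_stub_columns(columns, stub, sep="", suffix=r"\d+"):
    """Map each column matching stub+sep+suffix to its suffix text."""
    pattern = re.compile(
        "^" + re.escape(stub) + re.escape(sep) + "(" + suffix + ")$"
    )
    matches = {}
    for col in columns:
        m = pattern.match(col)
        if m:
            matches[col] = m.group(1)
    return matches


numeric = match_stub_columns(["A1970", "A1980", "B1970", "id"], "A")
named = match_stub_columns(
    ["A_one", "A_two", "A_rating"], "A", sep="_", suffix=r"(!?one|two)"
)
```

The second call uses the suffix pattern quoted in the parameter description and shows that the unrelated A_rating column is not picked up.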

property T#

Return transposed frame.

Returns a copy of this TenFrame with a <transpose> op added to the end of its operations. Note that 1010data requires all columns to be of the same type in order to transpose.

property cols#

The py1010.Column objects.

A tuple of the py1010.Column objects for the columns in this TenFrame (if in online mode). Returns an empty tuple if in offline mode.

Return type:

tuple(py1010.Column,…)

property columns#

The names of the columns in this TenFrame.

Return type:

list(str)

property empty#

True if this TenFrame is empty (no rows).

property firstCol#

The first column in this TenFrame.

Returns the TenSeries of the first column in this TenFrame. In some cases, this might correspond to what would be the “index” in pandas.

Return type:

TenSeries

property iloc#

The py1010 row iterator

Returns the RowIterator object from the underlying query, self.query_.rows.

Return type:

py1010.RowIterator

property lastCol#

The last column in this TenFrame.

Returns the TenSeries of the last column in this TenFrame. This is useful because aggregating functions (like .sum(), .rank()) on TenSeries return TenFrames (containing the breaks columns as well as the result column), unlike in Pandas where such things return Series.

Return type:

TenSeries

property ndim#

Number of dimensions.

Always 2 for a TenFrame.

property offlinecols#

A list of “fake” columns for this TenFrame.

This is computed only if they can be known (easily) without running the query (so this check can be done in offline mode.) In order to be able to know the column names offline, we have to be in one of the three following cases:

  1. The last op is a <colord> which specifies, in its cols attribute, the names of the columns

  2. The last op is a <tabu> which specifies its breaks and also contains the <tcol> elements for the tabulated columns.

  3. The last op is a <table> which lists its columns in the first element within it, a <cols> element. In this case only, the fake columns will have “correct” types. (otherwise, they will all be flagged as type ‘i’, to allow them to be aggregated.)
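
The first two cases can be sketched with the standard XML parser. This is a hypothetical illustration and not the library's implementation: `offline_column_names` is an invented name, the third (<table>) case is omitted, and listing the breaks before the tabulated columns is an assumption about ordering.

```python
import xml.etree.ElementTree as ET


def offline_column_names(last_op_xml):
    """Guess column names from the last op alone, or return None."""
    op = ET.fromstring(last_op_xml)
    if op.tag == "colord" and "cols" in op.attrib:
        return [c.strip() for c in op.get("cols").split(",")]
    if op.tag == "tabu" and "breaks" in op.attrib:
        names = [b.strip() for b in op.get("breaks").split(",")]
        names += [t.get("name") for t in op.findall("tcol") if t.get("name")]
        return names
    return None  # column names cannot be known without running the query


from_colord = offline_column_names('<colord cols="store,sales"/>')
from_tabu = offline_column_names(
    '<tabu breaks="col2"><tcol fun="sum" name="col1_sum" source="col1"/></tabu>'
)
```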

property shape#

The tuple (numRows, numCols).

Don’t try to do this in offline mode.

Return type:

(int, int)

property size#

Number of elements in the TenFrame (rows*cols).

Not available in offline mode.

Return type:

int

property table#

The table on which this frame is based.

If there was no table specified in the constructor (e.g. if the data= parameter was used, etc.), None is returned.

Return type:

str or None

class tenFrame.TenSeries(frame, columnName)#

A constituent of a TenFrame, representing a single column of data.

Some important instance variables:

Variables:
  • frame (TenFrame) – The TenFrame of which this TenSeries is part.

  • columnName – The name of this column in the TenFrame.

  • str (StrFuncs) – Interface to string-related functions in class StrFuncs.

  • dt (DateProps) – Interface to datetime-related functions in class DateProps.

Constructor for TenSeries.

Create a TenSeries belonging to frame for column named columnName.

class DateProps(col)#

Properties and functions for datetime columns.

This class implements the dt member of a TenSeries. These functions should be called on columns that are marked as 1010data date+time columns, type ‘f’.

Parameters:

col (TenSeries) – The “parent” TenSeries of these properties.

day_name(*args)#

Name of the weekday

See 1010data documentation for possible arguments.

first_of_month()#

The date of the first day of the month.

first_of_quarter()#

The date of the first day of the quarter.

first_of_year()#

The date of the first day of the year.

last_of_month()#

The date of the last day of the month.

last_of_quarter()#

The date of the last day of the quarter.

last_of_year()#

The date of the last day of the year.

month_name(*args)#

Name of the month

See 1010data documentation for possible arguments.

normalize()#

Normalize to midnight.

Returns a datetime column with the times all set to 00:00:00.

property date#

Convert to date.

Returns a “date”-type column containing the date portion of the datetime.

property day#

The day part of the datetime.

property day_of_week#

The day of the week of the datetime.

Monday=0, Tuesday=1, … Sunday=6 (as in Pandas. NOT as the 1010data dayofwk function!)
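
The numbering matches Python's own `datetime.date.weekday()`, which makes it easy to check locally. The dates below are arbitrary examples.

```python
import datetime

monday = datetime.date(2024, 1, 1)   # a Monday
sunday = datetime.date(2024, 1, 7)   # the following Sunday

monday_number = monday.weekday()
sunday_number = sunday.weekday()
```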

property day_of_year#

The day of the year.

property dayofweek#

The day of the week of the datetime.

Monday=0, Tuesday=1, … Sunday=6 (as in Pandas. NOT as the 1010data dayofwk function!)

property dayofyear#

The day of the year.

property days_in_month#

How many days there are in this month.

property daysinmonth#

How many days there are in this month.

property hour#

The hour part of the datetime.

property is_leap_year#

Is this year a leap year?

(Only valid for the Gregorian calendar)

property is_month_end#

Is this the last day of the month?

property is_month_start#

Is this the first day of the month?

property is_quarter_end#

Is this the last day of the quarter?

property is_quarter_start#

Is this the first day of the quarter?

property is_year_end#

Is this the last day of the year?

property is_year_start#

Is this the first day of the year?

property minute#

The minute part of the datetime.

property month#

The month part of the datetime.

property quarter#

The quarter of the year (1-4).

property second#

The second part of the datetime.

property time#

Convert to time.

Returns a “time”-type column containing the time portion of the datetime.

property week#

ISO week of year.
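
Assuming "ISO week" here means the ISO 8601 week number, the following standard-library sketch shows how it behaves around a year boundary, where the week number can belong to the neighbouring year. The helper name iso_week is made up for this example.

```python
from datetime import date


def iso_week(d):
    # isocalendar() yields (ISO year, ISO week, ISO weekday).
    return d.isocalendar()[1]


print(iso_week(date(2021, 1, 1)))    # 53: still the last ISO week of 2020
print(iso_week(date(2024, 12, 30)))  # 1: already the first ISO week of 2025
```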

property weekday#

The day of the week of the datetime.

Monday=0, Tuesday=1, … Sunday=6 (as in Pandas. NOT as the 1010data dayofwk function!)

property weekofyear#

ISO week of year.

property year#

The year part of the datetime.

class SeriesSmartFuncs(parent)#

Inner class for working with “smart” functions.

Functions that can distinguish g_functions, u_functions, and tabulation-type g_functions based on segmentation! This class implements the _s member of a TenSeries, which is not generally visible to the user, and handles the case of frame['column'].somefunction().

class SeriesSmartFakeFun(name, col)#

Inner class for “smart” functions.

These are what are called when a not-otherwise-known method is called on a TenSeries, e.g. frame.column.somefunctionname(). They translate into calls of like-named functions in 1010data on the server. Depending on circumstances, they may become “g_functions”, tabulations, or “u_functions” (see 1010data documentation for information).

In general, a function will always be interpreted as an aggregating function if at all possible, particularly if groups are specified (see below), otherwise it will become a simple function on its parameters, as described below under ordered parameters.

An “aggregating” function is one which aggregates information over some group of rows. 1010data uses two different tools for this: g_functions and tabulations. Tabulations are the more general tool, and they can be used in any circumstance. But they are not always as efficient as g_functions, and they cause all the resultant data to reside on a single processor, the “accumulator.” G_functions can only be used if the table is “segmented” along one of the groups, and they leave the data distributed (if it was before.) There is much overlap between g_functions and tabulation functions, but not complete overlap. This is a very brief description; please see the 1010data documentation for better explanations.

An aggregating “smart” function will be represented in the query as a g_function if the table is segmented appropriately for the groups being used. Otherwise, a tabulation will be used. (In offline mode, a tabulation is always used, since we cannot inspect the table to find its segmentation.) If the function is called as frame.col.g_something(), i.e., with an actual g_ as the prefix of the function, then it is always treated as a g_function. This is important for user-defined g_functions in particular.
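
The selection rule described in this paragraph might be sketched as follows. This is not the library's code: choose_aggregation and its arguments are hypothetical, and "segmented appropriately" is modelled here as "at least one group column is a segmentation column", which is an assumption.

```python
def choose_aggregation(func_name, groups, segmented_on, online=True):
    """Return 'g_function' or 'tabulation' for an aggregating call.

    Hypothetical helper: groups and segmented_on are lists of column names.
    """
    # An explicit g_ prefix is always treated as a g_function.
    if func_name.startswith("g_"):
        return "g_function"
    # Offline, the segmentation cannot be inspected, so fall back to a tabulation.
    if not online:
        return "tabulation"
    # Assumption: a g_function is usable if the table is segmented on a group.
    if any(g in segmented_on for g in groups):
        return "g_function"
    return "tabulation"


print(choose_aggregation("sum", ["month"], ["month"]))  # g_function
print(choose_aggregation("sum", ["month"], ["store"]))  # tabulation
```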

Parameters:#

Ordered (non-keyword) parameters of the function become additional parameters to the server-side function (along with this column). So frame.col.strextract(3, 4) becomes strextract(col;3;4) in the 1010data code, and frame.col1.mod(frame.col2) becomes mod(col1;col2) in 1010data code.

The same is true for g_functions, except that g_functions have some extra parameters for grouping, etc. So frame.col1.g_cov(frame.col2) becomes g_cov(;;col1;col2).
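
A minimal sketch of this translation, reproducing only the three documented examples. The helpers u_call and g_call are hypothetical and work on plain column-name strings; the real smart functions work on TenSeries objects and handle more cases.

```python
def u_call(name, column, *args):
    # The column comes first, followed by the ordered parameters.
    parts = [column] + [str(a) for a in args]
    return f"{name}({';'.join(parts)})"


def g_call(name, column, *args, group="", select=""):
    # Slots for grouping and selection precede the column(s), as in the
    # documented g_cov(;;col1;col2) example.
    parts = [group, select, column] + [str(a) for a in args]
    return f"{name}({';'.join(parts)})"


print(u_call("strextract", "col", 3, 4))  # strextract(col;3;4)
print(u_call("mod", "col1", "col2"))      # mod(col1;col2)
print(g_call("g_cov", "col1", "col2"))    # g_cov(;;col1;col2)
```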

Keywords:#

There are some keyword-only parameters that control some important aspects of smart functions, especially aggregating functions.

group=

A string or list of strings (or columns) indicating the groups to be aggregated over. By default, the “breaks” from the underlying TenFrame are used, so frame.groupby("month").sales.sum() works the way you would expect. You can override that (or be more explicit) by saying frame.sales.sum(group="month").

select=

Should be the name of a boolean column which indicates which rows should participate in the aggregation. Rows where this column is 0 are ignored by the aggregation. It’s more or less equivalent to doing a selection before the aggregation, but g_functions have a special parameter for it, so this parameter is supplied to access it. The default is to include all rows.

order=

Some g_functions depend on the ordering of rows, like cumulative computations. By default, this ordering is simply the order the rows appear in the table, but the order= parameter can be used to name a column that specifies the order in which the rows should be considered.

forcetab=

Boolean. Force using a tabulation even if a g_function might otherwise have been selected. Default False.

adjoin=

Normally, an aggregation will result in a table that only has the group column(s) and the result column (see below), and often has fewer rows than the original table. For example, frame.groupby("month").sales.sum() would be expected to have only 12 rows. If adjoin=True, then the existing columns in the table are not removed, nor are any rows, and the resulting table is the same length as the original table. This functionality is not available in pandas, but it can be very useful. You might want to say frame = frame.groupby("month").sales.sum(adjoin=True) to keep the frame as it is but add a new column (“sales_sum”) which has the sum of the sales in the same month as the data in each row.

name=

Lets you specify the name of the new column being computed. Default is to use NAME_FUNCTION format.
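
To make the effect of adjoin= concrete, here is a small pure-Python imitation on a list of row dictionaries. The function group_sum is invented for this illustration; it only mimics the shape of the result (collapsed groups versus the original rows plus a "sales_sum" column) and says nothing about how the server computes it.

```python
from collections import defaultdict


def group_sum(rows, group, value, adjoin=False):
    # Total the value column per group.
    totals = defaultdict(int)
    for row in rows:
        totals[row[group]] += row[value]
    name = f"{value}_sum"
    if adjoin:
        # Keep every row and column; add the group total to each row.
        return [{**row, name: totals[row[group]]} for row in rows]
    # Otherwise one row per group, holding only the group and the result.
    return [{group: g, name: t} for g, t in totals.items()]


rows = [
    {"month": "Jan", "sales": 10},
    {"month": "Jan", "sales": 5},
    {"month": "Feb", "sales": 7},
]
print(group_sum(rows, "month", "sales"))
print(group_sum(rows, "month", "sales", adjoin=True))
```

The first call yields two rows (one per month); the second keeps all three rows and adds the monthly total to each.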

Return Value:#

Smartfuns always return TenFrames, never TenSeries. The corresponding syntax in pandas may return a Series or a DataFrame, but in tenFrame it is always a TenFrame. This is partly because a Series in pandas retains the “index”, so you can see what values the data in the result column corresponds to, while TenFrames have no index. The result column in the returned TenFrame is always the last column, so you can always access it using the .lastCol property. Also, as a special case, you can say

frm['newcolname'] = frm.groupby("col1").sum()

even though properly speaking the right-hand side of the assignment is a TenFrame. It will do what you expect.

__call__(*args, reverse=False, **kwargs)#

Call self as a function.

class StrFuncs(col)#

String-related functions.

This class implements the str member of the TenSeries class, providing an interface to string-related functions like Pandas does.

capitalize()#

1010data propercase() function.

contains(substr, case=True, regex=True)#

Test if a pattern or regex is contained in the value.

Returns a boolean TenSeries based on whether or not the given pattern or regex is contained within the strings of this TenSeries.

Uses 1010data regex_count() function if regex is True (and checks that the value returned is greater than 0); otherwise uses contains() or contains_ci() depending on the value of case.

Parameters:
  • substr – The substring or regex to search for. May be another TenSeries.

  • case (bool) – Comparison is case-sensitive if True (default).

  • regex (bool) – Compare as a regular expression if True (default).

contains1010(substr)#

Returns a TenSeries testing for substrings.

Parameters:

substr (str) – A string literal

Returns:

a TenSeries defined by the 1010data expression contains(columnName; "substr")

count(reg, *args)#

1010data regex_count() function; accepts flags.

endswith(sfx)#

1010data endswith() function.

find(sub)#

Find the first occurrence of the substring in this value.

Unlike pandas, does not take start, end parameters. Returns NA if not found, unlike pandas which returns -1.

Uses 1010data strfind() function, with the return value adjusted to match pandas conventions.
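
The documented result can be imitated locally for a single Python string. This sketch assumes that "pandas conventions" means a 0-based position and uses None to stand in for NA; find_or_na is a made-up name.

```python
def find_or_na(value, sub):
    # str.find() is 0-based and returns -1 when the substring is absent.
    pos = value.find(sub)
    # Map "not found" to None (standing in for NA) instead of -1.
    return None if pos == -1 else pos


print(find_or_na("banana", "na"))  # 2
print(find_or_na("banana", "x"))   # None
```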

fullmatch(pat, case=True, flags=0)#

Calls match() with whole=True.

This requires the regex to match the entire string.

get(ind)#

1010data strpick() function, index-adjusted.

Index is adjusted to match pandas conventions.

index(sub)#

Find the first occurrence of the substring in this value.

Unlike pandas, does not take start, end parameters. Returns NA if not found, whereas pandas str.find() returns -1 and pandas str.index() raises ValueError.

Uses 1010data strfind() function, with the return value adjusted to match pandas conventions.

len()#

1010data strlen() function.

lower(*args)#

1010data strdowncase() function.

lstrip(to_strip=None)#

Call strip() with lrb='l'.

match(pat, case=True, flags=0, *, whole=False)#

Regex matching.

Uses the 1010data regex_beg() function. Accepts regex flags as pandas does and converts them into the corresponding options of the 1010data function.

pad(width, side='left')#

Unlike Pandas, can only pad with spaces!

Uses 1010data padleft() or padright() functions.

removeprefix(prefix, *, suffix=False)#

Remove a prefix from a string.

removesuffix(suffix)#

Remove a suffix from a string.

repeat(repeats)#

1010data strrepeat() function.

Unlike pandas, cannot accept a general sequence of ints for its argument. However, it can accept an integer TenSeries.

rstrip(to_strip=None)#

Call strip() with lrb='r'.

startswith(pfx)#

1010data beginswith() function.

strip(to_strip=None, *, lrb='b')#

1010data strtrim() function.

Parameters:
  • to_strip (str) – What to remove from the string. Default " ".

  • lrb (str) – (keyword-only) Which end of the string to strip. "l" means strip from the left end, "r" means strip from the right end, otherwise strip on both ends.

upper(*args)#

1010data strupcase() function.

__abs__()#

Absolute values of this column.

Returns:

a TenSeries representing a <willbe> column holding abs() applied to this column.

__add__(other)#

Add this column’s values to another’s.

Returns:

a TenSeries representing a <willbe> column holding the sum (or string concatenation) of this column and other.

__and__(other)#

Logical AND of this column and another.

Returns:

a TenSeries representing a <willbe> column holding the logical conjunction (AND) of this column and other.

__bool__()#

Always returns True.

__call__(*args, **kwargs)#

Call self as a function.

__ceil__()#

Ceiling of this column’s values.

Returns:

a TenSeries representing a <willbe> column holding ceil() applied to this column.

__eq__(value)#

Make a boolean column on self == value.

Returns:

a TenSeries representing a <willbe> column comparing this column to value.

__float__()#

Reduce to a single float, if there is only one.

If this TenSeries is in online mode, and it has a numeric type, and it contains only a single value (i.e. its length is 1), return that value as a float.

__floor__()#

Floor of this column’s values.

Returns:

a TenSeries representing a <willbe> column holding floor() applied to this column.

__floordiv__(other)#

Divide this column’s values by another’s, rounding down.

Returns:

a TenSeries representing a <willbe> column holding the quotient of this column divided by other, rounded down to an integer. Equivalent to (self / other).floor().astype(int).
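
For a single pair of numbers, the stated equivalence (divide, take the floor, cast to int) can be checked locally with the standard library. Note that rounding down differs from truncation for negative quotients. The name floordiv_like is invented for this sketch.

```python
import math


def floordiv_like(a, b):
    # Divide, round down, then cast to int, mirroring the stated equivalence.
    return int(math.floor(a / b))


print(floordiv_like(7, 2))   # 3
print(floordiv_like(-7, 2))  # -4 (rounded down, not truncated to -3)
```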

__ge__(value)#

Make a boolean column on self >= value.

Returns:

a TenSeries representing a <willbe> column comparing this column to value.

__getattr__(name)#

Get a “fake” function to call on the TenSeries.

__getitem__(index)#

Get the element of this TenSeries at the given index.

If used with a slice, return the same thing that column[slice] would, for the underlying columns.

If the index is neither an integer nor a slice, call the getIndexedItem() method (q.v.).

__gt__(value)#

Make a boolean column on self > value.

Returns:

a TenSeries representing a <willbe> column comparing this column to value.

__int__()#

Reduce to a single int, if there is only one.

If this TenSeries is in online mode, and it has an integer type (‘i’ or ‘j’), and it contains only a single value (i.e. its length is 1), return that value as an integer.

__invert__()#

Logical negation of this column.

Returns:

a TenSeries representing a <willbe> column holding the logical negation (NOT) of this column.

__iter__()#

Iterate over the Column represented by this TenSeries.

This is the iterator over the py1010 Column object. Note that unlike pandas, this iteration does not consider grouping with groupby. To obtain behavior like that of pandas, use an expression like:

col = fr.groupby("xyz")['somename']
for x in ((r[0], r[1][col.columnName]) for r in col.frame):
    ...  # process each item x here

__le__(value)#

Make a boolean column on self <= value.

Returns:

a TenSeries representing a <willbe> column comparing this column to value.

__len__()#

Returns the length of this TenSeries.

Same as the length of this TenSeries’ underlying frame (q.v.), which may be affected by grouping.

In offline mode, returns 0.

__lt__(value)#

Make a boolean column on self < value.

Returns:

a TenSeries representing a <willbe> column comparing this column to value.

__mul__(other)#

Multiply this column’s values by another’s.

Returns:

a TenSeries representing a <willbe> column holding the product of this column and other.

__ne__(value)#

Make a boolean column on self <> value.

Returns:

a TenSeries representing a <willbe> column comparing this column to value.

__neg__()#

Numerical negation of this column.

Returns:

a TenSeries representing a <willbe> column holding the numerical negation of this column.

__nonzero__()#

Always returns True.

__or__(other)#

Logical OR of this column and another.

Returns:

a TenSeries representing a <willbe> column holding the logical disjunction (OR) of this column and other.

__pow__(other)#

Raise this column’s values to another’s.

Returns:

a TenSeries representing a <willbe> column holding the power of this column raised to exponent other.

__radd__(other)#

Add a constant to this column’s values.

Returns:

a TenSeries representing a <willbe> column holding the sum (or string concatenation) of other and this column.

__rand__(other)#

Logical AND of this column and another.

Returns:

a TenSeries representing a <willbe> column holding the logical conjunction (AND) of other and this column.

__repr__()#

Return repr(self).

__rfloordiv__(other)#

Divide a constant by this column’s values, rounding down.

Returns:

a TenSeries representing a <willbe> column holding the quotient of other divided by this column, rounded down to an integer. Equivalent to (other / self).floor().astype(int).

__rmul__(other)#

Multiply a constant by this column’s values.

Returns:

a TenSeries representing a <willbe> column holding the product of other and this column.

__ror__(other)#

Logical OR of this column and another.

Returns:

a TenSeries representing a <willbe> column holding the logical disjunction (OR) of other and this column.

__round__(ndigits=0)#

Apply the round() function to this column’s values.

__rpow__(other)#

Raise another column’s values to this one’s.

Returns:

a TenSeries representing a <willbe> column holding the power of other raised to the exponent in this column.

__rsub__(other)#

Subtract this column’s values from a constant.

Returns:

a TenSeries representing a <willbe> column holding the difference of other and this column.

__rtruediv__(other)#

Divide a constant by this column’s values.

Returns:

a TenSeries representing a <willbe> column holding the quotient of other divided by this column.

__sub__(other)#

Subtract another column’s values from this one’s.

Returns:

a TenSeries representing a <willbe> column holding the difference of this column and other.

__truediv__(other)#

Divide this column’s values by another’s.

Returns:

a TenSeries representing a <willbe> column holding the quotient of this column divided by other.

astype(data_type)#

Cast this column as another type, on the server side.

Make a copy of this TenSeries’ underlying TenFrame and add a new computed column whose value is data_type(self.colexp()), then return the TenSeries of that column.

basic_agg(fun, *, breaks=None)#

Run basic_agg() on the underlying TenFrame.

Parameters:
  • fun (str) – Name of an aggregating function.

  • breaks – Override the breaks specified by this TenFrame.

Call basic_agg(fun, this_column) on the underlying TenFrame, and return a TenSeries corresponding to the aggregated version of this TenSeries.

clip(lower=None, upper=None, inplace=False, *args, **kwargs)#

Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or columns.

Parameters:
  • lower – Minimum threshold value or column. All values below this threshold will be set to it. If None, will not clip the minimum.

  • upper – Maximum threshold value or column. All values above this threshold will be set to it. If None, will not clip the maximum.

  • inplace (bool) – Operate on this column’s TenFrame directly, or make a copy.

Returns:

A TenSeries, either in a copy of this one’s TenFrame or in this one’s TenFrame, depending on the choice of inplace.
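
The clipping rule for scalar thresholds can be illustrated on plain Python numbers. The helper clip_value is hypothetical and handles one value at a time; the real method builds a computed column and also accepts columns as thresholds.

```python
def clip_value(x, lower=None, upper=None):
    # Values below the lower threshold are raised to it.
    if lower is not None and x < lower:
        x = lower
    # Values above the upper threshold are lowered to it.
    if upper is not None and x > upper:
        x = upper
    return x


print([clip_value(v, lower=0, upper=10) for v in [-5, 3, 12]])  # [0, 3, 10]
print(clip_value(-5, upper=10))  # -5: no lower threshold, so nothing to clip
```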

colexp(wholeop=False, onlytemp=False, justop=False)#

Find the expression which defines this column.

Parameters:
  • wholeop (bool) – If True, return the op in which the expression occurs instead of just the expression.

  • onlytemp (bool) – If True, only search if this column is a temporary column. Otherwise just return the column’s name.

Checks the frame’s ops in reverse order, looking for an operator which defines this TenSeries’ column. If no such operator is found, just return the name of this column. If such an operator is found, return its “value” attribute.

If wholeop is True, return the op object and not just its value attribute. This is the top-level op containing the <tcol> or <willbe> that defines the column, i.e. the one that’s actually in the ops of this frame.

If justop is True, it’s like wholeop except it returns only the actual <tcol> or <willbe> that defines the column, and not the containing op(s).

colname()#

The name of the column represented by this TenSeries.

comparison(op, value)#

Compare this column with another or a constant.

Parameters:
  • op (str) – A string to be used as the comparison operation between this column and the other (e.g. “>”, “<”, etc.)

  • value – The TenSeries or literal to compare with.

Generic function for building a <willbe> column for the comparison between this column and another column or a constant value.

copy()#

Make a copy of this TenSeries.

Create a copy of this TenSeries’ underlying TenFrame object, and return the TenSeries of that frame corresponding to the same column as this one.

cut(bins, labels=None)#

Bin values into discrete intervals.

Segments and sorts data into bins. Has some slight differences from pandas cut.

The bins variable may be an integer or a list of numbers, which represent the upper edges of the bins. The bins list must be sorted in strictly ascending order. Binning is done using the 1010data range2(X;L) function which returns the smallest number in the list that is greater than or equal to the input value.

The labels parameter, if used, must be a list whose length is one greater than that of the bins list (unlike in pandas, where it is to be the same length.) Binning is then done with the range2list(X;L;R) function, which see for details. The last label is returned for values larger than the largest value in bins. Labels can only be specified when bins is a list.

If bins is an integer, the minimum and maximum values of this column (the range) are computed, and the maximum is increased by 0.1% of the range (as in pandas) to ensure that the maximum values will still be included. Binning is done by the range1(X;S;E;I) function.

Parameters:
  • bins – Integer or list of numbers.

  • labels – Optional list of labels for the bins. Must have ONE ELEMENT MORE than the number of bins.
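
The list-based binning rules described for cut() can be imitated locally with bisect. This is an illustration of the documented semantics for one value, not the server-side range2() or range2list() implementation. The names cut_value and integer_bin_range are invented, and returning None for an unlabeled value above the largest edge is an assumption.

```python
from bisect import bisect_left


def cut_value(x, bins, labels=None):
    # bins are upper edges in strictly ascending order.
    if labels is not None and len(labels) != len(bins) + 1:
        raise ValueError("labels must have one more element than bins")
    i = bisect_left(bins, x)  # index of the first edge >= x
    if labels is not None:
        return labels[i]  # i == len(bins) selects the extra, last label
    # Assumption: a value above every edge has no bin.
    return bins[i] if i < len(bins) else None


def integer_bin_range(lo, hi):
    # For integer bins, the maximum is raised by 0.1% of the range.
    return lo, hi + 0.001 * (hi - lo)


bins = [10, 20, 30]
print([cut_value(v, bins) for v in [5, 10, 11, 31]])  # [10, 10, 20, None]
print(cut_value(31, bins, labels=["low", "mid", "high", "over"]))  # over
```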

Returns:

A TenSeries giving the bin (or label) for each row.

expanding(min_periods=1, on='', group=None, select=None)#

Perform expanding statistics.

Not as flexible as pandas’ expanding(). A way to make 1010data cumulative g_functions available via a pandas-like syntax.

>>> frame.col.expanding().sum()

Available functions:

  • .expanding().count()

  • .expanding().sum()

  • .expanding().max()

  • .expanding().min()
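
As a local reference for what "expanding" (cumulative) statistics mean, the four functions can be imitated on a plain list with itertools.accumulate. This assumes a single group, the given row order, and no NA values; it does not call tenFrame.

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

expanding_sum = list(accumulate(values))             # running total
expanding_max = list(accumulate(values, max))        # running maximum
expanding_min = list(accumulate(values, min))        # running minimum
expanding_count = list(range(1, len(values) + 1))    # rows seen so far

print(expanding_sum)  # [3, 4, 8, 9, 14]
print(expanding_max)  # [3, 3, 4, 4, 5]
```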

Parameters:
  • min_periods – IGNORED.

  • on – Column(s) on which to order the rows when computing. Defaults to the row order in the database.

  • group – Column(s) on which to group the data, like any g_fun. Defaults to using the grouping of the parent TenFrame, if any.

  • select – Boolean column on which to select values to participate in the calculation, like any g_fun. Defaults to no selection.

Returns:

A TenFrame resulting from the appropriate g_fun, containing the break column(s) and the moving statistic column.

fillna(*args, **kwargs)#

Replace NA values in this column.

See the TenFrame method TenFrame.fillna()

getIndexedItem(index)#

Attempt to find an item by tabulation index.

In Pandas, when you have done an aggregation after grouping (with groupby), the elements of the columns you grouped-by can be used as an index for the grouped columns. So if you do g = fr.groupby('month').sum(), then you can access the sum of the sales column for January with g['sales_sum']['January']. In 1010data, the group-defining values are simple values in their respective old columns. This function handles performing the Pandas-style indexing, and returns a new TenSeries (NOT a simple value!), which is part of a TenFrame copied from the current one and extended by <sel> and <colord> ops to select the specified element(s).

This function is called automatically by __getitem__() if the index requested is not an integer or slice, but must be called explicitly if you are trying to index off of an integer-valued key.

The index can be one of:

  1. A string or numeric value, the value of the group being sought.

  2. An Iterable of strings and numeric values, to specify the index when the grouping was done over multiple columns.

  3. A dictionary of {columnname: value} pairs.
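
One way to picture the three accepted index forms is to reduce each of them to a mapping from group column to value. The helper below is hypothetical (normalize_index is not part of tenFrame) and only illustrates the correspondence; how the library actually turns the index into a selection is not shown here.

```python
from collections.abc import Iterable, Mapping


def normalize_index(index, group_columns):
    # Form 3: already a {column name: value} mapping.
    if isinstance(index, Mapping):
        return dict(index)
    # Form 1: a single string or number, for a single grouping column.
    if isinstance(index, (str, int, float)):
        index = [index]
    # Form 2: an iterable of values, one per grouping column.
    elif isinstance(index, Iterable):
        index = list(index)
    else:
        raise TypeError("unsupported index type")
    if len(index) != len(group_columns):
        raise ValueError("need one value per grouping column")
    return dict(zip(group_columns, index))


print(normalize_index("January", ["month"]))
print(normalize_index(("January", "NY"), ["month", "state"]))
print(normalize_index({"month": "January"}, ["month"]))
```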

classmethod gfunction(name, groups, sels, cols, *args)#

Generic G-function: fun(G;X;S)

Returns a string “name(groups; cols; sels[; xtras])” where groups, cols, and sels are the contents of the like-named arguments concatenated into space-separated strings. xtras is the same applied to any further arguments.

groupby(args)#

Mark data for aggregation.

Run TenFrame.groupby() on the underlying TenFrame and return it.

head(n=20)#

Returns a Pandas DataFrame of the first n values of this Series.

hist(*args, **kwargs)#

Plot a histogram.

Equivalent to:

self.frame[[self.columnName]].hist(*args, **kwargs)

Calls the TenFrame.plot() method on a copy of this TenSeries’ underlying TenFrame restricted to just this column, with the plot kind set to histogram.

info(*args, **kwargs)#

Get info on this column.

Runs the TenFrame.info() method on the underlying TenFrame, restricted to this column.

is_used()#

Is this column depended on by any other column?

isin(stuff)#

Is the value of this column in the given list?

Parameters:

stuff (list) – List of literals.

Returns:

A boolean TenSeries.

isna()#

Boolean column: 1 where this column is NA, 0 otherwise.

isnull()#

Boolean column: 1 where this column is NA, 0 otherwise.

map(arg)#

Map values of a Series according to an input mapping.

This uses the 1010data rep() function, so there are some differences in behavior compared to pandas Series.map(). If the value to be mapped is not found in the mapping, N/A is not returned, but the original value is left as-is.
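
The difference from pandas can be seen in a small local imitation, where unmatched values pass through unchanged. The function rep_like_map is a stand-in written for this example and works on a plain list rather than a column.

```python
def rep_like_map(values, mapping):
    # Values missing from the mapping are kept as-is, whereas pandas
    # Series.map() with a dict would produce NaN for them.
    return [mapping.get(v, v) for v in values]


print(rep_like_map(["a", "b", "c"], {"a": "x", "b": "y"}))  # ['x', 'y', 'c']
```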

Parameters:

arg (dict) – Should be a dictionary or other “Mapping” object (implementing an .items() method to list its key/value pairs.)

Returns:

A TenSeries in a copy of this TenSeries’ TenFrame, with a temporary name.

Return type:

TenSeries

nlargest(n)#

Find the n largest values.

Returns a TenSeries containing the largest n values in this TenSeries.

notna()#

Boolean column: 1 where this column is not NA, 0 otherwise.

notnull()#

Boolean column: 1 where this column is not NA, 0 otherwise.

nsmallest(n)#

Find the n smallest values.

Returns a TenSeries containing the smallest n values in this TenSeries.

online()#

True if this TenSeries’ frame is in “online” mode.

qcut(q, labels=None, retbins=False, nbins=100, approx=False, group=None)#

Quantile-based discretization.

Discretize variables into equal-sized buckets based on rank or on sample quantiles.

Parameters:
  • q – Either an int (how many equal-sized buckets to make) or a list of floats (in order, between 0.0 and 1.0), describing the edges of the buckets. So the integer 4 is equivalent to the list [0.0, 0.25, 0.5, 0.75, 1.0]. If the first element of the list is greater than 0.0 and/or the last element of the list is less than 1.0, values which fall outside of the range will be given a value of NA.

  • labels – A list of labels (strings or ints) or None (default) or False. These are the values to be used in the resulting column for each of the bins. List must be of the same length as the resulting bins. If False, considered to be integers 0…#bins. If None, uses the values at the upper edges of the bins.

  • retbins (bool) – If True, ignore the labels and return a TenFrame containing the bin boundaries (NOT the same as in pandas!) Default False.

  • nbins (int) – When approx=True, set number of storage nodes to use in performing the estimation. Default is 100, maximum is 1000. Ignored when approx=False.

  • approx (bool) – Use approximate quantiles (tdigest algorithm) instead of computing exact quantiles. Approximate quantiles are faster for high cardinality quantiles (hundreds of millions of unique values per group), but slow when the cardinality of the source for each break is small. Default False.

  • group – List of column names on which to group. If None, use the grouping of the TenFrame, as set by groupby(), if any.
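
The stated equivalence between an integer q and a list of bucket edges can be written out as a short sketch. The helper quantile_edges is hypothetical and covers only this conversion, not the quantile computation itself.

```python
def quantile_edges(q):
    # An integer q means q equal-sized buckets, i.e. q + 1 evenly spaced edges.
    if isinstance(q, int):
        return [i / q for i in range(q + 1)]
    # A list of floats is taken as the edges directly.
    return list(q)


print(quantile_edges(4))            # [0.0, 0.25, 0.5, 0.75, 1.0]
print(quantile_edges([0.1, 0.9]))   # [0.1, 0.9]
```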

Returns:

A TenSeries reflecting the bin for each row, as specified by the labels parameter.

rename(newname)#

DESTRUCTIVELY rename this column.

Alter this TenSeries’ underlying TenFrame, replacing self.columnName with newname, wherever it appears in the ops, by running the frame’s renameColumn() method (q.v.)

replace(*args, **kwargs)#

Replace values in this column where a condition is True.

The 1010data server will raise an error if a column’s underlying type would be changed by the replacement.

See the TenFrame method TenFrame.replace().

rolling(window, center=False, on='', group=None, select=None)#

Perform rolling statistics.

Not as flexible or powerful as pandas’ .rolling(), but some rolling-window statistics are available via 1010data g_functions g_movavg(), g_movcnt, etc. The .rolling() method provides access to these using pandas-like syntax:

>>> frame.col.rolling(5).mean()

Available functions:

  • .rolling(window).mean()

  • .rolling(window).count()

  • .rolling(window).max()

  • .rolling(window).min()

  • .rolling(window).product()

  • .rolling(window).sum()

  • .rolling(window).var()

Parameters:
  • window (int) – Size of the rolling window

  • center (bool or str ('lag'), optional) – Whether or not the window is centered on each value computed. Default is False, meaning that each value is considered to be on the leading edge of the window. If given the value ‘lag’ (string), then each value will be considered on the trailing edge of the window.

  • on (list(TenSeries) or list(str), optional) – Column(s) on which to order the rows when computing. Defaults to the row order in the database.

  • group (list(TenSeries) or list(str), optional) – Column(s) on which to group the data, like any g_fun. Defaults to using the grouping of the parent TenFrame, if any.

  • select (TenSeries or str) – Boolean column on which to select values to participate in the calculation, like any g_fun. Defaults to no selection.

Returns:

A TenFrame resulting from the appropriate g_fun, containing the break column(s) and the moving statistic column.

Return type:

TenFrame

shift(periods=1, fill_value=None, order='', name=None)#

Shift by desired number of periods.

Returns a TenSeries shifted by the specified number of periods.

Note that unlike Pandas, there is no freq parameter, but this may be emulated by means of grouping.

Examples:

frame.col1.shift(1)
frame.groupby("month").col1.shift(-3)
frame.groupby("month").col1.shift(2, fill_value=0, order="store",
              name="shifted")
frame.col1.shift(12, fill_value=frame.col2)

Shifting can only be done if the table is segmented on the groups.
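
To illustrate shifting within groups, here is a pure-Python imitation on parallel lists of group keys and values. It assumes the pandas direction (a positive shift moves values toward later rows) and the given row order within each group; the conventions of g_rshift() itself are not shown in this document, and both helper names are invented.

```python
def shift_values(values, periods=1, fill_value=None):
    n = len(values)
    if periods >= 0:
        # Positive shift: pad at the front, drop values off the end.
        kept = values[: max(n - periods, 0)]
        return [fill_value] * min(periods, n) + kept
    # Negative shift: drop values from the front, pad at the end.
    k = min(-periods, n)
    return values[k:] + [fill_value] * k


def shift_within_groups(keys, values, periods=1, fill_value=None):
    # Collect the row positions belonging to each group.
    positions = {}
    for i, key in enumerate(keys):
        positions.setdefault(key, []).append(i)
    out = [fill_value] * len(values)
    # Shift each group's values independently and write them back in place.
    for idx in positions.values():
        shifted = shift_values([values[i] for i in idx], periods, fill_value)
        for i, v in zip(idx, shifted):
            out[i] = v
    return out


print(shift_values([1, 2, 3, 4], 1))  # [None, 1, 2, 3]
print(shift_within_groups(["a", "b", "a", "b", "a"], [1, 2, 3, 4, 5], 1, 0))
```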

Parameters:
  • periods (int) – Number of periods to shift. May be positive or negative.

  • fill_value – Scalar value to use for missing data caused by shift. May be a number, a string, or another column from this TenFrame.

  • order – You can give a column name by which to order the entries within a group.

  • name (str) – Name to give the resulting column. Defaults to COLUMNNAME_shift.

Returns:

A TenSeries (with the given name) belonging to a TenFrame which is a copy of the TenFrame to which this TenSeries belongs. The shift is accomplished by the 1010data function g_rshift().

slice(start=None, stop=None, step=None)#

Make a “server-side slice.”

Returns a new TenSeries which is part of a new TenFrame which is a copy of this frame, with a <sel> op added to the end to limit the range to the specified slice (by selecting on the value of i_()), per the TenFrame slice() method (q.v.).

The step value is currently ignored.

sort_values(ascending=True, inplace=False, **kwargs)#

Return a sorted TenSeries.

Parameters:
  • ascending (bool) – Sort in ascending or descending order (default True)

  • inplace (bool) – Alter the TenFrame to which this TenSeries belongs, or make a copy (default False)

  • **kwargs – Any other key=value attributes for the <sort> op.

Creates a copy of the underlying TenFrame and adds a <sort> op to the end of it, sorting on this column, then returns the TenSeries corresponding to this one.

classmethod ufunction(name, args, *, quotes=False)#

Generic N-argument function.

Returns a string “name(args[0];args[1];args[2]…)”

unique(sort=True)#

Find unique values.

Returns a TenFrame containing the unique values present in this TenSeries. Pandas returns these in a numpy array, but with Big Data we can’t assume it can fit in one.

Parameters:

sort (bool) – If True, sort the unique values. Note that for large tables with high cardinality, sorting may be impractical or even impossible (1010data might raise an error). Default True.

Returns:

A TenFrame with the unique values.

usages()#

Where is this column referenced?

Returns a list of ops (in the same frame) which appear in any way to reference this column. Useful for checking whether a column is unused.

value_counts(normalize=False, sort=True, ascending=False, dropna=True)#

Provide counts of all the values in this TenSeries.

Parameters:
  • normalize (bool) – If True, normalize the frequencies by dividing each one by the total number of rows counted in the table, thereby giving a percentage of the total. Default False.

  • sort (bool) – If True, sort the table on the number of occurrences. Default True.

  • ascending (bool) – If True (and sort=True), sort in ascending order. Default False (i.e., sort in descending order.)

  • dropna (bool) – If True, rows in this column with NA are not counted. Default True.

Returns:

A frequency table for this TenSeries: a two-column TenFrame with a row for each unique value found in this TenSeries and the number of times it occurs.

Return type:

TenFrame
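
The meaning of the four parameters can be illustrated with a local imitation built on collections.Counter. This sketch works on a plain list, uses None to stand in for NA, and returns (value, count) pairs instead of a TenFrame; it assumes that normalization divides by the number of rows actually counted.

```python
from collections import Counter


def value_counts(values, normalize=False, sort=True, ascending=False, dropna=True):
    # Optionally leave NA (None here) out of the count.
    if dropna:
        values = [v for v in values if v is not None]
    counts = Counter(values)
    total = len(values)
    # Either raw counts or counts divided by the number of rows counted.
    rows = [(v, c / total if normalize else c) for v, c in counts.items()]
    if sort:
        # Sort on the count only; descending unless ascending=True.
        rows.sort(key=lambda r: r[1], reverse=not ascending)
    return rows


data = ["a", "b", "a", None, "a", "b"]
print(value_counts(data))                  # [('a', 3), ('b', 2)]
print(value_counts(data, normalize=True))  # [('a', 0.6), ('b', 0.4)]
print(value_counts(data, dropna=False))    # [('a', 3), ('b', 2), (None, 1)]
```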

weighted_agg(fun, other, breaks=None)#

Run weighted_agg on the underlying TenFrame.

Parameters:
  • fun (str) – Name of an aggregating function.

  • other (TenSeries) – A TenSeries providing the “weights”.

  • breaks (list(str) or list(TenSeries), optional) – Override the breaks specified by this TenFrame.

Returns:

A “weighted” aggregation TenSeries using function fun and weight column other. This calls weighted_agg([(self.columnName, fun, other)], breaks=breaks) on the underlying TenFrame and returns the TenSeries corresponding to the aggregated version of this TenSeries.

Return type:

TenSeries

where(*args, **kwargs)#

Replace values in this column where a condition is False.

The 1010data server will raise an error if a column’s underlying type would be changed by the replacement.

See the TenFrame method TenFrame.where().

property breaks#

The breaks attribute of the underlying TenFrame.

property column#

The py1010 Column object represented by this TenSeries.

In “offline” mode, returns a “fake” column object which reports its type to be ‘i’.

property ndim#

Number of dimensions.

Always 1 for a TenSeries.

property plot#

Access plotting functionality.

Equivalent to:

self.frame[[self.columnName]].plot

Accesses the TenFrame.plot() method on a copy of this TenSeries’ underlying TenFrame, restricted to just this column.

property temporary#

Is this a “temporary” column?

Temporary columns are removed the next time a new column is added to a TenFrame, if no other column in the frame depends on them. This property only makes sense for columns created as computations on other columns (“<willbe> columns”), not for original columns from the table or from joins or tabulations. Columns for which this property is not meaningful return None. Columns for which it is meaningful return True or False depending on whether the column is marked as temporary, without regard to whether other columns depend on it.

property type#

The type of this column.

Slightly smarter than self.column.type: returns “date”, “time”, or “datetime” if it can infer this from the column’s format_type. Otherwise returns self.column.type.

In offline mode, always returns ‘i’.

class tenFrame.Library(session, path)#

Bases: TenFrame

A TenFrame that only has a library.

Load or initialize a library.

Creates a Library, a kind of TenFrame that has no ops, only definitions in its library. Unlike ordinary TenFrames, a Library is run as soon as it is initialized.

Parameters:
  • session (py1010.Session) – A py1010 Session object.

  • path (str) – A string providing the path to a Query in the 1010data Object Tree. This query should provide functions/definitions in its <library> op (all other ops will be discarded). If no query is found at this pathname, an empty Library is created, to be saved to the named path with the .save() method.

def_gfun(*args, **kwargs)#

def_gfun decorator which forces running the library.

def_ufun(*args, **kwargs)#

def_ufun decorator which forces running the library.

run(force=True)#

Run this Library.

By default, a Library run is forced (force=True), since the ops of a Library don’t change.

save(path=None, *args, **kwargs)#

Save this Library.

Same as TenFrame.save, except:
  1. The path is optional and defaults to the path from this Library’s construction,

  2. The “materialize” option is silently ignored, and

  3. The “force” option defaults to True.

class tenFrame.Op(opname, _contents=None, **kwargs)#

Bases: dict

A 1010data macro language operation.

This is really a representation of a generic XML element, with a name, attributes, and optionally contents. It’s a subclass of dict: the attributes of the element are the key=value entries of the dictionary, while the name and the contents are stored under special keys that are illegal as XML attribute names (because they contain spaces). The contents of an Op (op[' contents ']) is expected to be a list of Ops (or strings). The class also handles some column renaming and replacement in strings.

The following properties are shortcuts for setting/accessing the special elements:

Variables:
  • opname – The name of this operator. Equivalent to the '␣opname␣' member of the dictionary. Read-only, set at construction-time.

  • contents – A list of ops (or possibly strings) that constitute the contents of this XML element. Equivalent to the '␣contents␣' member of the dictionary.

  • tempCol – A boolean value indicating whether or not this operator represents a “temporary” column which may be replaced by its value in some cases. Equivalent to the '␣tempCol␣' member of the dictionary.

Constructor for Op class.

Parameters:
  • opname – Name of the operator/XML tag. Must be a legal XML tag-name string.

  • _contents – List containing the contents of the op, if any. Note the leading underscore.

  • **kwargs – Any key=value attributes for the XML element.

__str__()#

Convert this Op to XML.

Returns:

An XML representation of this Op.
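
To make the dict-based design concrete, here is a minimal sketch of an element class built the same way. MiniOp is a hypothetical, simplified stand-in written for this illustration; it is not tenFrame's implementation and it omits tempCol as well as the column renaming and replacement features. The key names and the escaping strategy are assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


class MiniOp(dict):
    """Hypothetical, simplified stand-in for an Op-like element (not tenFrame's code)."""

    # Keys containing spaces cannot clash with real XML attribute names.
    NAME_KEY = " opname "
    CONTENTS_KEY = " contents "

    def __init__(self, opname, _contents=None, **kwargs):
        super().__init__(**kwargs)
        self[self.NAME_KEY] = opname
        self[self.CONTENTS_KEY] = list(_contents or [])

    def __str__(self):
        # Every key without a space is treated as an ordinary XML attribute.
        attrs = "".join(
            f" {key}={quoteattr(str(value))}"
            for key, value in self.items()
            if " " not in key
        )
        name = self[self.NAME_KEY]
        children = self[self.CONTENTS_KEY]
        if not children:
            return f"<{name}{attrs}/>"
        # Child ops render recursively; plain strings are escaped as text.
        body = "".join(
            str(c) if isinstance(c, MiniOp) else escape(str(c)) for c in children
        )
        return f"<{name}{attrs}>{body}</{name}>"


sort_op = MiniOp("sort", col="sales", dir="down")
willbe = MiniOp("willbe", name="total", value="price*qty")
print(MiniOp("block", [sort_op, willbe]))
```

Storing the name and contents in the same dictionary as the attributes keeps the element a plain dict, so attributes can be read and updated with ordinary subscripting.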

tenFrame.that = ((FAKE TenFrame))<base table="default.lonely"/>#

A “fake” TenFrame.

Used to implement the that special variable. “Compatible” with all TenFrames, but always uses the other one’s ops when combining.

Usually used as a way to avoid having to give a TenFrame again, or in some “generic” sense. Examples:

s2019.groupby("date").xsales.mean()[tf.that.date == datetime.date(2019, 4, 2)]
t3 = t1.merge(t2, on="id")[tf.that.width > 18.0]

Users should not be creating these objects directly.

tenFrame.func = <tenFrame.tenFrame.FunctionFaker object>#

Used to return FakeFunctions when needed.

This class implements the module-level func object, making func.function() work for any arbitrary function name. Essentially, func.name(x, y, z) becomes a function-call in the XML that looks like name((x);(y);(z)). Functions whose names start with g_ are treated specially, to make g_fun expressions.

Users should not be creating these objects directly.
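
The func.name(x, y, z) behaviour described above can be approximated with Python's __getattr__ hook. The class below is a hypothetical sketch, not tenFrame's FunctionFaker: it returns plain strings, and it does not reproduce the special handling of names starting with g_.

```python
class CallFormatter:
    """Hypothetical sketch of attribute-based call building (not tenFrame's code)."""

    def __getattr__(self, name):
        # __getattr__ runs only for attributes that do not otherwise exist,
        # so any function name can be used without being declared first.
        def build(*args):
            # Wrap each argument in parentheses and join with semicolons.
            inner = ";".join(f"({a})" for a in args)
            return f"{name}({inner})"

        return build


fn = CallFormatter()
print(fn.strcat("first", "last"))  # strcat((first);(last))
```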