Classes Involved in Uploading and Downloading#

py1010.AWSKey#

alias of CloudKey

class py1010.SourceInfo(files=None, rectype=None, sep=None, eor=None, maskw=None, mchr=None, arch=None, format=None, long begbytes=0, long begrecs=0, long numrecs=0, int autoCorrect=0, int numCols=0, ptr=None, *)#

Class describing the format of a data source and the files that make it up.

A “source” for 1010data is a description of some file outside of the 1010data object tree. A source is described by a SourceInfo object, which specifies features such as the file format and the column separators. A SourceInfo contains one or more SourceFile objects, which specify the locations of the actual files (FTP upload directories or cloud storage services).

Since a “source” describes an external resource, the same objects are used as “destinations” to describe files and formats for writing output.

SourceInfo objects contain metadata about the format of an external file. Several of the fields are meant to hold values from special enumeration classes, which are internal classes of SourceInfo. For example, rectype can be SourceInfo.RecType.SEPARATED or SourceInfo.RecType.FIXED. See below and the individual docstrings.

Variables:
  • sourceType – Type of source (FTP, S3, etc.)

  • sep – Column separator

  • eor – Record separator

  • maskw – Max length of variable-width columns

  • mchr – “Masking” character

  • arch – Architecture: little-endian or big-endian

  • format – Type of file (“xlsx” or empty)

  • begbytes – Number of bytes to skip at the start

  • begrecs – Number of records to skip at the start

  • numrecs – Number of records to upload (0 for all)

  • autoCorrect – Enable simple autocorrection feature?

  • truncate – Autocorrect truncate control

  • pad – Autocorrect pad control

  • fix_mask – Autocorrect fix-mask control

  • numCols – Number of columns

  • ignoreNull – Replace NUL ('\0') characters with ' ' (space)?

Constructor for SourceInfo objects.

Parameters:
  • files – A list of SourceFile objects (or strings, which are taken to be filenames in an FTP directory)

  • rectype – Record type: RecType.SEPARATED or RecType.FIXED or None

  • sep – Column separator

  • eor – Record separator

  • maskw – Max width of variable columns

  • mchr – “Masking” character

  • arch – Architecture: Arch.BENDIAN or Arch.LENDIAN or None

  • format – Either “xlsx” or None (default) for text files

  • begbytes – Bytes to skip at the beginning (default 0)

  • begrecs – Records to skip at the beginning (default 0)

  • numrecs – Number of records to load (default 0, for “all”)

  • autoCorrect – Enable autoCorrect? (default 0 (False))

  • numCols – Number of columns
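
As a rough sketch of how the parameters above fit together, the following collects keyword arguments for a comma-separated text file with one header row. The helper name csv_source_kwargs and the file name are hypothetical, passing plain str values for sep and eor is an assumption, and the constructor call is guarded so that the snippet also runs where py1010 is not installed.

```python
def csv_source_kwargs(filenames, header_rows=1):
    """Collect SourceInfo keyword arguments for a comma-separated text file.

    Hypothetical helper: only the keyword names come from the documented
    constructor signature.
    """
    return {
        "files": list(filenames),  # strings are taken to be FTP filenames
        "sep": ",",                # column separator (str is an assumption)
        "eor": "\n",               # record separator
        "begrecs": header_rows,    # records to skip at the beginning
        "numrecs": 0,              # 0 means "all"
    }


kwargs = csv_source_kwargs(["sales.csv"])  # hypothetical file name

try:
    import py1010
except ImportError:
    py1010 = None  # the dictionary above can still be inspected

if py1010 is not None:
    # rectype uses the enumeration nested inside SourceInfo.
    source = py1010.SourceInfo(
        rectype=py1010.SourceInfo.RecType.SEPARATED,
        **kwargs,
    )
```

Keeping the arguments in a dictionary makes it easy to log or compare the settings before a SourceInfo is actually built.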

class Arch(value)#

An enumeration.

class AutoCorrectType(value)#

An enumeration.

class RecType(value)#

An enumeration.

class SrcType(value)#

An enumeration.

getWorksheets(self, Session s)#

Run the getworksheets transaction on this SourceInfo object (which should describe a .xlsx source) using the supplied session.

__init__(self, files=None, rectype=None, sep=None, eor=None, maskw=None, mchr=None, arch=None, format=None, long begbytes=0, long begrecs=0, long numrecs=0, int autoCorrect=0, int numCols=0)#

Initialize fields on construction.

arch#

Architecture or “endianness” of this source. May be Arch.BENDIAN (big-endian) or Arch.LENDIAN (little-endian) or None (unspecified).

autoCorrect#

Specify “simple” autocorrection: True or False.

begbytes#

Bytes to skip at the beginning.

begrecs#

Records to skip at the beginning.

eor#

Row separator for this Source.

fix_mask#

Autocorrect fix-mask control, for delimited columns only.

Set to AutoCorrectType.NONE, AutoCorrectType.LEFT, AutoCorrectType.RIGHT, AutoCorrectType.LONG, or AutoCorrectType.SHORT.

format#

File format of this Source. May be None or “” (for text files) or “xlsx”.

ignoreNull#

Replace NUL (’\0’) characters with spaces? Set to True, False, or None (unspecified, default).

maskw#

The “masking width,” or the maximum width of variable-length columns in this source. Default 10000.

mchr#

Masking character for this Source.

numCols#

Number of columns in input data.

numFiles#

Number of SourceFiles in this Source.

This property may not be set directly.

numrecs#

Number of records to read.
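
The way begrecs and numrecs select records can be pictured with ordinary list slicing. This is only an illustration in plain Python, not the library's implementation: select_records is a hypothetical name, and it assumes that numrecs counts records after the skipped ones and that 0 means “all”, as the parameter list states.

```python
def select_records(records, begrecs=0, numrecs=0):
    """Return the records that would be read under the assumed semantics."""
    remaining = records[begrecs:]  # skip the first begrecs records
    # numrecs == 0 is documented as "all"; otherwise cap the count.
    return remaining if numrecs == 0 else remaining[:numrecs]


rows = ["id,name", "1,ann", "2,bob", "3,cy"]
print(select_records(rows, begrecs=1))             # skips the header row
print(select_records(rows, begrecs=1, numrecs=2))  # header skipped, two rows kept
```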

pad#

Autocorrect pad control.

Set to AutoCorrectType.NONE, AutoCorrectType.RIGHT, or AutoCorrectType.LEFT.

rectype#

Record type for this Source.

May be RecType.SEPARATED or RecType.FIXED or None (unspecified).

sep#

Column separator for this Source.

sourceType#

Type of this source.

May be SrcType.S3, SrcType.ABS, or SrcType.FTP. This property is not set directly; it is determined by whether the first SourceFile has its “bucket” property (S3) or its “container” property (ABS) set.

(SourceFiles of different types may not be combined in the same SourceInfo.)
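
The rule described above can be sketched in plain Python. This is an illustration only, not the library's code: FileLocation is a stand-in for the relevant SourceFile attributes, the function names are hypothetical, “set” is taken to mean “not None”, and the string labels stand in for the SrcType members.

```python
from collections import namedtuple

# Stand-in carrying only the attributes that decide the source type.
FileLocation = namedtuple("FileLocation", ["path", "bucket", "container"])


def location_type(f):
    """Classify one file location by which storage attribute is set."""
    if f.bucket is not None:
        return "S3"
    if f.container is not None:
        return "ABS"
    return "FTP"


def infer_source_type(files):
    """Take the type from the first file and reject mixed lists."""
    if not files:
        raise ValueError("at least one file is required")
    first = location_type(files[0])
    for f in files[1:]:
        if location_type(f) != first:
            raise ValueError("SourceFiles of different types may not be combined")
    return first


print(infer_source_type([FileLocation("a.csv", "my-bucket", None)]))
print(infer_source_type([FileLocation("a.csv", None, None)]))
```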

truncate#

Autocorrect truncate control.

Set to AutoCorrectType.NONE, AutoCorrectType.RIGHT, or AutoCorrectType.LEFT.

class py1010.SourceFile(path, bucket=None, keyname=None, sheetID=None, range=None, account=None, container=None, ptr=None, *)#

Class describing an individual file as a data source.

A “source” for 1010data is a description of some file outside of the 1010data object tree. A source is described by a SourceInfo object, which specifies features such as the file format and the column separators. A SourceInfo contains one or more SourceFile objects, which specify the locations of the actual files (FTP upload directories or cloud storage services).

Since a “source” describes an external resource, the same objects are used as “destinations” to describe files and formats for writing output.

Construct a SourceFile object.

The SourceFile contains location information for a file outside of 1010 (in an FTP upload directory, an S3 bucket, or in ABS).

Parameters:
  • path – The filename of the file.

  • bucket – The S3 bucket, for files in S3. Leave as None for files in FTP or ABS.

  • keyname – The name assigned to the AWS key to use to access the file. See the Session.addKey() method of Session objects. Leave as None for files in FTP.

  • sheetID – To specify a worksheet in an XLSX workbook, pass the sheet’s ID here (as returned by the getworksheets transaction).

  • range – For specifying a cell range in an XLSX worksheet.

  • account – The ABS account to be used. Leave as None for files in FTP or S3 storage.

  • container – The ABS container to be used. Leave as None for files in FTP or S3 storage.
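
To make the three location styles concrete, the sketch below builds the keyword arguments that each kind of file would use, following the “leave as None” guidance above. The helper names, file names, bucket, key name, account, and container are all hypothetical, and the constructor call is guarded so the snippet runs without py1010 installed.

```python
def ftp_file_kwargs(path):
    """FTP upload directory: only the filename is given."""
    return {"path": path}


def s3_file_kwargs(path, bucket, keyname):
    """S3: bucket plus the name of a key registered with the session."""
    return {"path": path, "bucket": bucket, "keyname": keyname}


def abs_file_kwargs(path, account, container):
    """ABS: account and container; bucket stays None."""
    return {"path": path, "account": account, "container": container}


examples = [
    ftp_file_kwargs("sales.csv"),
    s3_file_kwargs("sales.csv", "example-bucket", "example-key"),
    abs_file_kwargs("sales.csv", "exampleaccount", "example-container"),
]

try:
    import py1010
except ImportError:
    py1010 = None  # the dictionaries above can still be inspected

if py1010 is not None:
    source_files = [py1010.SourceFile(**kw) for kw in examples]
```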

__init__(self, path, bucket, keyname, sheetID, range, account, container)#

Set object attributes on construction.

account#

The account to use on ABS to access the file, or None.

bucket#

The S3 bucket containing the file (or None).

container#

The container to use on ABS to access the file, or None.

keyname#

The user-assigned name of the AWS key to use to access the file, or None.

path#

The filename of the file.

range#

A range of cells in an XLSX worksheet which this object refers to, if relevant.

(Implementation note: this property will not contain the value None. If you set it to None, that really sets it to b’’)

sheetID#

The sheetID of the worksheet this object refers to, within an XLSX workbook, if relevant.

(Implementation note: this property will not contain the value None. If you set it to None, that really sets it to b’’)

class py1010.SourceColumnInfo(name=None, title=None, type_=None, format=None, int width=0, exp=None, double scale=0.0, int alpha=Alpha.SKIP, int order=0, int skip=Skip.NOSKIP, int nowrite=Write.WRITE, ptr=None, *)#

Class for holding metadata about a column in a source to be uploaded.

Several of the fields are meant to hold values from special enumeration classes, which are internal classes of SourceColumnInfo. For example, the “nowrite” parameter can be SourceColumnInfo.Write.WRITE or SourceColumnInfo.Write.NOWRITE. See below and the individual docstrings.

Variables:
  • name – Column name

  • title – Column title

  • type_ – Type of column: a string (or bytes): “text”, “int”, “float”, or “bigint”

  • format – Column format descriptor (string)

  • width – Width of column; 0 (default) for no input width.

  • exp – Expression to be applied to this column before upload

  • scale – Decimal value by which to divide this column before upload. 0.0 (default) for none

  • alpha – Alphabetic case into which to force this column before upload. One of Alpha.UPPER, Alpha.LOWER, or Alpha.SKIP (default)

  • order – Positive integer for the position of this column in a reordering; 0 (default) for no reordering.

  • skip – Skip this column or not? One of Skip.SKIP or Skip.NOSKIP (default)

  • nowrite – Write this column? One of Write.WRITE (default) or Write.NOWRITE
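
The effect of the scale and alpha settings can be pictured with two small functions. This is a plain-Python illustration of the descriptions above, not the platform's implementation: the function names are hypothetical, the strings "UPPER", "LOWER", and "SKIP" stand in for the Alpha members, and the platform's exact numeric handling may differ.

```python
def apply_scale(value, scale=0.0):
    """Divide a numeric value by scale; 0.0 means no scaling."""
    return value if scale == 0.0 else value / scale


def apply_alpha(text, alpha="SKIP"):
    """Force text to upper or lower case, or leave it unchanged."""
    if alpha == "UPPER":
        return text.upper()
    if alpha == "LOWER":
        return text.lower()
    if alpha == "SKIP":
        return text
    raise ValueError(f"unknown alpha setting: {alpha!r}")


print(apply_scale(1250, scale=100.0))    # prints 12.5
print(apply_alpha("New York", "UPPER"))  # prints NEW YORK
```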

class Alpha(value)#

An enumeration.

class Skip(value)#

An enumeration.

class Write(value)#

An enumeration.

alpha#

Alphabetic case into which to force this column prior to upload.

One of SourceColumnInfo.Alpha.UPPER, SourceColumnInfo.Alpha.LOWER, or SourceColumnInfo.Alpha.SKIP.

exp#

Column expression.

format#

Column format.

name#

Name of the column.

nowrite#

Whether or not to write this column.

Set to SourceColumnInfo.Write.WRITE or SourceColumnInfo.Write.NOWRITE

order#

A positive integer indicating this column’s position in a revised column order, or 0 for no reordering.

scale#

Column scale

Decimal value by which to divide the values in this column prior to upload, or 0 for none.

skip#

Whether or not to skip this column on loading.

Set to SourceColumnInfo.Skip.SKIP or SourceColumnInfo.Skip.NOSKIP

title#

Column title.

type#

Type of column.

A string (or bytes), one of: “text”, “int”, “float”, “bigint”

width#

Column width.

class py1010.TableInfo(name, int ID=0, title=None, sdesc=None, ldesc=None, type_=u'', int secure=0, int own=0, owner=None, update=None, int favorite=0, users=None, display=None, int report=0, int chart=0, link=u'', long numRows=0, long numBytes=0, int segs=0, int access=0, long maxdown=0, mode=Mode.REPLACE, stripe=None, stripe_factor=None, ptr=None, *)#

Class for holding metadata about a table as an upload target.

Holds data about a table for uploading (with addTableSpecs).

Several of the fields are meant to hold values from special enumeration classes, which are internal classes of TableInfo. For example, the mode can be TableInfo.Mode.APPEND, TableInfo.Mode.REPLACE, or TableInfo.Mode.NOREPLACE. See below and the individual docstrings.

class Mode(value)#

An enumeration.

class Perm(value)#

An enumeration.

class SegType(value)#

An enumeration.

class TimeSeries(value)#

An enumeration.

__init__(self, name, int ID=0, title=None, sdesc=None, ldesc=None, type_=u'', int secure=0, int own=0, owner=None, update=None, int favorite=0, users=None, display=None, int report=0, int chart=0, link=u'', long numRows=0, long numBytes=0, int segs=0, int access=0, long maxdown=0, mode=Mode.REPLACE, stripe=None, stripe_factor=None)#

access#

Boolean 1 or 0 indicating whether or not this table is accessible.

chart#

Boolean 1 or 0 indicating whether or not chart specifications are saved for this table.

favorite#

Boolean 1 or 0 indicating whether or not the transaction UID has favorited this table.

id#

Unique identifier for this table.

ldesc#

Long description of the table, if any.

link#

Link header of table, or NULL for no link header.

materialize#

Boolean 1 or 0 indicating whether or not this table is materialized.

maxdown#

Maximum download limit of table, or a non-positive integer for the default maxdown.

merge#

Boolean 1 or 0 indicating whether or not this table is appendable.

method#

Materialize method, or None for the default method.

mode#

Upload mode: TableInfo.Mode.APPEND, TableInfo.Mode.REPLACE, or TableInfo.Mode.NOREPLACE.

name#

Full path to the table.

numBytes#

Number of bytes in the table.

numCols#

Number of columns in this table.

numRows#

Number of rows in the table.

own#

Boolean 1 or 0 indicating whether or not the transaction UID is the owner of this table.

owner#

UID or groupname of the owner of this table, or None for the default owner.

report#

Boolean 1 or 0 indicating whether or not report specifications are saved for this table.

responsible#

Boolean 1 or 0 indicating whether or not the user is responsible for replication of data.

sdesc#

Short description of the table, if any.

secure#

Boolean 1 or 0 indicating whether or not this table is secure. Deprecated in API.

segmentation#

Comma-separated list of the names of segmentation columns.

segs#

Number of segments spanned by this table.

segsize#

Size of the segments of this table.

segtype#

Integer representing segmentation type of this table. Either TableInfo.SegType.SEGBY or TableInfo.SegType.SORTSEG. 0 if “segmentation” is None.

sort#

Comma-separated list of the names of sort columns.

stripe#

How many machines to stripe the data across.

stripe_factor#

Fraction of machines to stripe data across.

timeSeries#

Integer representing whether or not time-series segmentation is used for this table. Either TENTEN_TS or TENTEN_NOTS. 0 if “segmentation” is None.

title#

Title of the table, if any.

type#

Type of table. Currently, can be “REAL”, “VIEW”, “PARAM”, “MERGED”, “UQ”, or “TOLERANT”.

update#

Datetime of last modification to this table.

users#

users: object

class py1010.ColumnInfo(name, type_, title=None, desc=None, format=None, int index=Index.NOINDEX, int fix=Fix.NOFIX, ptr=None, *)#

Class for holding metadata about a column being uploaded into.

Fix#

alias of Index

class Index(value)#

An enumeration.

fromSourceCol(type cls, scol)#

Translate a SourceColumnInfo into a ColumnInfo by copying over some key fields.

__init__(self, name, type_, title=None, desc=None, format=None, int index=Index.NOINDEX, int fix=Fix.NOFIX)#

desc#

Description of the column

format#

Format of the column.

name#

Name of the column

title#

Title of the column, displayed when the table is viewed.

type#

Type of the column (“integer”, “yyyymmdd”, etc.)