Classes Involved in Uploading and Downloading#
- class py1010.SourceInfo(files=None, rectype=None, sep=None, eor=None, maskw=None, mchr=None, arch=None, format=None, long begbytes=0, long begrecs=0, long numrecs=0, int autoCorrect=0, int numCols=0, ptr=None, *)#
Class describing an individual file as a data source.
A “source” for 1010data is a description of some file outside of the 1010data object tree. A source is described by a SourceInfo object, which specifies features such as the format and column separators. A SourceInfo contains one or more SourceFile objects, which specify the locations of the actual files (FTP upload directories or cloud storage services).
Since a “source” describes an external resource, the same objects are used as “destinations” to describe files and formats for writing output.
SourceInfo objects contain metadata about the format of an external file. Several of the fields are meant to hold values from special enumeration classes, which are internal classes of SourceInfo. So the rectype can be SourceInfo.RecType.SEPARATED or SourceInfo.RecType.FIXED. See below, and individual docstrings.
- Variables:
sourceType – Type of source (FTP, S3, etc.)
sep – Column separator
eor – Record separator
maskw – Max length of variable-width columns
mchr – “Masking” character
arch – Architecture: little-endian or big-endian
format – Type of file (“xlsx” or empty)
begbytes – Number of bytes to skip at the start
begrecs – Number of records to skip at the start
numrecs – Number of records to upload (0 for all)
autoCorrect – Enable simple autocorrection feature?
truncate – Autocorrect truncate control
pad – Autocorrect pad control
fix_mask – Autocorrect fix-mask control
numCols – Number of columns
ignoreNull – Replace '\0' with ' ' (space)?
Constructor for SourceInfo objects.
- Parameters:
files – A list of SourceFile objects (or strings, which are taken to be filenames in an FTP directory)
rectype – Record type: RecType.SEPARATED or RecType.FIXED or None
sep – Column separator
eor – Record separator
maskw – Max width of variable columns
mchr – “Masking” character
arch – Architecture: Arch.BENDIAN or Arch.LENDIAN or None
format – Either “xlsx” or None (default) for text files
begbytes – Bytes to skip at the beginning (default 0)
begrecs – Records to skip at the beginning (default 0)
numrecs – Number of records to load (default 0, for “all”)
autoCorrect – Enable autoCorrect? (default 0 (False))
numCols – Number of columns
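As a minimal sketch of how these constructor parameters fit together, the helper below describes comma-separated text files that begin with one header row. The helper name `make_csv_source` and the file name are hypothetical, and passing the separators as plain Python strings is an assumption (the library may expect bytes). Only the keyword names and `RecType.SEPARATED` come from the parameter list above. The class is passed in as an argument so the sketch does not depend on how py1010 is imported.

```python
def make_csv_source(source_info_cls, filenames):
    """Describe comma-separated text files that start with one header row.

    source_info_cls is expected to be py1010.SourceInfo; it is passed in
    rather than imported so this hypothetical helper stays importable anywhere.
    """
    return source_info_cls(
        files=list(filenames),                      # strings are taken as FTP filenames
        rectype=source_info_cls.RecType.SEPARATED,  # delimited, not fixed-width
        sep=",",    # column separator (assumed to be accepted as str)
        eor="\n",   # record separator (assumed to be accepted as str)
        begrecs=1,  # skip the header record
        numrecs=0,  # 0 means load all remaining records
    )


# Possible usage, assuming py1010 is installed:
#   import py1010
#   src = make_csv_source(py1010.SourceInfo, ["sales.csv"])
```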
- class Arch(value)#
An enumeration.
- class AutoCorrectType(value)#
An enumeration.
- class RecType(value)#
An enumeration.
- class SrcType(value)#
An enumeration.
- getWorksheets(self, Session s)#
Run the getworksheets transaction on this SourceInfo object (which should describe a .xlsx source) using the supplied session.
- __init__(self, files=None, rectype=None, sep=None, eor=None, maskw=None, mchr=None, arch=None, format=None, long begbytes=0, long begrecs=0, long numrecs=0, int autoCorrect=0, int numCols=0)#
Initialize fields on construction.
- arch#
Architecture or “endianness” of this source. May be Arch.BENDIAN (big-endian) or Arch.LENDIAN (little-endian) or None (unspecified).
- autoCorrect#
Specify “simple” autocorrection: True or False.
- begbytes#
Bytes to skip at the beginning.
- begrecs#
Records to skip at the beginning.
- eor#
Row separator for this Source.
- fix_mask#
Autocorrect fix-mask control, for delimited columns only.
Set to AutoCorrectType.NONE, AutoCorrectType.LEFT, AutoCorrectType.RIGHT, AutoCorrectType.LONG, or AutoCorrectType.SHORT.
- format#
File format of this Source. May be None or “” (for text files) or “xlsx”.
- ignoreNull#
Replace NUL ('\0') characters with spaces? Set to True, False, or None (unspecified, default).
- maskw#
The “masking width,” or the maximum width of variable-length columns in this source. Default 10000.
- mchr#
Masking character for this Source.
- numCols#
Number of columns in input data.
- numFiles#
Number of SourceFiles in this Source.
This property may not be set directly.
- numrecs#
Number of records to read.
- pad#
Autocorrect pad control.
Set to AutoCorrectType.NONE, AutoCorrectType.RIGHT, or AutoCorrectType.LEFT.
- rectype#
Record type for this Source.
May be RecType.SEPARATED or RecType.FIXED or None (unspecified).
- sep#
Column separator for this Source.
- sourceType#
Type of this source.
May be SrcType.S3, SrcType.ABS, SrcType.GCS, or SrcType.FTP. This property is not set directly, but is determined by the sourceType of the first SourceFile. (SourceFiles of different types may not be combined in the same SourceInfo.)
- truncate#
Autocorrect truncate control.
Set to AutoCorrectType.NONE, AutoCorrectType.RIGHT, or AutoCorrectType.LEFT.
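The skip and limit fields documented for SourceInfo (begbytes, begrecs, numrecs, eor, ignoreNull) can be modelled on an in-memory byte string. The sketch below is only an illustration of what the field descriptions say. The function name `select_records` is hypothetical, and the order of operations (bytes skipped first, then NUL replacement, then record skipping) is an assumption, not a description of what the 1010data loader actually does.

```python
def select_records(data, eor=b"\n", begbytes=0, begrecs=0, numrecs=0,
                   ignore_null=False):
    """Return the records a loader would keep, under the assumptions above."""
    body = data[begbytes:]                      # begbytes: bytes to skip at the start
    if ignore_null:
        body = body.replace(b"\x00", b" ")      # ignoreNull: NUL becomes a space
    records = body.split(eor)                   # eor: record separator
    if records and records[-1] == b"":
        records.pop()                           # drop the empty piece after a trailing eor
    records = records[begrecs:]                 # begrecs: records to skip at the start
    if numrecs > 0:                             # numrecs: 0 means "all"
        records = records[:numrecs]
    return records


raw = b"# junk\nid,name\n1,a\x00b\n2,c\n3,d\n"
kept = select_records(raw, begbytes=7, begrecs=1, numrecs=2, ignore_null=True)
print(kept)
```

Here the first seven bytes (a comment line) and the header record are skipped, and only two data records are kept.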
- class py1010.SourceFile(path, bucket=None, keyname=None, sheetID=None, range=None, account=None, container=None, sourcetype=None, ptr=None, *)#
Class describing an individual file as a data source.
A “source” for 1010data is a description of some file outside of the 1010data object tree. A source is described by a SourceInfo object, which specifies features such as the format and column separators. A SourceInfo contains one or more SourceFile objects, which specify the locations of the actual files (FTP upload directories or cloud storage services).
Since a “source” describes an external resource, the same objects are used as “destinations” to describe files and formats for writing output.
Construct a SourceFile object.
The SourceFile contains location information for a file outside of 1010 (in an FTP upload directory, an S3/GCS bucket, or in ABS).
- Parameters:
path – The filename of the file.
bucket – The S3 bucket, for files in S3 or GCS. Leave as None for files in FTP or ABS.
keyname – The name assigned to the AWS key to use to access the file. See the Session.addKey() method of Session objects. Leave as None for files in FTP.
sheetID – To specify a worksheet in an XLSX workbook, pass the sheet’s ID here (as returned by the getworksheets transaction).
range – For specifying a cell range in an XLSX worksheet.
account – The ABS account to be used. Leave as None for files in FTP or S3/GCS storage.
container – The ABS container to be used. Leave as None for files in FTP or S3/GCS storage.
sourcetype – The type of source, a value from the SourceInfo.SrcType enumeration. Defaults to None, in which case the value is inferred from the other data given: if the bucket parameter is non-empty, the value will be SourceInfo.SrcType.S3. If the container is non-empty, the value will be SourceInfo.SrcType.ABS. Otherwise, the value will be SourceInfo.SrcType.FTP. Note that any other value (SourceInfo.SrcType.GCS) must be passed in explicitly (this is for backward compatibility).
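The inference rule described for the sourcetype parameter can be sketched in plain Python. This is an illustration of the documented rule only, not the library's code: the `SrcType` enum below is a stand-in with assumed member values, and `infer_source_type` is a hypothetical name.

```python
from enum import Enum


class SrcType(Enum):
    """Stand-in for SourceInfo.SrcType; the member values are assumptions."""
    FTP = "ftp"
    S3 = "s3"
    GCS = "gcs"
    ABS = "abs"


def infer_source_type(bucket=None, container=None, sourcetype=None):
    """Mirror the documented defaulting of the sourcetype parameter."""
    if sourcetype is not None:
        return sourcetype      # an explicit value always wins (required for GCS)
    if bucket:
        return SrcType.S3      # non-empty bucket implies S3
    if container:
        return SrcType.ABS     # non-empty container implies ABS
    return SrcType.FTP         # otherwise fall back to FTP


print(infer_source_type(bucket="my-bucket"))
print(infer_source_type(bucket="my-bucket", sourcetype=SrcType.GCS))
```

One consequence of the rule is that a GCS file described only by its bucket would be treated as S3, which is why GCS has to be passed explicitly.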
- __init__(self, path, bucket, keyname, sheetID, range, account, container, sourcetype)#
Set object attributes on construction.
- account#
The account to use on ABS to access the file, or None.
- bucket#
The S3 bucket containing the file (or None).
- container#
The container to use on ABS to access the file, or None.
- keyname#
The user-assigned name of the AWS key to use to access the file, or None.
- path#
The filename of the file.
- range#
A range of cells in an XLSX worksheet which this object refers to, if relevant.
(Implementation note: this property will not contain the value None. If you set it to None, that really sets it to b''.)
- sheetID#
The sheetID of the worksheet this object refers to, within an XLSX workbook, if relevant.
(Implementation note: this property will not contain the value None. If you set it to None, that really sets it to b''.)
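A short sketch of how the SourceFile constructor arguments might be combined for each kind of location follows. The helper names, bucket names, and paths are hypothetical. The classes are passed in as arguments (expected to be `py1010.SourceFile` and `py1010.SourceInfo.SrcType`) so the sketch does not depend on py1010 being importable, and whether GCS access needs a keyname is an assumption left to the caller.

```python
def ftp_file(source_file_cls, path):
    """A file in the FTP upload directory: only the path is given."""
    return source_file_cls(path)


def s3_file(source_file_cls, path, bucket, keyname):
    """A file in S3: the bucket implies the S3 source type."""
    return source_file_cls(path, bucket=bucket, keyname=keyname)


def gcs_file(source_file_cls, src_type_enum, path, bucket, keyname=None):
    """A file in GCS: the source type must be passed explicitly."""
    return source_file_cls(path, bucket=bucket, keyname=keyname,
                           sourcetype=src_type_enum.GCS)


def abs_file(source_file_cls, path, account, container):
    """A file in ABS: account and container replace the bucket."""
    return source_file_cls(path, account=account, container=container)


# Possible usage, assuming py1010 is installed:
#   import py1010
#   f = gcs_file(py1010.SourceFile, py1010.SourceInfo.SrcType,
#                "exports/sales.csv", "my-bucket")
```

Because SourceFiles of different types may not be combined in the same SourceInfo, files built by different helpers here would belong in separate SourceInfo objects.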
- class py1010.SourceColumnInfo(name=None, title=None, type_=None, format=None, int width=0, exp=None, double scale=0.0, int alpha=Alpha.SKIP, int order=0, int skip=Skip.NOSKIP, int nowrite=Write.WRITE, ptr=None, *)#
Class for holding metadata about a column in a source to be uploaded.
Several of the fields are meant to hold values from special enumeration classes, which are internal classes of SourceColumnInfo. So the “nowrite” parameter can be SourceColumnInfo.Write.WRITE or SourceColumnInfo.Write.NOWRITE. See below, and individual docstrings.
- Variables:
name – Column name
title – Column title
type_ – Type of column: a string (or bytes): “text”, “int”, “float”, or “bigint”
format – Column format descriptor (string)
width – Width of column; 0 (default) for no input width.
exp – Expression to be applied to this column before upload
scale – Decimal value by which to divide this column before upload. 0.0 (default) for none
alpha – Alphabetic case into which to force this column before upload. One of Alpha.UPPER, Alpha.LOWER, or Alpha.SKIP (default)
order – Positive integer for the position of this column in a reordering; 0 (default) for no reordering.
skip – Skip this column or not? One of Skip.SKIP or Skip.NOSKIP (default)
nowrite – Write this column? One of Write.WRITE (default) or Write.NOWRITE
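To make the scale and alpha fields concrete, here is a small plain-Python model of the two transformations as they are described above. It is an illustration only and not how 1010data applies them: the `Alpha` enum is a stand-in with assumed member values, and the function names are hypothetical.

```python
from enum import Enum


class Alpha(Enum):
    """Stand-in for SourceColumnInfo.Alpha; the member values are assumptions."""
    SKIP = 0
    UPPER = 1
    LOWER = 2


def apply_alpha(text, alpha=Alpha.SKIP):
    """Force the alphabetic case of a value, or leave it alone for SKIP."""
    if alpha is Alpha.UPPER:
        return text.upper()
    if alpha is Alpha.LOWER:
        return text.lower()
    return text


def apply_scale(number, scale=0.0):
    """Divide a value by scale; 0.0 (the default) means no scaling."""
    return number if scale == 0.0 else number / scale


print(apply_alpha("New York", Alpha.UPPER))
print(apply_scale(250, 100.0))
```

A scale of 100.0 is the kind of value one might use for amounts stored as integer cents in the source file.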
- class Alpha(value)#
An enumeration.
- class Skip(value)#
An enumeration.
- class Write(value)#
An enumeration.
- alpha#
Alphabetic case into which to force this column prior to upload.
One of SourceColumnInfo.Alpha.UPPER, SourceColumnInfo.Alpha.LOWER, or SourceColumnInfo.Alpha.SKIP.
- exp#
Column expression.
- format#
Column format.
- name#
Name of the column.
- nowrite#
Whether or not to write this column.
Set to SourceColumnInfo.Write.WRITE or SourceColumnInfo.Write.NOWRITE.
- order#
A positive integer indicating this column’s position in a revised column order, or 0 for no reordering.
- scale#
Column scale.
Decimal value by which to divide the values in this column prior to upload, or 0 for none.
- skip#
Whether or not to skip this column on loading.
Set to SourceColumnInfo.Skip.SKIP or SourceColumnInfo.Skip.NOSKIP.
- title#
Column title.
- type#
Type of column.
A string (or bytes), one of: “text”, “int”, “float”, “bigint”
- width#
Column width.
- class py1010.TableInfo(name, int ID=0, title=None, sdesc=None, ldesc=None, type_=u'', int secure=0, int own=0, owner=None, update=None, int favorite=0, users=None, display=None, int report=0, int chart=0, link=u'', long numRows=0, long numBytes=0, int segs=0, int access=0, long maxdown=0, mode=Mode.REPLACE, stripe=None, stripe_factor=None, ptr=None, *)#
Class for holding metadata about a table as an upload target.
Holds data about a table for uploading (with addTableSpecs).
Several of the fields are meant to hold values from special enumeration classes, which are internal classes of TableInfo. So the mode can be TableInfo.Mode.APPEND or TableInfo.Mode.REPLACE or TableInfo.Mode.NOREPLACE. See below, and individual docstrings.
- class Mode(value)#
An enumeration.
- class Perm(value)#
An enumeration.
- class SegType(value)#
An enumeration.
- class TimeSeries(value)#
An enumeration.
- __init__(self, name, int ID=0, title=None, sdesc=None, ldesc=None, type_=u'', int secure=0, int own=0, owner=None, update=None, int favorite=0, users=None, display=None, int report=0, int chart=0, link=u'', long numRows=0, long numBytes=0, int segs=0, int access=0, long maxdown=0, mode=Mode.REPLACE, stripe=None, stripe_factor=None)#
- access#
Boolean 1 or 0 indicating whether or not this table is accessible.
- chart#
Boolean 1 or 0 indicating whether or not chart specifications are saved for this table.
- favorite#
Boolean 1 or 0 indicating whether or not the transaction UID has favorited this table.
- id#
Unique identifier for this table.
- ldesc#
Long description of the table, if any.
- link#
Link header of table, or NULL for no link header.
- materialize#
Boolean 1 or 0 indicating whether or not this table is materialized.
- maxdown#
Maximum download limit of table, or a non-positive integer for the default maxdown.
- merge#
Boolean 1 or 0 indicating whether or not this table is appendable.
- method#
Materialize method, or None for the default method.
- mode#
Append or replace?
- name#
Full path to the table.
- numBytes#
Number of bytes in the table.
- numCols#
Number of columns in this table.
- numRows#
Number of rows in the table.
- own#
Boolean 1 or 0 indicating whether or not the transaction UID is the owner of this table.
- owner#
UID or groupname of the owner of this table, or None for the default owner.
- report#
Boolean 1 or 0 indicating whether or not report specifications are saved for this table.
- responsible#
Boolean 1 or 0 indicating whether or not the user is responsible for replication of data.
- sdesc#
Short description of the table, if any.
- secure#
Boolean 1 or 0 indicating whether or not this table is secure. Deprecated in API.
- segmentation#
Comma-separated list of the names of segmentation columns.
- segs#
Number of segments spanned by this table.
- segsize#
Size of the segments of this table.
- segtype#
Integer representing segmentation type of this table. Either TableInfo.SegType.SEGBY or TableInfo.SegType.SORTSEG. 0 if “segmentation” is None.
- sort#
Comma-separated list of the names of sort columns.
- stripe#
How many machines to stripe the data across.
- stripe_factor#
Fraction of machines to stripe data across.
- timeSeries#
Integer representing whether or not time-series segmentation is used for this table. Either TENTEN_TS or TENTEN_NOTS. 0 if “segmentation” is None.
- title#
Title of the table, if any.
- type#
Type of table. Currently, can be “REAL”, “VIEW”, “PARAM”, “MERGED”, “UQ”, or “TOLERANT”.
- update#
Datetime of last modification to this table.
- users#
users: object
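As a hedged sketch of describing an upload target, the helper below builds a TableInfo that appends to a table rather than replacing it (the constructor default is Mode.REPLACE). The helper name `make_append_target` and the table path in the usage comment are hypothetical. The class is passed in as an argument (expected to be `py1010.TableInfo`), and only keyword names from the constructor signature above are used.

```python
def make_append_target(table_info_cls, path, title=None, sdesc=None):
    """Describe a target table whose rows should be appended to."""
    return table_info_cls(
        path,                              # name: full path to the table
        title=title,                       # optional table title
        sdesc=sdesc,                       # optional short description
        mode=table_info_cls.Mode.APPEND,   # append instead of the default REPLACE
    )


# Possible usage, assuming py1010 is installed:
#   import py1010
#   target = make_append_target(py1010.TableInfo, "uploads.sales.daily",
#                               title="Daily sales")
```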
- class py1010.ColumnInfo(name, type_, title=None, desc=None, format=None, int index=Index.NOINDEX, int fix=Fix.NOFIX, ptr=None, *)#
Class for holding metadata about a column being uploaded into.
- class Index(value)#
An enumeration.
- classmethod fromSourceCol(cls, scol)#
Translate a SourceColumnInfo into a ColumnInfo by copying over some key fields.
- __init__(self, name, type_, title=None, desc=None, format=None, int index=Index.NOINDEX, int fix=Fix.NOFIX)#
- desc#
Description of the column.
- format#
Format of the column.
- name#
Name of the column.
- title#
Title of the column, displayed when the table is viewed.
- type#
Type of the column (“integer”, “yyyymmdd”, etc.)
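The fromSourceCol classmethod suggests a simple way to derive target columns from source columns. The sketch below applies it to a list; the helper name `target_columns` is hypothetical, the class is passed in as an argument (expected to be `py1010.ColumnInfo`), and exactly which fields fromSourceCol copies is not specified here.

```python
def target_columns(column_info_cls, source_columns):
    """Translate each SourceColumnInfo into a ColumnInfo via fromSourceCol."""
    return [column_info_cls.fromSourceCol(scol) for scol in source_columns]


# Possible usage, assuming py1010 is installed and source_cols is a list of
# py1010.SourceColumnInfo objects:
#   import py1010
#   cols = target_columns(py1010.ColumnInfo, source_cols)
```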