Basic usage#
Creating a new application with the 1010data Python SDK includes importing the library, establishing a session, submitting a query, receiving results, and cleaning up the session.
Setting up an application#
Your application setup includes importing any required libraries, including the 1010data Python SDK.
This example imports the Python SDK.
import py1010
Establishing a session#
A 1010data Insights Platform session is an instance of a particular user ID logged into the platform within a certain environment. A session is comprised of a dedicated set of resources that can serve a single request at a time.
A single platform session may have multiple Session
objects attached to it. When a user logs into the platform, if no session is active, a new one is created. If a session is currently active for that user, the login type determines if that session is possessed, ended and a new one is started, or fails to connect. Currently, each user ID can only have one platform session associated with it. For more information, see Login types.
In this example, the Session
object testSession
is created using py1010.Session
. This example uses the with
keyword to ensure the session is cleaned up after the query is run. For more information about cleaning up sessions, see Cleaning up a session.
The standard py1010.Session
object (for a single user) takes four parameters:
A URI (
gateway
)A valid Insights Platform user name (
username
)That user’s password (
password
)The login type, which is
py1010.POSSESS
in this example
Note
When using a SAM pool, the username
and password
parameters are the ID and password of the group owner. For more information, see Shared Access Management pools.
The Session
object also takes a number of optional parameters. For more information, see Optional Parameters.
If you are connecting via a proxy server, your gateway string contains your proxy server address and any credentials. For more information, see Connecting via proxy.
import py1010
import sys
gateway = "https://www2.1010data.com/cgi-bin/gw"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
Note
[USER_NAME]
and [PASSWORD]
are placeholders for valid Insights Platform user name and password.
Login types#
The login type determines what happens when the application establishes a connection to the platform and another session is currently active.
There are three valid login types: py1010.POSSESS
, py1010.KILL
, and py1010.NOKILL
.
py1010.POSSESS
attaches the new login to an existing session if one is active when the connection is established.py1010.KILL
ends an existing session and starts a new one when the connection is established.py1010.NOKILL
fails to connect if an existing session is active when the connection is established.
If no session is active, a new session is started.
Optional Parameters#
The Session
object can also take a number of optional parameters.
The Session
object:
Session(url, username, password, logintype, group=None, retry=1073741824, wait=10, logfile=None, authentication=TAG, **kwargs)
contains four parameters, url
, username
, password
, and logintype
, that are required to establish a 1010data Insights Platform session. For more information, see Establishing a session. It also contains a number of optional parameters, including group
, retry
, wait
, logfile
, and authentication
.
group
#
The group
parameter specifies the group name of a SAM pool group. By default, the value is None
, which means the session is a single-user login.
retry
#
The retry
parameter specifies the number of times that the application attempts to get a user ID from a SAM pool to log in before quitting. By default, retry
is set to 1073741824 attempts.
wait
#
The wait
parameter specifies the amount of time to wait between attempts. The time is given in seconds. By default, wait
is set to 10 seconds.
logFile
#
The logFile
parameter specifies a path, filename, or file handle where the log is written. By default, logFile
is set to None
, which means no log is written.
Unlike the enableLog()
method, the logFile
parameter captures log data from when a user is logging in. This is the only way to get a log of a failed login attempt.
authentication
#
The TAG
used for SSO login. TAG
is a string specifying the style of authentication. See your administrator for your organization’s custom tag, if one is being used.
**kwarg
#
The **kwarg
parameter is a Python variable enabling you to pass keyword arguments into a function. These named parameters include, but are not limited to:
timeout(connectTimeout, timeout)
#
Sets the connection timeout parameters. The connectTimeout
argument specifies the number of seconds for establishing a connection before timing out. The timeout
argument specifies the number of seconds to keep a connection open before timing out and attempting to reconnect.
reuseConnection
#
Specifies if the same connection is used for each transaction. By default, the same connection is used. Set reuseConnection
to 0
to specify that a new connection should be established between each transaction.
lenient
#
Sets a default type for non-conforming columns to be cast to. By default, the column is not re-cast and the system returns an error. For example, if your query requests an integer column and a float column is returned, lenient can cast that column to the intended data type.
ignoreSSLErrors
#
Specifies whether or not to ignore an SSL error. By default, this parameter is set to FALSE
, which means an SSL error results in a failed log in attempt.
This is useful for debugging or during development, because you may want to ignore any SSL errors when logging in.
Connecting via proxy#
If your application is connecting to the Insights Platform via a proxy server, your gateway string needs to be modified to include the proxy server’s address.
Depending on your proxy server’s configuration, you may also need to include:
user name (
PROXY_USERNAME
)password (
PROXY_PASSWORD
)port (
PORT
)
The proxy server’s information is added to the beginning of the gateway string between square brackets ([]
).
[https://PROXY_USERNAME:PROXY_PASSWORD@
MYCORPORATEPROXYADDRESS:PORT]https://www2.1010data.com/cgi-bin/gw
If you’re unsure of whether or not you need to specify a proxy server, or you need your proxy server credentials, contact your corporate technology support team or IT department.
Note
This example does not use square brackets ([]
) to denote variables.
Examples#
Some proxy servers will require a full set of credentials, for example:
[https://aHamilton:g3n3ricPWD@my.corporate.proxy:8080]https://www2.1010data.com/cgi-bin/gw
A proxy server may only require a user name, for example:
[https://hGranger@my.corporate.proxy:8080]https://www2.1010data.com/cgi-bin/gw
A proxy server may not require any credentials or a port number, for example:
[https://my.corporate.proxy]https://www2.1010data.com/cgi-bin/gw
Running a query#
Your application should act as an intermediary between your users and the platform to pass in a query and receive the queried data.
Gathering specific sets of data or scoping your data should occur within queries. Typically, the analytical work is done by the Macro Language query specified by the Query
object. For more information on writing queries using the Insights Platform Macro Language see the 1010data Reference Manual.
In the example, the new Query
object (exampleQuery
) is created using the py1010.Session.query()
method which takes two parameters, the table path and the XML query. As a best practice, longer queries should be stored in a separate file.
You use the run
method, which is part of the Query
class, to run queries.
import py1010
import sys
gateway = "https://www2.1010data.com/cgi-bin/gw"
tablename = "pub.demo.weather.stations"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
exampleQuery = testSession.query(tablename, '<sel value="(elev>10)"/>')
exampleQuery.run()
Note
[USER_NAME]
and [PASSWORD]
are placeholders for valid Insights Platform user name and password.
Accessing query results#
After querying the 1010data Insights Platform, your application receives a set of results, which is accessible through the Query
object.
After the query is run, the Query
object is the results of the query. Your results are a set of Column
objects. Column
objects contain the values in the column as a list that can be iterated over and indexed.
This example uses the row iterator exampleQuery.rows to print each row in the results. You can access an individual row by specifying an index (e.g., exampleQuery.rows[100]
).
import py1010
import sys
gateway = "https://www2.1010data.com/cgi-bin/gw"
tablename = "pub.demo.weather.stations"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
exampleQuery = testSession.query(tablename, '<sel value="(elev>10)"/>')
exampleQuery.run()
for r in exampleQuery.rows:
print("\t".join([str(x) for x in r]))
Note
[USER_NAME]
and [PASSWORD]
are placeholders for valid Insights Platform user name and password.
Cleaning up a session#
If your application uses the with
keyword, the 1010ata Insights Platform session is cleaned up automatically at the end of the with
statement. If your application does not use with
, you need to clean up the platform session when you’ve completed the transaction.
The Session
class contains a py1010.Session.close()
method, which cleans up the session when it is invoked. Depending on the login type that was used to create the platform session, there are a few different ways the session may be cleaned up. If you are a single user logged in with py1010.POSSESS
, the platform session continues to run. If you’re a single user logged in with py1010.KILL
or py1010.NOKILL
, the platform session is terminated. A SAM pool user platform session is released back into the pool.
Cleaning up a session when it is no longer needed is particularly important if you have multiple users sharing a pool of IDs (SAM pool). Waiting for the garbage collector to clean up Session
objects may result in a pool that appears fully utilized, but is waiting for Session
objects to be destroyed. For more information about SAM pools, see Shared Access Management pools.
The most efficient and effective way to release user IDs back to the pool is to invoke the close
method, either by explicitly calling it or employing with
, which invokes it at the end of the statement.
Using the with
keyword:
import py1010
import sys
gateway = "https://www2.1010data.com/cgi-bin/gw"
tablename = "pub.demo.weather.stations"
username = "[USER_NAME]"
password = "[PASSWORD]"
with py1010.Session(gateway, username, password, py1010.POSSESS) as testSession:
exampleQuery = testSession.query(tablename, '<sel value="(elev>10)"/>')
exampleQuery.run()
for r in exampleQuery.rows:
print("\t".join([str(x) for x in r]))
Note
[USER_NAME]
and [PASSWORD]
are placeholders for valid Insights Platform user name and password.
Using the close
method:
import py1010
import sys
gateway = "https://www2.1010data.com/cgi-bin/gw"
tablename = "pub.demo.weather.stations"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
exampleQuery = testSession.query(tablename, '<sel value="(elev>10)"/>')
exampleQuery.run()
for r in exampleQuery.rows:
print("\t".join([str(x) for x in r]))
testSession.close()
Note
[USER_NAME]
and [PASSWORD]
are placeholders for valid Insights Platform user name and password.
Uploading from FTP#
The 1010data Python SDK enables you to upload tables to the 1010data Insights Platform using the load API.
The py1010.Session
class contains the py1010.Session.addTable()
method, which loads data to the Insights Platform from an FTP account. Typically this method is used for very large tables containing a lot of data.
addTable(self, spec, sync=False)
The addTable
method requires the spec
parameter. The spec
refers to an XML string specifying information about the upload. For more information, refer to addtab in the 1010data API Reference Manual. You can use the py1010.Session.autoSpec()
method to create a spec file from a few rows of data. The XML generated by the autoSpec
method may need to be edited afterward.
The sync
parameter specifies if the load is done synchronously. If you set this to True
, addTable
waits until the load is completed, or an error occurs, to return the name of the new table. False
specifies that addTable
should return the name of the table immediately, before it is fully loaded to the Insights Platform.
To get the status of an asynchronous load, you can use the py1010.Session.addTableStatus()
method. This method returns a status number (0-5) and the number of records that have been uploaded so far.
Uploading a parquet file#
You can upload a parquet file to 1010data from an existing local file, from an FTP site, or from a cloud provider (Amazon S3 or Azure Blob Storage).
Uploading a local file#
The following example creates a parquet file from an existing csv local file, uploads the parquet file to the FTP server, and then uploads the file from the FTP server to 1010data. The pandas method DataFrame.to_parquet() converts the pandas dataframe to a parquet file. Then, py1010.Session.uploadToFTP()
uploads the parquet file to the FTP site. To upload from the FTP site to 1010data, you must provide the method py1010.Session.addTable()
with a spec parameter. The spec parameter contains the file format ("parquet"
), the byte order ("little"
), and the path name of the 1010data table ("path.to.target.name"
).
import pandas as pd
import py1010
#create parquet file from an existing csv
#csv should have a header row of column names
df = pd.read_csv("basemaster.csv")
df.to_parquet("basemaster.parquet")
#create 1010data session
gateway = "https://www2.corp.1010data.com/beta-latest/gw"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
#upload parquet file to FTP server
ftp_file = "basemaster.parquet"
testSession.uploadToFTP(ftp_file, pw)
#upload parquet file from FTP to database
#optionally omit <name> to upload to a random table in "uploads" directory
spec = f'''
<spec>
<source><file format="parquet">{ftp_file}</file></source>
<name>path.to.target.table</name>
<arch>little</arch>
</spec>
'''
testSession.addTable(spec)
Uploading from FTP#
If your parquet file is already on the FTP site, the following code uploads the file from the FTP site to 1010data:
import pandas as pd
import py1010
#create 1010data session
gateway = "https://www2.corp.1010data.com/beta-latest/gw"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
remote_file = "baseball.parquet"
#optionally omit <name> to upload to a random table in "uploads" directory
spec = f'''
<spec>
<source><file format="parquet">{remote_file}</file></source>
<name>path.to.target.table</name>
<arch>little</arch>
</spec>
'''
testSession.addTable(spec)
Uploading from Amazon S3#
The following example uploads a parquet file from Amazon S3. The spec must include the S3 api_key
, the S3 bucket
, and the format
.
import pandas as pd
import py1010
#create 1010data session
gateway = "https://www2.corp.1010data.com/beta-latest/gw"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
remote_file = "baseball.parquet"
spec = f'''
<source><s3 api_key="s3" bucket="1010susanne" format="parquet">{remote_file}</s3></source>
'''
testSession.addTable(spec)
Note
You can also upload a non-parquet file from Amazon S3. Just omit format="parquet"
.
Uploading from ABS#
The following example uploads a parquet file from Azure Blob Storage (ABS). The spec must include the ABS api_key
, the storage_account_name
, the container
, and the format
.
import pandas as pd
import py1010
#create 1010data session
gateway = "https://www2.corp.1010data.com/beta-latest/gw"
username = "[USER_NAME]"
password = "[PASSWORD]"
testSession = py1010.Session(gateway, username, password, py1010.POSSESS)
remote_file = "baseball.parquet"
spec = f'''
<source><abs api_key="azure" storage_account_name="1010susannedev" container="env-susanne" format="parquet">{remote_file}</abs></source>
'''
testSession.addTable(spec)
Note
You can also upload a non-parquet file from ABS. Just omit format="parquet"
.
Adding a key for S3 or ABS#
If you are uploading a file from S3 or ABS, you may first need to add a key with py1010.Session.addKey()
. Use the class py1010.AWSKey
to create the key object parameter for addKey()
.
Note
Amazon S3 and ABS have different requirements. Amazon S3 requires key name, key value, key ID, key region, and key type. ABS requires only key name, key value, and key type.
The following is an example of adding an Amazon S3 key:
session.addKey(py1010.AWSKey("myamazonkey", key="** REDACTED **", id="[AWSID]", region="[AWSRegion]", keytype=py1010.SourceInfo.SrcType.S3))
The following is an example of adding an ABS key:
session.addKey(py1010.AWSKey("myabskey", key="** REDACTED **", keytype=py1010.SourceInfo.SrcType.ABS))
Exception handling#
The Python SDK contains the py1010.TentenException
class.
The TentenException
class can be caught with try
/except
. TentenException
is the parent class of all of the 1010data Python exceptions. In this example, the application employs try
/except
with TentenException
.
import py1010
import sys
gateway = "https://www2.1010data.com/cgi-bin/gw"
tablename = "pub.demo.weather.stations"
username = "[USER_NAME]"
password = "[PASSWORD]"
try:
with py1010.Session(gateway, username, password, py1010.POSSESS) as session:
query = session.query(tablename, '<sel value="(elev>10)"/>')
query.run()
for r in query.rows:
print("\t".join([str(x) for x in r]))
except py1010.TentenException as e:
print("Error: {}".format(e))
Note
[USER_NAME]
and [PASSWORD]
are placeholders for valid Insights Platform user name and password.
Converting results to a pandas DataFrame#
The 1010data Python SDK enables you to convert a small result set into a pandas DataFrame.
The pandas package is used for data analysis. It contains a DataFrame
object which is the pandas primary data structure. The 1010data Python SDK contains a DataFrame
constructor to convert your result set, or a portion of it, to a DataFrame. This constructor uses py1010.Query.dictslice()
to create a dictionary of lists with a starting and ending row value to indicate the desired slice of data.
dataframe = pandas.DataFrame(query.dictslice(start,end))
It is best to keep this data set small because it will be downloaded and stored in memory. Creating a DataFrame with a large set of data may result in performance issues.
Example#
This example converts query results into a pandas DataFrame, uses the pandas dataframe.pivot
to reshape the data into a pivot table, and create three graphs.
After retrieving results (q2
) from the 1010data Insights Platform, the results are converted to a DataFrame (df1
) using the DataFrame
constructor. q2.dictslice(0,100)
creates a dictionary of lists starting at row 0 and ending at 100.
df1 = pd.DataFrame(q2.dictslice(0,100))
The dataframe.pivot method reshapes the data.
df2 = df1.pivot('month','dept','sale')
The application uses Matplotlib and Seaborn to create three charts. Close each chart to display the next one. Close the last chart to end the program.
df2.plot()
plt.ylabel('sale',size=15)
plt.xlabel('months',size=15)
plt.title('Sales Trend by Dept')
plt.show()
df2.plot.bar()
plt.xlabel('months',size=15)
plt.ylabel('sale',size=15)
plt.title('Sale by Dept by Month')
plt.show()
plt.figure(3)
mask=df2.isnull()
df3 = df1.pivot('dept','month','sale')
sns.heatmap(df3, cmap="terrain")
plt.xlabel('months',size=15)
plt.xticks(rotation=90)
plt.ylabel('dept',size=15)
plt.title('Sale by Month by Dept Heatmap')
plt.show()
The example code for the entire application:
import py1010
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns
style.use('ggplot')
username = "[USER_NAME]"
password = "[PASSWORD]"
try:
with py1010.Session("https://www2.1010data.com/cgi-bin/gw", \
username, password, \
py1010.POSSESS) as session:
q2 = session.query("pub.doc.retail.altseg.sales_detail_transid", \
('<sel value="trans_date>20110101"/>' \
'<sel value="dept=20 40 36 19"/>' \
'<willbe name="month"' \
'value="yrmo(trans_date)"' \
'format="type:monthshort"/>' \
'<tabu breaks="month,dept">' \
'<tcol fun="sum" source="xsales"' \
'name="sale"/>' \
'</tabu>'))
q2.run()
df1 = pd.DataFrame(q2.dictslice(0,100))
except py1010.TentenException as exc:
print("Error: {}".format(exc))
else:
df1['month']=df1['month'].astype(int)
df2 = df1.pivot('month','dept','sale')
df2.plot()
plt.ylabel('sale',size=15)
plt.xlabel('months',size=15)
plt.title('Sales Trend by Dept')
plt.show()
df2.plot.bar()
plt.xlabel('months',size=15)
plt.ylabel('sale',size=15)
plt.title('Sale by Dept by Month')
plt.show()
mask=df2.isnull()
plt.figure(3)
mask=df2.isnull()
df3 = df1.pivot('dept','month','sale')
sns.heatmap(df3, cmap="terrain")
plt.xlabel('months',size=15)
plt.xticks(rotation=90)
plt.ylabel('dept',size=15)
plt.title('Sale by Month by Dept Heatmap')
plt.show()
Note
[USER_NAME]
and [PASSWORD]
are placeholders for valid Insights Platform user name and password.