Basic usage =========== Creating a new application with the 1010data Python SDK includes importing the library, establishing a session, submitting a query, receiving results, and cleaning up the session. Setting up an application ------------------------- Your application setup includes importing any required libraries, including the 1010data Python SDK. This example imports the Python SDK. .. code-block:: import py1010 Establishing a session ---------------------- A 1010data Insights Platform session is an instance of a particular user ID logged into the platform within a certain environment. A session is comprised of a dedicated set of resources that can serve a single request at a time. A single platform session may have multiple ``Session`` objects attached to it. When a user logs into the platform, if no session is active, a new one is created. If a session is currently active for that user, the login type determines if that session is possessed, ended and a new one is started, or fails to connect. Currently, each user ID can only have one platform session associated with it. For more information, see :ref:`Login types`. In this example, the ``Session`` object ``testSession`` is created using :class:`py1010.Session`. This example uses the ``with`` keyword to ensure the session is cleaned up after the query is run. For more information about cleaning up sessions, see :ref:`Cleaning up a session`. The standard :class:`py1010.Session` object (for a single user) takes four parameters: * A URI (``gateway``) * A valid Insights Platform user name (``username``) * That user's password (``password``) * The login type, which is ``py1010.POSSESS`` in this example .. note:: When using a SAM pool, the ``username`` and ``password`` parameters are the ID and password of the group owner. For more information, see :ref:`Shared Access Management pools`. The ``Session`` object also takes a number of optional parameters. For more information, see :ref:`Optional parameters`. If you are connecting via a proxy server, your gateway string contains your proxy server address and any credentials. For more information, see :ref:`Connecting via proxy`. .. code-block:: import py1010 import sys gateway = "https://www2.1010data.com/cgi-bin/gw" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) .. note:: ``[USER_NAME]`` and ``[PASSWORD]`` are placeholders for valid Insights Platform user name and password. Login types ^^^^^^^^^^^ The login type determines what happens when the application establishes a connection to the platform and another session is currently active. There are three valid login types: ``py1010.POSSESS``, ``py1010.KILL``, and ``py1010.NOKILL``. * ``py1010.POSSESS`` attaches the new login to an existing session if one is active when the connection is established. * ``py1010.KILL`` ends an existing session and starts a new one when the connection is established. * ``py1010.NOKILL`` fails to connect if an existing session is active when the connection is established. If no session is active, a new session is started. Optional Parameters ^^^^^^^^^^^^^^^^^^^ The ``Session`` object can also take a number of optional parameters. The ``Session`` object: .. code-block:: Session(url, username, password, logintype, group=None, retry=1073741824, wait=10, logfile=None, authentication=TAG, **kwargs) contains four parameters, ``url``, ``username``, ``password``, and ``logintype``, that are required to establish a 1010data Insights Platform session. For more information, see :ref:`Establishing a session`. It also contains a number of optional parameters, including ``group``, ``retry``, ``wait``, ``logfile``, and ``authentication``. ``group`` <<<<<<<<<< The ``group`` parameter specifies the group name of a SAM pool group. By default, the value is ``None``, which means the session is a single-user login. ``retry`` <<<<<<<<<< The ``retry`` parameter specifies the number of times that the application attempts to get a user ID from a SAM pool to log in before quitting. By default, ``retry`` is set to 1073741824 attempts. ``wait`` <<<<<<<<< The ``wait`` parameter specifies the amount of time to wait between attempts. The time is given in seconds. By default, ``wait`` is set to 10 seconds. ``logFile`` <<<<<<<<<<<<< The ``logFile`` parameter specifies a path, filename, or file handle where the log is written. By default, ``logFile`` is set to ``None``, which means no log is written. Unlike the ``enableLog()`` method, the ``logFile`` parameter captures log data from when a user is logging in. This is the only way to get a log of a failed login attempt. ``authentication`` <<<<<<<<<<<<<<<<<<< The ``TAG`` used for SSO login. ``TAG`` is a string specifying the style of authentication. See your administrator for your organization's custom tag, if one is being used. ``**kwarg`` <<<<<<<<<<< The ``**kwarg`` parameter is a Python variable enabling you to pass keyword arguments into a function. These named parameters include, but are not limited to: ``timeout(connectTimeout, timeout)`` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sets the connection timeout parameters. The ``connectTimeout`` argument specifies the number of seconds for establishing a connection before timing out. The ``timeout`` argument specifies the number of seconds to keep a connection open before timing out and attempting to reconnect. ``reuseConnection`` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Specifies if the same connection is used for each transaction. By default, the same connection is used. Set ``reuseConnection`` to ``0`` to specify that a new connection should be established between each transaction. ``lenient`` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sets a default type for non-conforming columns to be cast to. By default, the column is not re-cast and the system returns an error. For example, if your query requests an integer column and a float column is returned, lenient can cast that column to the intended data type. ``ignoreSSLErrors`` >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Specifies whether or not to ignore an SSL error. By default, this parameter is set to ``FALSE``, which means an SSL error results in a failed log in attempt. This is useful for debugging or during development, because you may want to ignore any SSL errors when logging in. Connecting via proxy ^^^^^^^^^^^^^^^^^^^^ If your application is connecting to the Insights Platform via a proxy server, your gateway string needs to be modified to include the proxy server's address. Depending on your proxy server's configuration, you may also need to include: * user name (``PROXY_USERNAME``) * password (``PROXY_PASSWORD``) * port (``PORT``) The proxy server's information is added to the beginning of the gateway string between square brackets (``[]``). .. code-block:: [https://PROXY_USERNAME:PROXY_PASSWORD@ MYCORPORATEPROXYADDRESS:PORT]https://www2.1010data.com/cgi-bin/gw If you're unsure of whether or not you need to specify a proxy server, or you need your proxy server credentials, contact your corporate technology support team or IT department. .. note:: This example does not use square brackets (``[]``) to denote variables. Examples <<<<<<<<< Some proxy servers will require a full set of credentials, for example: .. code-block:: [https://aHamilton:g3n3ricPWD@my.corporate.proxy:8080]https://www2.1010data.com/cgi-bin/gw A proxy server may only require a user name, for example: .. code-block:: [https://hGranger@my.corporate.proxy:8080]https://www2.1010data.com/cgi-bin/gw A proxy server may not require any credentials or a port number, for example: .. code-block:: [https://my.corporate.proxy]https://www2.1010data.com/cgi-bin/gw Running a query --------------- Your application should act as an intermediary between your users and the platform to pass in a query and receive the queried data. Gathering specific sets of data or scoping your data should occur within queries. Typically, the analytical work is done by the Macro Language query specified by the ``Query`` object. For more information on writing queries using the Insights Platform Macro Language see the `1010data Reference Manual `_. In the example, the new ``Query`` object (``exampleQuery``) is created using the :meth:`py1010.Session.query()` method which takes two parameters, the table path and the XML query. As a best practice, longer queries should be stored in a separate file. You use the ``run`` method, which is part of the ``Query`` class, to run queries. .. code-block:: import py1010 import sys gateway = "https://www2.1010data.com/cgi-bin/gw" tablename = "pub.demo.weather.stations" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) exampleQuery = testSession.query(tablename, '') exampleQuery.run() .. note:: ``[USER_NAME]`` and ``[PASSWORD]`` are placeholders for valid Insights Platform user name and password. Accessing query results ----------------------- After querying the 1010data Insights Platform, your application receives a set of results, which is accessible through the ``Query`` object. After the query is run, the ``Query`` object is the results of the query. Your results are a set of ``Column`` objects. ``Column`` objects contain the values in the column as a list that can be iterated over and indexed. This example uses the row iterator exampleQuery.rows to print each row in the results. You can access an individual row by specifying an index (e.g., ``exampleQuery.rows[100]``). .. code-block:: import py1010 import sys gateway = "https://www2.1010data.com/cgi-bin/gw" tablename = "pub.demo.weather.stations" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) exampleQuery = testSession.query(tablename, '') exampleQuery.run() for r in exampleQuery.rows: print("\t".join([str(x) for x in r])) .. note:: ``[USER_NAME]`` and ``[PASSWORD]`` are placeholders for valid Insights Platform user name and password. Cleaning up a session --------------------- If your application uses the ``with`` keyword, the 1010ata Insights Platform session is cleaned up automatically at the end of the ``with`` statement. If your application does not use ``with``, you need to clean up the platform session when you've completed the transaction. The ``Session`` class contains a :meth:`py1010.Session.close()` method, which cleans up the session when it is invoked. Depending on the login type that was used to create the platform session, there are a few different ways the session may be cleaned up. If you are a single user logged in with ``py1010.POSSESS``, the platform session continues to run. If you're a single user logged in with ``py1010.KILL`` or ``py1010.NOKILL``, the platform session is terminated. A SAM pool user platform session is released back into the pool. Cleaning up a session when it is no longer needed is particularly important if you have multiple users sharing a pool of IDs (SAM pool). Waiting for the garbage collector to clean up ``Session`` objects may result in a pool that appears fully utilized, but is waiting for ``Session`` objects to be destroyed. For more information about SAM pools, see :ref:`Shared Access Management pools`. The most efficient and effective way to release user IDs back to the pool is to invoke the ``close`` method, either by explicitly calling it or employing ``with``, which invokes it at the end of the statement. Using the ``with`` keyword: .. code-block:: import py1010 import sys gateway = "https://www2.1010data.com/cgi-bin/gw" tablename = "pub.demo.weather.stations" username = "[USER_NAME]" password = "[PASSWORD]" with py1010.Session(gateway, username, password, py1010.POSSESS) as testSession: exampleQuery = testSession.query(tablename, '') exampleQuery.run() for r in exampleQuery.rows: print("\t".join([str(x) for x in r])) .. note:: ``[USER_NAME]`` and ``[PASSWORD]`` are placeholders for valid Insights Platform user name and password. Using the ``close`` method: .. code-block:: import py1010 import sys gateway = "https://www2.1010data.com/cgi-bin/gw" tablename = "pub.demo.weather.stations" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) exampleQuery = testSession.query(tablename, '') exampleQuery.run() for r in exampleQuery.rows: print("\t".join([str(x) for x in r])) testSession.close() .. note:: ``[USER_NAME]`` and ``[PASSWORD]`` are placeholders for valid Insights Platform user name and password. Uploading from FTP ------------------ The 1010data Python SDK enables you to upload tables to the 1010data Insights Platform using the load API. The :class:`py1010.Session` class contains the :meth:`py1010.Session.addTable()` method, which loads data to the Insights Platform from an FTP account. Typically this method is used for very large tables containing a lot of data. .. code-block:: addTable(self, spec, sync=False) The ``addTable`` method requires the ``spec`` parameter. The ``spec`` refers to an XML string specifying information about the upload. For more information, refer to `addtab `_ in the `1010data API Reference Manual`. You can use the :meth:`py1010.Session.autoSpec()` method to create a spec file from a few rows of data. The XML generated by the ``autoSpec`` method may need to be edited afterward. The ``sync`` parameter specifies if the load is done synchronously. If you set this to ``True``, ``addTable`` waits until the load is completed, or an error occurs, to return the name of the new table. ``False`` specifies that ``addTable`` should return the name of the table immediately, before it is fully loaded to the Insights Platform. To get the status of an asynchronous load, you can use the :meth:`py1010.Session.addTableStatus()` method. This method returns a status number (0-5) and the number of records that have been uploaded so far. Uploading a parquet file ------------------------------------------------ You can upload a parquet file to 1010data from an existing local file, from an FTP site, or from a cloud provider (Amazon S3 or Azure Blob Storage). Uploading a local file ^^^^^^^^^^^^^^^^^^^^^^^ The following example creates a parquet file from an existing csv local file, uploads the parquet file to the FTP server, and then uploads the file from the FTP server to 1010data. The pandas method `DataFrame.to_parquet() `_ converts the pandas dataframe to a parquet file. Then, :meth:`py1010.Session.uploadToFTP()` uploads the parquet file to the FTP site. To upload from the FTP site to 1010data, you must provide the method :meth:`py1010.Session.addTable()` with a spec parameter. The spec parameter contains the file format (``"parquet"``), the byte order (``"little"``), and the path name of the 1010data table (``"path.to.target.name"``). .. code-block:: import pandas as pd import py1010 #create parquet file from an existing csv #csv should have a header row of column names df = pd.read_csv("basemaster.csv") df.to_parquet("basemaster.parquet") #create 1010data session gateway = "https://www2.corp.1010data.com/beta-latest/gw" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) #upload parquet file to FTP server ftp_file = "basemaster.parquet" testSession.uploadToFTP(ftp_file, pw) #upload parquet file from FTP to database #optionally omit to upload to a random table in "uploads" directory spec = f''' {ftp_file} path.to.target.table little ''' testSession.addTable(spec) Uploading from FTP ^^^^^^^^^^^^^^^^^^^ If your parquet file is already on the FTP site, the following code uploads the file from the FTP site to 1010data: .. code-block:: import pandas as pd import py1010 #create 1010data session gateway = "https://www2.corp.1010data.com/beta-latest/gw" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) remote_file = "baseball.parquet" #optionally omit to upload to a random table in "uploads" directory spec = f''' {remote_file} path.to.target.table little ''' testSession.addTable(spec) Uploading from Amazon S3 ^^^^^^^^^^^^^^^^^^^^^^^^^ The following example uploads a parquet file from Amazon S3. The spec must include the S3 ``api_key``, the S3 ``bucket``, and the ``format``. .. code-block:: import pandas as pd import py1010 #create 1010data session gateway = "https://www2.corp.1010data.com/beta-latest/gw" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) remote_file = "baseball.parquet" spec = f''' {remote_file} ''' testSession.addTable(spec) .. note:: You can also upload a non-parquet file from Amazon S3. Just omit ``format="parquet"``. Uploading from ABS ^^^^^^^^^^^^^^^^^^^^ The following example uploads a parquet file from Azure Blob Storage (ABS). The spec must include the ABS ``api_key``, the ``storage_account_name``, the ``container``, and the ``format``. .. code-block:: import pandas as pd import py1010 #create 1010data session gateway = "https://www2.corp.1010data.com/beta-latest/gw" username = "[USER_NAME]" password = "[PASSWORD]" testSession = py1010.Session(gateway, username, password, py1010.POSSESS) remote_file = "baseball.parquet" spec = f''' {remote_file} ''' testSession.addTable(spec) .. note:: You can also upload a non-parquet file from ABS. Just omit ``format="parquet"``. Adding a key for S3 or ABS ^^^^^^^^^^^^^^^^^^^^^^^^^^^ If you are uploading a file from S3 or ABS, you may first need to add a key with :meth:`py1010.Session.addKey()`. Use the class :class:`py1010.AWSKey` to create the key object parameter for ``addKey()``. .. note:: Amazon S3 and ABS have different requirements. Amazon S3 requires key name, key value, key ID, key region, and key type. ABS requires only key name, key value, and key type. The following is an example of adding an Amazon S3 key: .. code-block:: session.addKey(py1010.AWSKey("myamazonkey", key="** REDACTED **", id="[AWSID]", region="[AWSRegion]", keytype=py1010.SourceInfo.SrcType.S3)) The following is an example of adding an ABS key: .. code-block:: session.addKey(py1010.AWSKey("myabskey", key="** REDACTED **", keytype=py1010.SourceInfo.SrcType.ABS)) Exception handling ------------------ The Python SDK contains the :class:`py1010.TentenException` class. The ``TentenException`` class can be caught with ``try``/``except``. ``TentenException`` is the parent class of all of the 1010data Python exceptions. In this example, the application employs ``try``/``except`` with ``TentenException``. .. code-block:: import py1010 import sys gateway = "https://www2.1010data.com/cgi-bin/gw" tablename = "pub.demo.weather.stations" username = "[USER_NAME]" password = "[PASSWORD]" try: with py1010.Session(gateway, username, password, py1010.POSSESS) as session: query = session.query(tablename, '') query.run() for r in query.rows: print("\t".join([str(x) for x in r])) except py1010.TentenException as e: print("Error: {}".format(e)) .. note:: ``[USER_NAME]`` and ``[PASSWORD]`` are placeholders for valid Insights Platform user name and password. Converting results to a pandas DataFrame ------------------------------------------ The 1010data Python SDK enables you to convert a small result set into a pandas DataFrame. The pandas package is used for data analysis. It contains a ``DataFrame`` object which is the pandas primary data structure. The 1010data Python SDK contains a ``DataFrame`` constructor to convert your result set, or a portion of it, to a DataFrame. This constructor uses :meth:`py1010.Query.dictslice()` to create a dictionary of lists with a starting and ending row value to indicate the desired slice of data. .. code-block:: dataframe = pandas.DataFrame(query.dictslice(start,end)) It is best to keep this data set small because it will be downloaded and stored in memory. Creating a DataFrame with a large set of data may result in performance issues. Example ^^^^^^^ This example converts query results into a pandas DataFrame, uses the pandas ``dataframe.pivot`` to reshape the data into a pivot table, and create three graphs. After retrieving results (``q2``) from the 1010data Insights Platform, the results are converted to a DataFrame (``df1``) using the ``DataFrame`` constructor. ``q2.dictslice(0,100)`` creates a dictionary of lists starting at row 0 and ending at 100. .. code-block:: df1 = pd.DataFrame(q2.dictslice(0,100)) The dataframe.pivot method reshapes the data. .. code-block:: df2 = df1.pivot('month','dept','sale') The application uses Matplotlib and Seaborn to create three charts. Close each chart to display the next one. Close the last chart to end the program. .. code-block:: df2.plot() plt.ylabel('sale',size=15) plt.xlabel('months',size=15) plt.title('Sales Trend by Dept') plt.show() .. image:: images/matplotlib_graph1.png .. code-block:: df2.plot.bar() plt.xlabel('months',size=15) plt.ylabel('sale',size=15) plt.title('Sale by Dept by Month') plt.show() .. image:: images/matplotlib_graph2.png .. code-block:: plt.figure(3) mask=df2.isnull() df3 = df1.pivot('dept','month','sale') sns.heatmap(df3, cmap="terrain") plt.xlabel('months',size=15) plt.xticks(rotation=90) plt.ylabel('dept',size=15) plt.title('Sale by Month by Dept Heatmap') plt.show() .. image:: images/seaborn_graph.png The example code for the entire application: .. code-block:: import py1010 import numpy as np import pandas as pd import datetime import matplotlib.pyplot as plt from matplotlib import style import seaborn as sns style.use('ggplot') username = "[USER_NAME]" password = "[PASSWORD]" try: with py1010.Session("https://www2.1010data.com/cgi-bin/gw", \ username, password, \ py1010.POSSESS) as session: q2 = session.query("pub.doc.retail.altseg.sales_detail_transid", \ ('' \ '' \ '' \ '' \ '' \ '')) q2.run() df1 = pd.DataFrame(q2.dictslice(0,100)) except py1010.TentenException as exc: print("Error: {}".format(exc)) else: df1['month']=df1['month'].astype(int) df2 = df1.pivot('month','dept','sale') df2.plot() plt.ylabel('sale',size=15) plt.xlabel('months',size=15) plt.title('Sales Trend by Dept') plt.show() df2.plot.bar() plt.xlabel('months',size=15) plt.ylabel('sale',size=15) plt.title('Sale by Dept by Month') plt.show() mask=df2.isnull() plt.figure(3) mask=df2.isnull() df3 = df1.pivot('dept','month','sale') sns.heatmap(df3, cmap="terrain") plt.xlabel('months',size=15) plt.xticks(rotation=90) plt.ylabel('dept',size=15) plt.title('Sale by Month by Dept Heatmap') plt.show() .. note:: ``[USER_NAME]`` and ``[PASSWORD]`` are placeholders for valid Insights Platform user name and password.