Best practices
Follow these best practices when using the 1010data Python SDK.
- Clean up your sessions upon query completion
- Use the with keyword to clean up sessions. For more information, see Cleaning up a session.
- Scroll through data sequentially
- When accessing data, scrolling through it sequentially gives better performance than jumping between random positions.
- Use XML queries for random access or set window size to 1
- As a best practice, users should access random data via queries. The example query below requests a random 10% sample:
exampleQuery = testSession.query(path, '<sel value="draw_(33;10)"/>')
If your users are going to request a random set of data points that do not progress incrementally, and they are not using a query to do so, the best practice is to set your window size to 1. The window is the number of rows of data that are buffered when the Python SDK requests results from the 1010data Insights Platform. Setting the window to 1 retrieves each row individually, which keeps the system from fetching larger sets of unneeded rows at random points in the data set and avoids unnecessary data transfer. In general, you won't have to worry about the window. Note that even with a window size of 1, this kind of access will most likely perform poorly.
If you do want to change the window size, you can use the setWindowSize method. For example:
query.win_size = 100
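The trade-off described above can be illustrated with a small simulation. This is only a sketch of the reasoning, not the SDK's actual buffering logic: it assumes a simplified model in which every request for a row outside the current buffer fetches one full window starting at that row. The function name rows_transferred and the table size are hypothetical.

```python
import random


def rows_transferred(requested_rows, window_size):
    """Return (fetches, rows) under a simplified buffering model.

    Assumption: a request for a row outside the current buffer triggers
    one fetch of `window_size` rows starting at the requested row.
    """
    buffered = range(0)  # empty buffer to start with
    fetches = 0
    rows = 0
    for row in requested_rows:
        if row not in buffered:
            buffered = range(row, row + window_size)
            fetches += 1
            rows += window_size
    return fetches, rows


TOTAL_ROWS = 1_000_000  # hypothetical table size
rng = random.Random(0)
sequential_access = list(range(1000))
random_access = [rng.randrange(TOTAL_ROWS) for _ in range(1000)]

for label, pattern in (("sequential", sequential_access), ("random", random_access)):
    for window in (1, 100):
        fetches, rows = rows_transferred(pattern, window)
        print(f"{label:10s} window={window:3d} fetches={fetches} rows={rows}")
```

Under this model, a larger window reduces the number of fetches for sequential access without transferring extra rows, while for random access it mostly transfers rows that are never read. A window of 1 avoids the wasted transfer but still needs roughly one fetch per row, which is consistent with the note that random access is likely to perform poorly either way.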
- Have each thread create a Session object
- In a multithreaded application, each thread should create its own Session object. If multiple threads that share a single Session object interact with the object in parallel, all but one of the threads will be queued. Users will experience much better performance if each thread has its own Session object. These objects can all be connected to the same Insights Platform session via py1010.POSSESS.
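The sketch below shows how two of these practices might combine: each thread creates its own session object, and the with keyword guarantees cleanup when the thread's work is done. It does not use the SDK itself. StandInSession and its run_query method are hypothetical stand-ins written only for this illustration; in a real application the SDK's Session object would take their place.

```python
import threading


class StandInSession:
    """Hypothetical stand-in for an SDK session; NOT the py1010 API."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Cleanup runs when the with block exits, even after an exception.
        self.closed = True
        return False  # do not suppress exceptions

    def run_query(self, text):
        if self.closed:
            raise RuntimeError("session already cleaned up")
        return f"{self.name}: {text}"


results = {}
sessions = []
lock = threading.Lock()  # protects the shared bookkeeping, not the sessions


def worker(worker_id):
    # Each thread owns its session, so no thread waits on another's object.
    with StandInSession(f"session-{worker_id}") as session:
        answer = session.run_query("example")
        with lock:
            sessions.append(session)
            results[worker_id] = answer


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(results.values()))
print(all(session.closed for session in sessions))
```

Because no session object is shared, the only synchronization needed here is the lock around the shared results, and every session is cleaned up as soon as its thread leaves the with block.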