mirror of
https://github.com/deepseek-ai/smallpond
synced 2025-06-26 18:27:45 +00:00
deploy: 52ecc5e455
81
_sources/api.rst.txt
Normal file
@@ -0,0 +1,81 @@
API Reference
=============

Smallpond provides both high-level and low-level APIs.

.. note::
   Smallpond currently provides two APIs: one for dynamic and one for static construction of data flow graphs. For historical reasons, the two APIs use different scheduler backends and support different configuration options.

   - The high-level API currently uses Ray as the backend and supports dynamic construction and execution of data flow graphs.
   - The low-level API uses a built-in scheduler and only supports one-time execution of static data flow graphs. However, it offers more performance optimizations and richer configuration options.

   We are working to merge them so that, in the future, you can use a unified high-level API and freely choose between Ray and the built-in scheduler.

High-level API
--------------

The high-level API is centered around :ref:`dataframe`. It allows dynamic construction of data flow graphs, execution, and result retrieval.

A typical workflow looks like this:

.. code-block:: python

   import smallpond

   sp = smallpond.init()

   df = sp.read_parquet("path/to/dataset/*.parquet")
   df = df.repartition(10)
   df = df.map("x + 1")
   df.write_parquet("path/to/output")

.. toctree::
   :maxdepth: 2

   api/dataframe

It is recommended to use the DataFrame API.

Low-level API
-------------

In the low-level API, users manually create :ref:`nodes` to construct a static data flow graph, then submit it to smallpond, which generates :ref:`tasks` and waits for all of them to complete.

A complete example is shown below.

.. code-block:: python

   from typing import List

   from smallpond.logical.dataset import ParquetDataSet
   from smallpond.logical.node import Context, DataSourceNode, DataSetPartitionNode, SqlEngineNode, LogicalPlan
   from smallpond.execution.driver import Driver

   def my_pipeline(input_paths: List[str], npartitions: int):
       ctx = Context()
       dataset = ParquetDataSet(input_paths)
       node = DataSourceNode(ctx, dataset)
       node = DataSetPartitionNode(ctx, (node,), npartitions=npartitions)
       node = SqlEngineNode(ctx, (node,), "SELECT * FROM {0}")
       return LogicalPlan(ctx, node)

   if __name__ == "__main__":
       driver = Driver()
       driver.add_argument("-i", "--input_paths", nargs="+")
       driver.add_argument("-n", "--npartitions", type=int, default=10)

       plan = my_pipeline(**driver.get_arguments())
       driver.run(plan)

To run this script:

.. code-block:: bash

   python script.py -i "path/to/*.parquet" -n 10


.. toctree::
   :maxdepth: 2

   api/dataset
   api/nodes
   api/tasks
   api/execution
104
_sources/api/dataframe.rst.txt
Normal file
@@ -0,0 +1,104 @@
.. _dataframe:

DataFrame
=========

DataFrame is the main class in smallpond. It represents a lazily computed, partitioned data set.

A typical workflow looks like this:

.. code-block:: python

   import smallpond

   sp = smallpond.init()

   df = sp.read_parquet("path/to/dataset/*.parquet")
   df = df.repartition(10)
   df = df.map("x + 1")
   df.write_parquet("path/to/output")

Initialization
--------------

.. autosummary::
   :toctree: ../generated

   smallpond.init

.. currentmodule:: smallpond.dataframe

.. _loading_data:

Loading Data
------------

.. autosummary::
   :toctree: ../generated

   Session.from_items
   Session.from_arrow
   Session.from_pandas
   Session.read_csv
   Session.read_json
   Session.read_parquet

.. _partitioning_data:

Partitioning Data
-----------------

.. autosummary::
   :toctree: ../generated

   DataFrame.repartition

.. _transformations:

Transformations
---------------

Apply transformations and return a new DataFrame.

.. autosummary::
   :toctree: ../generated

   Session.partial_sql
   DataFrame.map
   DataFrame.map_batches
   DataFrame.flat_map
   DataFrame.filter
   DataFrame.limit
   DataFrame.partial_sort
   DataFrame.random_shuffle

.. _consuming_data:

Consuming Data
--------------

These operations will trigger execution of the lazy transformations performed on this DataFrame.

.. autosummary::
   :toctree: ../generated

   DataFrame.count
   DataFrame.take
   DataFrame.take_all
   DataFrame.to_arrow
   DataFrame.to_pandas
   DataFrame.write_parquet
   DataFrame.write_parquet_lazy

Execution
---------

DataFrames are lazily computed. You can use these methods to manually trigger computation.

.. autosummary::
   :toctree: ../generated

   DataFrame.compute
   DataFrame.is_computed
   DataFrame.recompute
   Session.wait
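The lazy behaviour described on this page can be pictured with a toy model. The sketch below is not smallpond code and makes no claim about its implementation: ``LazyFrame`` and its in-memory partitions are hypothetical stand-ins, defined here only to show how transformations can be recorded first and run when a consuming call such as ``count`` or ``take_all`` is made.

```python
from typing import Callable, List


class LazyFrame:
    """Toy stand-in for a lazily computed, partitioned dataset (hypothetical)."""

    def __init__(self, partitions: List[List[int]], ops=None):
        self._partitions = partitions
        self._ops: List[Callable[[List[int]], List[int]]] = list(ops or [])
        self._result = None  # filled in the first time compute() runs

    def map(self, fn: Callable[[int], int]) -> "LazyFrame":
        # Only record the transformation; nothing is executed yet.
        return LazyFrame(self._partitions, self._ops + [lambda part: [fn(x) for x in part]])

    def filter(self, pred: Callable[[int], bool]) -> "LazyFrame":
        return LazyFrame(self._partitions, self._ops + [lambda part: [x for x in part if pred(x)]])

    def is_computed(self) -> bool:
        return self._result is not None

    def compute(self) -> "LazyFrame":
        # Apply every recorded operation to every partition, once.
        if self._result is None:
            out = []
            for part in self._partitions:
                for op in self._ops:
                    part = op(part)
                out.append(part)
            self._result = out
        return self

    def count(self) -> int:
        return sum(len(p) for p in self.compute()._result)

    def take_all(self) -> List[int]:
        return [x for p in self.compute()._result for x in p]


df = LazyFrame([[1, 2, 3], [4, 5, 6]]).map(lambda x: x + 1).filter(lambda x: x % 2 == 0)
print(df.is_computed())  # False
print(df.count())        # 3
print(df.is_computed())  # True
```

In this toy, the consuming call is what pays the cost of the whole chain, which is why the result is cached after the first ``compute``.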
28
_sources/api/dataset.rst.txt
Normal file
@@ -0,0 +1,28 @@
.. currentmodule:: smallpond.logical.dataset

Dataset
=======

Dataset represents a collection of files.

To create a dataset:

.. code-block:: python

   dataset = ParquetDataSet("path/to/dataset/*.parquet")

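A path such as ``path/to/dataset/*.parquet`` is a glob pattern. How ``ParquetDataSet`` resolves its paths internally is not covered here; the sketch below only demonstrates ordinary glob expansion with Python's standard ``glob`` module. ``resolve_paths`` is a hypothetical helper, and the files it matches are empty placeholders rather than real Parquet data.

```python
import glob
import os
import tempfile


def resolve_paths(pattern: str) -> list:
    """Expand a glob pattern into a sorted list of matching file paths."""
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    # Create empty placeholder files; they are not real Parquet data.
    for name in ("part-0.parquet", "part-1.parquet", "notes.txt"):
        open(os.path.join(root, name), "w").close()

    matched = resolve_paths(os.path.join(root, "*.parquet"))
    names = [os.path.basename(p) for p in matched]
    print(names)  # ['part-0.parquet', 'part-1.parquet']
```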
DataSets
--------

.. autosummary::
   :toctree: ../generated

   DataSet
   FileSet
   ParquetDataSet
   CsvDataSet
   JsonDataSet
   ArrowTableDataSet
   PandasDataSet
   PartitionedDataSet
   SqlQueryDataSet
85
_sources/api/execution.rst.txt
Normal file
@@ -0,0 +1,85 @@
.. currentmodule:: smallpond.execution

.. _execution:

Execution
=========

Submit a Job
------------

After constructing the LogicalPlan, you can use the JobManager to create a Job in the cluster to execute it. In most cases, however, you only need to use the Driver as the entry point of the script and submit the plan through it. The Driver is a thin wrapper around the JobManager: it reads the configuration from the command-line arguments and passes it to the JobManager.

.. code-block:: python

   from smallpond.execution.driver import Driver

   if __name__ == "__main__":
       driver = Driver()
       # add your own arguments
       driver.add_argument("-i", "--input_paths", nargs="+")
       driver.add_argument("-n", "--npartitions", type=int, default=10)
       # build and run logical plan
       plan = my_pipeline(**driver.get_arguments())
       driver.run(plan)


.. autosummary::
   :toctree: ../generated

   ~driver.Driver
   ~manager.JobManager

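To see what the user arguments in the example above resolve to, the sketch below repeats the two ``add_argument`` calls with Python's standard ``argparse`` module. It assumes that ``Driver.add_argument`` accepts the same options as ``argparse.ArgumentParser.add_argument``, which the example suggests but this page does not state. ``my_pipeline`` here is a placeholder that only echoes its inputs.

```python
import argparse
from typing import List


def my_pipeline(input_paths: List[str], npartitions: int) -> dict:
    # Placeholder: a real pipeline would build and return a LogicalPlan.
    return {"input_paths": input_paths, "npartitions": npartitions}


parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input_paths", nargs="+")
parser.add_argument("-n", "--npartitions", type=int, default=10)

# Equivalent to: python script.py -i a.parquet b.parquet -n 4
args = vars(parser.parse_args(["-i", "a.parquet", "b.parquet", "-n", "4"]))
plan = my_pipeline(**args)
print(plan)  # {'input_paths': ['a.parquet', 'b.parquet'], 'npartitions': 4}
```

Because the parsed argument names match the parameter names of ``my_pipeline``, the dictionary can be unpacked straight into the call with ``**``.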
Scheduler and Executor
----------------------

Scheduler and Executor are lower-level APIs. They are directly responsible for scheduling and executing tasks, respectively. Generally, users do not need to use them directly.

.. autosummary::
   :toctree: ../generated

   ~scheduler.Scheduler
   ~executor.Executor

.. _platform:

Customize Platform
------------------

Smallpond supports user-defined task execution platforms. A Platform includes methods for submitting jobs and a set of default configurations. By default, smallpond automatically detects the current environment and selects the most suitable platform. If it cannot detect one, it uses the default platform.

You can specify a built-in platform via parameters:

.. code-block:: bash

   # run with a built-in platform
   python script.py --platform mpi


Or implement your own Platform class:

.. code-block:: python

   # path/to/my/platform.py
   from typing import List

   from smallpond.platform import Platform

   class MyPlatform(Platform):
       def start_job(self, ...) -> List[str]:
           ...

.. code-block:: bash

   # run with your platform
   # if using Driver
   python script.py --platform path.to.my.platform

   # if using smallpond.init
   SP_PLATFORM=path.to.my.platform python script.py

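One common way to resolve a dotted module path such as ``path.to.my.platform`` is ``importlib.import_module``. The sketch below is an assumption about how such a lookup could work, not smallpond's actual loader. The ``Platform`` base class, the ``start_job(num_nodes)`` signature, ``find_platform``, and the in-memory ``my_platform`` module are all hypothetical and defined locally so the example runs without smallpond installed.

```python
import importlib
import inspect
import os
import sys
import types
from typing import List, Optional


class Platform:
    """Hypothetical base class standing in for a platform interface."""

    def start_job(self, num_nodes: int) -> List[str]:
        raise NotImplementedError


def find_platform(spec: Optional[str] = None) -> Platform:
    """Import the module named by `spec` (or SP_PLATFORM) and instantiate its Platform subclass."""
    spec = spec or os.environ.get("SP_PLATFORM")
    if not spec:
        return Platform()  # nothing requested: fall back to the default platform
    module = importlib.import_module(spec)
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, Platform) and obj is not Platform:
            return obj()
    raise ImportError(f"no Platform subclass found in {spec!r}")


class MyPlatform(Platform):
    def start_job(self, num_nodes: int) -> List[str]:
        return [f"job-{i}" for i in range(num_nodes)]


# Register a throwaway module in memory instead of writing a file to disk.
demo = types.ModuleType("my_platform")
demo.MyPlatform = MyPlatform
sys.modules["my_platform"] = demo

platform = find_platform("my_platform")
print(type(platform).__name__, platform.start_job(2))  # MyPlatform ['job-0', 'job-1']
```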
.. currentmodule:: smallpond

.. autosummary::
   :toctree: ../generated

   ~platform.Platform
   ~platform.MPI
89
_sources/api/nodes.rst.txt
Normal file
@@ -0,0 +1,89 @@
.. currentmodule:: smallpond.logical.node

.. _nodes:

Nodes
=====

Nodes are the fundamental building blocks of a data processing pipeline. Each node encapsulates a specific operation or transformation that can be applied to a dataset.
Nodes can be chained together to form a logical plan, which is a directed acyclic graph (DAG) of nodes that represents the overall data processing workflow.

A typical workflow to create a logical plan is as follows:

.. code-block:: python

   from smallpond.logical.dataset import ParquetDataSet
   from smallpond.logical.node import Context, DataSourceNode, DataSetPartitionNode, SqlEngineNode, LogicalPlan

   # Create a global context
   ctx = Context()

   # Create a dataset
   dataset = ParquetDataSet("path/to/dataset/*.parquet")

   # Create a data source node
   node = DataSourceNode(ctx, dataset)

   # Partition the data
   node = DataSetPartitionNode(ctx, (node,), npartitions=2)

   # Create a SQL engine node to transform the data
   node = SqlEngineNode(ctx, (node,), "SELECT * FROM {0}")

   # Create a logical plan from the root node
   plan = LogicalPlan(ctx, node)

You can then create tasks from the logical plan, see :ref:`tasks`.

Notable properties of Node:

1. Nodes are partitioned. Each Node generates a series of tasks, with each task processing one partition of data.
2. The input and output of a Node are a series of partitioned Datasets. A Node may write data to shared storage and return a new Dataset, or it may simply recombine the input Datasets.

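The first property can be pictured with a small toy model. The sketch below is a simplification and not the smallpond planner: ``ToyNode``, ``ToyTask``, and ``expand`` are hypothetical names, and real partition nodes may produce more than one kind of task. It only shows the idea of walking a DAG of nodes and emitting one task per partition, with inputs ordered before the nodes that consume them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical logical node: a name, its inputs, and a partition count."""
    name: str
    inputs: Tuple["ToyNode", ...] = ()
    npartitions: int = 1


@dataclass(frozen=True)
class ToyTask:
    """Hypothetical task: one partition of one node."""
    node: str
    partition: int


def expand(root: ToyNode) -> List[ToyTask]:
    """Walk the DAG from the root and emit one task per partition, inputs first."""
    tasks: List[ToyTask] = []
    seen = set()

    def visit(node: ToyNode) -> None:
        if node.name in seen:
            return
        seen.add(node.name)
        for dep in node.inputs:
            visit(dep)
        tasks.extend(ToyTask(node.name, i) for i in range(node.npartitions))

    visit(root)
    return tasks


source = ToyNode("source")
parts = ToyNode("partition", (source,), npartitions=2)
sql = ToyNode("sql", (parts,), npartitions=2)

for task in expand(sql):
    print(task)
```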
Context
-------

.. autosummary::
   :toctree: ../generated

   Context
   NodeId

LogicalPlan
-----------

.. autosummary::
   :toctree: ../generated

   LogicalPlan
   LogicalPlanVisitor
   .. Planner

Nodes
-----

.. autosummary::
   :toctree: ../generated

   Node
   DataSetPartitionNode
   ArrowBatchNode
   ArrowComputeNode
   ArrowStreamNode
   ConsolidateNode
   DataSinkNode
   DataSourceNode
   EvenlyDistributedPartitionNode
   HashPartitionNode
   LimitNode
   LoadPartitionedDataSetNode
   PandasBatchNode
   PandasComputeNode
   PartitionNode
   ProjectionNode
   PythonScriptNode
   RangePartitionNode
   RepeatPartitionNode
   RootNode
   ShuffleNode
   SqlEngineNode
   UnionNode
   UserDefinedPartitionNode
   UserPartitionedDataSourceNode
76
_sources/api/tasks.rst.txt
Normal file
@@ -0,0 +1,76 @@
.. currentmodule:: smallpond.execution.task

.. _tasks:

Tasks
=====

.. code-block:: python

   import socket

   from smallpond.execution.task import JobId, RuntimeContext

   # create a runtime context
   runtime_ctx = RuntimeContext(JobId.new(), data_root)
   runtime_ctx.initialize(socket.gethostname(), cleanup_root=True)

   # create a logical plan
   plan = create_logical_plan()

   # create an execution plan
   planner = Planner(runtime_ctx)
   exec_plan = planner.create_exec_plan(plan)

You can then execute the tasks in a scheduler, see :ref:`execution`.

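Tasks in an execution plan depend on one another, so a scheduler can only start a task once its inputs have finished. The sketch below illustrates that ordering with the standard library's ``graphlib.TopologicalSorter`` (Python 3.9+). It is a toy model, not smallpond's ``Scheduler``; the task names and the dependency table are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each key depends on the tasks in its set.
deps = {
    "source": set(),
    "partition.0": {"source"},
    "partition.1": {"source"},
    "sql.0": {"partition.0"},
    "sql.1": {"partition.1"},
    "root": {"sql.0", "sql.1"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose inputs are all done
    waves.append(ready)
    for task in ready:
        sorter.done(task)               # pretend the task ran successfully

for i, wave in enumerate(waves):
    print(i, wave)
```

Tasks in the same wave have no dependencies on each other, so a real scheduler would be free to run them in parallel.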
RuntimeContext
--------------

.. autosummary::
   :toctree: ../generated

   RuntimeContext
   JobId
   TaskId
   TaskRuntimeId
   PartitionInfo
   PerfStats


ExecutionPlan
-------------

.. autosummary::
   :toctree: ../generated

   ExecutionPlan


Tasks
-----

.. autosummary::
   :toctree: ../generated

   Task
   ArrowBatchTask
   ArrowComputeTask
   ArrowStreamTask
   DataSinkTask
   DataSourceTask
   EvenlyDistributedPartitionProducerTask
   HashPartitionArrowTask
   HashPartitionDuckDbTask
   HashPartitionTask
   LoadPartitionedDataSetProducerTask
   MergeDataSetsTask
   PandasBatchTask
   PandasComputeTask
   PartitionConsumerTask
   PartitionProducerTask
   ProjectionTask
   PythonScriptTask
   RangePartitionTask
   RepeatPartitionProducerTask
   RootTask
   SplitDataSetTask
   SqlEngineTask
   UserDefinedPartitionProducerTask

@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.compute
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.compute
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.count
|
||||
===================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.count
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.filter
|
||||
====================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.filter
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.flat\_map
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.flat_map
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.is\_computed
|
||||
==========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.is_computed
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.limit
|
||||
===================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.limit
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.map
|
||||
=================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.map
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.map\_batches
|
||||
==========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.map_batches
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.partial\_sort
|
||||
===========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.partial_sort
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.random\_shuffle
|
||||
=============================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.random_shuffle
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.recompute
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.recompute
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.repartition
|
||||
=========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.repartition
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.take
|
||||
==================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.take
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.take\_all
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.take_all
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.to\_arrow
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.to_arrow
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.to\_pandas
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.to_pandas
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.write\_parquet
|
||||
============================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.write_parquet
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.DataFrame.write\_parquet\_lazy
|
||||
==================================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: DataFrame.write_parquet_lazy
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.from\_arrow
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.from_arrow
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.from\_items
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.from_items
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.from\_pandas
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.from_pandas
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.partial\_sql
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.partial_sql
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.read\_csv
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.read_csv
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.read\_json
|
||||
======================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.read_json
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.read\_parquet
|
||||
=========================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.read_parquet
|
||||
@@ -0,0 +1,6 @@
|
||||
smallpond.dataframe.Session.wait
|
||||
================================
|
||||
|
||||
.. currentmodule:: smallpond.dataframe
|
||||
|
||||
.. automethod:: Session.wait
|
||||
37
_sources/generated/smallpond.execution.driver.Driver.rst.txt
Normal file
@@ -0,0 +1,37 @@
|
||||
smallpond.execution.driver.Driver
|
||||
=================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.driver
|
||||
|
||||
.. autoclass:: Driver
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Driver.__init__
|
||||
~Driver.add_argument
|
||||
~Driver.get_arguments
|
||||
~Driver.get_driver_arguments
|
||||
~Driver.get_user_arguments
|
||||
~Driver.parse_arguments
|
||||
~Driver.run
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Driver.data_root
|
||||
~Driver.job_id
|
||||
~Driver.mode
|
||||
~Driver.num_executors
|
||||
|
||||
|
||||
@@ -0,0 +1,39 @@
|
||||
smallpond.execution.executor.Executor
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.executor
|
||||
|
||||
.. autoclass:: Executor
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Executor.__init__
|
||||
~Executor.acquire_gpu
|
||||
~Executor.collect_finished_works
|
||||
~Executor.create
|
||||
~Executor.exec_loop
|
||||
~Executor.process_work
|
||||
~Executor.release_gpu
|
||||
~Executor.run
|
||||
~Executor.skip_probes
|
||||
~Executor.stop
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Executor.available_gpu_quota
|
||||
~Executor.busy
|
||||
~Executor.local_gpus
|
||||
|
||||
|
||||
@@ -0,0 +1,31 @@
|
||||
smallpond.execution.manager.JobManager
|
||||
======================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.manager
|
||||
|
||||
.. autoclass:: JobManager
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~JobManager.__init__
|
||||
~JobManager.run
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~JobManager.env_template
|
||||
~JobManager.jemalloc_filename
|
||||
~JobManager.mimalloc_filename
|
||||
|
||||
|
||||
@@ -0,0 +1,75 @@
|
||||
smallpond.execution.scheduler.Scheduler
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.scheduler
|
||||
|
||||
.. autoclass:: Scheduler
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Scheduler.__init__
|
||||
~Scheduler.add_state_observer
|
||||
~Scheduler.clean_temp_files
|
||||
~Scheduler.clear_cached_executor_lists
|
||||
~Scheduler.copy_task_for_execution
|
||||
~Scheduler.dispatch_tasks
|
||||
~Scheduler.export_task_metrics
|
||||
~Scheduler.export_timeline_figs
|
||||
~Scheduler.get_retry_task
|
||||
~Scheduler.get_runnable_tasks
|
||||
~Scheduler.log_current_status
|
||||
~Scheduler.log_overall_progress
|
||||
~Scheduler.notify_state_observers
|
||||
~Scheduler.probe_executors
|
||||
~Scheduler.process_finished_tasks
|
||||
~Scheduler.run
|
||||
~Scheduler.save_task_final_state
|
||||
~Scheduler.sched_loop
|
||||
~Scheduler.start_speculative_execution
|
||||
~Scheduler.stop_executors
|
||||
~Scheduler.stop_running_tasks
|
||||
~Scheduler.suspend_good_executors
|
||||
~Scheduler.try_boost_resource
|
||||
~Scheduler.try_enqueue
|
||||
~Scheduler.try_relax_memory_limit
|
||||
~Scheduler.update_executor_states
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Scheduler.StateCallback
|
||||
~Scheduler.abandoned_tasks
|
||||
~Scheduler.alive_executors
|
||||
~Scheduler.elapsed_time
|
||||
~Scheduler.failed_executors
|
||||
~Scheduler.good_executors
|
||||
~Scheduler.large_num_nontrivial_tasks
|
||||
~Scheduler.large_runtime_state
|
||||
~Scheduler.local_executors
|
||||
~Scheduler.low_resource_executors
|
||||
~Scheduler.num_local_running_works
|
||||
~Scheduler.num_pending_nontrivial_tasks
|
||||
~Scheduler.num_pending_tasks
|
||||
~Scheduler.num_running_works
|
||||
~Scheduler.pending_nontrivial_tasks
|
||||
~Scheduler.progress
|
||||
~Scheduler.remote_executors
|
||||
~Scheduler.running_works
|
||||
~Scheduler.stopped_executors
|
||||
~Scheduler.stopping_executors
|
||||
~Scheduler.succeeded_task_ids
|
||||
~Scheduler.success
|
||||
~Scheduler.working_executors
|
||||
|
||||
|
||||
@@ -0,0 +1,132 @@
|
||||
smallpond.execution.task.ArrowBatchTask
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: ArrowBatchTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowBatchTask.__init__
|
||||
~ArrowBatchTask.add_elapsed_time
|
||||
~ArrowBatchTask.adjust_row_group_size
|
||||
~ArrowBatchTask.clean_complex_attrs
|
||||
~ArrowBatchTask.clean_output
|
||||
~ArrowBatchTask.cleanup
|
||||
~ArrowBatchTask.compute_avg_row_size
|
||||
~ArrowBatchTask.create_input_views
|
||||
~ArrowBatchTask.dump
|
||||
~ArrowBatchTask.dump_output
|
||||
~ArrowBatchTask.exec
|
||||
~ArrowBatchTask.exec_query
|
||||
~ArrowBatchTask.finalize
|
||||
~ArrowBatchTask.get_partition_info
|
||||
~ArrowBatchTask.initialize
|
||||
~ArrowBatchTask.inject_fault
|
||||
~ArrowBatchTask.merge_metrics
|
||||
~ArrowBatchTask.oom
|
||||
~ArrowBatchTask.parquet_kv_metadata_bytes
|
||||
~ArrowBatchTask.parquet_kv_metadata_str
|
||||
~ArrowBatchTask.prepare_connection
|
||||
~ArrowBatchTask.process
|
||||
~ArrowBatchTask.random_float
|
||||
~ArrowBatchTask.random_uint32
|
||||
~ArrowBatchTask.restore_input_state
|
||||
~ArrowBatchTask.run
|
||||
~ArrowBatchTask.run_on_ray
|
||||
~ArrowBatchTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowBatchTask.process_func
|
||||
~ArrowBatchTask.background_io_thread
|
||||
~ArrowBatchTask.streaming_batch_size
|
||||
~ArrowBatchTask.streaming_batch_count
|
||||
~ArrowBatchTask.parquet_row_group_size
|
||||
~ArrowBatchTask.parquet_row_group_bytes
|
||||
~ArrowBatchTask.parquet_dictionary_encoding
|
||||
~ArrowBatchTask.parquet_compression
|
||||
~ArrowBatchTask.parquet_compression_level
|
||||
~ArrowBatchTask.secs_checkpoint_interval
|
||||
~ArrowBatchTask.allow_speculative_exec
|
||||
~ArrowBatchTask.any_input_empty
|
||||
~ArrowBatchTask.compression_level_str
|
||||
~ArrowBatchTask.compression_options
|
||||
~ArrowBatchTask.compression_type_str
|
||||
~ArrowBatchTask.cpu_limit
|
||||
~ArrowBatchTask.cpu_overcommit_ratio
|
||||
~ArrowBatchTask.ctx
|
||||
~ArrowBatchTask.dataset
|
||||
~ArrowBatchTask.default_output_name
|
||||
~ArrowBatchTask.elapsed_time
|
||||
~ArrowBatchTask.enable_temp_directory
|
||||
~ArrowBatchTask.exception
|
||||
~ArrowBatchTask.exec_cq
|
||||
~ArrowBatchTask.exec_id
|
||||
~ArrowBatchTask.exec_on_scheduler
|
||||
~ArrowBatchTask.fail_count
|
||||
~ArrowBatchTask.final_output_abspath
|
||||
~ArrowBatchTask.finish_time
|
||||
~ArrowBatchTask.gpu_limit
|
||||
~ArrowBatchTask.id
|
||||
~ArrowBatchTask.input_datasets
|
||||
~ArrowBatchTask.input_deps
|
||||
~ArrowBatchTask.input_udfs
|
||||
~ArrowBatchTask.input_view_index
|
||||
~ArrowBatchTask.key
|
||||
~ArrowBatchTask.local_gpu
|
||||
~ArrowBatchTask.local_gpu_ranks
|
||||
~ArrowBatchTask.local_rank
|
||||
~ArrowBatchTask.location
|
||||
~ArrowBatchTask.max_batch_size
|
||||
~ArrowBatchTask.memory_limit
|
||||
~ArrowBatchTask.memory_overcommit_ratio
|
||||
~ArrowBatchTask.node_id
|
||||
~ArrowBatchTask.numa_node
|
||||
~ArrowBatchTask.numpy_random_gen
|
||||
~ArrowBatchTask.output
|
||||
~ArrowBatchTask.output_deps
|
||||
~ArrowBatchTask.output_dirname
|
||||
~ArrowBatchTask.output_filename
|
||||
~ArrowBatchTask.output_name
|
||||
~ArrowBatchTask.output_root
|
||||
~ArrowBatchTask.partition_dims
|
||||
~ArrowBatchTask.partition_infos
|
||||
~ArrowBatchTask.partition_infos_as_dict
|
||||
~ArrowBatchTask.perf_metrics
|
||||
~ArrowBatchTask.perf_profile
|
||||
~ArrowBatchTask.python_random_gen
|
||||
~ArrowBatchTask.query_udfs
|
||||
~ArrowBatchTask.rand_seed_float
|
||||
~ArrowBatchTask.rand_seed_uint32
|
||||
~ArrowBatchTask.random_seed_bytes
|
||||
~ArrowBatchTask.ray_dataset_path
|
||||
~ArrowBatchTask.ray_marker_path
|
||||
~ArrowBatchTask.retry_count
|
||||
~ArrowBatchTask.runtime_id
|
||||
~ArrowBatchTask.runtime_output_abspath
|
||||
~ArrowBatchTask.runtime_state
|
||||
~ArrowBatchTask.sched_epoch
|
||||
~ArrowBatchTask.self_contained_output
|
||||
~ArrowBatchTask.skip_when_any_input_empty
|
||||
~ArrowBatchTask.staging_root
|
||||
~ArrowBatchTask.start_time
|
||||
~ArrowBatchTask.status
|
||||
~ArrowBatchTask.temp_abspath
|
||||
~ArrowBatchTask.temp_output
|
||||
~ArrowBatchTask.udfs
|
||||
~ArrowBatchTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,127 @@
|
||||
smallpond.execution.task.ArrowComputeTask
|
||||
=========================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: ArrowComputeTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowComputeTask.__init__
|
||||
~ArrowComputeTask.add_elapsed_time
|
||||
~ArrowComputeTask.adjust_row_group_size
|
||||
~ArrowComputeTask.clean_complex_attrs
|
||||
~ArrowComputeTask.clean_output
|
||||
~ArrowComputeTask.cleanup
|
||||
~ArrowComputeTask.compute_avg_row_size
|
||||
~ArrowComputeTask.create_input_views
|
||||
~ArrowComputeTask.dump
|
||||
~ArrowComputeTask.dump_output
|
||||
~ArrowComputeTask.exec
|
||||
~ArrowComputeTask.exec_query
|
||||
~ArrowComputeTask.finalize
|
||||
~ArrowComputeTask.get_partition_info
|
||||
~ArrowComputeTask.initialize
|
||||
~ArrowComputeTask.inject_fault
|
||||
~ArrowComputeTask.merge_metrics
|
||||
~ArrowComputeTask.oom
|
||||
~ArrowComputeTask.parquet_kv_metadata_bytes
|
||||
~ArrowComputeTask.parquet_kv_metadata_str
|
||||
~ArrowComputeTask.prepare_connection
|
||||
~ArrowComputeTask.process
|
||||
~ArrowComputeTask.random_float
|
||||
~ArrowComputeTask.random_uint32
|
||||
~ArrowComputeTask.run
|
||||
~ArrowComputeTask.run_on_ray
|
||||
~ArrowComputeTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowComputeTask.process_func
|
||||
~ArrowComputeTask.parquet_row_group_size
|
||||
~ArrowComputeTask.parquet_row_group_bytes
|
||||
~ArrowComputeTask.parquet_dictionary_encoding
|
||||
~ArrowComputeTask.use_duckdb_reader
|
||||
~ArrowComputeTask.allow_speculative_exec
|
||||
~ArrowComputeTask.any_input_empty
|
||||
~ArrowComputeTask.compression_level_str
|
||||
~ArrowComputeTask.compression_options
|
||||
~ArrowComputeTask.compression_type_str
|
||||
~ArrowComputeTask.cpu_limit
|
||||
~ArrowComputeTask.cpu_overcommit_ratio
|
||||
~ArrowComputeTask.ctx
|
||||
~ArrowComputeTask.dataset
|
||||
~ArrowComputeTask.default_output_name
|
||||
~ArrowComputeTask.elapsed_time
|
||||
~ArrowComputeTask.enable_temp_directory
|
||||
~ArrowComputeTask.exception
|
||||
~ArrowComputeTask.exec_cq
|
||||
~ArrowComputeTask.exec_id
|
||||
~ArrowComputeTask.exec_on_scheduler
|
||||
~ArrowComputeTask.fail_count
|
||||
~ArrowComputeTask.final_output_abspath
|
||||
~ArrowComputeTask.finish_time
|
||||
~ArrowComputeTask.gpu_limit
|
||||
~ArrowComputeTask.id
|
||||
~ArrowComputeTask.input_datasets
|
||||
~ArrowComputeTask.input_deps
|
||||
~ArrowComputeTask.input_udfs
|
||||
~ArrowComputeTask.input_view_index
|
||||
~ArrowComputeTask.key
|
||||
~ArrowComputeTask.local_gpu
|
||||
~ArrowComputeTask.local_gpu_ranks
|
||||
~ArrowComputeTask.local_rank
|
||||
~ArrowComputeTask.location
|
||||
~ArrowComputeTask.memory_limit
|
||||
~ArrowComputeTask.memory_overcommit_ratio
|
||||
~ArrowComputeTask.node_id
|
||||
~ArrowComputeTask.numa_node
|
||||
~ArrowComputeTask.numpy_random_gen
|
||||
~ArrowComputeTask.output
|
||||
~ArrowComputeTask.output_deps
|
||||
~ArrowComputeTask.output_dirname
|
||||
~ArrowComputeTask.output_filename
|
||||
~ArrowComputeTask.output_name
|
||||
~ArrowComputeTask.output_root
|
||||
~ArrowComputeTask.parquet_compression
|
||||
~ArrowComputeTask.parquet_compression_level
|
||||
~ArrowComputeTask.partition_dims
|
||||
~ArrowComputeTask.partition_infos
|
||||
~ArrowComputeTask.partition_infos_as_dict
|
||||
~ArrowComputeTask.perf_metrics
|
||||
~ArrowComputeTask.perf_profile
|
||||
~ArrowComputeTask.python_random_gen
|
||||
~ArrowComputeTask.query_udfs
|
||||
~ArrowComputeTask.rand_seed_float
|
||||
~ArrowComputeTask.rand_seed_uint32
|
||||
~ArrowComputeTask.random_seed_bytes
|
||||
~ArrowComputeTask.ray_dataset_path
|
||||
~ArrowComputeTask.ray_marker_path
|
||||
~ArrowComputeTask.retry_count
|
||||
~ArrowComputeTask.runtime_id
|
||||
~ArrowComputeTask.runtime_output_abspath
|
||||
~ArrowComputeTask.runtime_state
|
||||
~ArrowComputeTask.sched_epoch
|
||||
~ArrowComputeTask.self_contained_output
|
||||
~ArrowComputeTask.skip_when_any_input_empty
|
||||
~ArrowComputeTask.staging_root
|
||||
~ArrowComputeTask.start_time
|
||||
~ArrowComputeTask.status
|
||||
~ArrowComputeTask.temp_abspath
|
||||
~ArrowComputeTask.temp_output
|
||||
~ArrowComputeTask.udfs
|
||||
~ArrowComputeTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,132 @@
|
||||
smallpond.execution.task.ArrowStreamTask
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: ArrowStreamTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowStreamTask.__init__
|
||||
~ArrowStreamTask.add_elapsed_time
|
||||
~ArrowStreamTask.adjust_row_group_size
|
||||
~ArrowStreamTask.clean_complex_attrs
|
||||
~ArrowStreamTask.clean_output
|
||||
~ArrowStreamTask.cleanup
|
||||
~ArrowStreamTask.compute_avg_row_size
|
||||
~ArrowStreamTask.create_input_views
|
||||
~ArrowStreamTask.dump
|
||||
~ArrowStreamTask.dump_output
|
||||
~ArrowStreamTask.exec
|
||||
~ArrowStreamTask.exec_query
|
||||
~ArrowStreamTask.finalize
|
||||
~ArrowStreamTask.get_partition_info
|
||||
~ArrowStreamTask.initialize
|
||||
~ArrowStreamTask.inject_fault
|
||||
~ArrowStreamTask.merge_metrics
|
||||
~ArrowStreamTask.oom
|
||||
~ArrowStreamTask.parquet_kv_metadata_bytes
|
||||
~ArrowStreamTask.parquet_kv_metadata_str
|
||||
~ArrowStreamTask.prepare_connection
|
||||
~ArrowStreamTask.process
|
||||
~ArrowStreamTask.random_float
|
||||
~ArrowStreamTask.random_uint32
|
||||
~ArrowStreamTask.restore_input_state
|
||||
~ArrowStreamTask.run
|
||||
~ArrowStreamTask.run_on_ray
|
||||
~ArrowStreamTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowStreamTask.process_func
|
||||
~ArrowStreamTask.background_io_thread
|
||||
~ArrowStreamTask.streaming_batch_size
|
||||
~ArrowStreamTask.streaming_batch_count
|
||||
~ArrowStreamTask.parquet_row_group_size
|
||||
~ArrowStreamTask.parquet_row_group_bytes
|
||||
~ArrowStreamTask.parquet_dictionary_encoding
|
||||
~ArrowStreamTask.parquet_compression
|
||||
~ArrowStreamTask.parquet_compression_level
|
||||
~ArrowStreamTask.secs_checkpoint_interval
|
||||
~ArrowStreamTask.allow_speculative_exec
|
||||
~ArrowStreamTask.any_input_empty
|
||||
~ArrowStreamTask.compression_level_str
|
||||
~ArrowStreamTask.compression_options
|
||||
~ArrowStreamTask.compression_type_str
|
||||
~ArrowStreamTask.cpu_limit
|
||||
~ArrowStreamTask.cpu_overcommit_ratio
|
||||
~ArrowStreamTask.ctx
|
||||
~ArrowStreamTask.dataset
|
||||
~ArrowStreamTask.default_output_name
|
||||
~ArrowStreamTask.elapsed_time
|
||||
~ArrowStreamTask.enable_temp_directory
|
||||
~ArrowStreamTask.exception
|
||||
~ArrowStreamTask.exec_cq
|
||||
~ArrowStreamTask.exec_id
|
||||
~ArrowStreamTask.exec_on_scheduler
|
||||
~ArrowStreamTask.fail_count
|
||||
~ArrowStreamTask.final_output_abspath
|
||||
~ArrowStreamTask.finish_time
|
||||
~ArrowStreamTask.gpu_limit
|
||||
~ArrowStreamTask.id
|
||||
~ArrowStreamTask.input_datasets
|
||||
~ArrowStreamTask.input_deps
|
||||
~ArrowStreamTask.input_udfs
|
||||
~ArrowStreamTask.input_view_index
|
||||
~ArrowStreamTask.key
|
||||
~ArrowStreamTask.local_gpu
|
||||
~ArrowStreamTask.local_gpu_ranks
|
||||
~ArrowStreamTask.local_rank
|
||||
~ArrowStreamTask.location
|
||||
~ArrowStreamTask.max_batch_size
|
||||
~ArrowStreamTask.memory_limit
|
||||
~ArrowStreamTask.memory_overcommit_ratio
|
||||
~ArrowStreamTask.node_id
|
||||
~ArrowStreamTask.numa_node
|
||||
~ArrowStreamTask.numpy_random_gen
|
||||
~ArrowStreamTask.output
|
||||
~ArrowStreamTask.output_deps
|
||||
~ArrowStreamTask.output_dirname
|
||||
~ArrowStreamTask.output_filename
|
||||
~ArrowStreamTask.output_name
|
||||
~ArrowStreamTask.output_root
|
||||
~ArrowStreamTask.partition_dims
|
||||
~ArrowStreamTask.partition_infos
|
||||
~ArrowStreamTask.partition_infos_as_dict
|
||||
~ArrowStreamTask.perf_metrics
|
||||
~ArrowStreamTask.perf_profile
|
||||
~ArrowStreamTask.python_random_gen
|
||||
~ArrowStreamTask.query_udfs
|
||||
~ArrowStreamTask.rand_seed_float
|
||||
~ArrowStreamTask.rand_seed_uint32
|
||||
~ArrowStreamTask.random_seed_bytes
|
||||
~ArrowStreamTask.ray_dataset_path
|
||||
~ArrowStreamTask.ray_marker_path
|
||||
~ArrowStreamTask.retry_count
|
||||
~ArrowStreamTask.runtime_id
|
||||
~ArrowStreamTask.runtime_output_abspath
|
||||
~ArrowStreamTask.runtime_state
|
||||
~ArrowStreamTask.sched_epoch
|
||||
~ArrowStreamTask.self_contained_output
|
||||
~ArrowStreamTask.skip_when_any_input_empty
|
||||
~ArrowStreamTask.staging_root
|
||||
~ArrowStreamTask.start_time
|
||||
~ArrowStreamTask.status
|
||||
~ArrowStreamTask.temp_abspath
|
||||
~ArrowStreamTask.temp_output
|
||||
~ArrowStreamTask.udfs
|
||||
~ArrowStreamTask.uniform_failure_prob
|
||||
|
||||
|
||||
107
_sources/generated/smallpond.execution.task.DataSinkTask.rst.txt
Normal file
@@ -0,0 +1,107 @@
|
||||
smallpond.execution.task.DataSinkTask
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: DataSinkTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSinkTask.__init__
|
||||
~DataSinkTask.add_elapsed_time
|
||||
~DataSinkTask.adjust_row_group_size
|
||||
~DataSinkTask.clean_complex_attrs
|
||||
~DataSinkTask.clean_output
|
||||
~DataSinkTask.cleanup
|
||||
~DataSinkTask.collect_output_files
|
||||
~DataSinkTask.compute_avg_row_size
|
||||
~DataSinkTask.dump
|
||||
~DataSinkTask.exec
|
||||
~DataSinkTask.finalize
|
||||
~DataSinkTask.get_partition_info
|
||||
~DataSinkTask.initialize
|
||||
~DataSinkTask.inject_fault
|
||||
~DataSinkTask.merge_metrics
|
||||
~DataSinkTask.oom
|
||||
~DataSinkTask.parquet_kv_metadata_bytes
|
||||
~DataSinkTask.parquet_kv_metadata_str
|
||||
~DataSinkTask.random_float
|
||||
~DataSinkTask.random_uint32
|
||||
~DataSinkTask.run
|
||||
~DataSinkTask.run_on_ray
|
||||
~DataSinkTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSinkTask.type
|
||||
~DataSinkTask.is_final_node
|
||||
~DataSinkTask.allow_speculative_exec
|
||||
~DataSinkTask.any_input_empty
|
||||
~DataSinkTask.cpu_limit
|
||||
~DataSinkTask.ctx
|
||||
~DataSinkTask.dataset
|
||||
~DataSinkTask.default_output_name
|
||||
~DataSinkTask.elapsed_time
|
||||
~DataSinkTask.exception
|
||||
~DataSinkTask.exec_cq
|
||||
~DataSinkTask.exec_id
|
||||
~DataSinkTask.exec_on_scheduler
|
||||
~DataSinkTask.fail_count
|
||||
~DataSinkTask.final_output_abspath
|
||||
~DataSinkTask.finish_time
|
||||
~DataSinkTask.gpu_limit
|
||||
~DataSinkTask.id
|
||||
~DataSinkTask.input_datasets
|
||||
~DataSinkTask.input_deps
|
||||
~DataSinkTask.key
|
||||
~DataSinkTask.local_gpu
|
||||
~DataSinkTask.local_gpu_ranks
|
||||
~DataSinkTask.local_rank
|
||||
~DataSinkTask.location
|
||||
~DataSinkTask.manifest_filename
|
||||
~DataSinkTask.memory_limit
|
||||
~DataSinkTask.node_id
|
||||
~DataSinkTask.numa_node
|
||||
~DataSinkTask.numpy_random_gen
|
||||
~DataSinkTask.output
|
||||
~DataSinkTask.output_deps
|
||||
~DataSinkTask.output_dirname
|
||||
~DataSinkTask.output_filename
|
||||
~DataSinkTask.output_name
|
||||
~DataSinkTask.output_root
|
||||
~DataSinkTask.partition_dims
|
||||
~DataSinkTask.partition_infos
|
||||
~DataSinkTask.partition_infos_as_dict
|
||||
~DataSinkTask.perf_metrics
|
||||
~DataSinkTask.perf_profile
|
||||
~DataSinkTask.python_random_gen
|
||||
~DataSinkTask.random_seed_bytes
|
||||
~DataSinkTask.ray_dataset_path
|
||||
~DataSinkTask.ray_marker_path
|
||||
~DataSinkTask.retry_count
|
||||
~DataSinkTask.runtime_id
|
||||
~DataSinkTask.runtime_output_abspath
|
||||
~DataSinkTask.runtime_state
|
||||
~DataSinkTask.sched_epoch
|
||||
~DataSinkTask.self_contained_output
|
||||
~DataSinkTask.skip_when_any_input_empty
|
||||
~DataSinkTask.staging_root
|
||||
~DataSinkTask.start_time
|
||||
~DataSinkTask.status
|
||||
~DataSinkTask.temp_abspath
|
||||
~DataSinkTask.temp_output
|
||||
~DataSinkTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,103 @@
|
||||
smallpond.execution.task.DataSourceTask
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: DataSourceTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSourceTask.__init__
|
||||
~DataSourceTask.add_elapsed_time
|
||||
~DataSourceTask.adjust_row_group_size
|
||||
~DataSourceTask.clean_complex_attrs
|
||||
~DataSourceTask.clean_output
|
||||
~DataSourceTask.cleanup
|
||||
~DataSourceTask.compute_avg_row_size
|
||||
~DataSourceTask.dump
|
||||
~DataSourceTask.exec
|
||||
~DataSourceTask.finalize
|
||||
~DataSourceTask.get_partition_info
|
||||
~DataSourceTask.initialize
|
||||
~DataSourceTask.inject_fault
|
||||
~DataSourceTask.merge_metrics
|
||||
~DataSourceTask.oom
|
||||
~DataSourceTask.parquet_kv_metadata_bytes
|
||||
~DataSourceTask.parquet_kv_metadata_str
|
||||
~DataSourceTask.random_float
|
||||
~DataSourceTask.random_uint32
|
||||
~DataSourceTask.run
|
||||
~DataSourceTask.run_on_ray
|
||||
~DataSourceTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSourceTask.ctx
|
||||
~DataSourceTask.id
|
||||
~DataSourceTask.node_id
|
||||
~DataSourceTask.sched_epoch
|
||||
~DataSourceTask.output_name
|
||||
~DataSourceTask.output_root
|
||||
~DataSourceTask.dataset
|
||||
~DataSourceTask.input_deps
|
||||
~DataSourceTask.output_deps
|
||||
~DataSourceTask.perf_metrics
|
||||
~DataSourceTask.perf_profile
|
||||
~DataSourceTask.runtime_state
|
||||
~DataSourceTask.input_datasets
|
||||
~DataSourceTask.allow_speculative_exec
|
||||
~DataSourceTask.any_input_empty
|
||||
~DataSourceTask.cpu_limit
|
||||
~DataSourceTask.default_output_name
|
||||
~DataSourceTask.elapsed_time
|
||||
~DataSourceTask.exception
|
||||
~DataSourceTask.exec_cq
|
||||
~DataSourceTask.exec_id
|
||||
~DataSourceTask.exec_on_scheduler
|
||||
~DataSourceTask.fail_count
|
||||
~DataSourceTask.final_output_abspath
|
||||
~DataSourceTask.finish_time
|
||||
~DataSourceTask.gpu_limit
|
||||
~DataSourceTask.key
|
||||
~DataSourceTask.local_gpu
|
||||
~DataSourceTask.local_gpu_ranks
|
||||
~DataSourceTask.local_rank
|
||||
~DataSourceTask.location
|
||||
~DataSourceTask.memory_limit
|
||||
~DataSourceTask.numa_node
|
||||
~DataSourceTask.numpy_random_gen
|
||||
~DataSourceTask.output
|
||||
~DataSourceTask.output_dirname
|
||||
~DataSourceTask.output_filename
|
||||
~DataSourceTask.partition_dims
|
||||
~DataSourceTask.partition_infos
|
||||
~DataSourceTask.partition_infos_as_dict
|
||||
~DataSourceTask.python_random_gen
|
||||
~DataSourceTask.random_seed_bytes
|
||||
~DataSourceTask.ray_dataset_path
|
||||
~DataSourceTask.ray_marker_path
|
||||
~DataSourceTask.retry_count
|
||||
~DataSourceTask.runtime_id
|
||||
~DataSourceTask.runtime_output_abspath
|
||||
~DataSourceTask.self_contained_output
|
||||
~DataSourceTask.skip_when_any_input_empty
|
||||
~DataSourceTask.staging_root
|
||||
~DataSourceTask.start_time
|
||||
~DataSourceTask.status
|
||||
~DataSourceTask.temp_abspath
|
||||
~DataSourceTask.temp_output
|
||||
~DataSourceTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,108 @@
|
||||
smallpond.execution.task.EvenlyDistributedPartitionProducerTask
|
||||
===============================================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: EvenlyDistributedPartitionProducerTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~EvenlyDistributedPartitionProducerTask.__init__
|
||||
~EvenlyDistributedPartitionProducerTask.add_elapsed_time
|
||||
~EvenlyDistributedPartitionProducerTask.adjust_row_group_size
|
||||
~EvenlyDistributedPartitionProducerTask.clean_complex_attrs
|
||||
~EvenlyDistributedPartitionProducerTask.clean_output
|
||||
~EvenlyDistributedPartitionProducerTask.cleanup
|
||||
~EvenlyDistributedPartitionProducerTask.compute_avg_row_size
|
||||
~EvenlyDistributedPartitionProducerTask.dump
|
||||
~EvenlyDistributedPartitionProducerTask.exec
|
||||
~EvenlyDistributedPartitionProducerTask.finalize
|
||||
~EvenlyDistributedPartitionProducerTask.get_partition_info
|
||||
~EvenlyDistributedPartitionProducerTask.initialize
|
||||
~EvenlyDistributedPartitionProducerTask.inject_fault
|
||||
~EvenlyDistributedPartitionProducerTask.merge_metrics
|
||||
~EvenlyDistributedPartitionProducerTask.oom
|
||||
~EvenlyDistributedPartitionProducerTask.parquet_kv_metadata_bytes
|
||||
~EvenlyDistributedPartitionProducerTask.parquet_kv_metadata_str
|
||||
~EvenlyDistributedPartitionProducerTask.random_float
|
||||
~EvenlyDistributedPartitionProducerTask.random_uint32
|
||||
~EvenlyDistributedPartitionProducerTask.run
|
||||
~EvenlyDistributedPartitionProducerTask.run_on_ray
|
||||
~EvenlyDistributedPartitionProducerTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~EvenlyDistributedPartitionProducerTask.partition_by_rows
|
||||
~EvenlyDistributedPartitionProducerTask.random_shuffle
|
||||
~EvenlyDistributedPartitionProducerTask.allow_speculative_exec
|
||||
~EvenlyDistributedPartitionProducerTask.any_input_empty
|
||||
~EvenlyDistributedPartitionProducerTask.cpu_limit
|
||||
~EvenlyDistributedPartitionProducerTask.ctx
|
||||
~EvenlyDistributedPartitionProducerTask.dataset
|
||||
~EvenlyDistributedPartitionProducerTask.default_output_name
|
||||
~EvenlyDistributedPartitionProducerTask.dimension
|
||||
~EvenlyDistributedPartitionProducerTask.elapsed_time
|
||||
~EvenlyDistributedPartitionProducerTask.exception
|
||||
~EvenlyDistributedPartitionProducerTask.exec_cq
|
||||
~EvenlyDistributedPartitionProducerTask.exec_id
|
||||
~EvenlyDistributedPartitionProducerTask.exec_on_scheduler
|
||||
~EvenlyDistributedPartitionProducerTask.fail_count
|
||||
~EvenlyDistributedPartitionProducerTask.final_output_abspath
|
||||
~EvenlyDistributedPartitionProducerTask.finish_time
|
||||
~EvenlyDistributedPartitionProducerTask.gpu_limit
|
||||
~EvenlyDistributedPartitionProducerTask.id
|
||||
~EvenlyDistributedPartitionProducerTask.input_datasets
|
||||
~EvenlyDistributedPartitionProducerTask.input_deps
|
||||
~EvenlyDistributedPartitionProducerTask.key
|
||||
~EvenlyDistributedPartitionProducerTask.local_gpu
|
||||
~EvenlyDistributedPartitionProducerTask.local_gpu_ranks
|
||||
~EvenlyDistributedPartitionProducerTask.local_rank
|
||||
~EvenlyDistributedPartitionProducerTask.location
|
||||
~EvenlyDistributedPartitionProducerTask.memory_limit
|
||||
~EvenlyDistributedPartitionProducerTask.node_id
|
||||
~EvenlyDistributedPartitionProducerTask.npartitions
|
||||
~EvenlyDistributedPartitionProducerTask.numa_node
|
||||
~EvenlyDistributedPartitionProducerTask.numpy_random_gen
|
||||
~EvenlyDistributedPartitionProducerTask.output
|
||||
~EvenlyDistributedPartitionProducerTask.output_deps
|
||||
~EvenlyDistributedPartitionProducerTask.output_dirname
|
||||
~EvenlyDistributedPartitionProducerTask.output_filename
|
||||
~EvenlyDistributedPartitionProducerTask.output_name
|
||||
~EvenlyDistributedPartitionProducerTask.output_root
|
||||
~EvenlyDistributedPartitionProducerTask.partition_dims
|
||||
~EvenlyDistributedPartitionProducerTask.partition_infos
|
||||
~EvenlyDistributedPartitionProducerTask.partition_infos_as_dict
|
||||
~EvenlyDistributedPartitionProducerTask.partitioned_datasets
|
||||
~EvenlyDistributedPartitionProducerTask.perf_metrics
|
||||
~EvenlyDistributedPartitionProducerTask.perf_profile
|
||||
~EvenlyDistributedPartitionProducerTask.python_random_gen
|
||||
~EvenlyDistributedPartitionProducerTask.random_seed_bytes
|
||||
~EvenlyDistributedPartitionProducerTask.ray_dataset_path
|
||||
~EvenlyDistributedPartitionProducerTask.ray_marker_path
|
||||
~EvenlyDistributedPartitionProducerTask.retry_count
|
||||
~EvenlyDistributedPartitionProducerTask.runtime_id
|
||||
~EvenlyDistributedPartitionProducerTask.runtime_output_abspath
|
||||
~EvenlyDistributedPartitionProducerTask.runtime_state
|
||||
~EvenlyDistributedPartitionProducerTask.sched_epoch
|
||||
~EvenlyDistributedPartitionProducerTask.self_contained_output
|
||||
~EvenlyDistributedPartitionProducerTask.skip_when_any_input_empty
|
||||
~EvenlyDistributedPartitionProducerTask.staging_root
|
||||
~EvenlyDistributedPartitionProducerTask.start_time
|
||||
~EvenlyDistributedPartitionProducerTask.status
|
||||
~EvenlyDistributedPartitionProducerTask.temp_abspath
|
||||
~EvenlyDistributedPartitionProducerTask.temp_output
|
||||
~EvenlyDistributedPartitionProducerTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,36 @@
|
||||
smallpond.execution.task.ExecutionPlan
|
||||
======================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: ExecutionPlan
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ExecutionPlan.__init__
|
||||
~ExecutionPlan.get_output
|
||||
~ExecutionPlan.iter_tasks
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ExecutionPlan.analyzed_logical_plan
|
||||
~ExecutionPlan.final_output
|
||||
~ExecutionPlan.final_output_path
|
||||
~ExecutionPlan.leaves
|
||||
~ExecutionPlan.named_outputs
|
||||
~ExecutionPlan.successful
|
||||
~ExecutionPlan.tasks
|
||||
|
||||
|
||||
@@ -0,0 +1,125 @@
|
||||
smallpond.execution.task.HashPartitionArrowTask
|
||||
===============================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: HashPartitionArrowTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionArrowTask.__init__
|
||||
~HashPartitionArrowTask.add_elapsed_time
|
||||
~HashPartitionArrowTask.adjust_row_group_size
|
||||
~HashPartitionArrowTask.clean_complex_attrs
|
||||
~HashPartitionArrowTask.clean_output
|
||||
~HashPartitionArrowTask.cleanup
|
||||
~HashPartitionArrowTask.compute_avg_row_size
|
||||
~HashPartitionArrowTask.create
|
||||
~HashPartitionArrowTask.dump
|
||||
~HashPartitionArrowTask.exec
|
||||
~HashPartitionArrowTask.finalize
|
||||
~HashPartitionArrowTask.get_partition_info
|
||||
~HashPartitionArrowTask.initialize
|
||||
~HashPartitionArrowTask.inject_fault
|
||||
~HashPartitionArrowTask.merge_metrics
|
||||
~HashPartitionArrowTask.oom
|
||||
~HashPartitionArrowTask.parquet_kv_metadata_bytes
|
||||
~HashPartitionArrowTask.parquet_kv_metadata_str
|
||||
~HashPartitionArrowTask.partition
|
||||
~HashPartitionArrowTask.random_float
|
||||
~HashPartitionArrowTask.random_uint32
|
||||
~HashPartitionArrowTask.run
|
||||
~HashPartitionArrowTask.run_on_ray
|
||||
~HashPartitionArrowTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionArrowTask.hash_columns
|
||||
~HashPartitionArrowTask.data_partition_column
|
||||
~HashPartitionArrowTask.random_shuffle
|
||||
~HashPartitionArrowTask.shuffle_only
|
||||
~HashPartitionArrowTask.drop_partition_column
|
||||
~HashPartitionArrowTask.use_parquet_writer
|
||||
~HashPartitionArrowTask.hive_partitioning
|
||||
~HashPartitionArrowTask.parquet_row_group_size
|
||||
~HashPartitionArrowTask.parquet_row_group_bytes
|
||||
~HashPartitionArrowTask.parquet_dictionary_encoding
|
||||
~HashPartitionArrowTask.parquet_compression
|
||||
~HashPartitionArrowTask.parquet_compression_level
|
||||
~HashPartitionArrowTask.partitioned_datasets
|
||||
~HashPartitionArrowTask.allow_speculative_exec
|
||||
~HashPartitionArrowTask.any_input_empty
|
||||
~HashPartitionArrowTask.cpu_limit
|
||||
~HashPartitionArrowTask.ctx
|
||||
~HashPartitionArrowTask.dataset
|
||||
~HashPartitionArrowTask.default_output_name
|
||||
~HashPartitionArrowTask.dimension
|
||||
~HashPartitionArrowTask.elapsed_time
|
||||
~HashPartitionArrowTask.exception
|
||||
~HashPartitionArrowTask.exec_cq
|
||||
~HashPartitionArrowTask.exec_id
|
||||
~HashPartitionArrowTask.exec_on_scheduler
|
||||
~HashPartitionArrowTask.fail_count
|
||||
~HashPartitionArrowTask.final_output_abspath
|
||||
~HashPartitionArrowTask.finish_time
|
||||
~HashPartitionArrowTask.fixed_rand_seeds
|
||||
~HashPartitionArrowTask.gpu_limit
|
||||
~HashPartitionArrowTask.id
|
||||
~HashPartitionArrowTask.input_datasets
|
||||
~HashPartitionArrowTask.input_deps
|
||||
~HashPartitionArrowTask.io_workers
|
||||
~HashPartitionArrowTask.key
|
||||
~HashPartitionArrowTask.local_gpu
|
||||
~HashPartitionArrowTask.local_gpu_ranks
|
||||
~HashPartitionArrowTask.local_rank
|
||||
~HashPartitionArrowTask.location
|
||||
~HashPartitionArrowTask.max_batch_size
|
||||
~HashPartitionArrowTask.memory_limit
|
||||
~HashPartitionArrowTask.node_id
|
||||
~HashPartitionArrowTask.npartitions
|
||||
~HashPartitionArrowTask.num_workers
|
||||
~HashPartitionArrowTask.numa_node
|
||||
~HashPartitionArrowTask.numpy_random_gen
|
||||
~HashPartitionArrowTask.output
|
||||
~HashPartitionArrowTask.output_deps
|
||||
~HashPartitionArrowTask.output_dirname
|
||||
~HashPartitionArrowTask.output_filename
|
||||
~HashPartitionArrowTask.output_name
|
||||
~HashPartitionArrowTask.output_root
|
||||
~HashPartitionArrowTask.partition_dims
|
||||
~HashPartitionArrowTask.partition_infos
|
||||
~HashPartitionArrowTask.partition_infos_as_dict
|
||||
~HashPartitionArrowTask.perf_metrics
|
||||
~HashPartitionArrowTask.perf_profile
|
||||
~HashPartitionArrowTask.python_random_gen
|
||||
~HashPartitionArrowTask.random_seed_bytes
|
||||
~HashPartitionArrowTask.ray_dataset_path
|
||||
~HashPartitionArrowTask.ray_marker_path
|
||||
~HashPartitionArrowTask.retry_count
|
||||
~HashPartitionArrowTask.runtime_id
|
||||
~HashPartitionArrowTask.runtime_output_abspath
|
||||
~HashPartitionArrowTask.runtime_state
|
||||
~HashPartitionArrowTask.sched_epoch
|
||||
~HashPartitionArrowTask.self_contained_output
|
||||
~HashPartitionArrowTask.skip_when_any_input_empty
|
||||
~HashPartitionArrowTask.staging_root
|
||||
~HashPartitionArrowTask.start_time
|
||||
~HashPartitionArrowTask.status
|
||||
~HashPartitionArrowTask.temp_abspath
|
||||
~HashPartitionArrowTask.temp_output
|
||||
~HashPartitionArrowTask.uniform_failure_prob
|
||||
~HashPartitionArrowTask.write_buffer_size
|
||||
|
||||
|
||||
@@ -0,0 +1,143 @@
|
||||
smallpond.execution.task.HashPartitionDuckDbTask
|
||||
================================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: HashPartitionDuckDbTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionDuckDbTask.__init__
|
||||
~HashPartitionDuckDbTask.add_elapsed_time
|
||||
~HashPartitionDuckDbTask.adjust_row_group_size
|
||||
~HashPartitionDuckDbTask.clean_complex_attrs
|
||||
~HashPartitionDuckDbTask.clean_output
|
||||
~HashPartitionDuckDbTask.cleanup
|
||||
~HashPartitionDuckDbTask.compute_avg_row_size
|
||||
~HashPartitionDuckDbTask.create
|
||||
~HashPartitionDuckDbTask.create_input_views
|
||||
~HashPartitionDuckDbTask.dump
|
||||
~HashPartitionDuckDbTask.exec
|
||||
~HashPartitionDuckDbTask.exec_query
|
||||
~HashPartitionDuckDbTask.finalize
|
||||
~HashPartitionDuckDbTask.get_partition_info
|
||||
~HashPartitionDuckDbTask.initialize
|
||||
~HashPartitionDuckDbTask.inject_fault
|
||||
~HashPartitionDuckDbTask.load_input_batch
|
||||
~HashPartitionDuckDbTask.merge_metrics
|
||||
~HashPartitionDuckDbTask.oom
|
||||
~HashPartitionDuckDbTask.parquet_kv_metadata_bytes
|
||||
~HashPartitionDuckDbTask.parquet_kv_metadata_str
|
||||
~HashPartitionDuckDbTask.partition
|
||||
~HashPartitionDuckDbTask.prepare_connection
|
||||
~HashPartitionDuckDbTask.random_float
|
||||
~HashPartitionDuckDbTask.random_uint32
|
||||
~HashPartitionDuckDbTask.run
|
||||
~HashPartitionDuckDbTask.run_on_ray
|
||||
~HashPartitionDuckDbTask.set_memory_limit
|
||||
~HashPartitionDuckDbTask.write_flat_partitions
|
||||
~HashPartitionDuckDbTask.write_hive_partitions
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionDuckDbTask.hash_columns
|
||||
~HashPartitionDuckDbTask.data_partition_column
|
||||
~HashPartitionDuckDbTask.random_shuffle
|
||||
~HashPartitionDuckDbTask.shuffle_only
|
||||
~HashPartitionDuckDbTask.drop_partition_column
|
||||
~HashPartitionDuckDbTask.use_parquet_writer
|
||||
~HashPartitionDuckDbTask.hive_partitioning
|
||||
~HashPartitionDuckDbTask.parquet_row_group_size
|
||||
~HashPartitionDuckDbTask.parquet_row_group_bytes
|
||||
~HashPartitionDuckDbTask.parquet_dictionary_encoding
|
||||
~HashPartitionDuckDbTask.parquet_compression
|
||||
~HashPartitionDuckDbTask.parquet_compression_level
|
||||
~HashPartitionDuckDbTask.partitioned_datasets
|
||||
~HashPartitionDuckDbTask.allow_speculative_exec
|
||||
~HashPartitionDuckDbTask.any_input_empty
|
||||
~HashPartitionDuckDbTask.compression_level_str
|
||||
~HashPartitionDuckDbTask.compression_options
|
||||
~HashPartitionDuckDbTask.compression_type_str
|
||||
~HashPartitionDuckDbTask.cpu_limit
|
||||
~HashPartitionDuckDbTask.cpu_overcommit_ratio
|
||||
~HashPartitionDuckDbTask.ctx
|
||||
~HashPartitionDuckDbTask.dataset
|
||||
~HashPartitionDuckDbTask.default_output_name
|
||||
~HashPartitionDuckDbTask.dimension
|
||||
~HashPartitionDuckDbTask.elapsed_time
|
||||
~HashPartitionDuckDbTask.enable_temp_directory
|
||||
~HashPartitionDuckDbTask.exception
|
||||
~HashPartitionDuckDbTask.exec_cq
|
||||
~HashPartitionDuckDbTask.exec_id
|
||||
~HashPartitionDuckDbTask.exec_on_scheduler
|
||||
~HashPartitionDuckDbTask.fail_count
|
||||
~HashPartitionDuckDbTask.final_output_abspath
|
||||
~HashPartitionDuckDbTask.finish_time
|
||||
~HashPartitionDuckDbTask.gpu_limit
|
||||
~HashPartitionDuckDbTask.id
|
||||
~HashPartitionDuckDbTask.input_datasets
|
||||
~HashPartitionDuckDbTask.input_deps
|
||||
~HashPartitionDuckDbTask.input_udfs
|
||||
~HashPartitionDuckDbTask.input_view_index
|
||||
~HashPartitionDuckDbTask.io_workers
|
||||
~HashPartitionDuckDbTask.key
|
||||
~HashPartitionDuckDbTask.local_gpu
|
||||
~HashPartitionDuckDbTask.local_gpu_ranks
|
||||
~HashPartitionDuckDbTask.local_rank
|
||||
~HashPartitionDuckDbTask.location
|
||||
~HashPartitionDuckDbTask.max_batch_size
|
||||
~HashPartitionDuckDbTask.memory_limit
|
||||
~HashPartitionDuckDbTask.memory_overcommit_ratio
|
||||
~HashPartitionDuckDbTask.node_id
|
||||
~HashPartitionDuckDbTask.npartitions
|
||||
~HashPartitionDuckDbTask.num_workers
|
||||
~HashPartitionDuckDbTask.numa_node
|
||||
~HashPartitionDuckDbTask.numpy_random_gen
|
||||
~HashPartitionDuckDbTask.output
|
||||
~HashPartitionDuckDbTask.output_deps
|
||||
~HashPartitionDuckDbTask.output_dirname
|
||||
~HashPartitionDuckDbTask.output_filename
|
||||
~HashPartitionDuckDbTask.output_name
|
||||
~HashPartitionDuckDbTask.output_root
|
||||
~HashPartitionDuckDbTask.partition_dims
|
||||
~HashPartitionDuckDbTask.partition_infos
|
||||
~HashPartitionDuckDbTask.partition_infos_as_dict
|
||||
~HashPartitionDuckDbTask.partition_query
|
||||
~HashPartitionDuckDbTask.perf_metrics
|
||||
~HashPartitionDuckDbTask.perf_profile
|
||||
~HashPartitionDuckDbTask.python_random_gen
|
||||
~HashPartitionDuckDbTask.query_udfs
|
||||
~HashPartitionDuckDbTask.rand_seed_float
|
||||
~HashPartitionDuckDbTask.rand_seed_uint32
|
||||
~HashPartitionDuckDbTask.random_seed_bytes
|
||||
~HashPartitionDuckDbTask.ray_dataset_path
|
||||
~HashPartitionDuckDbTask.ray_marker_path
|
||||
~HashPartitionDuckDbTask.retry_count
|
||||
~HashPartitionDuckDbTask.runtime_id
|
||||
~HashPartitionDuckDbTask.runtime_output_abspath
|
||||
~HashPartitionDuckDbTask.runtime_state
|
||||
~HashPartitionDuckDbTask.sched_epoch
|
||||
~HashPartitionDuckDbTask.self_contained_output
|
||||
~HashPartitionDuckDbTask.skip_when_any_input_empty
|
||||
~HashPartitionDuckDbTask.staging_root
|
||||
~HashPartitionDuckDbTask.start_time
|
||||
~HashPartitionDuckDbTask.status
|
||||
~HashPartitionDuckDbTask.temp_abspath
|
||||
~HashPartitionDuckDbTask.temp_output
|
||||
~HashPartitionDuckDbTask.udfs
|
||||
~HashPartitionDuckDbTask.uniform_failure_prob
|
||||
~HashPartitionDuckDbTask.write_buffer_size
|
||||
|
||||
|
||||
@@ -0,0 +1,124 @@
|
||||
smallpond.execution.task.HashPartitionTask
|
||||
==========================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: HashPartitionTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionTask.__init__
|
||||
~HashPartitionTask.add_elapsed_time
|
||||
~HashPartitionTask.adjust_row_group_size
|
||||
~HashPartitionTask.clean_complex_attrs
|
||||
~HashPartitionTask.clean_output
|
||||
~HashPartitionTask.cleanup
|
||||
~HashPartitionTask.compute_avg_row_size
|
||||
~HashPartitionTask.create
|
||||
~HashPartitionTask.dump
|
||||
~HashPartitionTask.exec
|
||||
~HashPartitionTask.finalize
|
||||
~HashPartitionTask.get_partition_info
|
||||
~HashPartitionTask.initialize
|
||||
~HashPartitionTask.inject_fault
|
||||
~HashPartitionTask.merge_metrics
|
||||
~HashPartitionTask.oom
|
||||
~HashPartitionTask.parquet_kv_metadata_bytes
|
||||
~HashPartitionTask.parquet_kv_metadata_str
|
||||
~HashPartitionTask.partition
|
||||
~HashPartitionTask.random_float
|
||||
~HashPartitionTask.random_uint32
|
||||
~HashPartitionTask.run
|
||||
~HashPartitionTask.run_on_ray
|
||||
~HashPartitionTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionTask.hash_columns
|
||||
~HashPartitionTask.data_partition_column
|
||||
~HashPartitionTask.random_shuffle
|
||||
~HashPartitionTask.shuffle_only
|
||||
~HashPartitionTask.drop_partition_column
|
||||
~HashPartitionTask.use_parquet_writer
|
||||
~HashPartitionTask.hive_partitioning
|
||||
~HashPartitionTask.parquet_row_group_size
|
||||
~HashPartitionTask.parquet_row_group_bytes
|
||||
~HashPartitionTask.parquet_dictionary_encoding
|
||||
~HashPartitionTask.parquet_compression
|
||||
~HashPartitionTask.parquet_compression_level
|
||||
~HashPartitionTask.partitioned_datasets
|
||||
~HashPartitionTask.allow_speculative_exec
|
||||
~HashPartitionTask.any_input_empty
|
||||
~HashPartitionTask.cpu_limit
|
||||
~HashPartitionTask.ctx
|
||||
~HashPartitionTask.dataset
|
||||
~HashPartitionTask.default_output_name
|
||||
~HashPartitionTask.dimension
|
||||
~HashPartitionTask.elapsed_time
|
||||
~HashPartitionTask.exception
|
||||
~HashPartitionTask.exec_cq
|
||||
~HashPartitionTask.exec_id
|
||||
~HashPartitionTask.exec_on_scheduler
|
||||
~HashPartitionTask.fail_count
|
||||
~HashPartitionTask.final_output_abspath
|
||||
~HashPartitionTask.finish_time
|
||||
~HashPartitionTask.gpu_limit
|
||||
~HashPartitionTask.id
|
||||
~HashPartitionTask.input_datasets
|
||||
~HashPartitionTask.input_deps
|
||||
~HashPartitionTask.io_workers
|
||||
~HashPartitionTask.key
|
||||
~HashPartitionTask.local_gpu
|
||||
~HashPartitionTask.local_gpu_ranks
|
||||
~HashPartitionTask.local_rank
|
||||
~HashPartitionTask.location
|
||||
~HashPartitionTask.max_batch_size
|
||||
~HashPartitionTask.memory_limit
|
||||
~HashPartitionTask.node_id
|
||||
~HashPartitionTask.npartitions
|
||||
~HashPartitionTask.num_workers
|
||||
~HashPartitionTask.numa_node
|
||||
~HashPartitionTask.numpy_random_gen
|
||||
~HashPartitionTask.output
|
||||
~HashPartitionTask.output_deps
|
||||
~HashPartitionTask.output_dirname
|
||||
~HashPartitionTask.output_filename
|
||||
~HashPartitionTask.output_name
|
||||
~HashPartitionTask.output_root
|
||||
~HashPartitionTask.partition_dims
|
||||
~HashPartitionTask.partition_infos
|
||||
~HashPartitionTask.partition_infos_as_dict
|
||||
~HashPartitionTask.perf_metrics
|
||||
~HashPartitionTask.perf_profile
|
||||
~HashPartitionTask.python_random_gen
|
||||
~HashPartitionTask.random_seed_bytes
|
||||
~HashPartitionTask.ray_dataset_path
|
||||
~HashPartitionTask.ray_marker_path
|
||||
~HashPartitionTask.retry_count
|
||||
~HashPartitionTask.runtime_id
|
||||
~HashPartitionTask.runtime_output_abspath
|
||||
~HashPartitionTask.runtime_state
|
||||
~HashPartitionTask.sched_epoch
|
||||
~HashPartitionTask.self_contained_output
|
||||
~HashPartitionTask.skip_when_any_input_empty
|
||||
~HashPartitionTask.staging_root
|
||||
~HashPartitionTask.start_time
|
||||
~HashPartitionTask.status
|
||||
~HashPartitionTask.temp_abspath
|
||||
~HashPartitionTask.temp_output
|
||||
~HashPartitionTask.uniform_failure_prob
|
||||
~HashPartitionTask.write_buffer_size
|
||||
|
||||
|
||||
45
_sources/generated/smallpond.execution.task.JobId.rst.txt
Normal file
@@ -0,0 +1,45 @@
smallpond.execution.task.JobId
==============================

.. currentmodule:: smallpond.execution.task

.. autoclass:: JobId

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~JobId.__init__
      ~JobId.new

   .. rubric:: Attributes

   .. autosummary::

      ~JobId.int
      ~JobId.is_safe
      ~JobId.bytes
      ~JobId.bytes_le
      ~JobId.clock_seq
      ~JobId.clock_seq_hi_variant
      ~JobId.clock_seq_low
      ~JobId.fields
      ~JobId.hex
      ~JobId.node
      ~JobId.time
      ~JobId.time_hi_version
      ~JobId.time_low
      ~JobId.time_mid
      ~JobId.urn
      ~JobId.variant
      ~JobId.version

@@ -0,0 +1,108 @@
|
||||
smallpond.execution.task.LoadPartitionedDataSetProducerTask
|
||||
===========================================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: LoadPartitionedDataSetProducerTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LoadPartitionedDataSetProducerTask.__init__
|
||||
~LoadPartitionedDataSetProducerTask.add_elapsed_time
|
||||
~LoadPartitionedDataSetProducerTask.adjust_row_group_size
|
||||
~LoadPartitionedDataSetProducerTask.clean_complex_attrs
|
||||
~LoadPartitionedDataSetProducerTask.clean_output
|
||||
~LoadPartitionedDataSetProducerTask.cleanup
|
||||
~LoadPartitionedDataSetProducerTask.compute_avg_row_size
|
||||
~LoadPartitionedDataSetProducerTask.dump
|
||||
~LoadPartitionedDataSetProducerTask.exec
|
||||
~LoadPartitionedDataSetProducerTask.finalize
|
||||
~LoadPartitionedDataSetProducerTask.get_partition_info
|
||||
~LoadPartitionedDataSetProducerTask.initialize
|
||||
~LoadPartitionedDataSetProducerTask.inject_fault
|
||||
~LoadPartitionedDataSetProducerTask.merge_metrics
|
||||
~LoadPartitionedDataSetProducerTask.oom
|
||||
~LoadPartitionedDataSetProducerTask.parquet_kv_metadata_bytes
|
||||
~LoadPartitionedDataSetProducerTask.parquet_kv_metadata_str
|
||||
~LoadPartitionedDataSetProducerTask.random_float
|
||||
~LoadPartitionedDataSetProducerTask.random_uint32
|
||||
~LoadPartitionedDataSetProducerTask.run
|
||||
~LoadPartitionedDataSetProducerTask.run_on_ray
|
||||
~LoadPartitionedDataSetProducerTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LoadPartitionedDataSetProducerTask.data_partition_column
|
||||
~LoadPartitionedDataSetProducerTask.hive_partitioning
|
||||
~LoadPartitionedDataSetProducerTask.allow_speculative_exec
|
||||
~LoadPartitionedDataSetProducerTask.any_input_empty
|
||||
~LoadPartitionedDataSetProducerTask.cpu_limit
|
||||
~LoadPartitionedDataSetProducerTask.ctx
|
||||
~LoadPartitionedDataSetProducerTask.dataset
|
||||
~LoadPartitionedDataSetProducerTask.default_output_name
|
||||
~LoadPartitionedDataSetProducerTask.dimension
|
||||
~LoadPartitionedDataSetProducerTask.elapsed_time
|
||||
~LoadPartitionedDataSetProducerTask.exception
|
||||
~LoadPartitionedDataSetProducerTask.exec_cq
|
||||
~LoadPartitionedDataSetProducerTask.exec_id
|
||||
~LoadPartitionedDataSetProducerTask.exec_on_scheduler
|
||||
~LoadPartitionedDataSetProducerTask.fail_count
|
||||
~LoadPartitionedDataSetProducerTask.final_output_abspath
|
||||
~LoadPartitionedDataSetProducerTask.finish_time
|
||||
~LoadPartitionedDataSetProducerTask.gpu_limit
|
||||
~LoadPartitionedDataSetProducerTask.id
|
||||
~LoadPartitionedDataSetProducerTask.input_datasets
|
||||
~LoadPartitionedDataSetProducerTask.input_deps
|
||||
~LoadPartitionedDataSetProducerTask.key
|
||||
~LoadPartitionedDataSetProducerTask.local_gpu
|
||||
~LoadPartitionedDataSetProducerTask.local_gpu_ranks
|
||||
~LoadPartitionedDataSetProducerTask.local_rank
|
||||
~LoadPartitionedDataSetProducerTask.location
|
||||
~LoadPartitionedDataSetProducerTask.memory_limit
|
||||
~LoadPartitionedDataSetProducerTask.node_id
|
||||
~LoadPartitionedDataSetProducerTask.npartitions
|
||||
~LoadPartitionedDataSetProducerTask.numa_node
|
||||
~LoadPartitionedDataSetProducerTask.numpy_random_gen
|
||||
~LoadPartitionedDataSetProducerTask.output
|
||||
~LoadPartitionedDataSetProducerTask.output_deps
|
||||
~LoadPartitionedDataSetProducerTask.output_dirname
|
||||
~LoadPartitionedDataSetProducerTask.output_filename
|
||||
~LoadPartitionedDataSetProducerTask.output_name
|
||||
~LoadPartitionedDataSetProducerTask.output_root
|
||||
~LoadPartitionedDataSetProducerTask.partition_dims
|
||||
~LoadPartitionedDataSetProducerTask.partition_infos
|
||||
~LoadPartitionedDataSetProducerTask.partition_infos_as_dict
|
||||
~LoadPartitionedDataSetProducerTask.partitioned_datasets
|
||||
~LoadPartitionedDataSetProducerTask.perf_metrics
|
||||
~LoadPartitionedDataSetProducerTask.perf_profile
|
||||
~LoadPartitionedDataSetProducerTask.python_random_gen
|
||||
~LoadPartitionedDataSetProducerTask.random_seed_bytes
|
||||
~LoadPartitionedDataSetProducerTask.ray_dataset_path
|
||||
~LoadPartitionedDataSetProducerTask.ray_marker_path
|
||||
~LoadPartitionedDataSetProducerTask.retry_count
|
||||
~LoadPartitionedDataSetProducerTask.runtime_id
|
||||
~LoadPartitionedDataSetProducerTask.runtime_output_abspath
|
||||
~LoadPartitionedDataSetProducerTask.runtime_state
|
||||
~LoadPartitionedDataSetProducerTask.sched_epoch
|
||||
~LoadPartitionedDataSetProducerTask.self_contained_output
|
||||
~LoadPartitionedDataSetProducerTask.skip_when_any_input_empty
|
||||
~LoadPartitionedDataSetProducerTask.staging_root
|
||||
~LoadPartitionedDataSetProducerTask.start_time
|
||||
~LoadPartitionedDataSetProducerTask.status
|
||||
~LoadPartitionedDataSetProducerTask.temp_abspath
|
||||
~LoadPartitionedDataSetProducerTask.temp_output
|
||||
~LoadPartitionedDataSetProducerTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,103 @@
|
||||
smallpond.execution.task.MergeDataSetsTask
|
||||
==========================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: MergeDataSetsTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~MergeDataSetsTask.__init__
|
||||
~MergeDataSetsTask.add_elapsed_time
|
||||
~MergeDataSetsTask.adjust_row_group_size
|
||||
~MergeDataSetsTask.clean_complex_attrs
|
||||
~MergeDataSetsTask.clean_output
|
||||
~MergeDataSetsTask.cleanup
|
||||
~MergeDataSetsTask.compute_avg_row_size
|
||||
~MergeDataSetsTask.dump
|
||||
~MergeDataSetsTask.exec
|
||||
~MergeDataSetsTask.finalize
|
||||
~MergeDataSetsTask.get_partition_info
|
||||
~MergeDataSetsTask.initialize
|
||||
~MergeDataSetsTask.inject_fault
|
||||
~MergeDataSetsTask.merge_metrics
|
||||
~MergeDataSetsTask.oom
|
||||
~MergeDataSetsTask.parquet_kv_metadata_bytes
|
||||
~MergeDataSetsTask.parquet_kv_metadata_str
|
||||
~MergeDataSetsTask.random_float
|
||||
~MergeDataSetsTask.random_uint32
|
||||
~MergeDataSetsTask.run
|
||||
~MergeDataSetsTask.run_on_ray
|
||||
~MergeDataSetsTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~MergeDataSetsTask.ctx
|
||||
~MergeDataSetsTask.id
|
||||
~MergeDataSetsTask.node_id
|
||||
~MergeDataSetsTask.sched_epoch
|
||||
~MergeDataSetsTask.output_name
|
||||
~MergeDataSetsTask.output_root
|
||||
~MergeDataSetsTask.dataset
|
||||
~MergeDataSetsTask.input_deps
|
||||
~MergeDataSetsTask.output_deps
|
||||
~MergeDataSetsTask.perf_metrics
|
||||
~MergeDataSetsTask.perf_profile
|
||||
~MergeDataSetsTask.runtime_state
|
||||
~MergeDataSetsTask.input_datasets
|
||||
~MergeDataSetsTask.allow_speculative_exec
|
||||
~MergeDataSetsTask.any_input_empty
|
||||
~MergeDataSetsTask.cpu_limit
|
||||
~MergeDataSetsTask.default_output_name
|
||||
~MergeDataSetsTask.elapsed_time
|
||||
~MergeDataSetsTask.exception
|
||||
~MergeDataSetsTask.exec_cq
|
||||
~MergeDataSetsTask.exec_id
|
||||
~MergeDataSetsTask.exec_on_scheduler
|
||||
~MergeDataSetsTask.fail_count
|
||||
~MergeDataSetsTask.final_output_abspath
|
||||
~MergeDataSetsTask.finish_time
|
||||
~MergeDataSetsTask.gpu_limit
|
||||
~MergeDataSetsTask.key
|
||||
~MergeDataSetsTask.local_gpu
|
||||
~MergeDataSetsTask.local_gpu_ranks
|
||||
~MergeDataSetsTask.local_rank
|
||||
~MergeDataSetsTask.location
|
||||
~MergeDataSetsTask.memory_limit
|
||||
~MergeDataSetsTask.numa_node
|
||||
~MergeDataSetsTask.numpy_random_gen
|
||||
~MergeDataSetsTask.output
|
||||
~MergeDataSetsTask.output_dirname
|
||||
~MergeDataSetsTask.output_filename
|
||||
~MergeDataSetsTask.partition_dims
|
||||
~MergeDataSetsTask.partition_infos
|
||||
~MergeDataSetsTask.partition_infos_as_dict
|
||||
~MergeDataSetsTask.python_random_gen
|
||||
~MergeDataSetsTask.random_seed_bytes
|
||||
~MergeDataSetsTask.ray_dataset_path
|
||||
~MergeDataSetsTask.ray_marker_path
|
||||
~MergeDataSetsTask.retry_count
|
||||
~MergeDataSetsTask.runtime_id
|
||||
~MergeDataSetsTask.runtime_output_abspath
|
||||
~MergeDataSetsTask.self_contained_output
|
||||
~MergeDataSetsTask.skip_when_any_input_empty
|
||||
~MergeDataSetsTask.staging_root
|
||||
~MergeDataSetsTask.start_time
|
||||
~MergeDataSetsTask.status
|
||||
~MergeDataSetsTask.temp_abspath
|
||||
~MergeDataSetsTask.temp_output
|
||||
~MergeDataSetsTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,132 @@
|
||||
smallpond.execution.task.PandasBatchTask
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: PandasBatchTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasBatchTask.__init__
|
||||
~PandasBatchTask.add_elapsed_time
|
||||
~PandasBatchTask.adjust_row_group_size
|
||||
~PandasBatchTask.clean_complex_attrs
|
||||
~PandasBatchTask.clean_output
|
||||
~PandasBatchTask.cleanup
|
||||
~PandasBatchTask.compute_avg_row_size
|
||||
~PandasBatchTask.create_input_views
|
||||
~PandasBatchTask.dump
|
||||
~PandasBatchTask.dump_output
|
||||
~PandasBatchTask.exec
|
||||
~PandasBatchTask.exec_query
|
||||
~PandasBatchTask.finalize
|
||||
~PandasBatchTask.get_partition_info
|
||||
~PandasBatchTask.initialize
|
||||
~PandasBatchTask.inject_fault
|
||||
~PandasBatchTask.merge_metrics
|
||||
~PandasBatchTask.oom
|
||||
~PandasBatchTask.parquet_kv_metadata_bytes
|
||||
~PandasBatchTask.parquet_kv_metadata_str
|
||||
~PandasBatchTask.prepare_connection
|
||||
~PandasBatchTask.process
|
||||
~PandasBatchTask.random_float
|
||||
~PandasBatchTask.random_uint32
|
||||
~PandasBatchTask.restore_input_state
|
||||
~PandasBatchTask.run
|
||||
~PandasBatchTask.run_on_ray
|
||||
~PandasBatchTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasBatchTask.process_func
|
||||
~PandasBatchTask.background_io_thread
|
||||
~PandasBatchTask.streaming_batch_size
|
||||
~PandasBatchTask.streaming_batch_count
|
||||
~PandasBatchTask.parquet_row_group_size
|
||||
~PandasBatchTask.parquet_row_group_bytes
|
||||
~PandasBatchTask.parquet_dictionary_encoding
|
||||
~PandasBatchTask.parquet_compression
|
||||
~PandasBatchTask.parquet_compression_level
|
||||
~PandasBatchTask.secs_checkpoint_interval
|
||||
~PandasBatchTask.allow_speculative_exec
|
||||
~PandasBatchTask.any_input_empty
|
||||
~PandasBatchTask.compression_level_str
|
||||
~PandasBatchTask.compression_options
|
||||
~PandasBatchTask.compression_type_str
|
||||
~PandasBatchTask.cpu_limit
|
||||
~PandasBatchTask.cpu_overcommit_ratio
|
||||
~PandasBatchTask.ctx
|
||||
~PandasBatchTask.dataset
|
||||
~PandasBatchTask.default_output_name
|
||||
~PandasBatchTask.elapsed_time
|
||||
~PandasBatchTask.enable_temp_directory
|
||||
~PandasBatchTask.exception
|
||||
~PandasBatchTask.exec_cq
|
||||
~PandasBatchTask.exec_id
|
||||
~PandasBatchTask.exec_on_scheduler
|
||||
~PandasBatchTask.fail_count
|
||||
~PandasBatchTask.final_output_abspath
|
||||
~PandasBatchTask.finish_time
|
||||
~PandasBatchTask.gpu_limit
|
||||
~PandasBatchTask.id
|
||||
~PandasBatchTask.input_datasets
|
||||
~PandasBatchTask.input_deps
|
||||
~PandasBatchTask.input_udfs
|
||||
~PandasBatchTask.input_view_index
|
||||
~PandasBatchTask.key
|
||||
~PandasBatchTask.local_gpu
|
||||
~PandasBatchTask.local_gpu_ranks
|
||||
~PandasBatchTask.local_rank
|
||||
~PandasBatchTask.location
|
||||
~PandasBatchTask.max_batch_size
|
||||
~PandasBatchTask.memory_limit
|
||||
~PandasBatchTask.memory_overcommit_ratio
|
||||
~PandasBatchTask.node_id
|
||||
~PandasBatchTask.numa_node
|
||||
~PandasBatchTask.numpy_random_gen
|
||||
~PandasBatchTask.output
|
||||
~PandasBatchTask.output_deps
|
||||
~PandasBatchTask.output_dirname
|
||||
~PandasBatchTask.output_filename
|
||||
~PandasBatchTask.output_name
|
||||
~PandasBatchTask.output_root
|
||||
~PandasBatchTask.partition_dims
|
||||
~PandasBatchTask.partition_infos
|
||||
~PandasBatchTask.partition_infos_as_dict
|
||||
~PandasBatchTask.perf_metrics
|
||||
~PandasBatchTask.perf_profile
|
||||
~PandasBatchTask.python_random_gen
|
||||
~PandasBatchTask.query_udfs
|
||||
~PandasBatchTask.rand_seed_float
|
||||
~PandasBatchTask.rand_seed_uint32
|
||||
~PandasBatchTask.random_seed_bytes
|
||||
~PandasBatchTask.ray_dataset_path
|
||||
~PandasBatchTask.ray_marker_path
|
||||
~PandasBatchTask.retry_count
|
||||
~PandasBatchTask.runtime_id
|
||||
~PandasBatchTask.runtime_output_abspath
|
||||
~PandasBatchTask.runtime_state
|
||||
~PandasBatchTask.sched_epoch
|
||||
~PandasBatchTask.self_contained_output
|
||||
~PandasBatchTask.skip_when_any_input_empty
|
||||
~PandasBatchTask.staging_root
|
||||
~PandasBatchTask.start_time
|
||||
~PandasBatchTask.status
|
||||
~PandasBatchTask.temp_abspath
|
||||
~PandasBatchTask.temp_output
|
||||
~PandasBatchTask.udfs
|
||||
~PandasBatchTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,127 @@
|
||||
smallpond.execution.task.PandasComputeTask
|
||||
==========================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: PandasComputeTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasComputeTask.__init__
|
||||
~PandasComputeTask.add_elapsed_time
|
||||
~PandasComputeTask.adjust_row_group_size
|
||||
~PandasComputeTask.clean_complex_attrs
|
||||
~PandasComputeTask.clean_output
|
||||
~PandasComputeTask.cleanup
|
||||
~PandasComputeTask.compute_avg_row_size
|
||||
~PandasComputeTask.create_input_views
|
||||
~PandasComputeTask.dump
|
||||
~PandasComputeTask.dump_output
|
||||
~PandasComputeTask.exec
|
||||
~PandasComputeTask.exec_query
|
||||
~PandasComputeTask.finalize
|
||||
~PandasComputeTask.get_partition_info
|
||||
~PandasComputeTask.initialize
|
||||
~PandasComputeTask.inject_fault
|
||||
~PandasComputeTask.merge_metrics
|
||||
~PandasComputeTask.oom
|
||||
~PandasComputeTask.parquet_kv_metadata_bytes
|
||||
~PandasComputeTask.parquet_kv_metadata_str
|
||||
~PandasComputeTask.prepare_connection
|
||||
~PandasComputeTask.process
|
||||
~PandasComputeTask.random_float
|
||||
~PandasComputeTask.random_uint32
|
||||
~PandasComputeTask.run
|
||||
~PandasComputeTask.run_on_ray
|
||||
~PandasComputeTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasComputeTask.process_func
|
||||
~PandasComputeTask.parquet_row_group_size
|
||||
~PandasComputeTask.parquet_row_group_bytes
|
||||
~PandasComputeTask.parquet_dictionary_encoding
|
||||
~PandasComputeTask.use_duckdb_reader
|
||||
~PandasComputeTask.allow_speculative_exec
|
||||
~PandasComputeTask.any_input_empty
|
||||
~PandasComputeTask.compression_level_str
|
||||
~PandasComputeTask.compression_options
|
||||
~PandasComputeTask.compression_type_str
|
||||
~PandasComputeTask.cpu_limit
|
||||
~PandasComputeTask.cpu_overcommit_ratio
|
||||
~PandasComputeTask.ctx
|
||||
~PandasComputeTask.dataset
|
||||
~PandasComputeTask.default_output_name
|
||||
~PandasComputeTask.elapsed_time
|
||||
~PandasComputeTask.enable_temp_directory
|
||||
~PandasComputeTask.exception
|
||||
~PandasComputeTask.exec_cq
|
||||
~PandasComputeTask.exec_id
|
||||
~PandasComputeTask.exec_on_scheduler
|
||||
~PandasComputeTask.fail_count
|
||||
~PandasComputeTask.final_output_abspath
|
||||
~PandasComputeTask.finish_time
|
||||
~PandasComputeTask.gpu_limit
|
||||
~PandasComputeTask.id
|
||||
~PandasComputeTask.input_datasets
|
||||
~PandasComputeTask.input_deps
|
||||
~PandasComputeTask.input_udfs
|
||||
~PandasComputeTask.input_view_index
|
||||
~PandasComputeTask.key
|
||||
~PandasComputeTask.local_gpu
|
||||
~PandasComputeTask.local_gpu_ranks
|
||||
~PandasComputeTask.local_rank
|
||||
~PandasComputeTask.location
|
||||
~PandasComputeTask.memory_limit
|
||||
~PandasComputeTask.memory_overcommit_ratio
|
||||
~PandasComputeTask.node_id
|
||||
~PandasComputeTask.numa_node
|
||||
~PandasComputeTask.numpy_random_gen
|
||||
~PandasComputeTask.output
|
||||
~PandasComputeTask.output_deps
|
||||
~PandasComputeTask.output_dirname
|
||||
~PandasComputeTask.output_filename
|
||||
~PandasComputeTask.output_name
|
||||
~PandasComputeTask.output_root
|
||||
~PandasComputeTask.parquet_compression
|
||||
~PandasComputeTask.parquet_compression_level
|
||||
~PandasComputeTask.partition_dims
|
||||
~PandasComputeTask.partition_infos
|
||||
~PandasComputeTask.partition_infos_as_dict
|
||||
~PandasComputeTask.perf_metrics
|
||||
~PandasComputeTask.perf_profile
|
||||
~PandasComputeTask.python_random_gen
|
||||
~PandasComputeTask.query_udfs
|
||||
~PandasComputeTask.rand_seed_float
|
||||
~PandasComputeTask.rand_seed_uint32
|
||||
~PandasComputeTask.random_seed_bytes
|
||||
~PandasComputeTask.ray_dataset_path
|
||||
~PandasComputeTask.ray_marker_path
|
||||
~PandasComputeTask.retry_count
|
||||
~PandasComputeTask.runtime_id
|
||||
~PandasComputeTask.runtime_output_abspath
|
||||
~PandasComputeTask.runtime_state
|
||||
~PandasComputeTask.sched_epoch
|
||||
~PandasComputeTask.self_contained_output
|
||||
~PandasComputeTask.skip_when_any_input_empty
|
||||
~PandasComputeTask.staging_root
|
||||
~PandasComputeTask.start_time
|
||||
~PandasComputeTask.status
|
||||
~PandasComputeTask.temp_abspath
|
||||
~PandasComputeTask.temp_output
|
||||
~PandasComputeTask.udfs
|
||||
~PandasComputeTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,104 @@
|
||||
smallpond.execution.task.PartitionConsumerTask
|
||||
==============================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: PartitionConsumerTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionConsumerTask.__init__
|
||||
~PartitionConsumerTask.add_elapsed_time
|
||||
~PartitionConsumerTask.adjust_row_group_size
|
||||
~PartitionConsumerTask.clean_complex_attrs
|
||||
~PartitionConsumerTask.clean_output
|
||||
~PartitionConsumerTask.cleanup
|
||||
~PartitionConsumerTask.compute_avg_row_size
|
||||
~PartitionConsumerTask.dump
|
||||
~PartitionConsumerTask.exec
|
||||
~PartitionConsumerTask.finalize
|
||||
~PartitionConsumerTask.get_partition_info
|
||||
~PartitionConsumerTask.initialize
|
||||
~PartitionConsumerTask.inject_fault
|
||||
~PartitionConsumerTask.merge_metrics
|
||||
~PartitionConsumerTask.oom
|
||||
~PartitionConsumerTask.parquet_kv_metadata_bytes
|
||||
~PartitionConsumerTask.parquet_kv_metadata_str
|
||||
~PartitionConsumerTask.random_float
|
||||
~PartitionConsumerTask.random_uint32
|
||||
~PartitionConsumerTask.run
|
||||
~PartitionConsumerTask.run_on_ray
|
||||
~PartitionConsumerTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionConsumerTask.last_partition
|
||||
~PartitionConsumerTask.allow_speculative_exec
|
||||
~PartitionConsumerTask.any_input_empty
|
||||
~PartitionConsumerTask.cpu_limit
|
||||
~PartitionConsumerTask.ctx
|
||||
~PartitionConsumerTask.dataset
|
||||
~PartitionConsumerTask.default_output_name
|
||||
~PartitionConsumerTask.elapsed_time
|
||||
~PartitionConsumerTask.exception
|
||||
~PartitionConsumerTask.exec_cq
|
||||
~PartitionConsumerTask.exec_id
|
||||
~PartitionConsumerTask.exec_on_scheduler
|
||||
~PartitionConsumerTask.fail_count
|
||||
~PartitionConsumerTask.final_output_abspath
|
||||
~PartitionConsumerTask.finish_time
|
||||
~PartitionConsumerTask.gpu_limit
|
||||
~PartitionConsumerTask.id
|
||||
~PartitionConsumerTask.input_datasets
|
||||
~PartitionConsumerTask.input_deps
|
||||
~PartitionConsumerTask.key
|
||||
~PartitionConsumerTask.local_gpu
|
||||
~PartitionConsumerTask.local_gpu_ranks
|
||||
~PartitionConsumerTask.local_rank
|
||||
~PartitionConsumerTask.location
|
||||
~PartitionConsumerTask.memory_limit
|
||||
~PartitionConsumerTask.node_id
|
||||
~PartitionConsumerTask.numa_node
|
||||
~PartitionConsumerTask.numpy_random_gen
|
||||
~PartitionConsumerTask.output
|
||||
~PartitionConsumerTask.output_deps
|
||||
~PartitionConsumerTask.output_dirname
|
||||
~PartitionConsumerTask.output_filename
|
||||
~PartitionConsumerTask.output_name
|
||||
~PartitionConsumerTask.output_root
|
||||
~PartitionConsumerTask.partition_dims
|
||||
~PartitionConsumerTask.partition_infos
|
||||
~PartitionConsumerTask.partition_infos_as_dict
|
||||
~PartitionConsumerTask.perf_metrics
|
||||
~PartitionConsumerTask.perf_profile
|
||||
~PartitionConsumerTask.python_random_gen
|
||||
~PartitionConsumerTask.random_seed_bytes
|
||||
~PartitionConsumerTask.ray_dataset_path
|
||||
~PartitionConsumerTask.ray_marker_path
|
||||
~PartitionConsumerTask.retry_count
|
||||
~PartitionConsumerTask.runtime_id
|
||||
~PartitionConsumerTask.runtime_output_abspath
|
||||
~PartitionConsumerTask.runtime_state
|
||||
~PartitionConsumerTask.sched_epoch
|
||||
~PartitionConsumerTask.self_contained_output
|
||||
~PartitionConsumerTask.skip_when_any_input_empty
|
||||
~PartitionConsumerTask.staging_root
|
||||
~PartitionConsumerTask.start_time
|
||||
~PartitionConsumerTask.status
|
||||
~PartitionConsumerTask.temp_abspath
|
||||
~PartitionConsumerTask.temp_output
|
||||
~PartitionConsumerTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,32 @@
smallpond.execution.task.PartitionInfo
======================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: PartitionInfo

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~PartitionInfo.__init__

   .. rubric:: Attributes

   .. autosummary::

      ~PartitionInfo.index
      ~PartitionInfo.npartitions
      ~PartitionInfo.dimension
      ~PartitionInfo.default_dimension
      ~PartitionInfo.toplevel_dimension

@@ -0,0 +1,106 @@
|
||||
smallpond.execution.task.PartitionProducerTask
|
||||
==============================================
|
||||
|
||||
.. currentmodule:: smallpond.execution.task
|
||||
|
||||
.. autoclass:: PartitionProducerTask
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionProducerTask.__init__
|
||||
~PartitionProducerTask.add_elapsed_time
|
||||
~PartitionProducerTask.adjust_row_group_size
|
||||
~PartitionProducerTask.clean_complex_attrs
|
||||
~PartitionProducerTask.clean_output
|
||||
~PartitionProducerTask.cleanup
|
||||
~PartitionProducerTask.compute_avg_row_size
|
||||
~PartitionProducerTask.dump
|
||||
~PartitionProducerTask.exec
|
||||
~PartitionProducerTask.finalize
|
||||
~PartitionProducerTask.get_partition_info
|
||||
~PartitionProducerTask.initialize
|
||||
~PartitionProducerTask.inject_fault
|
||||
~PartitionProducerTask.merge_metrics
|
||||
~PartitionProducerTask.oom
|
||||
~PartitionProducerTask.parquet_kv_metadata_bytes
|
||||
~PartitionProducerTask.parquet_kv_metadata_str
|
||||
~PartitionProducerTask.random_float
|
||||
~PartitionProducerTask.random_uint32
|
||||
~PartitionProducerTask.run
|
||||
~PartitionProducerTask.run_on_ray
|
||||
~PartitionProducerTask.set_memory_limit
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionProducerTask.npartitions
|
||||
~PartitionProducerTask.dimension
|
||||
~PartitionProducerTask.partitioned_datasets
|
||||
~PartitionProducerTask.allow_speculative_exec
|
||||
~PartitionProducerTask.any_input_empty
|
||||
~PartitionProducerTask.cpu_limit
|
||||
~PartitionProducerTask.ctx
|
||||
~PartitionProducerTask.dataset
|
||||
~PartitionProducerTask.default_output_name
|
||||
~PartitionProducerTask.elapsed_time
|
||||
~PartitionProducerTask.exception
|
||||
~PartitionProducerTask.exec_cq
|
||||
~PartitionProducerTask.exec_id
|
||||
~PartitionProducerTask.exec_on_scheduler
|
||||
~PartitionProducerTask.fail_count
|
||||
~PartitionProducerTask.final_output_abspath
|
||||
~PartitionProducerTask.finish_time
|
||||
~PartitionProducerTask.gpu_limit
|
||||
~PartitionProducerTask.id
|
||||
~PartitionProducerTask.input_datasets
|
||||
~PartitionProducerTask.input_deps
|
||||
~PartitionProducerTask.key
|
||||
~PartitionProducerTask.local_gpu
|
||||
~PartitionProducerTask.local_gpu_ranks
|
||||
~PartitionProducerTask.local_rank
|
||||
~PartitionProducerTask.location
|
||||
~PartitionProducerTask.memory_limit
|
||||
~PartitionProducerTask.node_id
|
||||
~PartitionProducerTask.numa_node
|
||||
~PartitionProducerTask.numpy_random_gen
|
||||
~PartitionProducerTask.output
|
||||
~PartitionProducerTask.output_deps
|
||||
~PartitionProducerTask.output_dirname
|
||||
~PartitionProducerTask.output_filename
|
||||
~PartitionProducerTask.output_name
|
||||
~PartitionProducerTask.output_root
|
||||
~PartitionProducerTask.partition_dims
|
||||
~PartitionProducerTask.partition_infos
|
||||
~PartitionProducerTask.partition_infos_as_dict
|
||||
~PartitionProducerTask.perf_metrics
|
||||
~PartitionProducerTask.perf_profile
|
||||
~PartitionProducerTask.python_random_gen
|
||||
~PartitionProducerTask.random_seed_bytes
|
||||
~PartitionProducerTask.ray_dataset_path
|
||||
~PartitionProducerTask.ray_marker_path
|
||||
~PartitionProducerTask.retry_count
|
||||
~PartitionProducerTask.runtime_id
|
||||
~PartitionProducerTask.runtime_output_abspath
|
||||
~PartitionProducerTask.runtime_state
|
||||
~PartitionProducerTask.sched_epoch
|
||||
~PartitionProducerTask.self_contained_output
|
||||
~PartitionProducerTask.skip_when_any_input_empty
|
||||
~PartitionProducerTask.staging_root
|
||||
~PartitionProducerTask.start_time
|
||||
~PartitionProducerTask.status
|
||||
~PartitionProducerTask.temp_abspath
|
||||
~PartitionProducerTask.temp_output
|
||||
~PartitionProducerTask.uniform_failure_prob
|
||||
|
||||
|
||||
@@ -0,0 +1,38 @@
smallpond.execution.task.PerfStats
==================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: PerfStats

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~PerfStats.__init__
      ~PerfStats.count
      ~PerfStats.index

   .. rubric:: Attributes

   .. autosummary::

      ~PerfStats.avg
      ~PerfStats.cnt
      ~PerfStats.max
      ~PerfStats.min
      ~PerfStats.p50
      ~PerfStats.p75
      ~PerfStats.p95
      ~PerfStats.p99
      ~PerfStats.sum

@@ -0,0 +1,106 @@
smallpond.execution.task.ProjectionTask
=======================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: ProjectionTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~ProjectionTask.__init__
      ~ProjectionTask.add_elapsed_time
      ~ProjectionTask.adjust_row_group_size
      ~ProjectionTask.clean_complex_attrs
      ~ProjectionTask.clean_output
      ~ProjectionTask.cleanup
      ~ProjectionTask.compute_avg_row_size
      ~ProjectionTask.dump
      ~ProjectionTask.exec
      ~ProjectionTask.finalize
      ~ProjectionTask.get_partition_info
      ~ProjectionTask.initialize
      ~ProjectionTask.inject_fault
      ~ProjectionTask.merge_metrics
      ~ProjectionTask.oom
      ~ProjectionTask.parquet_kv_metadata_bytes
      ~ProjectionTask.parquet_kv_metadata_str
      ~ProjectionTask.random_float
      ~ProjectionTask.random_uint32
      ~ProjectionTask.run
      ~ProjectionTask.run_on_ray
      ~ProjectionTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~ProjectionTask.columns
      ~ProjectionTask.generated_columns
      ~ProjectionTask.union_by_name
      ~ProjectionTask.allow_speculative_exec
      ~ProjectionTask.any_input_empty
      ~ProjectionTask.cpu_limit
      ~ProjectionTask.ctx
      ~ProjectionTask.dataset
      ~ProjectionTask.default_output_name
      ~ProjectionTask.elapsed_time
      ~ProjectionTask.exception
      ~ProjectionTask.exec_cq
      ~ProjectionTask.exec_id
      ~ProjectionTask.exec_on_scheduler
      ~ProjectionTask.fail_count
      ~ProjectionTask.final_output_abspath
      ~ProjectionTask.finish_time
      ~ProjectionTask.gpu_limit
      ~ProjectionTask.id
      ~ProjectionTask.input_datasets
      ~ProjectionTask.input_deps
      ~ProjectionTask.key
      ~ProjectionTask.local_gpu
      ~ProjectionTask.local_gpu_ranks
      ~ProjectionTask.local_rank
      ~ProjectionTask.location
      ~ProjectionTask.memory_limit
      ~ProjectionTask.node_id
      ~ProjectionTask.numa_node
      ~ProjectionTask.numpy_random_gen
      ~ProjectionTask.output
      ~ProjectionTask.output_deps
      ~ProjectionTask.output_dirname
      ~ProjectionTask.output_filename
      ~ProjectionTask.output_name
      ~ProjectionTask.output_root
      ~ProjectionTask.partition_dims
      ~ProjectionTask.partition_infos
      ~ProjectionTask.partition_infos_as_dict
      ~ProjectionTask.perf_metrics
      ~ProjectionTask.perf_profile
      ~ProjectionTask.python_random_gen
      ~ProjectionTask.random_seed_bytes
      ~ProjectionTask.ray_dataset_path
      ~ProjectionTask.ray_marker_path
      ~ProjectionTask.retry_count
      ~ProjectionTask.runtime_id
      ~ProjectionTask.runtime_output_abspath
      ~ProjectionTask.runtime_state
      ~ProjectionTask.sched_epoch
      ~ProjectionTask.self_contained_output
      ~ProjectionTask.skip_when_any_input_empty
      ~ProjectionTask.staging_root
      ~ProjectionTask.start_time
      ~ProjectionTask.status
      ~ProjectionTask.temp_abspath
      ~ProjectionTask.temp_output
      ~ProjectionTask.uniform_failure_prob
@@ -0,0 +1,122 @@
smallpond.execution.task.PythonScriptTask
=========================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: PythonScriptTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~PythonScriptTask.__init__
      ~PythonScriptTask.add_elapsed_time
      ~PythonScriptTask.adjust_row_group_size
      ~PythonScriptTask.clean_complex_attrs
      ~PythonScriptTask.clean_output
      ~PythonScriptTask.cleanup
      ~PythonScriptTask.compute_avg_row_size
      ~PythonScriptTask.create_input_views
      ~PythonScriptTask.dump
      ~PythonScriptTask.exec
      ~PythonScriptTask.exec_query
      ~PythonScriptTask.finalize
      ~PythonScriptTask.get_partition_info
      ~PythonScriptTask.initialize
      ~PythonScriptTask.inject_fault
      ~PythonScriptTask.merge_metrics
      ~PythonScriptTask.oom
      ~PythonScriptTask.parquet_kv_metadata_bytes
      ~PythonScriptTask.parquet_kv_metadata_str
      ~PythonScriptTask.prepare_connection
      ~PythonScriptTask.process
      ~PythonScriptTask.random_float
      ~PythonScriptTask.random_uint32
      ~PythonScriptTask.run
      ~PythonScriptTask.run_on_ray
      ~PythonScriptTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~PythonScriptTask.process_func
      ~PythonScriptTask.allow_speculative_exec
      ~PythonScriptTask.any_input_empty
      ~PythonScriptTask.compression_level_str
      ~PythonScriptTask.compression_options
      ~PythonScriptTask.compression_type_str
      ~PythonScriptTask.cpu_limit
      ~PythonScriptTask.cpu_overcommit_ratio
      ~PythonScriptTask.ctx
      ~PythonScriptTask.dataset
      ~PythonScriptTask.default_output_name
      ~PythonScriptTask.elapsed_time
      ~PythonScriptTask.enable_temp_directory
      ~PythonScriptTask.exception
      ~PythonScriptTask.exec_cq
      ~PythonScriptTask.exec_id
      ~PythonScriptTask.exec_on_scheduler
      ~PythonScriptTask.fail_count
      ~PythonScriptTask.final_output_abspath
      ~PythonScriptTask.finish_time
      ~PythonScriptTask.gpu_limit
      ~PythonScriptTask.id
      ~PythonScriptTask.input_datasets
      ~PythonScriptTask.input_deps
      ~PythonScriptTask.input_udfs
      ~PythonScriptTask.input_view_index
      ~PythonScriptTask.key
      ~PythonScriptTask.local_gpu
      ~PythonScriptTask.local_gpu_ranks
      ~PythonScriptTask.local_rank
      ~PythonScriptTask.location
      ~PythonScriptTask.memory_limit
      ~PythonScriptTask.memory_overcommit_ratio
      ~PythonScriptTask.node_id
      ~PythonScriptTask.numa_node
      ~PythonScriptTask.numpy_random_gen
      ~PythonScriptTask.output
      ~PythonScriptTask.output_deps
      ~PythonScriptTask.output_dirname
      ~PythonScriptTask.output_filename
      ~PythonScriptTask.output_name
      ~PythonScriptTask.output_root
      ~PythonScriptTask.parquet_compression
      ~PythonScriptTask.parquet_compression_level
      ~PythonScriptTask.partition_dims
      ~PythonScriptTask.partition_infos
      ~PythonScriptTask.partition_infos_as_dict
      ~PythonScriptTask.perf_metrics
      ~PythonScriptTask.perf_profile
      ~PythonScriptTask.python_random_gen
      ~PythonScriptTask.query_udfs
      ~PythonScriptTask.rand_seed_float
      ~PythonScriptTask.rand_seed_uint32
      ~PythonScriptTask.random_seed_bytes
      ~PythonScriptTask.ray_dataset_path
      ~PythonScriptTask.ray_marker_path
      ~PythonScriptTask.retry_count
      ~PythonScriptTask.runtime_id
      ~PythonScriptTask.runtime_output_abspath
      ~PythonScriptTask.runtime_state
      ~PythonScriptTask.sched_epoch
      ~PythonScriptTask.self_contained_output
      ~PythonScriptTask.skip_when_any_input_empty
      ~PythonScriptTask.staging_root
      ~PythonScriptTask.start_time
      ~PythonScriptTask.status
      ~PythonScriptTask.temp_abspath
      ~PythonScriptTask.temp_output
      ~PythonScriptTask.udfs
      ~PythonScriptTask.uniform_failure_prob
@@ -0,0 +1,103 @@
smallpond.execution.task.RangePartitionTask
===========================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: RangePartitionTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~RangePartitionTask.__init__
      ~RangePartitionTask.add_elapsed_time
      ~RangePartitionTask.adjust_row_group_size
      ~RangePartitionTask.clean_complex_attrs
      ~RangePartitionTask.clean_output
      ~RangePartitionTask.cleanup
      ~RangePartitionTask.compute_avg_row_size
      ~RangePartitionTask.dump
      ~RangePartitionTask.exec
      ~RangePartitionTask.finalize
      ~RangePartitionTask.get_partition_info
      ~RangePartitionTask.initialize
      ~RangePartitionTask.inject_fault
      ~RangePartitionTask.merge_metrics
      ~RangePartitionTask.oom
      ~RangePartitionTask.parquet_kv_metadata_bytes
      ~RangePartitionTask.parquet_kv_metadata_str
      ~RangePartitionTask.random_float
      ~RangePartitionTask.random_uint32
      ~RangePartitionTask.run
      ~RangePartitionTask.run_on_ray
      ~RangePartitionTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~RangePartitionTask.ctx
      ~RangePartitionTask.id
      ~RangePartitionTask.node_id
      ~RangePartitionTask.sched_epoch
      ~RangePartitionTask.output_name
      ~RangePartitionTask.output_root
      ~RangePartitionTask.dataset
      ~RangePartitionTask.input_deps
      ~RangePartitionTask.output_deps
      ~RangePartitionTask.perf_metrics
      ~RangePartitionTask.perf_profile
      ~RangePartitionTask.runtime_state
      ~RangePartitionTask.input_datasets
      ~RangePartitionTask.allow_speculative_exec
      ~RangePartitionTask.any_input_empty
      ~RangePartitionTask.cpu_limit
      ~RangePartitionTask.default_output_name
      ~RangePartitionTask.elapsed_time
      ~RangePartitionTask.exception
      ~RangePartitionTask.exec_cq
      ~RangePartitionTask.exec_id
      ~RangePartitionTask.exec_on_scheduler
      ~RangePartitionTask.fail_count
      ~RangePartitionTask.final_output_abspath
      ~RangePartitionTask.finish_time
      ~RangePartitionTask.gpu_limit
      ~RangePartitionTask.key
      ~RangePartitionTask.local_gpu
      ~RangePartitionTask.local_gpu_ranks
      ~RangePartitionTask.local_rank
      ~RangePartitionTask.location
      ~RangePartitionTask.memory_limit
      ~RangePartitionTask.numa_node
      ~RangePartitionTask.numpy_random_gen
      ~RangePartitionTask.output
      ~RangePartitionTask.output_dirname
      ~RangePartitionTask.output_filename
      ~RangePartitionTask.partition_dims
      ~RangePartitionTask.partition_infos
      ~RangePartitionTask.partition_infos_as_dict
      ~RangePartitionTask.python_random_gen
      ~RangePartitionTask.random_seed_bytes
      ~RangePartitionTask.ray_dataset_path
      ~RangePartitionTask.ray_marker_path
      ~RangePartitionTask.retry_count
      ~RangePartitionTask.runtime_id
      ~RangePartitionTask.runtime_output_abspath
      ~RangePartitionTask.self_contained_output
      ~RangePartitionTask.skip_when_any_input_empty
      ~RangePartitionTask.staging_root
      ~RangePartitionTask.start_time
      ~RangePartitionTask.status
      ~RangePartitionTask.temp_abspath
      ~RangePartitionTask.temp_output
      ~RangePartitionTask.uniform_failure_prob
@@ -0,0 +1,106 @@
smallpond.execution.task.RepeatPartitionProducerTask
====================================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: RepeatPartitionProducerTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~RepeatPartitionProducerTask.__init__
      ~RepeatPartitionProducerTask.add_elapsed_time
      ~RepeatPartitionProducerTask.adjust_row_group_size
      ~RepeatPartitionProducerTask.clean_complex_attrs
      ~RepeatPartitionProducerTask.clean_output
      ~RepeatPartitionProducerTask.cleanup
      ~RepeatPartitionProducerTask.compute_avg_row_size
      ~RepeatPartitionProducerTask.dump
      ~RepeatPartitionProducerTask.exec
      ~RepeatPartitionProducerTask.finalize
      ~RepeatPartitionProducerTask.get_partition_info
      ~RepeatPartitionProducerTask.initialize
      ~RepeatPartitionProducerTask.inject_fault
      ~RepeatPartitionProducerTask.merge_metrics
      ~RepeatPartitionProducerTask.oom
      ~RepeatPartitionProducerTask.parquet_kv_metadata_bytes
      ~RepeatPartitionProducerTask.parquet_kv_metadata_str
      ~RepeatPartitionProducerTask.random_float
      ~RepeatPartitionProducerTask.random_uint32
      ~RepeatPartitionProducerTask.run
      ~RepeatPartitionProducerTask.run_on_ray
      ~RepeatPartitionProducerTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~RepeatPartitionProducerTask.npartitions
      ~RepeatPartitionProducerTask.dimension
      ~RepeatPartitionProducerTask.partitioned_datasets
      ~RepeatPartitionProducerTask.allow_speculative_exec
      ~RepeatPartitionProducerTask.any_input_empty
      ~RepeatPartitionProducerTask.cpu_limit
      ~RepeatPartitionProducerTask.ctx
      ~RepeatPartitionProducerTask.dataset
      ~RepeatPartitionProducerTask.default_output_name
      ~RepeatPartitionProducerTask.elapsed_time
      ~RepeatPartitionProducerTask.exception
      ~RepeatPartitionProducerTask.exec_cq
      ~RepeatPartitionProducerTask.exec_id
      ~RepeatPartitionProducerTask.exec_on_scheduler
      ~RepeatPartitionProducerTask.fail_count
      ~RepeatPartitionProducerTask.final_output_abspath
      ~RepeatPartitionProducerTask.finish_time
      ~RepeatPartitionProducerTask.gpu_limit
      ~RepeatPartitionProducerTask.id
      ~RepeatPartitionProducerTask.input_datasets
      ~RepeatPartitionProducerTask.input_deps
      ~RepeatPartitionProducerTask.key
      ~RepeatPartitionProducerTask.local_gpu
      ~RepeatPartitionProducerTask.local_gpu_ranks
      ~RepeatPartitionProducerTask.local_rank
      ~RepeatPartitionProducerTask.location
      ~RepeatPartitionProducerTask.memory_limit
      ~RepeatPartitionProducerTask.node_id
      ~RepeatPartitionProducerTask.numa_node
      ~RepeatPartitionProducerTask.numpy_random_gen
      ~RepeatPartitionProducerTask.output
      ~RepeatPartitionProducerTask.output_deps
      ~RepeatPartitionProducerTask.output_dirname
      ~RepeatPartitionProducerTask.output_filename
      ~RepeatPartitionProducerTask.output_name
      ~RepeatPartitionProducerTask.output_root
      ~RepeatPartitionProducerTask.partition_dims
      ~RepeatPartitionProducerTask.partition_infos
      ~RepeatPartitionProducerTask.partition_infos_as_dict
      ~RepeatPartitionProducerTask.perf_metrics
      ~RepeatPartitionProducerTask.perf_profile
      ~RepeatPartitionProducerTask.python_random_gen
      ~RepeatPartitionProducerTask.random_seed_bytes
      ~RepeatPartitionProducerTask.ray_dataset_path
      ~RepeatPartitionProducerTask.ray_marker_path
      ~RepeatPartitionProducerTask.retry_count
      ~RepeatPartitionProducerTask.runtime_id
      ~RepeatPartitionProducerTask.runtime_output_abspath
      ~RepeatPartitionProducerTask.runtime_state
      ~RepeatPartitionProducerTask.sched_epoch
      ~RepeatPartitionProducerTask.self_contained_output
      ~RepeatPartitionProducerTask.skip_when_any_input_empty
      ~RepeatPartitionProducerTask.staging_root
      ~RepeatPartitionProducerTask.start_time
      ~RepeatPartitionProducerTask.status
      ~RepeatPartitionProducerTask.temp_abspath
      ~RepeatPartitionProducerTask.temp_output
      ~RepeatPartitionProducerTask.uniform_failure_prob
103
_sources/generated/smallpond.execution.task.RootTask.rst.txt
Normal file
@@ -0,0 +1,103 @@
smallpond.execution.task.RootTask
=================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: RootTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~RootTask.__init__
      ~RootTask.add_elapsed_time
      ~RootTask.adjust_row_group_size
      ~RootTask.clean_complex_attrs
      ~RootTask.clean_output
      ~RootTask.cleanup
      ~RootTask.compute_avg_row_size
      ~RootTask.dump
      ~RootTask.exec
      ~RootTask.finalize
      ~RootTask.get_partition_info
      ~RootTask.initialize
      ~RootTask.inject_fault
      ~RootTask.merge_metrics
      ~RootTask.oom
      ~RootTask.parquet_kv_metadata_bytes
      ~RootTask.parquet_kv_metadata_str
      ~RootTask.random_float
      ~RootTask.random_uint32
      ~RootTask.run
      ~RootTask.run_on_ray
      ~RootTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~RootTask.ctx
      ~RootTask.id
      ~RootTask.node_id
      ~RootTask.sched_epoch
      ~RootTask.output_name
      ~RootTask.output_root
      ~RootTask.dataset
      ~RootTask.input_deps
      ~RootTask.output_deps
      ~RootTask.perf_metrics
      ~RootTask.perf_profile
      ~RootTask.runtime_state
      ~RootTask.input_datasets
      ~RootTask.allow_speculative_exec
      ~RootTask.any_input_empty
      ~RootTask.cpu_limit
      ~RootTask.default_output_name
      ~RootTask.elapsed_time
      ~RootTask.exception
      ~RootTask.exec_cq
      ~RootTask.exec_id
      ~RootTask.exec_on_scheduler
      ~RootTask.fail_count
      ~RootTask.final_output_abspath
      ~RootTask.finish_time
      ~RootTask.gpu_limit
      ~RootTask.key
      ~RootTask.local_gpu
      ~RootTask.local_gpu_ranks
      ~RootTask.local_rank
      ~RootTask.location
      ~RootTask.memory_limit
      ~RootTask.numa_node
      ~RootTask.numpy_random_gen
      ~RootTask.output
      ~RootTask.output_dirname
      ~RootTask.output_filename
      ~RootTask.partition_dims
      ~RootTask.partition_infos
      ~RootTask.partition_infos_as_dict
      ~RootTask.python_random_gen
      ~RootTask.random_seed_bytes
      ~RootTask.ray_dataset_path
      ~RootTask.ray_marker_path
      ~RootTask.retry_count
      ~RootTask.runtime_id
      ~RootTask.runtime_output_abspath
      ~RootTask.self_contained_output
      ~RootTask.skip_when_any_input_empty
      ~RootTask.staging_root
      ~RootTask.start_time
      ~RootTask.status
      ~RootTask.temp_abspath
      ~RootTask.temp_output
      ~RootTask.uniform_failure_prob
@@ -0,0 +1,50 @@
smallpond.execution.task.RuntimeContext
=======================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: RuntimeContext

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~RuntimeContext.__init__
      ~RuntimeContext.cleanup
      ~RuntimeContext.cleanup_root
      ~RuntimeContext.get_local_gpus
      ~RuntimeContext.initialize
      ~RuntimeContext.new_task_id
      ~RuntimeContext.set_current_task

   .. rubric:: Attributes

   .. autosummary::

      ~RuntimeContext.available_memory
      ~RuntimeContext.exec_plan_path
      ~RuntimeContext.job_root_dirname
      ~RuntimeContext.job_status_path
      ~RuntimeContext.logcial_plan_graph_path
      ~RuntimeContext.logcial_plan_path
      ~RuntimeContext.numa_node_count
      ~RuntimeContext.physical_cpu_count
      ~RuntimeContext.ray_log_path
      ~RuntimeContext.runtime_ctx_path
      ~RuntimeContext.sched_state_path
      ~RuntimeContext.secs_executor_probe_timeout
      ~RuntimeContext.task
      ~RuntimeContext.total_memory
      ~RuntimeContext.usable_cpu_count
      ~RuntimeContext.usable_gpu_count
      ~RuntimeContext.usable_memory_size
@@ -0,0 +1,105 @@
smallpond.execution.task.SplitDataSetTask
=========================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: SplitDataSetTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~SplitDataSetTask.__init__
      ~SplitDataSetTask.add_elapsed_time
      ~SplitDataSetTask.adjust_row_group_size
      ~SplitDataSetTask.clean_complex_attrs
      ~SplitDataSetTask.clean_output
      ~SplitDataSetTask.cleanup
      ~SplitDataSetTask.compute_avg_row_size
      ~SplitDataSetTask.dump
      ~SplitDataSetTask.exec
      ~SplitDataSetTask.finalize
      ~SplitDataSetTask.get_partition_info
      ~SplitDataSetTask.initialize
      ~SplitDataSetTask.inject_fault
      ~SplitDataSetTask.merge_metrics
      ~SplitDataSetTask.oom
      ~SplitDataSetTask.parquet_kv_metadata_bytes
      ~SplitDataSetTask.parquet_kv_metadata_str
      ~SplitDataSetTask.random_float
      ~SplitDataSetTask.random_uint32
      ~SplitDataSetTask.run
      ~SplitDataSetTask.run_on_ray
      ~SplitDataSetTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~SplitDataSetTask.partition
      ~SplitDataSetTask.npartitions
      ~SplitDataSetTask.allow_speculative_exec
      ~SplitDataSetTask.any_input_empty
      ~SplitDataSetTask.cpu_limit
      ~SplitDataSetTask.ctx
      ~SplitDataSetTask.dataset
      ~SplitDataSetTask.default_output_name
      ~SplitDataSetTask.elapsed_time
      ~SplitDataSetTask.exception
      ~SplitDataSetTask.exec_cq
      ~SplitDataSetTask.exec_id
      ~SplitDataSetTask.exec_on_scheduler
      ~SplitDataSetTask.fail_count
      ~SplitDataSetTask.final_output_abspath
      ~SplitDataSetTask.finish_time
      ~SplitDataSetTask.gpu_limit
      ~SplitDataSetTask.id
      ~SplitDataSetTask.input_datasets
      ~SplitDataSetTask.input_deps
      ~SplitDataSetTask.key
      ~SplitDataSetTask.local_gpu
      ~SplitDataSetTask.local_gpu_ranks
      ~SplitDataSetTask.local_rank
      ~SplitDataSetTask.location
      ~SplitDataSetTask.memory_limit
      ~SplitDataSetTask.node_id
      ~SplitDataSetTask.numa_node
      ~SplitDataSetTask.numpy_random_gen
      ~SplitDataSetTask.output
      ~SplitDataSetTask.output_deps
      ~SplitDataSetTask.output_dirname
      ~SplitDataSetTask.output_filename
      ~SplitDataSetTask.output_name
      ~SplitDataSetTask.output_root
      ~SplitDataSetTask.partition_dims
      ~SplitDataSetTask.partition_infos
      ~SplitDataSetTask.partition_infos_as_dict
      ~SplitDataSetTask.perf_metrics
      ~SplitDataSetTask.perf_profile
      ~SplitDataSetTask.python_random_gen
      ~SplitDataSetTask.random_seed_bytes
      ~SplitDataSetTask.ray_dataset_path
      ~SplitDataSetTask.ray_marker_path
      ~SplitDataSetTask.retry_count
      ~SplitDataSetTask.runtime_id
      ~SplitDataSetTask.runtime_output_abspath
      ~SplitDataSetTask.runtime_state
      ~SplitDataSetTask.sched_epoch
      ~SplitDataSetTask.self_contained_output
      ~SplitDataSetTask.skip_when_any_input_empty
      ~SplitDataSetTask.staging_root
      ~SplitDataSetTask.start_time
      ~SplitDataSetTask.status
      ~SplitDataSetTask.temp_abspath
      ~SplitDataSetTask.temp_output
      ~SplitDataSetTask.uniform_failure_prob
@@ -0,0 +1,131 @@
smallpond.execution.task.SqlEngineTask
======================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: SqlEngineTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~SqlEngineTask.__init__
      ~SqlEngineTask.add_elapsed_time
      ~SqlEngineTask.adjust_row_group_size
      ~SqlEngineTask.clean_complex_attrs
      ~SqlEngineTask.clean_output
      ~SqlEngineTask.cleanup
      ~SqlEngineTask.compute_avg_row_size
      ~SqlEngineTask.create_input_views
      ~SqlEngineTask.dump
      ~SqlEngineTask.exec
      ~SqlEngineTask.exec_query
      ~SqlEngineTask.finalize
      ~SqlEngineTask.get_partition_info
      ~SqlEngineTask.initialize
      ~SqlEngineTask.inject_fault
      ~SqlEngineTask.merge_metrics
      ~SqlEngineTask.oom
      ~SqlEngineTask.parquet_kv_metadata_bytes
      ~SqlEngineTask.parquet_kv_metadata_str
      ~SqlEngineTask.prepare_connection
      ~SqlEngineTask.process_batch
      ~SqlEngineTask.random_float
      ~SqlEngineTask.random_uint32
      ~SqlEngineTask.run
      ~SqlEngineTask.run_on_ray
      ~SqlEngineTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~SqlEngineTask.sql_queries
      ~SqlEngineTask.per_thread_output
      ~SqlEngineTask.materialize_output
      ~SqlEngineTask.materialize_in_memory
      ~SqlEngineTask.batched_processing
      ~SqlEngineTask.parquet_row_group_size
      ~SqlEngineTask.parquet_row_group_bytes
      ~SqlEngineTask.parquet_dictionary_encoding
      ~SqlEngineTask.parquet_compression
      ~SqlEngineTask.parquet_compression_level
      ~SqlEngineTask.allow_speculative_exec
      ~SqlEngineTask.any_input_empty
      ~SqlEngineTask.compression_level_str
      ~SqlEngineTask.compression_options
      ~SqlEngineTask.compression_type_str
      ~SqlEngineTask.cpu_limit
      ~SqlEngineTask.cpu_overcommit_ratio
      ~SqlEngineTask.ctx
      ~SqlEngineTask.dataset
      ~SqlEngineTask.default_output_name
      ~SqlEngineTask.elapsed_time
      ~SqlEngineTask.enable_temp_directory
      ~SqlEngineTask.exception
      ~SqlEngineTask.exec_cq
      ~SqlEngineTask.exec_id
      ~SqlEngineTask.exec_on_scheduler
      ~SqlEngineTask.fail_count
      ~SqlEngineTask.final_output_abspath
      ~SqlEngineTask.finish_time
      ~SqlEngineTask.gpu_limit
      ~SqlEngineTask.id
      ~SqlEngineTask.input_datasets
      ~SqlEngineTask.input_deps
      ~SqlEngineTask.input_udfs
      ~SqlEngineTask.input_view_index
      ~SqlEngineTask.key
      ~SqlEngineTask.local_gpu
      ~SqlEngineTask.local_gpu_ranks
      ~SqlEngineTask.local_rank
      ~SqlEngineTask.location
      ~SqlEngineTask.max_batch_size
      ~SqlEngineTask.memory_limit
      ~SqlEngineTask.memory_overcommit_ratio
      ~SqlEngineTask.node_id
      ~SqlEngineTask.numa_node
      ~SqlEngineTask.numpy_random_gen
      ~SqlEngineTask.oneline_query
      ~SqlEngineTask.output
      ~SqlEngineTask.output_deps
      ~SqlEngineTask.output_dirname
      ~SqlEngineTask.output_filename
      ~SqlEngineTask.output_name
      ~SqlEngineTask.output_root
      ~SqlEngineTask.partition_dims
      ~SqlEngineTask.partition_infos
      ~SqlEngineTask.partition_infos_as_dict
      ~SqlEngineTask.perf_metrics
      ~SqlEngineTask.perf_profile
      ~SqlEngineTask.python_random_gen
      ~SqlEngineTask.query_udfs
      ~SqlEngineTask.rand_seed_float
      ~SqlEngineTask.rand_seed_uint32
      ~SqlEngineTask.random_seed_bytes
      ~SqlEngineTask.ray_dataset_path
      ~SqlEngineTask.ray_marker_path
      ~SqlEngineTask.retry_count
      ~SqlEngineTask.runtime_id
      ~SqlEngineTask.runtime_output_abspath
      ~SqlEngineTask.runtime_state
      ~SqlEngineTask.sched_epoch
      ~SqlEngineTask.self_contained_output
      ~SqlEngineTask.skip_when_any_input_empty
      ~SqlEngineTask.staging_root
      ~SqlEngineTask.start_time
      ~SqlEngineTask.status
      ~SqlEngineTask.temp_abspath
      ~SqlEngineTask.temp_output
      ~SqlEngineTask.udfs
      ~SqlEngineTask.uniform_failure_prob
103
_sources/generated/smallpond.execution.task.Task.rst.txt
Normal file
@@ -0,0 +1,103 @@
smallpond.execution.task.Task
=============================

.. currentmodule:: smallpond.execution.task

.. autoclass:: Task

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~Task.__init__
      ~Task.add_elapsed_time
      ~Task.adjust_row_group_size
      ~Task.clean_complex_attrs
      ~Task.clean_output
      ~Task.cleanup
      ~Task.compute_avg_row_size
      ~Task.dump
      ~Task.exec
      ~Task.finalize
      ~Task.get_partition_info
      ~Task.initialize
      ~Task.inject_fault
      ~Task.merge_metrics
      ~Task.oom
      ~Task.parquet_kv_metadata_bytes
      ~Task.parquet_kv_metadata_str
      ~Task.random_float
      ~Task.random_uint32
      ~Task.run
      ~Task.run_on_ray
      ~Task.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~Task.ctx
      ~Task.id
      ~Task.node_id
      ~Task.sched_epoch
      ~Task.output_name
      ~Task.output_root
      ~Task.dataset
      ~Task.input_deps
      ~Task.output_deps
      ~Task.perf_metrics
      ~Task.perf_profile
      ~Task.runtime_state
      ~Task.input_datasets
      ~Task.allow_speculative_exec
      ~Task.any_input_empty
      ~Task.cpu_limit
      ~Task.default_output_name
      ~Task.elapsed_time
      ~Task.exception
      ~Task.exec_cq
      ~Task.exec_id
      ~Task.exec_on_scheduler
      ~Task.fail_count
      ~Task.final_output_abspath
      ~Task.finish_time
      ~Task.gpu_limit
      ~Task.key
      ~Task.local_gpu
      ~Task.local_gpu_ranks
      ~Task.local_rank
      ~Task.location
      ~Task.memory_limit
      ~Task.numa_node
      ~Task.numpy_random_gen
      ~Task.output
      ~Task.output_dirname
      ~Task.output_filename
      ~Task.partition_dims
      ~Task.partition_infos
      ~Task.partition_infos_as_dict
      ~Task.python_random_gen
      ~Task.random_seed_bytes
      ~Task.ray_dataset_path
      ~Task.ray_marker_path
      ~Task.retry_count
      ~Task.runtime_id
      ~Task.runtime_output_abspath
      ~Task.self_contained_output
      ~Task.skip_when_any_input_empty
      ~Task.staging_root
      ~Task.start_time
      ~Task.status
      ~Task.temp_abspath
      ~Task.temp_output
      ~Task.uniform_failure_prob
36
_sources/generated/smallpond.execution.task.TaskId.rst.txt
Normal file
@@ -0,0 +1,36 @@
smallpond.execution.task.TaskId
===============================

.. currentmodule:: smallpond.execution.task

.. autoclass:: TaskId

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~TaskId.__init__
      ~TaskId.as_integer_ratio
      ~TaskId.bit_length
      ~TaskId.conjugate
      ~TaskId.from_bytes
      ~TaskId.to_bytes

   .. rubric:: Attributes

   .. autosummary::

      ~TaskId.denominator
      ~TaskId.imag
      ~TaskId.numerator
      ~TaskId.real
@@ -0,0 +1,30 @@
smallpond.execution.task.TaskRuntimeId
======================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: TaskRuntimeId

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~TaskRuntimeId.__init__

   .. rubric:: Attributes

   .. autosummary::

      ~TaskRuntimeId.id
      ~TaskRuntimeId.epoch
      ~TaskRuntimeId.retry
@@ -0,0 +1,107 @@
smallpond.execution.task.UserDefinedPartitionProducerTask
=========================================================

.. currentmodule:: smallpond.execution.task

.. autoclass:: UserDefinedPartitionProducerTask

   .. automethod:: __init__

   .. rubric:: Methods

   .. autosummary::

      ~UserDefinedPartitionProducerTask.__init__
      ~UserDefinedPartitionProducerTask.add_elapsed_time
      ~UserDefinedPartitionProducerTask.adjust_row_group_size
      ~UserDefinedPartitionProducerTask.clean_complex_attrs
      ~UserDefinedPartitionProducerTask.clean_output
      ~UserDefinedPartitionProducerTask.cleanup
      ~UserDefinedPartitionProducerTask.compute_avg_row_size
      ~UserDefinedPartitionProducerTask.dump
      ~UserDefinedPartitionProducerTask.exec
      ~UserDefinedPartitionProducerTask.finalize
      ~UserDefinedPartitionProducerTask.get_partition_info
      ~UserDefinedPartitionProducerTask.initialize
      ~UserDefinedPartitionProducerTask.inject_fault
      ~UserDefinedPartitionProducerTask.merge_metrics
      ~UserDefinedPartitionProducerTask.oom
      ~UserDefinedPartitionProducerTask.parquet_kv_metadata_bytes
      ~UserDefinedPartitionProducerTask.parquet_kv_metadata_str
      ~UserDefinedPartitionProducerTask.random_float
      ~UserDefinedPartitionProducerTask.random_uint32
      ~UserDefinedPartitionProducerTask.run
      ~UserDefinedPartitionProducerTask.run_on_ray
      ~UserDefinedPartitionProducerTask.set_memory_limit

   .. rubric:: Attributes

   .. autosummary::

      ~UserDefinedPartitionProducerTask.partition_func
      ~UserDefinedPartitionProducerTask.allow_speculative_exec
      ~UserDefinedPartitionProducerTask.any_input_empty
      ~UserDefinedPartitionProducerTask.cpu_limit
      ~UserDefinedPartitionProducerTask.ctx
      ~UserDefinedPartitionProducerTask.dataset
      ~UserDefinedPartitionProducerTask.default_output_name
      ~UserDefinedPartitionProducerTask.dimension
      ~UserDefinedPartitionProducerTask.elapsed_time
      ~UserDefinedPartitionProducerTask.exception
      ~UserDefinedPartitionProducerTask.exec_cq
      ~UserDefinedPartitionProducerTask.exec_id
      ~UserDefinedPartitionProducerTask.exec_on_scheduler
      ~UserDefinedPartitionProducerTask.fail_count
      ~UserDefinedPartitionProducerTask.final_output_abspath
      ~UserDefinedPartitionProducerTask.finish_time
      ~UserDefinedPartitionProducerTask.gpu_limit
      ~UserDefinedPartitionProducerTask.id
      ~UserDefinedPartitionProducerTask.input_datasets
      ~UserDefinedPartitionProducerTask.input_deps
      ~UserDefinedPartitionProducerTask.key
      ~UserDefinedPartitionProducerTask.local_gpu
      ~UserDefinedPartitionProducerTask.local_gpu_ranks
      ~UserDefinedPartitionProducerTask.local_rank
      ~UserDefinedPartitionProducerTask.location
      ~UserDefinedPartitionProducerTask.memory_limit
      ~UserDefinedPartitionProducerTask.node_id
      ~UserDefinedPartitionProducerTask.npartitions
      ~UserDefinedPartitionProducerTask.numa_node
      ~UserDefinedPartitionProducerTask.numpy_random_gen
      ~UserDefinedPartitionProducerTask.output
      ~UserDefinedPartitionProducerTask.output_deps
      ~UserDefinedPartitionProducerTask.output_dirname
      ~UserDefinedPartitionProducerTask.output_filename
      ~UserDefinedPartitionProducerTask.output_name
      ~UserDefinedPartitionProducerTask.output_root
      ~UserDefinedPartitionProducerTask.partition_dims
      ~UserDefinedPartitionProducerTask.partition_infos
      ~UserDefinedPartitionProducerTask.partition_infos_as_dict
      ~UserDefinedPartitionProducerTask.partitioned_datasets
      ~UserDefinedPartitionProducerTask.perf_metrics
      ~UserDefinedPartitionProducerTask.perf_profile
      ~UserDefinedPartitionProducerTask.python_random_gen
      ~UserDefinedPartitionProducerTask.random_seed_bytes
      ~UserDefinedPartitionProducerTask.ray_dataset_path
      ~UserDefinedPartitionProducerTask.ray_marker_path
|
||||
~UserDefinedPartitionProducerTask.retry_count
|
||||
~UserDefinedPartitionProducerTask.runtime_id
|
||||
~UserDefinedPartitionProducerTask.runtime_output_abspath
|
||||
~UserDefinedPartitionProducerTask.runtime_state
|
||||
~UserDefinedPartitionProducerTask.sched_epoch
|
||||
~UserDefinedPartitionProducerTask.self_contained_output
|
||||
~UserDefinedPartitionProducerTask.skip_when_any_input_empty
|
||||
~UserDefinedPartitionProducerTask.staging_root
|
||||
~UserDefinedPartitionProducerTask.start_time
|
||||
~UserDefinedPartitionProducerTask.status
|
||||
~UserDefinedPartitionProducerTask.temp_abspath
|
||||
~UserDefinedPartitionProducerTask.temp_output
|
||||
~UserDefinedPartitionProducerTask.uniform_failure_prob
|
||||
|
||||
|
||||
6
_sources/generated/smallpond.init.rst.txt
Normal file
@@ -0,0 +1,6 @@
smallpond.init
==============

.. currentmodule:: smallpond

.. autofunction:: init
@@ -0,0 +1,46 @@
|
||||
smallpond.logical.dataset.ArrowTableDataSet
|
||||
===========================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: ArrowTableDataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowTableDataSet.__init__
|
||||
~ArrowTableDataSet.log
|
||||
~ArrowTableDataSet.merge
|
||||
~ArrowTableDataSet.partition_by_files
|
||||
~ArrowTableDataSet.reset
|
||||
~ArrowTableDataSet.sql_query_fragment
|
||||
~ArrowTableDataSet.to_arrow_table
|
||||
~ArrowTableDataSet.to_batch_reader
|
||||
~ArrowTableDataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowTableDataSet.paths
|
||||
~ArrowTableDataSet.root_dir
|
||||
~ArrowTableDataSet.recursive
|
||||
~ArrowTableDataSet.columns
|
||||
~ArrowTableDataSet.absolute_paths
|
||||
~ArrowTableDataSet.empty
|
||||
~ArrowTableDataSet.num_files
|
||||
~ArrowTableDataSet.num_rows
|
||||
~ArrowTableDataSet.resolved_paths
|
||||
~ArrowTableDataSet.udfs
|
||||
~ArrowTableDataSet.union_by_name
|
||||
|
||||
|
||||
@@ -0,0 +1,51 @@
|
||||
smallpond.logical.dataset.CsvDataSet
|
||||
====================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: CsvDataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~CsvDataSet.__init__
|
||||
~CsvDataSet.log
|
||||
~CsvDataSet.merge
|
||||
~CsvDataSet.partition_by_files
|
||||
~CsvDataSet.reset
|
||||
~CsvDataSet.sql_query_fragment
|
||||
~CsvDataSet.to_arrow_table
|
||||
~CsvDataSet.to_batch_reader
|
||||
~CsvDataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~CsvDataSet.schema
|
||||
~CsvDataSet.delim
|
||||
~CsvDataSet.max_line_size
|
||||
~CsvDataSet.parallel
|
||||
~CsvDataSet.header
|
||||
~CsvDataSet.absolute_paths
|
||||
~CsvDataSet.columns
|
||||
~CsvDataSet.empty
|
||||
~CsvDataSet.num_files
|
||||
~CsvDataSet.num_rows
|
||||
~CsvDataSet.paths
|
||||
~CsvDataSet.recursive
|
||||
~CsvDataSet.resolved_paths
|
||||
~CsvDataSet.root_dir
|
||||
~CsvDataSet.udfs
|
||||
~CsvDataSet.union_by_name
|
||||
|
||||
|
||||
46
_sources/generated/smallpond.logical.dataset.DataSet.rst.txt
Normal file
@@ -0,0 +1,46 @@
|
||||
smallpond.logical.dataset.DataSet
|
||||
=================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: DataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSet.__init__
|
||||
~DataSet.log
|
||||
~DataSet.merge
|
||||
~DataSet.partition_by_files
|
||||
~DataSet.reset
|
||||
~DataSet.sql_query_fragment
|
||||
~DataSet.to_arrow_table
|
||||
~DataSet.to_batch_reader
|
||||
~DataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSet.paths
|
||||
~DataSet.root_dir
|
||||
~DataSet.recursive
|
||||
~DataSet.columns
|
||||
~DataSet.absolute_paths
|
||||
~DataSet.empty
|
||||
~DataSet.num_files
|
||||
~DataSet.num_rows
|
||||
~DataSet.resolved_paths
|
||||
~DataSet.udfs
|
||||
~DataSet.union_by_name
|
||||
|
||||
|
||||
46
_sources/generated/smallpond.logical.dataset.FileSet.rst.txt
Normal file
@@ -0,0 +1,46 @@
|
||||
smallpond.logical.dataset.FileSet
|
||||
=================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: FileSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~FileSet.__init__
|
||||
~FileSet.log
|
||||
~FileSet.merge
|
||||
~FileSet.partition_by_files
|
||||
~FileSet.reset
|
||||
~FileSet.sql_query_fragment
|
||||
~FileSet.to_arrow_table
|
||||
~FileSet.to_batch_reader
|
||||
~FileSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~FileSet.paths
|
||||
~FileSet.root_dir
|
||||
~FileSet.recursive
|
||||
~FileSet.columns
|
||||
~FileSet.absolute_paths
|
||||
~FileSet.empty
|
||||
~FileSet.num_files
|
||||
~FileSet.num_rows
|
||||
~FileSet.resolved_paths
|
||||
~FileSet.udfs
|
||||
~FileSet.union_by_name
|
||||
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
smallpond.logical.dataset.JsonDataSet
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: JsonDataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~JsonDataSet.__init__
|
||||
~JsonDataSet.log
|
||||
~JsonDataSet.merge
|
||||
~JsonDataSet.partition_by_files
|
||||
~JsonDataSet.reset
|
||||
~JsonDataSet.sql_query_fragment
|
||||
~JsonDataSet.to_arrow_table
|
||||
~JsonDataSet.to_batch_reader
|
||||
~JsonDataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~JsonDataSet.schema
|
||||
~JsonDataSet.format
|
||||
~JsonDataSet.max_object_size
|
||||
~JsonDataSet.absolute_paths
|
||||
~JsonDataSet.columns
|
||||
~JsonDataSet.empty
|
||||
~JsonDataSet.num_files
|
||||
~JsonDataSet.num_rows
|
||||
~JsonDataSet.paths
|
||||
~JsonDataSet.recursive
|
||||
~JsonDataSet.resolved_paths
|
||||
~JsonDataSet.root_dir
|
||||
~JsonDataSet.udfs
|
||||
~JsonDataSet.union_by_name
|
||||
|
||||
|
||||
@@ -0,0 +1,46 @@
|
||||
smallpond.logical.dataset.PandasDataSet
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: PandasDataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasDataSet.__init__
|
||||
~PandasDataSet.log
|
||||
~PandasDataSet.merge
|
||||
~PandasDataSet.partition_by_files
|
||||
~PandasDataSet.reset
|
||||
~PandasDataSet.sql_query_fragment
|
||||
~PandasDataSet.to_arrow_table
|
||||
~PandasDataSet.to_batch_reader
|
||||
~PandasDataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasDataSet.paths
|
||||
~PandasDataSet.root_dir
|
||||
~PandasDataSet.recursive
|
||||
~PandasDataSet.columns
|
||||
~PandasDataSet.absolute_paths
|
||||
~PandasDataSet.empty
|
||||
~PandasDataSet.num_files
|
||||
~PandasDataSet.num_rows
|
||||
~PandasDataSet.resolved_paths
|
||||
~PandasDataSet.udfs
|
||||
~PandasDataSet.union_by_name
|
||||
|
||||
|
||||
@@ -0,0 +1,54 @@
|
||||
smallpond.logical.dataset.ParquetDataSet
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: ParquetDataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ParquetDataSet.__init__
|
||||
~ParquetDataSet.create_from
|
||||
~ParquetDataSet.load_partitioned_datasets
|
||||
~ParquetDataSet.log
|
||||
~ParquetDataSet.merge
|
||||
~ParquetDataSet.partition_by_files
|
||||
~ParquetDataSet.partition_by_rows
|
||||
~ParquetDataSet.partition_by_size
|
||||
~ParquetDataSet.remove_empty_files
|
||||
~ParquetDataSet.reset
|
||||
~ParquetDataSet.sql_query_fragment
|
||||
~ParquetDataSet.to_arrow_table
|
||||
~ParquetDataSet.to_batch_reader
|
||||
~ParquetDataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ParquetDataSet.generated_columns
|
||||
~ParquetDataSet.absolute_paths
|
||||
~ParquetDataSet.columns
|
||||
~ParquetDataSet.empty
|
||||
~ParquetDataSet.estimated_data_size
|
||||
~ParquetDataSet.num_files
|
||||
~ParquetDataSet.num_rows
|
||||
~ParquetDataSet.paths
|
||||
~ParquetDataSet.recursive
|
||||
~ParquetDataSet.resolved_paths
|
||||
~ParquetDataSet.resolved_row_ranges
|
||||
~ParquetDataSet.root_dir
|
||||
~ParquetDataSet.udfs
|
||||
~ParquetDataSet.union_by_name
|
||||
|
||||
|
||||
@@ -0,0 +1,47 @@
|
||||
smallpond.logical.dataset.PartitionedDataSet
|
||||
============================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: PartitionedDataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionedDataSet.__init__
|
||||
~PartitionedDataSet.log
|
||||
~PartitionedDataSet.merge
|
||||
~PartitionedDataSet.partition_by_files
|
||||
~PartitionedDataSet.reset
|
||||
~PartitionedDataSet.sql_query_fragment
|
||||
~PartitionedDataSet.to_arrow_table
|
||||
~PartitionedDataSet.to_batch_reader
|
||||
~PartitionedDataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionedDataSet.datasets
|
||||
~PartitionedDataSet.absolute_paths
|
||||
~PartitionedDataSet.columns
|
||||
~PartitionedDataSet.empty
|
||||
~PartitionedDataSet.num_files
|
||||
~PartitionedDataSet.num_rows
|
||||
~PartitionedDataSet.paths
|
||||
~PartitionedDataSet.recursive
|
||||
~PartitionedDataSet.resolved_paths
|
||||
~PartitionedDataSet.root_dir
|
||||
~PartitionedDataSet.udfs
|
||||
~PartitionedDataSet.union_by_name
|
||||
|
||||
|
||||
@@ -0,0 +1,48 @@
|
||||
smallpond.logical.dataset.SqlQueryDataSet
|
||||
=========================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.dataset
|
||||
|
||||
.. autoclass:: SqlQueryDataSet
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~SqlQueryDataSet.__init__
|
||||
~SqlQueryDataSet.log
|
||||
~SqlQueryDataSet.merge
|
||||
~SqlQueryDataSet.partition_by_files
|
||||
~SqlQueryDataSet.reset
|
||||
~SqlQueryDataSet.sql_query_fragment
|
||||
~SqlQueryDataSet.to_arrow_table
|
||||
~SqlQueryDataSet.to_batch_reader
|
||||
~SqlQueryDataSet.to_pandas
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~SqlQueryDataSet.sql_query
|
||||
~SqlQueryDataSet.query_builder
|
||||
~SqlQueryDataSet.absolute_paths
|
||||
~SqlQueryDataSet.columns
|
||||
~SqlQueryDataSet.empty
|
||||
~SqlQueryDataSet.num_files
|
||||
~SqlQueryDataSet.num_rows
|
||||
~SqlQueryDataSet.paths
|
||||
~SqlQueryDataSet.recursive
|
||||
~SqlQueryDataSet.resolved_paths
|
||||
~SqlQueryDataSet.root_dir
|
||||
~SqlQueryDataSet.udfs
|
||||
~SqlQueryDataSet.union_by_name
|
||||
|
||||
|
||||
@@ -0,0 +1,39 @@
|
||||
smallpond.logical.node.ArrowBatchNode
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: ArrowBatchNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowBatchNode.__init__
|
||||
~ArrowBatchNode.add_perf_metrics
|
||||
~ArrowBatchNode.create_task
|
||||
~ArrowBatchNode.get_perf_stats
|
||||
~ArrowBatchNode.process
|
||||
~ArrowBatchNode.slim_copy
|
||||
~ArrowBatchNode.spawn
|
||||
~ArrowBatchNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowBatchNode.default_batch_size
|
||||
~ArrowBatchNode.default_row_group_size
|
||||
~ArrowBatchNode.default_secs_checkpoint_interval
|
||||
~ArrowBatchNode.enable_resource_boost
|
||||
~ArrowBatchNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,37 @@
|
||||
smallpond.logical.node.ArrowComputeNode
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: ArrowComputeNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowComputeNode.__init__
|
||||
~ArrowComputeNode.add_perf_metrics
|
||||
~ArrowComputeNode.create_task
|
||||
~ArrowComputeNode.get_perf_stats
|
||||
~ArrowComputeNode.process
|
||||
~ArrowComputeNode.slim_copy
|
||||
~ArrowComputeNode.spawn
|
||||
~ArrowComputeNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowComputeNode.default_row_group_size
|
||||
~ArrowComputeNode.enable_resource_boost
|
||||
~ArrowComputeNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,39 @@
|
||||
smallpond.logical.node.ArrowStreamNode
|
||||
======================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: ArrowStreamNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowStreamNode.__init__
|
||||
~ArrowStreamNode.add_perf_metrics
|
||||
~ArrowStreamNode.create_task
|
||||
~ArrowStreamNode.get_perf_stats
|
||||
~ArrowStreamNode.process
|
||||
~ArrowStreamNode.slim_copy
|
||||
~ArrowStreamNode.spawn
|
||||
~ArrowStreamNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ArrowStreamNode.default_batch_size
|
||||
~ArrowStreamNode.default_row_group_size
|
||||
~ArrowStreamNode.default_secs_checkpoint_interval
|
||||
~ArrowStreamNode.enable_resource_boost
|
||||
~ArrowStreamNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,34 @@
|
||||
smallpond.logical.node.ConsolidateNode
|
||||
======================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: ConsolidateNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ConsolidateNode.__init__
|
||||
~ConsolidateNode.add_perf_metrics
|
||||
~ConsolidateNode.create_task
|
||||
~ConsolidateNode.get_perf_stats
|
||||
~ConsolidateNode.slim_copy
|
||||
~ConsolidateNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ConsolidateNode.enable_resource_boost
|
||||
~ConsolidateNode.num_partitions
|
||||
|
||||
|
||||
25
_sources/generated/smallpond.logical.node.Context.rst.txt
Normal file
@@ -0,0 +1,25 @@
|
||||
smallpond.logical.node.Context
|
||||
==============================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: Context
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Context.__init__
|
||||
~Context.create_duckdb_extension
|
||||
~Context.create_external_module
|
||||
~Context.create_function
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -0,0 +1,6 @@
smallpond.logical.node.DataSetPartitionNode
===========================================

.. currentmodule:: smallpond.logical.node

.. autofunction:: DataSetPartitionNode
@@ -0,0 +1,34 @@
|
||||
smallpond.logical.node.DataSinkNode
|
||||
===================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: DataSinkNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSinkNode.__init__
|
||||
~DataSinkNode.add_perf_metrics
|
||||
~DataSinkNode.create_task
|
||||
~DataSinkNode.get_perf_stats
|
||||
~DataSinkNode.slim_copy
|
||||
~DataSinkNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSinkNode.enable_resource_boost
|
||||
~DataSinkNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,34 @@
|
||||
smallpond.logical.node.DataSourceNode
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: DataSourceNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSourceNode.__init__
|
||||
~DataSourceNode.add_perf_metrics
|
||||
~DataSourceNode.create_task
|
||||
~DataSourceNode.get_perf_stats
|
||||
~DataSourceNode.slim_copy
|
||||
~DataSourceNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~DataSourceNode.enable_resource_boost
|
||||
~DataSourceNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,40 @@
|
||||
smallpond.logical.node.EvenlyDistributedPartitionNode
|
||||
=====================================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: EvenlyDistributedPartitionNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~EvenlyDistributedPartitionNode.__init__
|
||||
~EvenlyDistributedPartitionNode.add_perf_metrics
|
||||
~EvenlyDistributedPartitionNode.create_consumer_task
|
||||
~EvenlyDistributedPartitionNode.create_merge_task
|
||||
~EvenlyDistributedPartitionNode.create_producer_task
|
||||
~EvenlyDistributedPartitionNode.create_split_task
|
||||
~EvenlyDistributedPartitionNode.create_task
|
||||
~EvenlyDistributedPartitionNode.get_perf_stats
|
||||
~EvenlyDistributedPartitionNode.slim_copy
|
||||
~EvenlyDistributedPartitionNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~EvenlyDistributedPartitionNode.enable_resource_boost
|
||||
~EvenlyDistributedPartitionNode.max_card_of_producers_x_consumers
|
||||
~EvenlyDistributedPartitionNode.max_num_producer_tasks
|
||||
~EvenlyDistributedPartitionNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,45 @@
|
||||
smallpond.logical.node.HashPartitionNode
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: HashPartitionNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionNode.__init__
|
||||
~HashPartitionNode.add_perf_metrics
|
||||
~HashPartitionNode.create_consumer_task
|
||||
~HashPartitionNode.create_merge_task
|
||||
~HashPartitionNode.create_producer_task
|
||||
~HashPartitionNode.create_split_task
|
||||
~HashPartitionNode.create_task
|
||||
~HashPartitionNode.get_perf_stats
|
||||
~HashPartitionNode.slim_copy
|
||||
~HashPartitionNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~HashPartitionNode.default_cpu_limit
|
||||
~HashPartitionNode.default_data_partition_column
|
||||
~HashPartitionNode.default_engine_type
|
||||
~HashPartitionNode.default_memory_limit
|
||||
~HashPartitionNode.default_row_group_size
|
||||
~HashPartitionNode.enable_resource_boost
|
||||
~HashPartitionNode.max_card_of_producers_x_consumers
|
||||
~HashPartitionNode.max_num_producer_tasks
|
||||
~HashPartitionNode.num_partitions
|
||||
|
||||
|
||||
41
_sources/generated/smallpond.logical.node.LimitNode.rst.txt
Normal file
@@ -0,0 +1,41 @@
|
||||
smallpond.logical.node.LimitNode
|
||||
================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: LimitNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LimitNode.__init__
|
||||
~LimitNode.add_perf_metrics
|
||||
~LimitNode.create_merge_task
|
||||
~LimitNode.create_task
|
||||
~LimitNode.get_perf_stats
|
||||
~LimitNode.slim_copy
|
||||
~LimitNode.spawn
|
||||
~LimitNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LimitNode.default_cpu_limit
|
||||
~LimitNode.default_memory_limit
|
||||
~LimitNode.default_row_group_size
|
||||
~LimitNode.enable_resource_boost
|
||||
~LimitNode.max_udf_cpu_limit
|
||||
~LimitNode.num_partitions
|
||||
~LimitNode.oneline_query
|
||||
|
||||
|
||||
@@ -0,0 +1,40 @@
|
||||
smallpond.logical.node.LoadPartitionedDataSetNode
|
||||
=================================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: LoadPartitionedDataSetNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LoadPartitionedDataSetNode.__init__
|
||||
~LoadPartitionedDataSetNode.add_perf_metrics
|
||||
~LoadPartitionedDataSetNode.create_consumer_task
|
||||
~LoadPartitionedDataSetNode.create_merge_task
|
||||
~LoadPartitionedDataSetNode.create_producer_task
|
||||
~LoadPartitionedDataSetNode.create_split_task
|
||||
~LoadPartitionedDataSetNode.create_task
|
||||
~LoadPartitionedDataSetNode.get_perf_stats
|
||||
~LoadPartitionedDataSetNode.slim_copy
|
||||
~LoadPartitionedDataSetNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LoadPartitionedDataSetNode.enable_resource_boost
|
||||
~LoadPartitionedDataSetNode.max_card_of_producers_x_consumers
|
||||
~LoadPartitionedDataSetNode.max_num_producer_tasks
|
||||
~LoadPartitionedDataSetNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,30 @@
|
||||
smallpond.logical.node.LogicalPlan
|
||||
==================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: LogicalPlan
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LogicalPlan.__init__
|
||||
~LogicalPlan.explain_str
|
||||
~LogicalPlan.graph
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LogicalPlan.nodes
|
||||
|
||||
|
||||
@@ -0,0 +1,36 @@
|
||||
smallpond.logical.node.LogicalPlanVisitor
|
||||
=========================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: LogicalPlanVisitor
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~LogicalPlanVisitor.__init__
|
||||
~LogicalPlanVisitor.generic_visit
|
||||
~LogicalPlanVisitor.visit
|
||||
~LogicalPlanVisitor.visit_arrow_compute_node
|
||||
~LogicalPlanVisitor.visit_arrow_stream_node
|
||||
~LogicalPlanVisitor.visit_consolidate_node
|
||||
~LogicalPlanVisitor.visit_data_sink_node
|
||||
~LogicalPlanVisitor.visit_data_source_node
|
||||
~LogicalPlanVisitor.visit_limit_node
|
||||
~LogicalPlanVisitor.visit_partition_node
|
||||
~LogicalPlanVisitor.visit_projection_node
|
||||
~LogicalPlanVisitor.visit_python_script_node
|
||||
~LogicalPlanVisitor.visit_query_engine_node
|
||||
~LogicalPlanVisitor.visit_root_node
|
||||
~LogicalPlanVisitor.visit_union_node
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
34
_sources/generated/smallpond.logical.node.Node.rst.txt
Normal file
@@ -0,0 +1,34 @@
|
||||
smallpond.logical.node.Node
|
||||
===========================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: Node
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Node.__init__
|
||||
~Node.add_perf_metrics
|
||||
~Node.create_task
|
||||
~Node.get_perf_stats
|
||||
~Node.slim_copy
|
||||
~Node.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~Node.enable_resource_boost
|
||||
~Node.num_partitions
|
||||
|
||||
|
||||
36
_sources/generated/smallpond.logical.node.NodeId.rst.txt
Normal file
@@ -0,0 +1,36 @@
|
||||
smallpond.logical.node.NodeId
|
||||
=============================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: NodeId
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~NodeId.__init__
|
||||
~NodeId.as_integer_ratio
|
||||
~NodeId.bit_length
|
||||
~NodeId.conjugate
|
||||
~NodeId.from_bytes
|
||||
~NodeId.to_bytes
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~NodeId.denominator
|
||||
~NodeId.imag
|
||||
~NodeId.numerator
|
||||
~NodeId.real
|
||||
|
||||
|
||||
@@ -0,0 +1,39 @@
|
||||
smallpond.logical.node.PandasBatchNode
|
||||
======================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: PandasBatchNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasBatchNode.__init__
|
||||
~PandasBatchNode.add_perf_metrics
|
||||
~PandasBatchNode.create_task
|
||||
~PandasBatchNode.get_perf_stats
|
||||
~PandasBatchNode.process
|
||||
~PandasBatchNode.slim_copy
|
||||
~PandasBatchNode.spawn
|
||||
~PandasBatchNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasBatchNode.default_batch_size
|
||||
~PandasBatchNode.default_row_group_size
|
||||
~PandasBatchNode.default_secs_checkpoint_interval
|
||||
~PandasBatchNode.enable_resource_boost
|
||||
~PandasBatchNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,37 @@
|
||||
smallpond.logical.node.PandasComputeNode
|
||||
========================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: PandasComputeNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasComputeNode.__init__
|
||||
~PandasComputeNode.add_perf_metrics
|
||||
~PandasComputeNode.create_task
|
||||
~PandasComputeNode.get_perf_stats
|
||||
~PandasComputeNode.process
|
||||
~PandasComputeNode.slim_copy
|
||||
~PandasComputeNode.spawn
|
||||
~PandasComputeNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PandasComputeNode.default_row_group_size
|
||||
~PandasComputeNode.enable_resource_boost
|
||||
~PandasComputeNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,40 @@
|
||||
smallpond.logical.node.PartitionNode
|
||||
====================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: PartitionNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionNode.__init__
|
||||
~PartitionNode.add_perf_metrics
|
||||
~PartitionNode.create_consumer_task
|
||||
~PartitionNode.create_merge_task
|
||||
~PartitionNode.create_producer_task
|
||||
~PartitionNode.create_split_task
|
||||
~PartitionNode.create_task
|
||||
~PartitionNode.get_perf_stats
|
||||
~PartitionNode.slim_copy
|
||||
~PartitionNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PartitionNode.enable_resource_boost
|
||||
~PartitionNode.max_card_of_producers_x_consumers
|
||||
~PartitionNode.max_num_producer_tasks
|
||||
~PartitionNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,34 @@
|
||||
smallpond.logical.node.ProjectionNode
|
||||
=====================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: ProjectionNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ProjectionNode.__init__
|
||||
~ProjectionNode.add_perf_metrics
|
||||
~ProjectionNode.create_task
|
||||
~ProjectionNode.get_perf_stats
|
||||
~ProjectionNode.slim_copy
|
||||
~ProjectionNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~ProjectionNode.enable_resource_boost
|
||||
~ProjectionNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,36 @@
|
||||
smallpond.logical.node.PythonScriptNode
|
||||
=======================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: PythonScriptNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PythonScriptNode.__init__
|
||||
~PythonScriptNode.add_perf_metrics
|
||||
~PythonScriptNode.create_task
|
||||
~PythonScriptNode.get_perf_stats
|
||||
~PythonScriptNode.process
|
||||
~PythonScriptNode.slim_copy
|
||||
~PythonScriptNode.spawn
|
||||
~PythonScriptNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~PythonScriptNode.enable_resource_boost
|
||||
~PythonScriptNode.num_partitions
|
||||
|
||||
|
||||
@@ -0,0 +1,40 @@
|
||||
smallpond.logical.node.RangePartitionNode
|
||||
=========================================
|
||||
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. autoclass:: RangePartitionNode
|
||||
|
||||
|
||||
.. automethod:: __init__
|
||||
|
||||
|
||||
.. rubric:: Methods
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~RangePartitionNode.__init__
|
||||
~RangePartitionNode.add_perf_metrics
|
||||
~RangePartitionNode.create_consumer_task
|
||||
~RangePartitionNode.create_merge_task
|
||||
~RangePartitionNode.create_producer_task
|
||||
~RangePartitionNode.create_split_task
|
||||
~RangePartitionNode.create_task
|
||||
~RangePartitionNode.get_perf_stats
|
||||
~RangePartitionNode.slim_copy
|
||||
~RangePartitionNode.task_factory
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.. rubric:: Attributes
|
||||
|
||||
.. autosummary::
|
||||
|
||||
~RangePartitionNode.enable_resource_boost
|
||||
~RangePartitionNode.max_card_of_producers_x_consumers
|
||||
~RangePartitionNode.max_num_producer_tasks
|
||||
~RangePartitionNode.num_partitions
|
||||
|
||||
|
||||