This commit is contained in:
wangrunji0408
2025-03-06 02:23:52 +08:00
commit 2745ca5889
260 changed files with 90194 additions and 0 deletions

0
.nojekyll Normal file
View File

81
_sources/api.rst.txt Normal file
View File

@@ -0,0 +1,81 @@
API Reference
=============
Smallpond provides both high-level and low-level APIs.
.. note::
Currently, smallpond provides two different APIs, supporting dynamic and static construction of data flow graphs respectively. Due to historical reasons, these two APIs use different scheduler backends and support different configuration options.
- The High-level API currently uses Ray as the backend, supporting dynamic construction and execution of data flow graphs.
- The Low-level API uses a built-in scheduler and only supports one-time execution of static data flow graphs. However, it offers more performance optimizations and richer configuration options.
We are working to merge them so that in the future, you can use a unified high-level API and freely choose between Ray or the built-in scheduler.
High-level API
--------------
The high-level API is centered around :ref:`dataframe`. It allows dynamic construction of data flow graphs, execution, and result retrieval.
A typical workflow looks like this:
.. code-block:: python
import smallpond
sp = smallpond.init()
df = sp.read_parquet("path/to/dataset/*.parquet")
df = df.repartition(10)
df = df.map("x + 1")
df.write_parquet("path/to/output")
.. toctree::
:maxdepth: 2
api/dataframe
It is recommended to use the DataFrame API.
Low-level API
-------------
In the low-level API, users manually create :ref:`nodes` to construct static data flow graphs, then submit them to smallpond to generate :ref:`tasks` and wait for all tasks to complete.
A complete example is shown below.
.. code-block:: python
from smallpond.logical.dataset import ParquetDataSet
from smallpond.logical.node import Context, DataSourceNode, DataSetPartitionNode, SqlEngineNode, LogicalPlan
from smallpond.execution.driver import Driver
def my_pipeline(input_paths: List[str], npartitions: int):
ctx = Context()
dataset = ParquetDataSet(input_paths)
node = DataSourceNode(ctx, dataset)
node = DataSetPartitionNode(ctx, (node,), npartitions=npartitions)
node = SqlEngineNode(ctx, (node,), "SELECT * FROM {0}")
return LogicalPlan(ctx, node)
if __name__ == "__main__":
driver = Driver()
driver.add_argument("-i", "--input_paths", nargs="+")
driver.add_argument("-n", "--npartitions", type=int, default=10)
plan = my_pipeline(**driver.get_arguments())
driver.run(plan)
To run this script:
.. code-block:: bash
python script.py -i "path/to/*.parquet" -n 10
.. toctree::
:maxdepth: 2
api/dataset
api/nodes
api/tasks
api/execution

View File

@@ -0,0 +1,104 @@
.. _dataframe:
DataFrame
=========
DataFrame is the main class in smallpond. It represents a lazily computed, partitioned data set.
A typical workflow looks like this:
.. code-block:: python
import smallpond
sp = smallpond.init()
df = sp.read_parquet("path/to/dataset/*.parquet")
df = df.repartition(10)
df = df.map("x + 1")
df.write_parquet("path/to/output")
Initialization
--------------
.. autosummary::
:toctree: ../generated
smallpond.init
.. currentmodule:: smallpond.dataframe
.. _loading_data:
Loading Data
------------
.. autosummary::
:toctree: ../generated
Session.from_items
Session.from_arrow
Session.from_pandas
Session.read_csv
Session.read_json
Session.read_parquet
.. _partitioning_data:
Partitioning Data
-----------------
.. autosummary::
:toctree: ../generated
DataFrame.repartition
.. _transformations:
Transformations
---------------
Apply transformations and return a new DataFrame.
.. autosummary::
:toctree: ../generated
Session.partial_sql
DataFrame.map
DataFrame.map_batches
DataFrame.flat_map
DataFrame.filter
DataFrame.limit
DataFrame.partial_sort
DataFrame.random_shuffle
.. _consuming_data:
Consuming Data
--------------
These operations will trigger execution of the lazy transformations performed on this DataFrame.
.. autosummary::
:toctree: ../generated
DataFrame.count
DataFrame.take
DataFrame.take_all
DataFrame.to_arrow
DataFrame.to_pandas
DataFrame.write_parquet
DataFrame.write_parquet_lazy
Execution
---------
DataFrames are lazily computed. You can use these methods to manually trigger computation.
.. autosummary::
:toctree: ../generated
DataFrame.compute
DataFrame.is_computed
DataFrame.recompute
Session.wait

View File

@@ -0,0 +1,28 @@
.. currentmodule:: smallpond.logical.dataset
Dataset
=======
Dataset represents a collection of files.
To create a dataset:
.. code-block:: python
dataset = ParquetDataSet("path/to/dataset/*.parquet")
DataSets
--------
.. autosummary::
:toctree: ../generated
DataSet
FileSet
ParquetDataSet
CsvDataSet
JsonDataSet
ArrowTableDataSet
PandasDataSet
PartitionedDataSet
SqlQueryDataSet

View File

@@ -0,0 +1,85 @@
.. currentmodule:: smallpond.execution
.. _execution:
Execution
=========
Submit a Job
------------
After constructing the LogicalPlan, you can use the JobManager to create a Job in the cluster to execute it. However, in most cases, you only need to use the Driver as the entry point of the entire script and then submit the plan. The Driver is a simple wrapper around the JobManager. It reads the configuration from the command line arguments and passes it to the JobManager.
.. code-block:: python
from smallpond.execution.driver import Driver
if __name__ == "__main__":
driver = Driver()
# add your own arguments
driver.add_argument("-i", "--input_paths", nargs="+")
driver.add_argument("-n", "--npartitions", type=int, default=10)
# build and run logical plan
plan = my_pipeline(**driver.get_arguments())
driver.run(plan)
.. autosummary::
:toctree: ../generated
~driver.Driver
~manager.JobManager
Scheduler and Executor
----------------------
Scheduler and Executor are lower-level APIs. They are directly responsible for scheduling and executing tasks, respectively. Generally, users do not need to use them directly.
.. autosummary::
:toctree: ../generated
~scheduler.Scheduler
~executor.Executor
.. _platform:
Customize Platform
------------------
Smallpond supports user-defined task execution platforms. A Platform includes methods for submitting jobs and a series of default configurations. By default, smallpond automatically detects the current environment and selects the most suitable platform. If it cannot detect one, it uses the default platform.
You can specify a built-in platform via parameters:
.. code-block:: bash
# run with your platform
python script.py --platform mpi
Or implement your own Platform class:
.. code-block:: python
# path/to/my/platform.py
from smallpond.platform import Platform
class MyPlatform(Platform):
def start_job(self, ...) -> List[str]:
...
.. code-block:: bash
# run with your platform
# if using Driver
python script.py --platform path.to.my.platform
# if using smallpond.init
SP_PLATFORM=path.to.my.platform python script.py
.. currentmodule:: smallpond
.. autosummary::
:toctree: ../generated
~platform.Platform
~platform.MPI

View File

@@ -0,0 +1,89 @@
.. currentmodule:: smallpond.logical.node
.. _nodes:
Nodes
=====
Nodes represent the fundamental building blocks of a data processing pipeline. Each node encapsulates a specific operation or transformation that can be applied to a dataset.
Nodes can be chained together to form a logical plan, which is a directed acyclic graph (DAG) of nodes that represent the overall data processing workflow.
A typical workflow to create a logical plan is as follows:
.. code-block:: python
# Create a global context
ctx = Context()
# Create a dataset
dataset = ParquetDataSet("path/to/dataset/*.parquet")
# Create a data source node
node = DataSourceNode(ctx, dataset)
# Partition the data
node = DataSetPartitionNode(ctx, (node,), npartitions=2)
# Create a SQL engine node to transform the data
node = SqlEngineNode(ctx, (node,), "SELECT * FROM {0}")
# Create a logical plan from the root node
plan = LogicalPlan(ctx, node)
You can then create tasks from the logical plan, see :ref:`tasks`.
Notable properties of Node:
1. Nodes are partitioned. Each Node generates a series of tasks, with each task processing one partition of data.
2. The input and output of a Node are a series of partitioned Datasets. A Node may write data to shared storage and return a new Dataset, or it may simply recombine the input Datasets.
Context
-------
.. autosummary::
:toctree: ../generated
Context
NodeId
LogicalPlan
-----------
.. autosummary::
:toctree: ../generated
LogicalPlan
LogicalPlanVisitor
.. Planner
Nodes
-----
.. autosummary::
:toctree: ../generated
Node
DataSetPartitionNode
ArrowBatchNode
ArrowComputeNode
ArrowStreamNode
ConsolidateNode
DataSinkNode
DataSourceNode
EvenlyDistributedPartitionNode
HashPartitionNode
LimitNode
LoadPartitionedDataSetNode
PandasBatchNode
PandasComputeNode
PartitionNode
ProjectionNode
PythonScriptNode
RangePartitionNode
RepeatPartitionNode
RootNode
ShuffleNode
SqlEngineNode
UnionNode
UserDefinedPartitionNode
UserPartitionedDataSourceNode

View File

@@ -0,0 +1,76 @@
.. currentmodule:: smallpond.execution.task
.. _tasks:
Tasks
=====
.. code-block:: python
# create a runtime context
runtime_ctx = RuntimeContext(JobId.new(), data_root)
runtime_ctx.initialize(socket.gethostname(), cleanup_root=True)
# create a logical plan
plan = create_logical_plan()
# create an execution plan
planner = Planner(runtime_ctx)
exec_plan = planner.create_exec_plan(plan)
You can then execute the tasks in a scheduler, see :ref:`execution`.
RuntimeContext
--------------
.. autosummary::
:toctree: ../generated
RuntimeContext
JobId
TaskId
TaskRuntimeId
PartitionInfo
PerfStats
ExecutionPlan
-------------
.. autosummary::
:toctree: ../generated
ExecutionPlan
Tasks
-----
.. autosummary::
:toctree: ../generated
Task
ArrowBatchTask
ArrowComputeTask
ArrowStreamTask
DataSinkTask
DataSourceTask
EvenlyDistributedPartitionProducerTask
HashPartitionArrowTask
HashPartitionDuckDbTask
HashPartitionTask
LoadPartitionedDataSetProducerTask
MergeDataSetsTask
PandasBatchTask
PandasComputeTask
PartitionConsumerTask
PartitionProducerTask
ProjectionTask
PythonScriptTask
RangePartitionTask
RepeatPartitionProducerTask
RootTask
SplitDataSetTask
SqlEngineTask
UserDefinedPartitionProducerTask

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.compute
=====================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.compute

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.count
===================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.count

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.filter
====================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.filter

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.flat\_map
=======================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.flat_map

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.is\_computed
==========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.is_computed

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.limit
===================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.limit

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.map
=================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.map

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.map\_batches
==========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.map_batches

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.partial\_sort
===========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.partial_sort

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.random\_shuffle
=============================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.random_shuffle

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.recompute
=======================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.recompute

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.repartition
=========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.repartition

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.take
==================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.take

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.take\_all
=======================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.take_all

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.to\_arrow
=======================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.to_arrow

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.to\_pandas
========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.to_pandas

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.write\_parquet
============================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.write_parquet

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.DataFrame.write\_parquet\_lazy
==================================================
.. currentmodule:: smallpond.dataframe
.. automethod:: DataFrame.write_parquet_lazy

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.from\_arrow
=======================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.from_arrow

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.from\_items
=======================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.from_items

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.from\_pandas
========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.from_pandas

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.partial\_sql
========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.partial_sql

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.read\_csv
=====================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.read_csv

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.read\_json
======================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.read_json

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.read\_parquet
=========================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.read_parquet

View File

@@ -0,0 +1,6 @@
smallpond.dataframe.Session.wait
================================
.. currentmodule:: smallpond.dataframe
.. automethod:: Session.wait

View File

@@ -0,0 +1,37 @@
smallpond.execution.driver.Driver
=================================
.. currentmodule:: smallpond.execution.driver
.. autoclass:: Driver
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~Driver.__init__
~Driver.add_argument
~Driver.get_arguments
~Driver.get_driver_arguments
~Driver.get_user_arguments
~Driver.parse_arguments
~Driver.run
.. rubric:: Attributes
.. autosummary::
~Driver.data_root
~Driver.job_id
~Driver.mode
~Driver.num_executors

View File

@@ -0,0 +1,39 @@
smallpond.execution.executor.Executor
=====================================
.. currentmodule:: smallpond.execution.executor
.. autoclass:: Executor
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~Executor.__init__
~Executor.acquire_gpu
~Executor.collect_finished_works
~Executor.create
~Executor.exec_loop
~Executor.process_work
~Executor.release_gpu
~Executor.run
~Executor.skip_probes
~Executor.stop
.. rubric:: Attributes
.. autosummary::
~Executor.available_gpu_quota
~Executor.busy
~Executor.local_gpus

View File

@@ -0,0 +1,31 @@
smallpond.execution.manager.JobManager
======================================
.. currentmodule:: smallpond.execution.manager
.. autoclass:: JobManager
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~JobManager.__init__
~JobManager.run
.. rubric:: Attributes
.. autosummary::
~JobManager.env_template
~JobManager.jemalloc_filename
~JobManager.mimalloc_filename

View File

@@ -0,0 +1,75 @@
smallpond.execution.scheduler.Scheduler
=======================================
.. currentmodule:: smallpond.execution.scheduler
.. autoclass:: Scheduler
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~Scheduler.__init__
~Scheduler.add_state_observer
~Scheduler.clean_temp_files
~Scheduler.clear_cached_executor_lists
~Scheduler.copy_task_for_execution
~Scheduler.dispatch_tasks
~Scheduler.export_task_metrics
~Scheduler.export_timeline_figs
~Scheduler.get_retry_task
~Scheduler.get_runnable_tasks
~Scheduler.log_current_status
~Scheduler.log_overall_progress
~Scheduler.notify_state_observers
~Scheduler.probe_executors
~Scheduler.process_finished_tasks
~Scheduler.run
~Scheduler.save_task_final_state
~Scheduler.sched_loop
~Scheduler.start_speculative_execution
~Scheduler.stop_executors
~Scheduler.stop_running_tasks
~Scheduler.suspend_good_executors
~Scheduler.try_boost_resource
~Scheduler.try_enqueue
~Scheduler.try_relax_memory_limit
~Scheduler.update_executor_states
.. rubric:: Attributes
.. autosummary::
~Scheduler.StateCallback
~Scheduler.abandoned_tasks
~Scheduler.alive_executors
~Scheduler.elapsed_time
~Scheduler.failed_executors
~Scheduler.good_executors
~Scheduler.large_num_nontrivial_tasks
~Scheduler.large_runtime_state
~Scheduler.local_executors
~Scheduler.low_resource_executors
~Scheduler.num_local_running_works
~Scheduler.num_pending_nontrivial_tasks
~Scheduler.num_pending_tasks
~Scheduler.num_running_works
~Scheduler.pending_nontrivial_tasks
~Scheduler.progress
~Scheduler.remote_executors
~Scheduler.running_works
~Scheduler.stopped_executors
~Scheduler.stopping_executors
~Scheduler.succeeded_task_ids
~Scheduler.success
~Scheduler.working_executors

View File

@@ -0,0 +1,132 @@
smallpond.execution.task.ArrowBatchTask
=======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: ArrowBatchTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ArrowBatchTask.__init__
~ArrowBatchTask.add_elapsed_time
~ArrowBatchTask.adjust_row_group_size
~ArrowBatchTask.clean_complex_attrs
~ArrowBatchTask.clean_output
~ArrowBatchTask.cleanup
~ArrowBatchTask.compute_avg_row_size
~ArrowBatchTask.create_input_views
~ArrowBatchTask.dump
~ArrowBatchTask.dump_output
~ArrowBatchTask.exec
~ArrowBatchTask.exec_query
~ArrowBatchTask.finalize
~ArrowBatchTask.get_partition_info
~ArrowBatchTask.initialize
~ArrowBatchTask.inject_fault
~ArrowBatchTask.merge_metrics
~ArrowBatchTask.oom
~ArrowBatchTask.parquet_kv_metadata_bytes
~ArrowBatchTask.parquet_kv_metadata_str
~ArrowBatchTask.prepare_connection
~ArrowBatchTask.process
~ArrowBatchTask.random_float
~ArrowBatchTask.random_uint32
~ArrowBatchTask.restore_input_state
~ArrowBatchTask.run
~ArrowBatchTask.run_on_ray
~ArrowBatchTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~ArrowBatchTask.process_func
~ArrowBatchTask.background_io_thread
~ArrowBatchTask.streaming_batch_size
~ArrowBatchTask.streaming_batch_count
~ArrowBatchTask.parquet_row_group_size
~ArrowBatchTask.parquet_row_group_bytes
~ArrowBatchTask.parquet_dictionary_encoding
~ArrowBatchTask.parquet_compression
~ArrowBatchTask.parquet_compression_level
~ArrowBatchTask.secs_checkpoint_interval
~ArrowBatchTask.allow_speculative_exec
~ArrowBatchTask.any_input_empty
~ArrowBatchTask.compression_level_str
~ArrowBatchTask.compression_options
~ArrowBatchTask.compression_type_str
~ArrowBatchTask.cpu_limit
~ArrowBatchTask.cpu_overcommit_ratio
~ArrowBatchTask.ctx
~ArrowBatchTask.dataset
~ArrowBatchTask.default_output_name
~ArrowBatchTask.elapsed_time
~ArrowBatchTask.enable_temp_directory
~ArrowBatchTask.exception
~ArrowBatchTask.exec_cq
~ArrowBatchTask.exec_id
~ArrowBatchTask.exec_on_scheduler
~ArrowBatchTask.fail_count
~ArrowBatchTask.final_output_abspath
~ArrowBatchTask.finish_time
~ArrowBatchTask.gpu_limit
~ArrowBatchTask.id
~ArrowBatchTask.input_datasets
~ArrowBatchTask.input_deps
~ArrowBatchTask.input_udfs
~ArrowBatchTask.input_view_index
~ArrowBatchTask.key
~ArrowBatchTask.local_gpu
~ArrowBatchTask.local_gpu_ranks
~ArrowBatchTask.local_rank
~ArrowBatchTask.location
~ArrowBatchTask.max_batch_size
~ArrowBatchTask.memory_limit
~ArrowBatchTask.memory_overcommit_ratio
~ArrowBatchTask.node_id
~ArrowBatchTask.numa_node
~ArrowBatchTask.numpy_random_gen
~ArrowBatchTask.output
~ArrowBatchTask.output_deps
~ArrowBatchTask.output_dirname
~ArrowBatchTask.output_filename
~ArrowBatchTask.output_name
~ArrowBatchTask.output_root
~ArrowBatchTask.partition_dims
~ArrowBatchTask.partition_infos
~ArrowBatchTask.partition_infos_as_dict
~ArrowBatchTask.perf_metrics
~ArrowBatchTask.perf_profile
~ArrowBatchTask.python_random_gen
~ArrowBatchTask.query_udfs
~ArrowBatchTask.rand_seed_float
~ArrowBatchTask.rand_seed_uint32
~ArrowBatchTask.random_seed_bytes
~ArrowBatchTask.ray_dataset_path
~ArrowBatchTask.ray_marker_path
~ArrowBatchTask.retry_count
~ArrowBatchTask.runtime_id
~ArrowBatchTask.runtime_output_abspath
~ArrowBatchTask.runtime_state
~ArrowBatchTask.sched_epoch
~ArrowBatchTask.self_contained_output
~ArrowBatchTask.skip_when_any_input_empty
~ArrowBatchTask.staging_root
~ArrowBatchTask.start_time
~ArrowBatchTask.status
~ArrowBatchTask.temp_abspath
~ArrowBatchTask.temp_output
~ArrowBatchTask.udfs
~ArrowBatchTask.uniform_failure_prob

View File

@@ -0,0 +1,127 @@
smallpond.execution.task.ArrowComputeTask
=========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: ArrowComputeTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ArrowComputeTask.__init__
~ArrowComputeTask.add_elapsed_time
~ArrowComputeTask.adjust_row_group_size
~ArrowComputeTask.clean_complex_attrs
~ArrowComputeTask.clean_output
~ArrowComputeTask.cleanup
~ArrowComputeTask.compute_avg_row_size
~ArrowComputeTask.create_input_views
~ArrowComputeTask.dump
~ArrowComputeTask.dump_output
~ArrowComputeTask.exec
~ArrowComputeTask.exec_query
~ArrowComputeTask.finalize
~ArrowComputeTask.get_partition_info
~ArrowComputeTask.initialize
~ArrowComputeTask.inject_fault
~ArrowComputeTask.merge_metrics
~ArrowComputeTask.oom
~ArrowComputeTask.parquet_kv_metadata_bytes
~ArrowComputeTask.parquet_kv_metadata_str
~ArrowComputeTask.prepare_connection
~ArrowComputeTask.process
~ArrowComputeTask.random_float
~ArrowComputeTask.random_uint32
~ArrowComputeTask.run
~ArrowComputeTask.run_on_ray
~ArrowComputeTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~ArrowComputeTask.process_func
~ArrowComputeTask.parquet_row_group_size
~ArrowComputeTask.parquet_row_group_bytes
~ArrowComputeTask.parquet_dictionary_encoding
~ArrowComputeTask.use_duckdb_reader
~ArrowComputeTask.allow_speculative_exec
~ArrowComputeTask.any_input_empty
~ArrowComputeTask.compression_level_str
~ArrowComputeTask.compression_options
~ArrowComputeTask.compression_type_str
~ArrowComputeTask.cpu_limit
~ArrowComputeTask.cpu_overcommit_ratio
~ArrowComputeTask.ctx
~ArrowComputeTask.dataset
~ArrowComputeTask.default_output_name
~ArrowComputeTask.elapsed_time
~ArrowComputeTask.enable_temp_directory
~ArrowComputeTask.exception
~ArrowComputeTask.exec_cq
~ArrowComputeTask.exec_id
~ArrowComputeTask.exec_on_scheduler
~ArrowComputeTask.fail_count
~ArrowComputeTask.final_output_abspath
~ArrowComputeTask.finish_time
~ArrowComputeTask.gpu_limit
~ArrowComputeTask.id
~ArrowComputeTask.input_datasets
~ArrowComputeTask.input_deps
~ArrowComputeTask.input_udfs
~ArrowComputeTask.input_view_index
~ArrowComputeTask.key
~ArrowComputeTask.local_gpu
~ArrowComputeTask.local_gpu_ranks
~ArrowComputeTask.local_rank
~ArrowComputeTask.location
~ArrowComputeTask.memory_limit
~ArrowComputeTask.memory_overcommit_ratio
~ArrowComputeTask.node_id
~ArrowComputeTask.numa_node
~ArrowComputeTask.numpy_random_gen
~ArrowComputeTask.output
~ArrowComputeTask.output_deps
~ArrowComputeTask.output_dirname
~ArrowComputeTask.output_filename
~ArrowComputeTask.output_name
~ArrowComputeTask.output_root
~ArrowComputeTask.parquet_compression
~ArrowComputeTask.parquet_compression_level
~ArrowComputeTask.partition_dims
~ArrowComputeTask.partition_infos
~ArrowComputeTask.partition_infos_as_dict
~ArrowComputeTask.perf_metrics
~ArrowComputeTask.perf_profile
~ArrowComputeTask.python_random_gen
~ArrowComputeTask.query_udfs
~ArrowComputeTask.rand_seed_float
~ArrowComputeTask.rand_seed_uint32
~ArrowComputeTask.random_seed_bytes
~ArrowComputeTask.ray_dataset_path
~ArrowComputeTask.ray_marker_path
~ArrowComputeTask.retry_count
~ArrowComputeTask.runtime_id
~ArrowComputeTask.runtime_output_abspath
~ArrowComputeTask.runtime_state
~ArrowComputeTask.sched_epoch
~ArrowComputeTask.self_contained_output
~ArrowComputeTask.skip_when_any_input_empty
~ArrowComputeTask.staging_root
~ArrowComputeTask.start_time
~ArrowComputeTask.status
~ArrowComputeTask.temp_abspath
~ArrowComputeTask.temp_output
~ArrowComputeTask.udfs
~ArrowComputeTask.uniform_failure_prob

View File

@@ -0,0 +1,132 @@
smallpond.execution.task.ArrowStreamTask
========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: ArrowStreamTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ArrowStreamTask.__init__
~ArrowStreamTask.add_elapsed_time
~ArrowStreamTask.adjust_row_group_size
~ArrowStreamTask.clean_complex_attrs
~ArrowStreamTask.clean_output
~ArrowStreamTask.cleanup
~ArrowStreamTask.compute_avg_row_size
~ArrowStreamTask.create_input_views
~ArrowStreamTask.dump
~ArrowStreamTask.dump_output
~ArrowStreamTask.exec
~ArrowStreamTask.exec_query
~ArrowStreamTask.finalize
~ArrowStreamTask.get_partition_info
~ArrowStreamTask.initialize
~ArrowStreamTask.inject_fault
~ArrowStreamTask.merge_metrics
~ArrowStreamTask.oom
~ArrowStreamTask.parquet_kv_metadata_bytes
~ArrowStreamTask.parquet_kv_metadata_str
~ArrowStreamTask.prepare_connection
~ArrowStreamTask.process
~ArrowStreamTask.random_float
~ArrowStreamTask.random_uint32
~ArrowStreamTask.restore_input_state
~ArrowStreamTask.run
~ArrowStreamTask.run_on_ray
~ArrowStreamTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~ArrowStreamTask.process_func
~ArrowStreamTask.background_io_thread
~ArrowStreamTask.streaming_batch_size
~ArrowStreamTask.streaming_batch_count
~ArrowStreamTask.parquet_row_group_size
~ArrowStreamTask.parquet_row_group_bytes
~ArrowStreamTask.parquet_dictionary_encoding
~ArrowStreamTask.parquet_compression
~ArrowStreamTask.parquet_compression_level
~ArrowStreamTask.secs_checkpoint_interval
~ArrowStreamTask.allow_speculative_exec
~ArrowStreamTask.any_input_empty
~ArrowStreamTask.compression_level_str
~ArrowStreamTask.compression_options
~ArrowStreamTask.compression_type_str
~ArrowStreamTask.cpu_limit
~ArrowStreamTask.cpu_overcommit_ratio
~ArrowStreamTask.ctx
~ArrowStreamTask.dataset
~ArrowStreamTask.default_output_name
~ArrowStreamTask.elapsed_time
~ArrowStreamTask.enable_temp_directory
~ArrowStreamTask.exception
~ArrowStreamTask.exec_cq
~ArrowStreamTask.exec_id
~ArrowStreamTask.exec_on_scheduler
~ArrowStreamTask.fail_count
~ArrowStreamTask.final_output_abspath
~ArrowStreamTask.finish_time
~ArrowStreamTask.gpu_limit
~ArrowStreamTask.id
~ArrowStreamTask.input_datasets
~ArrowStreamTask.input_deps
~ArrowStreamTask.input_udfs
~ArrowStreamTask.input_view_index
~ArrowStreamTask.key
~ArrowStreamTask.local_gpu
~ArrowStreamTask.local_gpu_ranks
~ArrowStreamTask.local_rank
~ArrowStreamTask.location
~ArrowStreamTask.max_batch_size
~ArrowStreamTask.memory_limit
~ArrowStreamTask.memory_overcommit_ratio
~ArrowStreamTask.node_id
~ArrowStreamTask.numa_node
~ArrowStreamTask.numpy_random_gen
~ArrowStreamTask.output
~ArrowStreamTask.output_deps
~ArrowStreamTask.output_dirname
~ArrowStreamTask.output_filename
~ArrowStreamTask.output_name
~ArrowStreamTask.output_root
~ArrowStreamTask.partition_dims
~ArrowStreamTask.partition_infos
~ArrowStreamTask.partition_infos_as_dict
~ArrowStreamTask.perf_metrics
~ArrowStreamTask.perf_profile
~ArrowStreamTask.python_random_gen
~ArrowStreamTask.query_udfs
~ArrowStreamTask.rand_seed_float
~ArrowStreamTask.rand_seed_uint32
~ArrowStreamTask.random_seed_bytes
~ArrowStreamTask.ray_dataset_path
~ArrowStreamTask.ray_marker_path
~ArrowStreamTask.retry_count
~ArrowStreamTask.runtime_id
~ArrowStreamTask.runtime_output_abspath
~ArrowStreamTask.runtime_state
~ArrowStreamTask.sched_epoch
~ArrowStreamTask.self_contained_output
~ArrowStreamTask.skip_when_any_input_empty
~ArrowStreamTask.staging_root
~ArrowStreamTask.start_time
~ArrowStreamTask.status
~ArrowStreamTask.temp_abspath
~ArrowStreamTask.temp_output
~ArrowStreamTask.udfs
~ArrowStreamTask.uniform_failure_prob

View File

@@ -0,0 +1,107 @@
smallpond.execution.task.DataSinkTask
=====================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: DataSinkTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~DataSinkTask.__init__
~DataSinkTask.add_elapsed_time
~DataSinkTask.adjust_row_group_size
~DataSinkTask.clean_complex_attrs
~DataSinkTask.clean_output
~DataSinkTask.cleanup
~DataSinkTask.collect_output_files
~DataSinkTask.compute_avg_row_size
~DataSinkTask.dump
~DataSinkTask.exec
~DataSinkTask.finalize
~DataSinkTask.get_partition_info
~DataSinkTask.initialize
~DataSinkTask.inject_fault
~DataSinkTask.merge_metrics
~DataSinkTask.oom
~DataSinkTask.parquet_kv_metadata_bytes
~DataSinkTask.parquet_kv_metadata_str
~DataSinkTask.random_float
~DataSinkTask.random_uint32
~DataSinkTask.run
~DataSinkTask.run_on_ray
~DataSinkTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~DataSinkTask.type
~DataSinkTask.is_final_node
~DataSinkTask.allow_speculative_exec
~DataSinkTask.any_input_empty
~DataSinkTask.cpu_limit
~DataSinkTask.ctx
~DataSinkTask.dataset
~DataSinkTask.default_output_name
~DataSinkTask.elapsed_time
~DataSinkTask.exception
~DataSinkTask.exec_cq
~DataSinkTask.exec_id
~DataSinkTask.exec_on_scheduler
~DataSinkTask.fail_count
~DataSinkTask.final_output_abspath
~DataSinkTask.finish_time
~DataSinkTask.gpu_limit
~DataSinkTask.id
~DataSinkTask.input_datasets
~DataSinkTask.input_deps
~DataSinkTask.key
~DataSinkTask.local_gpu
~DataSinkTask.local_gpu_ranks
~DataSinkTask.local_rank
~DataSinkTask.location
~DataSinkTask.manifest_filename
~DataSinkTask.memory_limit
~DataSinkTask.node_id
~DataSinkTask.numa_node
~DataSinkTask.numpy_random_gen
~DataSinkTask.output
~DataSinkTask.output_deps
~DataSinkTask.output_dirname
~DataSinkTask.output_filename
~DataSinkTask.output_name
~DataSinkTask.output_root
~DataSinkTask.partition_dims
~DataSinkTask.partition_infos
~DataSinkTask.partition_infos_as_dict
~DataSinkTask.perf_metrics
~DataSinkTask.perf_profile
~DataSinkTask.python_random_gen
~DataSinkTask.random_seed_bytes
~DataSinkTask.ray_dataset_path
~DataSinkTask.ray_marker_path
~DataSinkTask.retry_count
~DataSinkTask.runtime_id
~DataSinkTask.runtime_output_abspath
~DataSinkTask.runtime_state
~DataSinkTask.sched_epoch
~DataSinkTask.self_contained_output
~DataSinkTask.skip_when_any_input_empty
~DataSinkTask.staging_root
~DataSinkTask.start_time
~DataSinkTask.status
~DataSinkTask.temp_abspath
~DataSinkTask.temp_output
~DataSinkTask.uniform_failure_prob

View File

@@ -0,0 +1,103 @@
smallpond.execution.task.DataSourceTask
=======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: DataSourceTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~DataSourceTask.__init__
~DataSourceTask.add_elapsed_time
~DataSourceTask.adjust_row_group_size
~DataSourceTask.clean_complex_attrs
~DataSourceTask.clean_output
~DataSourceTask.cleanup
~DataSourceTask.compute_avg_row_size
~DataSourceTask.dump
~DataSourceTask.exec
~DataSourceTask.finalize
~DataSourceTask.get_partition_info
~DataSourceTask.initialize
~DataSourceTask.inject_fault
~DataSourceTask.merge_metrics
~DataSourceTask.oom
~DataSourceTask.parquet_kv_metadata_bytes
~DataSourceTask.parquet_kv_metadata_str
~DataSourceTask.random_float
~DataSourceTask.random_uint32
~DataSourceTask.run
~DataSourceTask.run_on_ray
~DataSourceTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~DataSourceTask.ctx
~DataSourceTask.id
~DataSourceTask.node_id
~DataSourceTask.sched_epoch
~DataSourceTask.output_name
~DataSourceTask.output_root
~DataSourceTask.dataset
~DataSourceTask.input_deps
~DataSourceTask.output_deps
~DataSourceTask.perf_metrics
~DataSourceTask.perf_profile
~DataSourceTask.runtime_state
~DataSourceTask.input_datasets
~DataSourceTask.allow_speculative_exec
~DataSourceTask.any_input_empty
~DataSourceTask.cpu_limit
~DataSourceTask.default_output_name
~DataSourceTask.elapsed_time
~DataSourceTask.exception
~DataSourceTask.exec_cq
~DataSourceTask.exec_id
~DataSourceTask.exec_on_scheduler
~DataSourceTask.fail_count
~DataSourceTask.final_output_abspath
~DataSourceTask.finish_time
~DataSourceTask.gpu_limit
~DataSourceTask.key
~DataSourceTask.local_gpu
~DataSourceTask.local_gpu_ranks
~DataSourceTask.local_rank
~DataSourceTask.location
~DataSourceTask.memory_limit
~DataSourceTask.numa_node
~DataSourceTask.numpy_random_gen
~DataSourceTask.output
~DataSourceTask.output_dirname
~DataSourceTask.output_filename
~DataSourceTask.partition_dims
~DataSourceTask.partition_infos
~DataSourceTask.partition_infos_as_dict
~DataSourceTask.python_random_gen
~DataSourceTask.random_seed_bytes
~DataSourceTask.ray_dataset_path
~DataSourceTask.ray_marker_path
~DataSourceTask.retry_count
~DataSourceTask.runtime_id
~DataSourceTask.runtime_output_abspath
~DataSourceTask.self_contained_output
~DataSourceTask.skip_when_any_input_empty
~DataSourceTask.staging_root
~DataSourceTask.start_time
~DataSourceTask.status
~DataSourceTask.temp_abspath
~DataSourceTask.temp_output
~DataSourceTask.uniform_failure_prob

View File

@@ -0,0 +1,108 @@
smallpond.execution.task.EvenlyDistributedPartitionProducerTask
===============================================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: EvenlyDistributedPartitionProducerTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~EvenlyDistributedPartitionProducerTask.__init__
~EvenlyDistributedPartitionProducerTask.add_elapsed_time
~EvenlyDistributedPartitionProducerTask.adjust_row_group_size
~EvenlyDistributedPartitionProducerTask.clean_complex_attrs
~EvenlyDistributedPartitionProducerTask.clean_output
~EvenlyDistributedPartitionProducerTask.cleanup
~EvenlyDistributedPartitionProducerTask.compute_avg_row_size
~EvenlyDistributedPartitionProducerTask.dump
~EvenlyDistributedPartitionProducerTask.exec
~EvenlyDistributedPartitionProducerTask.finalize
~EvenlyDistributedPartitionProducerTask.get_partition_info
~EvenlyDistributedPartitionProducerTask.initialize
~EvenlyDistributedPartitionProducerTask.inject_fault
~EvenlyDistributedPartitionProducerTask.merge_metrics
~EvenlyDistributedPartitionProducerTask.oom
~EvenlyDistributedPartitionProducerTask.parquet_kv_metadata_bytes
~EvenlyDistributedPartitionProducerTask.parquet_kv_metadata_str
~EvenlyDistributedPartitionProducerTask.random_float
~EvenlyDistributedPartitionProducerTask.random_uint32
~EvenlyDistributedPartitionProducerTask.run
~EvenlyDistributedPartitionProducerTask.run_on_ray
~EvenlyDistributedPartitionProducerTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~EvenlyDistributedPartitionProducerTask.partition_by_rows
~EvenlyDistributedPartitionProducerTask.random_shuffle
~EvenlyDistributedPartitionProducerTask.allow_speculative_exec
~EvenlyDistributedPartitionProducerTask.any_input_empty
~EvenlyDistributedPartitionProducerTask.cpu_limit
~EvenlyDistributedPartitionProducerTask.ctx
~EvenlyDistributedPartitionProducerTask.dataset
~EvenlyDistributedPartitionProducerTask.default_output_name
~EvenlyDistributedPartitionProducerTask.dimension
~EvenlyDistributedPartitionProducerTask.elapsed_time
~EvenlyDistributedPartitionProducerTask.exception
~EvenlyDistributedPartitionProducerTask.exec_cq
~EvenlyDistributedPartitionProducerTask.exec_id
~EvenlyDistributedPartitionProducerTask.exec_on_scheduler
~EvenlyDistributedPartitionProducerTask.fail_count
~EvenlyDistributedPartitionProducerTask.final_output_abspath
~EvenlyDistributedPartitionProducerTask.finish_time
~EvenlyDistributedPartitionProducerTask.gpu_limit
~EvenlyDistributedPartitionProducerTask.id
~EvenlyDistributedPartitionProducerTask.input_datasets
~EvenlyDistributedPartitionProducerTask.input_deps
~EvenlyDistributedPartitionProducerTask.key
~EvenlyDistributedPartitionProducerTask.local_gpu
~EvenlyDistributedPartitionProducerTask.local_gpu_ranks
~EvenlyDistributedPartitionProducerTask.local_rank
~EvenlyDistributedPartitionProducerTask.location
~EvenlyDistributedPartitionProducerTask.memory_limit
~EvenlyDistributedPartitionProducerTask.node_id
~EvenlyDistributedPartitionProducerTask.npartitions
~EvenlyDistributedPartitionProducerTask.numa_node
~EvenlyDistributedPartitionProducerTask.numpy_random_gen
~EvenlyDistributedPartitionProducerTask.output
~EvenlyDistributedPartitionProducerTask.output_deps
~EvenlyDistributedPartitionProducerTask.output_dirname
~EvenlyDistributedPartitionProducerTask.output_filename
~EvenlyDistributedPartitionProducerTask.output_name
~EvenlyDistributedPartitionProducerTask.output_root
~EvenlyDistributedPartitionProducerTask.partition_dims
~EvenlyDistributedPartitionProducerTask.partition_infos
~EvenlyDistributedPartitionProducerTask.partition_infos_as_dict
~EvenlyDistributedPartitionProducerTask.partitioned_datasets
~EvenlyDistributedPartitionProducerTask.perf_metrics
~EvenlyDistributedPartitionProducerTask.perf_profile
~EvenlyDistributedPartitionProducerTask.python_random_gen
~EvenlyDistributedPartitionProducerTask.random_seed_bytes
~EvenlyDistributedPartitionProducerTask.ray_dataset_path
~EvenlyDistributedPartitionProducerTask.ray_marker_path
~EvenlyDistributedPartitionProducerTask.retry_count
~EvenlyDistributedPartitionProducerTask.runtime_id
~EvenlyDistributedPartitionProducerTask.runtime_output_abspath
~EvenlyDistributedPartitionProducerTask.runtime_state
~EvenlyDistributedPartitionProducerTask.sched_epoch
~EvenlyDistributedPartitionProducerTask.self_contained_output
~EvenlyDistributedPartitionProducerTask.skip_when_any_input_empty
~EvenlyDistributedPartitionProducerTask.staging_root
~EvenlyDistributedPartitionProducerTask.start_time
~EvenlyDistributedPartitionProducerTask.status
~EvenlyDistributedPartitionProducerTask.temp_abspath
~EvenlyDistributedPartitionProducerTask.temp_output
~EvenlyDistributedPartitionProducerTask.uniform_failure_prob

View File

@@ -0,0 +1,36 @@
smallpond.execution.task.ExecutionPlan
======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: ExecutionPlan
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ExecutionPlan.__init__
~ExecutionPlan.get_output
~ExecutionPlan.iter_tasks
.. rubric:: Attributes
.. autosummary::
~ExecutionPlan.analyzed_logical_plan
~ExecutionPlan.final_output
~ExecutionPlan.final_output_path
~ExecutionPlan.leaves
~ExecutionPlan.named_outputs
~ExecutionPlan.successful
~ExecutionPlan.tasks

View File

@@ -0,0 +1,125 @@
smallpond.execution.task.HashPartitionArrowTask
===============================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: HashPartitionArrowTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~HashPartitionArrowTask.__init__
~HashPartitionArrowTask.add_elapsed_time
~HashPartitionArrowTask.adjust_row_group_size
~HashPartitionArrowTask.clean_complex_attrs
~HashPartitionArrowTask.clean_output
~HashPartitionArrowTask.cleanup
~HashPartitionArrowTask.compute_avg_row_size
~HashPartitionArrowTask.create
~HashPartitionArrowTask.dump
~HashPartitionArrowTask.exec
~HashPartitionArrowTask.finalize
~HashPartitionArrowTask.get_partition_info
~HashPartitionArrowTask.initialize
~HashPartitionArrowTask.inject_fault
~HashPartitionArrowTask.merge_metrics
~HashPartitionArrowTask.oom
~HashPartitionArrowTask.parquet_kv_metadata_bytes
~HashPartitionArrowTask.parquet_kv_metadata_str
~HashPartitionArrowTask.partition
~HashPartitionArrowTask.random_float
~HashPartitionArrowTask.random_uint32
~HashPartitionArrowTask.run
~HashPartitionArrowTask.run_on_ray
~HashPartitionArrowTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~HashPartitionArrowTask.hash_columns
~HashPartitionArrowTask.data_partition_column
~HashPartitionArrowTask.random_shuffle
~HashPartitionArrowTask.shuffle_only
~HashPartitionArrowTask.drop_partition_column
~HashPartitionArrowTask.use_parquet_writer
~HashPartitionArrowTask.hive_partitioning
~HashPartitionArrowTask.parquet_row_group_size
~HashPartitionArrowTask.parquet_row_group_bytes
~HashPartitionArrowTask.parquet_dictionary_encoding
~HashPartitionArrowTask.parquet_compression
~HashPartitionArrowTask.parquet_compression_level
~HashPartitionArrowTask.partitioned_datasets
~HashPartitionArrowTask.allow_speculative_exec
~HashPartitionArrowTask.any_input_empty
~HashPartitionArrowTask.cpu_limit
~HashPartitionArrowTask.ctx
~HashPartitionArrowTask.dataset
~HashPartitionArrowTask.default_output_name
~HashPartitionArrowTask.dimension
~HashPartitionArrowTask.elapsed_time
~HashPartitionArrowTask.exception
~HashPartitionArrowTask.exec_cq
~HashPartitionArrowTask.exec_id
~HashPartitionArrowTask.exec_on_scheduler
~HashPartitionArrowTask.fail_count
~HashPartitionArrowTask.final_output_abspath
~HashPartitionArrowTask.finish_time
~HashPartitionArrowTask.fixed_rand_seeds
~HashPartitionArrowTask.gpu_limit
~HashPartitionArrowTask.id
~HashPartitionArrowTask.input_datasets
~HashPartitionArrowTask.input_deps
~HashPartitionArrowTask.io_workers
~HashPartitionArrowTask.key
~HashPartitionArrowTask.local_gpu
~HashPartitionArrowTask.local_gpu_ranks
~HashPartitionArrowTask.local_rank
~HashPartitionArrowTask.location
~HashPartitionArrowTask.max_batch_size
~HashPartitionArrowTask.memory_limit
~HashPartitionArrowTask.node_id
~HashPartitionArrowTask.npartitions
~HashPartitionArrowTask.num_workers
~HashPartitionArrowTask.numa_node
~HashPartitionArrowTask.numpy_random_gen
~HashPartitionArrowTask.output
~HashPartitionArrowTask.output_deps
~HashPartitionArrowTask.output_dirname
~HashPartitionArrowTask.output_filename
~HashPartitionArrowTask.output_name
~HashPartitionArrowTask.output_root
~HashPartitionArrowTask.partition_dims
~HashPartitionArrowTask.partition_infos
~HashPartitionArrowTask.partition_infos_as_dict
~HashPartitionArrowTask.perf_metrics
~HashPartitionArrowTask.perf_profile
~HashPartitionArrowTask.python_random_gen
~HashPartitionArrowTask.random_seed_bytes
~HashPartitionArrowTask.ray_dataset_path
~HashPartitionArrowTask.ray_marker_path
~HashPartitionArrowTask.retry_count
~HashPartitionArrowTask.runtime_id
~HashPartitionArrowTask.runtime_output_abspath
~HashPartitionArrowTask.runtime_state
~HashPartitionArrowTask.sched_epoch
~HashPartitionArrowTask.self_contained_output
~HashPartitionArrowTask.skip_when_any_input_empty
~HashPartitionArrowTask.staging_root
~HashPartitionArrowTask.start_time
~HashPartitionArrowTask.status
~HashPartitionArrowTask.temp_abspath
~HashPartitionArrowTask.temp_output
~HashPartitionArrowTask.uniform_failure_prob
~HashPartitionArrowTask.write_buffer_size

View File

@@ -0,0 +1,143 @@
smallpond.execution.task.HashPartitionDuckDbTask
================================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: HashPartitionDuckDbTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~HashPartitionDuckDbTask.__init__
~HashPartitionDuckDbTask.add_elapsed_time
~HashPartitionDuckDbTask.adjust_row_group_size
~HashPartitionDuckDbTask.clean_complex_attrs
~HashPartitionDuckDbTask.clean_output
~HashPartitionDuckDbTask.cleanup
~HashPartitionDuckDbTask.compute_avg_row_size
~HashPartitionDuckDbTask.create
~HashPartitionDuckDbTask.create_input_views
~HashPartitionDuckDbTask.dump
~HashPartitionDuckDbTask.exec
~HashPartitionDuckDbTask.exec_query
~HashPartitionDuckDbTask.finalize
~HashPartitionDuckDbTask.get_partition_info
~HashPartitionDuckDbTask.initialize
~HashPartitionDuckDbTask.inject_fault
~HashPartitionDuckDbTask.load_input_batch
~HashPartitionDuckDbTask.merge_metrics
~HashPartitionDuckDbTask.oom
~HashPartitionDuckDbTask.parquet_kv_metadata_bytes
~HashPartitionDuckDbTask.parquet_kv_metadata_str
~HashPartitionDuckDbTask.partition
~HashPartitionDuckDbTask.prepare_connection
~HashPartitionDuckDbTask.random_float
~HashPartitionDuckDbTask.random_uint32
~HashPartitionDuckDbTask.run
~HashPartitionDuckDbTask.run_on_ray
~HashPartitionDuckDbTask.set_memory_limit
~HashPartitionDuckDbTask.write_flat_partitions
~HashPartitionDuckDbTask.write_hive_partitions
.. rubric:: Attributes
.. autosummary::
~HashPartitionDuckDbTask.hash_columns
~HashPartitionDuckDbTask.data_partition_column
~HashPartitionDuckDbTask.random_shuffle
~HashPartitionDuckDbTask.shuffle_only
~HashPartitionDuckDbTask.drop_partition_column
~HashPartitionDuckDbTask.use_parquet_writer
~HashPartitionDuckDbTask.hive_partitioning
~HashPartitionDuckDbTask.parquet_row_group_size
~HashPartitionDuckDbTask.parquet_row_group_bytes
~HashPartitionDuckDbTask.parquet_dictionary_encoding
~HashPartitionDuckDbTask.parquet_compression
~HashPartitionDuckDbTask.parquet_compression_level
~HashPartitionDuckDbTask.partitioned_datasets
~HashPartitionDuckDbTask.allow_speculative_exec
~HashPartitionDuckDbTask.any_input_empty
~HashPartitionDuckDbTask.compression_level_str
~HashPartitionDuckDbTask.compression_options
~HashPartitionDuckDbTask.compression_type_str
~HashPartitionDuckDbTask.cpu_limit
~HashPartitionDuckDbTask.cpu_overcommit_ratio
~HashPartitionDuckDbTask.ctx
~HashPartitionDuckDbTask.dataset
~HashPartitionDuckDbTask.default_output_name
~HashPartitionDuckDbTask.dimension
~HashPartitionDuckDbTask.elapsed_time
~HashPartitionDuckDbTask.enable_temp_directory
~HashPartitionDuckDbTask.exception
~HashPartitionDuckDbTask.exec_cq
~HashPartitionDuckDbTask.exec_id
~HashPartitionDuckDbTask.exec_on_scheduler
~HashPartitionDuckDbTask.fail_count
~HashPartitionDuckDbTask.final_output_abspath
~HashPartitionDuckDbTask.finish_time
~HashPartitionDuckDbTask.gpu_limit
~HashPartitionDuckDbTask.id
~HashPartitionDuckDbTask.input_datasets
~HashPartitionDuckDbTask.input_deps
~HashPartitionDuckDbTask.input_udfs
~HashPartitionDuckDbTask.input_view_index
~HashPartitionDuckDbTask.io_workers
~HashPartitionDuckDbTask.key
~HashPartitionDuckDbTask.local_gpu
~HashPartitionDuckDbTask.local_gpu_ranks
~HashPartitionDuckDbTask.local_rank
~HashPartitionDuckDbTask.location
~HashPartitionDuckDbTask.max_batch_size
~HashPartitionDuckDbTask.memory_limit
~HashPartitionDuckDbTask.memory_overcommit_ratio
~HashPartitionDuckDbTask.node_id
~HashPartitionDuckDbTask.npartitions
~HashPartitionDuckDbTask.num_workers
~HashPartitionDuckDbTask.numa_node
~HashPartitionDuckDbTask.numpy_random_gen
~HashPartitionDuckDbTask.output
~HashPartitionDuckDbTask.output_deps
~HashPartitionDuckDbTask.output_dirname
~HashPartitionDuckDbTask.output_filename
~HashPartitionDuckDbTask.output_name
~HashPartitionDuckDbTask.output_root
~HashPartitionDuckDbTask.partition_dims
~HashPartitionDuckDbTask.partition_infos
~HashPartitionDuckDbTask.partition_infos_as_dict
~HashPartitionDuckDbTask.partition_query
~HashPartitionDuckDbTask.perf_metrics
~HashPartitionDuckDbTask.perf_profile
~HashPartitionDuckDbTask.python_random_gen
~HashPartitionDuckDbTask.query_udfs
~HashPartitionDuckDbTask.rand_seed_float
~HashPartitionDuckDbTask.rand_seed_uint32
~HashPartitionDuckDbTask.random_seed_bytes
~HashPartitionDuckDbTask.ray_dataset_path
~HashPartitionDuckDbTask.ray_marker_path
~HashPartitionDuckDbTask.retry_count
~HashPartitionDuckDbTask.runtime_id
~HashPartitionDuckDbTask.runtime_output_abspath
~HashPartitionDuckDbTask.runtime_state
~HashPartitionDuckDbTask.sched_epoch
~HashPartitionDuckDbTask.self_contained_output
~HashPartitionDuckDbTask.skip_when_any_input_empty
~HashPartitionDuckDbTask.staging_root
~HashPartitionDuckDbTask.start_time
~HashPartitionDuckDbTask.status
~HashPartitionDuckDbTask.temp_abspath
~HashPartitionDuckDbTask.temp_output
~HashPartitionDuckDbTask.udfs
~HashPartitionDuckDbTask.uniform_failure_prob
~HashPartitionDuckDbTask.write_buffer_size

View File

@@ -0,0 +1,124 @@
smallpond.execution.task.HashPartitionTask
==========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: HashPartitionTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~HashPartitionTask.__init__
~HashPartitionTask.add_elapsed_time
~HashPartitionTask.adjust_row_group_size
~HashPartitionTask.clean_complex_attrs
~HashPartitionTask.clean_output
~HashPartitionTask.cleanup
~HashPartitionTask.compute_avg_row_size
~HashPartitionTask.create
~HashPartitionTask.dump
~HashPartitionTask.exec
~HashPartitionTask.finalize
~HashPartitionTask.get_partition_info
~HashPartitionTask.initialize
~HashPartitionTask.inject_fault
~HashPartitionTask.merge_metrics
~HashPartitionTask.oom
~HashPartitionTask.parquet_kv_metadata_bytes
~HashPartitionTask.parquet_kv_metadata_str
~HashPartitionTask.partition
~HashPartitionTask.random_float
~HashPartitionTask.random_uint32
~HashPartitionTask.run
~HashPartitionTask.run_on_ray
~HashPartitionTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~HashPartitionTask.hash_columns
~HashPartitionTask.data_partition_column
~HashPartitionTask.random_shuffle
~HashPartitionTask.shuffle_only
~HashPartitionTask.drop_partition_column
~HashPartitionTask.use_parquet_writer
~HashPartitionTask.hive_partitioning
~HashPartitionTask.parquet_row_group_size
~HashPartitionTask.parquet_row_group_bytes
~HashPartitionTask.parquet_dictionary_encoding
~HashPartitionTask.parquet_compression
~HashPartitionTask.parquet_compression_level
~HashPartitionTask.partitioned_datasets
~HashPartitionTask.allow_speculative_exec
~HashPartitionTask.any_input_empty
~HashPartitionTask.cpu_limit
~HashPartitionTask.ctx
~HashPartitionTask.dataset
~HashPartitionTask.default_output_name
~HashPartitionTask.dimension
~HashPartitionTask.elapsed_time
~HashPartitionTask.exception
~HashPartitionTask.exec_cq
~HashPartitionTask.exec_id
~HashPartitionTask.exec_on_scheduler
~HashPartitionTask.fail_count
~HashPartitionTask.final_output_abspath
~HashPartitionTask.finish_time
~HashPartitionTask.gpu_limit
~HashPartitionTask.id
~HashPartitionTask.input_datasets
~HashPartitionTask.input_deps
~HashPartitionTask.io_workers
~HashPartitionTask.key
~HashPartitionTask.local_gpu
~HashPartitionTask.local_gpu_ranks
~HashPartitionTask.local_rank
~HashPartitionTask.location
~HashPartitionTask.max_batch_size
~HashPartitionTask.memory_limit
~HashPartitionTask.node_id
~HashPartitionTask.npartitions
~HashPartitionTask.num_workers
~HashPartitionTask.numa_node
~HashPartitionTask.numpy_random_gen
~HashPartitionTask.output
~HashPartitionTask.output_deps
~HashPartitionTask.output_dirname
~HashPartitionTask.output_filename
~HashPartitionTask.output_name
~HashPartitionTask.output_root
~HashPartitionTask.partition_dims
~HashPartitionTask.partition_infos
~HashPartitionTask.partition_infos_as_dict
~HashPartitionTask.perf_metrics
~HashPartitionTask.perf_profile
~HashPartitionTask.python_random_gen
~HashPartitionTask.random_seed_bytes
~HashPartitionTask.ray_dataset_path
~HashPartitionTask.ray_marker_path
~HashPartitionTask.retry_count
~HashPartitionTask.runtime_id
~HashPartitionTask.runtime_output_abspath
~HashPartitionTask.runtime_state
~HashPartitionTask.sched_epoch
~HashPartitionTask.self_contained_output
~HashPartitionTask.skip_when_any_input_empty
~HashPartitionTask.staging_root
~HashPartitionTask.start_time
~HashPartitionTask.status
~HashPartitionTask.temp_abspath
~HashPartitionTask.temp_output
~HashPartitionTask.uniform_failure_prob
~HashPartitionTask.write_buffer_size

View File

@@ -0,0 +1,45 @@
smallpond.execution.task.JobId
==============================
.. currentmodule:: smallpond.execution.task
.. autoclass:: JobId
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~JobId.__init__
~JobId.new
.. rubric:: Attributes
.. autosummary::
~JobId.int
~JobId.is_safe
~JobId.bytes
~JobId.bytes_le
~JobId.clock_seq
~JobId.clock_seq_hi_variant
~JobId.clock_seq_low
~JobId.fields
~JobId.hex
~JobId.node
~JobId.time
~JobId.time_hi_version
~JobId.time_low
~JobId.time_mid
~JobId.urn
~JobId.variant
~JobId.version

View File

@@ -0,0 +1,108 @@
smallpond.execution.task.LoadPartitionedDataSetProducerTask
===========================================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: LoadPartitionedDataSetProducerTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~LoadPartitionedDataSetProducerTask.__init__
~LoadPartitionedDataSetProducerTask.add_elapsed_time
~LoadPartitionedDataSetProducerTask.adjust_row_group_size
~LoadPartitionedDataSetProducerTask.clean_complex_attrs
~LoadPartitionedDataSetProducerTask.clean_output
~LoadPartitionedDataSetProducerTask.cleanup
~LoadPartitionedDataSetProducerTask.compute_avg_row_size
~LoadPartitionedDataSetProducerTask.dump
~LoadPartitionedDataSetProducerTask.exec
~LoadPartitionedDataSetProducerTask.finalize
~LoadPartitionedDataSetProducerTask.get_partition_info
~LoadPartitionedDataSetProducerTask.initialize
~LoadPartitionedDataSetProducerTask.inject_fault
~LoadPartitionedDataSetProducerTask.merge_metrics
~LoadPartitionedDataSetProducerTask.oom
~LoadPartitionedDataSetProducerTask.parquet_kv_metadata_bytes
~LoadPartitionedDataSetProducerTask.parquet_kv_metadata_str
~LoadPartitionedDataSetProducerTask.random_float
~LoadPartitionedDataSetProducerTask.random_uint32
~LoadPartitionedDataSetProducerTask.run
~LoadPartitionedDataSetProducerTask.run_on_ray
~LoadPartitionedDataSetProducerTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~LoadPartitionedDataSetProducerTask.data_partition_column
~LoadPartitionedDataSetProducerTask.hive_partitioning
~LoadPartitionedDataSetProducerTask.allow_speculative_exec
~LoadPartitionedDataSetProducerTask.any_input_empty
~LoadPartitionedDataSetProducerTask.cpu_limit
~LoadPartitionedDataSetProducerTask.ctx
~LoadPartitionedDataSetProducerTask.dataset
~LoadPartitionedDataSetProducerTask.default_output_name
~LoadPartitionedDataSetProducerTask.dimension
~LoadPartitionedDataSetProducerTask.elapsed_time
~LoadPartitionedDataSetProducerTask.exception
~LoadPartitionedDataSetProducerTask.exec_cq
~LoadPartitionedDataSetProducerTask.exec_id
~LoadPartitionedDataSetProducerTask.exec_on_scheduler
~LoadPartitionedDataSetProducerTask.fail_count
~LoadPartitionedDataSetProducerTask.final_output_abspath
~LoadPartitionedDataSetProducerTask.finish_time
~LoadPartitionedDataSetProducerTask.gpu_limit
~LoadPartitionedDataSetProducerTask.id
~LoadPartitionedDataSetProducerTask.input_datasets
~LoadPartitionedDataSetProducerTask.input_deps
~LoadPartitionedDataSetProducerTask.key
~LoadPartitionedDataSetProducerTask.local_gpu
~LoadPartitionedDataSetProducerTask.local_gpu_ranks
~LoadPartitionedDataSetProducerTask.local_rank
~LoadPartitionedDataSetProducerTask.location
~LoadPartitionedDataSetProducerTask.memory_limit
~LoadPartitionedDataSetProducerTask.node_id
~LoadPartitionedDataSetProducerTask.npartitions
~LoadPartitionedDataSetProducerTask.numa_node
~LoadPartitionedDataSetProducerTask.numpy_random_gen
~LoadPartitionedDataSetProducerTask.output
~LoadPartitionedDataSetProducerTask.output_deps
~LoadPartitionedDataSetProducerTask.output_dirname
~LoadPartitionedDataSetProducerTask.output_filename
~LoadPartitionedDataSetProducerTask.output_name
~LoadPartitionedDataSetProducerTask.output_root
~LoadPartitionedDataSetProducerTask.partition_dims
~LoadPartitionedDataSetProducerTask.partition_infos
~LoadPartitionedDataSetProducerTask.partition_infos_as_dict
~LoadPartitionedDataSetProducerTask.partitioned_datasets
~LoadPartitionedDataSetProducerTask.perf_metrics
~LoadPartitionedDataSetProducerTask.perf_profile
~LoadPartitionedDataSetProducerTask.python_random_gen
~LoadPartitionedDataSetProducerTask.random_seed_bytes
~LoadPartitionedDataSetProducerTask.ray_dataset_path
~LoadPartitionedDataSetProducerTask.ray_marker_path
~LoadPartitionedDataSetProducerTask.retry_count
~LoadPartitionedDataSetProducerTask.runtime_id
~LoadPartitionedDataSetProducerTask.runtime_output_abspath
~LoadPartitionedDataSetProducerTask.runtime_state
~LoadPartitionedDataSetProducerTask.sched_epoch
~LoadPartitionedDataSetProducerTask.self_contained_output
~LoadPartitionedDataSetProducerTask.skip_when_any_input_empty
~LoadPartitionedDataSetProducerTask.staging_root
~LoadPartitionedDataSetProducerTask.start_time
~LoadPartitionedDataSetProducerTask.status
~LoadPartitionedDataSetProducerTask.temp_abspath
~LoadPartitionedDataSetProducerTask.temp_output
~LoadPartitionedDataSetProducerTask.uniform_failure_prob

View File

@@ -0,0 +1,103 @@
smallpond.execution.task.MergeDataSetsTask
==========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: MergeDataSetsTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~MergeDataSetsTask.__init__
~MergeDataSetsTask.add_elapsed_time
~MergeDataSetsTask.adjust_row_group_size
~MergeDataSetsTask.clean_complex_attrs
~MergeDataSetsTask.clean_output
~MergeDataSetsTask.cleanup
~MergeDataSetsTask.compute_avg_row_size
~MergeDataSetsTask.dump
~MergeDataSetsTask.exec
~MergeDataSetsTask.finalize
~MergeDataSetsTask.get_partition_info
~MergeDataSetsTask.initialize
~MergeDataSetsTask.inject_fault
~MergeDataSetsTask.merge_metrics
~MergeDataSetsTask.oom
~MergeDataSetsTask.parquet_kv_metadata_bytes
~MergeDataSetsTask.parquet_kv_metadata_str
~MergeDataSetsTask.random_float
~MergeDataSetsTask.random_uint32
~MergeDataSetsTask.run
~MergeDataSetsTask.run_on_ray
~MergeDataSetsTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~MergeDataSetsTask.ctx
~MergeDataSetsTask.id
~MergeDataSetsTask.node_id
~MergeDataSetsTask.sched_epoch
~MergeDataSetsTask.output_name
~MergeDataSetsTask.output_root
~MergeDataSetsTask.dataset
~MergeDataSetsTask.input_deps
~MergeDataSetsTask.output_deps
~MergeDataSetsTask.perf_metrics
~MergeDataSetsTask.perf_profile
~MergeDataSetsTask.runtime_state
~MergeDataSetsTask.input_datasets
~MergeDataSetsTask.allow_speculative_exec
~MergeDataSetsTask.any_input_empty
~MergeDataSetsTask.cpu_limit
~MergeDataSetsTask.default_output_name
~MergeDataSetsTask.elapsed_time
~MergeDataSetsTask.exception
~MergeDataSetsTask.exec_cq
~MergeDataSetsTask.exec_id
~MergeDataSetsTask.exec_on_scheduler
~MergeDataSetsTask.fail_count
~MergeDataSetsTask.final_output_abspath
~MergeDataSetsTask.finish_time
~MergeDataSetsTask.gpu_limit
~MergeDataSetsTask.key
~MergeDataSetsTask.local_gpu
~MergeDataSetsTask.local_gpu_ranks
~MergeDataSetsTask.local_rank
~MergeDataSetsTask.location
~MergeDataSetsTask.memory_limit
~MergeDataSetsTask.numa_node
~MergeDataSetsTask.numpy_random_gen
~MergeDataSetsTask.output
~MergeDataSetsTask.output_dirname
~MergeDataSetsTask.output_filename
~MergeDataSetsTask.partition_dims
~MergeDataSetsTask.partition_infos
~MergeDataSetsTask.partition_infos_as_dict
~MergeDataSetsTask.python_random_gen
~MergeDataSetsTask.random_seed_bytes
~MergeDataSetsTask.ray_dataset_path
~MergeDataSetsTask.ray_marker_path
~MergeDataSetsTask.retry_count
~MergeDataSetsTask.runtime_id
~MergeDataSetsTask.runtime_output_abspath
~MergeDataSetsTask.self_contained_output
~MergeDataSetsTask.skip_when_any_input_empty
~MergeDataSetsTask.staging_root
~MergeDataSetsTask.start_time
~MergeDataSetsTask.status
~MergeDataSetsTask.temp_abspath
~MergeDataSetsTask.temp_output
~MergeDataSetsTask.uniform_failure_prob

View File

@@ -0,0 +1,132 @@
smallpond.execution.task.PandasBatchTask
========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: PandasBatchTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PandasBatchTask.__init__
~PandasBatchTask.add_elapsed_time
~PandasBatchTask.adjust_row_group_size
~PandasBatchTask.clean_complex_attrs
~PandasBatchTask.clean_output
~PandasBatchTask.cleanup
~PandasBatchTask.compute_avg_row_size
~PandasBatchTask.create_input_views
~PandasBatchTask.dump
~PandasBatchTask.dump_output
~PandasBatchTask.exec
~PandasBatchTask.exec_query
~PandasBatchTask.finalize
~PandasBatchTask.get_partition_info
~PandasBatchTask.initialize
~PandasBatchTask.inject_fault
~PandasBatchTask.merge_metrics
~PandasBatchTask.oom
~PandasBatchTask.parquet_kv_metadata_bytes
~PandasBatchTask.parquet_kv_metadata_str
~PandasBatchTask.prepare_connection
~PandasBatchTask.process
~PandasBatchTask.random_float
~PandasBatchTask.random_uint32
~PandasBatchTask.restore_input_state
~PandasBatchTask.run
~PandasBatchTask.run_on_ray
~PandasBatchTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~PandasBatchTask.process_func
~PandasBatchTask.background_io_thread
~PandasBatchTask.streaming_batch_size
~PandasBatchTask.streaming_batch_count
~PandasBatchTask.parquet_row_group_size
~PandasBatchTask.parquet_row_group_bytes
~PandasBatchTask.parquet_dictionary_encoding
~PandasBatchTask.parquet_compression
~PandasBatchTask.parquet_compression_level
~PandasBatchTask.secs_checkpoint_interval
~PandasBatchTask.allow_speculative_exec
~PandasBatchTask.any_input_empty
~PandasBatchTask.compression_level_str
~PandasBatchTask.compression_options
~PandasBatchTask.compression_type_str
~PandasBatchTask.cpu_limit
~PandasBatchTask.cpu_overcommit_ratio
~PandasBatchTask.ctx
~PandasBatchTask.dataset
~PandasBatchTask.default_output_name
~PandasBatchTask.elapsed_time
~PandasBatchTask.enable_temp_directory
~PandasBatchTask.exception
~PandasBatchTask.exec_cq
~PandasBatchTask.exec_id
~PandasBatchTask.exec_on_scheduler
~PandasBatchTask.fail_count
~PandasBatchTask.final_output_abspath
~PandasBatchTask.finish_time
~PandasBatchTask.gpu_limit
~PandasBatchTask.id
~PandasBatchTask.input_datasets
~PandasBatchTask.input_deps
~PandasBatchTask.input_udfs
~PandasBatchTask.input_view_index
~PandasBatchTask.key
~PandasBatchTask.local_gpu
~PandasBatchTask.local_gpu_ranks
~PandasBatchTask.local_rank
~PandasBatchTask.location
~PandasBatchTask.max_batch_size
~PandasBatchTask.memory_limit
~PandasBatchTask.memory_overcommit_ratio
~PandasBatchTask.node_id
~PandasBatchTask.numa_node
~PandasBatchTask.numpy_random_gen
~PandasBatchTask.output
~PandasBatchTask.output_deps
~PandasBatchTask.output_dirname
~PandasBatchTask.output_filename
~PandasBatchTask.output_name
~PandasBatchTask.output_root
~PandasBatchTask.partition_dims
~PandasBatchTask.partition_infos
~PandasBatchTask.partition_infos_as_dict
~PandasBatchTask.perf_metrics
~PandasBatchTask.perf_profile
~PandasBatchTask.python_random_gen
~PandasBatchTask.query_udfs
~PandasBatchTask.rand_seed_float
~PandasBatchTask.rand_seed_uint32
~PandasBatchTask.random_seed_bytes
~PandasBatchTask.ray_dataset_path
~PandasBatchTask.ray_marker_path
~PandasBatchTask.retry_count
~PandasBatchTask.runtime_id
~PandasBatchTask.runtime_output_abspath
~PandasBatchTask.runtime_state
~PandasBatchTask.sched_epoch
~PandasBatchTask.self_contained_output
~PandasBatchTask.skip_when_any_input_empty
~PandasBatchTask.staging_root
~PandasBatchTask.start_time
~PandasBatchTask.status
~PandasBatchTask.temp_abspath
~PandasBatchTask.temp_output
~PandasBatchTask.udfs
~PandasBatchTask.uniform_failure_prob

View File

@@ -0,0 +1,127 @@
smallpond.execution.task.PandasComputeTask
==========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: PandasComputeTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PandasComputeTask.__init__
~PandasComputeTask.add_elapsed_time
~PandasComputeTask.adjust_row_group_size
~PandasComputeTask.clean_complex_attrs
~PandasComputeTask.clean_output
~PandasComputeTask.cleanup
~PandasComputeTask.compute_avg_row_size
~PandasComputeTask.create_input_views
~PandasComputeTask.dump
~PandasComputeTask.dump_output
~PandasComputeTask.exec
~PandasComputeTask.exec_query
~PandasComputeTask.finalize
~PandasComputeTask.get_partition_info
~PandasComputeTask.initialize
~PandasComputeTask.inject_fault
~PandasComputeTask.merge_metrics
~PandasComputeTask.oom
~PandasComputeTask.parquet_kv_metadata_bytes
~PandasComputeTask.parquet_kv_metadata_str
~PandasComputeTask.prepare_connection
~PandasComputeTask.process
~PandasComputeTask.random_float
~PandasComputeTask.random_uint32
~PandasComputeTask.run
~PandasComputeTask.run_on_ray
~PandasComputeTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~PandasComputeTask.process_func
~PandasComputeTask.parquet_row_group_size
~PandasComputeTask.parquet_row_group_bytes
~PandasComputeTask.parquet_dictionary_encoding
~PandasComputeTask.use_duckdb_reader
~PandasComputeTask.allow_speculative_exec
~PandasComputeTask.any_input_empty
~PandasComputeTask.compression_level_str
~PandasComputeTask.compression_options
~PandasComputeTask.compression_type_str
~PandasComputeTask.cpu_limit
~PandasComputeTask.cpu_overcommit_ratio
~PandasComputeTask.ctx
~PandasComputeTask.dataset
~PandasComputeTask.default_output_name
~PandasComputeTask.elapsed_time
~PandasComputeTask.enable_temp_directory
~PandasComputeTask.exception
~PandasComputeTask.exec_cq
~PandasComputeTask.exec_id
~PandasComputeTask.exec_on_scheduler
~PandasComputeTask.fail_count
~PandasComputeTask.final_output_abspath
~PandasComputeTask.finish_time
~PandasComputeTask.gpu_limit
~PandasComputeTask.id
~PandasComputeTask.input_datasets
~PandasComputeTask.input_deps
~PandasComputeTask.input_udfs
~PandasComputeTask.input_view_index
~PandasComputeTask.key
~PandasComputeTask.local_gpu
~PandasComputeTask.local_gpu_ranks
~PandasComputeTask.local_rank
~PandasComputeTask.location
~PandasComputeTask.memory_limit
~PandasComputeTask.memory_overcommit_ratio
~PandasComputeTask.node_id
~PandasComputeTask.numa_node
~PandasComputeTask.numpy_random_gen
~PandasComputeTask.output
~PandasComputeTask.output_deps
~PandasComputeTask.output_dirname
~PandasComputeTask.output_filename
~PandasComputeTask.output_name
~PandasComputeTask.output_root
~PandasComputeTask.parquet_compression
~PandasComputeTask.parquet_compression_level
~PandasComputeTask.partition_dims
~PandasComputeTask.partition_infos
~PandasComputeTask.partition_infos_as_dict
~PandasComputeTask.perf_metrics
~PandasComputeTask.perf_profile
~PandasComputeTask.python_random_gen
~PandasComputeTask.query_udfs
~PandasComputeTask.rand_seed_float
~PandasComputeTask.rand_seed_uint32
~PandasComputeTask.random_seed_bytes
~PandasComputeTask.ray_dataset_path
~PandasComputeTask.ray_marker_path
~PandasComputeTask.retry_count
~PandasComputeTask.runtime_id
~PandasComputeTask.runtime_output_abspath
~PandasComputeTask.runtime_state
~PandasComputeTask.sched_epoch
~PandasComputeTask.self_contained_output
~PandasComputeTask.skip_when_any_input_empty
~PandasComputeTask.staging_root
~PandasComputeTask.start_time
~PandasComputeTask.status
~PandasComputeTask.temp_abspath
~PandasComputeTask.temp_output
~PandasComputeTask.udfs
~PandasComputeTask.uniform_failure_prob

View File

@@ -0,0 +1,104 @@
smallpond.execution.task.PartitionConsumerTask
==============================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: PartitionConsumerTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PartitionConsumerTask.__init__
~PartitionConsumerTask.add_elapsed_time
~PartitionConsumerTask.adjust_row_group_size
~PartitionConsumerTask.clean_complex_attrs
~PartitionConsumerTask.clean_output
~PartitionConsumerTask.cleanup
~PartitionConsumerTask.compute_avg_row_size
~PartitionConsumerTask.dump
~PartitionConsumerTask.exec
~PartitionConsumerTask.finalize
~PartitionConsumerTask.get_partition_info
~PartitionConsumerTask.initialize
~PartitionConsumerTask.inject_fault
~PartitionConsumerTask.merge_metrics
~PartitionConsumerTask.oom
~PartitionConsumerTask.parquet_kv_metadata_bytes
~PartitionConsumerTask.parquet_kv_metadata_str
~PartitionConsumerTask.random_float
~PartitionConsumerTask.random_uint32
~PartitionConsumerTask.run
~PartitionConsumerTask.run_on_ray
~PartitionConsumerTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~PartitionConsumerTask.last_partition
~PartitionConsumerTask.allow_speculative_exec
~PartitionConsumerTask.any_input_empty
~PartitionConsumerTask.cpu_limit
~PartitionConsumerTask.ctx
~PartitionConsumerTask.dataset
~PartitionConsumerTask.default_output_name
~PartitionConsumerTask.elapsed_time
~PartitionConsumerTask.exception
~PartitionConsumerTask.exec_cq
~PartitionConsumerTask.exec_id
~PartitionConsumerTask.exec_on_scheduler
~PartitionConsumerTask.fail_count
~PartitionConsumerTask.final_output_abspath
~PartitionConsumerTask.finish_time
~PartitionConsumerTask.gpu_limit
~PartitionConsumerTask.id
~PartitionConsumerTask.input_datasets
~PartitionConsumerTask.input_deps
~PartitionConsumerTask.key
~PartitionConsumerTask.local_gpu
~PartitionConsumerTask.local_gpu_ranks
~PartitionConsumerTask.local_rank
~PartitionConsumerTask.location
~PartitionConsumerTask.memory_limit
~PartitionConsumerTask.node_id
~PartitionConsumerTask.numa_node
~PartitionConsumerTask.numpy_random_gen
~PartitionConsumerTask.output
~PartitionConsumerTask.output_deps
~PartitionConsumerTask.output_dirname
~PartitionConsumerTask.output_filename
~PartitionConsumerTask.output_name
~PartitionConsumerTask.output_root
~PartitionConsumerTask.partition_dims
~PartitionConsumerTask.partition_infos
~PartitionConsumerTask.partition_infos_as_dict
~PartitionConsumerTask.perf_metrics
~PartitionConsumerTask.perf_profile
~PartitionConsumerTask.python_random_gen
~PartitionConsumerTask.random_seed_bytes
~PartitionConsumerTask.ray_dataset_path
~PartitionConsumerTask.ray_marker_path
~PartitionConsumerTask.retry_count
~PartitionConsumerTask.runtime_id
~PartitionConsumerTask.runtime_output_abspath
~PartitionConsumerTask.runtime_state
~PartitionConsumerTask.sched_epoch
~PartitionConsumerTask.self_contained_output
~PartitionConsumerTask.skip_when_any_input_empty
~PartitionConsumerTask.staging_root
~PartitionConsumerTask.start_time
~PartitionConsumerTask.status
~PartitionConsumerTask.temp_abspath
~PartitionConsumerTask.temp_output
~PartitionConsumerTask.uniform_failure_prob

View File

@@ -0,0 +1,32 @@
smallpond.execution.task.PartitionInfo
======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: PartitionInfo
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PartitionInfo.__init__
.. rubric:: Attributes
.. autosummary::
~PartitionInfo.index
~PartitionInfo.npartitions
~PartitionInfo.dimension
~PartitionInfo.default_dimension
~PartitionInfo.toplevel_dimension

View File

@@ -0,0 +1,106 @@
smallpond.execution.task.PartitionProducerTask
==============================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: PartitionProducerTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PartitionProducerTask.__init__
~PartitionProducerTask.add_elapsed_time
~PartitionProducerTask.adjust_row_group_size
~PartitionProducerTask.clean_complex_attrs
~PartitionProducerTask.clean_output
~PartitionProducerTask.cleanup
~PartitionProducerTask.compute_avg_row_size
~PartitionProducerTask.dump
~PartitionProducerTask.exec
~PartitionProducerTask.finalize
~PartitionProducerTask.get_partition_info
~PartitionProducerTask.initialize
~PartitionProducerTask.inject_fault
~PartitionProducerTask.merge_metrics
~PartitionProducerTask.oom
~PartitionProducerTask.parquet_kv_metadata_bytes
~PartitionProducerTask.parquet_kv_metadata_str
~PartitionProducerTask.random_float
~PartitionProducerTask.random_uint32
~PartitionProducerTask.run
~PartitionProducerTask.run_on_ray
~PartitionProducerTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~PartitionProducerTask.npartitions
~PartitionProducerTask.dimension
~PartitionProducerTask.partitioned_datasets
~PartitionProducerTask.allow_speculative_exec
~PartitionProducerTask.any_input_empty
~PartitionProducerTask.cpu_limit
~PartitionProducerTask.ctx
~PartitionProducerTask.dataset
~PartitionProducerTask.default_output_name
~PartitionProducerTask.elapsed_time
~PartitionProducerTask.exception
~PartitionProducerTask.exec_cq
~PartitionProducerTask.exec_id
~PartitionProducerTask.exec_on_scheduler
~PartitionProducerTask.fail_count
~PartitionProducerTask.final_output_abspath
~PartitionProducerTask.finish_time
~PartitionProducerTask.gpu_limit
~PartitionProducerTask.id
~PartitionProducerTask.input_datasets
~PartitionProducerTask.input_deps
~PartitionProducerTask.key
~PartitionProducerTask.local_gpu
~PartitionProducerTask.local_gpu_ranks
~PartitionProducerTask.local_rank
~PartitionProducerTask.location
~PartitionProducerTask.memory_limit
~PartitionProducerTask.node_id
~PartitionProducerTask.numa_node
~PartitionProducerTask.numpy_random_gen
~PartitionProducerTask.output
~PartitionProducerTask.output_deps
~PartitionProducerTask.output_dirname
~PartitionProducerTask.output_filename
~PartitionProducerTask.output_name
~PartitionProducerTask.output_root
~PartitionProducerTask.partition_dims
~PartitionProducerTask.partition_infos
~PartitionProducerTask.partition_infos_as_dict
~PartitionProducerTask.perf_metrics
~PartitionProducerTask.perf_profile
~PartitionProducerTask.python_random_gen
~PartitionProducerTask.random_seed_bytes
~PartitionProducerTask.ray_dataset_path
~PartitionProducerTask.ray_marker_path
~PartitionProducerTask.retry_count
~PartitionProducerTask.runtime_id
~PartitionProducerTask.runtime_output_abspath
~PartitionProducerTask.runtime_state
~PartitionProducerTask.sched_epoch
~PartitionProducerTask.self_contained_output
~PartitionProducerTask.skip_when_any_input_empty
~PartitionProducerTask.staging_root
~PartitionProducerTask.start_time
~PartitionProducerTask.status
~PartitionProducerTask.temp_abspath
~PartitionProducerTask.temp_output
~PartitionProducerTask.uniform_failure_prob

View File

@@ -0,0 +1,38 @@
smallpond.execution.task.PerfStats
==================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: PerfStats
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PerfStats.__init__
~PerfStats.count
~PerfStats.index
.. rubric:: Attributes
.. autosummary::
~PerfStats.avg
~PerfStats.cnt
~PerfStats.max
~PerfStats.min
~PerfStats.p50
~PerfStats.p75
~PerfStats.p95
~PerfStats.p99
~PerfStats.sum

View File

@@ -0,0 +1,106 @@
smallpond.execution.task.ProjectionTask
=======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: ProjectionTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ProjectionTask.__init__
~ProjectionTask.add_elapsed_time
~ProjectionTask.adjust_row_group_size
~ProjectionTask.clean_complex_attrs
~ProjectionTask.clean_output
~ProjectionTask.cleanup
~ProjectionTask.compute_avg_row_size
~ProjectionTask.dump
~ProjectionTask.exec
~ProjectionTask.finalize
~ProjectionTask.get_partition_info
~ProjectionTask.initialize
~ProjectionTask.inject_fault
~ProjectionTask.merge_metrics
~ProjectionTask.oom
~ProjectionTask.parquet_kv_metadata_bytes
~ProjectionTask.parquet_kv_metadata_str
~ProjectionTask.random_float
~ProjectionTask.random_uint32
~ProjectionTask.run
~ProjectionTask.run_on_ray
~ProjectionTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~ProjectionTask.columns
~ProjectionTask.generated_columns
~ProjectionTask.union_by_name
~ProjectionTask.allow_speculative_exec
~ProjectionTask.any_input_empty
~ProjectionTask.cpu_limit
~ProjectionTask.ctx
~ProjectionTask.dataset
~ProjectionTask.default_output_name
~ProjectionTask.elapsed_time
~ProjectionTask.exception
~ProjectionTask.exec_cq
~ProjectionTask.exec_id
~ProjectionTask.exec_on_scheduler
~ProjectionTask.fail_count
~ProjectionTask.final_output_abspath
~ProjectionTask.finish_time
~ProjectionTask.gpu_limit
~ProjectionTask.id
~ProjectionTask.input_datasets
~ProjectionTask.input_deps
~ProjectionTask.key
~ProjectionTask.local_gpu
~ProjectionTask.local_gpu_ranks
~ProjectionTask.local_rank
~ProjectionTask.location
~ProjectionTask.memory_limit
~ProjectionTask.node_id
~ProjectionTask.numa_node
~ProjectionTask.numpy_random_gen
~ProjectionTask.output
~ProjectionTask.output_deps
~ProjectionTask.output_dirname
~ProjectionTask.output_filename
~ProjectionTask.output_name
~ProjectionTask.output_root
~ProjectionTask.partition_dims
~ProjectionTask.partition_infos
~ProjectionTask.partition_infos_as_dict
~ProjectionTask.perf_metrics
~ProjectionTask.perf_profile
~ProjectionTask.python_random_gen
~ProjectionTask.random_seed_bytes
~ProjectionTask.ray_dataset_path
~ProjectionTask.ray_marker_path
~ProjectionTask.retry_count
~ProjectionTask.runtime_id
~ProjectionTask.runtime_output_abspath
~ProjectionTask.runtime_state
~ProjectionTask.sched_epoch
~ProjectionTask.self_contained_output
~ProjectionTask.skip_when_any_input_empty
~ProjectionTask.staging_root
~ProjectionTask.start_time
~ProjectionTask.status
~ProjectionTask.temp_abspath
~ProjectionTask.temp_output
~ProjectionTask.uniform_failure_prob

View File

@@ -0,0 +1,122 @@
smallpond.execution.task.PythonScriptTask
=========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: PythonScriptTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PythonScriptTask.__init__
~PythonScriptTask.add_elapsed_time
~PythonScriptTask.adjust_row_group_size
~PythonScriptTask.clean_complex_attrs
~PythonScriptTask.clean_output
~PythonScriptTask.cleanup
~PythonScriptTask.compute_avg_row_size
~PythonScriptTask.create_input_views
~PythonScriptTask.dump
~PythonScriptTask.exec
~PythonScriptTask.exec_query
~PythonScriptTask.finalize
~PythonScriptTask.get_partition_info
~PythonScriptTask.initialize
~PythonScriptTask.inject_fault
~PythonScriptTask.merge_metrics
~PythonScriptTask.oom
~PythonScriptTask.parquet_kv_metadata_bytes
~PythonScriptTask.parquet_kv_metadata_str
~PythonScriptTask.prepare_connection
~PythonScriptTask.process
~PythonScriptTask.random_float
~PythonScriptTask.random_uint32
~PythonScriptTask.run
~PythonScriptTask.run_on_ray
~PythonScriptTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~PythonScriptTask.process_func
~PythonScriptTask.allow_speculative_exec
~PythonScriptTask.any_input_empty
~PythonScriptTask.compression_level_str
~PythonScriptTask.compression_options
~PythonScriptTask.compression_type_str
~PythonScriptTask.cpu_limit
~PythonScriptTask.cpu_overcommit_ratio
~PythonScriptTask.ctx
~PythonScriptTask.dataset
~PythonScriptTask.default_output_name
~PythonScriptTask.elapsed_time
~PythonScriptTask.enable_temp_directory
~PythonScriptTask.exception
~PythonScriptTask.exec_cq
~PythonScriptTask.exec_id
~PythonScriptTask.exec_on_scheduler
~PythonScriptTask.fail_count
~PythonScriptTask.final_output_abspath
~PythonScriptTask.finish_time
~PythonScriptTask.gpu_limit
~PythonScriptTask.id
~PythonScriptTask.input_datasets
~PythonScriptTask.input_deps
~PythonScriptTask.input_udfs
~PythonScriptTask.input_view_index
~PythonScriptTask.key
~PythonScriptTask.local_gpu
~PythonScriptTask.local_gpu_ranks
~PythonScriptTask.local_rank
~PythonScriptTask.location
~PythonScriptTask.memory_limit
~PythonScriptTask.memory_overcommit_ratio
~PythonScriptTask.node_id
~PythonScriptTask.numa_node
~PythonScriptTask.numpy_random_gen
~PythonScriptTask.output
~PythonScriptTask.output_deps
~PythonScriptTask.output_dirname
~PythonScriptTask.output_filename
~PythonScriptTask.output_name
~PythonScriptTask.output_root
~PythonScriptTask.parquet_compression
~PythonScriptTask.parquet_compression_level
~PythonScriptTask.partition_dims
~PythonScriptTask.partition_infos
~PythonScriptTask.partition_infos_as_dict
~PythonScriptTask.perf_metrics
~PythonScriptTask.perf_profile
~PythonScriptTask.python_random_gen
~PythonScriptTask.query_udfs
~PythonScriptTask.rand_seed_float
~PythonScriptTask.rand_seed_uint32
~PythonScriptTask.random_seed_bytes
~PythonScriptTask.ray_dataset_path
~PythonScriptTask.ray_marker_path
~PythonScriptTask.retry_count
~PythonScriptTask.runtime_id
~PythonScriptTask.runtime_output_abspath
~PythonScriptTask.runtime_state
~PythonScriptTask.sched_epoch
~PythonScriptTask.self_contained_output
~PythonScriptTask.skip_when_any_input_empty
~PythonScriptTask.staging_root
~PythonScriptTask.start_time
~PythonScriptTask.status
~PythonScriptTask.temp_abspath
~PythonScriptTask.temp_output
~PythonScriptTask.udfs
~PythonScriptTask.uniform_failure_prob

View File

@@ -0,0 +1,103 @@
smallpond.execution.task.RangePartitionTask
===========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: RangePartitionTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~RangePartitionTask.__init__
~RangePartitionTask.add_elapsed_time
~RangePartitionTask.adjust_row_group_size
~RangePartitionTask.clean_complex_attrs
~RangePartitionTask.clean_output
~RangePartitionTask.cleanup
~RangePartitionTask.compute_avg_row_size
~RangePartitionTask.dump
~RangePartitionTask.exec
~RangePartitionTask.finalize
~RangePartitionTask.get_partition_info
~RangePartitionTask.initialize
~RangePartitionTask.inject_fault
~RangePartitionTask.merge_metrics
~RangePartitionTask.oom
~RangePartitionTask.parquet_kv_metadata_bytes
~RangePartitionTask.parquet_kv_metadata_str
~RangePartitionTask.random_float
~RangePartitionTask.random_uint32
~RangePartitionTask.run
~RangePartitionTask.run_on_ray
~RangePartitionTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~RangePartitionTask.ctx
~RangePartitionTask.id
~RangePartitionTask.node_id
~RangePartitionTask.sched_epoch
~RangePartitionTask.output_name
~RangePartitionTask.output_root
~RangePartitionTask.dataset
~RangePartitionTask.input_deps
~RangePartitionTask.output_deps
~RangePartitionTask.perf_metrics
~RangePartitionTask.perf_profile
~RangePartitionTask.runtime_state
~RangePartitionTask.input_datasets
~RangePartitionTask.allow_speculative_exec
~RangePartitionTask.any_input_empty
~RangePartitionTask.cpu_limit
~RangePartitionTask.default_output_name
~RangePartitionTask.elapsed_time
~RangePartitionTask.exception
~RangePartitionTask.exec_cq
~RangePartitionTask.exec_id
~RangePartitionTask.exec_on_scheduler
~RangePartitionTask.fail_count
~RangePartitionTask.final_output_abspath
~RangePartitionTask.finish_time
~RangePartitionTask.gpu_limit
~RangePartitionTask.key
~RangePartitionTask.local_gpu
~RangePartitionTask.local_gpu_ranks
~RangePartitionTask.local_rank
~RangePartitionTask.location
~RangePartitionTask.memory_limit
~RangePartitionTask.numa_node
~RangePartitionTask.numpy_random_gen
~RangePartitionTask.output
~RangePartitionTask.output_dirname
~RangePartitionTask.output_filename
~RangePartitionTask.partition_dims
~RangePartitionTask.partition_infos
~RangePartitionTask.partition_infos_as_dict
~RangePartitionTask.python_random_gen
~RangePartitionTask.random_seed_bytes
~RangePartitionTask.ray_dataset_path
~RangePartitionTask.ray_marker_path
~RangePartitionTask.retry_count
~RangePartitionTask.runtime_id
~RangePartitionTask.runtime_output_abspath
~RangePartitionTask.self_contained_output
~RangePartitionTask.skip_when_any_input_empty
~RangePartitionTask.staging_root
~RangePartitionTask.start_time
~RangePartitionTask.status
~RangePartitionTask.temp_abspath
~RangePartitionTask.temp_output
~RangePartitionTask.uniform_failure_prob

View File

@@ -0,0 +1,106 @@
smallpond.execution.task.RepeatPartitionProducerTask
====================================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: RepeatPartitionProducerTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~RepeatPartitionProducerTask.__init__
~RepeatPartitionProducerTask.add_elapsed_time
~RepeatPartitionProducerTask.adjust_row_group_size
~RepeatPartitionProducerTask.clean_complex_attrs
~RepeatPartitionProducerTask.clean_output
~RepeatPartitionProducerTask.cleanup
~RepeatPartitionProducerTask.compute_avg_row_size
~RepeatPartitionProducerTask.dump
~RepeatPartitionProducerTask.exec
~RepeatPartitionProducerTask.finalize
~RepeatPartitionProducerTask.get_partition_info
~RepeatPartitionProducerTask.initialize
~RepeatPartitionProducerTask.inject_fault
~RepeatPartitionProducerTask.merge_metrics
~RepeatPartitionProducerTask.oom
~RepeatPartitionProducerTask.parquet_kv_metadata_bytes
~RepeatPartitionProducerTask.parquet_kv_metadata_str
~RepeatPartitionProducerTask.random_float
~RepeatPartitionProducerTask.random_uint32
~RepeatPartitionProducerTask.run
~RepeatPartitionProducerTask.run_on_ray
~RepeatPartitionProducerTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~RepeatPartitionProducerTask.npartitions
~RepeatPartitionProducerTask.dimension
~RepeatPartitionProducerTask.partitioned_datasets
~RepeatPartitionProducerTask.allow_speculative_exec
~RepeatPartitionProducerTask.any_input_empty
~RepeatPartitionProducerTask.cpu_limit
~RepeatPartitionProducerTask.ctx
~RepeatPartitionProducerTask.dataset
~RepeatPartitionProducerTask.default_output_name
~RepeatPartitionProducerTask.elapsed_time
~RepeatPartitionProducerTask.exception
~RepeatPartitionProducerTask.exec_cq
~RepeatPartitionProducerTask.exec_id
~RepeatPartitionProducerTask.exec_on_scheduler
~RepeatPartitionProducerTask.fail_count
~RepeatPartitionProducerTask.final_output_abspath
~RepeatPartitionProducerTask.finish_time
~RepeatPartitionProducerTask.gpu_limit
~RepeatPartitionProducerTask.id
~RepeatPartitionProducerTask.input_datasets
~RepeatPartitionProducerTask.input_deps
~RepeatPartitionProducerTask.key
~RepeatPartitionProducerTask.local_gpu
~RepeatPartitionProducerTask.local_gpu_ranks
~RepeatPartitionProducerTask.local_rank
~RepeatPartitionProducerTask.location
~RepeatPartitionProducerTask.memory_limit
~RepeatPartitionProducerTask.node_id
~RepeatPartitionProducerTask.numa_node
~RepeatPartitionProducerTask.numpy_random_gen
~RepeatPartitionProducerTask.output
~RepeatPartitionProducerTask.output_deps
~RepeatPartitionProducerTask.output_dirname
~RepeatPartitionProducerTask.output_filename
~RepeatPartitionProducerTask.output_name
~RepeatPartitionProducerTask.output_root
~RepeatPartitionProducerTask.partition_dims
~RepeatPartitionProducerTask.partition_infos
~RepeatPartitionProducerTask.partition_infos_as_dict
~RepeatPartitionProducerTask.perf_metrics
~RepeatPartitionProducerTask.perf_profile
~RepeatPartitionProducerTask.python_random_gen
~RepeatPartitionProducerTask.random_seed_bytes
~RepeatPartitionProducerTask.ray_dataset_path
~RepeatPartitionProducerTask.ray_marker_path
~RepeatPartitionProducerTask.retry_count
~RepeatPartitionProducerTask.runtime_id
~RepeatPartitionProducerTask.runtime_output_abspath
~RepeatPartitionProducerTask.runtime_state
~RepeatPartitionProducerTask.sched_epoch
~RepeatPartitionProducerTask.self_contained_output
~RepeatPartitionProducerTask.skip_when_any_input_empty
~RepeatPartitionProducerTask.staging_root
~RepeatPartitionProducerTask.start_time
~RepeatPartitionProducerTask.status
~RepeatPartitionProducerTask.temp_abspath
~RepeatPartitionProducerTask.temp_output
~RepeatPartitionProducerTask.uniform_failure_prob

View File

@@ -0,0 +1,103 @@
smallpond.execution.task.RootTask
=================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: RootTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~RootTask.__init__
~RootTask.add_elapsed_time
~RootTask.adjust_row_group_size
~RootTask.clean_complex_attrs
~RootTask.clean_output
~RootTask.cleanup
~RootTask.compute_avg_row_size
~RootTask.dump
~RootTask.exec
~RootTask.finalize
~RootTask.get_partition_info
~RootTask.initialize
~RootTask.inject_fault
~RootTask.merge_metrics
~RootTask.oom
~RootTask.parquet_kv_metadata_bytes
~RootTask.parquet_kv_metadata_str
~RootTask.random_float
~RootTask.random_uint32
~RootTask.run
~RootTask.run_on_ray
~RootTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~RootTask.ctx
~RootTask.id
~RootTask.node_id
~RootTask.sched_epoch
~RootTask.output_name
~RootTask.output_root
~RootTask.dataset
~RootTask.input_deps
~RootTask.output_deps
~RootTask.perf_metrics
~RootTask.perf_profile
~RootTask.runtime_state
~RootTask.input_datasets
~RootTask.allow_speculative_exec
~RootTask.any_input_empty
~RootTask.cpu_limit
~RootTask.default_output_name
~RootTask.elapsed_time
~RootTask.exception
~RootTask.exec_cq
~RootTask.exec_id
~RootTask.exec_on_scheduler
~RootTask.fail_count
~RootTask.final_output_abspath
~RootTask.finish_time
~RootTask.gpu_limit
~RootTask.key
~RootTask.local_gpu
~RootTask.local_gpu_ranks
~RootTask.local_rank
~RootTask.location
~RootTask.memory_limit
~RootTask.numa_node
~RootTask.numpy_random_gen
~RootTask.output
~RootTask.output_dirname
~RootTask.output_filename
~RootTask.partition_dims
~RootTask.partition_infos
~RootTask.partition_infos_as_dict
~RootTask.python_random_gen
~RootTask.random_seed_bytes
~RootTask.ray_dataset_path
~RootTask.ray_marker_path
~RootTask.retry_count
~RootTask.runtime_id
~RootTask.runtime_output_abspath
~RootTask.self_contained_output
~RootTask.skip_when_any_input_empty
~RootTask.staging_root
~RootTask.start_time
~RootTask.status
~RootTask.temp_abspath
~RootTask.temp_output
~RootTask.uniform_failure_prob

View File

@@ -0,0 +1,50 @@
smallpond.execution.task.RuntimeContext
=======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: RuntimeContext
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~RuntimeContext.__init__
~RuntimeContext.cleanup
~RuntimeContext.cleanup_root
~RuntimeContext.get_local_gpus
~RuntimeContext.initialize
~RuntimeContext.new_task_id
~RuntimeContext.set_current_task
.. rubric:: Attributes
.. autosummary::
~RuntimeContext.available_memory
~RuntimeContext.exec_plan_path
~RuntimeContext.job_root_dirname
~RuntimeContext.job_status_path
~RuntimeContext.logcial_plan_graph_path
~RuntimeContext.logcial_plan_path
~RuntimeContext.numa_node_count
~RuntimeContext.physical_cpu_count
~RuntimeContext.ray_log_path
~RuntimeContext.runtime_ctx_path
~RuntimeContext.sched_state_path
~RuntimeContext.secs_executor_probe_timeout
~RuntimeContext.task
~RuntimeContext.total_memory
~RuntimeContext.usable_cpu_count
~RuntimeContext.usable_gpu_count
~RuntimeContext.usable_memory_size

View File

@@ -0,0 +1,105 @@
smallpond.execution.task.SplitDataSetTask
=========================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: SplitDataSetTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~SplitDataSetTask.__init__
~SplitDataSetTask.add_elapsed_time
~SplitDataSetTask.adjust_row_group_size
~SplitDataSetTask.clean_complex_attrs
~SplitDataSetTask.clean_output
~SplitDataSetTask.cleanup
~SplitDataSetTask.compute_avg_row_size
~SplitDataSetTask.dump
~SplitDataSetTask.exec
~SplitDataSetTask.finalize
~SplitDataSetTask.get_partition_info
~SplitDataSetTask.initialize
~SplitDataSetTask.inject_fault
~SplitDataSetTask.merge_metrics
~SplitDataSetTask.oom
~SplitDataSetTask.parquet_kv_metadata_bytes
~SplitDataSetTask.parquet_kv_metadata_str
~SplitDataSetTask.random_float
~SplitDataSetTask.random_uint32
~SplitDataSetTask.run
~SplitDataSetTask.run_on_ray
~SplitDataSetTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~SplitDataSetTask.partition
~SplitDataSetTask.npartitions
~SplitDataSetTask.allow_speculative_exec
~SplitDataSetTask.any_input_empty
~SplitDataSetTask.cpu_limit
~SplitDataSetTask.ctx
~SplitDataSetTask.dataset
~SplitDataSetTask.default_output_name
~SplitDataSetTask.elapsed_time
~SplitDataSetTask.exception
~SplitDataSetTask.exec_cq
~SplitDataSetTask.exec_id
~SplitDataSetTask.exec_on_scheduler
~SplitDataSetTask.fail_count
~SplitDataSetTask.final_output_abspath
~SplitDataSetTask.finish_time
~SplitDataSetTask.gpu_limit
~SplitDataSetTask.id
~SplitDataSetTask.input_datasets
~SplitDataSetTask.input_deps
~SplitDataSetTask.key
~SplitDataSetTask.local_gpu
~SplitDataSetTask.local_gpu_ranks
~SplitDataSetTask.local_rank
~SplitDataSetTask.location
~SplitDataSetTask.memory_limit
~SplitDataSetTask.node_id
~SplitDataSetTask.numa_node
~SplitDataSetTask.numpy_random_gen
~SplitDataSetTask.output
~SplitDataSetTask.output_deps
~SplitDataSetTask.output_dirname
~SplitDataSetTask.output_filename
~SplitDataSetTask.output_name
~SplitDataSetTask.output_root
~SplitDataSetTask.partition_dims
~SplitDataSetTask.partition_infos
~SplitDataSetTask.partition_infos_as_dict
~SplitDataSetTask.perf_metrics
~SplitDataSetTask.perf_profile
~SplitDataSetTask.python_random_gen
~SplitDataSetTask.random_seed_bytes
~SplitDataSetTask.ray_dataset_path
~SplitDataSetTask.ray_marker_path
~SplitDataSetTask.retry_count
~SplitDataSetTask.runtime_id
~SplitDataSetTask.runtime_output_abspath
~SplitDataSetTask.runtime_state
~SplitDataSetTask.sched_epoch
~SplitDataSetTask.self_contained_output
~SplitDataSetTask.skip_when_any_input_empty
~SplitDataSetTask.staging_root
~SplitDataSetTask.start_time
~SplitDataSetTask.status
~SplitDataSetTask.temp_abspath
~SplitDataSetTask.temp_output
~SplitDataSetTask.uniform_failure_prob

View File

@@ -0,0 +1,131 @@
smallpond.execution.task.SqlEngineTask
======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: SqlEngineTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~SqlEngineTask.__init__
~SqlEngineTask.add_elapsed_time
~SqlEngineTask.adjust_row_group_size
~SqlEngineTask.clean_complex_attrs
~SqlEngineTask.clean_output
~SqlEngineTask.cleanup
~SqlEngineTask.compute_avg_row_size
~SqlEngineTask.create_input_views
~SqlEngineTask.dump
~SqlEngineTask.exec
~SqlEngineTask.exec_query
~SqlEngineTask.finalize
~SqlEngineTask.get_partition_info
~SqlEngineTask.initialize
~SqlEngineTask.inject_fault
~SqlEngineTask.merge_metrics
~SqlEngineTask.oom
~SqlEngineTask.parquet_kv_metadata_bytes
~SqlEngineTask.parquet_kv_metadata_str
~SqlEngineTask.prepare_connection
~SqlEngineTask.process_batch
~SqlEngineTask.random_float
~SqlEngineTask.random_uint32
~SqlEngineTask.run
~SqlEngineTask.run_on_ray
~SqlEngineTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~SqlEngineTask.sql_queries
~SqlEngineTask.per_thread_output
~SqlEngineTask.materialize_output
~SqlEngineTask.materialize_in_memory
~SqlEngineTask.batched_processing
~SqlEngineTask.parquet_row_group_size
~SqlEngineTask.parquet_row_group_bytes
~SqlEngineTask.parquet_dictionary_encoding
~SqlEngineTask.parquet_compression
~SqlEngineTask.parquet_compression_level
~SqlEngineTask.allow_speculative_exec
~SqlEngineTask.any_input_empty
~SqlEngineTask.compression_level_str
~SqlEngineTask.compression_options
~SqlEngineTask.compression_type_str
~SqlEngineTask.cpu_limit
~SqlEngineTask.cpu_overcommit_ratio
~SqlEngineTask.ctx
~SqlEngineTask.dataset
~SqlEngineTask.default_output_name
~SqlEngineTask.elapsed_time
~SqlEngineTask.enable_temp_directory
~SqlEngineTask.exception
~SqlEngineTask.exec_cq
~SqlEngineTask.exec_id
~SqlEngineTask.exec_on_scheduler
~SqlEngineTask.fail_count
~SqlEngineTask.final_output_abspath
~SqlEngineTask.finish_time
~SqlEngineTask.gpu_limit
~SqlEngineTask.id
~SqlEngineTask.input_datasets
~SqlEngineTask.input_deps
~SqlEngineTask.input_udfs
~SqlEngineTask.input_view_index
~SqlEngineTask.key
~SqlEngineTask.local_gpu
~SqlEngineTask.local_gpu_ranks
~SqlEngineTask.local_rank
~SqlEngineTask.location
~SqlEngineTask.max_batch_size
~SqlEngineTask.memory_limit
~SqlEngineTask.memory_overcommit_ratio
~SqlEngineTask.node_id
~SqlEngineTask.numa_node
~SqlEngineTask.numpy_random_gen
~SqlEngineTask.oneline_query
~SqlEngineTask.output
~SqlEngineTask.output_deps
~SqlEngineTask.output_dirname
~SqlEngineTask.output_filename
~SqlEngineTask.output_name
~SqlEngineTask.output_root
~SqlEngineTask.partition_dims
~SqlEngineTask.partition_infos
~SqlEngineTask.partition_infos_as_dict
~SqlEngineTask.perf_metrics
~SqlEngineTask.perf_profile
~SqlEngineTask.python_random_gen
~SqlEngineTask.query_udfs
~SqlEngineTask.rand_seed_float
~SqlEngineTask.rand_seed_uint32
~SqlEngineTask.random_seed_bytes
~SqlEngineTask.ray_dataset_path
~SqlEngineTask.ray_marker_path
~SqlEngineTask.retry_count
~SqlEngineTask.runtime_id
~SqlEngineTask.runtime_output_abspath
~SqlEngineTask.runtime_state
~SqlEngineTask.sched_epoch
~SqlEngineTask.self_contained_output
~SqlEngineTask.skip_when_any_input_empty
~SqlEngineTask.staging_root
~SqlEngineTask.start_time
~SqlEngineTask.status
~SqlEngineTask.temp_abspath
~SqlEngineTask.temp_output
~SqlEngineTask.udfs
~SqlEngineTask.uniform_failure_prob

View File

@@ -0,0 +1,103 @@
smallpond.execution.task.Task
=============================
.. currentmodule:: smallpond.execution.task
.. autoclass:: Task
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~Task.__init__
~Task.add_elapsed_time
~Task.adjust_row_group_size
~Task.clean_complex_attrs
~Task.clean_output
~Task.cleanup
~Task.compute_avg_row_size
~Task.dump
~Task.exec
~Task.finalize
~Task.get_partition_info
~Task.initialize
~Task.inject_fault
~Task.merge_metrics
~Task.oom
~Task.parquet_kv_metadata_bytes
~Task.parquet_kv_metadata_str
~Task.random_float
~Task.random_uint32
~Task.run
~Task.run_on_ray
~Task.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~Task.ctx
~Task.id
~Task.node_id
~Task.sched_epoch
~Task.output_name
~Task.output_root
~Task.dataset
~Task.input_deps
~Task.output_deps
~Task.perf_metrics
~Task.perf_profile
~Task.runtime_state
~Task.input_datasets
~Task.allow_speculative_exec
~Task.any_input_empty
~Task.cpu_limit
~Task.default_output_name
~Task.elapsed_time
~Task.exception
~Task.exec_cq
~Task.exec_id
~Task.exec_on_scheduler
~Task.fail_count
~Task.final_output_abspath
~Task.finish_time
~Task.gpu_limit
~Task.key
~Task.local_gpu
~Task.local_gpu_ranks
~Task.local_rank
~Task.location
~Task.memory_limit
~Task.numa_node
~Task.numpy_random_gen
~Task.output
~Task.output_dirname
~Task.output_filename
~Task.partition_dims
~Task.partition_infos
~Task.partition_infos_as_dict
~Task.python_random_gen
~Task.random_seed_bytes
~Task.ray_dataset_path
~Task.ray_marker_path
~Task.retry_count
~Task.runtime_id
~Task.runtime_output_abspath
~Task.self_contained_output
~Task.skip_when_any_input_empty
~Task.staging_root
~Task.start_time
~Task.status
~Task.temp_abspath
~Task.temp_output
~Task.uniform_failure_prob

View File

@@ -0,0 +1,36 @@
smallpond.execution.task.TaskId
===============================
.. currentmodule:: smallpond.execution.task
.. autoclass:: TaskId
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~TaskId.__init__
~TaskId.as_integer_ratio
~TaskId.bit_length
~TaskId.conjugate
~TaskId.from_bytes
~TaskId.to_bytes
.. rubric:: Attributes
.. autosummary::
~TaskId.denominator
~TaskId.imag
~TaskId.numerator
~TaskId.real

View File

@@ -0,0 +1,30 @@
smallpond.execution.task.TaskRuntimeId
======================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: TaskRuntimeId
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~TaskRuntimeId.__init__
.. rubric:: Attributes
.. autosummary::
~TaskRuntimeId.id
~TaskRuntimeId.epoch
~TaskRuntimeId.retry

View File

@@ -0,0 +1,107 @@
smallpond.execution.task.UserDefinedPartitionProducerTask
=========================================================
.. currentmodule:: smallpond.execution.task
.. autoclass:: UserDefinedPartitionProducerTask
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~UserDefinedPartitionProducerTask.__init__
~UserDefinedPartitionProducerTask.add_elapsed_time
~UserDefinedPartitionProducerTask.adjust_row_group_size
~UserDefinedPartitionProducerTask.clean_complex_attrs
~UserDefinedPartitionProducerTask.clean_output
~UserDefinedPartitionProducerTask.cleanup
~UserDefinedPartitionProducerTask.compute_avg_row_size
~UserDefinedPartitionProducerTask.dump
~UserDefinedPartitionProducerTask.exec
~UserDefinedPartitionProducerTask.finalize
~UserDefinedPartitionProducerTask.get_partition_info
~UserDefinedPartitionProducerTask.initialize
~UserDefinedPartitionProducerTask.inject_fault
~UserDefinedPartitionProducerTask.merge_metrics
~UserDefinedPartitionProducerTask.oom
~UserDefinedPartitionProducerTask.parquet_kv_metadata_bytes
~UserDefinedPartitionProducerTask.parquet_kv_metadata_str
~UserDefinedPartitionProducerTask.random_float
~UserDefinedPartitionProducerTask.random_uint32
~UserDefinedPartitionProducerTask.run
~UserDefinedPartitionProducerTask.run_on_ray
~UserDefinedPartitionProducerTask.set_memory_limit
.. rubric:: Attributes
.. autosummary::
~UserDefinedPartitionProducerTask.partition_func
~UserDefinedPartitionProducerTask.allow_speculative_exec
~UserDefinedPartitionProducerTask.any_input_empty
~UserDefinedPartitionProducerTask.cpu_limit
~UserDefinedPartitionProducerTask.ctx
~UserDefinedPartitionProducerTask.dataset
~UserDefinedPartitionProducerTask.default_output_name
~UserDefinedPartitionProducerTask.dimension
~UserDefinedPartitionProducerTask.elapsed_time
~UserDefinedPartitionProducerTask.exception
~UserDefinedPartitionProducerTask.exec_cq
~UserDefinedPartitionProducerTask.exec_id
~UserDefinedPartitionProducerTask.exec_on_scheduler
~UserDefinedPartitionProducerTask.fail_count
~UserDefinedPartitionProducerTask.final_output_abspath
~UserDefinedPartitionProducerTask.finish_time
~UserDefinedPartitionProducerTask.gpu_limit
~UserDefinedPartitionProducerTask.id
~UserDefinedPartitionProducerTask.input_datasets
~UserDefinedPartitionProducerTask.input_deps
~UserDefinedPartitionProducerTask.key
~UserDefinedPartitionProducerTask.local_gpu
~UserDefinedPartitionProducerTask.local_gpu_ranks
~UserDefinedPartitionProducerTask.local_rank
~UserDefinedPartitionProducerTask.location
~UserDefinedPartitionProducerTask.memory_limit
~UserDefinedPartitionProducerTask.node_id
~UserDefinedPartitionProducerTask.npartitions
~UserDefinedPartitionProducerTask.numa_node
~UserDefinedPartitionProducerTask.numpy_random_gen
~UserDefinedPartitionProducerTask.output
~UserDefinedPartitionProducerTask.output_deps
~UserDefinedPartitionProducerTask.output_dirname
~UserDefinedPartitionProducerTask.output_filename
~UserDefinedPartitionProducerTask.output_name
~UserDefinedPartitionProducerTask.output_root
~UserDefinedPartitionProducerTask.partition_dims
~UserDefinedPartitionProducerTask.partition_infos
~UserDefinedPartitionProducerTask.partition_infos_as_dict
~UserDefinedPartitionProducerTask.partitioned_datasets
~UserDefinedPartitionProducerTask.perf_metrics
~UserDefinedPartitionProducerTask.perf_profile
~UserDefinedPartitionProducerTask.python_random_gen
~UserDefinedPartitionProducerTask.random_seed_bytes
~UserDefinedPartitionProducerTask.ray_dataset_path
~UserDefinedPartitionProducerTask.ray_marker_path
~UserDefinedPartitionProducerTask.retry_count
~UserDefinedPartitionProducerTask.runtime_id
~UserDefinedPartitionProducerTask.runtime_output_abspath
~UserDefinedPartitionProducerTask.runtime_state
~UserDefinedPartitionProducerTask.sched_epoch
~UserDefinedPartitionProducerTask.self_contained_output
~UserDefinedPartitionProducerTask.skip_when_any_input_empty
~UserDefinedPartitionProducerTask.staging_root
~UserDefinedPartitionProducerTask.start_time
~UserDefinedPartitionProducerTask.status
~UserDefinedPartitionProducerTask.temp_abspath
~UserDefinedPartitionProducerTask.temp_output
~UserDefinedPartitionProducerTask.uniform_failure_prob

View File

@@ -0,0 +1,6 @@
smallpond.init
==============
.. currentmodule:: smallpond
.. autofunction:: init

View File

@@ -0,0 +1,46 @@
smallpond.logical.dataset.ArrowTableDataSet
===========================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: ArrowTableDataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ArrowTableDataSet.__init__
~ArrowTableDataSet.log
~ArrowTableDataSet.merge
~ArrowTableDataSet.partition_by_files
~ArrowTableDataSet.reset
~ArrowTableDataSet.sql_query_fragment
~ArrowTableDataSet.to_arrow_table
~ArrowTableDataSet.to_batch_reader
~ArrowTableDataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~ArrowTableDataSet.paths
~ArrowTableDataSet.root_dir
~ArrowTableDataSet.recursive
~ArrowTableDataSet.columns
~ArrowTableDataSet.absolute_paths
~ArrowTableDataSet.empty
~ArrowTableDataSet.num_files
~ArrowTableDataSet.num_rows
~ArrowTableDataSet.resolved_paths
~ArrowTableDataSet.udfs
~ArrowTableDataSet.union_by_name

View File

@@ -0,0 +1,51 @@
smallpond.logical.dataset.CsvDataSet
====================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: CsvDataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~CsvDataSet.__init__
~CsvDataSet.log
~CsvDataSet.merge
~CsvDataSet.partition_by_files
~CsvDataSet.reset
~CsvDataSet.sql_query_fragment
~CsvDataSet.to_arrow_table
~CsvDataSet.to_batch_reader
~CsvDataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~CsvDataSet.schema
~CsvDataSet.delim
~CsvDataSet.max_line_size
~CsvDataSet.parallel
~CsvDataSet.header
~CsvDataSet.absolute_paths
~CsvDataSet.columns
~CsvDataSet.empty
~CsvDataSet.num_files
~CsvDataSet.num_rows
~CsvDataSet.paths
~CsvDataSet.recursive
~CsvDataSet.resolved_paths
~CsvDataSet.root_dir
~CsvDataSet.udfs
~CsvDataSet.union_by_name

View File

@@ -0,0 +1,46 @@
smallpond.logical.dataset.DataSet
=================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: DataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~DataSet.__init__
~DataSet.log
~DataSet.merge
~DataSet.partition_by_files
~DataSet.reset
~DataSet.sql_query_fragment
~DataSet.to_arrow_table
~DataSet.to_batch_reader
~DataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~DataSet.paths
~DataSet.root_dir
~DataSet.recursive
~DataSet.columns
~DataSet.absolute_paths
~DataSet.empty
~DataSet.num_files
~DataSet.num_rows
~DataSet.resolved_paths
~DataSet.udfs
~DataSet.union_by_name

View File

@@ -0,0 +1,46 @@
smallpond.logical.dataset.FileSet
=================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: FileSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~FileSet.__init__
~FileSet.log
~FileSet.merge
~FileSet.partition_by_files
~FileSet.reset
~FileSet.sql_query_fragment
~FileSet.to_arrow_table
~FileSet.to_batch_reader
~FileSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~FileSet.paths
~FileSet.root_dir
~FileSet.recursive
~FileSet.columns
~FileSet.absolute_paths
~FileSet.empty
~FileSet.num_files
~FileSet.num_rows
~FileSet.resolved_paths
~FileSet.udfs
~FileSet.union_by_name

View File

@@ -0,0 +1,49 @@
smallpond.logical.dataset.JsonDataSet
=====================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: JsonDataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~JsonDataSet.__init__
~JsonDataSet.log
~JsonDataSet.merge
~JsonDataSet.partition_by_files
~JsonDataSet.reset
~JsonDataSet.sql_query_fragment
~JsonDataSet.to_arrow_table
~JsonDataSet.to_batch_reader
~JsonDataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~JsonDataSet.schema
~JsonDataSet.format
~JsonDataSet.max_object_size
~JsonDataSet.absolute_paths
~JsonDataSet.columns
~JsonDataSet.empty
~JsonDataSet.num_files
~JsonDataSet.num_rows
~JsonDataSet.paths
~JsonDataSet.recursive
~JsonDataSet.resolved_paths
~JsonDataSet.root_dir
~JsonDataSet.udfs
~JsonDataSet.union_by_name

View File

@@ -0,0 +1,46 @@
smallpond.logical.dataset.PandasDataSet
=======================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: PandasDataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PandasDataSet.__init__
~PandasDataSet.log
~PandasDataSet.merge
~PandasDataSet.partition_by_files
~PandasDataSet.reset
~PandasDataSet.sql_query_fragment
~PandasDataSet.to_arrow_table
~PandasDataSet.to_batch_reader
~PandasDataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~PandasDataSet.paths
~PandasDataSet.root_dir
~PandasDataSet.recursive
~PandasDataSet.columns
~PandasDataSet.absolute_paths
~PandasDataSet.empty
~PandasDataSet.num_files
~PandasDataSet.num_rows
~PandasDataSet.resolved_paths
~PandasDataSet.udfs
~PandasDataSet.union_by_name

View File

@@ -0,0 +1,54 @@
smallpond.logical.dataset.ParquetDataSet
========================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: ParquetDataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ParquetDataSet.__init__
~ParquetDataSet.create_from
~ParquetDataSet.load_partitioned_datasets
~ParquetDataSet.log
~ParquetDataSet.merge
~ParquetDataSet.partition_by_files
~ParquetDataSet.partition_by_rows
~ParquetDataSet.partition_by_size
~ParquetDataSet.remove_empty_files
~ParquetDataSet.reset
~ParquetDataSet.sql_query_fragment
~ParquetDataSet.to_arrow_table
~ParquetDataSet.to_batch_reader
~ParquetDataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~ParquetDataSet.generated_columns
~ParquetDataSet.absolute_paths
~ParquetDataSet.columns
~ParquetDataSet.empty
~ParquetDataSet.estimated_data_size
~ParquetDataSet.num_files
~ParquetDataSet.num_rows
~ParquetDataSet.paths
~ParquetDataSet.recursive
~ParquetDataSet.resolved_paths
~ParquetDataSet.resolved_row_ranges
~ParquetDataSet.root_dir
~ParquetDataSet.udfs
~ParquetDataSet.union_by_name

View File

@@ -0,0 +1,47 @@
smallpond.logical.dataset.PartitionedDataSet
============================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: PartitionedDataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PartitionedDataSet.__init__
~PartitionedDataSet.log
~PartitionedDataSet.merge
~PartitionedDataSet.partition_by_files
~PartitionedDataSet.reset
~PartitionedDataSet.sql_query_fragment
~PartitionedDataSet.to_arrow_table
~PartitionedDataSet.to_batch_reader
~PartitionedDataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~PartitionedDataSet.datasets
~PartitionedDataSet.absolute_paths
~PartitionedDataSet.columns
~PartitionedDataSet.empty
~PartitionedDataSet.num_files
~PartitionedDataSet.num_rows
~PartitionedDataSet.paths
~PartitionedDataSet.recursive
~PartitionedDataSet.resolved_paths
~PartitionedDataSet.root_dir
~PartitionedDataSet.udfs
~PartitionedDataSet.union_by_name

View File

@@ -0,0 +1,48 @@
smallpond.logical.dataset.SqlQueryDataSet
=========================================
.. currentmodule:: smallpond.logical.dataset
.. autoclass:: SqlQueryDataSet
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~SqlQueryDataSet.__init__
~SqlQueryDataSet.log
~SqlQueryDataSet.merge
~SqlQueryDataSet.partition_by_files
~SqlQueryDataSet.reset
~SqlQueryDataSet.sql_query_fragment
~SqlQueryDataSet.to_arrow_table
~SqlQueryDataSet.to_batch_reader
~SqlQueryDataSet.to_pandas
.. rubric:: Attributes
.. autosummary::
~SqlQueryDataSet.sql_query
~SqlQueryDataSet.query_builder
~SqlQueryDataSet.absolute_paths
~SqlQueryDataSet.columns
~SqlQueryDataSet.empty
~SqlQueryDataSet.num_files
~SqlQueryDataSet.num_rows
~SqlQueryDataSet.paths
~SqlQueryDataSet.recursive
~SqlQueryDataSet.resolved_paths
~SqlQueryDataSet.root_dir
~SqlQueryDataSet.udfs
~SqlQueryDataSet.union_by_name

View File

@@ -0,0 +1,39 @@
smallpond.logical.node.ArrowBatchNode
=====================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: ArrowBatchNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ArrowBatchNode.__init__
~ArrowBatchNode.add_perf_metrics
~ArrowBatchNode.create_task
~ArrowBatchNode.get_perf_stats
~ArrowBatchNode.process
~ArrowBatchNode.slim_copy
~ArrowBatchNode.spawn
~ArrowBatchNode.task_factory
.. rubric:: Attributes
.. autosummary::
~ArrowBatchNode.default_batch_size
~ArrowBatchNode.default_row_group_size
~ArrowBatchNode.default_secs_checkpoint_interval
~ArrowBatchNode.enable_resource_boost
~ArrowBatchNode.num_partitions

View File

@@ -0,0 +1,37 @@
smallpond.logical.node.ArrowComputeNode
=======================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: ArrowComputeNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ArrowComputeNode.__init__
~ArrowComputeNode.add_perf_metrics
~ArrowComputeNode.create_task
~ArrowComputeNode.get_perf_stats
~ArrowComputeNode.process
~ArrowComputeNode.slim_copy
~ArrowComputeNode.spawn
~ArrowComputeNode.task_factory
.. rubric:: Attributes
.. autosummary::
~ArrowComputeNode.default_row_group_size
~ArrowComputeNode.enable_resource_boost
~ArrowComputeNode.num_partitions

View File

@@ -0,0 +1,39 @@
smallpond.logical.node.ArrowStreamNode
======================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: ArrowStreamNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ArrowStreamNode.__init__
~ArrowStreamNode.add_perf_metrics
~ArrowStreamNode.create_task
~ArrowStreamNode.get_perf_stats
~ArrowStreamNode.process
~ArrowStreamNode.slim_copy
~ArrowStreamNode.spawn
~ArrowStreamNode.task_factory
.. rubric:: Attributes
.. autosummary::
~ArrowStreamNode.default_batch_size
~ArrowStreamNode.default_row_group_size
~ArrowStreamNode.default_secs_checkpoint_interval
~ArrowStreamNode.enable_resource_boost
~ArrowStreamNode.num_partitions

View File

@@ -0,0 +1,34 @@
smallpond.logical.node.ConsolidateNode
======================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: ConsolidateNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ConsolidateNode.__init__
~ConsolidateNode.add_perf_metrics
~ConsolidateNode.create_task
~ConsolidateNode.get_perf_stats
~ConsolidateNode.slim_copy
~ConsolidateNode.task_factory
.. rubric:: Attributes
.. autosummary::
~ConsolidateNode.enable_resource_boost
~ConsolidateNode.num_partitions

View File

@@ -0,0 +1,25 @@
smallpond.logical.node.Context
==============================
.. currentmodule:: smallpond.logical.node
.. autoclass:: Context
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~Context.__init__
~Context.create_duckdb_extension
~Context.create_external_module
~Context.create_function

View File

@@ -0,0 +1,6 @@
smallpond.logical.node.DataSetPartitionNode
===========================================
.. currentmodule:: smallpond.logical.node
.. autofunction:: DataSetPartitionNode

View File

@@ -0,0 +1,34 @@
smallpond.logical.node.DataSinkNode
===================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: DataSinkNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~DataSinkNode.__init__
~DataSinkNode.add_perf_metrics
~DataSinkNode.create_task
~DataSinkNode.get_perf_stats
~DataSinkNode.slim_copy
~DataSinkNode.task_factory
.. rubric:: Attributes
.. autosummary::
~DataSinkNode.enable_resource_boost
~DataSinkNode.num_partitions

View File

@@ -0,0 +1,34 @@
smallpond.logical.node.DataSourceNode
=====================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: DataSourceNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~DataSourceNode.__init__
~DataSourceNode.add_perf_metrics
~DataSourceNode.create_task
~DataSourceNode.get_perf_stats
~DataSourceNode.slim_copy
~DataSourceNode.task_factory
.. rubric:: Attributes
.. autosummary::
~DataSourceNode.enable_resource_boost
~DataSourceNode.num_partitions

View File

@@ -0,0 +1,40 @@
smallpond.logical.node.EvenlyDistributedPartitionNode
=====================================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: EvenlyDistributedPartitionNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~EvenlyDistributedPartitionNode.__init__
~EvenlyDistributedPartitionNode.add_perf_metrics
~EvenlyDistributedPartitionNode.create_consumer_task
~EvenlyDistributedPartitionNode.create_merge_task
~EvenlyDistributedPartitionNode.create_producer_task
~EvenlyDistributedPartitionNode.create_split_task
~EvenlyDistributedPartitionNode.create_task
~EvenlyDistributedPartitionNode.get_perf_stats
~EvenlyDistributedPartitionNode.slim_copy
~EvenlyDistributedPartitionNode.task_factory
.. rubric:: Attributes
.. autosummary::
~EvenlyDistributedPartitionNode.enable_resource_boost
~EvenlyDistributedPartitionNode.max_card_of_producers_x_consumers
~EvenlyDistributedPartitionNode.max_num_producer_tasks
~EvenlyDistributedPartitionNode.num_partitions

View File

@@ -0,0 +1,45 @@
smallpond.logical.node.HashPartitionNode
========================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: HashPartitionNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~HashPartitionNode.__init__
~HashPartitionNode.add_perf_metrics
~HashPartitionNode.create_consumer_task
~HashPartitionNode.create_merge_task
~HashPartitionNode.create_producer_task
~HashPartitionNode.create_split_task
~HashPartitionNode.create_task
~HashPartitionNode.get_perf_stats
~HashPartitionNode.slim_copy
~HashPartitionNode.task_factory
.. rubric:: Attributes
.. autosummary::
~HashPartitionNode.default_cpu_limit
~HashPartitionNode.default_data_partition_column
~HashPartitionNode.default_engine_type
~HashPartitionNode.default_memory_limit
~HashPartitionNode.default_row_group_size
~HashPartitionNode.enable_resource_boost
~HashPartitionNode.max_card_of_producers_x_consumers
~HashPartitionNode.max_num_producer_tasks
~HashPartitionNode.num_partitions

View File

@@ -0,0 +1,41 @@
smallpond.logical.node.LimitNode
================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: LimitNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~LimitNode.__init__
~LimitNode.add_perf_metrics
~LimitNode.create_merge_task
~LimitNode.create_task
~LimitNode.get_perf_stats
~LimitNode.slim_copy
~LimitNode.spawn
~LimitNode.task_factory
.. rubric:: Attributes
.. autosummary::
~LimitNode.default_cpu_limit
~LimitNode.default_memory_limit
~LimitNode.default_row_group_size
~LimitNode.enable_resource_boost
~LimitNode.max_udf_cpu_limit
~LimitNode.num_partitions
~LimitNode.oneline_query

View File

@@ -0,0 +1,40 @@
smallpond.logical.node.LoadPartitionedDataSetNode
=================================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: LoadPartitionedDataSetNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~LoadPartitionedDataSetNode.__init__
~LoadPartitionedDataSetNode.add_perf_metrics
~LoadPartitionedDataSetNode.create_consumer_task
~LoadPartitionedDataSetNode.create_merge_task
~LoadPartitionedDataSetNode.create_producer_task
~LoadPartitionedDataSetNode.create_split_task
~LoadPartitionedDataSetNode.create_task
~LoadPartitionedDataSetNode.get_perf_stats
~LoadPartitionedDataSetNode.slim_copy
~LoadPartitionedDataSetNode.task_factory
.. rubric:: Attributes
.. autosummary::
~LoadPartitionedDataSetNode.enable_resource_boost
~LoadPartitionedDataSetNode.max_card_of_producers_x_consumers
~LoadPartitionedDataSetNode.max_num_producer_tasks
~LoadPartitionedDataSetNode.num_partitions

View File

@@ -0,0 +1,30 @@
smallpond.logical.node.LogicalPlan
==================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: LogicalPlan
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~LogicalPlan.__init__
~LogicalPlan.explain_str
~LogicalPlan.graph
.. rubric:: Attributes
.. autosummary::
~LogicalPlan.nodes

View File

@@ -0,0 +1,36 @@
smallpond.logical.node.LogicalPlanVisitor
=========================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: LogicalPlanVisitor
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~LogicalPlanVisitor.__init__
~LogicalPlanVisitor.generic_visit
~LogicalPlanVisitor.visit
~LogicalPlanVisitor.visit_arrow_compute_node
~LogicalPlanVisitor.visit_arrow_stream_node
~LogicalPlanVisitor.visit_consolidate_node
~LogicalPlanVisitor.visit_data_sink_node
~LogicalPlanVisitor.visit_data_source_node
~LogicalPlanVisitor.visit_limit_node
~LogicalPlanVisitor.visit_partition_node
~LogicalPlanVisitor.visit_projection_node
~LogicalPlanVisitor.visit_python_script_node
~LogicalPlanVisitor.visit_query_engine_node
~LogicalPlanVisitor.visit_root_node
~LogicalPlanVisitor.visit_union_node

View File

@@ -0,0 +1,34 @@
smallpond.logical.node.Node
===========================
.. currentmodule:: smallpond.logical.node
.. autoclass:: Node
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~Node.__init__
~Node.add_perf_metrics
~Node.create_task
~Node.get_perf_stats
~Node.slim_copy
~Node.task_factory
.. rubric:: Attributes
.. autosummary::
~Node.enable_resource_boost
~Node.num_partitions

View File

@@ -0,0 +1,36 @@
smallpond.logical.node.NodeId
=============================
.. currentmodule:: smallpond.logical.node
.. autoclass:: NodeId
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~NodeId.__init__
~NodeId.as_integer_ratio
~NodeId.bit_length
~NodeId.conjugate
~NodeId.from_bytes
~NodeId.to_bytes
.. rubric:: Attributes
.. autosummary::
~NodeId.denominator
~NodeId.imag
~NodeId.numerator
~NodeId.real

View File

@@ -0,0 +1,39 @@
smallpond.logical.node.PandasBatchNode
======================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: PandasBatchNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PandasBatchNode.__init__
~PandasBatchNode.add_perf_metrics
~PandasBatchNode.create_task
~PandasBatchNode.get_perf_stats
~PandasBatchNode.process
~PandasBatchNode.slim_copy
~PandasBatchNode.spawn
~PandasBatchNode.task_factory
.. rubric:: Attributes
.. autosummary::
~PandasBatchNode.default_batch_size
~PandasBatchNode.default_row_group_size
~PandasBatchNode.default_secs_checkpoint_interval
~PandasBatchNode.enable_resource_boost
~PandasBatchNode.num_partitions

View File

@@ -0,0 +1,37 @@
smallpond.logical.node.PandasComputeNode
========================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: PandasComputeNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PandasComputeNode.__init__
~PandasComputeNode.add_perf_metrics
~PandasComputeNode.create_task
~PandasComputeNode.get_perf_stats
~PandasComputeNode.process
~PandasComputeNode.slim_copy
~PandasComputeNode.spawn
~PandasComputeNode.task_factory
.. rubric:: Attributes
.. autosummary::
~PandasComputeNode.default_row_group_size
~PandasComputeNode.enable_resource_boost
~PandasComputeNode.num_partitions

View File

@@ -0,0 +1,40 @@
smallpond.logical.node.PartitionNode
====================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: PartitionNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PartitionNode.__init__
~PartitionNode.add_perf_metrics
~PartitionNode.create_consumer_task
~PartitionNode.create_merge_task
~PartitionNode.create_producer_task
~PartitionNode.create_split_task
~PartitionNode.create_task
~PartitionNode.get_perf_stats
~PartitionNode.slim_copy
~PartitionNode.task_factory
.. rubric:: Attributes
.. autosummary::
~PartitionNode.enable_resource_boost
~PartitionNode.max_card_of_producers_x_consumers
~PartitionNode.max_num_producer_tasks
~PartitionNode.num_partitions

View File

@@ -0,0 +1,34 @@
smallpond.logical.node.ProjectionNode
=====================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: ProjectionNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~ProjectionNode.__init__
~ProjectionNode.add_perf_metrics
~ProjectionNode.create_task
~ProjectionNode.get_perf_stats
~ProjectionNode.slim_copy
~ProjectionNode.task_factory
.. rubric:: Attributes
.. autosummary::
~ProjectionNode.enable_resource_boost
~ProjectionNode.num_partitions

View File

@@ -0,0 +1,36 @@
smallpond.logical.node.PythonScriptNode
=======================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: PythonScriptNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~PythonScriptNode.__init__
~PythonScriptNode.add_perf_metrics
~PythonScriptNode.create_task
~PythonScriptNode.get_perf_stats
~PythonScriptNode.process
~PythonScriptNode.slim_copy
~PythonScriptNode.spawn
~PythonScriptNode.task_factory
.. rubric:: Attributes
.. autosummary::
~PythonScriptNode.enable_resource_boost
~PythonScriptNode.num_partitions

View File

@@ -0,0 +1,40 @@
smallpond.logical.node.RangePartitionNode
=========================================
.. currentmodule:: smallpond.logical.node
.. autoclass:: RangePartitionNode
.. automethod:: __init__
.. rubric:: Methods
.. autosummary::
~RangePartitionNode.__init__
~RangePartitionNode.add_perf_metrics
~RangePartitionNode.create_consumer_task
~RangePartitionNode.create_merge_task
~RangePartitionNode.create_producer_task
~RangePartitionNode.create_split_task
~RangePartitionNode.create_task
~RangePartitionNode.get_perf_stats
~RangePartitionNode.slim_copy
~RangePartitionNode.task_factory
.. rubric:: Attributes
.. autosummary::
~RangePartitionNode.enable_resource_boost
~RangePartitionNode.max_card_of_producers_x_consumers
~RangePartitionNode.max_num_producer_tasks
~RangePartitionNode.num_partitions

Some files were not shown because too many files have changed in this diff Show More