mirror of
https://github.com/deepseek-ai/smallpond
synced 2025-06-26 18:27:45 +00:00
init
This commit is contained in:
89
docs/source/api/nodes.rst
Normal file
89
docs/source/api/nodes.rst
Normal file
@@ -0,0 +1,89 @@
|
||||
.. currentmodule:: smallpond.logical.node
|
||||
|
||||
.. _nodes:
|
||||
|
||||
Nodes
|
||||
=====
|
||||
|
||||
Nodes represent the fundamental building blocks of a data processing pipeline. Each node encapsulates a specific operation or transformation that can be applied to a dataset.
|
||||
Nodes can be chained together to form a logical plan, which is a directed acyclic graph (DAG) of nodes that represent the overall data processing workflow.
|
||||
|
||||
A typical workflow to create a logical plan is as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Create a global context
|
||||
ctx = Context()
|
||||
|
||||
# Create a dataset
|
||||
dataset = ParquetDataSet("path/to/dataset/*.parquet")
|
||||
|
||||
# Create a data source node
|
||||
node = DataSourceNode(ctx, dataset)
|
||||
|
||||
# Partition the data
|
||||
node = DataSetPartitionNode(ctx, (node,), npartitions=2)
|
||||
|
||||
# Create a SQL engine node to transform the data
|
||||
node = SqlEngineNode(ctx, (node,), "SELECT * FROM {0}")
|
||||
|
||||
# Create a logical plan from the root node
|
||||
plan = LogicalPlan(ctx, node)
|
||||
|
||||
You can then create tasks from the logical plan, see :ref:`tasks`.
|
||||
|
||||
Notable properties of Node:
|
||||
|
||||
1. Nodes are partitioned. Each Node generates a series of tasks, with each task processing one partition of data.
|
||||
2. The input and output of a Node are a series of partitioned Datasets. A Node may write data to shared storage and return a new Dataset, or it may simply recombine the input Datasets.
|
||||
|
||||
Context
|
||||
-------
|
||||
|
||||
.. autosummary::
|
||||
:toctree: ../generated
|
||||
|
||||
Context
|
||||
NodeId
|
||||
|
||||
LogicalPlan
|
||||
-----------
|
||||
|
||||
.. autosummary::
|
||||
:toctree: ../generated
|
||||
|
||||
LogicalPlan
|
||||
LogicalPlanVisitor
|
||||
.. Planner
|
||||
|
||||
Nodes
|
||||
-----
|
||||
|
||||
.. autosummary::
|
||||
:toctree: ../generated
|
||||
|
||||
Node
|
||||
DataSetPartitionNode
|
||||
ArrowBatchNode
|
||||
ArrowComputeNode
|
||||
ArrowStreamNode
|
||||
ConsolidateNode
|
||||
DataSinkNode
|
||||
DataSourceNode
|
||||
EvenlyDistributedPartitionNode
|
||||
HashPartitionNode
|
||||
LimitNode
|
||||
LoadPartitionedDataSetNode
|
||||
PandasBatchNode
|
||||
PandasComputeNode
|
||||
PartitionNode
|
||||
ProjectionNode
|
||||
PythonScriptNode
|
||||
RangePartitionNode
|
||||
RepeatPartitionNode
|
||||
RootNode
|
||||
ShuffleNode
|
||||
SqlEngineNode
|
||||
UnionNode
|
||||
UserDefinedPartitionNode
|
||||
UserPartitionedDataSourceNode
|
||||
Reference in New Issue
Block a user