4. Architecture
- compute
- computation
- phase
The definition & execution of networked operations is split in 1+2 phases:
… it is constrained by these IO data-structures:
… populates these low-level data-structures:
network graph (COMPOSE time)
execution dag (COMPILE time)
execution steps (COMPILE time)
solution (EXECUTE time)
… and utilizes these main classes:
graphtik.op.FunctionalOperation([fn, name, …]): An operation performing a callable (i.e. a function, a method, a lambda).
graphtik.netop.NetworkOperation(operations, …): An operation that can compute a network-graph of operations.
graphtik.network.Network(*operations[, graph]): A graph of operations that can compile an execution plan.
ExecutionPlan: A pre-compiled list of operation steps that can execute for the given inputs/outputs.
graphtik.network.Solution(plan, input_values): A chain-map collecting solution outputs and execution state (e.g. overwrites).
- compose
- composition
The phase where operations are constructed and grouped into netops and corresponding networks.
Tip
Use the operation() factory to construct FunctionalOperation instances.
Use the compose() factory to prepare the net internally, and build NetworkOperation instances.
- compile
- compilation
The phase where the Network creates a new execution plan by pruning all graph nodes into a subgraph dag, and deriving the execution steps.
- execute
- execution
- sequential
The phase where the ExecutionPlan calls the underlying functions of all operations contained in execution steps, with inputs/outputs taken from the solution.
Currently there are 2 ways to execute:
sequential
parallel, with a multiprocessing.pool.Pool
Plans may abort their execution by setting the abort run global flag.
- net
- network
The Network contains a graph of operations and can compile (and cache) execution plans, or prune a cloned network for given inputs/outputs/node predicate.
- plan
- execution plan
The ExecutionPlan class performs the execution phase; it contains the dag and the steps.
Compiled execution plans are cached in Network._cached_plans across runs, with (inputs, outputs, predicate) as key.
- solution
A Solution instance created internally by NetworkOperation.compute() to hold the values of both inputs & outputs, and the status of executed operations. It is based on a collections.ChainMap, to keep one dictionary for each operation executed, +1 for inputs.
The results of the last operation executed “win” in the outputs produced, and the base (least precedence) is the inputs given when the execution started.
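The layering described above can be illustrated with the standard library alone. This is a minimal sketch, not graphtik's Solution class: the helper record_outputs() and the operation names in the comments are hypothetical, and only collections.ChainMap is real API.

```python
from collections import ChainMap

# Base layer: the inputs given when execution started (least precedence).
inputs = {"a": 1, "b": 2}
solution = ChainMap(inputs)


def record_outputs(solution, outputs):
    """Return a new ChainMap with `outputs` layered on top of the older maps."""
    return solution.new_child(dict(outputs))


# Two hypothetical operations executed in order; each gets its own layer.
solution = record_outputs(solution, {"c": 3})           # op1 provides "c"
solution = record_outputs(solution, {"c": 30, "d": 4})  # op2 writes "c" again

final_c = solution["c"]      # the last operation executed "wins"
layers = len(solution.maps)  # one dict per executed operation, +1 for inputs

# Names present in more than one layer, newest value first.
overwrites = {
    name: [m[name] for m in solution.maps if name in m]
    for name in solution
    if sum(name in m for m in solution.maps) > 1
}
```

Because lookups search the newest layer first, no value is ever destroyed by a later write; the older values stay reachable in the lower layers, which is what makes reporting overwrites possible.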
- graph
- network graph
A graph of operations linked by their dependencies forming a pipeline.
The Network.graph (currently a DAG) contains all FunctionalOperation and data-nodes (string or modifier) of a netop.
They are laid out and connected by repeated calls of Network._append_operation() by the Network constructor during composition.
This graph is then pruned to extract the dag, and the execution steps are calculated, all ingredients for a new ExecutionPlan.
- prune
- pruning
A subphase of compilation performed by method Network._prune_graph(), which extracts a subgraph dag that does not contain any unsatisfied operations.
It topologically sorts the graph, and prunes based on given inputs, asked outputs, node predicate and operation needs & provides.
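As a rough illustration of pruning by reachability, the sketch below keeps only the operations that can be satisfied from the given inputs and that contribute to the asked outputs. It is an assumption-laden simplification, not the code of Network._prune_graph(): the Op tuple and prune() function are hypothetical, and the node predicate is left out.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Hypothetical stand-in for an operation: a name plus its dependencies."""
    name: str
    needs: tuple
    provides: tuple


def prune(ops, inputs, outputs):
    """Return names of operations that are both satisfiable and useful."""
    # Forward pass: an operation is satisfiable when all its needs are
    # given inputs or provided by an already satisfiable operation.
    # Appending in this order yields a topological ordering.
    available = set(inputs)
    satisfied = []
    changed = True
    while changed:
        changed = False
        for op in ops:
            if op not in satisfied and set(op.needs) <= available:
                satisfied.append(op)
                available |= set(op.provides)
                changed = True

    # Backward pass: walk from the asked outputs toward the inputs and
    # keep only operations whose provides are actually wanted.
    wanted = set(outputs)
    kept = []
    for op in reversed(satisfied):
        if wanted & set(op.provides):
            kept.append(op)
            wanted |= set(op.needs)
    return [op.name for op in reversed(kept)]


ops = [
    Op("add", ("a", "b"), ("ab",)),
    Op("scale", ("ab",), ("out",)),
    Op("log", ("missing",), ("logged",)),  # its need is unreachable
    Op("unused", ("a",), ("extra",)),      # nobody asks for "extra"
]
pruned = prune(ops, inputs=["a", "b"], outputs=["out"])
```

Here "log" is dropped because one of its needs can never be produced, and "unused" is dropped because its provides are not needed for the asked outputs.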
- unsatisfied operation
The core of pruning & rescheduling, performed by the network._unsatisfied_operations() function, which collects all operations with unreachable dependencies:
- dag
- execution dag
- solution dag
There are 2 directed-acyclic-graph instances used:
the ExecutionPlan.dag, in the execution plan, which contains the pruned nodes, used to decide the execution steps;
the Solution.dag, in the solution, which derives the canceled operations due to rescheduled/failed operations upstream.
- steps
- execution steps
The plan contains a list of the operation-nodes only from the dag, topologically sorted, and interspersed with instruction steps needed to compute the asked outputs from the given inputs.
They are built by Network._build_execution_steps() based on the subgraph dag.
The only instruction step is for performing evictions.
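To make the idea of operation steps interspersed with eviction instructions concrete, here is a small sketch. It assumes a simplified rule (evict a value right after the last operation that reads it, unless the user asked for it); build_steps() and the ("evict", name) tuples are hypothetical and only stand in for what Network._build_execution_steps() and _EvictInstruction do.

```python
def build_steps(ordered_ops, outputs):
    """Interleave operation names with eviction markers.

    `ordered_ops` is a topologically sorted list of (name, needs, provides).
    """
    # Index of the last operation that reads each value.
    last_use = {}
    for i, (_, needs, _) in enumerate(ordered_ops):
        for need in needs:
            last_use[need] = i

    steps = []
    for i, (name, needs, provides) in enumerate(ordered_ops):
        steps.append(name)
        # dict.fromkeys() de-duplicates while preserving order.
        for value in dict.fromkeys((*needs, *provides)):
            # Evict values the user did not ask for, once no later
            # operation reads them (never-read provides go immediately).
            if value not in outputs and last_use.get(value, i) == i:
                steps.append(("evict", value))
    return steps


ordered_ops = [
    ("add", ("a", "b"), ("ab",)),
    ("scale", ("ab", "a"), ("out",)),
]
steps = build_steps(ordered_ops, outputs={"out"})
```

Since everything needed to decide an eviction is known from the dag and the asked outputs, the markers can be computed once at compile time and merely replayed during execution.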
- evictions
A memory footprint optimization where intermediate inputs & outputs are erased from solution as soon as they are not needed further down the dag.
Evictions are pre-calculated during compilation, where _EvictInstruction steps are inserted in the execution plan.
- overwrites
Values in the solution that have been written by more than one operation, accessed by Solution.overwrites. Note that solution sideffect dependencies produce, almost always, overwrites.
- inputs
The named input values that are fed into an operation (or netop) through the Operation.compute() method according to its needs.
These values are either:
given by the user to the outer netop, at the start of a computation, or
derived from solution using needs as keys, during intermediate execution.
- outputs
The dictionary of computed values returned by an operation (or a netop) matching its provides, when method Operation.compute() is called.
Those values are either:
retained in the solution, internally during execution, keyed by the respective provide, or
returned to user after the outer netop has finished computation.
When no specific outputs are requested from a netop, NetworkOperation.compute() returns all intermediate inputs along with the outputs, that is, no evictions happen.
An operation may return partial outputs.
- netop
- network operation
- pipeline
The NetworkOperation class holding a network of operations and dependencies.
- operation
Either the abstract notion of an action with specified needs and provides, dependencies, or the concrete wrapper FunctionalOperation for any callable(), that feeds on inputs and updates outputs, from/to solution, or given-by/returned-to the user by a netop.
The distinction between needs/provides and inputs/outputs is akin to function parameters and arguments during define-time and run-time, respectively.
- dependency
The name of a solution value an operation needs or provides.
Dependencies are declared during composition, when building FunctionalOperation instances. Operations are then interlinked together, by matching the needs & provides of all operations contained in a pipeline.
During compilation the graph is then pruned based on the reachability of the dependencies.
During execution Operation.compute() performs 2 “matchings”:
inputs & outputs in solution are accessed by the needs & provides names of the operations;
operation needs & provides are zipped against the underlying function’s arguments and results.
These matchings are affected by modifiers.
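The two matchings can be sketched in a few lines of plain Python. This is an illustrative simplification with no modifiers involved, not the body of Operation.compute(): the compute() helper below is hypothetical, and wrapping a lone result into a 1-element list is an assumption of this sketch.

```python
def compute(fn, needs, provides, solution):
    """Call `fn` with values looked up by `needs`; name results by `provides`."""
    # Matching 1: fetch input values from the solution by the `needs` names,
    # and hand them to the function in the same positional order.
    args = [solution[name] for name in needs]
    results = fn(*args)

    # Matching 2: zip the function's results against the `provides` names.
    if len(provides) == 1:
        results = [results]  # assumption: a single provide gets the raw result
    return dict(zip(provides, results))


def div_mod(a, b):
    return a // b, a % b


solution = {"a": 17, "b": 5, "unrelated": None}
outputs = compute(div_mod, needs=["a", "b"], provides=["quot", "rem"], solution=solution)
single = compute(abs, needs=["a"], provides=["abs_a"], solution=solution)
```

Note that the function itself never sees the dependency names; renaming "quot" in the provides changes where the value lands in the solution without touching div_mod().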
- needs
- fn_needs
The list of dependency names an operation requires from solution as inputs, roughly corresponding to underlying function’s arguments (fn_needs).
Specifically, Operation.compute() extracts input values from solution by these names, and matches them against function arguments, mostly by their positional order. Whenever this matching is not 1-to-1, and function-arguments differ from the regular needs, modifiers must be used.
- provides
- op_provides
- fn_provides
The list of dependency names an operation writes to the solution as outputs, roughly corresponding to underlying function’s results (fn_provides).
Specifically, Operation.compute() “zips” this list-of-names with the output values produced when the operation’s function is called. Whenever this “zipping” is not 1-to-1, and function-results differ from the regular operation (op_provides) (or results are not a list), it is possible to:
mark the operation that its function returns dictionary,
artificially extend the provides with aliased fn_provides, or
use modifiers to annotate certain names as sideffects.
- alias
Map an existing name in fn_provides into a duplicate, artificial one in op_provides.
You cannot alias an alias. See Aliased provides.
- returns dictionary
When an operation is marked with this flag, the underlying function is not expected to return fn_provides as a sequence but as a dictionary; hence, no “zipping” of function-results –> op_provides takes place.
Useful for operations returning partial outputs.
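The difference between zipped results, dictionary results and aliases can be sketched as follows. The collect_outputs() helper and its parameters are hypothetical, chosen only to mirror the terms above; graphtik's own handling (validation, error reporting) is not reproduced here.

```python
def collect_outputs(results, provides, returns_dict=False, aliases=()):
    """Turn a function's results into named outputs (illustrative only)."""
    if returns_dict:
        # The function names its own results, so no zipping takes place;
        # it may also return just a subset of the provides.
        outputs = dict(results)
    else:
        outputs = dict(zip(provides, results))
    # An alias duplicates an existing output under an additional name.
    for source, alias in aliases:
        if source in outputs:
            outputs[alias] = outputs[source]
    return outputs


zipped = collect_outputs((3, 2), provides=["quot", "rem"])
partial = collect_outputs(
    {"quot": 3},                  # "rem" was not produced this time
    provides=["quot", "rem"],
    returns_dict=True,
    aliases=[("quot", "q")],
)
```

With a sequence result a missing value would shift or truncate the zip silently, which is why a dictionary is the more natural shape when only some outputs may be produced.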
- modifier
Annotations on a dependency such as optionals & sideffects (see the graphtik.modifiers module).
- optionals
A modifier applied on needs only dependencies, corresponding to either:
function arguments-with-defaults (annotated with optional), or
that do not hinder execution of the operation if absent from inputs.
- sideffects
A modifier denoting a fictive dependency linking operations into virtual flows,
without real data exchanges.
A sideffect is a dependency denoting a modification to some internal state that may not be fully represented in the graph & solution. Sideffects participate in the compilation of the graph, and a dummy value gets written in the solution during execution, but they are never given/asked to/from functions.
There are actually 2 relevant modifiers:
An abstract sideffect (annotated with the sideffect modifier) describing modifications taking place beyond the scope of the solution.
The solution sideffect (annotated with the sol_sideffect modifier) denoting modifications on dependencies that are read and written in solution.
Attention
Sideffects are not compatible with optionals and partial outputs.
- solution sideffect
- sideffected
A modifier that denotes sideffects on a dependency that exists in solution, …
allowing to declare an operation that both needs and provides that sideffected dependency.
All solution sideffect outputs produce, by definition, overwrites. It is annotated with the sol_sideffect class.
- reschedule
- rescheduling
- partial outputs
- partial operation
- canceled operation
The partial pruning of the solution’s dag during execution. It happens when any of these 2 conditions apply:
an operation is marked with the FunctionalOperation.rescheduled attribute, which means that its underlying callable may produce only a subset of its provides (partial outputs);
endurance is enabled, either globally (in the configurations), or for a specific operation.
The solution must then reschedule the remaining operations downstream, and possibly cancel some of those (assigned in Solution.canceled).
Operations with partial outputs are incompatible with solution sideffects, i.e. they cannot control which of their sideffects they have produced; it’s either all or nothing.
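A minimal sketch of how downstream operations might get canceled after an upstream operation produced only part of its provides. The find_canceled() function is hypothetical and works on plain tuples; it only illustrates the reachability reasoning, not what Solution does internally.

```python
def find_canceled(pending_ops, available):
    """Return names of pending operations that can no longer run.

    `pending_ops` is a topologically sorted list of (name, needs, provides);
    `available` holds the names of values currently in the solution.
    """
    available = set(available)
    canceled = []
    for name, needs, provides in pending_ops:
        if set(needs) <= available:
            # Still runnable: assume it will deliver all its provides.
            available |= set(provides)
        else:
            canceled.append(name)
    return canceled


# A hypothetical "split" operation provides ("left", "right"),
# but this time it returned only "left".
pending = [
    ("use_left", ("left",), ("l2",)),
    ("use_right", ("right",), ("r2",)),
    ("merge", ("l2", "r2"), ("out",)),
]
canceled = find_canceled(pending, available={"x", "left"})
```

The cancellation cascades: "merge" is canceled not because of its own failure but because one of its needs depends on the canceled "use_right".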
- endurance
- endured
Keep executing as many operations as possible, even if some of them fail. Endurance for an operation is enabled if set_endure_operations() is true globally in the configurations or if FunctionalOperation.endured is true.
You may interrogate Solution.executed to discover the status of each executed operation, or call one of check_if_incomplete() or scream_if_incomplete().
- predicate
- node predicate
A callable(op, node-data) that should return true for nodes to be included in graph during compilation.
- abort run
A global configurations flag that, when set with the abort_run() function, halts the execution of all currently or future plans.
It is reset automatically on every call of NetworkOperation.compute() (after a successful intermediate compilation), or manually, by calling reset_abort().
- parallel
- parallel execution
- execution pool
- task
Execute operations in parallel, with a thread pool or process pool (instead of sequential). Operations and netops are marked as such on construction, or enabled globally from configurations.
Note that sideffects are not expected to function with process pools, certainly not when marshalling is enabled.
- process pool
When the multiprocessing.pool.Pool class is used for parallel execution, the tasks must be communicated to/from the worker process, which requires pickling, and that may fail. With pickling failures you may try marshalling with the dill library, and see if that helps.
Note that sideffects are not expected to function at all, certainly not when marshalling is enabled.
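A typical pickling failure is easy to reproduce with the standard library: importable functions are pickled by reference, whereas a lambda has no importable name. The can_pickle() helper below is a hypothetical name used only for this demonstration.

```python
import operator
import pickle


def can_pickle(obj):
    """Report whether `obj` survives the standard pickle module."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


importable_ok = can_pickle(operator.mul)    # pickled by module + name
lambda_ok = can_pickle(lambda x, y: x * y)  # no importable name to refer to
```

Operations wrapping lambdas or locally defined functions are therefore likely candidates for such failures in a process pool, and this is the kind of object that dill is designed to serialize.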
- thread pool
When the multiprocessing.dummy.Pool() class is used for parallel execution, the tasks are run in process, so no marshalling is needed.
- marshalling
Pickling parallel operations and their inputs/outputs using the dill module. It is configured either globally with set_marshal_tasks() or set with a flag on each operation / netop.
Note that sideffects do not work when this is enabled.
- plottable
Objects that can plot their graph network, such as those inheriting Plottable (FunctionalOperation, NetworkOperation, Network, ExecutionPlan, Solution), or a pydot.Dot instance (the result of the Plottable.plot() method).
Such objects may render as SVG in Jupyter notebooks (through their plot() method) and can render in a Sphinx site with the graphtik RST directive. You may control the rendered image as explained in the tip of the Plotting section.
SVGs are rendered with the zoom-and-pan javascript library.
Attention
Zoom-and-pan does not work in Sphinx sites for Chrome locally - serve the HTML files through some HTTP server, e.g. launch this command to view the site of this project:
python -m http.server 8080 --directory build/sphinx/html/
- plotter
A Plotter is responsible for rendering plottables as images. It is the active plotter that does that, unless overridden in a Plottable.plot() call. Plotters can be customized by various means, such as a plot theme.
- active plotter
- default active plotter
The plotter currently installed “in-context” of the respective graphtik configuration - this term implies also any Plot customizations done on the active plotter (such as plot theme).
Installation happens by calling one of the active_plotter_plugged() or set_active_plotter() functions.
The default active plotter is the plotter instance that this project comes pre-configured with, i.e. when no plot-customizations have yet happened.
- plot theme
- theme expansion
The mergeable and auto-expandable attributes of plot.Theme instances in use.
The actual theme in-use is the Plotter.default_theme attribute of the active plotter, unless overridden with the theme parameter when calling Plottable.plot() (conveyed internally as the value of the PlotArgs.theme attribute).
The following expansions apply in the attribute-values of Theme instances:
Any lists will be merged (important for multi-valued Graphviz attributes like style).
Any Ref instances will be resolved against the attributes of the current theme.
Any jinja2 templates will be rendered, using as template-arguments all the attributes of the plot_args instance in use.
Attention
All Theme class attributes are deep-copied when constructing new instances, to avoid modifying them by mistake, while attempting to update instance attributes instead (hint: almost all its attributes are containers i.e. dicts).
Therefore it is recommended to use other means for Plot customizations instead of modifying directly theme’s class-attributes.
- configurations
- graphtik configuration
The functions controlling compile & execution globally are defined in the config module, and +1 in the graphtik.plot module; the underlying global data are stored in contextvars.ContextVar instances, to allow for nested control.
All boolean configuration flags are tri-state (None, False, True), allowing to “force” all operations, when they are not set to the None value. All of them default to None (false).
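The combination of ContextVar storage and tri-state flags can be sketched as below. All names here (_endure, endure_plugged(), is_endured()) are hypothetical stand-ins invented for the illustration, not functions of the config module; only contextvars and contextlib are real API.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical flag standing in for one of the global configurations.
_endure = ContextVar("endure", default=None)


@contextmanager
def endure_plugged(enabled):
    """Set the flag inside a `with` block and restore the old value after."""
    token = _endure.set(enabled)
    try:
        yield
    finally:
        _endure.reset(token)


def is_endured(op_flag):
    """A non-None global flag "forces" all operations either way."""
    forced = _endure.get()
    return bool(op_flag) if forced is None else bool(forced)


default_case = is_endured(False)       # flag unset: the operation decides
with endure_plugged(True):
    forced_on = is_endured(False)      # forced on, despite the operation
    with endure_plugged(False):
        forced_off = is_endured(True)  # nested override forces it off
    restored_inner = is_endured(False) # back to the outer setting
restored_outer = _endure.get()         # back to the None default
```

The token returned by ContextVar.set() is what makes nesting safe: each block restores exactly the value it replaced, regardless of how many levels are active.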