Wire Cell Interface to TBB flow graph

Note: TBB docs seem to be under constant churn, so expect links here to become dead.

1. Nodes

TBB nodes used:

source_node
takes a body object and an activation flag
function_node
takes a body object and a concurrency limit; input is buffered by default.
multifunction_node
templated on input and output types; the output type is a tbb::flow::tuple, which is a std::tuple. Takes a body which is given a collection of output ports into which it may put its actual output.
composite_node
N-to-M external ports encapsulating a sub-graph of internal nodes
join_node
queued input (by default), sends tuple
split_node
part of fanout
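
As a minimal sketch of how a couple of these node types are constructed and connected, using only the plain TBB API with a toy int payload (not WCT code):

  #include <tbb/flow_graph.h>
  #include <iostream>

  int main()
  {
      tbb::flow::graph g;

      // function_node: graph, concurrency limit, body object.
      tbb::flow::function_node<int, int> square(g, tbb::flow::unlimited,
          [](int x) { return x * x; });

      // A second function_node with serial concurrency.
      tbb::flow::function_node<int, int> sink(g, tbb::flow::serial,
          [](int x) { std::cout << x << "\n"; return x; });

      tbb::flow::make_edge(square, sink);

      for (int i = 0; i < 5; ++i) {
          square.try_put(i);
      }
      g.wait_for_all();
      return 0;
  }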

2. Policy

TBB flow_graph nodes follow policies as to their forwarding, buffering and reception.

Forwarding:

broadcast-push
the message will be pushed to as many successors as will accept the message. If no successor accepts the message, the fate of the message depends on the output buffering policy of the node.
single-push
if the message is accepted by a successor, no further push of that message will occur. This policy is unique to buffer_node, queue_node, priority_queue_node and sequencer_node. If no successor accepts the message, it will be retained for a possible future push or pull.

Buffering:

buffering
if no successor accepts a message, it is stored so subsequent node processing can use it. Nodes that buffer outputs have “yes” in the column “try_get()?” below.
discarding
if no successor accepts a message, it is discarded and has no further effect on graph execution. Nodes that discard outputs have “no” in the column “try_get()?” below.

Reception:

accept
the node will deal with as many messages as are pushed to it.
switch
the message is not accepted, and the state of the edge will change from push to pull mode.

The TBB documentation then gives a table that summarizes these policies for each node type.
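
A minimal sketch of the buffering policy, assuming only the standard TBB API: a buffer_node retains a message that no successor accepted, so it can later be pulled with try_get().

  #include <tbb/flow_graph.h>
  #include <cassert>

  int main()
  {
      tbb::flow::graph g;

      // buffer_node buffers its output ("try_get()? = yes"): with no
      // successor attached, a put message is retained inside the node.
      tbb::flow::buffer_node<int> buf(g);
      buf.try_put(42);
      g.wait_for_all();

      int value = 0;
      bool got = buf.try_get(value);   // pulls the retained message
      assert(got && value == 42);
      return 0;
  }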

3. Understanding the policy

Making sense of the policies requires, in part, understanding the TBB Message Passing Protocol. The link explains well the idea of an edge being in either “push” or “pull” mode. When in “push” mode the sender tries to “put” and, if that fails, the mode is set to “pull”. When in “pull” mode the receiver tries to “get” and, if that fails, the mode is set to “push”.

But, a change of mode is subject to the buffering policy. If a preceding node with:

  • forwarding = broadcast-push
  • buffering = discarding (try_get() = no)

shares an edge with a successor node with

  • reception = switch

data may be dropped. This will occur if the preceding node is any of:

  • function_node
  • multifunction_node
  • split_node

and the succeeding node is any of

  • function_node<rejecting>
  • multifunction_node<rejecting>

Here, <rejecting> is a non-default template argument for function nodes. The default <queueing> argument gives the function nodes an infinite input FIFO buffer. A consequence of using <queueing> is that RAM usage may grow without bound.
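
A minimal sketch of the data-dropping combination described above, assuming nothing beyond the standard TBB API: a discarding, broadcast-push producer feeds a serial function_node with the <rejecting> policy. While the consumer is busy, the producer’s pushes are rejected and, because the producer does not buffer, those messages are lost.

  #include <tbb/flow_graph.h>
  #include <atomic>
  #include <chrono>
  #include <iostream>
  #include <thread>

  int main()
  {
      using namespace tbb::flow;
      graph g;
      std::atomic<int> accepted{0};

      // Producer: broadcast-push forwarding, discarding buffering.
      function_node<int, int> producer(g, unlimited,
          [](int x) { return x; });

      // Consumer: serial concurrency with the non-default rejecting policy.
      function_node<int, int, rejecting> consumer(g, serial,
          [&accepted](int x) {
              std::this_thread::sleep_for(std::chrono::milliseconds(10));
              ++accepted;
              return x;
          });

      make_edge(producer, consumer);

      for (int i = 0; i < 100; ++i) {
          producer.try_put(i);
      }
      g.wait_for_all();

      // Typically far fewer than 100 messages survive.
      std::cout << "consumer accepted " << accepted << " of 100\n";
      return 0;
  }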

4. Out-of-order

4.1. Concurrency

The TBB split_node is used in part to execute WCT’s IFanout nodes. It trails a function_node with a WCT body which is responsible for producing a tuple that is passed to the split_node, which sends each tuple element out a corresponding port. The split_node has discarding and broadcast-push policies and also unlimited concurrency. This concurrency means that split_node is a source of potentially out-of-order (OOO) data. To correct that, each split_node output port is followed by a sequencer_node which reorders based on a sequence number added by the original function_node calling the WCT fanout body. Because this seqno violates the WCT data model, a final function_node follows each sequencer_node to strip off the seqno. So, each N-way fanout is really a 2*(N+1)-node subgraph, sketched below.
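
Roughly, the per-fanout subgraph looks like the following sketch, with an int payload standing in for WCT’s boost::any and a 2-way fanout for brevity. Names are illustrative, not taken from the WCT code.

  #include <tbb/flow_graph.h>
  #include <tuple>
  #include <utility>

  using namespace tbb::flow;

  // Sequence-tagged message: (seqno, payload).
  using Msg = std::pair<size_t, int>;

  int main()
  {
      graph g;

      // Fanout body: tags each copy of the payload with a seqno it maintains.
      size_t seqno = 0;
      function_node<int, std::tuple<Msg, Msg>> fanout(g, serial,
          [&seqno](int x) {
              auto s = seqno++;
              return std::make_tuple(Msg{s, x}, Msg{s, x});
          });

      // split_node sends each tuple element out its own port; its unlimited
      // concurrency is what makes downstream order unreliable.
      split_node<std::tuple<Msg, Msg>> splitter(g);

      // One sequencer_node per output port restores order by seqno.
      sequencer_node<Msg> seq0(g, [](const Msg& m) { return m.first; });
      sequencer_node<Msg> seq1(g, [](const Msg& m) { return m.first; });

      // Final function_nodes strip the seqno before data continues on.
      function_node<Msg, int> strip0(g, serial, [](const Msg& m) { return m.second; });
      function_node<Msg, int> strip1(g, serial, [](const Msg& m) { return m.second; });

      make_edge(fanout, splitter);
      make_edge(output_port<0>(splitter), seq0);
      make_edge(output_port<1>(splitter), seq1);
      make_edge(seq0, strip0);
      make_edge(seq1, strip1);

      for (int i = 0; i < 5; ++i) {
          fanout.try_put(i);
      }
      g.wait_for_all();
      return 0;
  }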

4.2. Buffering

Despite the TBB documentation saying function_node has a FIFO queue at its input, it behaves as if it has an unordered buffer. This has been noticed by others and is awaiting a response from Intel. After fixing the above concurrency-related OOO, this can be observed in a WCT graph between two function_nodes of serial concurrency.

The trial solution is to bolt on a sequencer_node after each function_node and similar nodes. This requires fully redefining the data type used at the TBB node level. Previously it was the same type used by WCT nodes (boost::any); now that is combined with a sequence number into a std::pair<size_t, boost::any>. Node bodies strip and discard any input seqno prior to passing the boost::any to the WCT node input. Node bodies also maintain a seqno, incremented on each call, which is combined with the WCT node output to form the TBB-level output. A sketch of this wrapping is given below.
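
A sketch of that wrapping, assuming only the standard TBB and Boost APIs; the body here simply passes the payload through where the real code would invoke a WCT node.

  #include <tbb/flow_graph.h>
  #include <boost/any.hpp>
  #include <string>
  #include <utility>

  using namespace tbb::flow;

  // TBB-level message type: sequence number plus the WCT-level payload.
  using msg_t = std::pair<size_t, boost::any>;

  int main()
  {
      graph g;

      // Body: strip and discard the input seqno, act on the bare boost::any,
      // then tag the output with this node's own incrementing counter.
      size_t out_seqno = 0;
      function_node<msg_t, msg_t> work(g, serial,
          [&out_seqno](const msg_t& in) -> msg_t {
              const boost::any& payload = in.second;   // input seqno dropped
              boost::any result = payload;             // stand-in for the WCT call
              return {out_seqno++, result};
          });

      // The bolted-on sequencer_node restores order from the output seqno.
      sequencer_node<msg_t> seq(g, [](const msg_t& m) { return m.first; });

      make_edge(work, seq);

      work.try_put({0, boost::any(std::string("frame"))});
      g.wait_for_all();
      return 0;
  }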

Author: Brett Viren

Created: 2023-05-03 Wed 11:39
