Pgrapher Configuration Improvements
A powerful new idiom has been developed to simplify developing Jsonnet configuration files for Wire-Cell toolkit's Pgrapher app component.
The previous post described the Pgrapher execution engine. It included a short configuration snippet. Astute readers who studied that example may conclude that the power and flexibility of Pgrapher come at the cost of writing a lot of configuration code.
In part, they'd be right. It is actually fairly straightforward to configure a simple job, especially if one has a single target in mind. However, the power of Jsonnet and the flexibility of Pgrapher practically beg one to construct modular "chunks" of configuration that can be reused to build up a variety of final targets. This means the configuration author needs a way to encapsulate incomplete (sub) graphs and aggregate these into larger graphs until a fully complete graph can be built from the smaller parts.
Graphs of Nodes and Nodes of Graphs
To jump to the punch line, the idiom that has been developed allows an incomplete (sub) graph to be recast into a single "node" which may then be used in a larger graph. This recasting may continue as one builds up the graph until one is left with a single node which "wraps" the entire graph. With this abstraction the author may focus on one area of the graph, for example the simulation and the parts therein. The results may then be exposed as a few high-level nodes which hide the details of the sub graph they represent.
What follows are details on how to use this new idiom. It starts with some review of how to configure a Pgrapher "manually" and then goes into the new methods.
Pgrapher Graph
As hinted in the previous post, a Pgrapher graph is ultimately built up of edges which connect nodes via their ports. For example, two edges can be written as:
{
    tail: { node: wc.tn(fanout), port: 0 },
    head: { node: wc.tn(sink1) },
},
{
    tail: { node: wc.tn(fanout), port: 1 },
    head: { node: wc.tn(sink2) },
},
Here, fanout, sink1, etc., are configuration objects. They are what make up the final configuration sequence that the main Jsonnet file must produce. Their contents follow an object schema as described in the WCT manual. The wc.tn() Jsonnet function (provided by wirecell.jsonnet) processes an object and returns its canonical label. Not all configuration objects correspond to a WCT INode, but the ones used in a graph must. As is also shown, a node may have multiple input or output ports, and edges that do not connect to port 0 must have an explicit port attribute set.
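To make the pieces concrete, here is a minimal, self-contained sketch of the two edges together with the objects they refer to. It does not import wirecell.jsonnet: the local tn() function is only a stand-in for wc.tn() and assumes the canonical label has the form "type:name" (or just "type" when no name is given). The type names MyFanout and MySink are made up for illustration.

```jsonnet
// Stand-in for wc.tn(); the real function lives in wirecell.jsonnet.
// Assumption: the canonical label is "type:name", or "type" if unnamed.
local tn(obj) =
  if std.objectHas(obj, "name") then obj.type + ":" + obj.name else obj.type;

// Illustrative configuration objects (hypothetical type names).
local fanout = { type: "MyFanout", name: "fan", data: {} };
local sink1 = { type: "MySink", name: "one", data: {} };
local sink2 = { type: "MySink", name: "two", data: {} };

{
  // Edges refer to nodes by label; port 0 is implied when omitted.
  edges: [
    { tail: { node: tn(fanout), port: 0 }, head: { node: tn(sink1) } },
    { tail: { node: tn(fanout), port: 1 }, head: { node: tn(sink2) } },
  ],
  // The objects themselves must still end up in the configuration sequence.
  sequence: [fanout, sink1, sink2],
}
```

Note how the edges and the objects are tracked separately here; keeping the two in sync by hand is the bookkeeping that becomes a burden as graphs grow.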
As is, this is all that one needs to know to build up Pgrapher graphs. Now consider wanting to take these two edges and use them to build a variety of different graphs. Besides arranging for this snippet to be used in whatever Jsonnet files need it, you must also keep track of the fanout, sink1, etc. objects and make sure they get placed into the final configuration sequence. The user of your bit of configuration may also want to break one of your edges so that they can insert some other nodes between, say, fanout and sink1.
Pnodes
To provide more flexibility a Jsonnet library pgraph.jsonnet has been developed. It hinges on the concept of a pnode. A pnode is an abstract node which represents ("wraps") either a single INode configuration object as described above or a number of other pnode objects.
Regardless of what is "inside" a pnode it still looks and acts like a single node. It may contain an arbitrarily complex (sub) graph but all the user must care about is what input and output ports it presents. Thus a pnode presents the same level of complexity (simplicity) as an individual INode even if it has a very complex internal representation.
At the same time, a pnode need not be treated as a complete "black box". One thing it holds is a list of internal edges, and it is possible to derive a new pnode whereby the user breaks an edge in order to insert new edges and nodes.
Besides managing internal edges, a pnode also manages the other nodes which it "uses". Configuration authors list the object dependencies for their pnode and, as nodes are combined, their "uses" track what is needed. When the graph is finally built, the "uses" array makes up the configuration sequence relevant to the graph and the "edges" list is what is given to the Pgrapher app.
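The bookkeeping a pnode carries can be pictured with a toy model. The sketch below is not the real pgraph.jsonnet code: toy_pnode and its fields (label, edges, uses, iports, oports) are names assumed for illustration, and placing dependencies ahead of the node in uses is likewise an assumption.

```jsonnet
// Toy model of a pnode wrapping one INode; NOT the real pgraph.jsonnet code.
// Function and field names are assumptions made for this sketch.
local tn(obj) =
  if std.objectHas(obj, "name") then obj.type + ":" + obj.name else obj.type;

local toy_pnode(inode, nin=0, nout=0, uses=[]) = {
  label: tn(inode),
  // A lone node has no internal edges yet.
  edges: [],
  // Assumed ordering: dependencies first, then the node that uses them.
  uses: uses + [inode],
  // One port descriptor per input port and per output port.
  iports: std.makeArray(nin, function(p) { node: tn(inode), port: p }),
  oports: std.makeArray(nout, function(p) { node: tn(inode), port: p }),
};

// Hypothetical tool and node configuration objects.
local tool = { type: "CoolTool", name: "cool", data: {} };
local node = { type: "MyNode", data: { my_tool: tn(tool) } };

toy_pnode(node, nin=1, nout=1, uses=[tool])
```

Evaluating this yields a single object that knows its own ports, its (empty) edge list and everything that must appear in the configuration sequence for it to work.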
Examples
Produce a pnode from an INode configuration object
It all starts with wrapping a single INode configuration object. This is done through a function:
local wc = import "wirecell.jsonnet";
local g = import "pgraph.jsonnet"; // [1]
local other = import "some_other_file.jsonnet"; // [2]
// a temporary variable holding the basic INode configuration object
local my_inode = { // [3]
    type: "MyNode",
    data: {
        my_tool: wc.tn(other.cool_tool), // [4]
    },
};
// a temporary holding the pnode wrapper [5]
// (named arguments in a Jsonnet call use "=", not ":")
local my_pnode = g.pnode(my_inode, uses=[other.cool_tool], nin=1, nout=1);
// finally what we export to the importer of this file
{
    a_truly_great_node: my_pnode, // [6]
}
What's going on?
- [1] Import the pgraph Jsonnet library.
- [2] Import your friend's configuration library, which provides some cool tool.
- [3] Create a temporary variable holding the configuration for your node.
- [4] Tell your node configuration the instance label for the cool tool (ultimately, your C++ node implementation uses this to look up the C++ tool).
- [5] Wrap your inode configuration into a pnode. The uses argument tells the pnode what other objects your node depends on, and the nin and nout arguments give the input/output port multiplicity.
- [6] Finally, export the pnode in an object so that it may be used by yet higher-level configuration code.
Note, Jsonnet syntax doesn't require local node variables. The above could be shortened to just:
local wc = import "wirecell.jsonnet";
local g = import "pgraph.jsonnet";
local other = import "some_other_file.jsonnet";
{
    a_truly_great_node: g.pnode({
        type: "MyNode",
        data: {
            my_tool: wc.tn(other.cool_tool),
        },
    }, uses=[other.cool_tool], nin=1, nout=1),
}
Intern Pnodes
So far, this is just more work. Where the payoff begins is the ability to pack up connected pnodes into an encompassing pnode. For example:
{
    n1: g.pnode({type:"Node", name:"n1"}, nout=1),
    n2: g.pnode({type:"Node", name:"n2"}, nin=1, nout=1),
    n3: g.pnode({type:"Node", name:"n3"}, nin=1),
    pn: g.intern([$.n1],[$.n3],[$.n2],[
        g.edge($.n1, $.n2),
        g.edge($.n2, $.n3)], "pn"),
Here, n1 is a source, n2 is a filter (one input port, one output port) and n3 is a sink.
The pgraph.intern() function takes three lists of nodes (input, output and internal), any of which may be empty, and a list of internal edges that connect those nodes. In this example n1 is an input, n3 is an output and n2 is internal and connected to the other two. The resulting pn is still but a single node but it represents an entire graph (complete in this case). Continuing with this example:
    n12: g.intern([$.n1],[$.n2],edges=[
        g.edge($.n1, $.n2)
    ], name="n12"),
    n123: g.intern([$.n12],[$.n3],edges=[
        g.edge($.n12, $.n3),
    ], name="n123"),
}
Here n12 is a pnode which represents an incomplete graph. It has edges that join n1 to n2 while n2's output port becomes the output port of n12. Another user may form n123 by yet another interning to produce a pnode holding again a complete graph.
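The nesting just described can be mimicked with a few lines of plain Jsonnet. This is a toy model under stated assumptions, not the real pgraph.jsonnet code: toy_pnode, toy_edge and toy_intern are hypothetical helpers, every node has type "Node", and edges always use the first port on each side.

```jsonnet
// Toy model of interning; NOT the real pgraph.jsonnet code.
// All helper and field names are assumptions made for this sketch.
local toy_pnode(name, nin=0, nout=0) = {
  edges: [],
  uses: [{ type: "Node", name: name }],
  iports: std.makeArray(nin, function(p) { node: "Node:" + name, port: p }),
  oports: std.makeArray(nout, function(p) { node: "Node:" + name, port: p }),
};

// Connect the first output port of tail to the first input port of head.
local toy_edge(tail, head) = { tail: tail.oports[0], head: head.iports[0] };

// The result has the same shape as a toy_pnode, so it can be interned again.
local toy_intern(innodes, outnodes, centernodes=[], edges=[]) = {
  local all = innodes + centernodes + outnodes,
  edges: std.flattenArrays([n.edges for n in all]) + edges,
  uses: std.flattenArrays([n.uses for n in all]),
  iports: std.flattenArrays([n.iports for n in innodes]),
  oports: std.flattenArrays([n.oports for n in outnodes]),
};

local n1 = toy_pnode("n1", nout=1);
local n2 = toy_pnode("n2", nin=1, nout=1);
local n3 = toy_pnode("n3", nin=1);
local n12 = toy_intern([n1], [n2], edges=[toy_edge(n1, n2)]);
local n123 = toy_intern([n12], [n3], edges=[toy_edge(n12, n3)]);

// n12 exposes n2's output port; n123 is complete and exposes no ports.
assert n12.oports == [{ node: "Node:n2", port: 0 }];
assert n123.iports == [] && n123.oports == [];
assert std.length(n123.edges) == 2;

n123
```

The point of the sketch is that an interned result carries the same kind of information as a wrapped INode, which is what lets the recasting continue level by level. The toy makes no attempt to remove duplicates, for example when the same node appears in more than one list.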
Edge breaking and node insertion
As mentioned above, given knowledge of the order of edges inside a pnode it is possible to derive a new pnode which breaks an edge in order to insert new pnodes. Extending the above example further:
    n13: g.intern([$.n1],[$.n3], edges=[
        g.edge($.n1, $.n3),
    ], name="n13"),
    n123inserted: g.insert_one($.n13, 0, $.n2, $.n2, name="n123inserted"),
The n13 node has interned n1 and n3 which are connected in a kind of "short circuit". Now, some other user may want to do something with the data that flows between this source and sink. Knowing that the (n1,n3) edge is at index 0 in the list of n13 edges it is possible to break that edge and insert a new node (n2), which is done when setting n123inserted.
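What breaking an edge amounts to can be shown on a bare edge list. The following is only a sketch of the idea, not the real g.insert_one(): toy_insert is a hypothetical helper that works on plain {tail, head} objects, takes the label of the node to insert, and assumes that node uses port 0 on both sides.

```jsonnet
// Toy illustration of breaking an edge by index; NOT the real g.insert_one().
// toy_insert is a hypothetical helper operating on plain edge objects.
local toy_insert(edges, index, new_label) =
  std.flattenArrays(std.mapWithIndex(
    function(i, e)
      if i == index then [
        // The old tail now feeds the inserted node...
        { tail: e.tail, head: { node: new_label, port: 0 } },
        // ...and the inserted node feeds the old head.
        { tail: { node: new_label, port: 0 }, head: e.head },
      ] else [e],
    edges
  ));

// The "short circuit" from n1 straight to n3.
local n13_edges = [
  { tail: { node: "Node:n1", port: 0 }, head: { node: "Node:n3", port: 0 } },
];

local result = toy_insert(n13_edges, 0, "Node:n2");
assert std.length(result) == 2;
assert result[0].head.node == "Node:n2" && result[1].tail.node == "Node:n2";

result
```

One edge becomes two, and all other edges keep their relative order, which is why knowing the index of the edge to break is enough.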
A fancy example
The above example uses a graph which is really a simple linear pipeline. Real graphs may be much more complex. For example, when simulating signal and noise, each is its own stream of data frames, and the two streams must be summed together. A pnode can be built which makes it easy to build two configuration variants, one with just signal and one with noise added. Without defining everything, the noise portion is defined as:
local noise_source = g.pnode({
    type: "NoiseSource",
    data: { ... },
}, nout=1);
local frame_summer = g.pnode({
    type: "FrameSummer",
    data: { ... },
}, nin=2, nout=1);
{
    nominal: g.intern([frame_summer],[frame_summer],[noise_source],
                      iports=frame_summer.iports[:1],
                      oports=frame_summer.oports,
                      edges=[g.edge(noise_source, frame_summer, 0, 1)],
                      name="NominalNoise"),
}
The produced nominal pnode object now has simply one input port and one output port, both of which are actually provided by the "frame summer" node. The "noise source" node gets carried along properly.
By default the resulting pnode input ports consist of all input ports of all input nodes, and likewise for output. But here, one input port of the frame summer is already connected to the noise source. To indicate this special arrangement the iports and oports arguments are passed. These explicitly give the port descriptors to use for the resulting pnode.
This may seem confusing but once worked out, the user of the resulting pnode need not care. The complexity is hidden. The user just needs to take this nominal node and connect it to others.
Going further
To learn more about this new configuration idiom you may wish to run the example (excerpted above):
$ jsonnet -J cfg cfg/test/test_pgraph.jsonnet
A working configuration is being developed in cfg/uboone/simsp/.
$ wire-cell -c cfg/uboone/simsp/main-simple-quiet.jsonnet
That top-level configuration file shows the end-game. It builds a complete graph of pnodes and uses it to configure the Pgrapher app and to provide the final configuration sequence.
In summary:
local g = import "pgraph.jsonnet";
// ...
local graph = g.intern(...);
local app = {
    type: "Pgrapher",
    data: {
        edges: g.edges(graph),
    }
};
// final configuration sequence.
[com.cmdline] + g.uses(graph) + [app]
Note that both the graph.edges and graph.uses list attributes will likely have duplicate entries due to details of how they are constructed. To strip the duplicates while retaining proper order (in the case of .uses), the lists must be extracted through the pgraph functions of the same name.
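For readers curious what "strip duplicates while retaining order" involves, here is one way to write it in plain Jsonnet. It is a sketch only and not necessarily how g.uses() or g.edges() are implemented; unique_in_order is a hypothetical name. The standard std.set() is not suitable here because it sorts its result.

```jsonnet
// Order-preserving de-duplication; a sketch, not the real pgraph code.
// Keeps the first occurrence of each element and drops later repeats.
local unique_in_order(arr) =
  std.foldl(
    function(acc, x) if std.count(acc, x) > 0 then acc else acc + [x],
    arr,
    []
  );

// Hypothetical configuration objects that appear more than once.
local a = { type: "A" };
local b = { type: "B" };

local deduped = unique_in_order([a, b, a, b, a]);
assert deduped == [a, b];

deduped
```

Objects are compared by value, so two structurally identical configuration objects count as the same entry.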
Summary
This turned into a long post. Eventually, I hope it will be distilled and integrated into the manual. It shows what you need to know to get started authoring or extending a configuration for WCT jobs based on the Pgrapher app. Of course this pnode idiom need not be followed in your own configuration of WCT. However, WCT is now reaching a point where the wide variety of features it provides, still fewer than it eventually will, requires something to manage the complexity of configuration. Thankfully, Jsonnet provides such a good basis for a configuration language that idioms like pnode can be invented. This idiom will be further explored as a configuration is developed to handle the current breadth of WCT. It is targeting these features and options:
- signal simulation
  - nominal vs shorted wire field responses
  - correctly and incorrectly configured electronics
- noise simulation
  - with and without noise
  - optional simulation "truth" waveforms
- software noise filtering
  - with and without it
  - uniform or correcting misconfigured electronics
- signal processing
  - nominal 2D deconvolution
  - optional compressed sensing method to handle shorted wire regions
And, probably several variants I'm forgetting.
Edits
- update to call pgraph.uses() and pgraph.edges() to build final lists.