# Additive Bayesian Network Modelling in R

Bayesian network analysis is a form of probabilistic graphical models which derives from empirical data a directed acyclic graph (DAG)

# Fit an additive Bayesian network to data

In this example, a given DAG will be fitted using abn. Take a subset of columns from dataset `ex0.dat.data` (provided with abn):

``````mydat <- ex0.dag.data[,c("b1","b2","b3","g1","b4","p2","p4")]
``````

Set up distribution list for each node:

``````mydists <- list(b1="binomial",
b2="binomial",
b3="binomial",
g1="gaussian",
b4="binomial",
p2="poisson",
p4="poisson" )
``````

Define a model:

``````mydag <- matrix(data=c( 0,0,1,0,0,0,0, # b1<-b3
1,0,0,0,0,0,0, # b2<-b1
0,0,0,0,0,0,0, #
0,0,0,0,1,0,0, # g1<-b4
0,0,0,0,0,0,0, #
0,0,0,0,0,0,0, #
0,0,0,0,0,0,0 #
), byrow=TRUE,ncol=7)

colnames(mydag) <- rownames(mydag) <- names(mydat)
``````

Fit the model to calculate the log marginal likelihood goodness of fit:

``````myres.c <- fitabn(dag.m=mydag,data.df=mydat,data.dists=mydists)

print(myres.c\$mlik)
``````

# Examine the parameter estimates

In this example, we will additionally examine the computed marginals. Take a subset of cols from dataset `ex0.dat.data` (provided with abn)

``````mydat <- ex0.dag.data[,c("b1","b2","b3","g1","b4","p2","p4")]
``````

Set up distribution list for each node:

``````mydists <- list(b1="binomial",
b2="binomial",
b3="binomial",
g1="gaussian",
b4="binomial",
p2="poisson",
p4="poisson")
``````

Define a model:

``````mydag <- matrix(data=c(
0,0,1,0,0,0,0, # b1<-b3
1,0,0,0,0,0,0, # b2<-b1
0,0,0,0,0,0,0, #
0,0,0,0,1,0,0, # g1<-b4
0,0,0,0,0,0,0, #
0,0,0,0,0,0,0, #
0,0,0,0,0,0,0 #
), byrow=TRUE,ncol=7)
colnames(mydag) <- rownames(mydag) <- names(mydat)
``````

Now fit the model to calculate its goodness of fit:

``````myres.c <- fitabn(dag.m=mydag,data.df=mydat,data.dists=mydists,compute.fixed=TRUE)

print(names(myres.c\$marginals))
``````

Gives a list of all posterior densities. Plot some of the marginal posterior densities - note that, by default, all variables are standardized.

``````par(mfrow=c(1,2))
plot(myres.c\$marginals\$b1[["b1|(Intercept)"]],type="b",xlab="b1|(Intercept)", main="Node b1, Intercept",pch="+",col="green")

plot(myres.c\$marginals\$g1[["g1|b4"]],type="b",xlab="g1|b4",main="Node g1, parameter b4",pch="+",col="orange")
`````` # Find the best fitting graphical structure using an exact search

This dataset comes with abn see `?ex1.dag.data`:

``````mydat <- ex1.dag.data
``````

Set up distribution list for each node:

``````mydists <- list(b1="binomial",
p1="poisson",
g1="gaussian",
b2="binomial",
p2="poisson",
b3="binomial",
g2="gaussian",
b4="binomial",
b5="binomial",
g3="gaussian" )
``````

Set the parent limits node-wise:

``````max.par <- list("b1"=4,"p1"=4,"g1"=4,"b2"=4,"p2"=4,"b3"=4,"g2"=4,"b4"=4,"b5"=4,"g3"=4)
``````

Build the score cache:

``````mycache <- buildscorecache(data.df = mydat,
data.dists = mydists,
max.parents = max.par)
``````

Find the globally best DAG. Fit the model and plot it (requires `Rgraphviz`)

``````mp.dag <- mostprobable(score.cache=mycache)

fitabn(dag.m=mp.dag,data.df=mydat,data.dists=mydists)\$mlik; ## plot the best model - requires Rgraphviz

myres <- fitabn(dag.m=mp.dag,data.df=mydat,data.dists=mydists,create.graph=TRUE)

plot(myres\$graph)
`````` # Find the best fitting graphical structure using a heuristic search

This dataset comes with abn see ?ex1.dag.data

``````mydat <- ex1.dag.data
``````

Set up distribution list for each node

``````mydists <- list(b1="binomial",
p1="poisson",
g1="gaussian",
b2="binomial",
p2="poisson",
b3="binomial",
g2="gaussian",
b4="binomial",
b5="binomial",
g3="gaussian" );
``````

May take some minutes for buildscorecache()

Set parent limits

``````max.par<-list("b1"=4,"p1"=4,"g1"=4,"b2"=4,"p2"=4,"b3"=4,"g2"=4,"b4"=4,"b5"=4,"g3"=4);
``````

Build cache

``````mycache <- buildscorecache(data.df=mydat,data.dists=mydists, dag.banned=ban, dag.retained=retain,max.parents=max.par);
``````

Repeat but this time have the majority consensus network plotted as the searches progress

``````heur.res2 <- search.hillclimber(score.cache=mycache,num.searches=1000,seed=0,verbose=FALSE, trace=TRUE,timing.on=FALSE)
``````

For publication quality output for the consensus network use graphviz

``````tographviz(dag.m=heur.res\$consensus,data.df=mydat,data.dists=mydists,outfile="graphcon.dot");
``````

Then process using graphviz tools. e.g. on linux `system("dot -Tpdf -o graphcon.pdf graphcon.dot")` and `system("evince graphcon.pdf")`. Note that the .dot file created can be edited manually to provide custom shapes, colours etc … 