Full Example
We have covered how to learn the structure and parameters of a Bayesian Belief Network (BBN). Let’s combine the learned structure and parameters into a BBN, and then use that BBN for exact inference.
Load data
Let’s read our data into a Spark DataFrame (SDF).
[1]:
from pysparkbbn.discrete.data import DiscreteData

# `spark` is assumed to be an existing SparkSession, e.g. the one a PySpark notebook provides
sdf = spark.read.csv('hdfs://localhost/data-1479668986461.csv', header=True)
data = DiscreteData(sdf)
Structure learning
Let’s use the naive Bayes algorithm to learn the structure, passing n3 as the class variable.
[2]:
from pysparkbbn.discrete.scblearn import Naive
naive = Naive(data, 'n3')
g = naive.get_network()
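For intuition, a naive Bayes structure is a star: the class variable is the sole parent of every other variable. The sketch below builds that edge list in plain Python. The helper name `naive_bayes_edges` and the hard-coded column list are illustrative assumptions, not part of pysparkbbn, and the object returned by `get_network()` may represent the graph differently.

```python
def naive_bayes_edges(columns, clazz):
    """Return (parent, child) pairs for a naive Bayes structure."""
    if clazz not in columns:
        raise ValueError(f'{clazz!r} is not one of the columns')
    # the class variable points at every other variable
    return [(clazz, col) for col in columns if col != clazz]


# assumed column names, matching the variables used in this example
columns = ['n1', 'n2', 'n3', 'n4', 'n5']
edges = naive_bayes_edges(columns, 'n3')
print(edges)
```

Under this structure every non-class variable needs one conditional distribution per state of n3, which is why the children carry more parameters than the class variable.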
Parameter learning
After we have a structure, we can learn the parameters.
[3]:
from pysparkbbn.discrete.plearn import ParamLearner
import json
param_learner = ParamLearner(data, g)
p = param_learner.get_params()
print(json.dumps(p, indent=2))
{
  "n3": [
    0.47345,
    0.52655
  ],
  "n1": [
    0.8588024078572183,
    0.1411975921427817,
    0.6534042351153737,
    0.34659576488462635
  ],
  "n2": [
    0.8773893758580632,
    0.12261062414193685,
    0.44725097331687397,
    0.552749026683126
  ],
  "n4": [
    0.667546731439434,
    0.33245326856056606,
    0.1642768967809325,
    0.8357231032190675
  ],
  "n5": [
    0.29675784137712535,
    0.4307741049741261,
    0.27246805364874854,
    0.29503370999905043,
    0.17909030481435761,
    0.525875985186592
  ]
}
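The flat lists appear to hold one conditional distribution per parent state, laid out row by row: n3 has no parent and gets a single row, while each child gets one row per value of n3. That layout is inferred from the numbers rather than taken from a documented guarantee. The sketch below regroups the lists under that assumption and checks that every row sums to 1. The helper `to_rows` and the hard-coded value counts in `n_values` are hypothetical.

```python
import math

# parameters copied from the output above
params = {
    'n3': [0.47345, 0.52655],
    'n1': [0.8588024078572183, 0.1411975921427817,
           0.6534042351153737, 0.34659576488462635],
    'n2': [0.8773893758580632, 0.12261062414193685,
           0.44725097331687397, 0.552749026683126],
    'n4': [0.667546731439434, 0.33245326856056606,
           0.1642768967809325, 0.8357231032190675],
    'n5': [0.29675784137712535, 0.4307741049741261, 0.27246805364874854,
           0.29503370999905043, 0.17909030481435761, 0.525875985186592],
}

# assumed number of values per variable (n5 is taken to be three-valued)
n_values = {'n3': 2, 'n1': 2, 'n2': 2, 'n4': 2, 'n5': 3}


def to_rows(flat, width):
    """Split a flat probability list into rows of `width` entries."""
    return [flat[i:i + width] for i in range(0, len(flat), width)]


cpts = {name: to_rows(flat, n_values[name]) for name, flat in params.items()}

for name, rows in cpts.items():
    # each row should be a proper probability distribution
    ok = all(math.isclose(sum(row), 1.0, abs_tol=1e-9) for row in rows)
    print(f'{name}: {len(rows)} row(s), rows sum to 1: {ok}')
```

If a row failed this check, the assumed layout (or the assumed value counts) would be wrong for that variable.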
BBN
Now that we have the structure and parameters, we can build a BBN. Use the get_bbn utility method to bring the structure and parameters together.
[4]:
from pysparkbbn.discrete.bbn import get_bbn
bbn = get_bbn(g, p, data.get_profile())
Inference
With a BBN defined, we can use py-bbn to proceed with exact inference.
[5]:
from pybbn.pptc.inferencecontroller import InferenceController
join_tree = InferenceController.apply(bbn)
for node, posteriors in join_tree.get_posteriors().items():
    # use a fresh name so the learned parameters in `p` are not overwritten
    dist = ', '.join(f'{val}={prob:.5f}' for val, prob in posteriors.items())
    print(f'{node} : {dist}')
n3 : f=0.47345, t=0.52655
n4 : f=0.40255, t=0.59745
n2 : f=0.65090, t=0.34910
n1 : f=0.75065, t=0.24935
n5 : maybe=0.29585, no=0.29825, yes=0.40590
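With no evidence entered, each posterior is simply the node's marginal, which for a child of n3 follows from the law of total probability: P(child) = Σ P(n3) · P(child | n3). The sketch below recomputes the n4 and n5 marginals by hand from the learned parameters, and then applies Bayes' rule to update n3 after observing n4 = t. It assumes each child's parameter list is ordered as one row per n3 state (f, then t), with values in the order shown in the output above. It is a sanity check in plain Python, not a description of how py-bbn computes posteriors internally.

```python
# learned parameters, regrouped as one row per state of n3 (assumed layout)
p_n3 = [0.47345, 0.52655]                       # P(n3 = f), P(n3 = t)
p_n4_given_n3 = [
    [0.667546731439434, 0.33245326856056606],   # P(n4 | n3 = f)
    [0.1642768967809325, 0.8357231032190675],   # P(n4 | n3 = t)
]
p_n5_given_n3 = [
    [0.29675784137712535, 0.4307741049741261, 0.27246805364874854],
    [0.29503370999905043, 0.17909030481435761, 0.525875985186592],
]


def marginal(prior, cpt):
    """Sum out the parent: P(child = j) = sum_i prior[i] * cpt[i][j]."""
    width = len(cpt[0])
    return [sum(p * row[j] for p, row in zip(prior, cpt)) for j in range(width)]


def posterior(prior, cpt, observed):
    """Bayes' rule: P(parent | child = observed) for one observed column."""
    joint = [p * row[observed] for p, row in zip(prior, cpt)]
    total = sum(joint)
    return [x / total for x in joint]


p_n4 = marginal(p_n3, p_n4_given_n3)
p_n5 = marginal(p_n3, p_n5_given_n3)
print('n4 :', ', '.join(f'{v:.5f}' for v in p_n4))
print('n5 :', ', '.join(f'{v:.5f}' for v in p_n5))

# belief in n3 after observing n4 = t (column index 1)
p_n3_given_n4_t = posterior(p_n3, p_n4_given_n3, observed=1)
print('n3 | n4=t :', ', '.join(f'{v:.5f}' for v in p_n3_given_n4_t))
```

The first two printed lines should agree with the n4 and n5 posteriors reported by the join tree, which supports the assumed row layout. The last line illustrates how observing a child shifts belief about the class variable.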