Search-and-scoring approaches to learning BBN structures search over a space of candidate BBN structures and score each candidate. The highest-scoring BBN structure is typically returned as the output.
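To make the idea concrete, here is a minimal sketch in plain Python, independent of pysparkbbn. It enumerates every directed acyclic graph over three binary variables and scores each one against a tiny made-up data set. The variable names, the data, and the choice of a BIC-style score are all assumptions made for illustration; they are not taken from the library.

```python
import itertools
import math
from collections import Counter

NODES = ("x", "y", "z")
# Made-up binary data: y always copies x, z is unrelated to both.
ROWS = [dict(x=x, y=x, z=z) for x in (0, 1) for z in (0, 1)] * 2
ALL_EDGES = [(u, v) for u in NODES for v in NODES if u != v]


def is_acyclic(edges):
    """Repeatedly peel off nodes that have no parent among the remaining nodes."""
    remaining = set(NODES)
    while remaining:
        roots = [n for n in remaining
                 if not any(v == n and u in remaining for u, v in edges)]
        if not roots:
            return False  # every remaining node has a parent, so there is a cycle
        remaining -= set(roots)
    return True


def bic_score(edges):
    """Maximized log-likelihood minus a BIC penalty (binary variables assumed)."""
    n = len(ROWS)
    total = 0.0
    for node in NODES:
        parents = sorted(u for u, v in edges if v == node)
        joint = Counter((tuple(r[p] for p in parents), r[node]) for r in ROWS)
        marginal = Counter(tuple(r[p] for p in parents) for r in ROWS)
        total += sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
        # One free parameter per parent configuration for a binary node.
        total -= 0.5 * math.log(n) * (2 ** len(parents))
    return total


def search_and_score():
    """Search: enumerate all DAGs. Score: BIC. Output: the highest-scoring DAG."""
    candidates = (frozenset(s)
                  for k in range(len(ALL_EDGES) + 1)
                  for s in itertools.combinations(ALL_EDGES, k))
    dags = [c for c in candidates if is_acyclic(c)]
    best = max(dags, key=bic_score)
    return best, bic_score(best), len(dags)


best, score, n_dags = search_and_score()
print(sorted(best), round(score, 3), n_dags)
```

Exhaustive enumeration is only feasible for a handful of variables, because the number of DAGs grows super-exponentially with the number of nodes. That is why heuristic searches are used in practice.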
Let’s read our data into a Spark DataFrame.
from pysparkbbn.discrete.data import DiscreteData

sdf = spark.read.csv('hdfs://localhost/data-1479668986461.csv', header=True)
data = DiscreteData(sdf)
We use a genetic algorithm (GA) as a search-and-scoring approach to learning BBN structures. In general, a GA has the following major steps.
Initialization: an initial population of BBN structures is created
Fitness: the population is scored according to a fitness function and filtered
Crossover: two parents from the population undergo a crossover operation to produce two new offspring
Mutation: each offspring undergoes a mutation operation
The fitness, crossover and mutation steps are repeated until a maximum number of iterations is reached or the search converges (no higher-scoring BBN structure can be found).
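The steps above can be sketched in plain Python as follows. This is a toy illustration, not the pysparkbbn implementation: each structure is encoded as a bit vector over candidate edges that respect a fixed node ordering (so every individual is a DAG), and the fitness function simply counts agreement with a hypothetical target structure instead of scoring against data. The names `NODES`, `TARGET`, `run_ga` and the parameters `pop_size`, `rate` and `seed` are assumptions made for this example.

```python
import random

NODES = ["a", "b", "c", "d"]
# Edges only point forward in the node ordering, so any subset is acyclic.
CANDIDATE_EDGES = [(u, v) for i, u in enumerate(NODES) for v in NODES[i + 1:]]
# Hypothetical target structure, used only to build a toy fitness function.
TARGET = {("a", "b"), ("b", "c"), ("c", "d")}


def fitness(bits):
    """Toy score: number of edge slots that agree with TARGET."""
    return sum((bit == 1) == (edge in TARGET)
               for bit, edge in zip(bits, CANDIDATE_EDGES))


def crossover(p1, p2, rng):
    """Single-point crossover: two parents produce two offspring."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(bits, rng, rate=0.1):
    """Flip each edge bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(pop_size=20, max_iters=50, seed=0):
    rng = random.Random(seed)
    # Initialization: a random population of edge bit vectors.
    population = [[rng.randint(0, 1) for _ in CANDIDATE_EDGES]
                  for _ in range(pop_size)]
    for _ in range(max_iters):
        # Fitness: score the population and keep the better half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        if fitness(survivors[0]) == len(CANDIDATE_EDGES):
            break  # convergence: no higher score exists for this toy fitness
        # Crossover and mutation: refill the population with offspring.
        children = []
        while len(children) < pop_size - len(survivors):
            p1, p2 = rng.sample(survivors, 2)
            children.extend(mutate(c, rng) for c in crossover(p1, p2, rng))
        population = survivors + children[: pop_size - len(survivors)]
    best = max(population, key=fitness)
    edges = {e for bit, e in zip(best, CANDIDATE_EDGES) if bit}
    return edges, fitness(best)


edges, score = run_ga()
print(sorted(edges), score)
```

Because the survivors are carried over unchanged, the best score in the population never decreases between iterations. A real structure learner would replace the toy fitness with a data-driven score and would need crossover and mutation operators that keep offspring acyclic.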
from pysparkbbn.discrete.ssslearn import Ga

ga = Ga(data, sc, max_iters=3)
g = ga.get_network()
%matplotlib inline
import matplotlib.pyplot as plt
import networkx as nx

fig, ax = plt.subplots(figsize=(5, 5))
nx.draw(g,
        with_labels=True,
        node_size=500,
        alpha=0.8,
        font_weight='bold',
        font_family='monospace',
        node_color='r',
        arrowsize=15,
        ax=ax)