Search-and-Scoring

Search-and-scoring approaches to learning BBN structures search over a space of candidate BBN structures and score each candidate with a scoring function. The highest-scoring BBN structure is typically returned as the output.
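
To make "score each candidate" concrete, here is a minimal single-machine sketch of one common decomposable score, the Bayesian information criterion (BIC), written in plain Python rather than Spark. This page does not say which score pysparkbbn uses, so BIC is an assumption. The function bic_score and its inputs, rows (a list of dicts of discrete values) and parents (a dict mapping every variable to its list of parents), are hypothetical names for illustration only.

```python
import math
from collections import Counter


def bic_score(rows, parents):
    """BIC score of a DAG over discrete data (higher is better).

    rows: list of dicts mapping each variable name to a discrete value.
    parents: dict mapping every variable to the list of its parents.
    (Hypothetical helper for illustration, not part of pysparkbbn.)
    """
    n = len(rows)
    # Observed states of each variable (its cardinality r_i)
    states = {v: {row[v] for row in rows} for v in parents}
    score = 0.0
    for child, pas in parents.items():
        # N_ijk: counts of (parent configuration j, child value k)
        joint = Counter((tuple(row[p] for p in pas), row[child]) for row in rows)
        # N_ij: counts of each parent configuration j
        marginal = Counter(tuple(row[p] for p in pas) for row in rows)
        # Maximum-likelihood log-likelihood: sum of N_ijk * log(N_ijk / N_ij)
        loglik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (r_i - 1) * q_i, where q_i is the number of parent configurations
        q = math.prod(len(states[p]) for p in pas)
        k = (len(states[child]) - 1) * q
        # BIC penalty: (log n) / 2 per free parameter
        score += loglik - 0.5 * math.log(n) * k
    return score


rows = [{"a": 0, "b": 0}, {"a": 1, "b": 1}] * 50  # toy data: b always equals a
print(bic_score(rows, {"a": [], "b": []}))     # empty graph
print(bic_score(rows, {"a": [], "b": ["a"]}))  # a -> b
```

On this toy data the structure a → b scores higher than the empty graph, and b → a scores the same as a → b, because BIC gives Markov-equivalent structures equal scores.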

Load data

Let’s read our data into a Spark DataFrame, sdf, and wrap it in a DiscreteData object.

[1]:
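from pysparkbbn.discrete.data import DiscreteData

# spark is assumed to be an existing SparkSession (e.g., the one the pyspark shell provides)
sdf = spark.read.csv('hdfs://localhost/data-1479668986461.csv', header=True)
# wrap the Spark DataFrame for discrete BBN structure learning
data = DiscreteData(sdf)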

Genetic algorithm

We use a genetic algorithm (GA) as a search-and-scoring approach to learning BBN structures. In general, a GA has the following major steps.

  • Initialization: a population of BBN structures is created

  • Fitness: each structure in the population is scored by a fitness function, and the population is filtered by score

  • Crossover: two parents from the population undergo a crossover operation to produce two new offspring

  • Mutation: each offspring undergoes a mutation operation

The fitness, crossover, and mutation steps are repeated until a maximum number of iterations is reached or the search converges (no higher-scoring BBN structure can be found).
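
The loop above can be sketched on a single machine as follows. pysparkbbn's Ga runs on Spark, and its internals (structure encoding, selection, crossover and mutation operators) are not described on this page, so everything here is an assumption: a structure is a frozenset of directed (parent, child) edges, the fitness step keeps the top half of the population, crossover is a uniform-style mix of the parents' edges, and mutation toggles one random edge, rejecting any change that would create a cycle. The names ga_search, crossover, mutate, is_acyclic, pop_size and patience are hypothetical; max_iters plays the same role as the argument passed to Ga below, and in practice fitness would be a structure score computed from the data.

```python
import random


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed edges form a DAG over nodes."""
    indegree = {v: 0 for v in nodes}
    for _, child in edges:
        indegree[child] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for parent, child in edges:
            if parent == v:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return visited == len(nodes)


def mutate(nodes, edges, rng):
    """Mutation: toggle one random edge; keep the original if a cycle would form."""
    parent, child = rng.sample(nodes, 2)
    candidate = frozenset(edges) ^ {(parent, child)}
    return candidate if is_acyclic(nodes, candidate) else frozenset(edges)


def crossover(nodes, mom, dad, rng):
    """Crossover: produce two offspring that mix the parents' edges."""
    offspring = []
    for _ in range(2):
        kid = set(mom & dad)  # shared edges are inherited; a subset of a DAG is acyclic
        for edge in sorted(mom ^ dad):  # each edge unique to one parent: 50% chance
            if rng.random() < 0.5 and is_acyclic(nodes, kid | {edge}):
                kid.add(edge)
        offspring.append(frozenset(kid))
    return offspring


def ga_search(nodes, fitness, pop_size=20, max_iters=50, patience=5, seed=0):
    """Return the highest-fitness DAG found (a frozenset of edges)."""
    rng = random.Random(seed)
    # Initialization: random DAGs built from the empty graph by a few mutations
    population = []
    for _ in range(pop_size):
        g = frozenset()
        for _ in range(len(nodes)):
            g = mutate(nodes, g, rng)
        population.append(g)
    best, stale = max(population, key=fitness), 0
    for _ in range(max_iters):
        # Fitness: score the population and keep the top half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Crossover + mutation: refill the population with offspring
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            mom, dad = rng.sample(parents, 2)
            offspring.extend(mutate(nodes, kid, rng) for kid in crossover(nodes, mom, dad, rng))
        population = parents + offspring[: pop_size - len(parents)]
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion, 0
        else:
            stale += 1
            if stale >= patience:  # converged: no higher-scoring structure found
                break
    return best


if __name__ == "__main__":
    nodes = ["a", "b", "c"]
    target = {("a", "b"), ("b", "c")}

    def toy_fitness(g):
        # Toy fitness: reward edges in a known target DAG, penalize extra edges
        return len(g & target) - len(g - target)

    print(sorted(ga_search(nodes, toy_fitness)))
```

Because the parents are carried into the next generation (elitism), the best score never decreases, so stopping after patience generations without improvement is one simple way to detect the convergence described above.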

[2]:
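from pysparkbbn.discrete.ssslearn import Ga

# sc is assumed to be an existing SparkContext (e.g., the one the pyspark shell provides)
ga = Ga(data, sc, max_iters=3)  # run the GA for at most 3 iterations
g = ga.get_network()  # the learned BBN structure, a networkx graph drawn below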
[4]:
%matplotlib inline
import matplotlib.pyplot as plt
import networkx as nx

fig, ax = plt.subplots(figsize=(5, 5))

nx.draw(g,
        with_labels=True,
        node_size=500,
        alpha=0.8,
        font_weight='bold',
        font_family='monospace',
        node_color='r',
        arrowsize=15,
        ax=ax)
Figure: the BBN structure learned by the GA.