A Bayesian network is a probabilistic graphical model used in AI to represent a group of variables and their conditional dependencies. The network is represented by a Directed Acyclic Graph (DAG).
www.leewayhertz.com/bayesian-networks-in-ai/

Bayesian networks in AI: A comprehensive overview

Imagine a scenario where a team of doctors faces a perplexing medical puzzle. A patient shows a range of symptoms, each pointing to multiple possible diseases. How can they navigate this diagnostic riddle, unravel the complexities, and arrive at an accurate diagnosis? Enter Bayesian networks, the guiding light in medical decision-making.

Bayesian networks, akin to a powerful navigator in the sea of decisions, provide a versatile framework for modeling and reasoning under uncertainty. Drawing inspiration from cause-and-effect relationships, these graphical models enable us to untangle intricate dependencies, weigh multiple factors simultaneously, and make informed choices backed by probability theory.

This article offers a comprehensive guide on Bayesian networks in AI. It explores their common applications across various industries, highlighting their ability to navigate complex scenarios and provide probabilistic insights. It covers the following concepts to demystify the underlying principles of Bayesian networks and provide practical implementation strategies for readers to gain a comprehensive understanding.

A glossary of important terms
What are Bayesian networks?
The indispensable role of Bayesian networks and probabilistic inference in machine learning
What is a Directed Acyclic Graph (DAG) in Bayesian networks?
Components of DAG
What is the Bayes’ theorem, and how is it related to the Bayesian network?
Introduction to probability and its types
Joint probability
Conditional probability
What is a probability distribution?
Joint probability distribution
Conditional probability distribution
Bayesian networks in AI: Understanding with an example
Probabilistic inference
How to calculate probabilistic inference using Python?
The variations of Bayesian networks in AI
What are Bayesian networks in AI used for?
How Bayesian networks are used: An example
What is a Bayesian Neural Network (BNN)?

A glossary of important terms

Probability theory: It is a branch of mathematics that quantifies and analyzes uncertainty and randomness. It provides a mathematical framework for understanding and reasoning about uncertain events, outcomes, or processes. Probability theory allows us to model, measure, and manipulate uncertainty rigorously and systematically.

Variables: Variables refer to quantities or entities that can take on different values or states. They are fundamental elements used to model and analyze uncertainty and randomness.

Discrete variable: It is a type of variable that can only have specific values. It cannot have values in between those specific values. For example, imagine a variable called “Colors” that can be either “Red,” “Amber,” or “Green.” Each measurement or example can only be one of these three options and cannot be in between them.

Continuous variable: A continuous variable is a type of variable that can take on any value within a certain range. It can have decimal or fractional values, including numbers in between. For example, imagine a variable called “Speed” that represents the speed of a car. The speed can be 20 kph, 40.5 kph, 90 kph, or any other value within a range.
Conditional dependence: Two variables are conditionally dependent when their relationship holds only under certain conditions, that is, given specific values of one or more other variables. In other words, whether and how the two variables are associated depends on the values or states of those additional variables.

Conditional independence: Two variables are conditionally independent given a third variable if, once the value of the third variable is known, learning the value of one of them provides no extra information about the other. The third variable accounts for all of the association between the first two.

What are Bayesian networks?

A Bayesian network is a probabilistic graphical model used in AI to represent a group of variables and their conditional dependencies. The network is represented by a Directed Acyclic Graph (DAG). The Bayesian network takes its name from Reverend Thomas Bayes, the 18th-century mathematician and philosopher who introduced Bayes’ theorem. It is also known as a Bayes network, Bayes net, or belief network; the closely related decision network extends it with decision and utility nodes.

The key characteristic of Bayesian networks in AI is their ability to estimate how likely each of several known causes is to have contributed to an event that has already occurred. In other words, given an observed event, a Bayesian network can help estimate the probabilities of the potential causes or factors that may have led to that event. This makes Bayesian networks valuable for analyzing causal relationships and making probabilistic inferences in situations where multiple possible causes or factors influence an outcome.

Essentially, the network represents a collection of random variables and their probabilistic relationships. In the graph, nodes represent the random variables, and the edges depict the dependencies or influences between the variables. This framework allows for efficient probabilistic reasoning and inference, making it a valuable tool in fields such as ML, AI, and decision analysis. Each node in the graph is associated with a Conditional Probability Table (CPT) that specifies the probability distribution of the node given its parent nodes.

The indispensable role of Bayesian networks and probabilistic inference in machine learning

Bayesian networks and probabilistic inference play a pivotal role in machine learning in several ways. One significant impact is in addressing uncertainty. Machine learning models often face uncertain or incomplete data. Bayesian networks provide a powerful framework for modeling and reasoning under uncertainty. By explicitly representing dependencies between variables and incorporating probabilistic information, Bayesian networks enable more accurate and robust modeling of complex systems.
This allows machine learning algorithms to make more informed predictions and decisions, accounting for the inherent uncertainty in the data.

Another crucial impact is in decision-making. Bayesian networks in AI facilitate decision analysis by incorporating probabilistic inference. They quantify uncertainties and evaluate the expected utility of different actions, aiding in optimal decision-making. This is particularly valuable when decisions need to be made based on limited or noisy data. By leveraging probabilistic inference, machine learning algorithms can make well-informed decisions and take actions that maximize expected outcomes.

Bayesian networks also contribute to causal reasoning in machine learning. They enable the explicit modeling of causal relationships between variables, going beyond mere correlation. This ability to capture cause-and-effect relationships is essential for understanding complex phenomena and making reliable predictions. By using Bayesian networks, machine learning algorithms can identify and leverage causal factors, leading to more accurate and interpretable models.

Learning from data is another area where Bayesian networks excel. They can be trained using Bayesian parameter estimation techniques, which integrate prior knowledge with observed data. This allows the models to adapt and update their beliefs based on new information. Bayesian networks enable iterative learning and refinement, enhancing the performance and adaptability of machine learning algorithms as more data becomes available.

Furthermore, Bayesian networks in AI offer a structured probabilistic framework for modeling complex systems in machine learning. They provide a graphical representation that visually depicts dependencies and conditional probabilities, making the models more interpretable and understandable. This transparency aids in gaining insights into the underlying relationships and communicating the models to stakeholders effectively.

In summary, by incorporating Bayesian networks into machine learning algorithms, researchers and practitioners can develop more robust, accurate, and interpretable models for a wide range of applications.

What is a Directed Acyclic Graph (DAG) in Bayesian networks?

A Directed Acyclic Graph (DAG) is a type of graph that consists of nodes connected by directed edges (also called arcs/links), where the edges have a specific direction and do not form cycles. In the Bayesian network’s context, a DAG represents the probabilistic dependencies among random variables.

In a DAG, each edge points from one node (the parent) to another (the child). For instance, in the above image, the node ‘A’ is the parent of the nodes ‘E’ and ‘B.’ Similarly, ‘B’ is the parent of ‘C’ and ‘D.’ The key characteristic of a DAG is that it does not contain any directed cycles. This means it is impossible to start at a node and follow the directed edges to return to the same node.
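The "no directed cycles" property can be checked mechanically. The following is a minimal sketch, assuming Python 3.9+ and the standard-library graphlib module; the helper name is_dag and the dictionary layout are illustrative choices, not part of any Bayesian network library. It encodes the parent/child relations described above (A is the parent of E and B; B is the parent of C and D) and then shows that adding an edge from D back to A breaks the property.

```python
from graphlib import TopologicalSorter, CycleError

# Edges from the example above: A -> E, A -> B, B -> C, B -> D.
# The mapping is node -> set of parents, the form TopologicalSorter expects.
parents = {
    "A": set(),
    "B": {"A"},
    "E": {"A"},
    "C": {"B"},
    "D": {"B"},
}

def is_dag(parent_map):
    """Return True if the directed graph contains no directed cycle."""
    try:
        # Consuming static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False

print(is_dag(parents))  # True

# Adding an edge D -> A closes the loop A -> B -> D -> A.
cyclic = {**parents, "A": {"D"}}
print(is_dag(cyclic))  # False
```

A topological order exists exactly when the graph is acyclic, which is why a failed sort is a convenient cycle test.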
Components of DAG

A DAG consists of the following components:

1. Nodes: In a Bayesian network, each node in the DAG represents a random variable. For example, in a medical diagnosis scenario, nodes could represent variables such as “Symptom A,” “Symptom B,” and “Disease.”

2. Edges: In a Directed Acyclic Graph (DAG), the edges or arcs represent the probabilistic dependencies between variables. These directed arrows establish a connection between a pair of nodes in the graph, indicating the direction of influence or causality between them. For instance, an edge from node A to node B indicates that variable B depends on variable A.

3. Conditional Probability Tables (CPTs): Each node in the DAG is associated with a Conditional Probability Table (CPT). The CPT specifies the probability distribution of the node given its parent nodes. It captures the conditional dependencies among variables in the network.

4. Inference: DAGs facilitate efficient probabilistic inference in Bayesian networks. By propagating probabilities through the graph, inference algorithms can compute the posterior probabilities of unobserved variables given observed evidence. The absence of cycles ensures that inference algorithms can proceed without getting stuck in infinite loops.

What is the Bayes’ theorem, and how is it related to the Bayesian network?

The Bayes’ theorem is a core concept in probability theory and statistics, named after Thomas Bayes. It describes the relationship between conditional probabilities, allowing us to update our beliefs about an event based on new evidence. The theorem can be stated as follows:

P(A|B) = (P(B|A) * P(A)) / P(B)

In the above:

P(A|B) represents the likelihood of event A happening when event B has already occurred.
P(B|A) is the probability/likelihood of event B happening given that event A has already taken place.
P(A) and P(B) refer to the probabilities of events A and B occurring independently, without any knowledge or consideration of each other.
Bayes’ theorem provides a mathematical framework to revise probabilities based on new information. It is widely used in various fields, including statistics, machine learning, and artificial intelligence.

Bayesian networks in artificial intelligence leverage Bayes’ theorem to represent and quantify uncertainty. This enables AI systems to make predictions or draw conclusions based on available evidence. The network’s structure encodes the conditional dependencies between variables, while the associated probability tables express the probability distributions. By observing evidence in some variables, the network can calculate the probabilities of other variables.
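To make the update rule concrete, here is a small worked sketch. The numbers are hypothetical and chosen only for illustration (a condition with 1% prevalence, a test with 95% sensitivity and a 5% false-positive rate), and the function name bayes_posterior is an assumption of this example. The denominator P(B) is expanded with the law of total probability.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # Law of total probability gives the evidence term P(B).
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05.
posterior = bayes_posterior(0.01, 0.95, 0.05)
print(round(posterior, 3))  # 0.161
```

Even with an accurate test, the posterior stays low because the prior is small. This is the kind of belief revision a Bayesian network performs at every node when evidence arrives.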
Introduction to probability and its types

Probability is a way to calculate how likely something is to happen. It is represented by a number between 0 and 1. A probability of 0 means something is impossible and will not happen, while a probability of 1 means something is certain to happen.

For example, let’s consider a variable called “Country” that represents the country where a person lives. If we assign a probability of 0.2 to the state “US,” it means that we believe there is a 20% chance that a person lives in the United States. This can be written as P(Country=US) = 0.2. Alternatively, if the context is known, we might write P(US) = 0.2 to indicate the probability of the specific state “US.”

Although probability theory defines several types of probability, two are of utmost importance for the Bayesian belief network. They are:

Joint probability

A joint probability is the probability of two or more events occurring simultaneously. It is denoted as P(A and B), representing the probability of both events A and B occurring. Joint probability represents the likelihood of multiple variables taking specific values simultaneously.

For instance, consider two variables: “Raining” and “Windy.” Each variable can have two possible states: True or False. The joint probability of raining being true and windy being true is denoted as P(Raining=True, Windy=True). This means we want to know the probability of both rain and wind occurring simultaneously.

Conditional probability

Conditional probability is the likelihood of an event occurring, given that another event has already occurred. It is denoted as P(A|B), representing the probability of event A occurring given that event B has already happened. Here, the vertical bar “|” is read as “given” or “conditional on.” Conditional probability allows us to calculate the probability of one event happening while considering specific conditions or information about other related events.
For example, we want to find the conditional probability of rain being true, given that windy is true. This is represented as P(Raining=True | Windy=True). It indicates that we are interested in the probability of rain being true under the condition that windy is already known to be true.

What is a probability distribution?

Probability distribution refers to a mathematical function or a set of probabilities that describes the likelihood of different outcomes or events in a specific scenario. It provides a systematic way to assign probabilities to all possible values or events within a given range or domain.

A probability distribution can be represented in various forms, depending on the nature of the variable being considered. For discrete variables, the distribution is often expressed as a Probability Mass Function (PMF), which assigns probabilities to each possible value. For continuous variables, the distribution is typically described by a Probability Density Function (PDF), which specifies the relative likelihood of different values within a continuous range.
The main purpose of a probability distribution is to capture the probabilities associated with each possible outcome or event in a structured and organized manner. It summarizes the likelihood of observing different values and provides a basis for making probabilistic predictions and statistical inferences.

In probability theory, two core concepts are used to describe the relationships between multiple random variables, which are joint probability distribution and conditional probability distribution.

Joint probability distribution

A joint probability distribution describes the probability of multiple random variables taking specific values simultaneously. It provides a complete description of the probabilities of all possible combinations of values for the variables involved.

For example, consider the joint probability distribution over the variables “raining” and “windy.” This distribution allows us to determine the probability of each possible combination of raining and windy conditions.

Raining    Windy    Probability
True       True     0.10
True       False    0.15
False      True     0.25
False      False    0.5

The mathematical notation representing this joint probability distribution is P(Raining, Windy).

Conditional probability distribution

A conditional probability distribution describes the probability of an event or outcome given that another event or outcome has occurred or is known. It focuses on the probability of one variable given the value or occurrence of another variable.

For example, consider the conditional probability distribution over the variables “raining” and “windy.” This distribution lets us obtain the conditional probability value for each possible combination of raining and windy conditions.

Raining    Windy    Probability
True       True     0.286
True       False    0.231
False      True     0.714
False      False    0.769

The mathematical notation representing this conditional probability distribution is P(Raining | Windy).
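The two tables are linked: each conditional value is the joint value divided by the marginal probability of the Windy state it conditions on. The sketch below, with an illustrative function name that is an assumption of this example, recomputes P(Raining | Windy) from the joint table and reproduces the rounded figures shown above.

```python
# Joint distribution P(Raining, Windy) from the table above,
# keyed by (raining, windy).
joint = {
    (True, True): 0.10,
    (True, False): 0.15,
    (False, True): 0.25,
    (False, False): 0.50,
}

def conditional_raining_given_windy(joint_table):
    """Return P(Raining | Windy) as a dict keyed by (raining, windy)."""
    # Marginal P(Windy): sum the joint over both values of Raining.
    p_windy = {
        w: sum(p for (_, w2), p in joint_table.items() if w2 == w)
        for w in (True, False)
    }
    # Conditional = joint / marginal of the conditioning variable.
    return {(r, w): p / p_windy[w] for (r, w), p in joint_table.items()}

cond = conditional_raining_given_windy(joint)
for key, value in cond.items():
    print(key, round(value, 3))
# (True, True) 0.286
# (True, False) 0.231
# (False, True) 0.714
# (False, False) 0.769
```

Note that the conditional values sum to 1 within each Windy state, not across the whole table.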
Bayesian networks in AI: Understanding with an example

Let’s consider a simple example to understand Bayesian networks in AI.
Suppose we want to model the relationships between these three variables: “Smoking,” “Lung cancer,” and “Cough.” Say we are interested in understanding how smoking and lung cancer influence the likelihood of experiencing a cough. We can construct a Bayesian network to capture these relationships.

Variable definition:

Smoking: This variable represents whether a person is a smoker or a non-smoker and can take values “Yes” or “No.”
Lung cancer: This variable represents the presence or absence of lung cancer and can take values “Yes” or “No.”
Cough: This variable represents whether a person experiences a cough and can take values “Yes” or “No.”

Graphical representation:

We construct a Bayesian network by connecting the variables with directed edges, indicating the causal relationships or dependencies. In this network, “Smoking” is the parent node for both “Lung Cancer” and “Cough.” “Lung Cancer” is the parent node for “Cough.” The graph captures the relationships between these variables.

Conditional Probability Tables (CPTs):

Each node in the Bayesian network is associated with a conditional probability table that specifies the probability distribution of the node with respect to its parent nodes.

Smoking: The CPT for “Smoking” might look like this:

Smoking    Probability
Yes        0.3
No         0.7

Lung cancer: The CPT for “Lung Cancer” might look like this:

Smoking    Lung Cancer = Yes    Lung Cancer = No
Yes        0.85                 0.15
No         0.01                 0.99

Cough: The CPT for “Cough” might look like this:

Smoking    Cough = Yes    Cough = No
Yes        0.8            0.2
No         0.1            0.9

The values in the CPTs represent the probabilities of different outcomes based on the given states of the parent nodes, so each row sums to 1. For brevity, the “Cough” table conditions on “Smoking” only; a complete CPT for the graph described above would have one row for every combination of “Smoking” and “Lung Cancer.”

Probabilistic inference

Probabilistic inference refers to the process of reasoning and making predictions based on observed or available information using the principles of probability theory. It involves updating our beliefs or knowledge about uncertain events or variables based on new evidence or data.

Probabilistic inference in Bayesian networks refers to the process of reasoning and making predictions about the probability distributions of unobserved variables given observed evidence or data. It utilizes the graphical structure and probabilistic dependencies encoded in the Bayesian network to perform inference.

In a Bayesian network, the joint probability distribution of all variables can be factorized using the chain rule of probability and the conditional independence assumptions represented by the network’s structure. This factorization allows for efficient probabilistic inference.

Probabilistic inference in Bayesian networks involves two main tasks:

1. Querying: When provided with observed evidence, the objective is to determine the probability distribution of one or more target variables of interest. This involves conditioning on the observed evidence and propagating the probabilities throughout the network to obtain the desired probability distribution.

2. Learning: This task involves updating the probabilities or parameters in the Bayesian network based on observed data. It aims to refine the network’s structure and probability tables to reflect the data better.

How to calculate probabilistic inference using Python?
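Before turning to a library, the querying task can be sketched in plain Python on the Smoking example. This is a minimal sketch under two stated assumptions rather than a definitive model: "Cough" is conditioned on "Smoking" only, exactly as its CPT is written, and the non-smoker row of the "Lung Cancer" table is taken to be 0.01 / 0.99 so that it sums to 1. The helper names (bernoulli, joint, query) are illustrative. The joint distribution is built with the chain-rule factorization and a query is answered by summing over every assignment consistent with the evidence.

```python
from itertools import product

# CPTs from the example above (True = "Yes", False = "No").
p_smoking = {True: 0.3, False: 0.7}
# P(LungCancer=Yes | Smoking); the "No" column is the complement (assumption).
p_cancer_given_smoking = {True: 0.85, False: 0.01}
# P(Cough=Yes | Smoking), conditioned on Smoking only (assumption).
p_cough_given_smoking = {True: 0.8, False: 0.1}

def bernoulli(p_true, value):
    """Probability that a binary variable equals `value`."""
    return p_true if value else 1.0 - p_true

def joint(smoking, cancer, cough):
    """Chain-rule factorization: P(S) * P(L | S) * P(C | S)."""
    return (p_smoking[smoking]
            * bernoulli(p_cancer_given_smoking[smoking], cancer)
            * bernoulli(p_cough_given_smoking[smoking], cough))

def query(target, evidence):
    """P(target=Yes | evidence) by summing the joint over all assignments."""
    names = ("smoking", "cancer", "cough")
    numerator = denominator = 0.0
    for values in product((True, False), repeat=3):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator

print(round(query("smoking", {"cough": True}), 3))  # 0.774
print(round(query("cancer", {"cough": True}), 3))   # 0.66
```

Enumeration like this grows exponentially with the number of variables, which is why libraries use algorithms such as variable elimination for anything beyond toy networks.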
The following example solves the Monty Hall problem, a famous probability puzzle, to show how probabilistic inference can be calculated with a Python library.
The Monty Hall problem:

In this problem, a participant can choose one out of three doors. Behind one door is a valuable prize, while the other two doors hide goats. After the participant chooses a door, the host (Monty), who knows what’s behind each door, opens one of the remaining doors to disclose a goat. The participant is then given the option to either stick with their original choice or switch to the other unopened door. The question is: What is the best strategy for the participant to maximize their chances of winning the prize?

The solution using a Bayesian network:

We can model the Monty Hall problem using a Bayesian network with three nodes representing the participant’s initial choice (C), the location of the prize (P), and the door opened by the host (H). The doors are numbered 0, 1 and 2. To solve the problem, we need to calculate the probabilities of the hidden variables given the observed variables. We will calculate the probability of winning the prize when sticking with the initial choice and when switching to the other unopened door.

First, import the required dependencies:

    import numpy as np
    # Newer pgmpy releases rename this class (for example to BayesianNetwork);
    # adjust the import to match the installed version.
    from pgmpy.models import BayesianModel
    from pgmpy.factors.discrete import TabularCPD
    import networkx as nx
    import pylab as plt

Next, create the Bayesian network model. Both C and P are parents of H:

    model = BayesianModel([('C', 'H'), ('P', 'H')])

Now, define the Conditional Probability Distributions (CPDs):

    # Uniform priors over the three doors for the choice and the prize.
    cpd_c = TabularCPD('C', 3, [[1/3], [1/3], [1/3]])
    cpd_p = TabularCPD('P', 3, [[1/3], [1/3], [1/3]])
    # P(H | C, P): rows are H = 0, 1, 2; columns run over
    # (C, P) = (0,0), (0,1), (0,2), (1,0), ..., (2,2).
    # The host never opens the chosen door or the prize door.
    cpd_h = TabularCPD('H', 3, [[0,   0, 0, 0, 1/2, 1, 0, 1, 1/2],
                                [1/2, 0, 1, 0, 0,   0, 1, 0, 1/2],
                                [1/2, 1, 0, 1, 1/2, 0, 0, 0, 0]],
                       evidence=['C', 'P'], evidence_card=[3, 3])

Add the CPDs to the model:

    model.add_cpds(cpd_c, cpd_p, cpd_h)

To check the model structure and the associated conditional probability distributions, use the check_model() method of the BayesianModel object. If everything is fine, the method returns True. Otherwise, it raises an error.

    model.check_model()

To query the network, we provide evidence and read the posterior probability from the network. In this query the evidence is the door selected by the participant and the location of the prize, and the result is the distribution over the door the host opens.

    from pgmpy.inference import VariableElimination

    infer = VariableElimination(model)
    posterior_p = infer.query(['H'], evidence={'C': 2, 'P': 2})
    print(posterior_p)
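The query above conditions on both C and P, which is not what a contestant observes. The stick-or-switch question posed earlier conditions on C and H instead and asks for the distribution of P. As a library-free cross-check, the sketch below applies Bayes' theorem by enumeration with exact fractions. The helper names host_prob and prize_posterior are hypothetical, and the host's rule (open, uniformly at random, any door that is neither the chosen door nor the prize door) is the standard reading of the puzzle.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def host_prob(host, choice, prize):
    """P(H=host | C=choice, P=prize): never the chosen door or the prize door."""
    allowed = [d for d in DOORS if d != choice and d != prize]
    return Fraction(1, len(allowed)) if host in allowed else Fraction(0)

def prize_posterior(choice, host):
    """P(P | C=choice, H=host) by enumerating the three prize locations."""
    # Prior P(P) = 1/3 for every door; it cancels on normalization
    # but is kept so the expression mirrors Bayes' theorem.
    weights = {p: Fraction(1, 3) * host_prob(host, choice, p) for p in DOORS}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

# The participant picked door 0 and the host opened door 2.
posterior = prize_posterior(choice=0, host=2)
print(posterior[0])  # 1/3, the chance of winning by sticking
print(posterior[1])  # 2/3, the chance of winning by switching
```

With a pgmpy model whose CPD for H encodes the same host rule, the analogous query would be infer.query(['P'], evidence={'C': 0, 'H': 2}), and it should agree with these fractions.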
To plot our above model, we can utilize the NetworkX and Matplotlib libraries. NetworkX is a Python package used for the development, manipulation, and study of the structure, dynamics, and functions of complex networks. Matplotlib’s pyplot interface provides a convenient way to create visualizations of the network as graphs with nodes and edges. Here is how you can plot the model:

    import networkx as nx
    import matplotlib.pyplot as plt

    nodes = model.nodes()
    edges = model.edges()
    graph = nx.DiGraph()
    graph.add_nodes_from(nodes)
    graph.add_edges_from(edges)

    # Draw the graph and save it to a file
    nx.draw(graph, with_labels=True)
    plt.savefig('model.png')
    plt.close()

The above code snippet generates the Directed Acyclic Graph (DAG) as shown below:

The variations of Bayesian networks in AI

There are several types or variations of Bayesian networks in AI, each with its own characteristics and applications. Here are some common types:

1. Static Bayesian networks: These are the most basic type of Bayesian networks, where the relationships between variables are fixed and do not change over time. They are used to model dependencies among variables in a static system.
2. Dynamic Bayesian networks: Unlike static Bayesian networks, dynamic Bayesian networks (DBNs) allow for modeling temporal dependencies and changes over time. They can represent probabilistic relationships that evolve or transition between different states.

3. Hidden Markov Models (HMMs): HMMs are a type of dynamic Bayesian network widely used in modeling sequential data. They involve a set of hidden states that are not directly observable but can be deduced from observable variables. HMMs are commonly used in speech recognition, natural language processing, and bioinformatics.

4. Continuous Bayesian networks: Most traditional Bayesian networks assume discrete variables. However, continuous Bayesian networks deal with continuous variables and use probability distributions such as Gaussian or exponential distributions to represent the relationships between variables.

5. Hybrid Bayesian networks: Hybrid Bayesian networks combine discrete and continuous variables in a single model. They can handle both discrete and continuous variables simultaneously and are useful in applications where the data has mixed variable types.

6. Influence diagrams: Influence diagrams are a type of Bayesian network that not only represent probabilistic dependencies but also incorporate decision and utility nodes. They are used for decision analysis and optimization problems, allowing for explicitly modeling decisions, uncertainties, and utilities.

7. Causal Bayesian networks: While Bayesian networks typically represent associations and dependencies between variables, causal Bayesian networks aim to model causal relationships. They explicitly capture cause-and-effect relationships between variables, making them useful for understanding and predicting causal effects.

These are some of the main types of Bayesian networks. Each type has its own advantages and is suitable for different applications and problem domains.

What are Bayesian networks in AI used for?
Bayesian networks can be used for a wide range of applications in AI and ML. Here are some common uses of Bayesian networks:

1. Probabilistic inference: Bayesian networks allow for probabilistic inference, which means they can answer queries about the probability distribution of variables given observed evidence. They can calculate the posterior probability of unobserved variables based on the probabilistic dependencies in the network.

2. Diagnosis and decision support: Bayesian networks are widely used in medical diagnosis and decision support systems. By observing symptoms or evidence, the network can compute the probabilities of different diseases or conditions, aiding in the diagnostic process. They can also assist in decision-making by considering the probabilities and utilities associated with different choices.

3. Predictive modeling: Bayesian networks can be used for predictive modeling tasks. Given observed variables, they can predict the values of unobserved variables or estimate their probabilities. This makes them useful in various domains, such as weather forecasting, finance, and customer behavior analysis.

4. Risk assessment and management: Bayesian networks are valuable for risk assessment and management. They can model the dependencies between risk factors and estimate the probabilities of different outcomes or events. This is useful in areas such as insurance underwriting, project management, and environmental risk analysis.

5. Anomaly detection: Bayesian networks can be used for anomaly detection tasks. By learning the normal behavior of a system or process, they can detect deviations or anomalies from the expected patterns. This is useful in cybersecurity, fraud detection, and monitoring industrial processes.

6. Natural Language Processing: Bayesian networks have been applied in natural language processing tasks. They can be used for tasks such as part-of-speech tagging, named entity recognition, and semantic parsing. Bayesian networks can capture the dependencies between linguistic elements and infer the most likely interpretations or structures.

7. Environmental modeling: Bayesian networks are employed in environmental modeling to understand complex systems and assess environmental impacts. They can model the interactions between variables such as climate, ecosystems, and human activities, enabling predictions and scenario analyses.

8. Bioinformatics and genomics: Bayesian networks are used in bioinformatics and genomics to model and analyze genetic and protein interactions. They can help in understanding gene regulatory networks, protein-protein interactions, and disease-gene associations.

These are just a few examples of the diverse applications of Bayesian networks. Their ability to handle uncertainty and model complex dependencies makes them a valuable tool in various domains, where reasoning under uncertainty and making probabilistic inferences are essential.
How Bayesian networks are used: An example

Now that we know the use cases and practical applications of Bayesian networks, let us look at a simple use of the network: digit generation and visualization. Here, we use Python and Sorobn, a Python library for Bayesian networks.

Prerequisites:
C++ build tool: Microsoft C++ Build Tools (Visual Studio)
Graphviz: Download | Graphviz

First, let us import the necessary modules, load the dataset and preprocess it.

    from sklearn import datasets

    # Load the 8x8 handwritten digits: pixel values (DataFrame) and labels (Series)
    pixels, digits = datasets.load_digits(return_X_y=True, as_frame=True)
    pixels = pixels.astype('uint8')
    # Rename each column from 'pixel_R_C' to 'R-C' (row-column)
    pixels.columns = [f"{col.split('_')[1]}-{int(col.split('_')[2])}" for col in pixels.columns]
    pixels.head()

From the imported dataset, let us visualize the images by creating a 5×5 grid of images, with each image displayed using a grayscale colormap and titled with its digit label.

    import matplotlib.pyplot as plt
    from mpl_toolkits.axes_grid1 import ImageGrid

    img_shape = (8, 8)
    fig = plt.figure(figsize=(7, 7))
    grid = ImageGrid(fig, 111, nrows_ncols=(5, 5), axes_pad=.25)

    for i, ax in enumerate(grid):
        # Each row of the DataFrame is one flattened 8x8 image
        img = pixels.iloc[i].values.reshape(img_shape)
        ax.imshow(img, cmap='gray')
        ax.set_title(digits.iloc[i])
        ax.axis('off')
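The renaming and reshaping steps above rely on the pixel columns being stored in row-major order. The sketch below reproduces that mapping without loading the dataset. It assumes the original column names follow the pattern pixel_R_C (for example pixel_1_5); the helper names rename_column and column_for_flat_index are hypothetical.

```python
IMG_SHAPE = (8, 8)


def rename_column(col):
    """Turn a name like 'pixel_1_5' into '1-5', as the list comprehension does."""
    _, row, column = col.split("_")
    return f"{row}-{int(column)}"


def column_for_flat_index(i):
    """Name of the pixel at flat position i when the image is stored row by row."""
    r, c = divmod(i, IMG_SHAPE[1])
    return f"{r}-{c}"


# Assumed original names, listed in row-major order
original = [f"pixel_{r}_{c}" for r in range(IMG_SHAPE[0]) for c in range(IMG_SHAPE[1])]
renamed = [rename_column(col) for col in original]

print(renamed[:3], renamed[13])
```

Because position i maps to row i // 8 and column i % 8, reshaping a 64-value row to (8, 8) puts every pixel back where its name says it belongs.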
Next, define a function called ‘neighbors’ that returns the top and left neighbors of a given point in the grid. These neighbors will become the parents of the corresponding pixel in the network.

    def neighbors(r, c):
        top = (r - 1, c)
        left = (r, c - 1)
        if r and c:
            return [top, left]
        if r:
            return [top]
        if c:
            return [left]
        return []

    neighbors(0, 0)  # []
    neighbors(0, 1)  # [(0, 0)]
    neighbors(1, 0)  # [(0, 0)]
    neighbors(1, 1)  # [(0, 1), (1, 0)]

Create the Bayesian network.

    import sorobn

    # One directed edge (parent, child) from each neighbor to the pixel itself
    structure = [
        (f'{neighbor[0]}-{neighbor[1]}', f'{r}-{c}')
        for r in range(img_shape[0])
        for c in range(img_shape[1])
        for neighbor in neighbors(r, c)
    ]
    bn = sorobn.BayesNet(*structure)
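A Bayesian network must be a directed acyclic graph, so it can be worth checking the edge list before handing it to a library. The following sketch rebuilds the same top-and-left structure with plain tuples and checks it with Kahn's topological sort. The is_acyclic helper is hypothetical and not part of Sorobn.

```python
from collections import defaultdict, deque

IMG_SHAPE = (8, 8)


def neighbors(r, c):
    """Top and left neighbors of (r, c), as in the article."""
    top = (r - 1, c)
    left = (r, c - 1)
    if r and c:
        return [top, left]
    if r:
        return [top]
    if c:
        return [left]
    return []


edges = [
    (parent, (r, c))
    for r in range(IMG_SHAPE[0])
    for c in range(IMG_SHAPE[1])
    for parent in neighbors(r, c)
]


def is_acyclic(edge_list):
    """Kahn's algorithm: a graph is a DAG if every node can be removed in order."""
    indegree = defaultdict(int)
    children = defaultdict(list)
    nodes = set()
    for parent, child in edge_list:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


print(len(edges), is_acyclic(edges))  # prints: 112 True
```

The grid yields 56 vertical and 56 horizontal edges, and every edge points down or to the right, which is why no cycle can form.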
Run the following code to visualize the network.

    bn.graphviz()

Using the daft library, we can also arrange the nodes in the graphical model based on their associated pixel positions.

    import daft

    pgm = daft.PGM(node_unit=.7, grid_unit=1.6, directed=True)

    for rc in bn.nodes:
        r, c = rc.split('-')
        pgm.add_node(node=rc, x=int(c), y=img_shape[0] - int(r))

    for parent, children in bn.children.items():
        for child in children:
            pgm.add_edge(parent, child)

    pgm.render();

Furthermore, it is possible to define additional relationships by extending the structure of the Bayesian network accordingly.

    def neighbors(r, c):
        top_left = (r - 1, c - 1)
        left = (r, c - 1)
        top = (r - 1, c)
        if r and c:
            return [top, left, top_left]
        if r:
            return [top]
        if c:
            return [left]
        return []

    structure = [
        (f'{neighbor[0]}-{neighbor[1]}', f'{r}-{c}')
        for r in range(img_shape[0])
        for c in range(img_shape[1])
        for neighbor in neighbors(r, c)
    ]
    bn = sorobn.BayesNet(*structure)

    pgm = daft.PGM(node_unit=.7, grid_unit=1.6, directed=True)

    for rc in bn.nodes:
        r, c = rc.split('-')
        pgm.add_node(node=rc, x=int(c), y=img_shape[0] - int(r))

    for parent, children in bn.children.items():
        for child in children:
            pgm.add_edge(parent, child)

    pgm.render();
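Adding parents to a node enlarges its conditional probability table, since the table needs one row per combination of parent values. The rough count below assumes each pixel takes one of 17 intensity levels (0 to 16, as in scikit-learn's digits dataset). The function names are hypothetical, and the count is an upper bound on distinct parent configurations rather than what any library actually stores.

```python
LEVELS = 17          # assumed number of intensity values per pixel (0..16)
IMG_SHAPE = (8, 8)


def parents_two(r, c):
    """Top and left parents."""
    if r and c:
        return [(r - 1, c), (r, c - 1)]
    if r:
        return [(r - 1, c)]
    if c:
        return [(r, c - 1)]
    return []


def parents_three(r, c):
    """Top, left and top-left parents."""
    if r and c:
        return [(r - 1, c), (r, c - 1), (r - 1, c - 1)]
    return parents_two(r, c)


def total_parent_configs(parent_fn):
    """Sum over all nodes of the number of possible parent value combinations."""
    return sum(
        LEVELS ** len(parent_fn(r, c))
        for r in range(IMG_SHAPE[0])
        for c in range(IMG_SHAPE[1])
    )


print(total_parent_configs(parents_two))    # prints 14400
print(total_parent_configs(parents_three))  # prints 240976
```

The digits dataset holds 1,797 images in total, so most of these configurations can never be observed in the training data, and the gap widens sharply with the third parent.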
As the above network structures do not facilitate sampling, let us define a simpler structure in which every pixel has at most one parent:

    def neighbors(r, c):
        if r == c == 0:
            return []
        # Even row number: the parent is the pixel to the left,
        # or the pixel above for the first column
        if r % 2 == 0:
            if c:
                return [(r, c - 1)]
            return [(r - 1, c)]
        # Odd row number: the parent is the pixel to the right,
        # or the pixel above for the last column
        if c == img_shape[1] - 1:
            return [(r - 1, c)]
        return [(r, c + 1)]

    neighbors(0, 0)  # []
    neighbors(0, 1)  # [(0, 0)]
    neighbors(1, 7)  # [(0, 7)]
    neighbors(2, 0)  # [(1, 0)]

    import sorobn

    structure = [
        (f'{neighbor[0]}-{neighbor[1]}', f'{r}-{c}')
        for r in range(img_shape[0])
        for c in range(img_shape[1])
        for neighbor in neighbors(r, c)
    ]
    bn = sorobn.BayesNet(*structure)

    pgm = daft.PGM(node_unit=.7, grid_unit=1.6, directed=True)

    for rc in bn.nodes:
        r, c = rc.split('-')
        pgm.add_node(node=rc, x=int(c), y=img_shape[0] - int(r))

    for parent, children in bn.children.items():
        for child in children:
            pgm.add_edge(parent, child)

    pgm.render();
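The simpler structure can be read as a single chain that snakes through the image: left to right on even rows, right to left on odd rows. The sketch below, using plain tuples instead of Sorobn, follows the chain from the root to confirm this. The names child_of and path are hypothetical.

```python
IMG_SHAPE = (8, 8)


def neighbors(r, c):
    """Single-parent structure from the article."""
    if r == c == 0:
        return []
    if r % 2 == 0:
        if c:
            return [(r, c - 1)]
        return [(r - 1, c)]
    if c == IMG_SHAPE[1] - 1:
        return [(r - 1, c)]
    return [(r, c + 1)]


# Map each parent to its only child; fail loudly if a parent had two children
child_of = {}
for r in range(IMG_SHAPE[0]):
    for c in range(IMG_SHAPE[1]):
        for parent in neighbors(r, c):
            assert parent not in child_of, "a node has more than one child"
            child_of[parent] = (r, c)

# Walk the chain starting from the root pixel
path = [(0, 0)]
while path[-1] in child_of:
    path.append(child_of[path[-1]])

print(len(path), path[7:10], path[-1])
```

Since the walk visits all 64 pixels, the joint distribution factorizes into one root distribution and 63 pairwise conditionals, each depending on a single neighboring pixel.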
Now, let us fit the network to the data. Here, only the images of the digit 0 are used.

    bn = sorobn.BayesNet(*structure)
    bn = bn.fit(pixels[digits == 0])

Finally, we can generate samples from the Bayesian network and display them as grayscale images on a grid of subplots.

    import pandas as pd

    fig = plt.figure(figsize=(7, 7))
    grid = ImageGrid(fig, 111, nrows_ncols=(5, 5), axes_pad=.1)

    for ax in grid:
        sample = bn.sample()
        # Reorder to the row-major column order before reshaping,
        # in case the sample's key order differs from it
        img = pd.Series(sample).reindex(pixels.columns).values.reshape(img_shape)
        ax.imshow(img, cmap='gray')
        ax.axis('off')

You can access the whole set of codes through this GitHub link.

What is a Bayesian Neural Network (BNN)?

Bayesian Neural Networks (BNNs) are neural networks that apply Bayesian inference to represent uncertainty in their weights and biases. Traditional neural networks learn a single fixed value for each weight and bias through optimization algorithms like gradient descent. In contrast, BNNs treat the weights and biases as random variables with prior distributions. This combines the flexibility and expressive power of neural networks with a probabilistic framework for learning and making predictions.

In a BNN, prior distributions are assigned to the weights and biases, which reflect the initial beliefs about their values before observing any data. These priors can be chosen based on prior knowledge or assumptions about the problem domain. During the training process, the BNN updates the priors based on the observed data, resulting in posterior distributions over the weights and biases.

The key challenge in BNNs is to approximate the posterior distribution, which captures the updated beliefs about the weights and biases given the observed data. Exact inference in BNNs is generally intractable due to the complex, non-linear nature of neural networks.
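The prior-to-posterior update described above can be made concrete in a setting where it is tractable. The sketch below is not a BNN: it is Bayesian linear regression with a single weight, a Gaussian prior, and Gaussian noise of known variance, chosen because the posterior has a closed form. The function name and the toy data are hypothetical.

```python
def posterior_over_weight(xs, ys, prior_mean, prior_var, noise_var):
    """Conjugate Gaussian update for y = w * x + noise, with w ~ N(prior_mean, prior_var)."""
    # Precisions (inverse variances) add up: prior plus evidence from the data
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    # Precision-weighted combination of the prior mean and the data term
    weighted_sum = prior_mean / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var
    post_var = 1.0 / precision
    post_mean = weighted_sum * post_var
    return post_mean, post_var


# Toy data generated by hand from y = 2x (hypothetical)
xs = [1.0, 2.0]
ys = [2.0, 4.0]

post_mean, post_var = posterior_over_weight(xs, ys, prior_mean=0.0, prior_var=1.0, noise_var=1.0)
print(f"posterior mean = {post_mean:.4f}, posterior variance = {post_var:.4f}")
```

The posterior mean moves from the prior value of 0 toward 2, and the variance shrinks as data arrives. A multi-layer network with non-linear activations has no closed-form posterior of this kind.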
Therefore, approximate inference methods, such as variational inference and Markov chain Monte Carlo (MCMC) sampling, are commonly used to estimate the posterior distribution.

Endnote

Bayesian networks in AI have emerged as a robust tool for decision-making under uncertainty. By capturing complex relationships and incorporating probabilistic reasoning, these graphical models enable us to navigate intricate scenarios and make informed choices. Whether in healthcare, finance, or other domains, the application of Bayesian networks has significantly impacted decision-making processes, empowering individuals and organizations to optimize resource allocation, mitigate risks, and achieve better outcomes. As we continue to unlock their potential and embrace the power of probabilistic reasoning, Bayesian networks will continue illuminating our path through uncertainty, paving the way for smarter decisions in an ever-changing world.

Partner with our expert team to unlock the potential of Bayesian networks and elevate your AI solutions to new heights. We, at LeewayHertz, use advanced AI technologies to build robust AI solutions tailored to your needs.