Because anomalies are, by nature, often unpredictable in form and structure, unsupervised learning methods are widely used since they do not require pre-assigned labels—a significant advantage when dealing with unforeseen data patterns. However, these methods can be further refined through the integration of a small amount of labeled data, giving rise to semi-supervised approaches. The method evaluated in this thesis, DeepSAD, is a semi-supervised deep learning approach that also leverages an autoencoder architecture in its design. Autoencoders have gained widespread adoption in deep learning for their ability to extract features from unlabeled data, which is particularly useful for handling complex data types such as \rev{LiDAR} scans.
\rev{LiDAR} sensors function by projecting lasers in multiple directions near-simultaneously, measuring the time it takes for each reflected ray to return. Using the angles and travel times, the sensor constructs a point cloud that is often accurate enough to map the sensor's surroundings. In the following sections, we will delve into these technologies, review how they work, how they are generally used, how we employ them in this thesis, and explore related work from these backgrounds.
\newsection{anomaly_detection}{Anomaly Detection}
Anomaly detection refers to the process of detecting unexpected patterns in data: outliers that deviate significantly from the majority of the data, which is implicitly defined as normal by its prevalence. In classical statistical analysis, such techniques have been studied since as early as the 19th century~\cite{anomaly_detection_history}. Since then, a multitude of methods and use cases for them have been proposed and studied. Examples of applications include healthcare, where computer vision algorithms are used to detect anomalies in medical images for diagnostics and early detection of diseases~\cite{anomaly_detection_medical}, detection of fraud in decentralized financial systems based on blockchain technology~\cite{anomaly_detection_defi}, as well as fault detection in industrial machinery using acoustic sound data~\cite{anomaly_detection_manufacturing}.
Figure~\ref{fig:anomaly_detection_overview} depicts a simple but illustrative example of data that can be classified as either normal or anomalous and shows the problem anomaly detection methods generally try to solve. A successful anomaly detection method would learn to differentiate normal from anomalous data, for example, by learning a boundary around the available normal data and classifying samples as either normal or anomalous based on their location inside or outside of that boundary. Another possible approach could calculate an analog value that correlates with the likelihood of a sample being anomalous, for example, by using the sample's distance from the closest normal data cluster's center.
\figc{anomaly_detection_overview}{figures/anomaly_detection_overview}{An illustrative example of anomalous and normal data containing 2-dimensional data with clusters of normal data $N_1$ and $N_2$ as well as two single anomalies $o_1$ and $o_2$ and a cluster of anomalies $O_3$. Reproduced from~\cite{anomaly_detection_survey}\rev{.}}{width=0.5\textwidth}
By their very nature, anomalies are rare occurrences and oftentimes unpredictable in nature, which makes it hard to define all possible anomalies in any system. It also makes it very challenging to create an algorithm that is capable of detecting anomalies that may have never occurred before and may not have been known to exist during the creation of the detection algorithm. There are many possible approaches to this problem, though they can be roughly grouped into six distinct categories based on the techniques used~\cite{anomaly_detection_survey}:
\begin{enumerate}
\item \textbf{Classification Based} \\ A classification technique, such as \rev{Support Vector Machine (SVM)~\cite{bg_svm}}, is used to classify samples as either normal or anomalous based on labeled training data. Alternatively, if not enough labeled training data is available, a one-class classification algorithm can be employed. In that case, the algorithm assumes all training samples to be normal and then learns a boundary around the normal samples to differentiate them from anomalous samples.
\item \textbf{Clustering Based} \\ Clustering techniques such as \rev{K-Means~\cite{bg_kmeans}} or DBSCAN\rev{~\cite{bg_dbscan}} aim to group similar \rev{data into} clusters, differentiating it from dissimilar data, which may belong to another cluster or no cluster at all. Anomaly detection methods from this category employ such a technique, assuming that normal data assembles into one or more clusters due to its similar properties, while anomalies may form their own smaller clusters, belong to no cluster at all, or at least lie \rev{at} an appreciable distance from the closest normal cluster's center.
\item \textbf{Nearest Neighbor Based} \\ Similar to the clustering-based category, these techniques assume normal data is more closely clustered than anomalies and therefore utilize either a sample's distance to its $k^{th}$ nearest neighbor or the density of its local neighborhood to judge whether a sample is anomalous (a minimal code sketch of this idea follows the list).
\item \textbf{Statistical} \\ These methods try to fit a statistical model of the normal behavior to the data. After the distribution from which normal data originates is defined, samples can be found to be normal or anomalous based on their likelihood \rev{of arising from that} distribution.
\item \textbf{Information Theoretic} \\ The main assumption for information theoretic anomaly detection methods is that anomalies differ somehow in their information content from normal data. An information theoretic measure is therefore used to determine \rev{irregularities} in the data's information content, enabling the detection of anomalous samples.
\item \textbf{Spectral} \\ Spectral approaches assume the possibility of mapping data into a lower-dimensional space, where normal data appears significantly different from anomalous data. To this end, a dimensionality reduction technique such as Principal Component Analysis (PCA)\rev{~\cite{bg_pca}} is used to embed the data into a lower-dimensional \rev{subspace. Spectral} methods are often used as a pre-processing step followed by another anomaly detection method operating on the data's subspace.
\end{enumerate}
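
To make the nearest neighbor idea concrete, the following minimal sketch (Python with NumPy; the function name and toy data are our own illustration, not taken from a specific library) scores each query sample by its distance to its $k^{th}$ nearest neighbor among the training data:

\begin{verbatim}
import numpy as np

def knn_anomaly_scores(train, queries, k=5):
    # Pairwise Euclidean distances between query and training samples.
    dists = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=2)
    # The distance to the k-th nearest training sample is the score:
    # samples far from any dense region of normal data score higher.
    return np.sort(dists, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 2))      # dense cluster of normal samples
outlier = np.array([[6.0, 6.0]])        # isolated anomaly
print(knn_anomaly_scores(normal, np.vstack([normal[:3], outlier])))
\end{verbatim}

In this toy example, the isolated sample receives a markedly higher score than the three normal samples, mirroring the clustering and nearest neighbor assumptions described above.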
In this thesis, we used an anomaly detection method, namely \citetitle{deepsad}\rev{~(DeepSAD)~\cite{deepsad}}, to model our problem\rev{—}how to quantify the degradation of \rev{LiDAR} sensor data\rev{—}as an anomaly detection problem. We do this by classifying good-quality data as normal and degraded data as anomalous, and rely on a method that can express each sample's likelihood of being anomalous as an analog anomaly score, which enables us to interpret it as the \rev{data} degradation quantification value.
Chapter~\ref{chp:deepsad} describes DeepSAD in more detail, which shows that it is a clustering-based approach with a spectral pre-processing component, in that it uses a neural network to reduce the input's dimensionality while simultaneously clustering normal data closely around a given centroid. It then produces an anomaly score by calculating the geometric distance between a data sample and the aforementioned cluster centroid, assuming the distance is shorter for normal than for anomalous data. Since our data is high-dimensional, it makes sense to use a spectral method to reduce \rev{its} dimensionality. \rev{Moreover,} an approach that reports an analog value rather than a binary classification suits our use case, since we want to quantify, not merely classify, the data degradation.
A wide \rev{range} of problems in domains similar to the one we research in this \rev{thesis} have been successfully modeled as anomaly detection problems. The degradation of point clouds produced by an industrial 3D sensor has been modeled as an anomaly detection task in \rev{\cite{bg_ad_pointclouds_scans}}. \citeauthor{bg_ad_pointclouds_scans} propose a student-teacher model capable of inferring a pointwise anomaly score for degradation in point clouds. The teacher network is trained on an anomaly-free dataset to extract dense features of the point clouds' local geometries, after which an identical student network is trained to emulate the teacher network's outputs. For degraded point clouds, the regression error between the teacher's and student's outputs is calculated and interpreted as the anomaly score, with the rationale that the student network has not observed features produced by anomalous geometries during training, leaving it incapable of producing an output similar to the teacher's for those regions. Another example is \rev{\cite{bg_ad_pointclouds_poles}}, which proposes a method to detect and classify pole-like objects in urban point cloud data, to differentiate between natural and man-made objects such as street signs, for autonomous driving purposes. An anomaly detection method was used to identify the vertical pole-like objects in the point clouds, and then the preprocessed objects were grouped by similarity using a clustering algorithm to classify them as either trees or man-made poles.
As briefly mentioned at the beginning of this section, anomaly detection methods and their usage are oftentimes challenged by the limited availability of anomalous data, owing to the very nature of anomalies, which are rare occurrences. Oftentimes, the intended use case is precisely to find unknown anomalies in a given dataset that have not yet been identified. In addition, it can be challenging to classify anomalies correctly for complex data, since the very definition of an anomaly depends on many factors, such as the type of data, the intended use case, or even how the data evolves over time. For these reasons, most types of anomaly detection approaches limit their reliance on anomalous data during training, and many of them do not differentiate between normal and anomalous data at all. DeepSAD is a semi-supervised method that is characterized by using a mixture of labeled and unlabeled data.
\newsection{semi_supervised}{Semi-Supervised Learning Algorithms}
Machine learning refers to algorithms capable of learning patterns from existing data to perform tasks on previously unseen data, without being explicitly programmed to do so~\cite{machine_learning_first_definition}. Central to many approaches is the definition of an objective function that measures how well the model is performing. The model’s parameters are then adjusted to optimize this objective. By leveraging these data-driven methods, machine learning can handle complex tasks across a wide range of domains.
Among the techniques employed in machine \rev{learning,} neural networks have become especially prominent over the past few decades due to their ability to achieve state-of-the-art results across a wide variety of domains. They are most commonly composed of layers of interconnected artificial neurons. Each neuron computes a weighted sum of its inputs, adds a bias term, and then applies a nonlinear activation function, enabling the network to model complex nonlinear relationships. These layers are typically organized into three types:

\begin{itemize}
\item Input layer, which receives raw data.
\item Hidden layers, which transform the received features through successive weighted connections and nonlinear activations.

\item Output layer, which produces the network’s final prediction.
\end{itemize}
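
To illustrate the neuron computation described above, consider the following minimal forward pass through a small network (plain NumPy; the layer sizes and random weights are arbitrary choices for illustration):

\begin{verbatim}
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                    # nonlinear activation

rng = np.random.default_rng(0)
x = rng.normal(size=4)                           # input layer: raw 4-dim data
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # hidden layer parameters
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)    # output layer parameters

h = relu(W1 @ x + b1)  # each neuron: weighted sum of inputs + bias, then activation
y = W2 @ h + b2        # the network's final prediction
\end{verbatim}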
As outlined above, neural network training is formulated as an optimization problem: we define an objective function that measures how well the model is achieving its task, and then we adjust the network’s parameters to optimize that objective. The most common approach is stochastic gradient descent (SGD) or one of its \rev{variants.} In each training iteration, the network first performs a forward pass to compute its outputs and evaluate the objective, then a backward pass—known as backpropagation—to calculate gradients of the objective with respect to every weight in the network. These gradients indicate the direction in which each weight should change to improve performance, and the weights are updated accordingly. Repeating this process over many iterations (also called epochs) allows the network to progressively refine its parameters and better fulfill its task.
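
Schematically, one such training iteration can be sketched as follows (a PyTorch sketch with a toy model, objective, and data, not the setup used later in this thesis):

\begin{verbatim}
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
objective = nn.MSELoss()                       # measures task performance
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs, targets = torch.randn(32, 4), torch.randn(32, 1)
for epoch in range(100):
    optimizer.zero_grad()                      # reset gradients
    loss = objective(model(inputs), targets)   # forward pass + objective
    loss.backward()                            # backpropagation of gradients
    optimizer.step()                           # weight update along the gradients
\end{verbatim}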
Aside from the underlying technique, one can also categorize machine learning algorithms by the type of feedback provided to the model during learning. Broadly speaking, three main categories exist—supervised, unsupervised, and reinforcement learning—although many other approaches do not exactly fit any of these categories and have spawned less common categories like semi-supervised or self-supervised learning.
In supervised learning, each input sample is paired with a “ground-truth” label representing the desired output. During training, the model makes a prediction, and a loss function quantifies the difference between the prediction and the truth label. The learning algorithm then adjusts its parameters to minimize this loss, improving its performance over time. Labels are typically categorical (used for classification tasks, such as distinguishing “cat” from “dog”) or continuous (used for regression tasks, like predicting a temperature or distance). Figure~\ref{fig:ml_learning_schema_concept}~\rev{(b)} illustrates this principle with a classification example, where labeled data is used to learn a boundary between two classes.
\figc{ml_learning_schema_concept}{figures/ml_learning_schema_concept.png}{Conceptual illustration of unsupervised (a) and supervised (b) learning. In (a), the inputs are two-dimensional data without labels, and the algorithm groups them into clusters without external guidance. In (b), the inputs have class labels (colors), which serve as training signals for learning a boundary between the two classes. Reproduced from~\cite{ml_supervised_unsupervised_figure_source}.}{width=0.6\textwidth}

In unsupervised learning, models work directly with raw data, without any ground-truth labels, and must discover structure in the data on their own, for example by grouping similar samples into clusters. Figure~\ref{fig:ml_learning_schema_concept}~(a) illustrates this principle with such a clustering example.

In reinforcement learning, an agent learns by trial and error while interacting with an environment. After each action, it receives feedback in the form of rewards or penalties and adapts its strategy to maximize the total reward over time. This makes reinforcement learning particularly suited for sequential decision-making tasks such as robotics or game playing.
Semi-supervised learning algorithms are an \rev{in-between} category of supervised and unsupervised algorithms, in that they use a mixture of labeled and unlabeled data. Typically, vastly more unlabeled data is used during training of such algorithms than labeled data, due to the effort and expertise required to label large quantities of data correctly. Semi-supervised methods are often an effort to improve a machine learning algorithm belonging to either the supervised or unsupervised category. Supervised methods, such as classification tasks, are enhanced by using large amounts of unlabeled data to augment the supervised training without the need for additional labeling work. Alternatively, unsupervised methods like clustering algorithms may not only use unlabeled data but also improve their performance by considering some hand-labeled data during training.
Machine learning based anomaly detection methods can utilize techniques from all of the aforementioned categories, although their suitability varies. While supervised anomaly detection methods exist, their usability depends not only on the availability of labeled training data but also on a reasonable proportion between normal and anomalous data. Both requirements can be challenging due to labeling often being labor-intensive and anomalies' intrinsic property of occurring rarely compared to normal data, making the capture of enough anomalous behavior a hard problem. Semi-supervised anomaly detection methods are of special interest in that they may overcome these difficulties inherently present in many anomaly detection tasks~\cite{semi_ad_survey}. These methods typically have the same goal as unsupervised anomaly detection methods, which is to model the normal class behavior and delimit it from anomalies, but they can incorporate some hand-labeled examples of normal and/or anomalous behavior to improve their performance over fully unsupervised methods. DeepSAD is a semi-supervised method that extends its unsupervised predecessor Deep SVDD~\cite{deep_svdd} by including some labeled samples during training. Both DeepSAD and Deep SVDD also utilize an autoencoder in a pretraining step, a machine learning architecture\rev{, which we will look at next}.

\newsection{autoencoder}{Autoencoder}
Autoencoders are a type of neural network architecture whose main goal is learning to encode input data into a representative state from which the same input can be reconstructed, hence the name. They typically consist of two functions, an encoder and a decoder, with a latent space \rev{in between} them, as depicted in the toy example in \rev{Figure}~\ref{fig:autoencoder_general}. The encoder learns to extract the most significant features from the input and to convert them into the input's latent space representation. The reconstruction goal ensures that the most prominent features of the input are retained during the encoding phase, due to the inherent inability to reconstruct the input if too much relevant information is missing. The decoder simultaneously learns to reconstruct the original input from its encoded latent space representation by minimizing the error between the input sample and the autoencoder's output. This optimization goal complicates the categorization of autoencoders as unsupervised methods. Although they do not require labeled data, they still compute an error against a known target—the input itself. For this reason, some authors describe them as a form of self-supervised learning, where the data provides its own supervisory signal without requiring expert labeling.

\fig{autoencoder_general}{figures/autoencoder_principle.png}{Illustration of an autoencoder’s working principle. The encoder $\mathbf{g_\phi}$ compresses the input into a lower-dimensional bottleneck representation $\mathbf{z}$, which is then reconstructed by the decoder $\mathbf{f_\theta}$. During training, the difference between input and output serves as the loss signal to optimize both the encoder’s feature extraction and the decoder’s reconstruction. Reproduced from~\cite{ml_autoencoder_figure_source}.
}
One key use case of autoencoders is to employ them as a dimensionality reduction technique. In that case, the latent space \rev{in between} the encoder and decoder is of a lower dimensionality than the input data itself. Due to the aforementioned reconstruction goal, the shared information between the input data and its latent space representation is maximized, which is known as following the Infomax principle\rev{~\cite{bg_infomax}}. After training such an autoencoder, it may be used to generate lower-dimensional representations of the given datatype, enabling more performant computations that may have been infeasible to achieve on the original data. DeepSAD uses an autoencoder in a pretraining step to achieve this goal, among others.
Autoencoders have been shown to be useful in the anomaly detection domain under the assumption that autoencoders trained on more normal than anomalous data are better at reconstructing normal behavior than anomalous behavior. This assumption allows methods to utilize the reconstruction error as an anomaly score. Examples of this are the methods in \rev{\cite{bg_autoencoder_ad} and \cite{bg_autoencoder_ad_2}}, which both employ an autoencoder and the aforementioned assumption. Autoencoders have also been shown to be a suitable dimensionality reduction technique for \rev{LiDAR} data, which is frequently high-dimensional and sparse, making feature extraction and dimensionality reduction popular preprocessing steps. As an example, \rev{\cite{bg_autoencoder_lidar}} shows the feasibility and advantages of using an autoencoder architecture to reduce the dimensionality of \rev{LiDAR}-orthophoto fused features for their building detection method, which can recognize buildings in visual data taken from an airplane. Similarly, we can make use of the dimensionality reduction in DeepSAD's pretraining step, since our method is intended to work with high-dimensional \rev{LiDAR} data.
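
The following sketch illustrates this reconstruction-error idea (PyTorch; the fully connected architecture, dimensions, and training data are arbitrary toy choices, not the pretraining network used in this thesis). An autoencoder with a low-dimensional bottleneck is trained on normal data, and the per-sample reconstruction error then serves as the anomaly score:

\begin{verbatim}
import torch
from torch import nn

autoencoder = nn.Sequential(
    nn.Linear(64, 8), nn.ReLU(),   # encoder: compress into an 8-dim latent space
    nn.Linear(8, 64),              # decoder: reconstruct the 64-dim input
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

normal_data = torch.randn(256, 64)     # stand-in for normal training samples
for epoch in range(50):
    optimizer.zero_grad()
    loss = ((autoencoder(normal_data) - normal_data) ** 2).mean()
    loss.backward()
    optimizer.step()

def anomaly_score(x):
    # High reconstruction error suggests the sample is unlike the normal data.
    return ((autoencoder(x) - x) ** 2).mean(dim=1)
\end{verbatim}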

\newsection{lidar_related_work}{\rev{LiDAR} - Light Detection and Ranging}
\rev{LiDAR} (Light Detection and Ranging) measures distance by emitting short laser pulses and timing how long they take to return, an approach many may be familiar with from the more commonly known radar technology, which uses radio-frequency pulses and measures their return time to gauge an object's range. Unlike radar, however, \rev{LiDAR} operates at much shorter wavelengths and can fire millions of pulses per second, achieving millimeter-level precision and dense, high-resolution 3D point clouds. This fine granularity makes \rev{LiDAR} ideal for applications such as detailed obstacle mapping, surface reconstruction, and autonomous navigation in complex environments.
Because the speed of light in air is effectively constant, multiplying half the round‐trip time by that speed gives the distance between the \rev{LiDAR} sensor and the reflecting object, as can be seen in \rev{Figure}~\ref{fig:lidar_working_principle}. Modern spinning multi‐beam \rev{LiDAR} systems emit up to millions of these pulses every second. Each pulse is sent at a known combination of horizontal and vertical angles, creating a regular grid of measurements: for example, 32 vertical channels swept through 360° horizontally at a fixed angular spacing. While newer solid-state designs (flash, MEMS, phased-array) are emerging, spinning multi-beam \rev{LiDAR} remains the most commonly seen type in autonomous vehicles and robotics because of its proven range, reliability, and mature manufacturing base.
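
As a worked example, a pulse that returns after $\Delta t = 66.7\,\mathrm{ns}$ corresponds to a range of $d = \frac{c \cdot \Delta t}{2} \approx \frac{3 \times 10^{8}\,\mathrm{m/s} \cdot 66.7 \times 10^{-9}\,\mathrm{s}}{2} \approx 10\,\mathrm{m}$; conversely, millimeter-level precision at such ranges requires resolving the round-trip time on the order of picoseconds.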

\figc{lidar_working_principle}{figures/bg_lidar_principle.png}{Illustration of the working principle of a \rev{LiDAR} sensor. The emitter sends out an optical signal that is reflected by objects in the scene and captured by the receiver. The system controller measures the time delay $\Delta t$ between emission and reception to calculate distance $d = c \cdot \Delta t / 2$. By repeating this process across many directions—either with multiple emitter/receiver pairs or sequentially in a spinning \rev{LiDAR}—the sensor obtains a dense set of distances that, combined with their emission angles, form a 3D point cloud of the environment. Reproduced from~\cite{bg_lidar_figure_source}.
}{width=.8\textwidth}
\rev{Each time} a \rev{LiDAR} emits and receives a laser pulse, it can use the ray's direction and the calculated distance to produce a single three-dimensional point. By collecting up to millions of such points each second, the sensor constructs a “point cloud”—a dense set of 3D coordinates relative to the \rev{LiDAR}’s own position. In addition to \rev{$X$, $Y$, and $Z$}, many \rev{LiDAR}s also record the intensity or reflectivity of each return, providing extra information about the surface properties of the object hit by the pulse.
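
Converting a single return into a point is a spherical-to-Cartesian transformation; a minimal sketch follows (NumPy; exact angle conventions vary between sensor models, the one below is a common choice):

\begin{verbatim}
import numpy as np

def beam_to_point(range_m, azimuth_rad, elevation_rad):
    # Convert one LiDAR return (measured range plus known emission angles)
    # into X, Y, Z coordinates relative to the sensor origin.
    x = range_m * np.cos(elevation_rad) * np.cos(azimuth_rad)
    y = range_m * np.cos(elevation_rad) * np.sin(azimuth_rad)
    z = range_m * np.sin(elevation_rad)
    return np.array([x, y, z])

# A 10 m return straight ahead, 2 degrees above the sensor's horizontal plane:
print(beam_to_point(10.0, 0.0, np.deg2rad(2.0)))
\end{verbatim}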
\rev{LiDAR}’s high accuracy, long range, and full-circle field of view make it indispensable for tasks like obstacle detection, simultaneous localization and mapping~(SLAM)~\rev{\cite{bg_slam}}, and terrain modeling in autonomous driving and mobile robotics. While complementary sensors—such as time-of-flight cameras, ultrasonic sensors, and RGB cameras—have their strengths at short range or in particular lighting, only \rev{LiDAR} delivers the combination of precise 3D measurements over medium to long distances, consistent performance regardless of illumination, and the point cloud density needed for safe navigation. \rev{LiDAR} systems do exhibit intrinsic noise (e.g., range quantization or occasional multi-return ambiguities), but in most robotic applications, these effects are minor compared to environmental degradation.
In subterranean and rescue domain scenarios, the dominant challenge is airborne particles: dust kicked up by debris or smoke from fires. These aerosols create early returns that can mask real obstacles and cause missing data behind particle clouds, undermining SLAM and perception algorithms designed for cleaner data. This degradation is a type of atmospheric scattering, which can be caused by any kind of airborne particulates (e.g., snowflakes) or liquids (e.g., water droplets). Other kinds of environmental noise exist as well, such as specular reflections caused by smooth surfaces, beam occlusion due to close objects blocking the sensor's field of view, or even thermal drift--temperature changes affecting the sensor's circuits and mechanics, introducing biases in the measurements.
All of these may create unwanted noise in the point cloud created by the \rev{LiDAR}, making this domain an important research topic. \rev{In \cite{lidar_denoising_survey}, an overview of} the current state of research into denoising methods for \rev{LiDAR} in adverse environments \rev{is given. It} categorizes them according to their approach (distance-, intensity-, or learning-based) and concludes that all approaches have merits but also open challenges to solve for autonomous systems to safely navigate these adverse environments. The current research is heavily focused on the automotive domain, which can be observed in the vastly higher number of methods filtering noise from adverse weather effects\rev{--}environmental scattering from rain, snow, and fog\rev{--}than from dust, smoke, or other particles occurring rarely in the automotive domain.
A learning-based method to filter dust-caused degradation from \rev{LiDAR} is introduced in \rev{\cite{lidar_denoising_dust}}. The authors employ a convolutional neural network to classify dust particles in \rev{LiDAR} point clouds, enabling the filtering of those points, and compare their method to more conservative approaches, such as various outlier removal algorithms. Another relevant example is the filtering method proposed in \rev{\cite{lidar_subt_dust_removal}}, which enables the filtration of point clouds degraded by smoke or dust in subterranean environments, with a focus on the search and rescue domain. To achieve this, they formulated a filtration framework that relies on dynamic onboard statistical cluster outlier removal to classify and remove dust particles in point clouds.
Our method does not aim to remove the noise or degraded points in the \rev{LiDAR} data, but to quantify its degradation to inform other systems of the autonomous robot about the data's quality, enabling more informed decisions. One such approach, though from the autonomous driving and not from the search and rescue domain, can be found in \rev{\cite{degradation_quantification_rain}, where a} learning-based method to quantify the \rev{LiDAR} sensor data degradation caused by adverse weather effects was proposed. \rev{They posed} the problem as an anomaly detection task and \rev{utilized} DeepSAD to learn degraded data to be an anomaly and high-quality data to be normal behaviour. DeepSAD's anomaly score was used as the degradation quantification score. From this example, we decided to imitate this method and adapt it for the search and rescue domain, although this proved challenging due to the more limited data availability. Since it was effective for this closely related use case, we also employed DeepSAD, whose detailed workings we present in the following chapter.
\newchapter{deepsad}{DeepSAD: Semi-Supervised Anomaly Detection}
@@ -345,12 +345,12 @@ In this chapter, we explore the method \rev{DeepSAD}~\cite{deepsad}, which we em
\newsection{algorithm_description}{Algorithm Description}
DeepSAD's overall mechanics are similar to clustering-based anomaly detection methods, which, according to \rev{\cite{anomaly_detection_survey}}, typically follow a two-step approach. First, a clustering algorithm groups data points around a centroid; then, the distances of individual data points from this centroid are calculated and used as anomaly scores. In DeepSAD, these concepts are implemented by employing a neural network, which is jointly trained to map input data onto a latent space and to minimize the volume of a data-encompassing hypersphere, whose center is the aforementioned centroid. A sample's geometric distance to the hypersphere center in the latent space is used as its anomaly score, where a larger distance between sample and centroid corresponds to a higher probability of the sample being anomalous. This is achieved by shrinking the data-encompassing hypersphere during training in proportion to all training data, which requires that significantly more normal than anomalous data is present. The outcome of this approach is that normal data is clustered more closely around the centroid, while anomalies appear further away from it, as can be seen in the toy example depicted in \rev{Figure}~\ref{fig:deep_svdd_transformation}.
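Formally, once trained, the network $\phi(\cdot\,;\mathcal{W})$ assigns a sample $\mathbf{x}$ the anomaly score
\[
    s(\mathbf{x}) = \lVert \phi(\mathbf{x};\mathcal{W}) - \mathbf{c} \rVert^{2},
\]
i.e., its squared distance to the fixed centroid $\mathbf{c}$ in the latent space (notation adapted from~\cite{deepsad}).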
\fig{deep_svdd_transformation}{figures/deep_svdd_transformation}{DeepSAD teaches a neural network to transform data into a latent space and minimize the volume of a data-encompassing hypersphere centered around a predetermined centroid $\textbf{c}$. \\Reproduced from~\cite{deep_svdd}.}
Before DeepSAD's training can begin, a pretraining step is required, during which an autoencoder is trained on all available input data. One of DeepSAD's goals is to map input data onto a lower-dimensional latent space, in which the separation between normal and anomalous data can be achieved. To this end, DeepSAD and its predecessor Deep SVDD make use of the autoencoder's reconstruction objective, whose successful training ensures confidence in the encoder architecture's suitability for extracting the input data's most prominent information into the latent space \rev{in between} the encoder and decoder. DeepSAD goes on to use just the encoder as its main network architecture, discarding the decoder at this step, since reconstruction of the input is unnecessary.
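As a rough illustration of this pretraining step, consider the following PyTorch sketch; the layer sizes and the latent dimension are placeholder assumptions and do not reflect the architectures evaluated in this thesis.

\begin{verbatim}
import torch
import torch.nn as nn

# Placeholder architecture: a 32x2048 range image flattened into a
# vector; the actual encoder used in this thesis differs.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 2048, 256),
                        nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(),
                        nn.Linear(256, 32 * 2048))
autoencoder = nn.Sequential(encoder, decoder)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

def pretrain(loader, epochs=10):
    for _ in range(epochs):
        for x in loader:                 # x: (batch, 1, 32, 2048)
            x_hat = autoencoder(x)
            target = x.flatten(start_dim=1)
            loss = ((x_hat - target) ** 2).mean()  # reconstruction loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return encoder  # the decoder is discarded after pretraining
\end{verbatim}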
The pretraining results are used in two more key ways. First, the encoder weights obtained from the autoencoder pretraining initialize DeepSAD’s network for the main training phase. Second, we perform an initial forward pass through the encoder on all training samples, and the mean of these latent representations is set as the hypersphere center, $\mathbf{c}$. According to \citeauthor{deepsad}, this initialization method leads to faster convergence during the main training phase compared to using a randomly selected centroid. An alternative would be to compute $\mathbf{c}$ using only the labeled normal examples, which would prevent the center from being influenced by anomalous samples; however, this requires a sufficient number of labeled normal samples. Once defined, the hypersphere center $\mathbf{c}$ remains fixed, as allowing it to be optimized freely could, in the unsupervised case, lead to a hypersphere collapse—a trivial solution where the network learns to map all inputs directly onto the centroid $\mathbf{c}$.
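Sketched in code, this center initialization amounts to a single forward pass (assuming the encoder and data loader from the pretraining sketch above):

\begin{verbatim}
import torch

@torch.no_grad()
def init_center(encoder, loader):
    # Mean latent representation over all training samples.
    latents = torch.cat([encoder(x) for x in loader])
    c = latents.mean(dim=0)
    return c  # kept fixed afterwards to prevent hypersphere collapse
\end{verbatim}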
@@ -533,12 +533,12 @@ To create this mapping, we leveraged the available measurement indices and chann
Figure~\ref{fig:data_projections} displays two examples of \rev{LiDAR} point cloud projections to aid in the reader’s understanding. Although the original point clouds were converted into grayscale images with a resolution of 2048×32 pixels, these raw images can be challenging to interpret. To enhance human readability, we applied the viridis colormap and vertically stretched the images so that each measurement occupies multiple pixels in height. The top projection is derived from a scan without artificial smoke—and therefore minimal degradation—while the lower projection comes from an experiment where artificial smoke introduced significant degradation.
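The rendering used for the figure can be approximated with a few lines of matplotlib; the assumption that missing returns are encoded as a range of zero is specific to this sketch.

\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

def show_projection(ranges, eps=1e-3):
    # ranges: (32, 2048) array of per-beam range measurements in
    # meters, where 0 marks a missing return (assumption of this
    # sketch).
    img = 1.0 / np.maximum(ranges, eps)   # reciprocal range values
    img[ranges == 0] = 0                  # keep missing returns dark
    img /= img.max()                      # normalize to [0, 1]
    plt.figure(figsize=(12, 2))
    plt.imshow(img, cmap="viridis", aspect=8)  # vertical stretch
    plt.axis("off")
    plt.show()
\end{verbatim}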
\fig{data_projections}{figures/data_2d_projections.png}{Two-dimensional projections of two point clouds, one from an experiment without degradation and one from an experiment with artificial smoke as degradation. To aid the reader's perception, the images are vertically stretched and a colormap has been applied to the pixels' reciprocal range values, while the actual training data is grayscale.}
The remaining challenge was labeling a large enough portion of the dataset in a reasonably accurate manner, whose difficulties and general approach we described in \rev{Section}~\ref{sec:data_req}. Since, to our knowledge, neither our chosen dataset nor any other publicly available one provides objective labels for \rev{LiDAR} data degradation in the SAR domain, we had to define our own labeling approach. With objective measures of degradation unavailable, we explored alternative labeling methods—such as using \rev{the statistical} properties like the number of missing measurements per point cloud or the higher incidence of erroneous measurements near the sensor, which we described in \rev{Section~\ref{sec:data_dataset}}. Ultimately, we were concerned that these statistical approaches might lead the method to simply mimic the statistical evaluation rather than to quantify degradation in a generalized and robust manner. After considering these options, we decided to label all point clouds from experiments with artificial smoke as anomalies, while point clouds from experiments without smoke were labeled as normal data. This labeling strategy—based on the presence or absence of smoke—is fundamentally an environmental indicator, independent of the intrinsic data properties recorded during the experiments.
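In code, this environmental labeling strategy reduces to a lookup over experiment identifiers; the identifiers below are hypothetical, and the sign convention ($+1$ normal, $-1$ anomalous) follows~\cite{deepsad}.

\begin{verbatim}
# Hypothetical experiment identifiers; +1 = normal, -1 = anomalous.
SMOKE_EXPERIMENTS = {"run_03", "run_05", "run_06"}

def label_point_cloud(experiment_id):
    # Every scan inherits the label of its experiment, regardless of
    # how degraded the individual point cloud actually is.
    return -1 if experiment_id in SMOKE_EXPERIMENTS else +1
\end{verbatim}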
The simplicity of this labeling approach has both advantages and disadvantages. On the positive side, it is easy to implement and creates a clear distinction between normal and anomalous data. However, its simplicity is also its drawback: some point clouds from experiments with artificial smoke do not exhibit perceptible degradation, yet they are still labeled as anomalies. The reason for this is that during the three non-static anomalous experiments, the sensor platform starts recording in a tunnel roughly 20 meters from the smoke machine's location. It approaches the smoke machine, navigates close to the machine for some time, and then leaves its perimeter once again. Since the artificial smoke's density is far greater near the machine it originates from, the time the sensor platform spent close to it produced highly degraded point clouds, whereas the beginnings and ends of the anomalous experiments capture point clouds which are subjectively not degraded and appear similar to ones from the normal experiments. This effect is clearly illustrated by the degradation indicators which we discussed earlier\rev{--}the proportion of missing points and the number of erroneous points close to the sensor per point cloud\rev{--}as can be seen in \rev{Figure}~\ref{fig:data_anomalies_timeline}.
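Both indicators can be computed per point cloud with simple statistics, as in the following sketch (again assuming missing returns are encoded as zero ranges):

\begin{verbatim}
import numpy as np

def degradation_indicators(ranges, near_thresh=0.5):
    # ranges: flat array of range measurements in meters; 0 = missing.
    missing = np.mean(ranges == 0)            # proportion of dropouts
    near = np.mean((ranges > 0) &
                   (ranges < near_thresh))    # returns closer than 50 cm
    return missing, near
\end{verbatim}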
\fig{data_anomalies_timeline}{figures/data_combined_anomalies_timeline.png}{Missing points and points with a measured range smaller than 50\,cm per point cloud over a normalized timeline of the individual experiments. This illustrates the rise, plateau, and fall of degradation intensity during the anomalous experiments, owed to the spatial proximity to the degradation source (smoke machine). One of the normal experiments (without artificial smoke) is included as a baseline \rev{in gray}.}
@@ -563,7 +563,7 @@ In the following sections, we detail our adaptations to this framework:
\begin{itemize}
\item Data integration: preprocessing and loading the dataset \rev{introduced in Chapter~\ref{chp:data_preprocessing}}.
\item Model architecture: configuring DeepSAD’s encoder to match our point cloud input format, contrasting two distinct neural network architectures to investigate their impact on the method's output.
\item Training \& evaluation: training DeepSAD alongside two classical baselines—Isolation Forest and One-class SVM (OCSVM)—and comparing their degradation-quantification performance.
\item Experimental environment: the hardware and software stack used, with typical training and inference runtimes.
\end{itemize}