grammarly part 1 done
@@ -250,25 +250,25 @@ Because anomalies are, by nature, often unpredictable in form and structure, uns
Anomaly detection refers to the process of detecting unexpected patterns in data, i.e., outliers that deviate significantly from the majority of the data, which is implicitly defined as normal by its prevalence. In classic statistical analysis, such techniques have been studied since as early as the 19th century~\cite{anomaly_detection_history}. Since then, a multitude of methods and use cases for them have been proposed and studied. Examples of applications include healthcare, where computer vision algorithms are used to detect anomalies in medical images for diagnostics and early detection of diseases~\cite{anomaly_detection_medical}, fraud detection in decentralized financial systems based on blockchain technology~\cite{anomaly_detection_defi}, as well as fault detection in industrial machinery using acoustic sound data~\cite{anomaly_detection_manufacturing}.
Figure~\ref{fig:anomaly_detection_overview} depicts a simple but illustrative example of data that can be classified as either normal or anomalous and shows the problem that anomaly detection methods generally try to solve. A successful anomaly detection method would somehow learn to differentiate normal from anomalous data, for example, by learning the boundaries around the available normal data and classifying a sample as either normal or anomalous based on its location inside or outside of those boundaries. Another possible approach could calculate an analog value that correlates with the likelihood of a sample being anomalous, for example, by using the sample's distance from the closest normal data cluster's center.
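To make the second idea concrete, the following minimal sketch (our own illustration, not part of~\cite{anomaly_detection_survey}) scores a sample by its Euclidean distance to the closest of a set of already known normal cluster centers; all names and values are hypothetical:

\begin{verbatim}
import numpy as np

def anomaly_score(sample, cluster_centers):
    """Distance from a sample to the closest normal cluster center.
    Larger values indicate a higher likelihood of being anomalous."""
    centers = np.asarray(cluster_centers)               # shape: (k, d)
    distances = np.linalg.norm(centers - sample, axis=1)
    return distances.min()

# Toy 2-D example with two normal clusters (cf. N_1 and N_2 in the figure)
centers = [[0.0, 0.0], [5.0, 5.0]]
print(anomaly_score(np.array([0.2, -0.1]), centers))    # small -> likely normal
print(anomaly_score(np.array([9.0, -4.0]), centers))    # large -> likely anomalous
\end{verbatim}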
\figc{anomaly_detection_overview}{figures/anomaly_detection_overview}{An illustrative example of anomalous and normal 2-dimensional data with clusters of normal data $N_1$ and $N_2$, as well as two single anomalies $o_1$ and $o_2$ and a cluster of anomalies $O_3$. Reproduced from~\cite{anomaly_detection_survey}\rev{.}}{width=0.5\textwidth}
By their very nature, anomalies are rare occurrences and often unpredictable, which makes it hard to define all possible anomalies in any given system. It also makes it very challenging to create an algorithm that is capable of detecting anomalies that may have never occurred before and may not have been known to exist when the detection algorithm was created. There are many possible approaches to this problem, though they can be roughly grouped into six distinct categories based on the techniques used~\cite{anomaly_detection_survey}:
\begin{enumerate}
\item \textbf{Classification-Based} \\ A classification technique, such as a \rev{Support Vector Machine (SVM)~\cite{bg_svm}}, is used to classify samples as either normal or anomalous based on labeled training data. Alternatively, if not enough labeled training data is available, a one-class classification algorithm can be employed. In that case, the algorithm assumes all training samples to be normal and then learns a boundary around the normal samples to differentiate them from anomalous samples.
\item \textbf{Clustering-Based} \\ Clustering techniques such as \rev{K-Means~\cite{bg_kmeans}} or DBSCAN\rev{~\cite{bg_dbscan}} aim to group similar \rev{data into} clusters, differentiating it from dissimilar data, which may belong to another cluster or to no cluster at all. Anomaly detection methods from this category employ such a technique, with the assumption that normal data will assemble into one or more clusters due to their similar properties, while anomalies may create their own smaller clusters, \rev{belong to no} cluster at all, or at least lie \rev{at} an appreciable distance from the closest normal cluster's center.
\item \textbf{Nearest Neighbor Based} \\ Similar to the clustering-based category, these techniques assume that normal data is more closely clustered than anomalies and therefore utilize either a sample's distance to its $k^{th}$ nearest neighbor or the density of its local neighborhood to judge whether a sample is anomalous, as illustrated by the sketch following this list.
\item \textbf{Statistical} \\ These methods try to fit a statistical model of the normal behavior to the data. After the distribution from which normal data originates is defined, samples can be classified as normal or anomalous based on their likelihood \rev{of arising from that} distribution.
\item \textbf{Information-Theoretic} \\ The main assumption for information-theoretic anomaly detection methods is that anomalies differ somehow in their information content from normal data. An information-theoretic measure is therefore used to determine \rev{irregularities} in the data's information content, enabling the detection of anomalous samples.
\item \textbf{Spectral} \\ Spectral approaches assume the possibility of mapping data into a lower-dimensional space, where normal data appears significantly different from anomalous data. To this end, a dimensionality reduction technique such as Principal Component Analysis (PCA)\rev{~\cite{bg_pca}} is used to embed the data into a lower-dimensional \rev{subspace. Spectral} methods are often used as a pre-processing step followed by another anomaly detection method operating on the data's subspace.
\end{enumerate}
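As a minimal illustration of the nearest neighbor based category above (a sketch of ours under the simplest possible assumptions, not an implementation from~\cite{anomaly_detection_survey}), the distance to the $k^{th}$ nearest neighbor among the presumed-normal training data can directly serve as an anomaly score:

\begin{verbatim}
import numpy as np

def knn_anomaly_score(sample, train_data, k=5):
    """Distance to the k-th nearest neighbor in the (assumed normal)
    training data; larger distances suggest an anomaly."""
    train_data = np.asarray(train_data)            # shape: (n, d)
    distances = np.linalg.norm(train_data - sample, axis=1)
    return np.sort(distances)[k - 1]               # k-th smallest distance

rng = np.random.default_rng(0)
normal_data = rng.normal(size=(500, 2))            # one cluster of normal data
print(knn_anomaly_score(np.array([0.1, 0.3]), normal_data))  # small score
print(knn_anomaly_score(np.array([6.0, 6.0]), normal_data))  # large score
\end{verbatim}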
In this thesis, we use the anomaly detection method \citetitle{deepsad}\rev{~(DeepSAD)~\cite{deepsad}} to model our problem\rev{—}how to quantify the degradation of \rev{LiDAR} sensor data\rev{—}as an anomaly detection problem. We do this by classifying good-quality data as normal and degraded data as anomalous, and by relying on a method that can express each sample's likelihood of being anomalous as an analog anomaly score, which enables us to interpret that score as the \rev{data} degradation quantification value.
Chapter~\ref{chp:deepsad} describes DeepSAD in more detail and shows that it is a clustering-based approach with a spectral pre-processing component, in that it uses a neural network to reduce the input's dimensionality while simultaneously clustering normal data closely around a given centroid. It then produces an anomaly score by calculating the geometric distance between a data sample and the aforementioned cluster centroid, assuming that this distance is shorter for normal than for anomalous data. Since our data is high-dimensional, it makes sense to use a spectral method to reduce \rev{its} dimensionality. \rev{Moreover}, reporting an analog value rather than a binary classification is useful for our use case, since we want to not only classify but also quantify the data degradation.
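The scoring step itself is simple; the following sketch (our own simplification, with a placeholder encoder standing in for the trained network described in Chapter~\ref{chp:deepsad}) illustrates how the latent distance to the centroid can be read as a degradation score:

\begin{verbatim}
import numpy as np

def latent_anomaly_score(x, encoder, center):
    """Anomaly score as the Euclidean distance between a sample's
    latent representation and the normal-data centroid."""
    z = encoder(x)                 # placeholder for the trained network
    return float(np.linalg.norm(z - center))

# Hypothetical usage with an already trained encoder `phi` and centroid `c`:
#   score = latent_anomaly_score(lidar_frame, phi, c)
# Higher scores are interpreted as stronger degradation of the sensor data.
\end{verbatim}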
There is a wide \rev{range} of problems in domains similar to the one we research in this \rev{thesis} that have been successfully modeled as anomaly detection problems. The degradation of point clouds produced by an industrial 3D sensor has been modeled as an anomaly detection task in \rev{\cite{bg_ad_pointclouds_scans}}. \citeauthor{bg_ad_pointclouds_scans} propose a student-teacher model capable of inferring a pointwise anomaly score for degradation in point clouds. The teacher network is trained on an anomaly-free dataset to extract dense features of the point clouds' local geometries, after which an identical student network is trained to emulate the teacher network's outputs. For degraded point clouds, the regression error between the teacher's and student's outputs is calculated and interpreted as the anomaly score, with the rationale that the student network has not observed features produced by anomalous geometries during training, leaving it incapable of producing an output similar to the teacher's for those regions. Another example is \rev{\cite{bg_ad_pointclouds_poles}}, which proposes a method to detect and classify pole-like objects in urban point cloud data for autonomous driving purposes, in order to differentiate between natural and man-made objects such as street signs. An anomaly detection method is used to identify the vertical pole-like objects in the point clouds, after which the preprocessed objects are grouped by similarity using a clustering algorithm to classify them as either trees or man-made poles.
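As a rough sketch of that scoring idea (our simplification, using a plain per-point squared error between teacher and student features rather than the exact formulation in~\cite{bg_ad_pointclouds_scans}):

\begin{verbatim}
import numpy as np

def pointwise_anomaly_scores(teacher_features, student_features):
    """Per-point anomaly score as the squared error between teacher and
    student feature vectors, each of shape (num_points, feature_dim)."""
    diff = np.asarray(teacher_features) - np.asarray(student_features)
    return (diff ** 2).sum(axis=1)     # one score per point in the cloud
\end{verbatim}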
@@ -304,12 +304,12 @@ In reinforcement learning, an agent learns by trial and error while interacting
Semi-supervised learning algorithms form an \rev{in-between} category between supervised and unsupervised algorithms, in that they use a mixture of labeled and unlabeled data. Typically, vastly more unlabeled than labeled data is used during training of such algorithms, due to the effort and expertise required to label large quantities of data correctly. Semi-supervised methods are often an effort to improve a machine learning algorithm belonging to either the supervised or the unsupervised category. Supervised methods, such as classification tasks, are enhanced by using large amounts of unlabeled data to augment the supervised training without the need for additional labeling work. Conversely, unsupervised methods like clustering algorithms may not only use unlabeled data but also improve their performance by considering some hand-labeled data during training.
Machine learning based anomaly detection methods can utilize techniques from all of the aforementioned categories, although their suitability varies. While supervised anomaly detection methods exist, their usability depends not only on the availability of labeled training data but also on a reasonable proportion between normal and anomalous data. Both requirements can be challenging to meet, because labeling is often labor-intensive and anomalies, by their intrinsic nature, occur rarely compared to normal data, making it hard to capture enough anomalous behavior. Semi-supervised anomaly detection methods are of special interest in that they may overcome these difficulties inherently present in many anomaly detection tasks~\cite{semi_ad_survey}. These methods typically have the same goal as unsupervised anomaly detection methods, which is to model the normal class behavior and delimit it from anomalies, but they can incorporate some hand-labeled examples of normal and/or anomalous behavior to improve their performance over fully unsupervised methods. DeepSAD is a semi-supervised method that extends its unsupervised predecessor Deep SVDD~\cite{deep_svdd} by including some labeled samples during training. Both DeepSAD and Deep SVDD also utilize an autoencoder, a machine learning architecture \rev{that we will look at next}, in a pretraining step.
\newsection{autoencoder}{Autoencoder}
Autoencoders are a type of neural network architecture whose main goal is learning to encode input data into a representative state from which the same input can be reconstructed, hence the name. They typically consist of two functions, an encoder and a decoder, with a latent space \rev{in between} them, as depicted in the toy example in \rev{Figure}~\ref{fig:autoencoder_general}. The encoder learns to extract the most significant features from the input and to convert them into the input's latent space representation. The reconstruction goal ensures that the most prominent features of the input are retained during the encoding phase, due to the inherent inability to reconstruct the input if too much relevant information is missing. The decoder simultaneously learns to reconstruct the original input from its encoded latent space representation by minimizing the error between the input sample and the autoencoder's output. This optimization goal complicates the categorization of autoencoders as unsupervised methods. Although they do not require labeled data, they still compute an error against a known target—the input itself. For this reason, some authors describe them as a form of self-supervised learning, where the data provides its own supervisory signal without requiring expert labeling.
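A minimal PyTorch-style sketch of this architecture follows (layer sizes, dimensions, and the training snippet are arbitrary placeholders for illustration, not the architecture used in this thesis); the encoder plays the role of $\mathbf{g_\phi}$ and the decoder that of $\mathbf{f_\theta}$ from Figure~\ref{fig:autoencoder_general}:

\begin{verbatim}
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal fully connected autoencoder."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(       # g_phi: input -> latent z
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(       # f_theta: latent z -> reconstruction
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                     # stand-in batch of flattened inputs
reconstruction, z = model(x)
loss = nn.functional.mse_loss(reconstruction, x)  # the input itself is the target
loss.backward()
optimizer.step()
\end{verbatim}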
\fig{autoencoder_general}{figures/autoencoder_principle.png}{Illustration of an autoencoder’s working principle. The encoder $\mathbf{g_\phi}$ compresses the input into a lower-dimensional bottleneck representation $\mathbf{z}$, which is then reconstructed by the decoder $\mathbf{f_\theta}$. During training, the difference between input and output serves as the loss signal to optimize both the encoder’s feature extraction and the decoder’s reconstruction. Reproduced from~\cite{ml_autoencoder_figure_source}.}
@@ -367,7 +367,7 @@ DeepSAD's full training and inference procedure is visualized in \rev{Figure}~\r
\newsection{algorithm_details}{Algorithm Details and Hyperparameters}
Since DeepSAD is heavily based on its predecessor \rev{Deep SVDD}~\cite{deep_svdd}, it is helpful to first understand Deep SVDD's optimization objective, so we start by explaining it here. For an input space $\mathcal{X} \subseteq \mathbb{R}^D$, an output space $\mathcal{Z} \subseteq \mathbb{R}^d$, and a neural network $\phi(\wc; \mathcal{W}) : \mathcal{X} \to \mathcal{Z}$, where $\mathcal{W}$ denotes the weights of the neural network's $L$ layers $\{\mathbf{W}_1, \dots, \mathbf{W}_L\}$, $n$ the number of unlabeled training samples $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$, and $\mathbf{c}$ the center of the hypersphere in the latent space, Deep SVDD teaches the neural network to cluster normal data closely together in the latent space by defining its optimization objective as \rev{follows:}
\begin{equation}
\label{eq:deepsvdd_optimization_objective}
\min_{\mathcal{W}} \; \frac{1}{n} \sum_{i=1}^{n} \left\lVert \phi(\mathbf{x}_i; \mathcal{W}) - \mathbf{c} \right\rVert^2 + \frac{\lambda}{2} \sum_{\ell=1}^{L} \left\lVert \mathbf{W}_\ell \right\rVert_F^2
\end{equation}
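As a hedged illustration of this objective (a sketch of ours, not the reference implementation of~\cite{deep_svdd}), the per-batch loss can be computed as the mean squared distance of the latent representations to the center plus a weight decay term over the network's weights:

\begin{verbatim}
import torch

def deep_svdd_loss(z, center, weights, weight_decay=1e-6):
    """z: latent representations phi(x_i; W) of shape (batch, d);
    center: the fixed hypersphere center c of shape (d,);
    weights: list of the network's weight tensors W_1, ..., W_L."""
    distances = torch.sum((z - center) ** 2, dim=1)        # ||phi(x_i;W) - c||^2
    regularizer = sum(torch.sum(W ** 2) for W in weights)  # sum_l ||W_l||_F^2
    return distances.mean() + 0.5 * weight_decay * regularizer
\end{verbatim}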