wip
@@ -1175,14 +1175,6 @@ To compare the computational efficiency of the two architectures we show the num
\label{tab:params_lenet_vs_efficient}
\end{table}

\todo[inline]{next paragraph does not work anymore?}

As can be seen, the Efficient encoder requires an order of magnitude fewer parameters and significantly fewer operations while maintaining comparable representational capacity. The key reasons are the use of depthwise separable convolutions, aggressive pooling along the densely sampled horizontal axis, and a channel-squeezing step before the fully connected layer. Interestingly, the Efficient network also processes more intermediate channels (up to 32, compared to only 8 in the LeNet variant), which increases its ability to capture a richer set of patterns despite the reduced computational cost. This combination of efficiency and representational power makes the Efficient encoder the more suitable backbone for our anomaly detection task.
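
To make the difference concrete, the following is a minimal PyTorch-style sketch of such a building block (a depthwise separable convolution followed by pooling along the horizontal axis, plus a channel-squeezing layer). Layer sizes, kernel sizes, and names are illustrative assumptions, not the exact thesis implementation.

\begin{verbatim}
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise conv (one filter per channel) + 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, pool_w=4):
        super().__init__()
        # depthwise: groups=in_ch keeps the parameter count low
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        # pointwise: 1x1 convolution mixes channels cheaply
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)
        # aggressive pooling only along the densely sampled horizontal axis
        self.pool = nn.MaxPool2d(kernel_size=(1, pool_w))

    def forward(self, x):
        return self.pool(self.act(self.bn(self.pointwise(self.depthwise(x)))))

# squeeze channels before the fully connected layer (target width illustrative)
squeeze = nn.Conv2d(32, 8, kernel_size=1)
\end{verbatim}
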
\todo[inline]{mention that, as seen in the AE results, the Efficient architecture reproduces inputs better, especially in lower-dimensional latent spaces}

\threadtodo
{how was training/testing adapted (networks overview), inference, ae tuning}
{data has been loaded, how is it processed}

@@ -1227,7 +1219,7 @@ The boundary itself is learned using the support vector machine framework. In es

During training, the algorithm balances two competing objectives: capturing as many of the normal samples as possible inside the boundary, while keeping the region compact enough to exclude potential outliers. Once this boundary is established, applying OCSVM is straightforward: any new data point is checked against the learned boundary, with points inside considered normal and those outside flagged as anomalous.
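
As an illustration of this procedure, the following sketch fits an OC-SVM on placeholder normal latent features and scores new points by their signed distance to the boundary. The hyperparameters shown are assumptions, not the tuned values used in our experiments.

\begin{verbatim}
import numpy as np
from sklearn.svm import OneClassSVM

X_normal = np.random.randn(1000, 32)   # latent features of presumed-normal frames

# nu upper-bounds the fraction of training points left outside the boundary,
# trading boundary compactness against coverage of the normal data
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(X_normal)

X_new = np.random.randn(10, 32)
scores = ocsvm.decision_function(X_new)  # signed distance to the boundary
flagged = scores < 0                     # inside (> 0) normal, outside (< 0) anomalous
\end{verbatim}
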
We adapted the baseline implementations to our data loader and input format, and added support for multiple evaluation targets per frame (two labels per data point), reporting both results per experiment. For OCSVM, the dimensionality reduction step is \emph{always} performed with the corresponding DeepSAD encoder and its autoencoder pretraining weights that match the evaluated setting (i.e., same latent size and backbone). Both baselines, like DeepSAD, output continuous anomaly scores. This allows us to evaluate them directly without committing to a fixed threshold.
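
A minimal sketch of this adaptation is shown below: frames are projected into the latent space by the pretrained encoder, both baselines are fit on these features, and their continuous scores are evaluated against the two label sets per frame. All shapes, names (e.g., \texttt{encoder}), and data are placeholders for illustration, not the actual pipeline code.

\begin{verbatim}
import numpy as np
import torch
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score

# placeholder encoder and data; in practice the matching DeepSAD encoder
# (same latent size and backbone) with autoencoder pretraining weights is used
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 512, 32))
frames_train = torch.randn(64, 1, 32, 512)
frames_test = torch.randn(16, 1, 32, 512)
labels_experiment = np.array([0] * 8 + [1] * 8)  # experiment-based targets
labels_hand = np.array([0] * 8 + [1] * 8)        # hand-labeled targets

with torch.no_grad():
    Z_train = encoder(frames_train).numpy()
    Z_test = encoder(frames_test).numpy()

ocsvm = OneClassSVM(gamma="scale", nu=0.05).fit(Z_train)
iforest = IsolationForest(random_state=0).fit(Z_train)

for name, model in [("OC-SVM", ocsvm), ("IsoForest", iforest)]:
    scores = -model.decision_function(Z_test)  # higher = more anomalous
    for target, y in [("experiment", labels_experiment), ("hand", labels_hand)]:
        print(name, target, average_precision_score(y, scores))
\end{verbatim}
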
\newsection{setup_experiments_environment}{Experiment Overview \& Computational Environment}

@@ -1669,16 +1661,21 @@ Representative precision–recall curves illustrate how methods differ in their

\fig{prc_representative}{figures/results_prc.png}{Representative precision–recall curves over all latent dimensionalities for the semi-supervised labeling regime 0/0 using experiment-based evaluation labels. DeepSAD maintains a large high-precision operating region before collapsing; OC-SVM declines more smoothly but exhibits a high standard deviation between folds; IsoForest collapses quickly and remains flat. DeepSAD's fall-off is at least partly due to known mislabeled evaluation targets.}
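
For reference, a short sketch of how such per-fold precision--recall curves and their AP summaries can be computed. The scores and labels here are random placeholders; the snippet is illustrative rather than the exact evaluation code.

\begin{verbatim}
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(0)
aps = []
for fold in range(5):
    y_true = rng.integers(0, 2, size=200)  # evaluation targets for this fold
    y_score = rng.random(200)              # continuous anomaly scores
    precision, recall, _ = precision_recall_curve(y_true, y_score)  # one curve per fold
    aps.append(average_precision_score(y_true, y_score))

print("AP over folds: mean %.3f, std %.3f" % (np.mean(aps), np.std(aps)))
\end{verbatim}
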
%\newsection{results_latent}{Effect of latent space dimensionality}

Figure~\ref{fig:latent_dim_ap} plots AP versus latent dimension under the experiment-based evaluation. DeepSAD benefits most from compact latent spaces (e.g., 32–128), with diminishing or even negative returns at larger code sizes. We argue that two interacting effects likely explain this trend. First, higher-dimensional latent spaces increase model capacity and reduce the implicit regularization provided by smaller bottlenecks, leading to overfitting. Second, as illustrated by the representative PRC curves in Figure~\ref{fig:prc_representative}, DeepSAD exhibits a steep decline in precision once recall exceeds roughly 0.5. We attribute this effect primarily to mislabeled or ambiguous samples in the experiment-based evaluation: once the model is forced to classify these borderline cases, precision inevitably drops. Importantly, while such a drop is visible across all latent dimensions, its sharpness increases with latent size. At small dimensions (e.g., 32), the decline is noticeable but somewhat gradual, whereas at 1024 it becomes nearly vertical. This suggests that larger latent spaces exacerbate the difficulty of distinguishing borderline anomalies from normal data, leading to more abrupt collapses in precision once the high-confidence region is exhausted.

\fig{latent_dim_ap}{figures/results_ap_over_latent.png}{AP as a function of latent dimension (experiment-based evaluation). DeepSAD shows an inverse correlation between AP and latent space size.}

%\newsection{results_semi}{Effect of semi-supervised labeling regime}

Referring back to the results in Table~\ref{tab:results_ap}, we compare AP across the labeling regimes (0/0, 50/10, 500/100). Surprisingly, the unsupervised regime (0/0) often performs best; adding labels does not consistently help, likely due to label noise and the scarcity and ambiguity of anomalous labels. The baselines (which do not use labels) are stable across regimes.

\todo[inline]{rework this discussion of semi-supervised labeling and how it affected our results}

% --- Section: Inference on Held-Out Experiments ---
\newsection{results_inference}{Inference on Held-Out Experiments}

In addition to the evaluation of average precision and precision--recall curves obtained from $k$-fold cross-validation with varying hyperparameters, we also examine the behavior of the fully trained methods when applied to previously unseen, held-out experiments.
While the prior analysis provided valuable insights into the classification capabilities of the methods, it was limited by two factors: first, the binary ground-truth labels were of uneven quality due to the aforementioned mislabeling of frames, and second, the binary formulation does not reflect our overarching goal of quantifying sensor degradation on a continuous scale.

@@ -1689,16 +1686,16 @@ These frame-by-frame time-axis plots simulate online inference and illustrate ho

\fig{results_inference_normal_vs_degraded}{figures/results_inference_normal_vs_degraded.png}{Comparison of anomaly detection methods with statistical indicators across clean (dashed) and degraded (solid) experiments. Each subplot shows one method (DeepSAD--LeNet, DeepSAD--Efficient, OCSVM, Isolation Forest). Red curves denote method anomaly scores normalized to the clean experiment; blue and green curves denote the percentage of missing LiDAR points and near-sensor particle hits, respectively. Clear separation between clean and degraded runs is observed for the DeepSAD variants and, to a lesser degree, for OCSVM, while Isolation Forest produces high scores even in the clean experiment. The latent space dimensionality was 32 and the semi-supervised labeling regime used 0 normal and 0 anomalous labeled samples during training.}

The plots in Fig.~\ref{fig:results_inference_normal_vs_degraded} highlight important differences in how well the tested methods distinguish between normal and degraded sensor conditions.
Among the four approaches, the strongest separation is achieved by DeepSAD (Efficient), followed by DeepSAD (LeNet), then OCSVM.
For Isolation Forest, the anomaly scores are already elevated in the clean experiment, which prevents reliable differentiation between normal and degraded runs and makes the method unsuitable in this context.
It is important to note that the score axes are scaled individually per method, so comparisons should focus on relative separation rather than absolute values.

Because the raw anomaly scores produced by the different methods are on incomparable scales (depending, for example, on network architecture or latent space dimensionality), we first applied a $z$-score normalization.
The $z$-score rescales each value by its deviation from the mean in units of the standard deviation, making outputs from different models directly comparable in terms of how far they deviate from normal behavior.
To allow comparison between the clean and degraded experiments, the mean and standard deviation were estimated exclusively from the clean experiment and then used to normalize the degraded scores as well.
This ensures that increases in the degraded runs are interpreted relative to the distribution of the clean baseline, whereas computing separate $z$-scores per experiment would only reveal deviations within each run individually and not enable a meaningful cross-experiment comparison.
It should be noted that the $z$-scores remain method-specific, meaning that while relative separation between clean and degraded runs can be compared within a method, the absolute scales across different methods are not directly comparable; readers should therefore take note of the differing axis ranges for each subplot.
After normalization, the resulting time series were still highly noisy, which motivated the application of exponential moving average (EMA) smoothing.
EMA was chosen because it is causal (does not rely on future data) and thus suitable for real-time inference.
Although it introduces a small time delay, this delay is shorter than for other smoothing techniques such as running averages.
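
A compact sketch of this post-processing chain is given below: the clean-run statistics are used to $z$-score both runs, and a causal EMA smooths the result. Variable names, the placeholder data, and the smoothing factor are illustrative assumptions.

\begin{verbatim}
import numpy as np

def zscore_to_clean(scores, clean_mean, clean_std):
    """Normalize scores relative to the clean-run distribution."""
    return (scores - clean_mean) / clean_std

def ema(values, alpha=0.1):
    """Causal exponential moving average: each output uses only past samples."""
    out = np.empty(len(values))
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = alpha * values[t] + (1.0 - alpha) * out[t - 1]
    return out

clean_raw = np.random.randn(500)           # raw per-frame scores, clean experiment
degraded_raw = np.random.randn(500) + 2.0  # raw per-frame scores, degraded experiment

mu, sigma = clean_raw.mean(), clean_raw.std()   # estimated on the clean run only
clean_z = ema(zscore_to_clean(clean_raw, mu, sigma))
degraded_z = ema(zscore_to_clean(degraded_raw, mu, sigma))
\end{verbatim}
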
@@ -1738,7 +1735,6 @@ The main contributions of this thesis can be summarized as follows:

\begin{itemize}
\item \textbf{Empirical evaluation:} A systematic comparison of DeepSAD against Isolation Forest and OC-SVM for lidar degradation detection, demonstrating that DeepSAD consistently outperforms simpler baselines.
\item \textbf{Analysis of latent dimensionality:} An investigation of how representation size influences performance and stability under noisy labels, revealing that smaller latent spaces are more robust in this setting.
\item \textbf{Analysis of semi-supervised training labels:} An evaluation of different semi-supervised labeling regimes, showing that in our case purely unsupervised training yielded the best performance. Adding a small number of labels reduced performance, while a higher ratio of labels led to partial recovery. This pattern may indicate overfitting effects, although interpretation is complicated by the presence of mislabeled evaluation targets.
\item \textbf{Analysis of encoder architecture:} A comparison between a LeNet-inspired and an Efficient encoder showed that the choice of architecture has a decisive influence on DeepSAD’s performance. The Efficient encoder outperformed the LeNet-inspired baseline not only during autoencoder pretraining but also in anomaly detection. While the exact magnitude of this improvement is difficult to quantify due to noisy evaluation targets, the results underline the importance of encoder design for representation quality in DeepSAD.
\item \textbf{Feasibility study:} An exploration of runtime, temporal inference plots, and downstream applicability, indicating that anomaly scores correlate with degradation trends and could provide a foundation for future quantification methods.

@@ -1765,7 +1761,7 @@ Finally, the binary ground truth employed here is insufficient for the quantific

\newsection{conclusion_ad}{Insights into DeepSAD and AD for Degradation Quantification}

This work has shown that the DeepSAD principle is applicable to lidar degradation data and yields promising performance both in terms of accuracy and runtime feasibility (see Section~\ref{sec:setup_experiments_environment}). Compared to simple baselines such as Isolation Forest and OC-SVM, DeepSAD achieves significantly better discrimination of degraded frames. However, in our experiments the semi-supervised component of DeepSAD did not lead to measurable improvements, which may be attributable to the noisy evaluation targets (see Section~\ref{sec:results_deepsad}).

We also observed that the choice of encoder architecture is critical. As discussed in Section~\ref{sec:results_deepsad}, the Efficient architecture consistently outperformed the LeNet-inspired baseline in pretraining and contributed to stronger downstream performance. The influence of encoder design on DeepSAD training merits further study under cleaner evaluation conditions. In particular, benchmarking different encoder architectures on datasets with high-quality ground truth could clarify how much of DeepSAD’s performance gain stems from representation quality versus optimization.