results inference discussion

This commit is contained in:
Jan Kowalczyk
2025-09-22 09:41:58 +02:00
parent 8e7c210872
commit 9ec73c5992
8 changed files with 234 additions and 107 deletions

@@ -1679,6 +1679,39 @@ Figure~\ref{fig:labeling_regime_ap} compares AP across labeling regimes (0/0, 50
\fig{labeling_regime_ap}{figures/labeling_regime_ap.png}{AP across semi-supervised labeling regimes. Unsupervised training often performs best; added labels do not yield consistent gains under noisy conditions.}
% --- Section: Inference on Held-Out Experiments ---
\newsection{results_inference}{Inference on Held-Out Experiments}
In addition to the evaluation of average precision and precision--recall curves obtained from $k$-fold cross-validation with varying hyperparameters, we also examine the behavior of the fully trained methods when applied to previously unseen, held-out experiments.
While the prior analysis provided valuable insights into the classification capabilities of the methods, it was limited by two factors: first, the binary ground-truth labels were of uneven quality due to the aforementioned mislabeling of frames, and second, the binary formulation does not reflect our overarching goal of quantifying sensor degradation on a continuous scale.
To provide a more intuitive understanding of how the methods might perform in real-world applications, we therefore present results from running inference sequentially on entire experiments.
These frame-by-frame time-axis plots simulate online inference and illustrate how anomaly scores evolve as data is captured, thereby serving as a candidate metric for quantifying the degree of LiDAR degradation during operation.
\fig{results_inference_normal_vs_degraded}{figures/results_inference_normal_vs_degraded.png}{Comparison of anomaly detection methods with statistical indicators across clean (dashed) and degraded (solid) experiments. Each subplot shows one method (DeepSAD--LeNet, DeepSAD--Efficient, OCSVM, Isolation Forest). Red curves denote method anomaly scores normalized to the clean experiment; blue and green curves denote the percentage of missing LiDAR points and near-sensor particle hits, respectively. Clear separation between clean and degraded runs is observed for the DeepSAD variants and, to a lesser degree, for OCSVM, while Isolation Forest produces high scores even in the clean experiment. The latent space dimensionality was 32, and the semi-supervised labeling regime used 0 normal and 0 anomalous labeled samples during training.}
The plots in Fig.~\ref{fig:results_inference_normal_vs_degraded} highlight important differences in how well the tested methods distinguish between normal and degraded sensor conditions.
Among the four approaches, the strongest separation is achieved by \textbf{DeepSAD (Efficient)}, followed by \textbf{DeepSAD (LeNet)}, then \textbf{OCSVM}.
For \textbf{Isolation Forest}, the anomaly scores are already elevated in the clean experiment, which prevents reliable differentiation between normal and degraded runs and makes the method unsuitable in this context.
It is important to note that the score axes are scaled individually per method, so comparisons should focus on relative separation rather than absolute values.
Because the raw anomaly scores produced by the different methods are on incomparable scales (depending, for example, on network architecture or latent space dimensionality), we first applied \textbf{$z$-score normalization}.
The $z$-score rescales each value by its deviation from the mean in units of the standard deviation, so that outputs from different models become directly comparable in terms of how many standard deviations they deviate from normal behavior.
To allow comparison between the clean and degraded experiments, the mean and standard deviation were estimated exclusively from the clean experiment and then used to normalize the degraded scores as well.
This ensures that increases in the degraded runs are interpreted relative to the distribution of the clean baseline; computing separate $z$-scores per experiment would instead only reveal deviations within each run and would not enable a meaningful cross-experiment comparison.
It should be noted that the $z$-scores remain \emph{method-specific}, meaning that while relative separation between clean and degraded runs can be compared within a method, the absolute scales across different methods are not directly comparable; readers should therefore take note of the differing axis ranges for each subplot.
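As a concrete sketch of this normalization step (the symbols $s_t$, $\mu_{\text{clean}}$, and $\sigma_{\text{clean}}$ are introduced here purely for illustration), the normalized score at frame $t$ can be written as
\[
    z_t = \frac{s_t - \mu_{\text{clean}}}{\sigma_{\text{clean}}},
\]
where $s_t$ is a method's raw anomaly score at frame $t$, and $\mu_{\text{clean}}$ and $\sigma_{\text{clean}}$ are the mean and standard deviation of that method's scores over the clean experiment; the same two statistics are reused when normalizing the degraded experiment.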
After normalization, the resulting time series were still highly noisy, which motivated the application of \textbf{exponential moving average (EMA) smoothing}.
EMA was chosen because it is causal (does not rely on future data) and thus suitable for real-time inference.
Although it introduces a small time delay, this delay is shorter than for other smoothing techniques such as running averages.
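A minimal formulation of this smoothing step, assuming a smoothing factor $\alpha \in (0,1]$ whose concrete value is not fixed here, is
\[
    \tilde{z}_t = \alpha\, z_t + (1 - \alpha)\, \tilde{z}_{t-1}, \qquad \tilde{z}_0 = z_0,
\]
where $\tilde{z}_t$ is the smoothed $z$-score at frame $t$. Since $\tilde{z}_t$ depends only on current and past values, the filter remains causal and thus suitable for online inference; smaller values of $\alpha$ smooth more strongly at the cost of a longer effective delay.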
The red method curves can also be compared with the blue and green statistical indicators (missing points and near-sensor particle hits).
While some similarities in shape may suggest that the methods partly capture these statistics, such interpretations should be made with caution.
The anomaly detection models are expected to have learned additional patterns that are not directly observable from simple statistics, and these may also contribute to their ability to separate degraded from clean data.
\newchapter{conclusion_future_work}{Conclusion and Future Work}
\newsection{conclusion}{Conclusion}
