abstract lidar capitalization

Jan Kowalczyk
2025-10-19 17:34:38 +02:00
parent 62c424cd54
commit 6cd2c7fbef
3 changed files with 6 additions and 5 deletions


@@ -91,7 +91,8 @@
}
-\DeclareRobustCommand{\rev}[1]{\textcolor{red}{#1}}
+%\DeclareRobustCommand{\rev}[1]{\textcolor{red}{#1}}
+\DeclareRobustCommand{\rev}[1]{#1}
\DeclareRobustCommand{\mcah}[1]{}
% correct bad hyphenation
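
For context, \rev is the manuscript's revision-highlighting macro: the \textcolor{red}{#1} body prints revised passages in red, while the plain #1 body typesets them in the normal text color, which is what this commit switches to. A minimal, self-contained sketch of how the toggle behaves (the sample sentence is illustrative, not taken from the paper):

\documentclass{article}
\usepackage{xcolor}
% Review copy: revised text is printed in red.
\DeclareRobustCommand{\rev}[1]{\textcolor{red}{#1}}
% Camera-ready: comment out the definition above and use this one instead,
% so \rev{...} typesets its argument without any highlighting.
%\DeclareRobustCommand{\rev}[1]{#1}
\begin{document}
We report results \rev{after recalibrating the anomaly-score threshold}.
\end{document}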
@@ -763,7 +764,7 @@ We adapted the baseline implementations to our data loader and input format and
\paragraph{Evaluation Metrics}
-As discussed in Section~\ref{sec:preprocessing}, evaluating model performance in our setup is challenging due to the absence of an analog ground truth. Instead, we rely on binary labels that are additionally noisy and subjective. All models under consideration produce continuous anomaly scores: DeepSAD outputs a positive-valued distance to the center of a hypersphere, Isolation Forest measures deviation from the mean tree depth (which can be negative), and OCSVM returns a signed distance to the decision boundary. Because these scores differ in scale and sign—and due to the lack of a reliable degradation threshold—it is not appropriate to evaluate performance using metrics such as accuracy or F1 score, both of which require classification at a fixed threshold.
+As discussed in Section~\ref{sec:preprocessing}, evaluating model performance in our setup is challenging due to the absence of analog ground truth. Instead, we rely on binary labels that are additionally noisy and subjective. All models under consideration produce continuous anomaly scores: DeepSAD outputs a positive-valued distance to the center of a hypersphere, Isolation Forest measures deviation from the mean tree depth (which can be negative), and OCSVM returns a signed distance to the decision boundary. Because these scores differ in scale and sign—and due to the lack of a reliable degradation threshold—it is not appropriate to evaluate performance using metrics such as accuracy or F1 score, both of which require classification at a fixed threshold.
Instead, we adopt threshold-independent evaluation curves that illustrate model behavior across the full range of possible thresholds. The most commonly used of these is the Receiver Operating Characteristic (ROC)~\cite{roc} curve, along with its scalar summary metric, ROC AUC. ROC curves plot the true positive rate (TPR) against the false positive rate (FPR), providing insight into how well a model separates the two classes. However, as noted in~\cite{roc_vs_prc2,roc_vs_prc} and confirmed in our own testing, ROC AUC can be misleading under strong class imbalance—a common condition in anomaly detection.
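
For reference, the quantities behind the ROC curve and its AUC, written out with their standard definitions (background material, not quoted from the paper): given true positives TP, false negatives FN, false positives FP, and true negatives TN at a given threshold,

\[
\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad
\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}, \qquad
\mathrm{AUC}_{\mathrm{ROC}} = \int_{0}^{1} \mathrm{TPR} \, \mathrm{d}\mathrm{FPR}.
\]

Because TPR and FPR are each normalized within their own class, they, and therefore ROC AUC, are insensitive to the class ratio: under strong imbalance (say, one degraded sample per hundred nominal ones) even a small FPR can correspond to far more false alarms than true detections, which is the effect the cited comparisons with precision-recall curves point out.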