started with data chapter flow rework

This commit is contained in:
Jan Kowalczyk
2025-05-14 12:12:02 +02:00
parent 83fa7538bc
commit 0a35786ebb
2 changed files with 50 additions and 9 deletions

View File

@@ -518,7 +518,7 @@ The pre-training results are used in two more key ways. First, the encoder weigh
In the main training step, DeepSAD's network is trained using SGD backpropagation. The unlabeled training data is used with the goal of minimizing a data-encompassing hypersphere. Since one of the preconditions of training was the significant prevalence of normal data over anomalies in the training set, normal samples collectively cluster more tightly around the centroid, while the rarer anomalous samples do not contribute as significantly to the optimization, and therefore remain further from the hypersphere center. The labeled data carries binary class labels signifying their status as either normal or anomalous samples. Labeled anomalies are pushed away from the center by defining their optimization target as maximizing the distance between them and $\mathbf{c}$. Labeled normal samples are treated similarly to unlabeled samples, with the difference that DeepSAD includes a hyperparameter controlling the proportion with which labeled and unlabeled data contribute to the overall optimization. The resulting network has learned to map normal data samples closer to $\mathbf{c}$ in the latent space and anomalies further away.
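For reference, a restatement of the main-training objective as formulated in the original DeepSAD paper (notation may differ slightly from \ref{eq:deepsad_optimization_objective}); here $\phi(\cdot\,;\mathcal{W})$ denotes the encoder network, $n$ the number of unlabeled samples $\mathbf{x}_i$, $m$ the number of labeled samples $\tilde{\mathbf{x}}_j$ with labels $\tilde{y}_j \in \{-1,+1\}$, and $\lambda$ the weight decay strength:
\begin{equation*}
\min_{\mathcal{W}} \;\; \frac{1}{n+m} \sum_{i=1}^{n} \left\lVert \phi(\mathbf{x}_i;\mathcal{W}) - \mathbf{c} \right\rVert^{2} + \frac{\eta}{n+m} \sum_{j=1}^{m} \left( \left\lVert \phi(\tilde{\mathbf{x}}_j;\mathcal{W}) - \mathbf{c} \right\rVert^{2} \right)^{\tilde{y}_j} + \frac{\lambda}{2} \sum_{\ell=1}^{L} \left\lVert \mathbf{W}^{\ell} \right\rVert_{F}^{2}
\end{equation*}
For labeled anomalies ($\tilde{y}_j = -1$) the exponent inverts the squared distance, so minimizing the objective pushes them away from $\mathbf{c}$, while $\eta$ weights the labeled term against the unlabeled one.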
\fig{deepsad_procedure}{diagrams/deepsad_procedure}{WIP: Depiction of DeepSAD's training procedure, including data flows and tweakable hyperparameters.}
\fig{deepsad_procedure}{diagrams/deepsad_procedure}{(WORK IN PROGRESS) Depiction of DeepSAD's training procedure, including data flows and tweakable hyperparameters.}
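Once training is complete, the network is typically used as follows (a sketch following the original paper): a new sample $\mathbf{x}$ is mapped into the latent space, and its squared distance to the hypersphere center serves as its anomaly score,
\begin{equation*}
s(\mathbf{x}) = \left\lVert \phi(\mathbf{x};\mathcal{W}^{*}) - \mathbf{c} \right\rVert^{2},
\end{equation*}
where $\mathcal{W}^{*}$ denotes the learned network weights; larger scores indicate more anomalous samples.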
\threadtodo
{how to use the trained network?}
@@ -564,8 +564,11 @@ The first term of \ref{eq:deepsad_optimization_objective} stays mostly the same,
\newsubsubsectionNoTOC{Hyperparameters}
The neural network architecture of DeepSAD is not fixed but rather dependent on the data type the algorithm is supposed to operate on. This is due to the way it employs an autoencoder for pre-training and the encoder part of the network for its main training step. This makes it necessary to adapt an autoencoder architecture suitable to the specific application, but also allows for flexibility in choosing a fitting architecture depending on the application's requirements. For this reason, the specific architecture employed may be considered a hyperparameter of the DeepSAD algorithm. During the pre-training step, as is typical for autoencoders, no labels are necessary, since the optimization objective of autoencoders is generally to reproduce the input, as is indicated by the architecture's name.
The neural network architecture of DeepSAD is not fixed but rather dependent on the data type the algorithm is supposed to operate on. This is due to the way it employs an autoencoder for pre-training and the encoder part of the network for its main training step. This makes it necessary to adapt an autoencoder architecture suitable to the specific application, but also allows for flexibility in choosing a fitting architecture depending on the application's requirements. For this reason, the specific architecture employed may be considered a hyperparameter of the DeepSAD algorithm.
\todo[inline]{Talk about choosing the correct architecture (give example receptive fields for image data from object detection?)}
\todo[inline]{latent space size, talk about auto encoder performance, trying out sensible dimensionalities and find reconstruction elbow, choose smallest possible, but as large as necessary}
\todo[inline]{eta, think of possible important scenarios, learning rate, epochs}
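The elbow search sketched in the todo above could look as follows; this is a minimal sketch assuming a PyTorch setup, where \texttt{make\_autoencoder}, \texttt{pretrain}, \texttt{train\_loader}, and \texttt{val\_loader} are hypothetical, application-specific helpers rather than part of any published DeepSAD implementation:
\begin{verbatim}
import torch
import torch.nn.functional as F

def reconstruction_error(model, loader, device="cpu"):
    """Mean squared reconstruction error of an autoencoder on a validation set."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for batch in loader:
            x = batch[0] if isinstance(batch, (list, tuple)) else batch
            x = x.to(device)
            total += F.mse_loss(model(x), x, reduction="sum").item()
            count += x.numel()
    return total / count

# Sweep increasing latent sizes, record the validation reconstruction error,
# and pick the smallest dimensionality after which the error stops improving
# noticeably (the elbow): as small as possible, as large as necessary.
errors = {}
for latent_dim in (8, 16, 32, 64, 128):
    model = make_autoencoder(latent_dim)  # hypothetical application-specific factory
    pretrain(model, train_loader)         # hypothetical pre-training routine
    errors[latent_dim] = reconstruction_error(model, val_loader)
print(errors)
\end{verbatim}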
%\todo[inline, color=green!40]{Core idea of the algorithm is to learn a transformation to map input data into a latent space where normal data clusters close together and anomalous data gets mapped further away. To achieve this, the method first includes a pre-training step of an auto-encoder to extract the most relevant information, second it fixes a hypersphere center in the auto-encoder's latent space as a target point for normal data, and third it trains the network to map normal data closer to that hypersphere center. Fourth, the resulting network can map new data into this latent space and interpret its distance from the hypersphere center as an anomaly score, which is larger the more anomalous the data point is}
%\todo[inline, color=green!40]{explanation pre-training step: architecture of the autoencoder is dependent on the input data shape, but any data shape is generally permissible. For the autoencoder we do not need any labels since the optimization target is always the input itself. The latent space dimensionality can be chosen based on the input data's complexity (search citations). Generally a higher-dimensional latent space has more learning capacity but tends to overfit more easily (find cite). The pre-training step is used to find weights for the encoder which generally extract robust and critical information from the input because TODO read deepsad paper (cite deepsad). As training data, typically all data (normal and anomalous) is used during this step.}
@@ -577,14 +580,22 @@ The neural network architecture of DeepSAD is not fixed but rather dependent on
%\todo[inline, color=green!40]{explain the three terms (unlabeled, labeled, regularization)}
\newsection{advantages_limitations}{Advantages and Limitations}
\todo[inline]{semi supervised, learns normality by amount of data (no labeling/ground truth required), very few labels for better training to specific situation}
\todo[inline]{unsure if this section makes sense, what content would be here?}
%\todo[inline]{semi supervised, learns normality by amount of data (no labeling/ground truth required), very few labels for better training to specific situation}
\newchapter{data_preprocessing}{Data and Preprocessing}
\threadtodo
{Introduce data chapter, what will be covered here, incite interest}
{all background covered, deepsad explained, data natural next step}
{emotional why data scarce, lot of data necessary, what will be covered}
{what will we talk about next $\rightarrow$ requirements}
%\todo[inline, color=green!40]{good data important for learning based methods and for evaluation. in this chapter we talk about the requirements we have for our data and the difficulties that come with them and will then give some information about the dataset that was used as well as how the data was preprocessed for the experiments (sec 4.2)}
%Fortunately situations like earthquakes, structural failures and other circumstances where rescue robots need to be employed are uncommon occurrences. When such an operation is conducted, the main focus lies on the fast and safe rescue of any survivors from the hazardous environment, therefore it makes sense that data collection is not a priority. Paired with the rare occurrences this leads to a lack of publicly available data of such situations. To improve any method, a large enough, diversified and high quality dataset is always necessary to provide a comprehensive evaluation. Additionally, in this work we evaluate a training based method, which increases the requirements on the data manifold, making it all the more complex to find a suitable dataset. In this chapter we will state the requirements we defined for the data, talk about the dataset that was chosen for this task, including some statistics and points of interest, as well as how it was preprocessed for the training and evaluation of the methods.
Situations such as earthquakes, structural failures, and other emergencies that require rescue robots are fortunately rare. When these operations do occur, the primary focus is on the rapid and safe rescue of survivors rather than on data collection. Consequently, there is a scarcity of publicly available data from such scenarios. To improve any method, however, a large, diverse, and high-quality dataset is essential for comprehensive evaluation. This challenge is further compounded in our work, as we evaluate a training-based approach that imposes even higher demands on the data to enable training, making it difficult to find a suitable dataset.
Situations such as earthquakes, structural failures, and other emergencies that require rescue robots are fortunately rare. When these operations do occur, the primary focus is on the rapid and safe rescue of survivors rather than on data collection. Consequently, there is a scarcity of publicly available data from such scenarios. To improve any method, however, a large, diverse, and high-quality dataset is essential for comprehensive evaluation. This challenge is further compounded in our work, as we evaluate a training-based approach that imposes even higher demands on the data, in particular requiring a large number of diverse training samples, making it difficult to find a suitable dataset.
In this chapter, we outline the specific requirements we established for the data, describe the dataset selected for this task—including key statistics and notable features—and explain the preprocessing steps applied for training and evaluating the methods.
@@ -601,11 +612,38 @@ In this chapter, we outline the specific requirements we established for the dat
%Our main requirement for the data was for it to be as closely related to the target domain of rescue operations as possible. Since autonomous robots get largely used in situations where a structural failure occurred, we require the data to be subterranean. This provides the additional benefit that data from this domain oftentimes already has some amount of airborne particles like dust due to limited ventilation and oftentimes exposed rock, which is to be expected to also be present in rescue situations. The second and by far more limiting requirement on the data was that there has to be appreciable degradation due to airborne particles as would occur during a fire from smoke. The type of data has to at least include lidar but for better understanding other types of visual data e.g., visual camera images would be beneficial. The amount of data has to be sufficient for training the learning based methods while containing mostly good quality data without degradation, since the semi-supervised method implicitly requires a larger amount of normal than anomalous training data for successful training. Nonetheless, the number of anomalous data samples has to be large enough that a comprehensive evaluation of the methods' performance is possible.
\newsubsubsectionNoTOC{Requirements}
Our primary requirement for the dataset was that it closely reflects the target domain of rescue operations. Because autonomous robots are predominantly deployed in scenarios involving structural failures, the data should be taken from subterranean environments. This setting not only aligns with the operational context but also inherently includes a larger than normal amount of airborne particles (e.g., dust) from limited ventilation and exposed rock surfaces, conditions typically also encountered during rescue missions.
\threadtodo
{list requirements we had for data}
{what were our requirements for choosing a dataset}
{list from basic to more complex with explanations}
{ground truth for evaluation $\rightarrow$ ground truth/labeling challenges}
A second, more challenging requirement is that the dataset must exhibit significant degradation due to airborne particles, as would be expected in scenarios involving smoke from fires. The dataset should at minimum include LiDAR data, and ideally also incorporate other visual modalities (e.g., camera images) to provide a more comprehensive understanding of the environment.
%Our primary requirement for the dataset was that it closely reflects the target domain of rescue operations. Because autonomous robots are predominantly deployed in scenarios involving structural failures, the data should be taken from subterranean environments. This setting not only aligns with the operational context but also inherently includes a larger than normal amount of airborne particles (e.g., dust) from limited ventilation and exposed rock surfaces, which is typically encountered during rescue missions.
%A second, more challenging requirement is that the dataset must exhibit significant degradation due to airborne particles, as would be expected in scenarios involving smoke from fires. The dataset should at minimum include LiDAR data, and ideally also incorporate other visual modalities (e.g., camera images) to provide a more comprehensive understanding of the environment.
%Additionally, the dataset must be sufficiently large for training learning-based methods. Since the semi-supervised approach we utilize relies on a predominance of normal data over anomalous data, it is critical that the dataset predominantly consists of high-quality, degradation-free samples. At the same time, there must be enough anomalous samples to allow for a thorough evaluation of the methods' performance.
To ensure that the chosen dataset supports reliable degradation quantification in subterranean rescue scenarios, we imposed the following requirements:
\begin{enumerate}
\item \textbf{Data Modalities:}\\
The dataset must include LiDAR sensor data, since we decided to train and evaluate our method on what is arguably the most universally used sensor type in the given domain. To keep our method as general as possible, we chose to only require range-based point cloud data and forgo sensor-specific data such as intensity or reflectivity, though these may be of interest for future work. It is also desirable to have complementary visual data, such as camera images, for better context as well as manual verification and understanding of the data.
\item \textbf{Context \& Collection Method:}\\
To mirror the real-world conditions of autonomous rescue robots, the data should originate from subterranean environments (tunnels, caves, collapsed structures), which closely reflect what would be encountered during rescue missions. Ideally, it should be captured from a ground-based, self-driving robot platform in motion, rather than through aerial, handheld, or stationary collection, to ensure circumstances similar to the target domain.
\item \textbf{Degradation Characteristics:}\\
Because our goal is to quantify the degradation of LiDAR data encountered by rescue robots, the dataset must exhibit significant degradation of LiDAR returns from aerosols (i.e., dust or smoke particles), which should be the most frequent and challenging type of degradation encountered. This requirement is key to evaluating how well our method detects and measures the severity of such challenging conditions.
\item \textbf{Volume \& Class Balance:}\\
The dataset must be large enough to train deep learning models effectively. Since our semi-supervised approach depends on learning a robust model of “normal” data, the majority of samples should be high-quality, degradation-free scans. Simultaneously, there must be a sufficient number of degraded (anomalous) scans to permit a comprehensive evaluation of quantification performance.
\item \textbf{Ground-Truth Labels:}\\
Finally, to evaluate and tune our method, we need some form of ground truth indicating which scans are degraded. Obtaining reliable labels in this domain is challenging, as manual annotation of degradation levels is laborious and somewhat subjective. We address these labeling challenges next.
\end{enumerate}
Additionally, the dataset must be sufficiently large for training learning-based methods. Since the semi-supervised approach we utilize relies on a predominance of normal data over anomalous data, it is critical that the dataset predominantly consists of high-quality, degradation-free samples. At the same time, there must be enough anomalous samples to allow for a thorough evaluation of the methods' performance.
\newsubsubsectionNoTOC{Labeling Challenges}

View File

@@ -47,7 +47,7 @@
\end{pgfonlayer}
\begin{pgfonlayer}{foreground}
\node[hlabelbox, below=of hyper] (autoencarch) {\boxtitle{Autoencoder Architecture} Choose based on data type \\ Latent Space Size};
\node[hlabelbox, below=of hyper] (autoencarch) {\boxtitle{Autoencoder Architecture} Choose based on data type \\ Latent Space Size (based on complexity)};
\node[hlabelbox, below=.1 of autoencarch] (pretrainhyper) {\boxtitle{Hyperparameters} $E_A$: Number of Epochs \\ $L_A$: Learning Rate};
\end{pgfonlayer}
\begin{pgfonlayer}{background}
@@ -76,7 +76,7 @@
%\draw[arrow] (node cs:name=traindata,angle=-45) |- node[arrowlabel]{all training data, labels removed} (node cs:name=calcc,angle=200);
\begin{pgfonlayer}{foreground}
\node[stepsbox, below=1.4 of calcc] (maintrainproc) {Train Network for $E_M$ Epochs \\ with $L_M$ Learning Rate \\ Considers Labels};
\node[stepsbox, below=1.4 of calcc] (maintrainproc) {Train Network for $E_M$ Epochs \\ with $L_M$ Learning Rate \\ Considers Labels with $\eta$ strength};
\node[outputbox, below=.1 of maintrainproc] (maintrainout) {\boxtitle{Outputs} Encoder Network \\ $\mathbf{w}$: Network Weights \\ $\mathbf{c}$: Hypersphere Center};
\end{pgfonlayer}
\begin{pgfonlayer}{background}
@@ -90,6 +90,9 @@
\node[hyperbox, fit=(maintrainhyper), label={[label distance = 1, name=autoenclabel]above:{\textbf{Main-Training Hyperparameters}}}] (maintrainhyp) {};
\end{pgfonlayer}
\draw[arrow] (node cs:name=pretrain,angle=-20) -- +(1, 0) |- (node cs:name=maintrain,angle=20);
%\draw[arrow] (pretrainoutput.south) -- (node cs:name=maintrain,angle=22);
\draw[arrow] (calcc.south) -- (maintrainlab.north);
\draw[arrow] (traindata.south) |- (maintrain.west);