diff --git a/thesis/Main.pdf b/thesis/Main.pdf index f021c47..c178537 100644 Binary files a/thesis/Main.pdf and b/thesis/Main.pdf differ diff --git a/thesis/Main.tex b/thesis/Main.tex index 866b6cd..47e4b22 100755 --- a/thesis/Main.tex +++ b/thesis/Main.tex @@ -426,7 +426,7 @@ To ensure our chosen dataset meets the needs of reliable degradation quantificat \begin{enumerate} \item \textbf{Data Modalities:}\\ - The dataset must include \rev{LiDAR} sensor data, since we decided to train and evaluate our method on what should be the most universally used sensor type in the given domain. To keep our method as generalized as possible, we chose to only require range-based point cloud data and \rev{opt out of} sensor-specific data such as intensity or reflectivity, though it may be of interest for future work. It is also desirable to have complementary visual data such as camera images, for better context, manual verification and understanding of the data. + The dataset must include \rev{LiDAR} sensor data, since we decided to train and evaluate our method on what should be the most universally used sensor type in the given domain. To keep our method as generalized as possible, we chose to only require range-based point cloud data and \rev{opt out of} sensor-specific data such as intensity or reflectivity, though it may be of interest for future work. It is also desirable to have complementary visual data, such as camera images, for better context, manual verification, and understanding of the data. \item \textbf{Context \& Collection Method:}\\ To mirror the real-world conditions of autonomous rescue robots, the data should originate from locations such as subterranean environments (tunnels, caves, collapsed structures), which closely reflect what would be encountered during rescue missions. Ideally, it should be captured from a ground-based, self-driving robot platform in motion instead of aerial, handheld, or stationary collection, to ensure similar circumstances to the target domain. @@ -444,13 +444,13 @@ To ensure our chosen dataset meets the needs of reliable degradation quantificat -Quantitative benchmarking of degradation quantification requires a degradation label for every scan. Ideally that label would be a continuous degradation score, although a binary label would still enable meaningful comparison. As the rest of this section shows, producing any reliable label is already challenging and assigning meaningful analog scores may not be feasible at all. Compounding the problem, no public search-and-rescue (SAR) \rev{LiDAR} data set offers such ground truth as far as we know. To understand the challenges around labeling \rev{LiDAR} data degradation, we will look at what constitutes degradation in this context. +Quantitative benchmarking of degradation quantification requires a degradation label for every scan. Ideally, that label would be a continuous degradation score, although a binary label would still enable meaningful comparison. As the rest of this section shows, producing any reliable label is already challenging, and assigning meaningful analog scores may not be feasible at all. Compounding the problem, no public search-and-rescue (SAR) \rev{LiDAR} data set offers such ground truth as far as we know. To understand the challenges around labeling \rev{LiDAR} data degradation, we will look at what constitutes degradation in this context. 
-In \rev{Section}~\ref{sec:lidar_related_work} we discussed some internal and environmental error causes of \rev{LiDAR} sensors, such as multi-return ambiguities or atmospheric scattering respectively. While we are aware of research into singular failure \rev{modes~\cite{lidar_errormodel_particles}} or research trying to model the totality of error souces occuring in other \rev{domains~\cite{lidar_errormodel_automotive}}, there appears to be no such model for the search and rescue domain and its unique environmental circumstances. Although, scientific consensus appears to be, that airborne particles are the biggest contributor to degradation in SAR~\cite{lidar_errormodel_consensus}, we think that a more versatile definition is required to ensure confidence during critical SAR missions, which are often of a volatile nature. We are left with an ambiguous definition of what constitutes \rev{LiDAR} point cloud degradation in the SAR domain. +In \rev{Section}~\ref{sec:lidar_related_work}, we discussed some internal and environmental error causes of \rev{LiDAR} sensors, such as multi-return ambiguities or atmospheric scattering, respectively. While we are aware of research into singular failure \rev{modes~\cite{lidar_errormodel_particles}} or research trying to model the totality of error sources occurring in other \rev{domains~\cite{lidar_errormodel_automotive}}, there appears to be no such model for the search and rescue domain and its unique environmental circumstances. Although scientific consensus appears to be that airborne particles are the biggest contributor to degradation in SAR~\cite{lidar_errormodel_consensus}, we think that a more versatile definition is required to ensure confidence during critical SAR missions, which are often of a volatile nature. We are left with an ambiguous definition of what constitutes \rev{LiDAR} point cloud degradation in the SAR domain. -We considered which types of objective measurements may be available to produce ground-truth labels, such as particulate matter sensors, \rev{LiDAR} point clouds' inherent properties such as range-dropout rate and others, but we fear that using purely objective measures to label the data, would limit our learning based method to imitating the labels' sources instead of differentiating all possible degradation patterns from high quality data. Due to the incomplete error model in this domain, there may be novel or compound error sources that would not be captured using such an approach. As an example, we did observe dense smoke reflecting enough rays to produce phantom objects, which may fool SLAM algorithms. Such a case may even be labeleled incorrectly as normal by one of the aforementioned objective measurement labeling options, if the surroundings do not exhibit enough dispersed smoke particles already. +We considered which types of objective measurements might be available to produce ground-truth labels, such as particulate matter sensors or the \rev{LiDAR} point clouds' inherent properties (e.g., the range-dropout rate). However, we fear that using purely objective measures to label the data would limit our learning-based method to imitating the labels' sources instead of differentiating all possible degradation patterns from high-quality data. Due to the incomplete error model in this domain, there may be novel or compound error sources that would not be captured using such an approach. As an example, we did observe dense smoke reflecting enough rays to produce phantom objects, which may fool SLAM algorithms.
Such a case may even be labeled incorrectly as normal by one of the aforementioned objective measurement labeling options if the surroundings do not already exhibit enough dispersed smoke particles. -To mitigate the aforementioned risks we adopt a human-centric, binary labelling strategy. We judged analog and multi-level discrete rating scales to be too subjective for human consideration, which only left us with the simplistic, but hopefully more reliable binary choice. We used two labeling approaches, producing two evaluation sets, whose motivation and details will be discussed in more detail in \rev{Section}~\ref{sec:preprocessing}. Rationale for the exact labeling procedures requires knowledge of the actual dataset we ended up choosing, which we will present in the next section. +To mitigate the aforementioned risks, we adopt a human-centric, binary labeling strategy. We judged analog and multi-level discrete rating scales to be too subjective for human consideration, which only left us with the simplistic, but hopefully more reliable, binary choice. We used two labeling approaches, producing two evaluation sets, whose motivation and details will be discussed in \rev{Section}~\ref{sec:preprocessing}. The rationale for the exact labeling procedures requires knowledge of the actual dataset we ended up choosing, which we will present in the next section. \newsection{data_dataset}{\rev{Dataset}} @@ -501,13 +501,13 @@ We use data from the \emph{Ouster OS1-32} \rev{LiDAR} sensor, which was configur \end{figure} -During the measurement campaign, a total of 14 experiments were conducted—10 prior to operating the artificial smoke machine (hereafter referred to as normal experiments) and 4 after it has already been running for some time (anomalous experiments). In 13 of these experiments, the sensor platform was in near-constant motion (either translating at roughly 1m/s or rotating), with only one anomalous experiment conducted while the platform remained stationary. Although this means we do not have two stationary experiments from the same exact position for a direct comparison between normal and anomalous conditions, the overall experiments are similar enough to allow for meaningful comparisons. In addition to the presence of water vapor from the smoke machine, the experiments vary in illumination conditions, the presence of humans on the measurement grounds, and additional static artifacts. For our purposes, only the artificial smoke is relevant; differences in lighting or incidental static objects do not affect our analysis. Regardless of illumination, the \rev{LiDAR} sensor consistently produces comparable point clouds, and the presence of static objects does not influence our quantification of point cloud degradation. +During the measurement campaign, a total of 14 experiments were conducted—10 prior to operating the artificial smoke machine (hereafter referred to as normal experiments) and 4 after it had already been running for some time (anomalous experiments). In 13 of these experiments, the sensor platform was in near-constant motion (either translating at roughly 1 m/s or rotating), with only one anomalous experiment conducted while the platform remained stationary. Although this means we do not have two stationary experiments from exactly the same position for a direct comparison between normal and anomalous conditions, the overall experiments are similar enough to allow for meaningful comparisons.
In addition to the presence of water vapor from the smoke machine, the experiments vary in illumination conditions, the presence of humans on the measurement grounds, and additional static artifacts. For our purposes, only the artificial smoke is relevant; differences in lighting or incidental static objects do not affect our analysis. Regardless of illumination, the \rev{LiDAR} sensor consistently produces comparable point clouds, and the presence of static objects does not influence our quantification of point cloud degradation. In the anomalous experiments, the artificial smoke machine appears to have been running for some time before data collection began, as evidenced by both camera images and \rev{LiDAR} data showing an even distribution of water vapor around the machine. The stationary experiment is a special case: the smoke machine was positioned very close to the sensor platform and was actively generating new, dense smoke, to the extent that the \rev{LiDAR} registered the surface of the fresh water vapor as if it were a solid object. -The \rev{Figures}~\ref{fig:data_screenshot_pointcloud}~and~\ref{fig:data_screenshot_camera} show an representative depiction of the environment of the experiments as a camera image of the IR camera and the point cloud created by the OS1 \rev{LiDAR} sensor at practically the same time. +\rev{Figures}~\ref{fig:data_screenshot_pointcloud}~and~\ref{fig:data_screenshot_camera} give a representative impression of the experiment environment, showing an image from the IR camera and the point cloud produced by the OS1 \rev{LiDAR} sensor at practically the same time. -\figc{data_screenshot_pointcloud}{figures/data_screenshot_pointcloud.png}{Screenshot of 3D rendering of an experiment's point cloud produced by the OS1-32 \rev{LiDAR} sensor without smoke and with illumination (same frame and roughly same alignment as \rev{Figure}~\ref{fig:data_screenshot_camera}). Point color corresponds to measurement range and axis in \rev{the center of the figure marks} the \rev{LiDAR}'s position.}{width=.9\textwidth} +\figc{data_screenshot_pointcloud}{figures/data_screenshot_pointcloud.png}{Screenshot of 3D rendering of an experiment's point cloud produced by the OS1-32 \rev{LiDAR} sensor without smoke and with illumination (same frame and roughly same alignment as \rev{Figure}~\ref{fig:data_screenshot_camera}). The point color corresponds to the measurement range, and the axis in \rev{the center of the figure marks} the \rev{LiDAR}'s position.}{width=.9\textwidth} \figc{data_screenshot_camera}{figures/data_screenshot_camera.png}{Screenshot of IR camera output of an experiment without smoke and with illumination (same frame and roughly same alignment as \rev{Figure}~\ref{fig:data_screenshot_pointcloud})\rev{.}}{width=.9\textwidth} @@ -515,46 +515,46 @@ Regarding the dataset volume, the 10 normal experiments ranged from 88.7 to 363. \fig{data_points_pie}{figures/data_points_pie.png}{Pie chart visualizing the number and distribution of normal and anomalous point clouds in \cite{subter}\rev{.}} -The artificial smoke introduces measurable changes that clearly separate the \textit{anomalous} runs from the \textit{normal} baseline. One change is a larger share of missing points per scan: smoke particles scatter or absorb the laser beam before it reaches a solid target, so the sensor reports an error instead of a distance.
Figure~\ref{fig:data_missing_points} shows the resulting right–shift of the missing-point histogram, a known effect for \rev{LiDAR} sensors in aerosol-filled environments. Another demonstrative effect is the appearance of many spurious returns very close to the sensor; these near-field points arise when back-scatter from the aerosol itself is mistaken for a surface echo. The box-plot in \rev{Figure}~\ref{fig:particles_near_sensor} confirms a pronounced increase in sub-50 cm hits under smoke, a range at which we do not expect any non-erroneous measurements. Both effects are consistent with the behaviour reported in \rev{\cite{when_the_dust_settles}}. +The artificial smoke introduces measurable changes that clearly separate the \textit{anomalous} runs from the \textit{normal} baseline. One change is a larger share of missing points per scan: smoke particles scatter or absorb the laser beam before it reaches a solid target, so the sensor reports an error instead of a distance. Figure~\ref{fig:data_missing_points} shows the resulting right-shift of the missing-point histogram, a known effect for \rev{LiDAR} sensors in aerosol-filled environments. Another demonstrative effect is the appearance of many spurious returns very close to the sensor; these near-field points arise when back-scatter from the aerosol itself is mistaken for a surface echo. The box plot in \rev{Figure}~\ref{fig:particles_near_sensor} confirms a pronounced increase in sub-50 cm hits under smoke, a range at which we do not expect any non-erroneous measurements. Both effects are consistent with the behavior reported in \rev{\cite{when_the_dust_settles}}. -\fig{data_missing_points}{figures/data_missing_points.png}{Density histogram showing the percentage of missing measurements per scan for normal experiments without degradation and anomalous experiments with artifical smoke introduced as degradation.} +\fig{data_missing_points}{figures/data_missing_points.png}{Density histogram showing the percentage of missing measurements per scan for normal experiments without degradation and anomalous experiments with artificial smoke introduced as degradation.} \fig{particles_near_sensor}{figures/particles_near_sensor_boxplot_zoomed_500.png}{Box diagram depicting the percentage of measurements closer than 50 centimeters to the sensor for normal and anomalous experiments.} -Taken together, the percentage of missing points and the proportion of near-sensor returns provide a concise indication of how strongly the smoke degrades our scans—capturing the two most prominent aerosol effects, drop-outs and back-scatter spikes. They do not, however, reveal the full error landscape discussed earlier (compound errors, temperature drift, multipath, \dots), so they should be read as an easily computed synopsis rather than an exhaustive measure of \rev{LiDAR} quality. Next we will discuss how the \rev{LiDAR} scans were preprocessed before use and how we actually assigned ground-truth labels to each scan, so \rev{that} we could train and evaluate our quantification degradation methods. +Taken together, the percentage of missing points and the proportion of near-sensor returns provide a concise indication of how strongly the smoke degrades our scans—capturing the two most prominent aerosol effects, drop-outs and back-scatter spikes.
They do not, however, reveal the full error landscape discussed earlier (compound errors, temperature drift, multipath, \dots), so they should be read as an easily computed synopsis rather than an exhaustive measure of \rev{LiDAR} quality. Next, we will discuss how the \rev{LiDAR} scans were preprocessed before use and how we actually assigned ground-truth labels to each scan, so \rev{that} we could train and evaluate our degradation quantification methods. \newsection{preprocessing}{Preprocessing Steps and Labeling} As described in Section~\ref{sec:algorithm_description}, the method under evaluation is data-type agnostic and can be adapted to work with any kind of data by choosing a suitable autoencoder architecture. In our case, the input data are point clouds produced by a \rev{LiDAR} sensor. Each point cloud contains up to 65,536 points, with each point represented by its \emph{X}, \emph{Y}, and \emph{Z} coordinates. To tailor the DeepSAD architecture to this specific data type, we would need to design an autoencoder suitable for processing three-dimensional point clouds. Although autoencoders can be developed for various data types, \rev{\cite{autoencoder_survey} observed} that over 60\% of recent research on autoencoders focuses on two-dimensional image classification and reconstruction. Consequently, there is a more established understanding of autoencoder architectures for images compared to those for three-dimensional point clouds. -For this reason and to simplify the architecture, we converted the point clouds into two-dimensional grayscale images using a spherical projection. This approach—proven sucessful in related work~\cite{degradation_quantification_rain}—encodes each \rev{LiDAR} measurement as a single pixel, where the pixel’s grayscale value is determined by the reciprocal range, calculated as $v = \frac{1}{\sqrt{\emph{X}^2 + \emph{Y}^2 + \emph{Z}^2}}$. Given the \rev{LiDAR} sensor's configuration, the resulting images have a resolution of 2048 pixels in width and 32 pixels in height. Missing measurements in the point cloud are mapped to pixels with a brightness value of $v = 0$. +For this reason and to simplify the architecture, we converted the point clouds into two-dimensional grayscale images using a spherical projection. This approach—proven successful in related work~\cite{degradation_quantification_rain}—encodes each \rev{LiDAR} measurement as a single pixel, where the pixel’s grayscale value is determined by the reciprocal range, calculated as $v = \frac{1}{\sqrt{\emph{X}^2 + \emph{Y}^2 + \emph{Z}^2}}$. Given the \rev{LiDAR} sensor's configuration, the resulting images have a resolution of 2048 pixels in width and 32 pixels in height. Missing measurements in the point cloud are mapped to pixels with a brightness value of $v = 0$. To create this mapping, we leveraged the available measurement indices and channel information inherent in the dense point clouds, which are ordered from 0 to 65,535 in a horizontally ascending, channel-by-channel manner. For sparse point clouds without such indices, one would need to rely on the pitch and yaw angles relative to the sensor's origin to correctly map each point to its corresponding pixel, although this often leads to ambiguous mappings due to numerical errors in angle estimation. Figure~\ref{fig:data_projections} displays two examples of \rev{LiDAR} point cloud projections to aid in the reader’s understanding.
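To make the projection step more concrete, the following minimal sketch illustrates how a single dense, ordered scan could be converted into such a reciprocal-range image. It is not taken from our implementation: the function and variable names are our own, and we assume here that missing returns are encoded as zero or non-finite coordinates.

\begin{verbatim}
import numpy as np

def project_scan(points, width=2048, height=32):
    # points: (65536, 3) array of X, Y, Z values for one dense, ordered scan,
    # with measurement indices 0..65535 running horizontally ascending,
    # channel by channel (as described above).
    ranges = np.linalg.norm(points, axis=1)      # sqrt(X^2 + Y^2 + Z^2)
    valid = np.isfinite(ranges) & (ranges > 0)   # assumed encoding of missing returns
    values = np.zeros_like(ranges)
    values[valid] = 1.0 / ranges[valid]          # reciprocal range v, 0 where missing
    return values.reshape(height, width)         # 32 x 2048 grayscale image
\end{verbatim}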
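The same image representation also makes the two simple degradation indicators from Section~\ref{sec:data_dataset} cheap to compute per scan: missing measurements are exactly the zero-valued pixels, and returns closer than 50 cm correspond to reciprocal ranges above $1/0.5\,\mathrm{m} = 2\,\mathrm{m}^{-1}$. The helper below is again only an illustrative sketch, not our actual evaluation code.

\begin{verbatim}
import numpy as np

def degradation_indicators(image, near_range_m=0.5):
    # image: 32 x 2048 reciprocal-range image with v = 0 for missing returns.
    total = image.size
    missing_pct = 100.0 * np.count_nonzero(image == 0.0) / total
    # range < 0.5 m  <=>  v = 1 / range > 1 / 0.5 m = 2.0
    near_pct = 100.0 * np.count_nonzero(image > 1.0 / near_range_m) / total
    return missing_pct, near_pct
\end{verbatim}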
Although the original point clouds were converted into grayscale images with a resolution of 2048×32 pixels, these raw images can be challenging to interpret. To enhance human readability, we applied the viridis colormap and vertically stretched the images so that each measurement occupies multiple pixels in height. The top projection in Figure~\ref{fig:data_projections} is derived from a scan without artificial smoke—and therefore minimal degradation—while the lower projection comes from an experiment where artificial smoke introduced significant degradation. -\fig{data_projections}{figures/data_2d_projections.png}{Two-dimensional projections of two point clouds, one from an experiment without degradation and one from an experiment with artifical smoke as degradation. To aid the readers perception, the images are vertically stretched and a colormap has been applied to the pixels' reciprocal range values, while the actual training data is grayscale.} +\fig{data_projections}{figures/data_2d_projections.png}{Two-dimensional projections of two point clouds, one from an experiment without degradation and one from an experiment with artificial smoke as degradation. To aid the reader's perception, the images are vertically stretched, and a colormap has been applied to the pixels' reciprocal range values, while the actual training data is grayscale.} -The remaining challenge, was labeling a large enough portion of the dataset in a reasonably accurate manner, whose difficulties and general approach we described in \rev{Section}~\ref{sec:data_req}. Since, to our knowledge, neither our chosen dataset nor any other publicly available one provide objective labels for \rev{LiDAR} data degradation in the SAR domain, we had to define our own labeling approach. With objective measures of degradation unavailable, we explored alternative labeling methods—such as using \rev{the statistical} properties like the number of missing measurements per point cloud or the higher incidence of erroneous measurements near the sensor we described in \rev{Section~\ref{sec:data_dataset}}. Ultimately, we were concerned that these statistical approaches might lead the method to simply mimic the statistical evaluation rather than to quantify degradation in a generalized and robust manner. After considering these options, we decided to label all point clouds from experiments with artificial smoke as anomalies, while point clouds from experiments without smoke were labeled as normal data. This labeling strategy—based on the presence or absence of smoke—is fundamentally an environmental indicator, independent of the intrinsic data properties recorded during the experiments. +The remaining challenge was labeling a large enough portion of the dataset in a reasonably accurate manner; its difficulties and our general approach were described in \rev{Section}~\ref{sec:data_req}. Since, to our knowledge, neither our chosen dataset nor any other publicly available one provides objective labels for \rev{LiDAR} data degradation in the SAR domain, we had to define our own labeling approach. With objective measures of degradation unavailable, we explored alternative labeling methods—such as using \rev{the statistical} properties described in \rev{Section~\ref{sec:data_dataset}}, like the number of missing measurements per point cloud or the higher incidence of erroneous measurements near the sensor.
Ultimately, we were concerned that these statistical approaches might lead the method to simply mimic the statistical evaluation rather than to quantify degradation in a generalized and robust manner. After considering these options, we decided to label all point clouds from experiments with artificial smoke as anomalies, while point clouds from experiments without smoke were labeled as normal data. This labeling strategy—based on the presence or absence of smoke—is fundamentally an environmental indicator, independent of the intrinsic data properties recorded during the experiments. -The simplicity of this labeling approach has both advantages and disadvantages. On the positive side, it is easy to implement and creates a clear distinction between normal and anomalous data. However, its simplicity is also its drawback: some point clouds from experiments with artificial smoke do not exhibit perceptible degradation, yet they are still labeled as anomalies. The reason for this, is that during the three non-static anomalous experiments the sensor platform starts recording in a tunnel roughly 20 meters from the smoke machine's location. It starts by approaching the smoke machine, navigates close to the machine for some time and then leaves its perimeter once again. Since the artificical smoke's density is far larger near the machine it originates from, the time the sensor platform spent close to it produced highly degraded point clouds, whereas the beginnings and ends of the anomalous experiments capture point clouds which are subjectively not degraded and appear similar to ones from the normal experiments. This effect is clearly illustrated by the degradation indicators which we talked about earlier\rev{--}the proportion of missing points and the amount of erroneous points close to the sensor per point cloud\rev{--}as can be seen in \rev{Figure}~\ref{fig:data_anomalies_timeline}. +The simplicity of this labeling approach has both advantages and disadvantages. On the positive side, it is easy to implement and creates a clear distinction between normal and anomalous data. However, its simplicity is also its drawback: some point clouds from experiments with artificial smoke do not exhibit perceptible degradation, yet they are still labeled as anomalies. The reason for this is that during the three non-static anomalous experiments, the sensor platform starts recording in a tunnel roughly 20 meters from the smoke machine's location. It starts by approaching the smoke machine, navigates close to the machine for some time, and then leaves its perimeter once again. Since the artificial smoke's density is much higher near the machine it originates from, the time the sensor platform spent close to it produced highly degraded point clouds, whereas the beginnings and ends of the anomalous experiments capture point clouds that are subjectively not degraded and appear similar to those from the normal experiments. This effect is clearly illustrated by the degradation indicators discussed earlier\rev{--}the proportion of missing points and the number of erroneous points close to the sensor per point cloud\rev{--}as can be seen in \rev{Figure}~\ref{fig:data_anomalies_timeline}. -\fig{data_anomalies_timeline}{figures/data_combined_anomalies_timeline.png}{Missing points and points with a measured range smaller than 50cm per point cloud over a normalized timeline of the individual experiments.
This illustrates the rise, plateau and fall of degradation intensity during the anomalous experiments, owed to the spacial proximity to the degradation source (smoke machine). One of the normal experiments (without artifical smoke) is included as a baseline \rev{in gray}.} +\fig{data_anomalies_timeline}{figures/data_combined_anomalies_timeline.png}{Missing points and points with a measured range smaller than 50 cm per point cloud over a normalized timeline of the individual experiments. This illustrates the rise, plateau, and fall of degradation intensity during the anomalous experiments, owing to the spatial proximity between the LiDAR sensor and the degradation source (smoke machine). One of the normal experiments (without artificial smoke) is included as a baseline \rev{in gray}.} Concerned that the incorrectly labeled data might negatively impact DeepSAD's semi-supervised training, we chose to manually remove the anomalous labels from the beginning and end of the anomalous experiments for training purposes. This refinement gave us more confidence in the training signal but reduced the number of labeled anomalies. For evaluation, we therefore report results under both schemes: \begin{enumerate} - \item \textbf{Experiment-based labels:} All scans from anomalous experiments marked anomalous, including border cases. This yields conservative performance metrics that reflect real-world label noise. + \item \textbf{Experiment-based labels:} All scans from anomalous experiments are marked anomalous, including border cases. This yields conservative performance metrics that reflect real-world label noise. \item \textbf{Manually-defined labels:} Only unequivocally degraded scans are marked anomalous, producing near-ideal separation in many cases. \end{enumerate} -Under both evaluation schemes all frames from normal experiments were marked as normal, since they appear to have produced high quality data throughout. A visualization of how the two evaluation schemes measure up in terms of numbers of samples per class can be seen in \rev{Figure}~\ref{fig:data_eval_labels}. +Under both evaluation schemes, all frames from normal experiments were marked as normal, since they appear to have produced high-quality data throughout. A comparison of the two evaluation schemes in terms of the number of samples per class can be seen in \rev{Figure}~\ref{fig:data_eval_labels}. -\fig{data_eval_labels}{figures/data_eval_labels.png}{Pie charts visualizing the number of normal and anomalous labels applied to the dataset per labeling scheme. A large part of the experiment-based anomalous labels had to be removed for the manually-defined scheme, since they were either subjectively clearly or possibly not degraded.} +\fig{data_eval_labels}{figures/data_eval_labels.png}{Pie charts visualizing the number of normal and anomalous labels applied to the dataset per labeling scheme. A large part of the experiment-based anomalous labels had to be removed for the manually-defined scheme, since, subjectively, they were either clearly or possibly not degraded.} -By evaluating and comparing both approaches, we hope to demonstrate a more thorough performance investigatation than with only one of the two \rev{labeling schemes}. +By evaluating and comparing both approaches, we hope to demonstrate a more thorough performance investigation than with only one of the two \rev{labeling schemes}. \newchapter{experimental_setup}{Experimental Setup}