network arch lenet work

This commit is contained in:
Jan Kowalczyk
2025-08-17 14:49:00 +02:00
parent e2040fa547
commit cc152a4b75

\section{Model Configuration \& Evaluation Protocol}
Since the neural network architecture trained in the DeepSAD method is not fixed, as described in section~\ref{sec:algorithm_details}, but rather chosen based on the input data, we also had to choose an autoencoder architecture befitting our preprocessed lidar data projections. Since \citetitle{degradation_quantification_rain}~\cite{degradation_quantification_rain} reported success in training DeepSAD on similar data, we first adapted the network architecture utilized there for our use case, which is based on the simple and well-understood LeNet architecture. Additionally, we were interested in evaluating the importance and impact of a well-suited network architecture for DeepSAD's performance and therefore designed a second network architecture, henceforth called the ``efficient architecture'', which incorporates a few modern techniques befitting our use case.
\newsubsubsectionNoTOC{Network architectures (LeNet variant, custom encoder) and how they suit the pointcloud input}
The LeNet-inspired autoencoder can be split into an encoder network (figure~\ref{fig:setup_arch_lenet_encoder}) and a decoder network (figure~\ref{fig:setup_arch_lenet_decoder}) with a latent space between the two parts. Such an arrangement is typical for autoencoder architectures, as discussed in section~\ref{sec:autoencoder}. The encoder network is simultaneously DeepSAD's main training architecture, which, once trained, is used to infer the degradation quantification in our use case.
The LeNet-inspired encoder network (see figure~\ref{fig:setup_arch_lenet_encoder}) consists of two convolution steps with pooling layers, followed by a dense layer which populates the latent space. Since preprocessing leaves us with image-like data (2D projections with a single grayscale channel), the input dimensionality is 2048x32x1. The first convolutional layer uses a 3x3 kernel and outputs 8 channels, where the channel count determines how many distinct features/structures/patterns the network can learn to extract from the input; this results in an output dimensionality of 2048x32x8, which is reduced to 1024x16x8 by a 2x2 pooling layer. Each convolution is followed by batch normalization and a leaky ReLU activation. The second convolution reduces the 8 channels to 4 with another 3x3 kernel, further compressing the representation towards the latent space, and is followed by another 2x2 pooling layer, resulting in a 512x8x4 dimensionality, which is then flattened and fed into a dense layer. The dense layer's output dimension is the chosen latent space dimensionality, which is, as previously mentioned, another tunable hyperparameter.
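The encoder's shape progression can be traced with a short sketch. This is a hypothetical illustration, not the thesis code; it assumes the 3x3 convolutions use ``same'' padding, so that only the 2x2 pooling layers change the spatial dimensions, which is consistent with the dimensionalities stated above.

```python
# Illustrative sketch: trace the tensor shapes through the LeNet-inspired
# encoder. Assumes 3x3 convolutions with 'same' padding (spatial size kept)
# and 2x2 pooling (spatial size halved).

def conv_same(shape, out_channels):
    """3x3 convolution with 'same' padding: keeps spatial size, sets channels."""
    h, w, _ = shape
    return (h, w, out_channels)

def pool2x2(shape):
    """2x2 pooling: halves both spatial dimensions, keeps channels."""
    h, w, c = shape
    return (h // 2, w // 2, c)

shape = (2048, 32, 1)        # preprocessed lidar projection (grayscale)
shape = conv_same(shape, 8)  # -> (2048, 32, 8)
shape = pool2x2(shape)       # -> (1024, 16, 8)
shape = conv_same(shape, 4)  # -> (1024, 16, 4)
shape = pool2x2(shape)       # -> (512, 8, 4)

flat = shape[0] * shape[1] * shape[2]
print(shape, flat)           # (512, 8, 4) 16384 values enter the dense layer
```

The dense layer then maps these 16384 flattened values to the chosen latent dimensionality.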
Its decoder network (see figure~\ref{fig:setup_arch_lenet_decoder}) is a mirrored version of the encoder, with a dense layer after the latent space followed by two pairs of 2x2 upsampling and transpose convolution layers. A transpose convolution (often called deconvolution) applies a convolution's connectivity in reverse, making it the natural mirrored counterpart to the encoder's convolution layers. The two transpose convolutions use 4 and 8 input channels respectively, with the second one reducing its output to a single channel, resulting in a 2048x32x1 dimensionality equal to the input's, which the autoencoder requires for its reconstruction objective to be possible.
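The mirror property can be checked the same way. Again a hypothetical sketch, not the thesis code; it assumes 2x2 upsampling doubles the spatial dimensions and the transpose convolutions only change the channel count, and the latent dimensionality of 128 is purely illustrative since the latent size is a tunable hyperparameter.

```python
# Illustrative sketch: trace the decoder's shapes back to the input
# dimensionality. Assumes 2x2 upsampling doubles spatial size and transpose
# convolutions with 'same'-style padding only change the channel count.

def upsample2x2(shape):
    """2x2 upsampling: doubles both spatial dimensions, keeps channels."""
    h, w, c = shape
    return (h * 2, w * 2, c)

def tconv_same(shape, out_channels):
    """Transpose convolution: keeps spatial size, sets channels."""
    h, w, _ = shape
    return (h, w, out_channels)

latent_dim = 128                # illustrative only; a tunable hyperparameter
dense_out = 512 * 8 * 4         # dense layer maps latent vector to 16384 values
shape = (512, 8, 4)             # dense output reshaped to the encoder's last map
shape = upsample2x2(shape)      # -> (1024, 16, 4)
shape = tconv_same(shape, 8)    # -> (1024, 16, 8)
shape = upsample2x2(shape)      # -> (2048, 32, 8)
shape = tconv_same(shape, 1)    # -> (2048, 32, 1), matching the input
print(shape)
```

Each decoder stage thus undoes the spatial reduction of the corresponding encoder stage, ending at the 2048x32x1 input dimensionality required for reconstruction.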
\fig{setup_arch_lenet_encoder}{diagrams/arch_lenet_encoder}{UNFINISHED - Visualization of the original LeNet-inspired encoder architecture.}
\fig{setup_arch_lenet_decoder}{diagrams/arch_lenet_decoder}{UNFINISHED - Visualization of the original LeNet-inspired decoder architecture.}
\fig{setup_arch_ef_encoder}{diagrams/arch_ef_encoder}{UNFINISHED - Visualization of the efficient encoder architecture.}