Here we examine the performance of different activation functions on the Fashion-MNIST classification task, across several model depths and architectures.
	
		
fc(N) = nn.Sequential(
	flatten(),
	linear(28*28, ... ), activation(), batchnorm(), dropout(),
	linear( ... , ... ), activation(), batchnorm(), dropout(),
	linear( ... , 10  ),  # output layer: no activation/batchnorm/dropout before softmax
	softmax(),
)
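A minimal runnable sketch of the fc(N) template, assuming N is the hidden-layer width (the fc(4)/fc(16) below); the layer sizes elided with "..." in the sketch, the default ReLU activation, and the argument names here are assumptions:

```python
import torch
from torch import nn

def fc(n_hidden, p_drop=0.0, activation=nn.ReLU):
    """Fully connected Fashion-MNIST classifier.

    Assumption: n_hidden fills the "..." hidden sizes in the sketch;
    the output layer gets no activation/batchnorm/dropout before softmax.
    """
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, n_hidden), activation(), nn.BatchNorm1d(n_hidden), nn.Dropout(p_drop),
        nn.Linear(n_hidden, n_hidden), activation(), nn.BatchNorm1d(n_hidden), nn.Dropout(p_drop),
        nn.Linear(n_hidden, 10),  # logits for the 10 Fashion-MNIST classes
        nn.Softmax(dim=1),
    )

model = fc(16)
out = model(torch.randn(8, 1, 28, 28))  # a batch of 8 grayscale 28x28 images
```

Note that in practice `softmax()` is often dropped and the logits fed directly to a cross-entropy loss, which applies log-softmax internally.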
	
		For the following analysis, we only run experiments on fc(4) and fc(16), and keep dropout = 0.
	
	
cn(N, m_size, M) = nn.Sequential(
	# feature extraction part: M convolution blocks
	conv2d(  1  , ..., kernel=3), activation(), maxpool2d(m_size),
	conv2d( ... , ..., kernel=3), activation(), maxpool2d(m_size),
	conv2d( ... , 128, kernel=3), activation(), maxpool2d(m_size),
	# classification part: N linear layers
	flatten(),
	linear( ... , ... ), activation(),
	linear( ... , ... ), activation(),
	linear( ... , 10  ),  # output layer: no activation before softmax
	softmax(),
)
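A runnable sketch of the cn(N, m_size, M) template for the M = 3, N = 3 case shown above, with m_size = 2. The intermediate channel counts (32, 64) and the hidden width are assumptions; only the final 128 channels appear in the sketch:

```python
import torch
from torch import nn

def cn(activation=nn.ReLU, m_size=2, hidden=64):
    """Convolutional Fashion-MNIST classifier (M = 3 convs, N = 3 linears).

    With kernel=3 (no padding) and m_size=2, the spatial size shrinks
    28 -> 26 -> 13 -> 11 -> 5 -> 3 -> 1, so the flattened feature vector
    is 128 * 1 * 1 = 128 (the hardcoded 128 below assumes m_size=2).
    """
    return nn.Sequential(
        # feature extraction part
        nn.Conv2d(1, 32, kernel_size=3), activation(), nn.MaxPool2d(m_size),
        nn.Conv2d(32, 64, kernel_size=3), activation(), nn.MaxPool2d(m_size),
        nn.Conv2d(64, 128, kernel_size=3), activation(), nn.MaxPool2d(m_size),
        # classification part
        nn.Flatten(),
        nn.Linear(128, hidden), activation(),
        nn.Linear(hidden, hidden), activation(),
        nn.Linear(hidden, 10),  # logits; no activation before softmax
        nn.Softmax(dim=1),
    )

model = cn()
out = model(torch.randn(4, 1, 28, 28))
```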
	
	
		The experiments vary three things: the model (fc or cn), the activation function, and the dropout rate.
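The three variables above (model, activation, dropout) define an experiment grid. A minimal sketch of enumerating it; the specific values on each axis are hypothetical, since the document only names the axes:

```python
from itertools import product

# Hypothetical choices for each axis; the document names only the axes.
models = ["fc(4)", "fc(16)", "cn(3, 2, 3)"]
activations = ["relu", "tanh", "sigmoid"]
dropouts = [0.0, 0.25, 0.5]

# One (model, activation, dropout) tuple per training run.
experiments = list(product(models, activations, dropouts))
print(len(experiments))  # 3 * 3 * 3 = 27 runs
```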