Several things can contribute to what you are seeing. First, the optimization itself is poor: how you train a model has a direct effect on how it performs. The choice of optimizer, learning rate, learning rate decay schedule, and regularization are just a few of the factors. Second, your network is very simple and poorly designed. It does not have enough convolutional layers to exploit the structure of the images and build the abstractions needed for the task. In short, your model is not deep enough.
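To make the learning rate decay point concrete, here is a minimal sketch of one common schedule, step decay. The function name and the values used (initial rate 0.1, halving every 10 epochs) are illustrative assumptions, not settings taken from your code.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

# Print the learning rate at a few (hypothetical) epochs.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(0.1, epoch))
```

Most frameworks ship an equivalent scheduler, so in practice you would plug the built-in one into your training loop rather than hand-rolling this.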
MNIST by itself is a very easy task. Even a linear classifier can reach roughly the accuracy you got, maybe even better, which shows that you are not really exploiting what a CNN or a deep architecture can do. Even one or two fully connected layers should give you better accuracy if properly trained.
Try making your network deeper: use more convolutional layers, each followed by BatchNormalization and then a ReLU, and avoid downsampling the input feature maps too quickly. When you downsample you lose information, so you usually want to increase the number of filters in the next layer to compensate for the reduced representational capacity. In other words, decrease the feature maps' spatial dimensions gradually and increase the number of filters as you do so.
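The "downsample gradually, widen gradually" advice can be checked on paper before building anything. The sketch below traces feature map shapes through a hypothetical layer plan for a 28x28x1 MNIST input. The plan, the filter counts, and the helper name are all assumptions for illustration; each "conv" stands for a stride-1, same-padded convolution followed by BatchNormalization and ReLU, neither of which changes the shape.

```python
def trace_shapes(input_hw, plan, in_channels=1):
    """Return (height, width, channels) after each stage of the plan.

    ("conv", filters): stride-1, 'same'-padded convolution, so the spatial
                       size is unchanged and channels become `filters`.
    ("pool", factor):  non-overlapping pooling, spatial size divided by
                       `factor` (floor division).
    """
    h, w = input_hw
    c = in_channels
    shapes = []
    for kind, value in plan:
        if kind == "conv":
            c = value
        elif kind == "pool":
            h, w = h // value, w // value
        else:
            raise ValueError(f"unknown stage kind: {kind!r}")
        shapes.append((h, w, c))
    return shapes

# Hypothetical plan: the filter count doubles after each downsampling step.
PLAN = [
    ("conv", 32), ("conv", 32), ("pool", 2),
    ("conv", 64), ("conv", 64), ("pool", 2),
    ("conv", 128),
]

for stage, shape in zip(PLAN, trace_shapes((28, 28), PLAN)):
    print(stage, "->", shape)
```

With this plan the spatial size only drops from 28 to 14 to 7, and every drop is followed by a wider layer.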
A huge number of filters at the beginning is wasteful for your specific use case; 32 or 64 can be more than enough. As the network gets deeper, more abstract features are built on top of the more primitive ones found in the early layers, so it is usually more reasonable to put the larger filter counts in the later layers.
Early layers are responsible for learning primitive filters, and beyond some point more filters won't improve performance; they just duplicate work already done by another filter.
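To get a feel for how costly wide early layers are, you can count the parameters directly. This is a rough sketch assuming 3x3 kernels and a single-channel input; the 512-filter configuration is a made-up example of "huge", not a number taken from your model.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2D convolution."""
    return kernel * kernel * in_channels * filters + filters

def stack_params(in_channels, filter_counts, kernel=3):
    """Total parameters of consecutive conv layers with the given filter counts."""
    total = 0
    for filters in filter_counts:
        total += conv_params(kernel, in_channels, filters)
        in_channels = filters  # the next layer sees this layer's output channels
    return total

narrow = stack_params(1, [32, 32])    # two early layers with 32 filters each
wide = stack_params(1, [512, 512])    # the same two layers with 512 filters each
print(narrow, wide)
```

The second layer dominates because its cost grows with the product of input and output channels, which is why very wide layers right at the start get expensive so quickly.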
The reason you see a difference in accuracy is simply that training ended up in a different local minimum! With the exact same configuration, if you train 100 times you will get 100 different results, some better and some worse, but never exactly the same value unless you force deterministic behavior by fixing the random seed and running in CPU mode only.
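The effect of the seed is easy to see in isolation. The sketch below uses only Python's standard `random` module to draw a few "initial weights"; the helper name and the Gaussian scale are assumptions for illustration. In a real training run you would also have to seed your framework's own random generators, which is not shown here.

```python
import random

def init_weights(n, seed=None):
    """Draw n small Gaussian 'weights'; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # seed=None means a fresh, unpredictable state
    return [rng.gauss(0.0, 0.05) for _ in range(n)]

a = init_weights(5, seed=42)
b = init_weights(5, seed=42)
c = init_weights(5, seed=7)

print(a == b)  # same seed, same starting point
print(a == c)  # different seed, different starting point
```

A different starting point means a different path through the loss surface, which is why unseeded runs end up with slightly different accuracies.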