I read Sutton's book "Reinforcement Learning" to better understand the historical and mathematical background, and it turns out that all of reinforcement learning, at least in its basic parts, is a collection of procedures for solving Markov Decision problems. Specifically, Dynamic Programming based on the Bellman equation is a good way to solve learning problems where the model is known as a whole. If you don't know the model as a whole, or only want to solve part of the problem, then Markov Chain Monte Carlo methods help you pull in Bayesian learning to deal with that. Eventually, all you do is simulate or process data and analyse the corresponding Markov Chains.
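To make the Dynamic Programming part concrete, here is a minimal value-iteration sketch on a toy Grid World. The grid size, the -1 step reward, the discount factor, and the single terminal corner are my own assumptions for illustration, not details taken from the book.

```python
# Minimal Grid World value iteration (illustrative sketch; grid layout,
# rewards, and discount are assumptions). States are cells of a 4x4 grid;
# moves are deterministic, each step costs -1, and (3, 3) is terminal.

N = 4                      # grid side length (assumed)
GAMMA = 1.0                # undiscounted episodic task (assumed)
TERMINAL = (N - 1, N - 1)  # single terminal state (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic transition: moves that would leave the grid do nothing."""
    r, c = state
    dr, dc = action
    nr, nc = r + dr, c + dc
    if 0 <= nr < N and 0 <= nc < N:
        return (nr, nc)
    return state

def value_iteration(theta=1e-6):
    """Sweep the Bellman optimality backup until values stop changing."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s == TERMINAL:
                continue
            # Bellman optimality backup: V(s) <- max_a [r + gamma * V(s')]
            best = max(-1.0 + GAMMA * V[step(s, a)] for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

V = value_iteration()
# With -1 per step, V converges to minus the shortest-path length
# from each cell to the terminal corner.
print(V[(0, 0)])  # -6.0
```

The greedy policy with respect to the converged values then walks the shortest path to the goal, which is exactly what solving the Bellman equation buys you when the model is fully known.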
So as a whole, I was a bit disappointed, because the idea of Markov Chains as such is not too challenging (the mathematics of it is, though). Markov Chains and Monte Carlo are very intuitive concepts, and solving them also coincides with the goals people try to achieve in reinforcement learning.
Anyway, I had a lot of fun exploring all the different ways to solve Grid World problems, and I might dive deeper into the topic with an application closer to the real world, hopefully soon.