LLMs are getting better at character-level manipulation

Commentary

It's evident from the test that newer and larger models are better at generalizing Base64 encoding and decoding. That suggests they will also get better at character-level manipulation and analysis.
Sadly, the "how many r’s in strawberry" problem will be solvable by LLMs.
Thinking is out of the equation here; the crux is tokenisation. The better the sense a model has of how a word is put together, the better it understands that word. But the balance between less and more context is critical, and I think it is still being fine-tuned to find the sweet spot.
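
To make the tokenisation point concrete, here is a minimal Python sketch. The token split and the token IDs are made up for illustration and are not taken from any real tokenizer; an actual model may split "strawberry" differently. It shows that counting letters is trivial at the character level, that a token-level view hides the spelling behind opaque IDs, and that Base64 is likewise a character-level transformation (every 3 bytes become 4 characters).

```python
import base64

word = "strawberry"

# Character-level view: every letter is visible, so counting is trivial.
r_count = word.count("r")

# Hypothetical token split, for illustration only.
# A real tokenizer may produce different pieces.
toy_tokens = ["str", "aw", "berry"]
assert "".join(toy_tokens) == word

# The model works with token IDs, not letters. These IDs are invented.
toy_vocab = {"str": 101, "aw": 102, "berry": 103}
token_ids = [toy_vocab[t] for t in toy_tokens]

# To count letters from tokens, the spelling of each token must be known.
r_per_token = [t.count("r") for t in toy_tokens]

# Base64 maps each group of 3 bytes to 4 characters, so encoding and
# decoding also depend on exact character positions.
encoded = base64.b64encode(word.encode("ascii")).decode("ascii")
decoded = base64.b64decode(encoded).decode("ascii")

print("r count:", r_count)
print("token ids:", token_ids)
print("r per token:", r_per_token)
print("base64:", encoded, "->", decoded)
```

The per-token counts only add up to the right answer if the spelling behind each ID is known, which is roughly what a model has to learn implicitly.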