Playing around again with the OpenAI API. Some notes on the limitations of its current vision capabilities.

I've been trying to get it to play some old Game Boy games I have on my machine through an emulator. It struggles to understand relative position in 2D space, but it does a great job of extracting text and of identifying which objects are present and what they look like.

The only way I could consistently get it to understand that, in a screenshot, coordinate (0, 0) is up and to the left of coordinate (1, 1) was to force it to repeat, as part of its response, that west subtracts from the X position and south adds to the Y position. Even then, it could never deduce the player's position (or the position of anything else) beyond hazy guesses at relative position. Sometimes the player would be to the left of a target and it would tell me the player was directly to the right, and vice versa.

This means I could not rely on its vision capabilities to understand the 2D game space. Instead, I had to construct that game space by other means and then hand the AI a model of it, so that it could work with raw numbers, with which it has had more success. Even so, it still tends to fail when reasoning about position within a 2D grid of numbers.

What it does do well is determine what is needed for overall strategic success. This makes it better suited to card games, strategy games, and turn-based games. So I stopped it from trying to work out how to get from point A to point B, and instead implemented a non-AI pathfinding algorithm that runs over a non-AI-constructed model of the screen. All the AI has to do is decide where to move; a proven, conventional pathfinding algorithm gets it there.

The AI is also capable of building a narrative, which is what language models are good at. Having a narrative gives the program a 'why' and a source for future action that does not require further human prompting.
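
To make the split between "AI picks the destination" and "ordinary code finds the route" concrete, here is a minimal sketch. It is not the actual project code. The algorithm is an assumption (breadth-first search, chosen only because it is the simplest thing that finds a shortest route on a tile grid), and the names `find_path` and `DIRECTIONS` and the convention that `#` marks a blocked tile are all made up for illustration. It uses the same screen-style coordinates described above: (0, 0) is the top-left tile, west subtracts from X, south adds to Y.

```python
from collections import deque

# Screen-style coordinates: (0, 0) is the top-left tile.
# West subtracts from x, south adds to y.
DIRECTIONS = {
    "north": (0, -1),
    "south": (0, 1),
    "west": (-1, 0),
    "east": (1, 0),
}


def find_path(grid, start, goal):
    """Breadth-first search over a tile grid.

    grid  -- list of equal-length strings; '#' is blocked (hypothetical
             convention), any other character is walkable
    start -- (x, y) tile the player is on
    goal  -- (x, y) tile chosen by the language model
    Returns a list of direction names, or None if the goal is unreachable.
    """
    width, height = len(grid[0]), len(grid)
    # Maps each visited tile to (previous tile, move taken to reach it).
    came_from = {start: None}
    queue = deque([start])

    while queue:
        current = queue.popleft()
        if current == goal:
            break
        x, y = current
        for name, (dx, dy) in DIRECTIONS.items():
            nx, ny = x + dx, y + dy
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and grid[ny][nx] != "#" and (nx, ny) not in came_from:
                came_from[(nx, ny)] = (current, name)
                queue.append((nx, ny))

    if goal not in came_from:
        return None

    # Walk backwards from the goal to the start, collecting moves.
    moves = []
    node = goal
    while came_from[node] is not None:
        node, name = came_from[node]
        moves.append(name)
    moves.reverse()
    return moves


# A tiny made-up map: the direct route east is walled off,
# so the only way to the goal is around the bottom.
example_grid = [
    "..#.",
    ".##.",
    "....",
]
example_path = find_path(example_grid, start=(0, 0), goal=(3, 0))
print(example_path)
# ['south', 'south', 'east', 'east', 'east', 'north', 'north']
```

With something like this in place, the model only needs to return a target tile (or the name of a target that the program maps to a tile), and the resulting list of moves can be turned into emulator button presses without the model ever reasoning about left versus right.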