I asked it a bit more about this, and it told me that it describes the image and passes that description to what is basically a third-party image generator with its own hardwired rules, and that generator is what actually outputs the image. So maybe, from Grok's point of view, all it did was describe the image to the generator. Idk, it's bizarre.