Heretical Gaming is my blog about my gaming life, featuring small skirmishes and big battles from many historical periods (and some in the mythic past or the far future too). The focus is on battle reports using a wide variety of rules, with the occasional rules review, book review and odd musing about gaming and history. Most of the battles use 6mm-sized figures and vehicles, but occasionally 15mm and 28mm figures appear too.

Monday, 29 September 2025

Generative AI and Wargames' Scenario Generation

I was very interested to read Norm Smith's latest blog post on 'Battlefields and Warriors', 'The Impacts of AI', which looks at generative AI within gaming, specifically in terms of scenario generation. To make sense of my post, it will definitely be worth your while to read his first.
 
The central claim is that generative AI does not really create, but it basically just steals stuff and reformats it.  This felt testable, so I have had a go, but before I link to the results, I held a few things in mind:
 
If I were asked, even if I just asked myself, to create a Wars of the Roses scenario, to what degree would I fall back on things which I had read and then slightly repackage them? How would I come up with a creative solution which wasn't also a very unlikely scenario, i.e. the scenario had to be both creatively different and highly plausible? What, in terms of generating non-historical scenarios for historical miniatures games, would I consider to be sufficiently different to be not merely copying? In Norm's test, the differences between his own internet-published games and the AI-generated scenario seem to be minimal (he didn't link to his own scenario directly)... what would happen when I tried it? In short, I would try to apply the same standard of judgement to AI behaviour and performance as I would to human behaviour and performance.
 
The results are here. Essentially, I asked the LLM (ChatGPT5 - Thinking mode) to create three scenarios. I have included the prompts I used, but to summarize:
 
Scenario One - a Wars of the Roses scenario for about 5000 combatants per side, each with a roughly even chance of winning.
 
Scenario Two - a scenario appropriate for Norm's own Piggy Longton campaign.
 
Scenario Three - a Wars of the Roses scenario, no further guidance. 
 
And then I posted them up as a page on this blog, with no changes and some very light formatting.
 
I am no expert on the Wars of the Roses. I would say I have more than a passing acquaintance, but no expertise at all, so I don't know if these scenarios are minimally repackaged historical battles, although no battles immediately spring to mind. I also don't know if they are repackaged human-written and published scenarios; again, nothing immediately sprang to mind, but my knowledge is even more limited here. So, if anyone can see through a thinly disguised historical or imaginative scenario, please let me know! The challenge, to be clear, is not to find points of contact; it is to find something which has basically been copied and very minimally changed, i.e. if a human had written it, you would have considered that the human had copied it.
 
Disclaimer: I don't use AI to generate the scenarios I play personally.

14 comments:

  1. Well, this is interesting. My Battle of Solden Hill scenario and battle reports saw an unusual increase in the number of hits...

    Replies
    1. That is interesting. I would not have expected it to look up anything and instead just generate from its training data; I wonder if it did, for some reason. None of the scenarios seem that similar to it at first blush, except perhaps in the structure.

  2. I had a read through the scenarios and they look pretty good to me! A lot more detail than any scenarios I have ever come up with I think!

    Replies
    1. Yes, generative AI has a lot to say for itself, if nothing else...

  3. Hi John, very interesting. Thanks for reference back to my post.

    If I may, I will limit my comments to your 2nd Scenario as I am very familiar with the subject - ‘making a scenario compatible with the piggy longton setting’. I should say, for those who don’t know, that Piggy Longton is my fictitious Imaginations ‘Parish’ and, since the story line is unique to my blog, one would expect that for AI to do the task, it would have to visit my blog. When I asked it for a Piggy Longton scenario, it pretty much just lifted one of my scenarios.

    However, the BIG difference here is that the AI is told not to copy and so, on the face of it, it didn’t copy in any way that I could detect. If I just found the scenario on the internet, I would not readily see it as being Piggy Longton related at all.

    If I HAD to look for similarities, I would say there are some elements of my ‘Save the Treasury’ scenario (a raid against the treasury, full of the annual tax collection and the need to take several turns to fill the wagons) and also the Attack on Beacon Farm scenario, which involves raiding to steal winter feed, plus it involves a ‘fire’ rule.

    It mentions St. Guthlac Chapel and I have an Osric’s Chapel, but again, that is a bit tenuous - so I would say overall, the AI has worked to the brief of not copying, certainly in any sort of overt way and not really anything I could detect if I came across the scenario in the wild!

    Overall, the scenario gave the type of romp in the kind of place that accords with what I do with Piggy Longton. I thought the scenario instructions made sense as you read them, but by the end of it all, putting that on a table would confuse me, certainly without a map for context.

    Replies
    1. Thanks Norm. I don't imagine the AI would have to visit your blog to generate a Piggy Longton scenario - given the uniqueness of Piggy Longton, it would probably do it from training data. The AI did claim this, although that of course does not make it certain (but I also didn't notice any live searches when it was doing so). I think the "Anglo-Saxon sounding chapel" probably is the biggest tell there, although it wouldn't, to my mind, come close to what is normally meant by copying.

  4. I did note on reading all three sets that some words stuck out: ‘FICKLE’, ‘FRICTION CARDS’, ‘TEMPO’ and the term ‘BLOCKS’ for units. I would imagine that someone might recognise them as being peculiar to a narrow range of rules. I know Tempo is a Baccus thing.

    To persistently use the term blocks, when we might expect to see ‘units’, might suggest a narrow source for the scenario creation with regard to that one point, i.e. the A.I. has found blocks in a scenario format and is happy with it.

    Replies
    1. Perhaps yes, blocks struck me as a little peculiar for the period too. Tempo is a Baccus thing, although there have never been any published Wars of the Roses or medieval rules under the Polemos banner, so that would be strange if it were the inspiration...blocks makes me think of Commands and Colors, but I have never played the Medieval version, or Battlelore, so don't know how similar the concepts are...I guess 'blocks' are sometimes used to describe WotR troops, so not entirely surprising the AI has found that in its weights for WotR...I wonder if it would do the same for a Napoleonic scenario? I may try it.

  5. That is an interesting experiment. I've seen LLMs generate some pretty decent scenarios, and one chap even posted a very convincing text based battle he'd 'fought' against ChatGPT. Tbh I'm generally fairly dismissive of the "intelligence" aspect of Generative AI: there isn't any intelligence as we would understand it in humans. However, like AI encryption, the way they generate output has moved beyond our ability to understand how the algorithms work as they have evolved. They really have learned how to do stuff we can't do, and I think trying to figure out how they work via black box experiments is going to be quite challenging. Frankly, the whole thing is terrifying.

    Replies
    1. Yes, there are issues of fundamental process unknowability here. I do a bit more on this for w*rk, but in this context, testing expectations versus output over several iterations is about as good as it gets. What has surprised everyone, I guess, is the degree to which next word prediction effectively mimics the outputs of (lots of) human thought, although TBF it was equally unknowable at a fundamental level how human intelligence was working, and it may well have more in common with weighted next word prediction than we realized (or at least, cared to admit). And yes - a bit terrifying!

  6. An interesting use of AI and I did watch the LWTV episode on it at the weekend. A lot of it I thought was easily covered by rules such as 'Rebels & Patriots' by Dan Mersey and Michael Leck, in terms of character generation etc. But then I'm familiar with these rules and not everyone is of course. The scenarios had some interesting ideas, but ones that with some time and thought could easily be created.

    So I see this as possibly useful for those new to a game or period, or for those pressed for time. The maps and character pictures were good and would improve the more you used it, as AI is dumb and 'learns' the more it is used for any topic etc.

    Will I use it? No, it's not for me. Oh and there's the elephant in the room issue of the amount of energy used for each 'search', which is 20X that of a Google one!

    Replies
    1. Thanks Steve. I am instinctively against it: for a hobby, the journey is the destination, and scenario-writing (unlike say painting) isn't so time-intensive that getting a generically useful scenario done is a big saving.
      I don't think I care too much about the energy costs of individual searches versus LLM chats; it is all pretty low (it is a large multiple of a very tiny amount): https://www.sustainabilitybynumbers.com/p/carbon-footprint-chatgpt But the overall energy costs (through training runs and data centre usage) are pretty big, and I am not entirely convinced they are worth it on a society/civilizational scale...

    2. On a Radio 4 programme on AI a month or so ago, they commented upon the fact that one data centre currently being built will use the same amount of electricity as Miami! So yes individual amounts are tiny, but scaled up and it's a bit of a shocker.

    3. Yes, exactly so. Individual use doesn't really matter, it is your society having it which makes the difference. Same with cryptocurrencies.
