Gamifying Toxic Yuri: An Offsuit Pair Post-Mortem
Our design goals: The Big Picture
- VNs and [mini]games
The game was going to be rigged from the start.
Implementing a rigged game
- Developing the **** Engine
- Integrating **** Engine configuration with story elements
A minor aside on writing the game’s AI
Conclusion
Footnotes

Gamifying Toxic Yuri: An Offsuit Pair Post-Mortem

5176 words, 25 minute readtime. last modified 2025/08/15

I think I’d like to start this off with saying that I am very, very happy with how Offsuit Pair came out. I knew that it was a fairly ambitious project to complete in 1.5 months (and with a team of two people, no less), but the Toxic Yuri VN jam wasn’t my first rodeo and I knew that I had a strong grasp of project management… even if the scale of what I wanted to achieve was a lot larger than any other VN I’d worked on before.

Around a month before the jam started, Mozart and I hashed out a high-level outline of what we wanted Offsuit Pair to be, complete with a handful of stretch goals that we both thought would be really cool (but ultimately not necessary) to include in the VN. (In these outlining sessions, I also said that I didn’t want to program a cardgame where I would have to implement a proper AI. Hah. Funny how that goes.) While it took a bit of crunch to get the VN done by the deadline, we did succeed in implementing every stretch goal that we proposed, and I think our work was a lot stronger for it– something that I hope anyone who played all the way through to the 5th ending (the final stretch goal) can attest to.

That said, the one substantial bug that found its way into our initial release made it there because we were implementing so much of the VN script at the last second. Lesson learned: do your bugtesting runthroughs in the player build, not the livereload dev copy.

I still have a lot to learn, but I think the skill to correctly gauge scope is the seasoning that comes with experience.

My role in the creation of this game was primarily as programmer and game designer, so that is what I will focus on in this writeup.

Our design goals: The Big Picture

If toxic yuri is about manipulative and fraught relationships between women, we wanted to create a manipulative and fraught relationship between our player and our game. There are many ways to go about doing this in a visual novel, such as utilizing literature devices like the unreliable narrator, but we wanted to make an experience unique to the interactive form of videogames. I recently read all of FKMT’s manga, so in particular I was drawn to making something in the same vein as his work– a digital cardgame that drew the rules of the game itself into question.

We decided that we wanted to elicit strong feelings from the player by putting them in the player character’s shoes and challenging them through the mechanics of the card game. In any kind of game, be it a videogame, boardgame, or cardgame, there is a (spoken or unspoken) promise to follow rules so that the game can be fair and balanced. How do you react when these rules aren’t followed?

We started from this core premise for Offsuit Pair and worked backwards to design a game, characters, and framing narrative to support it.

One of our first concerns was what the game itself was going to be. We ended up defining a list of criteria that we wanted from our prospective game:

Played between two players, to make the experience more “intimate”.
Something with a moderate amount of depth, so that a minimum level of thought would be required from a reasonably invested player (contrast something like say, War, which is effectively a game of pure chance).
Involves some level of hidden information, both for game mechanic reasons and to help facilitate layers of deception between the two players.

Even with these requirements, this left us with a lot of options. Our next approach was to switch gears and ask what kind of game would support the themes of the story that we wanted to tell? Even with a lot of the narrative up in the air at this point, there was this hazy idea about two masc-presenting women in the 1920s, which got us thinking about what kind of behavior is expected of women in such a position.

This ultimately led us to make the game about poker, a game where player strategy revolves less around card efficiency and statistics and more around reading your opponent and discerning the value of their hand. In essence, a kind of communication between players that happens without speaking. What a good fit! We wanted a game about manipulation, and this one has lying baked right into the mechanics!¹

Poker is, somewhat famously, the game about reading your opponent’s behavior.

Looking back at the final script, this decision to make the game about poker worked out in more than one way: Communication (or lack thereof) ended up being one of the major conflicts; Over the course of Offsuit Pair, conversations between the two leads overwhelmingly occurred during poker games.

These women are fundamentally only interested in communicating with each other if it’s a part of a larger game.

Verbal communication in Offsuit Pair usually happens while playing poker. For comparison, 12.6k/18.7k of the game’s total script occurs outside of game segments.

The poker game-inside-a-game isn’t just a time-killer or minigame: it is central to delivering the themes of the visual novel as a whole.

VNs and [mini]games

Now, this segues into one of the first design questions I had in the process of creating Offsuit Pair, something that I would not have a satisfactory answer for until nearly the very end of the dev process.

Unlike a conventional visual novel, binding story so tightly to gameplay runs the risk of gatekeeping later parts of the game from players who don’t have the skillset needed to advance it. How do you accommodate two extremes of players: the visual novel reader who has no experience with or interest in cardgames, and the player who has played poker extensively and is willing and able to engage in-depth with game mechanics?

The first category of player can be broken down further into two subgroups: the minigame hater and the cardgame intrigued.

Put simply, there is no appeasing the minigame hater: they do not care if the game is a core part of the novel itself, they categorically dislike it. Offsuit Pair is not going to be the visual novel for these players, and that’s fine.

The second subgroup, however, is worth taking into consideration when designing game scenarios. While the cardgame-intrigued may not have the technical background needed to understand subtler game mechanics, they are still willing to engage with the game in good faith, as long as they are not difficulty-gated out of participation nor “held hostage” away from the story for too long.

Now that we have clearly defined the novice group that we want to cater to, let’s break down how their typical playstyle might differ from an experienced player.

The novice is not fully aware of anything in the game that’s not in front of them. They have a decent grasp of the potential value of their hand, but they likely don’t understand just how ‘common’ or ‘rare’ it is compared to other hands (the reference chart we provided was meant to address both of these facts). They are likely to take a more “active” approach in each round because they want to feel like they are ~~pressing buttons~~ doing something productive to win² (this is also part of why I initially proposed the “swap” mechanic– more buttons good). The combination of these facts means that they are likely to take significant risks, something that might occasionally pay off but on average will lead to significant losses.
The expert player, in comparison, is much more patient. Watch any streamed poker tournament and you’ll see that very few hands make it to the betting round. It doesn’t make sense for any player (who isn’t forced to bet the blind) to put money on a hand unless their cards are above-average– which means at the maximum this player would only go in on half of all hands (in reality, expert players tend to be even more picky than that). If an experienced player starts making bets on a hand, then they think that hand is going to win, not just “oh, maybe this hand has some gas…”. They fold early and they fold often, on average seeing many more hands than the novice player. The flipside of this, of course, is that they also have a good grasp on the odds of their opponent having a better hand than theirs, and they know when to capitalize on it.

By contrasting both of these behaviors, I found the solution to this problem: balance the betting such that the initial buy-in is very small³, but has the ability to explode and wipe out around half of a player’s funds if raised to the maximum amount. As a result, the novice player will likely lose in 2-5 hands, whereas the expert player will see dozens of hands. Combining this with a best-of-three structure means that the novice is guaranteed to see all scripted story content (concentrated in the first three hands of the first 2 of 3 games), while the expert player will get a much stronger sense of just how exactly the game has been stacked.

The remainder of this post will get into significant mechanic and plot spoilers.

The game was going to be rigged from the start.

It is time to address the aggressively ignored elephant in the room: The poker games in Offsuit Pair are not fair. In most cases, they are aggressively tilted against the player (which is, naturally, what makes the game toxic yuri).

When first discussing this game with Mozart, we very quickly settled into a paradigm of “soft”/undetectable cheating and “hard”/obvious cheating. “Soft” cheating should make the game feel off and slightly unfair, but not in any way that the player can confidently identify. “Hard” cheating, in comparison, is something that is immediately identifiable as breaking the standard rules of the game, even to a novice player. (At this point there weren’t any implementation specifics, because at the time we weren’t quite sure what the game itself was going to be.)

We also knew that we wanted everything that occurs in the game to be consistent with the motives and emotions of the characters involved. This very quickly shaped our direction for Audrey, because this meant that she had to be finely tweaking the results of the game by working around the rules, but not so that she would win the game. Audrey wants to play a game, but that game was never the card game she presented it as in the first place.

As for our other player, she would have to be someone who had a complicated relationship with playing around the rules like this, because her actions would hinge on what choices the player made in-game. Working backwards from this pile of seemingly-contradictory actions is how we got Betty, someone who could be highly retaliatory and who has her own penchant for cheating.

As to how the games themselves were structured, we leaned into soft cheating a lot in early rounds in order to make the game feel a little skewed without giving the player any concrete evidence that something is wrong.⁴ We thought that, maybe, if the players were pushed enough into feeling that the game was unfair, they would feel more justified in ‘striking back’ later.

This tension in the player would all come to a head at the first instance of “hard” cheating, which was implemented as the singular hard-scripted hand in the game. This moment is the climax of the story, where any ambiguity of the earlier games is resolved and the player must make a decision that will lock in their ending.

The [kinda] titular hand, as played by pltn. Clip used with permission.

We hoped that, for some players, this would create a feeling of “wait, was Audrey really fucking with me the whole time?”

Implementing a rigged game

While we started out with this top-down design outline for the game, it was several weeks into the jam before I sat down and figured out how to turn these abstract ideas into code. In particular, it took a bit to really settle on an acceptable model for “soft cheating”.

I had a breakthrough when I realized something. Because the poker game that is being rigged is an emulation of a physical one, we were in a somewhat unique position as game designers:

The player expects the game to function in the same way that an analog poker game would; Any form of “cheating” should match up to actual irl paradigms, such as stacking the deck or marking cards.
Because the game is digital, it is not bound by the same rules as one played with cardstock. We can fudge the game by gently placing our finger on the scale of expected outcomes, as opposed to the brute-force methods that our players might expect.

Ruminating on this led to the design of the core of our digital cheating paradigm: the Luck Engine.

Developing the **** Engine

What is luck, really? There are pragmatic definitions like “when favored but improbable events happen”, or psychology-based terms rooted in how people perceive probability, but if you ask me these make a lot more sense to apply to a game than to a narrative. In a narrative, the fixed flow of events from introduction to conclusion means that there are no actual chance elements involved. In a story, “luck” is the alteration of non-fixed events as needed by the author to drive narrative tension.

The Washizu arc of Akagi asks whether skill can overcome luck by pitting the incredibly talented Akagi against the unfathomably lucky Washizu in a life-or-death game of mahjong. The game lasts the whole night (and took 20 real-world years to finish publication) and the tension between these two forces is tight the entire time.
There is a lot I could say about the role of luck in Akagi, but that’s another essay for another day.

Working from this concept, it made sense to consider “luck” to be a hidden stat for both players in the game, something that swings the cards into or away from their favor. This value is altered as needed to drive the intended narrative feel for that part of the game.

As for the mechanic details of how this works, we can break down what the Luck Engine does into a succinct list:

The top card of the deck is decided upon right before it is dealt.
Both players of the game have a hidden “luck” value that will affect what this card is.
A positive luck value means that player is lucky, a negative value means that player is unlucky, and a luck value of zero means that player will receive undoctored draws.
When dealing both players’ hole cards:
- A lucky player is more likely to get high-rank cards and more likely to have the suit of their two cards match.
- An unlucky player is more likely to draw low-rank cards and have the suits of their two cards not match.
When dealing the flop (and burned cards):
- Instead of looking at a single player’s luck value, the difference between the two luck values is taken.
- The favored player is more likely to have flop cards that match the suit of their hole cards, and that are one rank below, equal to, or one rank above their hole cards. Cards that do not favor them are more likely to be burnt.
- The disfavored player is more likely to have burn cards that match the suit of their hole cards, and that are one rank below, equal to, or one rank above their hole cards. Cards that do favor them are less likely to be dealt to the flop.
All luck effects are multiplicative.

With this outline in place, the next step was to make a minimal implementation and start testing. I very quickly settled on a method where each card was given a weight value, which was then passed to the draw function to make a (weighted) random choice about what card to draw, so I wouldn’t have to worry about making sure all my probabilities added to 1 or anything.⁵
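As a rough illustration of the weighted-draw idea, here is a minimal Python sketch. The function names, the card representation, and especially the `1.05` base and the scaling inside `hole_card_weight` are placeholders invented for this example; they are not the tuned values from the game. The only properties it tries to mirror are the ones listed above: a luck of zero gives every card the same weight, positive luck leans towards high ranks and matching suits, and the effects multiply.

```python
import random

RANKS = list(range(2, 15))                      # 2..14, where 14 is the ace
SUITS = ["clubs", "diamonds", "hearts", "spades"]


def hole_card_weight(card, luck, first_card=None):
    """Unnormalized weight for dealing `card` as a hole card.

    The constants here are illustrative placeholders, not the real formulas.
    """
    rank, suit = card
    # Rank effect: lucky players lean towards high ranks, unlucky towards low.
    rank_factor = 1.05 ** (luck * (rank - 8) / 6)
    # Suit effect applies to the second hole card only.
    suit_factor = 1.0
    if first_card is not None:
        same_suit = suit == first_card[1]
        suit_factor = 1.05 ** (luck if same_suit else -luck)
    return rank_factor * suit_factor            # effects are multiplicative


def draw(deck, weight_fn, rng=random):
    """Pick one card by weighted random choice and remove it from the deck."""
    weights = [weight_fn(card) for card in deck]
    # random.choices accepts relative weights; they need not sum to 1.
    card = rng.choices(deck, weights=weights, k=1)[0]
    deck.remove(card)
    return card


def deal_hole_cards(deck, luck, rng=random):
    """Deal two hole cards, deciding each card right before it is dealt."""
    first = draw(deck, lambda c: hole_card_weight(c, luck), rng)
    second = draw(deck, lambda c: hole_card_weight(c, luck, first), rng)
    return [first, second]
```

Because `random.choices` works with relative weights, each new luck effect can simply be multiplied in without renormalizing anything.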

I then went through and tuned each luck-based cofactor until the results seemed about “right” for the scale of luck values that I intended to use (which was from -20 to 20).

Legend is player luck value. The first coefficients that were decided upon were for the hole cards, since anything that has to do with the flop will have to build off of this.

For the more technically inclined, the exact formulas used here are as follows.

Finding weight w for a card of rank r and, for the second card, suits s1 and s2 for a given player luck l.

Next was going to be the flop cards, which required a more involved testsuite. The way that a lucky player was more likely to draw into cards that were one value below, equal to, or one value above their hole cards certainly made them much more likely to get a higher-value hand like a straight or a full house, but it was important to tune these values so that one type of hand didn’t unreasonably dominate the others. (And indeed, the preliminary set of values that I chose made lucky players have flushes far more often than not.)

I opted to implement a Monte Carlo simulation that would deal the entire game and evaluate the winner, and then repeat this 1000 times for every test condition. The results of these trials are plotted below. (Aside: The fact that I didn’t have to simulate any kind of player strategy here was definitely a nice perk of choosing the game of poker.)
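A test harness along these lines might look like the following Python sketch. Everything in it is an assumption made for illustration: the hand evaluator, the `fair_deal` function, and the five-card board (standard Texas Hold’em) are stand-ins, and a luck-weighted dealing function would be swapped in for `fair_deal` to test a given configuration.

```python
import random
from collections import Counter
from itertools import combinations


def score_five(cards):
    """Return a tuple that compares higher for stronger five-card hands."""
    counts = Counter(rank for rank, _ in cards)
    # Put the most-repeated ranks first so pairs and trips break ties first.
    ordered = sorted(counts, key=lambda r: (counts[r], r), reverse=True)
    shape = sorted(counts.values(), reverse=True)
    flush = len({suit for _, suit in cards}) == 1
    unique = sorted(counts, reverse=True)
    straight = len(unique) == 5 and unique[0] - unique[4] == 4
    if unique == [14, 5, 4, 3, 2]:               # ace-low straight
        straight, ordered = True, [5, 4, 3, 2, 1]
    if straight and flush:
        category = 8
    elif shape == [4, 1]:
        category = 7
    elif shape == [3, 2]:
        category = 6
    elif flush:
        category = 5
    elif straight:
        category = 4
    elif shape == [3, 1, 1]:
        category = 3
    elif shape == [2, 2, 1]:
        category = 2
    elif shape == [2, 1, 1, 1]:
        category = 1
    else:
        category = 0
    return (category, tuple(ordered))


def best_hand(cards):
    """Best five-card score among all combinations of the given cards."""
    return max(score_five(combo) for combo in combinations(cards, 5))


def fair_deal(rng):
    """Undoctored deal: two hole cards each plus five board cards."""
    deck = [(rank, suit) for rank in range(2, 15) for suit in "cdhs"]
    rng.shuffle(deck)
    return deck[0:2], deck[2:4], deck[4:9]


def simulate(deal, trials=1000, seed=0):
    """Deal complete games and return the share won by each side."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        hole_a, hole_b, board = deal(rng)
        a, b = best_hand(hole_a + board), best_hand(hole_b + board)
        tally["a" if a > b else "b" if b > a else "tie"] += 1
    return {key: tally[key] / trials for key in ("a", "b", "tie")}
```

Calling `simulate(fair_deal)` gives a baseline to compare against; running the same loop once per luck pairing yields the kind of win-rate table that can then be plotted.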

The final Luck Engine configuration causes a player with a luck score 10 points lower than their opponent’s to win in around one out of ten hands, which is a satisfying way for the numbers to scale.

Note how a very low luck player is more likely to draw into pairs than in a completely neutral game: this is because their opponent is being favored– the unlucky player’s hand is based on a pair of hole cards that are giving their opponent a three-of-a-kind or a full house.

And, once again, the exact formulas used here for the flop card weight values are as follows.

The weight w_r,s for a potential flop card is calculated from the combined effects of the rank weight and suit weight for both players’ functional luck values and hands.
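To make the shape of that calculation concrete, here is a small Python sketch of a flop weight. It is a guess at the structure rather than the actual formula: `helps`, `flop_weight`, the `1.1` base, and the reading of “functional luck” as the difference between the two luck values are all assumptions for this example.

```python
def helps(card, hole):
    """Count the ways `card` lines up with a player's hole cards (0 to 2)."""
    rank, suit = card
    near_rank = any(abs(rank - r) <= 1 for r, _ in hole)   # one below, equal, one above
    same_suit = any(suit == s for _, s in hole)
    return int(near_rank) + int(same_suit)


def flop_weight(card, hole_a, luck_a, hole_b, luck_b, burn=False, base=1.1):
    """Unnormalized weight for dealing `card` to the flop (or the burn pile)."""
    diff = luck_a - luck_b            # positive when player A is favored
    # Multiplying per-player factors is the same as adding their exponents.
    exponent = diff * helps(card, hole_a) - diff * helps(card, hole_b)
    if burn:
        exponent = -exponent          # the burn pile gets the opposite bias
    return base ** exponent
```

In this sketch a card that helps both players equally comes out with a neutral weight, which lines up with the observation that shared ranks or suits let the unlucky player ride along on their opponent’s luck.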

Paying close attention to these rules, astute readers may notice that there are a few key “holes” in the way that this system works, namely that:

The swapped cards are not touched upon at all; they use the same rules as dealing the initial hole cards. As a result, a disfavored player can use this mechanic to get new hole cards after the flop has been dealt, making it a lot more likely that their hole and flop cards line up.
It is possible (albeit improbable) that both players will be dealt cards of identical ranks or suits. In this case, the unlucky player will benefit from the same flop cards that benefit the other player.

I considered adding more rules to the luck system to prevent these exploits, but ultimately decided against it. Aside from the fact that this was made under the time-crunch of a game jam and I didn’t want to add yet another layer of complexity, any player that could intuit the way the cards were being manipulated well enough to leverage this has shown more than enough game mastery to be justified in being able to beat the system.⁶

Integrating **** Engine configuration with story elements

With the internals of the luck engine laid out, the next step was to start integrating it with our story beats. Feel free to take a peek inside gamemodes.json alongside this section if you want to.
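For readers who would rather see the numbers in one place, the luck values quoted in this section could be collected into a table like the Python sketch below. The key names and layout are hypothetical and do not reflect the real gamemodes.json schema; routes whose exact values are not stated in this post (domineering, toxic4toxic, NG+) are left out rather than guessed.

```python
# Hypothetical layout; the real gamemodes.json schema is not shown in this post.
GAMEMODES = {
    "day1_game1": {"betty": 0, "audrey": 0},            # completely fair
    "day1_later_games": {"betty": 0, "audrey": 5},
    "pushover": {"betty": -5, "audrey": 2},
    "pushover_third_game": {"betty": -10, "audrey": 10},
    "gaslight": {"betty": 8, "audrey": 15},
}


def luck_gap(mode):
    """Betty's luck minus Audrey's; negative means the player is disfavored."""
    config = GAMEMODES[mode]
    return config["betty"] - config["audrey"]
```

Looking at the gap rather than the raw values makes the gaslight route easy to read: `luck_gap("gaslight")` is negative even though the player's own luck is high.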

An early production flowchart showing some gamestate variables and how the player ends up on different routes. (Apologies for my handwriting.)

The very first game is actually completely fair. This is primarily to let the player get familiar with the game, and partially because, in many cases, the first three hands of this are going to be the tutorial anyways.
The second and third games on the first day are where things start to get stacked against the player. Betty has a luck of zero while Audrey has minor luck (5); outside some fixed events, the player will only be able to win around one in five hands.
Should the player take the cheating events on the first day and end up on the domineering route, then they will have a significant advantage in the day two games: the player’s luck value is at least 10 points higher than Audrey’s (and this difference gets larger as the best-of-three goes on). This is alluded to within the text in an aside between matches, where Betty admits that she’s deliberately stacking the cards in her favor. If Audrey had wanted to unilaterally crush Betty, this is what she would have been doing the whole time.
Otherwise, at the beginning of the second game the player will be served the (earlier-discussed) fixed hand. Depending on how the player responds, they will be placed on one of three routes.
- Choosing to say nothing will put the player on the pushover route. Because the player is just lying down and taking everything Audrey throws at her, the player is given a luck of negative five, while Audrey has a luck of two. These will become negative ten and ten, respectively, if the player somehow makes it to the third game.
- Choosing to call Audrey out will result in the gaslight route. The luck values here are meant to gaslight the player just as much as Audrey is: the player has what would normally be extraordinary luck (8), while Audrey has an even higher luck (15). The resulting behavior is that the player will start out with a pair of hole cards that appear to be strong (and would be, in an ordinary poker game), but when the flop is dealt it will frustratingly not line up. Audrey’s significantly higher luck value means that all the flop cards will be more likely to form pairs, straights, and flushes with her hole cards, while cards that favor Betty are more likely to end up in the burn pile.
- Taking the last option to retaliate will result in the toxic4toxic route, where Audrey has a very high luck and Betty has a luck of zero. This is where things start to move away from the paradigm of passive luck, because starting in the second round the player has access to a second deck that gives her the ability to, well, stack the deck. Choosing to swap cards from the second deck to the top of the table deck will completely disable the luck-based randomization of drawn cards until there are no more second-deck cards remaining on top of the deck. Notably, this can be used to completely stack the flop to something that matches the player’s hole cards– although this may backfire, because if Audrey chooses to use her swap (something that has been purely cosmetic until this point), it will shuffle the deck and effectively un-stack the player’s cards.
As for the happy ending route, referred to as “NG+” in-engine, things start where the toxic4toxic route left off and accelerate from there. Not only does Betty start with a second deck and Audrey start with high luck, the behavior of when and how Audrey swaps cards is much more aggressive. Previously, the decision for Audrey to swap cards was completely random (albeit strongly weighted towards the first opportunity to do so), but now she is far more likely to do this if the player has just swapped cards– which makes it very risky to try and stack the deck. On top of this, NG+ also adds another weapon to Audrey’s arsenal: the ability to mint new cards. Any time she “swaps cards”, she will add cards to the deck that favor her and remove cards that favor the player.

The number of cards that Audrey can create this way is increased in the second NG+ game, leading to extraordinary boardstates.
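The deck-stacking behaviour from the toxic4toxic route can be sketched in a few lines of Python. The `TableDeck` class and its method names are invented for this example, and it assumes second-deck cards are simply placed on top of the table deck; the point is only to show how stacked cards bypass the weighted draw until they run out, and how a shuffle undoes the stack.

```python
import random


class TableDeck:
    """Deck whose top cards can be stacked by hand (illustrative only)."""

    def __init__(self, cards, rng=None):
        self.cards = list(cards)      # index 0 is the top of the deck
        self.stacked = 0              # how many top cards were placed by hand
        self.rng = rng or random.Random()

    def stack_on_top(self, cards):
        """Place second-deck cards on top, in the order given."""
        self.cards = list(cards) + self.cards
        self.stacked += len(cards)

    def shuffle(self):
        """An opponent's swap reshuffles and un-stacks the deck."""
        self.rng.shuffle(self.cards)
        self.stacked = 0

    def draw(self, weight_fn):
        """Deal stacked cards as-is; otherwise fall back to the weighted draw."""
        if self.stacked > 0:
            self.stacked -= 1
            return self.cards.pop(0)
        weights = [weight_fn(card) for card in self.cards]
        card = self.rng.choices(self.cards, weights=weights, k=1)[0]
        self.cards.remove(card)
        return card
```

Keeping the stacked count on the deck itself means the dealing code does not need to know which route is active; it just asks the deck for the next card.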

A minor aside on writing the game’s AI

I do not think this is terribly interesting from a design perspective, but I spent so much damn time fretting over this that I figure I should at least write a little blurb about it.

In my initial research in figuring out how to handle the game’s AI, I read a lot about counterfactual regret minimization … which I didn’t end up using at all, because it’s far too computationally heavy to expect to work on a client PC (or even in a game simulation with a full 52 card deck).⁷

Instead, I went with a much more brute-force approach: whenever the AI needs to make a decision, it runs a quick Monte Carlo simulation to get an estimate of how likely it is to win from the given boardstate. (Note: the AI normally does not look at the player’s hand when running these simulations, but it may take a peek depending on the story state.) From there, the utilities of raising, calling, and folding are calculated and then run through a function to account for how “aggressive” the AI should be for the current story state. Then, a random action with positive utility is selected by making a weighted choice over the action weights.
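The decision loop described above could look something like this Python sketch. The utility formulas, the way aggression is applied, and the fallback to folding are all assumptions made for illustration; `rollout` is a hypothetical callable that plays out one random completion of the hand and reports whether the AI won.

```python
import random


def estimate_win_prob(rollout, trials, rng):
    """Monte Carlo estimate: the fraction of simulated completions the AI wins."""
    wins = sum(1 for _ in range(trials) if rollout(rng))
    return wins / trials


def action_utilities(win_prob, pot, to_call, raise_by):
    """Simple expected-value utilities (an assumed model, not the game's)."""
    call = win_prob * pot - (1 - win_prob) * to_call
    raise_ = win_prob * (pot + raise_by) - (1 - win_prob) * (to_call + raise_by)
    fold = -call                      # folding looks good exactly when calling does not
    return {"raise": raise_, "call": call, "fold": fold}


def apply_aggression(utilities, aggression):
    """Values above 1 favor raising and discourage folding; below 1, the reverse."""
    scaled = dict(utilities)
    scaled["raise"] *= aggression
    scaled["fold"] /= aggression
    return scaled


def choose_action(utilities, rng):
    """Weighted random choice among the actions with positive utility."""
    positive = {action: u for action, u in utilities.items() if u > 0}
    if not positive:
        return "fold"                 # nothing looks worthwhile, so give up the hand
    actions = list(positive)
    weights = [positive[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Choosing by weight instead of always taking the best action keeps the opponent from being perfectly predictable, which fits a game that is about reading the other player.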

I settled on this because it was fast (even on low-end machines), and playing against this model felt realistic enough; In my testing, I frequently found myself doubting what actions I should take given what the AI was doing.

I think part of why this model works here is because the games are unfair and the AI is aware of this fact– accounting for “luck” when running the predictive simulations restricts the possibility-space of future boardstates enough that a reasonable estimate can be obtained, even before the flop is revealed. In a fully undoctored game, it’s much more difficult to make precise estimations of who will win from incomplete information.

Conclusion

This document has been around a month in the making (admittedly between structural updates to my website), and I think that it finally covers everything that I really wanted to show off under the hood. I hope that this essay on my part in the game design process was interesting to you, regardless of your level of background knowledge.

As I was developing this game, I spent a lot of time wondering how it would be received– toying with player expectation is a core part of the game’s premise, and for it to work it requires that the player buys in to the poker game itself. Was this game going to actually stick the landing, or would it flop? I didn’t get to enlist playtesters like I’d hoped to, so this question weighed heavily on my mind. I was an absolute wreck of anticipation for a day after releasing the game.

When the first few comments rolled in after a day or two, they were all so incredibly kind and positive. I was absolutely thrilled. The Toxic Yuri VN Jam community has truly been wonderful to be a member of, and I am proud to do my part in contributing to it as well.

If you’ve made it this far, you clearly have some desire for knowledge about how things work– never stop asking questions.

Thanks for taking the time to read this essay and for your interest in Offsuit Pair.

Footnotes

¹ Now, I’ll admit that the initial decision for Texas Hold’em over traditional 5 card poker was slightly arbitrary, as it largely came down to thinking that the boardstate would be a lot more appealing to look at.
² I fear this may sound derogatory, but I mean this statement with love. If you’ve ever played an online mahjong game in a low-tier room, you probably know exactly the kind of “button presser” behavior I’m talking about here. Pon!
³ Half the time the minimum wager is literally nothing– this isn’t necessarily standard for two player poker, so I explicitly highlighted it in the tutorial.
⁴ Of course, if you push this far enough, it will become patently clear that something is afoot. We wanted to run a survey on players to find where exactly the sweet spot for this is, but the testing form we sent out didn’t get enough responses in time to arrive at any meaningful conclusion.
⁵ The more technically inclined may recognize this as a kind of transfer function.
⁶ And indeed, I was able to use some of these tricks to beat one of the games that was intended to be “unwinnable”, and this was rather late into testing. While I didn’t make any changes to the engine itself because of this, I did make sure to implement every match in all possible branching paths of best-of-threes.
⁷ I still really like this on a conceptual level; expect a small game about this soon™