Introduction

When I first saw Marcus Desai’s post about his company’s behaviour-driven anti-cheating product for gaming, I knew he would be a great first conversation for archi.tech. His recently formed company, CapsuleAI, co-founded with Phil Marsh, must be doing some fascinating things in an area that has captured the hearts and minds (and fears) of the world since the release of ChatGPT. I wanted to get the inside track on the value CapsuleAI is looking to realise for its clients, understand more about cheating in the world of gaming and esports, and get some insight into how their Orca product helps to level the playing field in the war against the commercial cheating industry.

Introduce Yourself

I studied philosophy at university. Whilst I was not enjoying my master’s course, I started to pick up programming. I felt like if you had ideas, then you could make them concrete through programming.

After university, I got a role as a Machine Learning Engineer at Napier.ai. We had some great data scientists in our team, and I quickly learned that I have more of an engineering bias than a data science bias. It also became clear to me that if you want to make software well, your whole organisation has to be structured to that end.

More recently, I was in a life position where I could take a chance. I knew there were better ways to deliver data science, so I created a new company with a friend, CapsuleAI, to make data science work.

Marcus Desai

Why have you decided to create CapsuleAI?

When you look at data science in an organisation, it often resides in a silo at the end of the data chain. It is an afterthought, bolted onto a solution and an organisation. Engineers build the solution with little knowledge of, or respect for, the data they are collecting or the questions the organisation is looking to answer. Typically, the data they collect is operational data that serves to diagnose issues; behavioural data has never been considered. Often the organisation doesn’t know what questions it wants to answer. This leads to friction when trying to embed data science.

Watch the video for this section

Historically, software engineering and Information Technology have been challenging to embed into organisations. There are large differences between the business and technical domains, with little appreciation or understanding of each other’s worlds and domain-specific languages. Information gets lost in translation between the domains. Technical implementations of business solutions are complicated and hard to specify in an upfront design, which has led to different styles of Agile project delivery. And, even after decades of practical experience and evolution, it still feels like software engineering and Information Technology are in their infancy.

Organisations are now taking emerging, far more complex ideas, technologies, and processes, and a more science led approach, and trying to embed them. This is exceptionally challenging. Even hiring data scientists is difficult. Organisations can’t do this effectively as they don’t understand the domain or even the language to evaluate applicants. Moreover, organisations that have successfully embedded engineering and Information Technology can still struggle to make their data science team into a collaborative partner. There is commonality between them in that code is code, but engineering and data science approach work in different ways – as the name suggests.

In 10 or 20 years we may have a language for how we understand and describe data science. But, at the moment, data science is just so new it is really difficult to do well; to deliver genuine value. The current ecosystem is evolving in different directions: different technologies; different approaches; different beliefs. Over time, some of these directions will evolve to be clear winners, but we don’t have the patterns and principles like those that have emerged in the software engineering domain. There is currently a huge mess.

What does it mean to be scientific about your data? Be as rigorous in collecting your data as a traditional scientist would be in their field of expertise.

In reality, it isn’t too difficult to take a data set or model and start to apply data science principles and technologies to it. Kaggle has helped people to do this, and organisations have convinced themselves they are competent and are being productive with their operational data. What is hidden is how difficult it is to do this properly:

  • What data have you collected and how valuable is it?
  • Where is this data and how can you get access to it?
  • How clean is it and how do you improve it, standardise it and combine it?
  • What questions or problems are you looking to answer, and what amenable solutions are there?
  • Do you want to go down the route of manual embeddings or simply throw everything into a black box?
  • Do you build a model and a model architecture?
  • How do you train it?
  • How do you make it repeatable?
  • How do you make it reliable?
  • How do you run it at scale?

What if you created a team that is simply really good at data science? You could centralise and contain all of the specialist, scientific capabilities required to answer the questions listed above. You could create a boundary within which the majority of the data science domain expertise, language, technologies, and processes are concealed. Using this approach, embedding data science into the organisation becomes far easier, as all you need to focus your attention on is becoming really good at understanding business problems and communicating across the domain boundaries. The data science team can own all of the complexity around the data and the operational solution.

This is why we formed CapsuleAI. At CapsuleAI, we offer data science as a neatly packaged service that can be consumed directly and easily. There is no need to hire and embed your own data science team, with all of the complexity and risk this involves. We simply take care of it for you. We focus on your business problems and your data, and then develop the correct technology and approaches. One of the first applications of our service has been developing anti-cheat for gaming.

What advice would you give to organisations looking to adopt a world of data science?

Traditionally, data insight work has been retrospective and based on “collect and hope”. Having data is far better than not having data, but most of the data that has historically been collected has been operational: logs, auditing, and telemetry records. If you randomly collect data, it might be useful, but it is unlikely to fit the format you need when you realise what you need it for.

As a principle, you should always have a mission for your organisation. When you know your mission, you can establish the questions you want to ask of your data and work backwards from there. You can think about the data you need to collect up front and identify the systems and journeys where you will collect it. Obtaining valuable results requires an organisational mentality of doing it properly. You can then communicate a clear message to a data scientist: I want to recognise these kinds of behaviours that are related to my organisation. Then you can be scientific about your data.

Watch the video for this section

What does it mean to be scientific about your data? Be as rigorous in collecting your data as a traditional scientist would be in their field of expertise. First of all, you need to collect the data. Just accept you are going to get dirty data. Have a data scientist on staff and make it their problem; they specialise in cleaning, standardising, and joining data. They can also define the context and meaning of the data in preparation for machine learning algorithms.
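As a loose illustration of that cleaning, standardising, and joining step, here is a minimal Python sketch. The record fields (`player_id`, `score`, `region`) and the cleaning rules are hypothetical, invented for the example rather than drawn from CapsuleAI’s work.

```python
def clean_record(raw):
    """Standardise a single raw event record."""
    return {
        # Normalise identifiers so joins on them actually match.
        "player_id": raw["player_id"].strip().lower(),
        # Coerce numeric-looking scores to float; treat dirty values as missing.
        "score": float(raw["score"])
        if str(raw["score"]).replace(".", "", 1).isdigit()
        else None,
    }

def join_on_player(events, profiles):
    """Join cleaned events to player profiles on player_id."""
    by_id = {p["player_id"]: p for p in profiles}
    return [
        {**event, **by_id[event["player_id"]]}
        for event in events
        if event["player_id"] in by_id
    ]

events = [clean_record(r) for r in [
    {"player_id": " Alice ", "score": "12.5"},
    {"player_id": "bob", "score": "n/a"},  # dirty value becomes None
]]
profiles = [{"player_id": "alice", "region": "eu"}]

joined = join_on_player(events, profiles)
```

In a real pipeline this work is usually done with dedicated tooling, but the shape of the job is the same: standardise, handle the dirt explicitly, then join.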

For organisations that aren’t going to use a Large Language Model (LLM) for their data, the general approach has been to collect every data point you can. However, the more scientifically you treat the process for collecting data, the better results you will get. Your model speaks the data.

If you want to label behavioural data, the label needs to be as clean as possible, and you need confidence that you are labelling the right behaviour. Again, this is bread and butter for a data scientist.

If you are doing classification of data, for example, clearly labelling your data to create distinctions between the elements means your classifier is highly likely to find them.
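To make that concrete, here is a toy Python example (illustrative only, not CapsuleAI’s method): when labelled classes are clean and well separated, even a trivial nearest-centroid classifier recovers them. The two features, a hypothetical average cursor speed and shot accuracy, are invented for the example.

```python
def centroid(points):
    """Mean position of a list of feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train(labelled):
    """labelled: dict mapping label -> list of feature tuples."""
    return {label: centroid(pts) for label, pts in labelled.items()}

def predict(model, x):
    """Assign x to the label whose centroid is nearest (squared distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: dist2(model[label], x))

# Hypothetical features: (average cursor speed, shot accuracy).
labelled = {
    "normal":   [(1.0, 0.30), (1.2, 0.35), (0.9, 0.28)],
    "cheating": [(5.0, 0.95), (4.8, 0.98), (5.2, 0.93)],
}
model = train(labelled)
```

The point is not the algorithm, which is deliberately trivial, but the labels: if the “normal” and “cheating” examples were noisy or mislabelled, no classifier would separate them reliably.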

Alternatively, if your data fits a transformer model, like ChatGPT, there is no labelling of the data and no real need to clean it up. From ChatGPT’s point of view, there is no wrong data; it is data from people just talking validly on the Internet. Whilst this may seem ideal, this kind of data can be difficult to work with, as it is open and unmoderated and can be biased towards negative language and beliefs, which then surface in the model’s output. So whilst preparing the data isn’t a challenge for ChatGPT, it faces significant issues in creating arbitrary technical solutions for that negative bias outside of the data.

If you are looking to adopt a world of data science, it isn’t something you can just blindly wander into. You need support from genuine data scientists who can place data at the heart of your organisation and can prepare and analyse it to answer your questions.

Behavioural anti-cheating for gaming

Cheating in gaming is rampant. It isn’t easy to build a cheat yourself, but a niche industry has grown to support cheating in gaming. You can now simply buy a relatively expensive cheat for, say, $100, and you can even pay for cheating subscription services. So, the ability to cheat is open to everybody, and the more you are prepared to pay, the better you can cheat.

Watch the video for this section

The problem for games companies is that detecting and preventing cheats is like plugging holes in a ship. There is a cat-and-mouse game between cheat developers and games developers in which the cheat developers have the upper hand: they are simply looking for innovative ways to exploit existing holes. The games companies are on a treadmill, constantly trying to detect what cheats are being applied and developing specific fixes to defend against them. The games developers are inventive too, though. For example, they have spoofed specific bits of game memory on a machine to create a honeypot; players that access that memory are known to be cheating and are banned.

But cheats are becoming increasingly advanced. Off-device cheats, for example, are increasing in prevalence. A hardware dongle or even a second machine can be used to cheat. Imagine streaming HDMI content through a second machine, using computer vision to analyse where another player is in real time, and sending this information back to an input device, which can then automate moving the cursor. Traditional anti-cheats that sit on the player’s device simply can’t detect this approach. However, if your behaviour is cheating, that can’t be hidden. If you look like you are cheating at a machine level, you are cheating.

CapsuleAI is looking at the problem completely differently. To be able to cheat in a game, you have to be better than a human in some way. If you have an aimbot, the behaviour of the cursor moving, the speed and precision of movement, will be superhuman; it will fall outside the bounds of normal human behaviour. If you can see through walls, you have the ability to behave outside the bounds of a human who cannot see through walls. Cheating behaviour is distinguishable from normal behaviour because you are gaining an advantage. Looking at behaviour moves away from looking at the implementation of cheats. There is always going to be another way to cheat, to change the implementation, but your behaviour will still be superhuman.
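As a rough sketch of this behavioural idea, the snippet below flags cursor movement whose speed falls outside plausible human bounds. The threshold, units, and sample traces are illustrative assumptions for the example, not CapsuleAI’s actual model.

```python
def cursor_speeds(samples):
    """samples: list of (t_seconds, x, y) cursor positions.

    Returns the speed (pixels per second) of each consecutive movement.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        distance = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        speeds.append(distance / (t1 - t0))
    return speeds

def looks_superhuman(samples, max_human_speed=3000.0):
    """True if any single movement exceeds a plausible human flick speed."""
    return any(s > max_human_speed for s in cursor_speeds(samples))

# Illustrative traces: a human drag versus an instant aimbot snap.
human = [(0.00, 0, 0), (0.05, 40, 10), (0.10, 90, 30)]
aimbot = [(0.00, 0, 0), (0.01, 800, 600)]  # 1000 px covered in 10 ms
```

A real system would of course model far richer behaviour than raw speed, but the principle is the same: however the cheat is implemented, the resulting input trace sits outside the human envelope.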

Huge global businesses, like Epic, have games like Fortnite that are free to play. Revenue is generated by live services, in-game purchases, and so on. When their revenue is directly proportional to their player base, they are 100% exposed to cheating. Cheating has the effect of making people stop playing your game.

The gaming industry needs to face up to these challenges, where technical evolution is providing new opportunities to cheat. For an industry that is financially larger than the film industry, it is becoming increasingly reputationally damaging. If we consider the rise of esports, and the celebrity and purses that are presented to winners of the largest tournaments, there are huge incentives for successful cheaters. Where you might think it impossible to cheat in an esports arena, there have already been cases where competitors have been found with physical cheat devices on their person at competitions.

there is reputational damage to the industry and gamers need to consider how they provide trust; this is similar to how physical sports tries to build trust with anti-doping

One of the most popular esports is based around the official Formula 1 game. Similar to Formula 1, its esports equivalent has a professional championship, and esports competitors have even competed in real time, on the same tracks, whilst the F1 races are taking place. F1 esports is tightly integrated into the wider industry around Formula 1; F1 has been heavily promoting the game, so it is a reputational risk for F1. Very recently, there has been huge controversy within the top flight of F1 esports, with numerous players making allegations that 70% of top-level players are cheating. This is really poisonous for F1, and it is seriously undermining the trust and reputation of the game. With F1 being so heavily linked, it is affecting the wider industry too. Regardless of whether the allegations are true, there is reputational damage to the industry, and gamers need to consider how they provide trust; this is similar to how physical sports try to build trust with anti-doping.

Whilst behavioural analysis can help to mitigate and prevent some of these cheats, and help to establish trust, similar data-science-led techniques can also be used to help cheat. Computer vision and machine learning could become increasingly prevalent in future cheats. Games are really amenable to machine learning: they are bounded spaces where everything is quantised. For example, if you wanted to make self-driving cars in Grand Theft Auto, it would be far simpler than in the real world.

As long as it is economically viable for cheat developers to develop new cheats, why wouldn’t they? They are working hard to deliver a product ahead of what game developers are doing to protect their games. In some respects, the cheat developers have a clearer vision than the game developers of the data they need to collect and how they need to analyse it; you could say they are outperforming the game makers at data science.

Videos from the conversation with Marcus

Introduce CapsuleAI

Advice for organisations looking to adopt data science

Behavioural anti-cheating for gaming