Mapping Destruction in EVE Online

Published in

Quickbird

13 min read · Aug 7, 2016

Visualisation on the “Galaxy Map” of the game. Each circle is a solar system, with planets, asteroids, etc. The size of each circle is proportional to the number of ships that were destroyed there.

Recently I stumbled upon the files from the final-year project that I did for my degree back in 2014, and decided to write a post about it, or at least about what I can still remember.

In my final-year project I had the freedom to choose any topic I liked, which turned out to be a mixed blessing. After coming up with countless stupid ideas, I decided to investigate an online game I was addicted to as a teenager.

The goal was to see if I could use machine learning to categorise millions of data points of player behaviour, and then create maps that visualise it, like the one above. I shared the maps back with the player community, which sparked a lively discussion thread on the forum.

This article discusses how I went about the process; you will find results, links to the forum, a GitHub repo and more at the end of the article.

The game

The suspect is Eve Online, a sci-fi MMORPG set in space, considered the most hard-core game of its kind. Unlike in World of Warcraft, treating the game as a second job is the minimum level of dedication you need to make meaningful progress here. That’s because the issues most games design around are called features in Eve!

Everything that gets blown up is gone forever. So, if a player is attacked and their ship is destroyed, they have to go and buy another one. Your space stations and possessions can get attacked while you are asleep or on holiday.
There are no ‘separate servers’. All players inhabit the same shard, or instance, of the game world.
For every ship, gun or bullet that’s bought, there is a player somewhere who had to manufacture it first, and mine the ore required. Items are traded on a free market, affected by supply and demand.
The game has a glorious amount of scams, griefing and market manipulation. Even if you care nothing for games, take a look and get a feel for the kind of shit that goes on in this game:
Epic Scams — https://www.engadget.com/2012/10/28/eve-evolved-top-ten-ganks-scams-heists-and-events/
Everyday Scams — https://www.themittani.com/features/new-players-guide-eves-most-common-scams

So every death creates a lot of drama for the player, affects the markets and sometimes in-game politics. Being able to tell the circumstances of death for every player with a good degree of accuracy would reveal a lot of information about the game world. Which shipping routes are the most dangerous? Where do most new players die? How many suicide attacks succeed? (Yes, there are suicide attacks.) I would be able to create maps!

The review is full of jargon, but should give an idea of what goes on here.

When one player kills another, they receive a log which lists the participants, the victim and what was destroyed. Players brag about their achievements publicly on community websites called killboards, like https://zkillboard.com, and some of these offer API access. This abundance of data made Eve attractive for data mining and for my project.

Data

I wrote a simple C# program (source on GitHub, link below) to collect 10,000,000 records of player deaths from the REST API of https://zkillboard.com, and chuck all of that into an MSSQL server. At the time my knowledge of C# was minimal, but I found enough example code to cobble together a working program.

Once I had enough dead players, I wanted to give that data a bit of meaning. You see, the records only contain names of ships and equipment used in the battle. I wanted to provide machine learning algorithms with characteristics they could chew on, such as speed, armour, and number of guns on the ships players used.
All of this game data is published by CCP as a Static Data Export, which is basically a dump from their MSSQL server. It can be found here https://developers.eveonline.com/resource/resources

Before taking on this project I had never seen a database, but after a few days I could write SQL queries and manipulate tables. I didn’t know what the hell a clustered index was, but it didn’t matter. I built tables that summarised what were, in my view, the most important aspects of the victim’s ship and the attackers’ ships in about 40 rows.
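
To give a flavour of that kind of summary query, here is a minimal sketch in Python using the built-in sqlite3 module rather than MSSQL. The table names, column names and ships are all invented for illustration; the real schema lives in the repo and in the Static Data Export.

```python
import sqlite3

# Hypothetical schema: every table, column and ship name here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ship_stats (ship_name TEXT PRIMARY KEY, hit_points REAL, max_speed REAL);
CREATE TABLE kills (kill_id INTEGER PRIMARY KEY, victim_ship TEXT);
CREATE TABLE attackers (kill_id INTEGER, attacker_ship TEXT);
""")
conn.executemany("INSERT INTO ship_stats VALUES (?, ?, ?)", [
    ("Frigate A", 1000.0, 400.0),
    ("Battleship B", 100000.0, 100.0),
])
conn.execute("INSERT INTO kills VALUES (1, 'Frigate A')")
conn.executemany("INSERT INTO attackers VALUES (?, ?)", [
    (1, "Battleship B"),
    (1, "Frigate A"),
])


def summarise_kills(conn):
    """One row per kill: victim stats plus aggregates over the attackers' ships."""
    query = """
    SELECT k.kill_id,
           v.hit_points           AS victim_hit_points,
           v.max_speed            AS victim_max_speed,
           COUNT(a.attacker_ship) AS attacker_count,
           AVG(s.hit_points)      AS avg_attacker_hit_points
    FROM kills k
    JOIN ship_stats v ON v.ship_name = k.victim_ship
    LEFT JOIN attackers a ON a.kill_id = k.kill_id
    LEFT JOIN ship_stats s ON s.ship_name = a.attacker_ship
    GROUP BY k.kill_id, v.hit_points, v.max_speed
    """
    return conn.execute(query).fetchall()


summary = summarise_kills(conn)
```

The point is only the shape of the result: one flat row of numbers per death, which is what a classifier can work with.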

Classification

An experienced player can tell a lot from the death report: the choice of modules fitted on the victim’s ship, the attackers’ ships, the organisations the victim and the attackers belong to, and the location where they died. This knowledge is subtle: a similar combination of modules might serve different purposes on different ships, organisations have different reputations and control territory, etc.

I wanted to classify two things:

Activity the victim was doing at the time of death
The situation he ended up in

For example

The victim was mining ore and died in a suicide attack
The victim was farming NPCs and was killed by pirates
Victim died in a massive fleet battle

I tried to organise the categories used colloquially by Eve players to describe these. To do supervised classification I needed to create a training dataset. At the time I decided that it would be ‘unscientific’ to create such a dataset by classifying player deaths myself. Instead I opted for crowd-sourcing. So I made a website where players could see a kill-report from a killboard and categorise it in one way or another.

The black part is the iframe, which leads to ZKillboard. The buttons above are part of my website. Yes, epic fail is a legitimate category!

This was a superbly primitive PHP script that read a list of death IDs from a CSV file and randomly chose one to display to the user. It then saved whatever the user chose into a different CSV file. I flew around the game spamming the local chat to get the players to fill in the survey, and offered a bit of in-game currency for doing so. Then some players decided to kill me to clean up the channel. After the dust settled I had 1,052 responses for ~700 unique deaths (some deaths were categorised multiple times) from ~20 players.
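
The original was PHP, but the logic is small enough to sketch in a few lines of Python. The file layout (one ID per row, responses appended as ID plus category) and the function names are my assumptions here, not a copy of the real script.

```python
import csv
import random


def pick_random_kill(ids_path, rng=random):
    """Read kill IDs (first column of each row) and return one at random."""
    with open(ids_path, newline="") as f:
        kill_ids = [row[0] for row in csv.reader(f) if row]
    return rng.choice(kill_ids)


def save_response(responses_path, kill_id, category):
    """Append the category a player chose for a kill to the responses file."""
    with open(responses_path, "a", newline="") as f:
        csv.writer(f).writerow([kill_id, category])
```

Because the pick is random and nothing tracks what has already been shown, the same death can come up more than once, which is how some deaths ended up categorised multiple times.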

All categories I had originally, with original names.

As it turned out, there is a lot of overlap between the terms players use, and some aren’t actually suitable for categorisation. I had discussions with the players about how some categories were not distinguishable. When I dug into the data I could see categories that were used interchangeably. So I cleaned up the data by merging interchangeable categories and removing the useless ones.

In writing this article, I tried to give these categories names that make sense if you have never played Eve Online; you can see the new names in the chart below. That’s something I didn’t realise I should do in my report; instead I added a glossary and let the professors figure it out. In fact, I spent most of my project getting the results, visualisations and improving accuracy by another 2%. I put way too little effort into writing up what I did in a comprehensible manner. I’ve spent more time writing this article.

But I digress. The first thing I did was to assess how good players are at categorising these things. Only a small fraction of death reports were categorised multiple times, so the ‘Agreement’ statistic is rubbish for some categories. The chart below demonstrates player agreement.

Below each category is the number of records that the players cross-referenced.

‘Rot. Forest’ shows the results I got from automatic classification. I placed the players’ agreement and the F-score produced by the classifier on the same scale; they are not directly comparable, but it does give you an idea of what’s going on.
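
I can no longer say exactly how the agreement number was calculated, so treat the following as one plausible definition rather than the one in my report: for every death rated more than once, count the pairs of ratings that match. The F-score is assumed to be the usual F1, the harmonic mean of precision and recall. All names below are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_agreement(responses):
    """responses: iterable of (kill_id, category) pairs from the survey.

    Returns {category: (agreeing_pairs, total_pairs)}, counted over kills
    that were categorised more than once.
    """
    by_kill = defaultdict(list)
    for kill_id, category in responses:
        by_kill[kill_id].append(category)

    stats = defaultdict(lambda: [0, 0])
    for labels in by_kill.values():
        for a, b in combinations(labels, 2):
            # A pair counts towards every category that appears in it.
            for category in {a, b}:
                stats[category][1] += 1
                if a == b:
                    stats[category][0] += 1
    return {category: tuple(counts) for category, counts in stats.items()}


def f_score(true_positives, false_positives, false_negatives):
    """F1 score: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

With only a handful of cross-referenced records per category, the total pair count is tiny, which is exactly why the statistic is shaky for some categories.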

In most categories we have pretty good results, and I went on to produce maps (see the ‘Results’ section). The ‘PvP — E-war’ category is an exception. While there are dedicated ships for E-war, players often fit some E-war modules on a random combat ship, and that confuses the classifier.

Also, ‘Cyno’ is a group unique to Eve Online, again see results section for an explanation.

Classifiers

The struggles

I started off using Weka. It has a lovely, simple interface and many, many classifiers. Unfortunately I burnt way too much midnight oil dealing with its refusal to read a simple CSV file. Then I realised that I could read data directly from the MSSQL server, but again Weka would throw a bitch fit with astounding regularity.

Then I found Knime, which, like Weka, is written in Java, and it reads files and connects to SQL Server without any bullshit. But Knime doesn’t have nearly as many nice classifiers, so I found Weka plugins for Knime, and those worked fine, even if not all of them. So now that I had completed this inception gymnastics, I could finally get to work.

Useful bit

I split the responses into training and verification sets, roughly 4:1, and gave several classifiers a go. Then I looked at the ones that gave the best results, and chose one I could understand. That was rotation forest. It builds decision trees, but unlike a ‘normal’ decision tree, it treats each column in the table as a dimension in n-dimensional space and rotates the axes of that space, using principal component analysis on subsets of the columns, before growing each tree, hence the name.
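
A rough sketch of a 4:1 split, in Python. One detail is my own suggestion rather than something I remember doing: because some deaths were categorised several times, this version splits on unique death IDs so the same death never lands in both sets.

```python
import random


def split_by_kill(responses, train_fraction=0.8, seed=0):
    """Split (kill_id, category) responses roughly 4:1 by unique kill ID."""
    kill_ids = sorted({kill_id for kill_id, _ in responses})
    random.Random(seed).shuffle(kill_ids)
    cut = round(len(kill_ids) * train_fraction)
    train_ids = set(kill_ids[:cut])
    train = [r for r in responses if r[0] in train_ids]
    verify = [r for r in responses if r[0] not in train_ids]
    return train, verify
```

The fixed seed just makes the split repeatable between runs, which helps when comparing classifiers.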

The above means that it can combine several parameters and weight them relative to one another; otherwise it acts as a normal decision tree. It has plenty of knobs to tweak, yet unlike with a neural network I actually understood what was going on. Another advantage of trees is that the results are easy to inspect to make sure you aren’t overfitting your data or doing something ridiculous.
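
Here is a toy two-dimensional illustration of why rotating the axes helps. It is not the rotation forest algorithm itself, only the core idea that a rotated axis is a weighted combination of the original columns. The data is made up: imagine x and y are two ship parameters, and a ship is a threat when x + y is greater than 10.

```python
import math


def rotate(point, angle_degrees):
    """Express a 2-D point in axes rotated by the given angle."""
    x, y = point
    a = math.radians(angle_degrees)
    return (x * math.cos(a) + y * math.sin(a),
            -x * math.sin(a) + y * math.cos(a))


def best_threshold_accuracy(values, labels):
    """Best accuracy of a single split 'value > t' (or its inverse)."""
    best = 0.0
    for t in sorted(set(values)):
        hits = sum((v > t) == label for v, label in zip(values, labels))
        best = max(best, hits / len(values), 1 - hits / len(values))
    return best


# Made-up points; the label is True when x + y > 10.
points = [(1, 2), (8, 1), (3, 3), (2, 9), (9, 2), (6, 6)]
labels = [x + y > 10 for x, y in points]

acc_x = best_threshold_accuracy([p[0] for p in points], labels)
acc_y = best_threshold_accuracy([p[1] for p in points], labels)
# After a 45 degree rotation the first axis is (x + y) / sqrt(2).
acc_rotated = best_threshold_accuracy([rotate(p, 45)[0] for p in points], labels)
```

On the original axes no single split separates the two groups, while one split on the rotated axis does. A plain decision tree would need a staircase of splits to approximate the same diagonal boundary.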

Getting the results was relatively straightforward: I inspected the tree to make sure it was not overfitting, and pruned it a lot. I didn’t want to see names of individual players or player organisations anywhere in the tree, or names of specific locations. I considered that overfitting, and my goal was to get it to operate on higher-level data, such as security status and ship parameters, not individual occurrences.

I spent a lot of time pre-processing the data before I fed it to the classifier. This seems to be a bit of a dark art, as I couldn’t find a robust method for how this should be done.
I tried to assist the classifier using my knowledge of the game. For example, the classifier struggled to tell apart what was equipped on the ship and what was in the cargo, and thus useless in battle. So I created a rating of the ship’s attacking ability based on the kind of weapons it had fitted, excluding the ones it had sitting in the cargo. Then it stopped categorising freighters as a massive threat. Check out the GitHub repo if you would like to see this in detail.
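
The actual rating lives in the SQL in the repo; the sketch below only shows the idea in Python. The item fields (`location`, `is_weapon`, `damage`, `quantity`) and the item names are hypothetical, not the real killmail format.

```python
def attack_rating(items):
    """Sum the damage of fitted weapons, ignoring anything in the cargo hold."""
    return sum(
        item["damage"] * item.get("quantity", 1)
        for item in items
        if item["is_weapon"] and item["location"] != "cargo"
    )


# A freighter hauling guns as cargo versus a battleship with guns fitted.
freighter = [
    {"name": "Hypothetical Cannon", "location": "cargo",
     "is_weapon": True, "damage": 50.0, "quantity": 10},
]
battleship = [
    {"name": "Hypothetical Cannon", "location": "fitted",
     "is_weapon": True, "damage": 50.0, "quantity": 2},
    {"name": "Hypothetical Shield Booster", "location": "fitted",
     "is_weapon": False, "damage": 0.0},
]
```

With this feature the freighter rates zero however many guns it carries, so the classifier no longer has to work out the difference between cargo and fittings by itself.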

Of course, the amount of hand-holding of the classifier would be vastly reduced if instead of 1000, I had 100,000 rated player deaths. But you can always have more data, can’t you? It’s all about what you can do with the resources you’ve got.

Unsupervised Classification

I spent a little time and had a stab at unsupervised classification, but it didn't yield any results. Actually, it did yield results, but they were useless.

All the unsupervised ML I tried categorises deaths by the ship class of the victim. I think that happens because all the parameters I extracted out of the dataset (ship’s armour, speed, etc.) are mostly a function of the ship’s size class. For instance, a battleship has literally two orders of magnitude more hit points than a frigate. If you do principal component analysis of the data, virtually all the variance in the dataset is in the ship’s size.
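
The effect is easy to reproduce on a toy example. The sketch below does PCA by hand for just two columns, using the closed-form eigenvalues of a 2x2 covariance matrix. The numbers are invented, chosen only so that hit points span two orders of magnitude while speed does not.

```python
import math
import statistics


def explained_variance_2d(xs, ys):
    """Share of the total variance along each principal axis of 2-D data."""
    var_x = statistics.variance(xs)
    var_y = statistics.variance(ys)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    # Eigenvalues of the covariance matrix [[var_x, cov], [cov, var_y]].
    trace = var_x + var_y
    det = var_x * var_y - cov * cov
    root = math.sqrt(max(trace * trace / 4 - det, 0.0))
    return (trace / 2 + root) / trace, (trace / 2 - root) / trace


# Invented values: hit points differ by orders of magnitude, speed does not.
hit_points = [1000, 1200, 30000, 35000, 100000, 120000]
max_speed = [400, 380, 250, 240, 100, 90]

first_share, second_share = explained_variance_2d(hit_points, max_speed)
```

Here the first component soaks up practically all of the variance simply because one column has much bigger numbers. Standardising each column before PCA is the usual remedy for the scale problem, though it does not help when the columns are all driven by the same underlying thing, such as ship size.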

PCA of the data: a lot of variance in one dimension, much less in others.

In the chart above, dimension 0 is amount of damage taken, and that’s basically the ship’s size. Also I found there aren’t many packages that let you visualise results of PCA in an easily digestible format.
I can’t remember the algorithms I used for classification, but take a look at the result. In charts below, colour is the group assigned by algorithm, X and Y are the two dimensions of PCA. The data I am putting in is player deaths, the categories I am getting out are ship classes — top right group are kills of players in Titans, biggest ships in the game. The next blue group is other capital ships.

Left — PCA of the dataset, right — center section zoomed in

Once I zoomed in on the centre section, I could see a cluster of all non-capital ships. I haven’t spent a lot of time exploring this grouping, but I struggled to find much sense in it at the time.

Accuracy

Using any automatic classification is a trade-off between recall and false positives. Let’s make an analogy: imagine you are on a jury. You never know with 100% certainty whether a suspect is guilty or innocent. You can go for two extremes:

Maximise recall: make sure every criminal is in jail, but also sentence some innocents.
Maximise precision: no innocents are in jail, but some criminals walk.

ROC curves illustrate that trade-off, and the area under that curve is one of the best measures of classifier performance.

ROC curves for the group Logistics with one combination of settings
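
For anyone who has not met ROC curves before, this is roughly how one is built from classifier scores. It is a minimal sketch that assumes every record has a distinct score and that both classes are present; the tools I used did this for me, so the function names are illustrative only.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate), one per threshold.

    labels are 1 for records in the category and 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one record at a time, highest score first.
    for _, label in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

An area of 1.0 means the scores rank every true member of the category above every non-member, and 0.5 is what random guessing gives.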

We are interested in mapping and we don’t worry about capturing every event. Instead we want to maximise precision, that is, avoid false positives, and discard events where we aren’t sure.
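
In practice "discard events where we aren’t sure" means keeping only predictions above some confidence cut-off. A small sketch, assuming the classifier reports a confidence between 0 and 1 for each prediction (the data layout and names are illustrative):

```python
def precision_and_recall(predictions, truth, category, min_confidence):
    """predictions: (kill_id, category, confidence); truth: {kill_id: category}.

    Only predictions at or above min_confidence are kept.
    """
    kept = {kill_id for kill_id, cat, conf in predictions
            if cat == category and conf >= min_confidence}
    relevant = {kill_id for kill_id, cat in truth.items() if cat == category}
    hits = len(kept & relevant)
    # By convention here, an empty set of kept predictions has precision 1.0.
    precision = hits / len(kept) if kept else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


predictions = [
    (1, "Logistics", 0.95),
    (2, "Logistics", 0.60),
    (3, "Logistics", 0.55),
    (4, "Mining", 0.90),
]
truth = {1: "Logistics", 2: "Logistics", 3: "Mining", 4: "Mining"}

loose = precision_and_recall(predictions, truth, "Logistics", 0.5)
strict = precision_and_recall(predictions, truth, "Logistics", 0.9)
```

Raising the cut-off throws away half of the real Logistics deaths in this toy example, but everything that remains is correct, which is the better deal when all you want is a map of hotspots.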

Confusion Matrix

In machine learning this is the thing that tells you what the classifier is confusing with what. It’s a big square table, and for any medium to large job these turn into monstrosities. My project is quite small, and it’s already hard to follow.

So instead I came up with this quirky visualisation, entirely hand-drawn. Here each circle is the “true” category of the record. The colour is the way the classifier has categorised it. When a colour ‘bleeds’ from one circle to the next, that’s misclassification. I think it makes it easy to see what’s being misclassified at a glance.

For example, take a look at PvE — Sleepers. Almost half of that group is being miscategorised as ‘PvP generic’.

A Visualisation for confusion matrices. Much more convenient

Similarly, ‘PvP — Stealth’ is being miscategorised as ‘Scanning’. Both of the miscategorised groups fit similar ships in a similar manner.
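
For reference, the conventional square table that this section starts from takes only a few lines to build. A minimal sketch with placeholder category names:

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (categories, matrix); matrix[i][j] counts records whose true
    category is categories[i] and predicted category is categories[j]."""
    counts = Counter(zip(true_labels, predicted_labels))
    categories = sorted(set(true_labels) | set(predicted_labels))
    matrix = [[counts[(t, p)] for p in categories] for t in categories]
    return categories, matrix
```

Each row is one of the circles in the drawing: the diagonal cell is the part that keeps its own colour, and every other cell in the row is colour bleeding in from another category.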

Results

So what can we tell from the data we collected? I used the classifier to categorise a couple of million deaths, and visualised them on a map of the game world / galaxy.
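
The step from classified deaths to a map is just counting. A minimal sketch, assuming each death is reduced to a (solar system, category) pair and that the circle radius, rather than the area, scales with the count; the function name and the `max_radius` parameter are made up for illustration.

```python
from collections import Counter


def circle_sizes(deaths, category, max_radius=20.0):
    """deaths: iterable of (solar_system, category). Returns {system: radius},
    with the busiest system drawn at max_radius."""
    counts = Counter(system for system, cat in deaths if cat == category)
    if not counts:
        return {}
    largest = max(counts.values())
    return {system: max_radius * n / largest for system, n in counts.items()}
```

Each map below is this same count run for a different category.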

Mapping small gang warfare

If we map deaths of Logistics ships in the game world, we can tell where small, well-organised fleets fight their battles. That’s because the logistics, or ‘healing’, ships are useless in the biggest fleet battles, where the targeted players die too quickly, and they are rarely used by disorganised groups, since fielding them takes quite a bit of skill.

There are no small fleets on the far fringes of the galaxy.

This map is basically telling us that most small, organised wars happen in Low-security space, near edges of null-security space and at the heart of high security space. That makes intuitive sense.

Capital Ship routes

It would be nice to know capital ship transport routes. To map them, we can’t just map deaths of capital ships: they usually die in fleet battles, not in transport. The map wouldn't tell us anything.

Instead we can make a map of ‘Cyno’s. A ‘Cyno’ is something unique to Eve: ships equipped this way create a ‘beacon’, used by capital ships to jump from one system to another across large distances.
Lighting up the beacon renders the ship immobile, and the beacon is visible across the galaxy. Every passer-by will immediately empty their clip into it, so players usually use something cheap and disposable.

There is some noise in this image — not clear why so many cyno ships are being destroyed in Jita, apparently everything is being destroyed in Jita!?
But the overall picture is true — there are no cyno ships in high-security space because capital ships aren’t allowed there. Most of the traffic is in low-security space.

Destruction of Cargo

This map is very similar to the previous ones: the same systems act as junctions for all the cargo and traffic from high-security space and back.

Destruction of mining ships

This map probably doesn’t tell us anything profound

Capital Ships

Capital ships don’t die very often; it’s usually a large battle or a pilot’s stupidity that brings one down. The hotspots you see on the map below mark major battles in the game’s history.

The largest hotspots are battles of historic significance.

Other thoughts

Remember I said I wanted to classify two things? Well, for the life of me I can’t find the second classification I did. I’ve got all the data, but there are no charts or mention of it in my report. So I guess whatever happened to those results will stay a mystery, unless you, yes you, follow the GitHub link and try classifying the data yourself.

At first, I thought OP was on acid.
Then, I thought…That’s some good acid!
very cool.
- Doc Fury, some random eve player

Forum Post where I share the results

My post on EVE online forums with some discussion of results.

forums.eveonline.com

Github Repo with all the materials

Repo with some raw data, SQL queries, everything you need to replicate my work. Or at least the bits I could find.

github.com

Discussion of EVE game Economy by the developers

In EVE Vegas I presented a lot of data we've either rarely or never made public before. This also meant a lot of new…

community.eveonline.com

Original dev blog about the graphs I used

Alternate title: EVE Online Graph porn of 2013. Last year, for the presentation on the EVE Economy at Fanfest, I made a…