A detailed look into Pocket Media’s advertising fraud detection strategy.
Would you allow your marketing budget to be stolen by companies that only pretend to use your apps? I guess not. Fraud is a hot topic in the mobile ad industry. Why? Because it is widespread and takes somewhere between 5% and 20% of total mobile ad spend, which is growing into a 100+ billion dollar industry. If that money could be spent on attracting real users instead of paying companies that falsely claim a share, then protecting your business seems worthwhile. Right? At the same time, the relevance of ads for mobile users is becoming more important every day in the struggle to fill advertiser placements. That being said, what are we up against? In this series of blogs I will guide you through identifying several types of fraud. In this first article I take a deep dive into how to identify Install Fraud.
Types of Fraud
More than one attempt has been made to categorize types of fraud. At Pocket Media we differentiate three (commonly used) types that can be defined as:
- Install fraud: fake click, fake user
- Click fraud: fake click, genuine user
- Compliance fraud: genuine click, genuine user
In this blog I will do a deep dive into the ‘easiest’ of the three: Install fraud. In the following blogs I will explain more about Click fraud and Compliance fraud.
What is Install Fraud?
After clicking on an ad, you are redirected to an app store to download and install an app. The install is where the money is, so companies tend to tackle install fraud first. Let me give a simple explanation: install fraud is a tactic that imitates real human behavior while being 100% disingenuous. This form of fraud is designed to quickly scale a user base and drive up costs for the marketer. The ‘users’ are not genuinely interested in the app; the traffic is generated either by real humans with malicious intent or by bots.
From a technology perspective, installs are also easier to analyze than clicks because of the amount of data involved. With a tool like Excel you can still analyze installs (up to a maximum of about 1M rows), whereas clicks run to 10M+ per day, for which you will need to invest in other BI tools. If you want to apply (machine learning) algorithms to automate fraud alerts, you also need to invest in the knowledge and tools to get in control of your traffic. So let’s dig a little deeper into install fraud.
How to identify fake installs?
Now that we understand what install fraud is, let me show you how to detect it in your data. It helps to think the other way around: what would a real human do? When analyzing the data I always start by asking myself how a real human would respond to this ad. Not every user clicks the ad, downloads the app and keeps using it day in, day out. Still, there is such a thing as ‘standard/average’ behavior. Ask yourself: how many times have you clicked on an ad you were interested in but never installed or used the app? How many clicks it takes to get an install can be expressed as a conversion rate (CR) or clicks-to-install (CTI) rate. These rates can be used to define a baseline for real human behavior. Others in the industry have already pointed out two types of fake installs:
- App install farms: operations in which real traffic to an app is simulated by people who have no intention of using the app for its purpose.
- Bots and botnets: malicious software that infects devices in order to install apps and perform in-app activity.
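The conversion-rate baseline mentioned before this list can be sketched in a few lines of Python. This is only an illustration: it assumes CR is defined as installs divided by clicks, and the publisher ids, the baseline of 2% and the tolerance are hypothetical values, not Pocket Media figures.

```python
from collections import Counter

def conversion_rates(clicks, installs):
    """Return installs / clicks per publisher (None when there are no clicks)."""
    click_counts = Counter(clicks)
    install_counts = Counter(installs)
    rates = {}
    for publisher in set(click_counts) | set(install_counts):
        n_clicks = click_counts[publisher]
        rates[publisher] = install_counts[publisher] / n_clicks if n_clicks else None
    return rates

def flag_against_baseline(rates, baseline, tolerance):
    """Flag publishers whose CR differs from the baseline by more than tolerance."""
    return sorted(
        publisher for publisher, cr in rates.items()
        if cr is None or abs(cr - baseline) > tolerance
    )

# Hypothetical event logs: one publisher id per click and per install.
clicks = ["pub_a"] * 1000 + ["pub_b"] * 1000 + ["pub_c"] * 50
installs = ["pub_a"] * 20 + ["pub_b"] * 400 + ["pub_c"] * 1

rates = conversion_rates(clicks, installs)
# Assumed baseline: 2% of clicks convert; anything more than 3 points away is flagged.
suspicious = flag_against_baseline(rates, baseline=0.02, tolerance=0.03)
print(suspicious)  # ['pub_b']
```

A publisher converting 40% of its clicks is not proof of fraud, but it is far enough from human-looking behavior to deserve a closer look.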
So, how are we going to block this fraud? Let’s discuss four dimensions to look at to identify suspicious install traffic.
4 ways to identify Install Fraud:
1. IP address
IP addresses can be spoofed to hide where the traffic is actually coming from. To double-check whether an IP address is blacklisted or known in other databases, we run it through external software, which gives us more insight into the IP address used. Counting the installs per unique IP address shows us whether the same IP address was used for several campaigns. How probable is it that a real human installs several offers from the same IP address within a few minutes on the same day? Not very probable…
On a given day this could look like the figure below. The colors represent the publisher.
Fig 1. Number of Installs per IP address by Publisher per Campaign for 1 day. Colors represent Publisher and numbers the Installs per Campaign.
There are legitimate reasons for multiple installs per IP address per day, such as several users sharing one IP address (for example, everyone connected to the Wi-Fi in a library). So you need to dig a little deeper.
When checking the data behind the graph above we see that the same IP address was used for different campaigns within the same time window (a few minutes)! That is definitely not something a real human could achieve.
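As a rough sketch of this check, the Python below flags IP addresses that installed two or more different campaigns within a short window. The record layout (IP, campaign, timestamp), the five-minute window and the function name are my assumptions for illustration; the addresses come from the ranges reserved for documentation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def ips_with_rapid_multi_campaign_installs(installs, window=timedelta(minutes=5)):
    """Return IPs that installed 2+ different campaigns within `window`."""
    by_ip = defaultdict(list)
    for ip, campaign, ts in installs:
        by_ip[ip].append((ts, campaign))

    flagged = set()
    for ip, events in by_ip.items():
        events.sort()  # chronological order per IP
        for i, (ts_i, campaign_i) in enumerate(events):
            for ts_j, campaign_j in events[i + 1:]:
                if ts_j - ts_i > window:
                    break  # later events are even further away in time
                if campaign_j != campaign_i:
                    flagged.add(ip)
    return sorted(flagged)

# Hypothetical install log for one day.
installs = [
    ("203.0.113.7", "campaign_1", datetime(2017, 1, 10, 12, 0, 0)),
    ("203.0.113.7", "campaign_2", datetime(2017, 1, 10, 12, 1, 30)),
    ("203.0.113.7", "campaign_3", datetime(2017, 1, 10, 12, 3, 0)),
    ("198.51.100.4", "campaign_1", datetime(2017, 1, 10, 9, 0, 0)),
    ("198.51.100.4", "campaign_2", datetime(2017, 1, 10, 17, 45, 0)),
    ("192.0.2.15", "campaign_1", datetime(2017, 1, 10, 8, 0, 0)),
]

flagged_ips = ips_with_rapid_multi_campaign_installs(installs)
print(flagged_ips)  # ['203.0.113.7']
```

The second address also installed two campaigns on the same day, but hours apart, which is what you would expect from shared Wi-Fi rather than from a farm.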
Fraudsters are innovative, though. A more advanced setup wouldn’t reuse the same IP address but would use proxies to create diversity in the data.
To capture IPs coming from the same range, we split the IP address into its four parts and do another count on the first three parts.
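A minimal sketch of that split, assuming plain IPv4 addresses as strings. The helper names are hypothetical, and the example addresses come from the ranges reserved for documentation.

```python
from collections import defaultdict

def ip_range(ip):
    """Return the first three parts of an IPv4 address, e.g. '203.0.113'."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(parts[:3])

def installs_per_range(install_ips):
    """Map each IP range to (number of installs, number of distinct IPs)."""
    ips_by_range = defaultdict(list)
    for ip in install_ips:
        ips_by_range[ip_range(ip)].append(ip)
    return {r: (len(ips), len(set(ips))) for r, ips in ips_by_range.items()}

# Hypothetical list with one IP address per install.
install_ips = [
    "203.0.113.1",
    "203.0.113.2",
    "203.0.113.3",
    "203.0.113.3",
    "198.51.100.9",
]

report = installs_per_range(install_ips)
print(report["203.0.113"])   # (4, 3): four installs from three distinct addresses
print(report["198.51.100"])  # (1, 1)
```

Keeping both counts per range makes it possible to see ranges where no single address stands out but the range as a whole does.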
A report for this IP range looks like the graph below.
Fig 2. Installs per IP range by publisher for one day.
While the number of installs per IP range might not be alarming, the number of splits might be. We should also not forget that the number of unique IP addresses depends on the country’s policy and on how carriers handle them. The combination of both leads us to proxy-like traffic.
What also catches the eye is that some fraudsters try to ‘mask’ the IP range by skipping one or two ranges, to avoid being highlighted in reports that look for consecutive IP ranges.
Although I have seen some companies use the User Agent in combination with the IP address, I couldn’t use this as a major key to identify suspicious traffic. In theory it should provide a (better) fingerprint, but our data doesn’t back this up (or is mostly free of this kind of fraudulent traffic). If you did discover fraud with this technique, don’t leave it out of your analysis!
The user agent is becoming less unique because of the standardization of devices and the software running on them. Because Apple devices are less varied, the same IP and User Agent combination is more likely to recur.
2. Device
From a device perspective, we can tell a lot about the traffic. Just looking at the share of device operating systems and models (even versions) generates useful insights. This way we discovered that some networks were using a device model called AndyWin Emulator, clearly intended to fake a device and install an app from a desktop on a larger scale.
Another clear signal I found on Android is that, when installs are faked, the fraudsters try to cover this up by using many different device models, so no single model has a leading share. When I took another look at the OS version, it also seemed to be spoofed per device model. In the market we do see a trend of users staying on older versions, so compare against the stats per country to double-check. For Apple this is harder to track because all devices are stored as iPhone or iPad.
Here’s a graph that shows an abnormal spread of device models per country, campaign and publisher (on Android). Almost exactly the same device models have been used to fake installs across different countries. The publisher in blue represents the good traffic. All campaigns are comparable.
Fig 3. Installs in % of Total by Device Model by Country for four different Publishers running comparable campaigns.
Compare these device model shares with what the country shows on average. Bots are built to make each device look unique, which results in a huge variety of models and a flattened curve like in the graph above (except for blue). In this example the countries are too perfectly aligned with each other compared to their real shares per country. Try to identify the outliers in the difference between the observed device share and the average per country.
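One way to put a number on that difference is sketched below. It compares a publisher’s device model shares with a country baseline using the total variation distance, which is my choice of measure and not necessarily what we use in production. The baseline shares, model names and the 0.25 threshold are hypothetical; in practice the baseline has to come from an external source of device statistics per country.

```python
from collections import Counter

def model_shares(models):
    """Turn a list of device models (one per install) into shares of the total."""
    counts = Counter(models)
    total = sum(counts.values())
    return {model: n / total for model, n in counts.items()}

def share_deviation(publisher_shares, country_shares):
    """Total variation distance: 0 means identical shares, 1 means no overlap."""
    models = set(publisher_shares) | set(country_shares)
    return 0.5 * sum(
        abs(publisher_shares.get(m, 0.0) - country_shares.get(m, 0.0))
        for m in models
    )

# Hypothetical country baseline: a few models dominate the market.
country_shares = {"model_a": 0.5, "model_b": 0.3, "model_c": 0.2}

# Publisher 1 roughly follows the market; publisher 2 shows a flat spread.
pub_1 = ["model_a"] * 5 + ["model_b"] * 3 + ["model_c"] * 2
pub_2 = ["model_a", "model_b", "model_c", "model_d", "model_e"] * 2

THRESHOLD = 0.25  # assumed cutoff, tune it on your own data
for name, models in [("pub_1", pub_1), ("pub_2", pub_2)]:
    deviation = share_deviation(model_shares(models), country_shares)
    print(name, round(deviation, 2), "suspicious" if deviation > THRESHOLD else "ok")
```

With these made-up numbers the first publisher matches the baseline exactly, while the flat spread of the second publisher ends up well above the threshold.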
3. Time to Install (TTI)
A lot has already been said and written about the TTI: the time it takes from the click to installing and opening the app. It’s one of the key metrics for identifying fraudulent traffic, which might come in too fast or too slow. Additionally, you can store the app id and link it to the app store URL to look up the size of the app. In combination with the average connection speed per country, this gave us a way to highlight statistical outliers per country per campaign. An example of a TTI analysis could look like the graph below.
Although the TTI is an important factor for analysis, it will not identify install fraud by itself. In my next blog I will go into more detail about click fraud and how to use the TTI to identify it. Regarding what other articles have stated before: we see fraud happening on Apple too. I hope Apple is aware of this and is trying to close its OS to bots and the like, but we simply cannot assume that Apple devices are free from fraud.
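To illustrate the idea of highlighting TTI outliers per country per campaign, here is a small sketch. It uses a robust score based on the median and the median absolute deviation with the commonly used cutoff of 3.5; that method, the field names and the sample values are my assumptions for this example, and app size and connection speed are left out to keep it short.

```python
from collections import defaultdict
from statistics import median

def tti_outliers(installs, cutoff=3.5):
    """Return installs whose TTI is a robust outlier within its (country, campaign) group."""
    groups = defaultdict(list)
    for row in installs:
        groups[(row["country"], row["campaign"])].append(row)

    flagged = []
    for rows in groups.values():
        ttis = [r["tti_seconds"] for r in rows]
        med = median(ttis)
        mad = median(abs(t - med) for t in ttis)  # median absolute deviation
        if mad == 0:
            continue  # no spread in this group, so no score can be computed
        for r in rows:
            score = 0.6745 * (r["tti_seconds"] - med) / mad
            if abs(score) > cutoff:
                flagged.append(r)
    return flagged

# Hypothetical installs: TTI is the number of seconds between click and first open.
nl_ttis = [95, 110, 3, 120, 130, 86400, 140, 150]
us_ttis = [200, 210, 220]
installs = (
    [{"country": "NL", "campaign": "campaign_1", "tti_seconds": t} for t in nl_ttis]
    + [{"country": "US", "campaign": "campaign_1", "tti_seconds": t} for t in us_ttis]
)

outliers = tti_outliers(installs)
print([r["tti_seconds"] for r in outliers])  # [3, 86400]
```

Both the install that arrives three seconds after the click and the one that arrives a full day later stand out, which matches the ‘too fast or too slow’ pattern.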
4. In-app activity
To identify install fraud coming from human farms or bots/botnets, we should have a closer look at the in-app activity. It’s quite easy to set up goals for reaching a certain activity that you can trace and evaluate. In-app purchases or reaching a certain game level are easy starters. Farms will simulate the install (that’s where the money is in the short term) but will not bother keeping the app active. When bots automate in-app traffic, this mostly leads to almost perfectly stable behavior for a particular network/site compared with traffic from other sites. I would explicitly ask advertisers to share activities through second postbacks so that fraud can be identified earlier in the process. Pocket Media is connected to industry-known in-app analytics platforms like Appsflyer and TMC.
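Both patterns, installs that never reach a goal and goal rates that are too stable to be human, can be checked with a few lines of Python. Treat this as a sketch under assumptions: the daily (installs, goals reached) tuples stand in for what second postbacks would deliver, and the thresholds and publisher names are hypothetical.

```python
from statistics import pstdev

def goal_rate(installs, goals_reached):
    """Share of installs that reached the tracked in-app goal."""
    return goals_reached / installs if installs else 0.0

def review_publishers(daily_stats, min_rate=0.05, min_spread=0.005):
    """Return {publisher: [reasons]} for publishers with unusual goal behavior.

    daily_stats maps a publisher to one (installs, goals_reached) tuple per day.
    """
    findings = {}
    for publisher, days in daily_stats.items():
        daily_rates = [goal_rate(i, g) for i, g in days]
        overall = goal_rate(sum(i for i, _ in days), sum(g for _, g in days))
        reasons = []
        if overall < min_rate:
            reasons.append("low goal rate")
        if len(daily_rates) > 1 and pstdev(daily_rates) < min_spread:
            reasons.append("suspiciously stable")
        if reasons:
            findings[publisher] = reasons
    return findings

# Hypothetical four days of data per publisher.
daily_stats = {
    "pub_good": [(100, 12), (120, 20), (90, 9), (110, 18)],
    "pub_farm": [(200, 0), (180, 1), (220, 0), (210, 0)],
    "pub_bot": [(100, 10), (200, 20), (150, 15), (120, 12)],
}

findings = review_publishers(daily_stats)
for publisher in sorted(findings):
    print(publisher, findings[publisher])
```

In this made-up data the farm-like publisher is flagged for hardly ever reaching the goal, and the bot-like publisher for hitting exactly the same goal rate every day.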
So, this is just the beginning. Fraud is here and we all want to minimize it to improve ROI for our mobile ad campaigns. In this article I presented the metrics you can use to start identifying install fraud:
- Count the IP addresses that are used for several offers across your site/network to check for farm behavior
- Use device characteristics to detect unusual shares in app usage
- Generate an average Time to Install per campaign and highlight statistical deviations
- Use trackable goals to measure in-app activity and define KPIs for reaching those goals
As an industry we have the responsibility to tackle fraud. Together we can make that happen! In the next blog I will focus on click fraud and what to do about it. So stay tuned for more insights on fraud in the mobile ad space!
About the Writer
Ignas van den Einde is a Business Intelligence analyst and fraud specialist for Pocket Media. His passion is to create business value through innovation and efficiency with a creative, data-driven and realistic perspective. In his day-to-day practice he answers questions around business continuity. Besides that, he is also an indoor soccer lover, goes crazy when playing squash and enjoys relaxing when playing golf.
About Pocket Media
Pocket Media is a Mobile Media Agency specialized in Mobile Display and Video advertising. With an international team of mobile experts, we advise our clients on strategy, implementation and optimization.
Our many years of experience in Performance Based Advertising form the starting point of our 4 verticals: App Installs, Trading Desk, Native Ad Solution and Mobile Entertainment.