Predicting visa overstayers in the EU using AI

Using classification models to predict who will overstay a visa. This article was co-authored by Marco Mazzeschi, one of the leading immigration lawyers in Italy.

Controlling illegal immigration by focusing on asylum seekers who cross land borders ignores the real problem: people who overstay their visas (e.g. tourist, student or medical visas) or who remain after their residence or work permits have expired (European Commission, 2020).

Most irregular migrants originally entered the EU legally on short-stay visas, but remain in the EU for economic reasons once their visa has expired (European Commission).

Overstayers outnumber illegal migrants

Immigrants who enter the United States legally on student, tourist, or work visas and then stay past their visa’s expiration date have outnumbered those crossing the border illegally by a ratio of about 2 to 1 (K. Calamur, 2019). Elsewhere, the issue is even more pronounced. Most people who are in Britain illegally, for example, entered legally and simply stayed on after their visa expired (B. Vollmer, 2011).

Systematically identifying people who overstay is one of the Schengen area’s major challenges. The difficulty stems primarily from the absence of any system for recording entry and exit movements in Europe (European Commission). European countries are still unable to fully account for the flows of non-EU individuals who entered the EU legally and extended their stay without obtaining the necessary permits.

The Schengen Borders Code has no provisions on the recording of cross-border movements. The current procedure requires only that passports be stamped with dates of entry and exit. This is the sole method available to border guards or national Police when calculating whether a right to stay has been exceeded.

Is there a way to use AI for profiling immigrants? Can this be done in a way that avoids discrimination against persons on the grounds of sex, colour, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation?

Available Data

Data on migration flows in the EU are collected by Eurostat using a specific methodology (European Commission, 2016), with the following disaggregations:

  • Sex
  • Age
  • Citizenship
  • Status Withdrawn
  • Country of Residence
  • Decision for entering the country
  • Resettlement Framework

Is this data (“Data”) useful, and is it sufficient to predict whether an immigrant will become an “overstayer” after legally entering the EU?

How can AI be applied to immigration?

AI is a powerful tool for making predictions. Given historical data in numerical form, an AI model can find patterns in that data and specialize in making the same kind of prediction on new cases. With sufficient data, this tool can also be used to predict illegal immigration.

Here is how AI works, in very simple terms.

For any given dataset, you have to decide manually which part of the data acts as predictors (Features) and which part you want to predict (Labels).

Once those partitions are set, you simply pass them to a model, and the AI finds the rules by itself. In other words, the AI learns to predict the Labels from the Features.
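To make the Features and Labels idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the numbers are invented, the helper names `fit_stump` and `predict` are hypothetical, and the "model" is a deliberately simple one-rule learner rather than anything used in real migration research.

```python
# Toy, invented data: each row is (age, monthly_wage). Not real migration data.
features = [
    (23, 900), (35, 2400), (29, 1100), (41, 3100),
    (19, 700), (52, 2800), (31, 1000), (38, 2600),
]
# Labels to predict, one per row (hypothetical encoding: 0 or 1).
labels = [0, 1, 0, 1, 0, 1, 0, 1]


def fit_stump(X, y):
    """Learn the single rule 'feature[j] > threshold means 1' that fits y best."""
    best = None  # (number of correct predictions, feature index, threshold)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            correct = sum((row[j] > threshold) == bool(label)
                          for row, label in zip(X, y))
            if best is None or correct > best[0]:
                best = (correct, j, threshold)
    _, j, threshold = best
    return j, threshold


def predict(rule, row):
    """Apply the learned rule to one row of features."""
    j, threshold = rule
    return int(row[j] > threshold)


rule = fit_stump(features, labels)      # the "rule" the model found by itself
predictions = [predict(rule, row) for row in features]
print(rule)  # (0, 31): "age > 31" separates this toy data perfectly
```

Nobody wrote the rule by hand: the code only received Features and Labels and searched for the rule that maps one to the other. Real models search over far richer families of rules, but the division of labour is the same.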

There are different kinds of AI models. Results vary with the one chosen, and the final output can be more or less accurate.

Training an AI to predict overstayers

In the paper “Artificial Intelligence and Predicting Illegal Immigration to the USA”, researchers Azizi and Yektansani built a model (“Model”) able to estimate the probability of an individual overstaying in the US.

The Model takes into account all the Features (Sex, Age, N. Children, Wage, …, essentially the predictors) and takes Legal Status (0 = Undocumented, 1 = Legal Immigrant) as Labels (the value we want to predict).

To test the performance of the AI they developed, Azizi and Yektansani split the dataset in a 70:30 proportion. A random 70% of the data (4,396 samples) was used to train the AI to find the rules (“Rules”); the remaining 30% (1,885 samples) was used to test the Model.
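A random 70:30 split like this can be sketched in a few lines of plain Python. The function name `train_test_split`, the seed and the placeholder samples are assumptions for illustration; the paper's own splitting code is not shown here.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into a train and a test part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder samples: only the total count (4,396 + 1,885) comes from the article.
samples = list(range(4396 + 1885))
train, test = train_test_split(samples)
print(len(train), len(test))  # 4396 1885
```

The important property is that the test samples are never seen during training, so accuracy measured on them reflects how the model handles new cases.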

The AI found the Rules that map Features to Labels. The researchers then tested the AI on the remaining part of the data. Below is the chart showing the accuracy of different models.

After applying different classification models to find the best prediction, the researchers reached the 80% accuracy mark.
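Accuracy here simply means the share of test samples for which the predicted Label matches the true one. The sketch below shows how candidate models can be compared on that basis. The model names and all predictions are made up for illustration and do not reproduce the researchers' results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Invented true labels for ten held-out test samples.
y_test = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

# Invented predictions from two hypothetical models on those same samples.
candidate_predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
    "model_b": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # always predicts 1
}

scores = {name: accuracy(y_test, preds)
          for name, preds in candidate_predictions.items()}
best = max(scores, key=scores.get)
print(scores)  # {'model_a': 0.8, 'model_b': 0.6}
print(best)    # model_a
```

Note that `model_b` reaches 60% without learning anything, just by always predicting the most common Label. An accuracy figure is only meaningful when compared against such a baseline.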

How can the Model be improved?

To build the Model so that it can be used in a more effective and neutral way, the following improvements could be implemented:

More data: accumulating more data, in compliance with privacy rules, may help improve the accuracy of the Model. Data that could be useful to profile immigrants and assess the risk of a possible overstay include:

  • Personality Traits (OCEAN model) of the immigrant
  • Gini Coefficient of the country
  • Hofstede cultural dimensions

No discrimination: the Model should also be trained in a way that avoids bias and guarantees that the findings are not influenced by sex, race, religion, or ethnic profiling.
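One simple way to check for this kind of bias is to compare how often the Model flags people in different groups. The sketch below is an illustration and is not taken from the paper: the group names, the flags and the function `flag_rate_by_group` are all hypothetical, and a gap in flag rates is only one of several possible fairness measures.

```python
from collections import defaultdict


def flag_rate_by_group(groups, flagged):
    """Share of people flagged by the model within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, flag in zip(groups, flagged):
        totals[group] += 1
        positives[group] += flag
    return {group: positives[group] / totals[group] for group in totals}


# Invented example: a protected attribute and the model's output per person.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
flagged = [1, 0, 1, 0, 1, 1, 1, 0]  # 1 = flagged as likely overstayer (assumed encoding)

rates = flag_rate_by_group(groups, flagged)
gap = max(rates.values()) - min(rates.values())
print(rates)  # {'A': 0.5, 'B': 0.75}
print(gap)    # 0.25
```

A large gap does not prove discrimination on its own, but it is a signal that the Features may be acting as a proxy for a protected attribute and that the Model needs closer review.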


European Commission, 4th May 2020, Visa statistics: Schengen States issue 15 million visas for short stays in 2019, Retrieved from:

Schengen Borders Code, Retrieved from:

Seyed Soroosh Azizi, Kiana Yektansani, February 2020, Artificial Intelligence and Predicting Illegal Immigration to the USA, International Migration

VERSION 3.0 AMENDED IN FEBRUARY 2016, Eurostat, Retrieved from:

Krishnadev Calamur, April 19th 2019, The Real Illegal Immigration Crisis Isn’t on the Southern Border, The Atlantic, Retrieved from:

Dr. Bastian Vollmer, 11th July 2011, Irregular Migration in the UK: Definitions, Pathways and Scale, The Migration Observatory, Retrieved from:
