A couple of weeks ago the good people of IUU Watch published a post by Grant Humphries of Black Bawks Data Science that developed an algorithm aimed at vessel risk profiling. While it is aimed at supporting the EU IUU Regulation, I like to look beyond that (how many non-EU vessels unload in the EU?). Furthermore, if the idea is to profile the risk of a vessel based on the data in the Catch Certificate, then, the way things are at present, that vessel could have unloaded months or even years ago.
That said, I do like the design logic behind the algorithm, and I can see many uses for it at ports, particularly under the PSMA principles. I could also see it as a way to add relevance to a Vessel Compliance Index like the one run by FFA.
I paste below the text from the IUU Watch post, as written by Grant.
I present a sample application that shows how a relatively simple decision support tool could be built into the EU IT system referred to above, to assist in the selection of fishing vessels/seafood consignments for controls aimed at detecting illegal fishing. The example below combines the predictive power of Random Forests – a powerful machine learning algorithm – with the open source web application builder “R Shiny”. This decision support tool allows users to select a test fishing vessel with set parameters and predict its probability of engaging in illegal fishing. Users can also create their own series of parameters and predict the probability of a ship engaging in illegal fishing.
The test application can be found at: https://blackbawks.shinyapps.io/IUUFishing/.
The code and technical details for the application/simulation are available on GitHub: https://github.com/Blackbawks/IllegalFishing
Development of the application
The steps involved in developing the risk management application were as follows:
Step 1: Creation of a simulated dataset of ships and characteristics with pre-set relationships between illegal fishing and our simulated predictor variables (see below table)
Step 2: Training of the Random Forests (machine learning) algorithm
Step 3: Building of a web-based application in R Shiny that allows users to input data
Step 4: Use of the information in the trained Random Forests algorithm to predict the probability of the ship engaging in illegal fishing
In the real world, the dataset on which we would train the algorithm would be stored in a central, password- and firewall-protected database, which could be accessed through the web-based application. A proposed model could look like this
Step 1: The data
The scenario in the test application has:
- Five fictional countries: Sidonia, Avalon, Noordilund, Slagovnia, and Tortuga.
- Five fictional owners: SparkleFish, FishRGud, KungFuFish, ScummyFishCo and FishARRRies
- Five classes of ship (classed by length): 1 (60 – 100m), 2 (101 – 130m), 3 (131 – 170m), 4 (171 – 220m) and 5 (221 – 300m).
- Five possible destinations of goods: LaLaLand, BetaZed, The Shire, Alpha Centauri, and Kings Landing
- Five fish species: Raricus fishica, Commonae eatedie, Billidae nyiecus, Donaldus trumpfishii and Fishica maximus
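As a small illustration of the length classes listed above, the lookup below maps a ship's length to its class. It is a minimal sketch in Python (the original application is written in R); the names `SHIP_CLASSES` and `ship_class` are hypothetical, and lengths are assumed to be whole metres.

```python
# Class boundaries as listed in the scenario: (class, min length, max length) in metres.
SHIP_CLASSES = [
    (1, 60, 100),
    (2, 101, 130),
    (3, 131, 170),
    (4, 171, 220),
    (5, 221, 300),
]


def ship_class(length_m):
    """Return the class (1-5) for a ship length given in whole metres."""
    for cls, shortest, longest in SHIP_CLASSES:
        if shortest <= length_m <= longest:
            return cls
    raise ValueError(f"length {length_m} m is outside the simulated range (60-300 m)")


print(ship_class(192))
print(ship_class(83))
```

The two calls use the lengths of the example ships discussed later in the post (192m and 83m).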
I next simulated 3000 fictional ship IDs, with the assumption that the five countries have submitted all known data for their ships. These 3000 ships form the basis by which we “teach” our model (training) to “learn” the patterns / relationships.
With the 3000 ships, I created a data table with the following columns:
I simulated this dataset under 10 assumptions:
- Assumption 1) The largest (class 5) and smallest (class 1) vessels are slightly more likely to engage in illegal fishing. (Note: this helps to create a bimodal distribution of the ship sizes engaging in illegal activities, to demonstrate the non-parametric nature of the algorithm, i.e. it doesn’t depend on statistical distributions.)
- Assumption 2) “Responsible” countries with strong illegal fishing laws are less likely to engage in illegal fishing. In our dataset, Sidonia and Noordilund are countries with strong regulations, Avalon is in the middle, and Slagovnia and Tortuga have little or no regulation.
- Assumption 3) Companies with sustainable practices will almost never engage in illegal fisheries. In our example, SparkleFish and KungFuFish are the most sustainable, FishRGud is moderate, while ScummyFishCo and FishARRRies are the least sustainable.
- Assumption 4) Older fishing vessels are more likely to engage in illegal fisheries, as they are more likely to be used by organizations that cut costs by not prioritizing safety features (these organizations are likely to be more corrupt).
- Assumption 5) Raricus fishica is likely to be illegally caught the most, but Billidae nyiecus looks like another species, so we score it higher as there could be illegal fishing associated with it.
- Assumption 6) Species listed in CITES Appendix II are more likely to be associated with illegal fishing
- Assumption 7) If an owner has been flagged for illegal fishing in the past, this increases the likelihood a vessel is fishing illegally
- Assumption 8) If a country has been flagged for illegal fishing in the past, illegal fishing is more likely
- Assumption 9) If a ship has switched its trade route, it is more likely to be fishing illegally
- Assumption 10) If a ship has not switched on its AIS, it is more likely to be fishing illegally
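To make the simulation idea concrete, here is a minimal sketch in Python (the original simulation is in R and is available in the GitHub repository above). It covers only Assumptions 1, 2, 3, 4, 5 and 10. Every weight, the intercept, the 1970-2016 build-year range, the 80% AIS rate and the column names are hypothetical values chosen for illustration; they are not the values used in the actual application.

```python
import math
import random

# Hypothetical risk weights (log-odds scale); positive values raise the chance of illegal fishing.
COUNTRY_RISK = {"Sidonia": -1.0, "Noordilund": -1.0, "Avalon": 0.0,
                "Slagovnia": 1.0, "Tortuga": 1.0}                      # Assumption 2
OWNER_RISK = {"SparkleFish": -1.5, "KungFuFish": -1.5, "FishRGud": 0.0,
              "ScummyFishCo": 1.5, "FishARRRies": 1.5}                 # Assumption 3
CLASS_RISK = {1: 0.5, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.5}                  # Assumption 1
SPECIES_RISK = {"Raricus fishica": 1.0, "Billidae nyiecus": 0.5,
                "Commonae eatedie": 0.0, "Donaldus trumpfishii": 0.0,
                "Fishica maximus": 0.0}                                # Assumption 5


def simulate_ship(rng, ship_id):
    """Draw one fictional ship and a noisy 'illegal' flag."""
    ship = {
        "ship_id": ship_id,
        "country": rng.choice(sorted(COUNTRY_RISK)),
        "owner": rng.choice(sorted(OWNER_RISK)),
        "ship_class": rng.choice(sorted(CLASS_RISK)),
        "species": rng.choice(sorted(SPECIES_RISK)),
        "year_built": rng.randint(1970, 2016),   # hypothetical range
        "ais_on": rng.random() < 0.8,            # hypothetical share with AIS on
    }
    score = -0.75                                # hypothetical intercept
    score += COUNTRY_RISK[ship["country"]] + OWNER_RISK[ship["owner"]]
    score += CLASS_RISK[ship["ship_class"]] + SPECIES_RISK[ship["species"]]
    score += 0.03 * (1995 - ship["year_built"])  # Assumption 4: older ships score higher
    if not ship["ais_on"]:
        score += 1.0                             # Assumption 10
    score += rng.gauss(0.0, 1.0)                 # deliberate noise, so no model can be perfect
    probability = 1.0 / (1.0 + math.exp(-score))
    ship["illegal"] = rng.random() < probability
    return ship


def simulate_fleet(n_ships=3000, seed=42):
    """Simulate the training table as a list of dictionaries, one per ship."""
    rng = random.Random(seed)
    return [simulate_ship(rng, ship_id) for ship_id in range(1, n_ships + 1)]


def illegal_rate(fleet, owner):
    """Share of an owner's ships flagged as illegal."""
    flags = [ship["illegal"] for ship in fleet if ship["owner"] == owner]
    return sum(flags) / len(flags)


fleet = simulate_fleet()
for owner in sorted(OWNER_RISK):
    print(owner, round(illegal_rate(fleet, owner), 2))
```

Because the relationships are built in by hand, a model trained on such a table can be checked against known "truth", which is the point of simulating the data first.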
Step 2: The analysis
Note: in this case, I was not interested in testing the hyperparameters of Random Forests (i.e. the settings that help tune the algorithm), so I left these at their default settings.
Random Forests works by way of decision trees (i.e. a souped-up series of conditional “if” / “then” statements) to make predictions on a target variable. It creates those conditional statements by “learning” the relationships between the target variable (here, illegal fishing) and the predictor variables (the variables we want to use to predict the target – see table above). Using the data simulated in Step 1, we used the “Illegal” column as our target variable – in other words, we were interested in predicting if a ship was engaged in illegal activity based on the other columns (owner, country, etc…).
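The "if / then" idea can be shown with a toy example. The sketch below is in Python and is not the trained model: the three "trees" are written by hand, whereas a real Random Forest learns its splits from bootstrap samples of the training data and uses many more trees. The function names, the split rules and the dictionary keys (including the 1980 cut-off) are hypothetical.

```python
def tree_owner(ship):
    # A one-split tree: vote "illegal" (1) for the least sustainable owners.
    if ship["owner"] in ("ScummyFishCo", "FishARRRies"):
        return 1
    return 0


def tree_country_ais(ship):
    # A two-split tree: weak regulation first, then AIS status.
    if ship["country"] in ("Slagovnia", "Tortuga"):
        return 1
    if not ship["ais_on"]:
        return 1
    return 0


def tree_species_age(ship):
    # A two-split tree: the most at-risk species first, then vessel age.
    if ship["species"] == "Raricus fishica":
        return 1
    if ship["year_built"] < 1980:   # hypothetical cut-off
        return 1
    return 0


FOREST = [tree_owner, tree_country_ais, tree_species_age]


def forest_probability(ship):
    """Share of trees voting 'illegal', read as a probability."""
    votes = [tree(ship) for tree in FOREST]
    return sum(votes) / len(votes)


example_ship = {"owner": "FishRGud", "country": "Tortuga",
                "species": "Commonae eatedie", "year_built": 2001, "ais_on": True}
print(forest_probability(example_ship))
```

Each tree casts a vote and the forest reports the share of "illegal" votes. With hundreds of learned trees instead of three hand-written ones, that share becomes a fine-grained probability.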
We used a cross-validation technique to ensure the model was predicting our data well. In this case, the Random Forests model we used had an accuracy of 74% – that means that it correctly guessed if a ship was engaged in illegal fishing (or not), 74% of the time. This value could be vastly improved through tuning of the model (e.g. tuning of hyperparameters, use of ensemble models, deep learning methodology, or other techniques). I purposefully programmed “noise” in our dataset to ensure that we didn’t achieve a perfect model. The goal of this is to demonstrate how the proposed system could work as opposed to perfecting the model.
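For readers unfamiliar with the terms, the sketch below shows how accuracy and cross-validation fit together. It is written in Python and assumes k-fold cross-validation, which is only one common variant; the post does not say which technique was used. All function names are hypothetical, and the "model" is a deliberately trivial majority-class guesser rather than a Random Forest.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) so each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, folds[held_out]


def cross_validated_accuracy(rows, labels, fit, k=5, seed=0):
    """Average held-out accuracy; fit(rows, labels) must return a predict(row) function."""
    scores = []
    for train, test in k_fold_splits(len(rows), k, seed):
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        guesses = [predict(rows[i]) for i in test]
        scores.append(accuracy([labels[i] for i in test], guesses))
    return sum(scores) / len(scores)


def fit_majority(train_rows, train_labels):
    """Placeholder model: always predict the most common training label."""
    majority = 2 * sum(train_labels) >= len(train_labels)
    return lambda row: majority


rows = list(range(20))                 # stand-in for ship records
labels = [i % 4 == 0 for i in rows]    # 5 of 20 flagged as illegal
print(cross_validated_accuracy(rows, labels, fit_majority, k=5))
```

The key point is that each ship is scored by a model that never saw it during training, so the reported accuracy reflects how the model handles new ships rather than how well it memorized old ones.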
Step 3: User input and prediction
On the front end, the user is given the option to select from one of five ships, which fills in the pertinent data like “owner”, “country”, “ship length”, etc. For example, the “Christian Bale” is a ship owned by ScummyFishCo and registered in Slagovnia. She is 192m long, making her a class 4 ship. The ship was built in 1975 and normally sends product to LaLaLand. If that ship comes into port and the user tells the front end that this ship was catching Raricus fishica, that the shipment is being sent to Kings Landing, and that the AIS has been active since the ship was last in port, we find that the probability of this ship engaging in illegal fishing is 0.93 (93%). In this case, we would likely board the ship for inspection.
Another ship a user could pick in our application is the Bruce Lee. She is owned by KungFuFish and registered in Noordilund. The ship is 83m long, making her a class 1, and was built in 2014, normally shipping to LaLaLand. If the Bruce Lee returns from an excursion with Commonae eatedie being shipped to LaLaLand, and had her AIS on, the probability of the ship engaging in illegal fishing would be 0.03 (3%), so we would be unlikely to inspect the ship as thoroughly.
Step 4: Using the information
The real question is which threshold we use to decide whether or not to inspect a ship. For example, if the probability is 51%, do we board? The precautionary principle would suggest we do, but this could increase the number of inspections to a level that may not be commercially viable. One school of thought could be to only inspect ships with a very high likelihood of engaging in illegal fishing (e.g. 80% or more).
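The trade-off can be seen in a few lines of Python. This is a minimal sketch with hypothetical function names. The probabilities 0.93, 0.03 and 0.51 are the values mentioned in the text; 0.78 and 0.64 are made-up arrivals added for illustration.

```python
def should_inspect(probability, threshold=0.8):
    """Board the ship when the predicted probability reaches the threshold."""
    return probability >= threshold


def inspections_needed(probabilities, threshold):
    """Count how many arriving ships would be boarded at a given threshold."""
    return sum(1 for p in probabilities if should_inspect(p, threshold))


arrivals = [0.93, 0.03, 0.51, 0.78, 0.64]

for threshold in (0.5, 0.8):
    print(threshold, inspections_needed(arrivals, threshold))
```

Lowering the threshold catches more borderline cases but multiplies the inspection workload; raising it does the opposite. The tool supplies the probability, while the threshold remains a policy choice.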
No matter what approach is taken, decision support tools that take advantage of sophisticated algorithms are showing great promise. Using them to combat illegal fishing will automate decision-making in a transparent way that can be scaled from local to global solutions. Furthermore, data integrity can be secured through centralized databases with specifically designed access.
There is still much work to be done to develop these tools in a way that is agreed upon by the global community, but we are at a stage now to begin the process.