websites visited, and the like.
Given the difficulty nowadays of doing almost anything without leaving an electronic trace, the challenge in link analysis is usually not one of having insufficient data, but rather of deciding which of the megabytes of available data to select for further analysis. Link analysis works best when backed up by other kinds of information, such as tips from police informants or from neighbors of possible suspects.
Once an initial link analysis has identified a possible criminal or terrorist network, it may be possible to determine who the key players are by examining which individuals have the most links to others in the network.
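The idea of counting links can be made concrete with a small sketch. The names and links below are invented for illustration, and the function `rank_by_links` is a hypothetical helper, not part of any police system. It simply counts how many distinct people each individual is linked to and lists the best-connected first.

```python
def rank_by_links(links):
    """Rank individuals by the number of distinct people they are linked to."""
    neighbors = {}
    for a, b in links:
        if a == b:
            continue  # ignore a link from a person to themselves
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {person: len(others) for person, others in neighbors.items()}
    # Most links first; ties broken alphabetically so the order is stable.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical network: each pair is one observed link (a call, a meeting, etc.).
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
ranking = rank_by_links(links)
print(ranking[0])  # the individual with the most links
```

A raw link count is only a first approximation. A person with few links may still be important if they are the only connection between two parts of the network.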
GEOMETRIC CLUSTERING
Because of resource limitations, law enforcement agencies generally focus most of their attention on major crime, with the result that minor offenses such as shoplifting or house burglaries get little attention. If, however, a single person or an organized gang commits many such crimes on a regular basis, the aggregate can constitute significant criminal activity that deserves greater police attention. The problem facing the authorities, then, is to identify, within the large numbers of minor crimes that take place every day, clusters that are the work of a single individual or gang.
One example of a "minor" crime that is often carried out on a regular basis by two (and occasionally three) individuals acting together is the so-called bogus official burglary (or distraction burglary). This is where two people turn up at the front door of a homeowner (elderly people are often the preferred targets) posing as officials of some kind (perhaps telephone engineers, representatives of a utility company, or local government agents) and, while one person secures the attention of the homeowner, the other moves quickly through the house or apartment taking any cash or valuables that are easily accessible.
Victims of bogus official burglaries often file a report with the police, who will send an officer to the victim's home to take a statement. Since the victim will have spent considerable time with one of the perpetrators (the distracter), the statement will often include a fairly detailed description (gender, race, height, body type, approximate age, general facial appearance, eyes, hair color, hair length, hair style, accent, identifying physical marks, mannerisms, shoes, clothing, unusual jewelry, etc.), together with the number of accomplices and their genders. In principle, this wealth of information makes crimes of this nature ideal for data mining, and in particular for the technique known as geometric clustering, to identify groups of crimes carried out by a single gang. Application of the method is, however, fraught with difficulties, and to date the method appears to have been restricted to one or two experimental studies. We'll look at one such study, both to show how the method works and to illustrate some of the problems often faced by the data-mining practitioner.
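Before turning to the study, the basic idea of grouping similar offender descriptions can be sketched in a few lines. This is a deliberately simplified stand-in, not the method used by the researchers. It assumes each description has already been reduced to a handful of named attributes, measures the distance between two descriptions as the fraction of attributes on which they disagree, and places two descriptions in the same group whenever a chain of sufficiently close descriptions connects them (single-linkage grouping). The attribute names, values, and threshold are all invented for illustration.

```python
def mismatch_fraction(d1, d2):
    """Distance between two descriptions: share of common attributes that differ."""
    shared = [key for key in d1 if key in d2]
    if not shared:
        return 1.0  # nothing to compare, so treat as maximally different
    return sum(d1[key] != d2[key] for key in shared) / len(shared)


def group_similar(descriptions, threshold):
    """Group descriptions connected by chains of pairs no farther apart than threshold."""
    n = len(descriptions)
    parent = list(range(n))  # union-find forest over description indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if mismatch_fraction(descriptions[i], descriptions[j]) <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical structured descriptions of four distracters.
descriptions = [
    {"height": "tall", "hair": "blonde", "age": "30s", "accent": "local"},
    {"height": "tall", "hair": "blonde", "age": "30s", "accent": "irish"},
    {"height": "short", "hair": "dark", "age": "50s", "accent": "local"},
    {"height": "short", "hair": "dark", "age": "50s", "accent": "local"},
]
clusters = group_similar(descriptions, threshold=0.25)
print(clusters)  # lists of description indices that were grouped together
```

The choice of threshold matters a great deal: set it too low and descriptions of the same person given by different victims fall into separate groups, set it too high and unrelated offenders are merged.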
The following study was carried out in England in 2000 and 2001 by researchers at the University of Wolverhampton, together with the West Midlands Police.* The study looked at victim statements from bogus official burglaries in the police region over a three-year period. During that period, there were 800 such burglaries recorded, involving 1,292 offenders. This proved to be too great a number for the resources available for the study, so the analysis was restricted to those cases where the distracter was female, a group comprising 89 crimes and 105 offender descriptions.
The first problem encountered was that the descriptions of the perpetrators were for the most part in narrative form, as written by the investigating officer who took the statement from the victim. A data-mining technique known as text mining had to be used to put the descriptions into a structured form. Because of the limitations of the text-mining software available, human input