In February 2015, Kaspersky Lab released a report (PDF) detailing its investigation into the Equation Group, an extremely sophisticated hacker group engaged in espionage. Many experts suspect that the United States' NSA is behind Equation Group because of keywords found in malware the group has produced. In this article, I will use a different approach to produce evidence for attribution. In addition to its initial report, Kaspersky released data on the dates when Equation Group compiled its pieces of malware. These timestamps fall almost exclusively during the working week and appear to follow a 9:00-to-5:00 schedule. Assuming that Equation Group is operated by a state actor (a government), we can correlate these dates with national holidays to identify countries that are more or less likely to be responsible.
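As a rough illustration of the idea, the sketch below counts how many compile timestamps fall on each candidate country's public holidays; a country whose holidays are almost never "worked" is more consistent with the data. The timestamps, the country labels `A` and `B`, and the holiday dates are all made up for the example, and the helper names are hypothetical. It also assumes the timestamps are already in the relevant local time zone, which the real data would not guarantee.

```python
from datetime import date, datetime

# Hypothetical compile timestamps; the real ones come from Kaspersky's data release.
compile_times = [
    datetime(2008, 7, 3, 14, 12),
    datetime(2008, 7, 7, 10, 5),
    datetime(2008, 12, 24, 9, 30),
    datetime(2008, 10, 1, 16, 45),
    datetime(2008, 7, 5, 11, 0),   # a Saturday
]

# Hypothetical, deliberately tiny holiday calendars for two made-up countries.
holidays = {
    "A": {date(2008, 7, 4), date(2008, 12, 25)},
    "B": {date(2008, 10, 1), date(2008, 7, 7)},
}

def is_working_hours(ts, start_hour=9, end_hour=17):
    """True if ts falls on Monday-Friday between start_hour and end_hour."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour

def holiday_hits(timestamps, calendar):
    """Number of timestamps whose calendar date is a holiday."""
    return sum(1 for ts in timestamps if ts.date() in calendar)

# Share of samples compiled during a 9:00-to-5:00 working week.
working_share = sum(is_working_hours(ts) for ts in compile_times) / len(compile_times)

# Fewer hits means the compile dates avoid that country's holidays.
hits = {country: holiday_hits(compile_times, cal) for country, cal in holidays.items()}

print(f"working-hours share: {working_share:.2f}")
for country, n in sorted(hits.items(), key=lambda item: item[1]):
    print(country, n)
```

With real data, the raw counts would need to be compared against what chance alone would produce, since countries with more holidays will collect more hits regardless of who compiled the malware.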

So, somehow you managed to find a lot of data to which you want to fit a linear model. The problem is that the data contain several different categories, and you forgot to record which observation falls into which category. Oops. This is especially bad because the relationship between your dependent and independent variables changes across categories. How do we solve this?
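The teaser does not say which method the post goes on to use. One standard way to handle unrecorded categories is a finite mixture of linear regressions fitted with expectation-maximization (EM), sketched below under that assumption. The function name, the simulated data, and the choice of two components are all hypothetical, and EM can settle in a local optimum, so in practice it is worth rerunning from several seeds.

```python
import numpy as np

def fit_mixture_of_regressions(x, y, k=2, n_iter=100, seed=0):
    """EM for a k-component mixture of simple linear regressions (a sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    X = np.column_stack([np.ones(n), x])        # design matrix with intercept
    resp = rng.dirichlet(np.ones(k), size=n)    # random soft assignments, shape (n, k)
    coefs = np.zeros((k, 2))                    # one (intercept, slope) row per component
    sigmas = np.ones(k)                         # residual standard deviations
    weights = np.full(k, 1.0 / k)               # mixing proportions

    for _ in range(n_iter):
        # M-step: weighted least squares for each component.
        for j in range(k):
            w = resp[:, j]
            XtW = X.T * w
            coefs[j] = np.linalg.solve(XtW @ X + 1e-8 * np.eye(2), XtW @ y)
            resid = y - X @ coefs[j]
            sigmas[j] = np.sqrt((w * resid**2).sum() / (w.sum() + 1e-12)) + 1e-6
            weights[j] = max(w.mean(), 1e-12)

        # E-step: posterior probability of each component for each point,
        # computed in log space (constants shared by all components dropped).
        resid = y[:, None] - X @ coefs.T        # shape (n, k)
        log_p = np.log(weights) - np.log(sigmas) - 0.5 * (resid / sigmas) ** 2
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

    return coefs, sigmas, weights, resp

# Simulated data: two hidden categories with different linear relationships.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
labels = rng.integers(0, 2, 200)                # the "forgotten" categories
y = np.where(labels == 0, 1 + 2 * x, 20 - 1.5 * x) + rng.normal(0, 0.5, 200)

coefs, sigmas, weights, resp = fit_mixture_of_regressions(x, y)
print(coefs)                                    # fitted (intercept, slope) per component
```

Each row of `resp` gives the estimated probability that a point belongs to each component, which serves as a soft reconstruction of the missing category labels.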

Why do arms embargoes fail? Despite their frequent use by international organizations like the United Nations and the European Union, arms embargoes suffer from a poor record of success. For half a century now, multilateral arms embargoes have been the primary tool used to fight the proliferation of small arms and light weapons (SALW) to conflict zones and perpetrators of mass violence. These agreements between countries prohibit the sale of weapons to a particular target country (or sometimes a target organization). However, official reviews and academic studies alike tend to conclude that small arms are still making their way to embargoed actors.

GDELT and ICEWS are arguably the largest event data collections in social science at the moment. During their brief existence they have also been among the most influential data sets in terms of their impact on academic research and policy advice. Yet we still know little about how these two repositories of event data compare to each other. Because both are so new, a systematic comparison of the two is worthwhile.