Politifact is a website that rates the truthfulness of public statements. It generally focuses on statements by political leaders, but it also tackles celebrities, chain emails, and bloggers from time to time. Statements are rated on a six-point scale: “Pants on Fire!,” “False,” “Mostly False,” “Half-True,” “Mostly True,” and “True.” The script below will scrape Politifact and save every statement in a gzipped CSV file. Each record has five fields: who, when, validity, subjects, and statement.
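To make the output format concrete, here is a minimal sketch of writing and reading a gzipped CSV with the five fields named above. It is not the scraper itself: the function names `write_statements` and `read_statements` are hypothetical, and the column order simply follows the order in which the fields are listed.

```python
import csv
import gzip

# Column order follows the description: who, when, validity, subjects, statement.
FIELDS = ["who", "when", "validity", "subjects", "statement"]


def write_statements(rows, path):
    """Write an iterable of dicts (keyed by FIELDS) to a gzipped CSV file."""
    # Text mode ("wt") lets the csv module write straight into the gzip stream.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_statements(path):
    """Read the gzipped CSV back into a list of dicts."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Using `csv.DictWriter` keeps quoting correct when a statement contains commas or quotation marks, which is likely for free-text claims.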

China’s ruling elite is vast, and accurate biographies of its members are notoriously difficult to obtain, especially in a structured form. China Vitae provides a large database of information about Chinese political leaders but offers no clear way of downloading those biographies for use in quantitative studies. Here, I offer a simple R script that scrapes this data from China Vitae and produces a CSV chronicling the careers of thousands of Chinese leaders.

Occasionally in political science, we run into problems in which we have a small dataset and a large array of possible predictors. Choosing a parsimonious model can be difficult. When theory-based model selection is out of the question, automated variable selection allows us to estimate the probability that each predictor is included in the “true model.” We can then use these estimates to prune our model. Here, I describe a Bayesian ordered probit regression model with stochastic search variable selection (SSVS).

So, somehow you managed to find a lot of data to which you want to fit a linear model. The problem is that there are several different categories in the data and you forgot to record which observation falls into which category. Oops. This is especially bad because the relationship between your dependent and independent variables changes across categories. How do we solve this?
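One common way to approach this kind of problem is a finite mixture of linear regressions fitted with expectation-maximization (EM). The sketch below is an assumption about how one might proceed, not necessarily the method the full post uses. The function name `fit_mixture_of_regressions`, the Gaussian error model, the number of restarts, and the small numerical floors are all illustrative choices.

```python
import numpy as np


def fit_mixture_of_regressions(X, y, k=2, n_iter=200, n_restarts=5, seed=0):
    """EM for a k-component mixture of Gaussian linear regressions.

    X is an (n, p) design matrix (include a column of ones for an intercept).
    Returns (beta, sigma, pi, resp) from the restart with the highest
    log-likelihood, evaluated at that restart's final E-step.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best = None
    for _ in range(n_restarts):
        # Random starting coefficients; a common scale and equal weights.
        beta = rng.normal(size=(k, p))
        sigma = np.full(k, y.std())
        pi = np.full(k, 1.0 / k)
        for _ in range(n_iter):
            # E-step: posterior probability that each point belongs to each line.
            resid = y[:, None] - X @ beta.T
            log_dens = (-0.5 * (resid / sigma) ** 2 - np.log(sigma)
                        - 0.5 * np.log(2 * np.pi) + np.log(pi))
            log_norm = np.logaddexp.reduce(log_dens, axis=1)
            resp = np.exp(log_dens - log_norm[:, None])
            # M-step: weighted least squares and weighted residual scale per component.
            for j in range(k):
                w = resp[:, j]
                Xw = X * w[:, None]
                beta[j] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(p), Xw.T @ y)
                r = y - X @ beta[j]
                sigma[j] = np.sqrt((w * r ** 2).sum() / (w.sum() + 1e-12)) + 1e-3
            pi = np.clip(resp.mean(axis=0), 1e-12, None)
        loglik = log_norm.sum()
        if best is None or loglik > best[0]:
            best = (loglik, beta, sigma, pi, resp)
    return best[1:]
```

The responsibilities in `resp` play the role of the missing category labels: taking `resp.argmax(axis=1)` gives a hard assignment of each observation to a category. EM can settle in a local optimum, which is why the sketch keeps the best of several random restarts.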

In February 2015, Kaspersky Lab released a report (PDF) detailing its investigation into the Equation Group, an extremely sophisticated hacker group engaged in espionage. Many experts suspect the United States’ NSA to be behind Equation Group due to keywords identified in malware the group has produced. In this article, I will use a different approach to produce evidence for attribution. In addition to its initial report, Kaspersky released data on the dates that pieces of malware were compiled by Equation Group. These timestamps fall almost exclusively during the working week and appear to follow a 9:00 to 5:00 schedule. Assuming that Equation Group is operated by a state actor (government), we can correlate these dates with holidays to identify countries that are more or less likely to be responsible.
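The core of the holiday comparison can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the analysis from the article: the function names `holiday_hit_rate` and `rank_countries` are hypothetical, holiday calendars are assumed to be supplied by the caller as sets of dates, and the score is just the share of weekday compile dates that land on a country's public holidays. A lower share means the timestamps are more consistent with that country's working calendar.

```python
from datetime import date


def holiday_hit_rate(compile_dates, holidays):
    """Share of weekday compile dates that fall on one country's holidays."""
    # Weekends are dropped so they do not dilute the comparison.
    weekdays = [d for d in compile_dates if d.weekday() < 5]
    if not weekdays:
        return 0.0
    hits = sum(1 for d in weekdays if d in holidays)
    return hits / len(weekdays)


def rank_countries(compile_dates, holidays_by_country):
    """Sort countries from most to least consistent with the timestamps."""
    rates = {
        country: holiday_hit_rate(compile_dates, holidays)
        for country, holidays in holidays_by_country.items()
    }
    # Fewer compilations on public holidays ranks a country earlier.
    return sorted(rates.items(), key=lambda item: item[1])
```

A fuller treatment would compare each rate against the share expected by chance, since countries with more public holidays will see more hits even if the compile dates are unrelated to their calendar.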

Why do arms embargoes fail? Despite their frequent use by international organizations like the United Nations and the European Union, arms embargoes suffer from a poor record of success. For half a century now, multilateral arms embargoes have been the primary tool used to fight the proliferation of small arms and light weapons (SALW) to conflict zones and perpetrators of mass violence. These agreements between countries prohibit the sale of weapons to a particular target country (or sometimes a target organization). However, official reviews and academic studies alike tend to conclude that small arms are still making their way to embargoed actors.

GDELT and ICEWS are arguably the largest event data collections in social science at the moment. During their brief existence they have also been among the most influential data sets in terms of their impact on academic research and policy advice. Yet, we know little to date about how these two repositories of event data compare to each other. Given how recent both GDELT and ICEWS are, a direct comparison of the two is worthwhile.