Downloading data

The data used in the paper Launch Hard or Go Home! Predicting the Success of Kickstarter Campaigns can be downloaded here:

kickstarter-etter-cosn2013.tar.gz

Readme

These files contain the data used in the paper: 
Vincent Etter, Matthias Grossglauser, Patrick Thiran. Launch Hard or Go Home! Predicting the Success of Kickstarter Campaigns. Proceedings of COSN'13: Conference on Online Social Networks, 2013.

If you use it for your research, please mention the source.

For an example of how to use these files, please run:
python example.py .

Note that you need Python >= 2.7 and NumPy to run this script.
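If you prefer to load the files yourself, the following is a minimal sketch and not the contents of example.py. The function name `load_dataset` is made up for illustration, and the `encoding="latin1"` argument is an assumption: it is commonly needed when a pickle written by Python 2 is read under Python 3, and it only works on Python 3.

```python
import os
import pickle

import numpy as np


def load_dataset(data_dir):
    """Load the four files described in this readme from data_dir."""
    data = {"projects": np.load(os.path.join(data_dir, "projects.npy"))}
    for name in ("statuses", "tweets", "graph"):
        with open(os.path.join(data_dir, name + ".pkl"), "rb") as f:
            # Assumption: the pickles were written by Python 2, so Python 3
            # needs encoding="latin1" to decode them. Drop it if not needed.
            data[name] = pickle.load(f, encoding="latin1")
    return data
```

Calling `load_dataset(".")` from the extracted directory would return a dictionary with the keys `projects`, `statuses`, `tweets` and `graph`.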

# Description of the files

## projects.npy

Contains a 2D NumPy array, where each row corresponds to a project.

The fields are the following:
1. Project ID (attributed by Kickstarter)
2. Project goal
3. Project final state (1=successfully funded, 0=failed)
4. Launch date (as a UNIX timestamp)
5. Deadline (as a UNIX timestamp)
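
As a rough illustration of this layout, the sketch below splits a projects array into named columns and derives the campaign duration. The column order follows the list above; the helper name `split_project_fields` and the two toy rows are invented for the example and are not real data.

```python
import numpy as np

# Column order as listed above.
ID, GOAL, STATE, LAUNCH, DEADLINE = range(5)


def split_project_fields(projects):
    """Return named 1D columns from an (N, 5) projects array."""
    projects = np.asarray(projects)
    return {
        "id": projects[:, ID],
        "goal": projects[:, GOAL],
        "successful": projects[:, STATE] == 1,
        # UNIX timestamps are in seconds, so divide by 86400 to get days.
        "duration_days": (projects[:, DEADLINE] - projects[:, LAUNCH]) / 86400.0,
    }


# Toy rows, not real data. With the real file: projects = np.load("projects.npy")
projects = np.array([
    [101, 5000, 1, 1350000000, 1352592000],
    [102, 20000, 0, 1350000000, 1355184000],
])
fields = split_project_fields(projects)
success_rate = fields["successful"].mean()
```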

## statuses.pkl

Contains a 3D array holding 1000 uniformly spaced samples of the status of each campaign.
The dimensions are thus N x 1000 x 3, where N is the number of campaigns and the three fields of each sample are:
1. Time of the sample (between 0 and 1, relative to the start and end of the campaign)
2. Current amount of pledged money (relative to the campaign's goal, so multiply by the goal to obtain the real amount)
3. Current number of backers (usually not an integer, because the statuses have been resampled so that every campaign has the same number of samples)
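
The relative values can be turned back into absolute ones with a little broadcasting. The sketch below is one way to do it; the function name `absolute_statuses` is hypothetical, the toy values are invented, and it assumes that the campaigns in the statuses array are in the same order as the rows of projects.npy, which this readme does not state.

```python
import numpy as np

TIME, PLEDGED, BACKERS = range(3)  # field order as listed above


def absolute_statuses(statuses, goals, launches, deadlines):
    """Convert relative time and pledged money to absolute values.

    statuses: (N, S, 3) array; goals, launches, deadlines: length-N arrays,
    assumed to be in the same campaign order as statuses.
    """
    statuses = np.asarray(statuses, dtype=float)
    # Add a trailing axis so each per-campaign value broadcasts over samples.
    goals = np.asarray(goals, dtype=float)[:, None]
    launches = np.asarray(launches, dtype=float)[:, None]
    deadlines = np.asarray(deadlines, dtype=float)[:, None]
    timestamps = launches + statuses[:, :, TIME] * (deadlines - launches)
    pledged = statuses[:, :, PLEDGED] * goals
    return timestamps, pledged


# Toy input: one campaign with three samples (not real data).
toy_statuses = np.array([[[0.0, 0.0, 0.0], [0.5, 0.25, 3.5], [1.0, 1.2, 10.0]]])
timestamps, pledged = absolute_statuses(toy_statuses, [1000], [0], [100])
```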

## tweets.pkl

Contains a 3D array holding 1000 uniformly spaced samples of the tweet-related features of each campaign.
The dimensions are thus N x 1000 x 6, where N is the number of campaigns and the six fields of each sample are:
1. Time of the sample (between 0 and 1, relative to the start and end of the campaign)
2. Number of tweets
3. Number of replies
4. Number of retweets
5. Estimated number of backers
6. Number of users who tweeted

Again, these numbers are usually not integers, because they were resampled.
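
To read the features of one campaign at a given point in its lifetime, one option is to search the time column. This is a small sketch under the assumption that the samples of a campaign are sorted by time; the names `TWEET_FIELDS` and `features_at` and the toy values are invented for the example.

```python
import numpy as np

# Field order as listed above.
TWEET_FIELDS = ("time", "tweets", "replies", "retweets",
                "estimated_backers", "users")


def features_at(samples, t):
    """Return the features of the last sample whose time is <= t.

    samples: (S, 6) array for a single campaign, sorted by time.
    t: relative time between 0 and 1.
    """
    samples = np.asarray(samples, dtype=float)
    idx = np.searchsorted(samples[:, 0], t, side="right") - 1
    idx = max(int(idx), 0)  # fall back to the first sample if t precedes it
    return dict(zip(TWEET_FIELDS, samples[idx]))


# Toy input: one campaign with three samples (not real data).
toy_samples = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 4.2, 1.0, 2.5, 0.8, 3.1],
    [1.0, 9.0, 2.0, 6.0, 1.5, 7.0],
])
halfway = features_at(toy_samples, 0.6)
```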

## graph.pkl

Contains a dictionary representing the projects/backers graph.
In this dictionary, the keys are the nodes of the graph, and the values represent the edges.
Nodes having negative values are users, and those having positive values are projects.

For example:
graph = {}
graph[1] = [-1, -2, -3]
graph[2] = [-2]
graph[-1] = [1]
graph[-2] = [1,2]
graph[-3] = [1]

represents a graph with 2 projects (with IDs 1, 2) and 3 users (with IDs -1, -2, -3).
User -2 backed projects 1 and 2, and project 1 was backed by all users (-1, -2, -3).
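
Working with this structure could look like the sketch below, which reuses the example graph above. The helper name `split_graph` is made up for illustration; the sign convention is the one described in this section.

```python
def split_graph(graph):
    """Split the adjacency dict into project and user nodes by sign."""
    projects = {node: edges for node, edges in graph.items() if node > 0}
    users = {node: edges for node, edges in graph.items() if node < 0}
    return projects, users


# The example graph from above.
graph = {
    1: [-1, -2, -3],
    2: [-2],
    -1: [1],
    -2: [1, 2],
    -3: [1],
}

projects, users = split_graph(graph)

# Number of backers per project.
backer_counts = {p: len(backers) for p, backers in projects.items()}

# Projects that share at least one backer with project 2.
co_backed = sorted({p for u in graph[2] for p in graph[u]} - {2})
```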