Census of the Political Web

About the census

This census is an index of virtually every English-language political site on the web. The index contains more than 1.8 million websites, each crawled and classified by language (English/non-English) and by political content. Of these, roughly 800,000 are political sites.

Data

This automated snowball census was conducted on 8/1/2010.

The complete 2010 index (107 MB, zipped csv)
1% sample of the 2010 index
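
Because the index is distributed as a zipped csv, it can be read without unpacking it to disk first. The sketch below is one way to preview the first few rows using only the Python standard library. The function names are hypothetical, and the UTF-8 encoding and the assumption that the csv is the first entry in the archive are guesses, not documented properties of the census files. The column layout is not assumed at all; rows come back as plain lists of strings.

```python
import csv
import io
import zipfile
from itertools import islice


def iter_zipped_csv_rows(zip_source, member=None, encoding="utf-8"):
    """Yield rows (lists of strings) from a csv file stored in a zip archive.

    zip_source: a path or a binary file-like object for the archive.
    member: name of the csv inside the archive; defaults to the first
            entry (an assumption about how the archive is laid out).
    encoding: text encoding of the csv (assumed, not documented).
    """
    with zipfile.ZipFile(zip_source) as archive:
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Wrap the byte stream as text; newline="" lets the csv
            # module handle line endings itself.
            text = io.TextIOWrapper(
                raw, encoding=encoding, errors="replace", newline=""
            )
            yield from csv.reader(text)


def preview(zip_source, n=5):
    """Return the first n rows so the column layout can be inspected."""
    return list(islice(iter_zipped_csv_rows(zip_source), n))
```

Calling preview() with the path to the downloaded archive returns the first five rows, which is usually enough to see what columns are present before loading all 1.8 million entries.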

Software

To conduct this census, we developed snowCrawl, an open-source Python library for directed web crawls. The snowCrawl code repository is hosted at http://code.google.com/p/snowcrawl/. Please visit that site to download the library and examples.
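
For readers unfamiliar with the idea of a directed (snowball) crawl, here is a minimal generic sketch. It is not snowCrawl's API and may not match the procedure described in the working paper. The function snowball_crawl, its parameters, and the toy link graph are all hypothetical. The assumption illustrated is that the crawl starts from seed sites, classifies each site it reaches, and follows outgoing links only from sites classified as relevant.

```python
from collections import deque


def snowball_crawl(seeds, get_links, is_relevant, max_sites=1000):
    """Breadth-first snowball crawl (illustrative sketch only).

    seeds: iterable of starting sites.
    get_links: callable returning the sites a given site links to.
    is_relevant: callable classifying a site as on-topic (True/False).
    max_sites: stop after classifying this many sites.

    Returns a dict mapping each classified site to its label.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    labels = {}
    queue = deque(seeds)
    queued = set(seeds)
    while queue and len(labels) < max_sites:
        site = queue.popleft()
        relevant = is_relevant(site)
        labels[site] = relevant
        if not relevant:
            continue  # do not grow the snowball through off-topic sites
        for neighbor in get_links(site):
            if neighbor not in queued:
                queued.add(neighbor)
                queue.append(neighbor)
    return labels


# Toy in-memory link graph standing in for real HTTP fetches.
toy_links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["e.example"],
    "d.example": [],
    "e.example": [],
}
toy_political = {"a.example", "b.example", "d.example"}

result = snowball_crawl(
    ["a.example"],
    get_links=lambda site: toy_links.get(site, []),
    is_relevant=lambda site: site in toy_political,
)
```

In this toy run, c.example is classified but not expanded, so e.example is never reached. That is the sense in which the crawl is directed: the classifier decides where the crawl spends its effort.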

References

For a description of the process used to generate this census, please see the working paper "An automated snowball census of the political web," available at SSRN.