Census of the Political Web
About the census
An index of virtually every English-language political site on the web. The index contains more than 1.8 million websites, crawled and classified by language (English or non-English) and by political content. Of these, roughly 800,000 are political sites.
Data
This automated snowball census was conducted on 8/1/2010.
The complete 2010 index (107 MB, zipped CSV)
1% sample of the 2010 index
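Because the complete index ships as a zipped CSV, it can be streamed row by row without unpacking it to disk first. The following is a minimal sketch using only the Python standard library. It assumes the archive contains a single CSV member encoded as UTF-8; the column layout is not documented here, so the function simply yields raw rows (header first), and the name iter_zipped_csv is hypothetical.

```python
import csv
import io
import zipfile
from typing import Iterator, List


def iter_zipped_csv(path: str) -> Iterator[List[str]]:
    """Yield rows (header first) from the first .csv member of a zip archive.

    Assumes UTF-8 encoding; rows are streamed, so the uncompressed file
    never has to fit in memory all at once.
    """
    with zipfile.ZipFile(path) as zf:
        # Pick the first member whose name ends in .csv (assumed to be the index).
        member = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
        with zf.open(member) as raw:
            # zf.open returns a binary stream; wrap it for the csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.reader(text)
```

For example, counting the data rows of a downloaded archive (the filename is hypothetical) only requires summing over the iterator and subtracting one for the header row.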
Software
To conduct this census, we developed snowCrawl, an open-source Python library for directed web crawls. The snowCrawl code repository is hosted at http://code.google.com/p/snowcrawl/; please visit that site to download the library and examples.
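The idea of a directed snowball crawl can be illustrated with a minimal sketch. It assumes a breadth-first traversal in which every visited site is classified, but only sites judged relevant (here, political) have their outbound links followed. This illustrates the general technique only; it is not snowCrawl's actual API or the exact procedure used for the census. The callables get_links and is_relevant are hypothetical stand-ins for fetching a site's outbound links and running a content classifier, and the toy link graph is invented for the example.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def snowball_crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]],   # hypothetical: site -> outbound links
    is_relevant: Callable[[str], bool],      # hypothetical: political classifier
    max_sites: int = 10_000,
) -> Dict[str, bool]:
    """Breadth-first snowball crawl; returns {site: classified_as_relevant}."""
    labels: Dict[str, bool] = {}
    queue = deque(seeds)
    seen: Set[str] = set(queue)  # sites already queued, to avoid revisits
    while queue and len(labels) < max_sites:
        site = queue.popleft()
        relevant = is_relevant(site)
        labels[site] = relevant
        if not relevant:
            continue  # directed crawl: do not expand off-topic sites
        for link in get_links(site):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return labels


# Toy link graph (invented): c.com is off-topic, so e.com is never reached.
LINKS = {
    "a.org": ["b.org", "c.com"],
    "b.org": ["d.org"],
    "c.com": ["e.com"],
    "d.org": [],
    "e.com": [],
}
POLITICAL = {"a.org", "b.org", "d.org"}

labels = snowball_crawl(["a.org"], lambda s: LINKS.get(s, []), lambda s: s in POLITICAL)
print(labels)  # {'a.org': True, 'b.org': True, 'c.com': False, 'd.org': True}
```

Classifying a site before deciding whether to expand it is what keeps the crawl focused: off-topic sites are still recorded, which allows non-political sites to be counted, but their links do not grow the frontier.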
References
For a description of the process used to generate this census, please see the working paper "An automated snowball census of the political web," available at SSRN.