How big is the Web? [data]

August 15th, 2012

Tim Berners-Lee: The World Wide Web

I was curious about the total pageviews of the Web. It turns out they are not really tracked anywhere, but they are easy to estimate, so I did a quick analysis.

First I found two sources for ‘global total pageviews’:

  • Akamai Net Usage Index – an amazing real-time dashboard covering part of this data. They say that every minute 3 million pageviews are spent on news sites, and 10 million on social sites. That’s friggin’ a lot of pageviews! But I wanted to know the grand total, and hopefully get some sense of where the blogs are in the picture.
  • A blog post about extrapolating this data from Alexa. Nice approach, but the data is a few years old, so I decided to repeat the process.

Alexa publishes pageviews for every site for free, as a % of global pageviews. The first thing to do was to estimate the grand total, as described in that blog post: Wikipedia publishes its own pageview counts, so dividing them by Wikipedia’s share on Alexa gives the total.

11,600,000,000 / 0.5% = 2,320,000,000,000 monthly total pageviews on the Web
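The same arithmetic as a minimal Python sketch. The two input figures are the ones quoted above; the function and variable names are only illustrative.

```python
# Figures from the post: Wikipedia's published monthly pageviews and
# its share of global pageviews (in percent) as reported by Alexa.
wikipedia_monthly_pageviews = 11_600_000_000
wikipedia_share_percent = 0.5


def estimate_total_pageviews(site_pageviews: int, share_percent: float) -> int:
    """Scale one site's known pageviews up to the whole Web."""
    return round(site_pageviews * 100 / share_percent)


total = estimate_total_pageviews(wikipedia_monthly_pageviews, wikipedia_share_percent)
print(f"{total:,}")  # prints 2,320,000,000,000
```

The estimate is only as good as the single share it is scaled from: a share of 0.4% or 0.6% instead of 0.5% would move the total by roughly a fifth.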

… told you it was easy 🙂 but that just means we can dig deeper. Alexa publishes the list of the top million sites in a downloadable text file, so I wrote a script to go through it, scrape the Alexa pages for the top 10,000 sites and store their individual traffic shares.
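A rough sketch of the first half of such a script, not the original one. It assumes the downloaded list has one `rank,domain` pair per line, and it uses made-up sample rows instead of real Alexa data. The scraping step is left out on purpose, because the layout of the Alexa pages is not described here.

```python
import csv
import io
from itertools import islice


def read_top_sites(lines, limit=10_000):
    """Return the first `limit` (rank, domain) pairs from a 'rank,domain' listing."""
    reader = csv.reader(lines)
    # Skip blank or malformed rows instead of failing on them.
    rows = (
        (int(row[0]), row[1].strip().lower())
        for row in reader
        if len(row) == 2
    )
    return list(islice(rows, limit))


# Illustrative sample rows (assumed format, hypothetical domains).
sample = io.StringIO(
    "1,example-search.com\n"
    "2,example-social.com\n"
    "3,example-news.com\n"
)
top = read_top_sites(sample, limit=2)
print(top)
```

With the real file, the same function could be fed an open file handle instead of the `StringIO` sample.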

The script also uses some simple heuristics to classify sites into categories, which I then completed manually. Turns out it’s really quick to go through a list of a couple thousand sites 🙂 Anyway, I’ve decided to create a super simple taxonomy for all websites. It is based on the frame of mind that a visitor has when she visits the site:

  1. search – primary purpose of the visit is to find something – information, people, content. So all content-sharing sites fall into this as well.
  2. social – primary purpose is to socialize with other users.
  3. media – primary purpose is to consume news streams and other editorial content. Blogs are a subset of this.
  4. e-commerce – primary purpose is to spend money. Amazon, eBay, eBookers all fall into this.
  5. reference – primary purpose is to learn something. Wikipedia is here, as well as individual company sites with information about products, like Microsoft or Chase.
  6. utility – primary purpose, explicit or implicit, is to use the tool or infrastructure. This one is tricky, but all e-mail providers are here, as well as some cloud storage providers (Dropbox) and ad networks that get hit automatically without users knowing it.
  7. unknown – unfortunately I don’t speak Chinese, or Arabic, or some 5,000 other languages, so to save time I skipped foreign-language sites unless I knew them, like Yandex or Baidu.
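The taxonomy above lends itself to a simple keyword heuristic as a first pass before the manual cleanup. The sketch below is only an illustration: the keyword lists are hypothetical and are not the rules the original script used.

```python
# Hypothetical keyword hints per category; the actual heuristics are not
# listed in the post, so these are assumptions for illustration only.
CATEGORY_HINTS = {
    "search": ("search", "google", "baidu", "yandex"),
    "social": ("facebook", "twitter", "linkedin"),
    "media": ("news", "blog", "times"),
    "e-commerce": ("shop", "store", "amazon", "ebay"),
    "reference": ("wiki", "docs"),
    "utility": ("mail", "dropbox", "cdn"),
}


def classify(domain: str) -> str:
    """Return the first category whose hint appears in the domain, else 'unknown'."""
    name = domain.lower()
    for category, hints in CATEGORY_HINTS.items():
        if any(hint in name for hint in hints):
            return category
    return "unknown"


for site in ("wikipedia.org", "amazon.com", "dropbox.com", "example.cn"):
    print(site, "->", classify(site))
```

Anything that falls through to "unknown", or that matches a misleading keyword, still needs the manual pass described above.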

Out of curiosity, I also looked at how many sites it takes to fill the Web. It turns out that the top 20 sites create 25% of all pageviews, and the top 250 already create 50%. After that we are deep in the tail and progress is super slow; the script is actually still downloading…
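Counting how many sites it takes to reach a given share of traffic is a running sum over the per-site shares. A minimal sketch follows; the share values are made up to keep the example short and are not the scraped data.

```python
from itertools import accumulate


def sites_needed(shares_percent, target_percent):
    """Count how many of the largest sites it takes to reach the target share."""
    ordered = sorted(shares_percent, reverse=True)
    for count, running_total in enumerate(accumulate(ordered), start=1):
        if running_total >= target_percent:
            return count
    return None  # the listed sites never reach the target


# Made-up shares in percent of global pageviews (illustration only).
shares = [10.0, 8.0, 7.0, 3.0, 2.0, 1.0, 0.5]
print(sites_needed(shares, 25))  # prints 3
print(sites_needed(shares, 30))  # prints 5
print(sites_needed(shares, 50))  # prints None
```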

So let’s first take a look at the top 50% of traffic and who creates it:

Of the 7% of media, the blogosphere represented 40%.

I’m assuming that Search and Social, which create the majority of the pageviews, are overrepresented in the head of the Web, so the deeper the crawl goes, the more media and reference sites will surface. I’ll publish an update with that. For now, here are the estimates of total pageviews based on these shares:

Category      Monthly Global Pageviews Estimate
Search        1,125,711,592,824
Social          416,548,737,145
E-commerce      172,000,706,996
Media           159,984,707,266
Reference       106,394,591,704
Utility          75,323,701,823
Unknown         264,035,962,241
Total         2,320,000,000,000
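To read the table the other way round, the share of each category can be recovered from the estimates. This is a small sketch using only the numbers in the table; the variable names are illustrative.

```python
# Monthly pageview estimates per category, copied from the table.
estimates = {
    "Search": 1_125_711_592_824,
    "Social": 416_548_737_145,
    "E-commerce": 172_000_706_996,
    "Media": 159_984_707_266,
    "Reference": 106_394_591_704,
    "Utility": 75_323_701_823,
    "Unknown": 264_035_962_241,
}

category_sum = sum(estimates.values())
shares = {name: 100 * views / category_sum for name, views in estimates.items()}

for name, share in shares.items():
    print(f"{name:<12}{share:5.1f}%")
print(f"{'Sum':<12}{category_sum:,}")
```

The categories add up to the 2.32 trillion total to within one pageview of rounding, and Media comes out at roughly 7%, matching the share quoted earlier.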

I am very happy that the ‘media’ and ‘social’ estimates are very much in line with Akamai’s figures.
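A quick sanity check of that comparison, as a sketch. It assumes a 30-day month and treats Akamai's "news" figure as a stand-in for the media category, which is only an approximation since Akamai covers part of the traffic.

```python
MINUTES_PER_MONTH = 60 * 24 * 30  # assumption: a 30-day month

# Akamai per-minute pageviews quoted at the top of the post.
akamai_per_minute = {"Media": 3_000_000, "Social": 10_000_000}

# Monthly estimates from the table above.
alexa_estimates = {"Media": 159_984_707_266, "Social": 416_548_737_145}

akamai_monthly = {
    name: per_minute * MINUTES_PER_MONTH
    for name, per_minute in akamai_per_minute.items()
}

for name, monthly in akamai_monthly.items():
    ratio = alexa_estimates[name] / monthly
    print(f"{name}: Akamai {monthly:,} vs estimate {alexa_estimates[name]:,} (x{ratio:.2f})")
```

Under those assumptions Akamai's numbers come to about 130 billion monthly pageviews for news and 432 billion for social, against estimates of about 160 billion and 417 billion.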


You are currently reading How big is the Web? [data] at Rational Idealist.