this post is only to link to a couple of short articles discussing the recent view of big data as some kind of manna, a panacea, a means of showing what people are really thinking. but as kate crawford points out here, some data-crunching types seem to fall easily into the trap of thinking that correlation equals causation:
while just the other day i happened across another related piece in the sydney morning herald (link not presently available) discussing attacks on businesses that rely on being found on the first page of a google search. rival companies create code aimed at the target's pages which leads to too many 404s, causing google to remove those pages from its search results and rendering the websites virtually invisible.
here is an analysis of twitter tweets over the past year or so using some sort of algorithm + “sentiment analysis”. it is based on local political stuff, so may not make sense to those in the USA. i also have to add the rider that, not being privy to what has been used to make the algorithm, and being extremely skeptical of anything called “sentiment analysis” that is automatically compiled, i cannot vouch for the authenticity or believability of any of the readouts… however, my interest lies in the use of info-graphics for rendering lots of data. as the site says, “see more. read less”.
so what does astro-turfing the internet mean?
seems as if software and big companies can get together and make it seem as if their opinion is being held by more involved identities/personas than might otherwise be the case.
a whole history, complete with identity ‘furniture’, can now be supplied so that one person can seem to be many, and maybe get paid for their time spreading their disinterested views of a reactionary nature to various sites…
my suspicious nature is ready to believe it in spades.
there are a couple of comments interrogating how it is to be read, and how it was compiled, etc, so the whole thing is interesting as an applied exercise in using different modalities to represent data..
an excerpt (via susanna on email CITASA list):
Wael Ghonim, a pivotal figure in this self-organizing system who instigated the initial protests on January 25th, is prominently located near the bottom of the network, straddling two factions as well as two languages. The size of his node reflects his influence on the entire network.
The lump on the left is dominated by journalists, NGO and foreign policy types; it seems nearly grafted on, and goes through an intermediary buffer layer before making contact with the true Egyptian activists on the ground. However, this process of translation and aggregation is key; it is how those in Egypt are finally getting a voice in Western society, and an insurance policy against regime violence. Many of the prominent nodes in this network were at some point arrested, but their deep connectivity helped ensure they were not “disappeared”.
and now of course, there is much related matter in discussion mode everywhere… variously termed “the great debate”, “the arab spring”, and so on. one of the common debating points (you could check out abc q&a’s program last week but one for an example, if you can get it online outside australia) is whether social media/twitter/wikileaks ’caused’ the uprisings, or merely enabled/helped/provided extra fuel to the spring/fire/activities.
here is a link to one of them by philip n howard, who is apparently an author elsewhere on topics political.
What you see is the result of a large-scale scan of web sites' favorite icons (“favicons”) using the Nmap security scanner and the Nmap Scripting Engine (NSE).
The sites scanned were the one million domain names with the greatest “reach” according to Alexa on January 19, 2010, plus the one million names created by prepending “www.” to the former.
For each of these, an NSE script downloaded the favicon, calculated its MD5 hash, then summed the reach of the sites under each hash. 328,427 unique icons were retrieved; of those, 288,945 were loadable by PerlMagick and the remaining 39,482 were considered non-image files.
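The grouping step described above can be sketched roughly as follows. The original was an NSE script, so this Python version only illustrates the idea and is not the scan's actual code; the function name, the tuple layout and the sample domains, icon bytes and reach values are all made up for the example.

```python
import hashlib
from collections import defaultdict


def sum_reach_by_icon(sites):
    """Group sites by the MD5 hash of their favicon bytes and sum their reach.

    `sites` is an iterable of (domain, favicon_bytes, reach_percent) tuples.
    This layout is an assumption made for the illustration.
    """
    totals = defaultdict(float)
    for _domain, icon_bytes, reach in sites:
        digest = hashlib.md5(icon_bytes).hexdigest()
        totals[digest] += reach
    return dict(totals)


# Hypothetical sample: two sites share one icon, a third has its own.
sample = [
    ("example.com", b"icon-A", 0.5),
    ("example.org", b"icon-A", 0.25),
    ("example.net", b"icon-B", 0.125),
]
totals = sum_reach_by_icon(sample)
for digest, reach in totals.items():
    print(digest, reach)
```

Identical icon files hash to the same digest, so sites sharing an icon end up pooled under one entry.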
The reach was not known exactly for every site. On January 27, 2010, the reach was looked up for a sample of 178 sites, and the reach of the remaining sites was calculated by the formula reach = 66.1682 × rank^(−0.9337). The formula comes from a linear regression of log(reach) versus log(rank) of the sampled sites. This chart shows the closeness of the fit between the estimate and sample: (see chart).
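A small sketch of that estimate and of a log-log regression, for anyone who wants to see the arithmetic. It assumes the exponent is negative (reach has to fall as rank grows, and a negative exponent gives roughly 0.0001% reach at rank one million, which matches the smallest icons). The 178 sampled sites are not available here, so the fit is run on synthetic points generated from the formula itself; it shows the method, it does not reproduce the original regression.

```python
import math


def estimate_reach(rank, coeff=66.1682, exponent=-0.9337):
    """Estimated reach (percent) for a given Alexa rank, per the quoted formula."""
    return coeff * rank ** exponent


def fit_power_law(ranks, reaches):
    """Least-squares line through (log rank, log reach).

    Returns (coeff, exponent) such that reach is approximately coeff * rank ** exponent.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in reaches]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic stand-in for the sampled sites (NOT the real data).
ranks = [1, 10, 100, 1000, 10000, 100000, 1000000]
reaches = [estimate_reach(r) for r in ranks]
coeff, exponent = fit_power_law(ranks, reaches)
print(coeff, exponent)
print(estimate_reach(1_000_000))
```

Because the synthetic points lie exactly on the curve, the fit simply hands back the coefficients it was given; real sampled data would scatter around the line.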
The area of each icon is proportional to the sum of the reach of all sites using that icon. When both a bare domain name and its “www.” counterpart used the same icon, only one of them was counted. The smallest icons, those corresponding to sites with approximately 0.0001% reach, are scaled to 8 × 8 pixels at 600 pixels per inch, or about 0.34 mm on a side. The larger icons are scaled proportionally, their size however being constrained to be a multiple of 8 pixels. The largest icon is 5,968 × 5,968 pixels, and the whole diagram is 18,720 × 18,720.
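The sizing rule can be worked through in a few lines. This is a sketch under assumptions: the text says only that sizes are constrained to a multiple of 8 pixels, so rounding to the nearest multiple is a guess, and the constant and function names are invented for the example.

```python
import math

BASE_REACH = 0.0001   # percent reach of the smallest icons, from the text
BASE_SIDE = 8         # pixels per side for those smallest icons
PIXELS_PER_INCH = 600
MM_PER_INCH = 25.4


def icon_side_pixels(reach_percent):
    """Side length in pixels so that icon AREA is proportional to reach.

    Area proportional to reach means side proportional to sqrt(reach).
    Rounding to the nearest multiple of 8 is an assumption.
    """
    ideal = BASE_SIDE * math.sqrt(reach_percent / BASE_REACH)
    return max(BASE_SIDE, BASE_SIDE * round(ideal / BASE_SIDE))


def pixels_to_mm(pixels, ppi=PIXELS_PER_INCH):
    """Convert a pixel length to millimetres at the given print resolution."""
    return pixels / ppi * MM_PER_INCH


print(icon_side_pixels(0.0001))   # smallest icon
print(icon_side_pixels(0.01))     # 100 times the reach, 10 times the side
print(pixels_to_mm(8))
```

The square root is the point: a site with a hundred times the reach gets an icon only ten times as wide.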
Details of the scan: The script first retrieved the root document and searched for an element of the form <link rel="icon" href="...">. If such an element was present, the favicon was retrieved from the given URL. If the element did not exist, or if the URL could not be retrieved, the favicon was looked for at /favicon.ico. Up to five redirects were followed for every document retrieved. An icon was considered to belong to a domain name even when redirects led away from that domain name, or the icon was on a different domain. After redirects, only responses with an HTTP status code of 200 were counted. When multiple icons were present in a file, the image with the greatest size and color depth is shown.
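The lookup order can be sketched with the Python standard library, assuming the element in question is an HTML `<link>` tag whose `rel` attribute contains `icon`. This is an illustration and not the NSE script itself: the class and function names are hypothetical, and it only works out which URLs to try, in order, without doing any fetching or redirect handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkFinder(HTMLParser):
    """Remembers the href of the first <link> whose rel includes 'icon'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attributes.get("href"):
            self.href = attributes["href"]


def favicon_candidates(page_url, html_text):
    """URLs to try in order: the declared icon first, then /favicon.ico."""
    finder = IconLinkFinder()
    finder.feed(html_text)
    candidates = []
    if finder.href:
        candidates.append(urljoin(page_url, finder.href))
    fallback = urljoin(page_url, "/favicon.ico")
    if fallback not in candidates:
        candidates.append(fallback)
    return candidates


page = '<html><head><link rel="shortcut icon" href="/img/fav.png"></head></html>'
print(favicon_candidates("http://example.com/", page))
print(favicon_candidates("http://example.com/a/b", "<html></html>"))
```

A real scanner would then request each candidate in turn, follow a limited number of redirects, and keep only responses with status 200.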
Viewer beware: The chart should not be taken as authoritative of the popularity of the sites presented, because of the inaccuracy of large, over-Internet scans. Sites were only counted if their favicon could be downloaded. Because of the unpredictable network effects, some sites, such as Bing, Baidu, and Amazon, are shown smaller than they should be.
My web home, squareONE, was fetched to reveal the following assessment:
http://www.squareone-learning.com 10000 bytes in 0.00 seconds.
http://www.squareone-learning.com/favicon.ico 2550 bytes in 0.00 seconds.
Online lookup: The icon is at (16.960, 16.200) and is 1056 × 1056 pixels.
Putting it in the top 1 million web sites. Yowza!!!