Icons of the Web


(source page)


What you see is the result of a large-scale scan of web sites’ favorites icons (“favicons”) using the Nmap security scanner and the Nmap Scripting Engine (NSE).


The sites scanned were the one million domain names with the greatest “reach” according to Alexa on January 19, 2010, plus the one million names created by prepending “www.” to the former.


For each of these, an NSE script downloaded the favicon, calculated its MD5 hash, then summed the reach of the sites under each hash. 328,427 unique icons were retrieved; of those, 288,945 were loadable by PerlMagick and the remaining 39,482 were considered non-image files.
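The hash-and-sum step can be sketched in a few lines of Python. This is only an illustration, not the NSE script itself (NSE scripts are written in Lua); the function name and the (domain, icon bytes, reach) input shape are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def sum_reach_by_icon(fetched):
    """Total the reach of all sites that serve byte-identical favicons.

    `fetched` is an iterable of (domain, icon_bytes, reach) tuples.
    That input shape is hypothetical, chosen for illustration.
    """
    totals = defaultdict(float)
    for _domain, icon_bytes, reach in fetched:
        # Identical files give identical MD5 digests, so the digest
        # serves as the grouping key for "the same icon".
        digest = hashlib.md5(icon_bytes).hexdigest()
        totals[digest] += reach
    return dict(totals)


# Made-up sample data: two sites share one icon, a third has its own.
sample = [
    ("example.com", b"icon-A", 0.5),
    ("example.org", b"icon-A", 0.25),
    ("example.net", b"icon-B", 0.125),
]
totals = sum_reach_by_icon(sample)
```

Grouping on the digest rather than on the URL is what lets one icon served from many domains count as a single entry.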


The reach was not known exactly for every site. On January 27, 2010, the reach was looked up for a sample of 178 sites, and the reach of the remaining sites was calculated by the formula reach = 66.1682 × rank^(−0.9337). The formula comes from a linear regression of log(reach) versus log(rank) of the sampled sites. This chart shows the closeness of the fit between the estimate and sample: (see chart).
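The fit can be sketched as an ordinary least-squares line through (log rank, log reach) points. This is a minimal sketch, assuming natural logs and taking the exponent as negative, since reach falls as rank rises. The function names are mine, and the sample points are synthetic: they are generated from the formula itself, not taken from the 178 sampled sites.

```python
import math

A = 66.1682   # coefficient quoted in the text (reach in percent)
B = -0.9337   # exponent; the negative sign is an assumption (see lead-in)


def estimate_reach(rank):
    """Estimated reach (percent) of the site at a given Alexa rank."""
    return A * rank ** B


def fit_power_law(ranks, reaches):
    """Least-squares fit of log(reach) = log(a) + b * log(rank).

    Returns (a, b) such that reach is approximately a * rank ** b.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in reaches]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic check: points that lie exactly on the curve should give
# back the same coefficient and exponent.
ranks = [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]
a, b = fit_power_law(ranks, [estimate_reach(r) for r in ranks])
```

With these constants the millionth-ranked site comes out at a little over 0.0001% reach, which is consistent with the figure given for the smallest icons.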


The area of each icon is proportional to the sum of the reach of all sites using that icon. When both a bare domain name and its “www.” counterpart used the same icon, only one of them was counted. The smallest icons, those corresponding to sites with approximately 0.0001% reach, are scaled to 8 × 8 pixels at 600 pixels per inch, or about 0.34 mm on a side. The larger icons are scaled proportionally, their size however being constrained to be a multiple of 8 pixels. The largest icon is 5,968 × 5,968 pixels, and the whole diagram is 18,720 × 18,720.
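A rough sketch of the sizing rule: if area is proportional to reach, then the side length is proportional to the square root of reach. The text says sizes were constrained to multiples of 8 pixels but not how, so rounding to the nearest multiple is an assumption here, as are the function names.

```python
import math

MIN_REACH = 0.0001  # percent; approximate reach of the smallest icons
MIN_SIDE = 8        # pixels; side length of the smallest icons
PPI = 600           # pixels per inch, per the text


def icon_side_px(reach):
    """Side length in pixels for an icon with the given summed reach.

    Area scales with reach, so the side scales with sqrt(reach).
    Rounding to the nearest multiple of 8 is an assumption.
    """
    raw = MIN_SIDE * math.sqrt(reach / MIN_REACH)
    return max(MIN_SIDE, MIN_SIDE * round(raw / MIN_SIDE))


def px_to_mm(px):
    """Convert pixels to millimetres at 600 pixels per inch."""
    return px / PPI * 25.4
```

As a check on the arithmetic in the text, 8 pixels at 600 pixels per inch is 8 / 600 × 25.4 ≈ 0.34 mm, and the full 18,720-pixel diagram works out to 31.2 inches on a side.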


Details of the scan: The script first retrieved the root document and searched for an element of the form <link rel="icon" href="...">. If such an element was present, the favicon was retrieved from the given URL. If the element did not exist, or if the URL could not be retrieved, the favicon was looked for at /favicon.ico. Up to five redirects were followed for every document retrieved. An icon was considered to belong to a domain name even when redirects led away from that domain name, or the icon was on a different domain. After redirects, only responses with an HTTP status code of 200 were counted. When multiple icons were present in a file, the image with the greatest size and color depth is shown.
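The lookup order can be sketched with Python's standard library. This is not the NSE script: the class and function names are hypothetical, and matching any link element whose rel attribute contains "icon" is an assumption about what the script accepted. The sketch covers only the choice of URL; fetching, the five-redirect limit, and the status-code check are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _IconLinkFinder(HTMLParser):
    """Remembers the href of the first <link> whose rel includes 'icon'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        # rel may hold several tokens, e.g. "shortcut icon".
        rel = (attr.get("rel") or "").lower().split()
        if "icon" in rel and attr.get("href"):
            self.href = attr["href"]


def favicon_url(page_url, html):
    """Pick the favicon URL for a root document.

    Uses the link element if there is one, otherwise falls back to
    /favicon.ico on the same host.
    """
    finder = _IconLinkFinder()
    finder.feed(html)
    if finder.href:
        # urljoin resolves relative hrefs and leaves absolute ones alone,
        # so an icon hosted on a different domain is kept as-is.
        return urljoin(page_url, finder.href)
    return urljoin(page_url, "/favicon.ico")
```

For example, favicon_url("http://example.com/", "<html></html>") falls back to http://example.com/favicon.ico because no link element is present.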

Viewer beware: The chart should not be taken as authoritative of the popularity of the sites presented, because of the inaccuracy of large, over-Internet scans. Sites were only counted if their favicon could be downloaded. Because of the unpredictable network effects, some sites, such as Bing, Baidu, and Amazon, are shown smaller than they should be.



My web home, squareONE, was fetched to reveal the following assessment:


http://www.squareone-learning.com 10000 bytes in 0.00 seconds.

http://www.squareone-learning.com/favicon.ico 2550 bytes in 0.00 seconds.


Online lookup: The icon is at (16.960, 16.200) and is 1056 × 1056 pixels.


Putting it in the top 1 million web sites. Yowza!!!


Globalization of Humor


WordPress 3.0


Resource map by Sallie Goetsch (src), via WordPress Asylum.


Absolutely painless upgrade to WP3 here on ND2.0. The only chip in the pile was learning that our theme, venerable by our standards, is not compliant with some of WP3’s added functionality. The only gaps I’ve identified are the extra widget areas and the menu builder.


WP3’s new features bring some power-user capabilities to the masses, although that greater power is only afforded to those who can grok the basics of how WordPress works under the hood. For example, custom post types provide a way of breaking content out of the either/or of Post/Page, but their most beneficial application involves situating those custom “types” within dedicated divisions of a layout, using their own loop.


Closer to our wheelhouse here is the revamped taxonomy function, which could be deployed to classify tagged texts. WP3 also integrates WordPress Multi-User, although ND2, multi-user as it is, is also minimalist in approach.


The WordPress 3.0 feature set was finalized a long time ago. The one addition I would eventually like to see is easy playlist podcasting. The kludgy workarounds that use plug-ins are hit-or-miss, mostly miss.



Feature guide via Sixrevisions

Smashing Magazine’s Highlights of WP3

Taxonomies explained at 1stwebdesigner.


Lexicalist


Screen capture of part of the interface. Keyword search=privacy. Lexicalist


ABOUT. Lexicalist reads through millions of words of chatter on the internet to analyze how certain demographics talk and what kinds of things they talk about. We currently break this information down into three kinds of demographics: age, gender, and geography.


METHODOLOGY. Lexicalist works by analyzing rich sources of information online, including blog posts, news sources, and social networking sites like Twitter. Each bit of information is subjected to rigorous natural language processing, which includes a likelihood distribution of being authored over all geographic, age and gender demographics.


All of the statistical results displayed here are then normalized against the volume of information coming from each demographic to see what words are most commonly associated with certain populations. The result is a descriptive snapshot of language as it’s used today.

src
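Lexicalist doesn't spell out its normalization, so the following is only one plausible reading: divide how often a word shows up in a demographic's text by the total volume of text from that demographic. The function name, the whitespace tokenizing, and the hard demographic labels are all assumptions for illustration. The description above speaks of a likelihood distribution over demographics, which this sketch simplifies away.

```python
from collections import Counter


def normalized_usage(posts, word):
    """Per-demographic rate of `word`, relative to that group's volume.

    `posts` is an iterable of (demographic, text) pairs. That shape is
    hypothetical and is not Lexicalist's actual data format.
    """
    hits = Counter()
    volume = Counter()
    for group, text in posts:
        tokens = text.lower().split()
        volume[group] += len(tokens)
        hits[group] += tokens.count(word)
    # Dividing by volume keeps a talkative demographic from dominating
    # simply because it produces more text.
    return {g: hits[g] / volume[g] for g in volume if volume[g]}


# Made-up sample posts, for illustration only.
posts = [
    ("20s", "privacy matters"),
    ("20s", "privacy settings again today"),
    ("50s", "my privacy is my business really now ok"),
]
rates = normalized_usage(posts, "privacy")
```

In this toy data the 20s group uses the word in 2 of 6 tokens and the 50s group in 1 of 8, so the raw counts and the normalized rates tell different stories.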


Lexicalist is worth investigating. In fact, it is potentially a time sink for the lexically minded. I wish I knew how a descriptive snapshot relates to language usage. This seems to me to be more a question of the relation between a particular form of methodical description and some particular frame for usage. Presumably there is a relation borne by natural language processing.


Okay, over at Wikipedia, the treatment of NLP includes:


Tasks and limitations


In theory, natural-language processing is a very attractive method of human-computer interaction. Early systems, such as SHRDLU, working in restricted “blocks worlds” with restricted vocabularies, worked extremely well, leading researchers to excessive optimism, which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.


Natural-language understanding is sometimes referred to as an AI-complete problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of “understanding” is one of the major problems in natural-language processing. src


‘Context is to understanding, . . ‘


(I added Lexicalist to our links.)



A Slick PWN


(Pwn is a leetspeak slang term derived from the verb “own”, as meaning to appropriate or to conquer to gain ownership. The term implies domination or humiliation of a rival, used primarily in the Internet-based video game culture to taunt an opponent who has just been soundly defeated (e.g., “You just got pwned!”). It was popular among Counter-Strike gamers before spreading through the more general Internet world. The past tense and past participle, pwned, may also be spelled pwnd, pwn’d, pwn3d, pwnt, poned, pawned, or powned. Source: Wikipedia)



Enterprising parodists on May 19 created a Twitter account and feed to mock BP, BPGlobalPR.


Chris Matyszczyk reports (5/26) from CNET,


CNN did contact BP and asked the company whether it might feel its image was being polluted by this rogue global PR force. BP reportedly said it had seen it, but was sure that people would realize it’s not really the company’s work.


Perhaps this underestimates people’s notions of what is and isn’t possible in today’s often ugly, cynical world.


Still, I know there will be sticklers among you who will attempt to invoke Twitter’s fake pages policy. It reads that impersonators “should not be the exact name of the subject of the parody, commentary, or fandom; to make it clearer, you should distinguish the account with a qualifier such as ‘not,’ ‘fake’ or ‘fan.’”


It’s unlikely Twitter will get too picky about this, given that it gets some nice PR (happy to help, as always, chaps) out of it all, and given that BP seems unlikely to complain. BP has made its first wise PR move in allowing this site to gush black humor while the nation’s beaches are threatened by a far more painful darkness.


90,000+ followers, and counting.


Sometime in the next few days, BPGlobalPR’s following will surpass the number of BP’s employees worldwide.


BP America’s Twitter following? 8,000 or so.


Although the official feed doesn’t offer any black humor, it’s funny in a different way.


Old and New Net







Web 3.0 from Kate Ray on Vimeo.


This video from Kate Ray quickly made the rounds.


What a long way the net has come. I suppose it is necessary, if gratuitous, to add: ‘for better and for worse.’


There’s a moment in this interesting mash-up where the speaker implies the following: could we re-render the human brain to think more like a machine? This follows from the difficulty of making a machine think like a human.


I had to look up the use of the term ontologies because I know little about information science, and its use in the video seemed to depart from the philosophical term. Here’s the treatment of ontologies at Wikipedia.


There is nothing about the problems faced by the varieties of user. I’m a user and I know of the problems I encounter in searching for information on the internet, in libraries, and on my own computer, in my own archive of documents.


I’ll mention three challenges. I’ll frame this by stating that I wish my computer-based archives and library archives were indexed by Google.


(1) Usually, my searches for information on Google are satisfied. However, because the results are matched against the real-time indexing my cognition provides, the end of a search on a given topic, usually in the social sciences, is arbitrary. In other words, I have no conclusive idea that a given result is the optimum result. I’d also characterize my search methods as using partly ad hoc heuristics.


(2) Searches in my computer-based archive are brute force and leverage Spotlight’s ability to look into the text of every file, BUT they involve scanning through very long result lists, most of whose entries are not relevant. As a user, I find the labor-intensive task of organizing files on my end ‘too much.’ Tied to this is the ease with which information can be archived versus the labor involved in organizing it. Somewhat: the intuitive’s curse…


(3) The most difficult searches of web and internet resources are those that are very particular and very local. A good example would be somebody’s address. Searches oriented to topics do not fall into this category.


One other note: I would guess my own search capability falls into the highly capable slice of any bell curve. This guess is based on my understanding of how to use the specific editing features of Google search, and on observing how most other people use search. One of the challenges for the semantic web, given,


The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information on the web is defined, making it possible for machines to process it.


is that any useful, more powerful interface and facilitation has to meet the different modes of differentiated users.


For example, I wouldn’t be skeptical of a machine’s ability to qualify results so that I could be confident I’ve reached the optimum set of results, but I’d like to know beforehand why I needn’t be skeptical. And, this would have to be presented to me at my level.


Who’s to Know?


Following from my previous post about methods for learning more about people encountered on the internet, The New York Times today features an article The Tell-All Generation Learns to Keep Things Off-line (Laura M. Holson; NYT 5-8:2010).


While participation in social networks is still strong, a survey released last month by the University of California, Berkeley, found that more than half the young adults questioned had become more concerned about privacy than they were five years ago — mirroring the number of people their parent’s age or older with that worry.


They are more diligent than older adults, however, in trying to protect themselves. In a new study to be released this month, the Pew Internet Project has found that people in their 20s exert more control over their digital reputations than older adults, more vigorously deleting unwanted posts and limiting information about themselves. “Social networking requires vigilance, not only in what you post, but what your friends post about you,” said Mary Madden, a senior research specialist who oversaw the study by Pew, which examines online behavior. “Now you are responsible for everything.”


One interesting question raised by the article, but not addressed, concerns how investigations into online ‘reputation’ are framed by investigators.


In this article from September 2009, How HR Professionals Analyze Your Facebook Profile, author Damian Davila Rojas mentions a key finding from a Harris Interactive poll of HR professionals,


The findings were more likely to get candidates rejected than hired: 35% of HR professionals said social networking content had caused them to eliminate a candidate, while only 18% reported deciding to employ someone based on a profile.


There’s a graphic representing the negative reasons for rejecting a job candidate based on their online data.


Of more interest to me is the positive graphic because it raises the question of how positive data is framed.


Here are the top three categories:


50% Got a good feel for the candidate’s personality, could see a good fit within the company culture

39% Job candidate’s background information supported their professional qualifications for the job

39% Job candidate’s site conveyed a professional image


Item #2 is the only element subject to neutral verification, whereas item #1 raises the question of framing and instrumental approach, and item #3 does the same while pointing in the direction of normative practices. Also, item #3, with respect to Facebook, can only mean a professional image within the limitations set by Facebook. This includes all the data from friends which flows into the person’s Facebook home page.


Hiring practices vary greatly. They can be very subjective and are subject to hidden cognitive biases. For example, the hunch is more a problem to be eliminated than a valuable instinct in this area.


Social media presents data about a person’s social network. This is not off limits to the hiring professional. Yet, this realm of data raises interesting questions.





Copyright © NetDynam 2.0 2010 | NetDynam 2.0 is proudly powered by WordPress and Ani World.
