Measuring Up: Web 2.0 poses difficulty for site traffickers

By Jon Donley
A few years before Hurricane Katrina, New Orleans website NOLA.com served up more than 20 million page views on Fat Tuesday, the climax of an annual traffic spike fueled by the site’s live views of Mardi Gras festivities.
These numbers attracted a national liquor brand looking to ride the Mardi Gras wave. The advertiser bought a major sponsorship. By the next year, though, the advertiser was looking for new metrics. And although page views still soared, the metric didn’t meet the advertiser’s needs.
No matter what your goal — whether to reach a top ranking, understand how visitors use your site, or gather critical ROI data — the measurement of your site’s traffic patterns is critical.
It’s also problematic.
Mountains of data
There are three main ways to obtain data about a website’s visitors:
- Internal server logs – The site’s own records of all the activity occurring on its server;
- External logs – Data tracked by ad delivery and tracking servers and external analytics services, such as Google Analytics; and
- Panel-based sampling – Tracking computer use of participants by an audience-measurement organization.
The Web generates mountains of data, even before the professional analysts start tossing in code snippets and other data generators. Sifting through this data and drawing sound conclusions about what it means is a quandary that has vexed developers since the infancy of the Web. And while the growing body of data offers the opportunity to make more informed decisions, each of the major data collection models is flawed in some way.
Things were simpler when the Web was in its infancy and online commerce was still a pipe dream. The only data available came from the website’s system administrators. Webmasters measured traffic by parsing the server logs for the total number of “hits” – or requests to the server for files of various types. This number was accurate, but didn’t reflect the true value of the site; not all “hits” are equal. A plain-text HTML page is a hit. Add a photo to the page; now it’s two file requests — two hits. Some webmasters packed pages with scores of tiny graphics, to run up the score.
As the Web moved steadily toward commercialization in the latter half of the 1990s, sites needed to give advertisers a more credible and intuitive traffic count. Site administrators provided “page views” by filtering out of the log reports all of the files that did not represent a web page. Graphics files – GIFs, JPGs, etc. – were filtered out. Page framework files – such as HTML, TXT, CFM and others — were left in. One Web page equaled one page view.
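The difference between hits and page views can be sketched in a few lines of Python. This is only an illustration: the log lines are made up, they are assumed to follow the Common Log Format, and the set of page extensions follows the examples above, plus an assumption that paths with no extension (such as `/`) count as pages.

```python
import re
from pathlib import PurePosixPath

# Hypothetical log lines in Common Log Format (not real traffic).
SAMPLE_LOG = """\
203.0.113.5 - - [12/Feb/2002:10:00:01 -0600] "GET /mardigras/index.html HTTP/1.0" 200 5120
203.0.113.5 - - [12/Feb/2002:10:00:01 -0600] "GET /images/logo.gif HTTP/1.0" 200 2048
203.0.113.5 - - [12/Feb/2002:10:00:02 -0600] "GET /images/parade.jpg HTTP/1.0" 200 40960
198.51.100.7 - - [12/Feb/2002:10:00:05 -0600] "GET /news/story.cfm?id=7 HTTP/1.0" 200 7300
198.51.100.7 - - [12/Feb/2002:10:00:06 -0600] "GET /images/logo.gif HTTP/1.0" 200 2048
"""

# Assumed set of "page framework" extensions, following the text's examples.
PAGE_EXTENSIONS = {".html", ".htm", ".txt", ".cfm"}

REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')

def requested_paths(log_text):
    """Yield the URL path of every request line found in the log."""
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            yield match.group(1).split("?", 1)[0]  # drop any query string

def is_page(path):
    """Treat known page extensions, and extensionless paths, as pages."""
    suffix = PurePosixPath(path).suffix.lower()
    return suffix in PAGE_EXTENSIONS or suffix == ""

paths = list(requested_paths(SAMPLE_LOG))
hits = len(paths)                                # every file request counts
page_views = sum(1 for p in paths if is_page(p)) # graphics are filtered out
print(f"hits={hits} page_views={page_views}")
```

With this sample, five file requests collapse into two page views, which is the gap webmasters could inflate by packing pages with tiny graphics.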
By the early 2000s, these measurements were no longer enough, and advertisers began asking for proof of how many people were actually visiting the site and being exposed to their ads. The weakness of “page views” is that the same amount of traffic can be produced by a large number of visitors who each view only a few pages, or by a smaller number of people who each view a much larger number of pages.
The latter was the case with NOLA.com’s Mardi Gras traffic, which skewed heavily toward visitors spending long periods of time watching one or more of the site’s live webcams in the French Quarter and on the city’s main parade route. These features produced a prodigious amount of page views per visitor.
But the online advertising environment was changing. The most popular method of buying ads, based on a cost per thousand ad impressions, was falling out of favor, supplanted by packages that paid for “unique users.” (A related measurement is “visits” or “sessions.” A visitor arriving at a site and browsing on any number of pages generates a single session.)
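How page views, sessions and unique users relate can be shown with a small Python sketch. The visitor IDs and timestamps are hypothetical, and the 30-minute inactivity cutoff is an assumption made for illustration; the definition above does not specify how long a pause ends a session.

```python
from datetime import datetime, timedelta

# Assumed inactivity cutoff; not specified in the text.
SESSION_TIMEOUT = timedelta(minutes=30)

# Hypothetical (visitor_id, timestamp) pairs, one per page view.
PAGE_VIEWS = [
    ("visitor-a", datetime(2002, 2, 12, 10, 0)),
    ("visitor-a", datetime(2002, 2, 12, 10, 5)),
    ("visitor-a", datetime(2002, 2, 12, 10, 20)),
    ("visitor-b", datetime(2002, 2, 12, 10, 7)),
    ("visitor-a", datetime(2002, 2, 12, 15, 0)),  # returns hours later
]

def count_sessions(page_views, timeout=SESSION_TIMEOUT):
    """Count sessions: a new one starts after `timeout` of inactivity."""
    last_seen = {}
    sessions = 0
    for visitor, when in sorted(page_views, key=lambda pv: pv[1]):
        previous = last_seen.get(visitor)
        if previous is None or when - previous > timeout:
            sessions += 1
        last_seen[visitor] = when
    return sessions

unique_users = len({visitor for visitor, _ in PAGE_VIEWS})
sessions = count_sessions(PAGE_VIEWS)
print(f"page_views={len(PAGE_VIEWS)} sessions={sessions} unique_users={unique_users}")
```

Here five page views come from three sessions by two unique users, so the three metrics can tell quite different stories about the same traffic.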
The “unique user” count has its own weaknesses. With significant numbers of visitors reaching a website through an organizational network, the tracking of IP addresses isn’t a reliable measure of an individual user. To the extent users are tracked by cookies, software that blocks or deletes cookies throws the measurements off. And the unique user metric didn’t take Web 2.0 into account.
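A toy Python example makes both weaknesses concrete. The addresses, cookie IDs and the scenario itself are invented for illustration; real analytics packages use more elaborate rules than this naive counter.

```python
# Hypothetical requests: (ip_address, cookie_id). None means no cookie was
# sent, for example because the browser blocked or deleted it.
REQUESTS = [
    ("192.0.2.10", "cookie-1"),    # office worker 1 behind a shared network
    ("192.0.2.10", "cookie-2"),    # office worker 2, same outward-facing IP
    ("192.0.2.10", "cookie-3"),    # office worker 3, same outward-facing IP
    ("198.51.100.4", "cookie-4"),  # home user, first visit
    ("198.51.100.4", None),        # same home user after clearing cookies
]
ACTUAL_PEOPLE = 4  # known only because this data set is made up

unique_by_ip = len({ip for ip, _ in REQUESTS})

# Count each cookie once; treat every cookieless request as a new visitor,
# which is what a naive cookie-based counter would do.
cookies = {cookie for _, cookie in REQUESTS if cookie is not None}
cookieless = sum(1 for _, cookie in REQUESTS if cookie is None)
unique_by_cookie = len(cookies) + cookieless

print(f"actual={ACTUAL_PEOPLE} by_ip={unique_by_ip} by_cookie={unique_by_cookie}")
```

In this made-up case, counting by IP address undercounts the four people as two, while counting by cookie overcounts them as five.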
In the Web 2.0 world, in which some youth spend more time watching YouTube videos than watching television, advertisers and marketers now want to revisit how much time users spend on a site.
Web 2.0 in general has been a challenge for those involved in measurement and analysis of site traffic. A traditional webcam, whether it serves updating stills or a live video stream, generates countable traffic by refreshing pages: either the frame containing the still photo, or a tiny graphic that refreshes in a small frame around the video stream. This method doesn’t apply to many popular Web 2.0 features. Ajax, the cluster of technologies that enables many Web 2.0 tools, can refresh content in a portion of a page without the full-page refresh that would register in the site log.
Major Web analysis vendors only recently have begun addressing such challenges. And other trends — including the rise of RSS feeds and the growing number of people accessing the Web via mobile devices — further muddy the water.
Audience measurement standards
In 1996, a coalition of advertising groups formed the Interactive Advertising Bureau, dedicated to education and development of standards for the newly hatched Web advertising industry. At the time, would-be advertisers were dependent on website sales reps for information about the site’s audience.
While page views and unique user stats were steps forward, the real issue was that the figures depended on Web publishers with a conflict of interest — the higher the stats, the more opportunity to make money. Beginning with online advertising delivery and tracking services, such as DoubleClick, advertisers and marketers began taking control of data. Such services could generate their own counts of ad delivery, instead of trusting the site sales staff. Such third-party tracking also served the website’s interests, by providing proof it had delivered.
Companies specializing in Web measurement and analysis have become a huge industry, providing a growing number of services to guide both site owners and the advertising/marketing community in designing campaigns, attracting users and getting the biggest bang for the buck.
Heavy hitters include ComScore, Nielsen Online, Omniture, and WebTrends, which provides analytics for the IAB’s own website.
Heavy hitters face heat
Even at this level, the Web traffic figures have drawn complaints. Amid significant criticism from advertising groups about gross disparities between their own server logs and figures being reported by ComScore and Nielsen Online, the IAB demanded that the two companies submit to an audit by the Media Rating Council, a group that accredits rating organizations. Both companies submitted to the audits, and both are currently under review for accreditation.
These two industry leaders depend on user panels to generate their data, much as the Nielsen Ratings service has for decades generated television audience data. Desktop tracking software records users’ surfing habits, and the companies extrapolate that sampling into overall traffic figures. Critics claim, however, that the panels have been populated in ways that don’t reflect the Internet audience, failing to account for households with more than one computer and for the growing number of households opting out of traditional telephone service in favor of cell phones only.
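The basic arithmetic of panel extrapolation can be sketched as follows. This is not either company's methodology, which is proprietary; it assumes a simple random sample, and the panel size, visitor count and online population are hypothetical numbers chosen to keep the math easy to follow.

```python
import math

def project_audience(panel_size, panel_visitors, universe):
    """Scale a panel's visit rate up to the whole online population.

    Returns (estimated audience, 95% margin of error), both in people.
    Assumes the panel is a simple random sample of the universe.
    """
    share = panel_visitors / panel_size
    # Standard error of a proportion from a simple random sample.
    std_error = math.sqrt(share * (1 - share) / panel_size)
    margin = 1.96 * std_error  # approximate 95% confidence interval
    return share * universe, margin * universe

# Hypothetical inputs: 200 of 10,000 panelists visited the site,
# projected onto an assumed 150 million Internet users.
estimate, margin = project_audience(
    panel_size=10_000, panel_visitors=200, universe=150_000_000
)
print(f"estimated audience: {estimate:,.0f} +/- {margin:,.0f}")
```

Note that the margin of error only covers random sampling noise. If the panel is skewed in the ways critics describe, the estimate can be off by far more than the margin suggests, and no formula of this kind will reveal it.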
In his 2007 demand for audits, IAB president and CEO Randall Rothenberg questioned the validity of measurement models that were created for radio in the 1930s. “To persist in using panels that potentially undercount or ignore the diverse populations that are the future of consumer marketing is to deny marketers the insights they need to build their businesses,” he says. “And it certainly appears to us as if these audiences are being undercounted or disregarded, for our members’ server logs continue to diverge starkly from your companies’ sample-based assessments, by 2x to 3x magnitudes, in some cases far beyond any legitimate margin of sampling error.”
As they near the end of their audit and certification ordeal, both companies have announced the imminent rollout of enhanced products and methodologies. Some of the changes directly address concerns raised by the IAB, and all are clearly aimed at the doubts raised about the companies’ credibility over the past several years.
Nielsen Online announced last week that a U.S. version of its latest NetView product will be rolled out in July. The company has expanded its number of U.S. panelists from 28,000 to more than 200,000, chosen to more accurately reflect American demographics. The upgraded service also addresses issues of multiple computers in a household and cell phone-only households.
At the end of May, ComScore unveiled its own new product. Media Metrix 360 is a “panel-centric hybrid solution to digital audience measurement.” The new model maintains its panel-based strategy, but promises to “reconcile traffic metrics reported from client server-side and ad server data.” Media Metrix 360 starts collecting data in the United States and Canada in July, with that data to be published in August. The new model will roll out globally throughout the year.
For clients with big budgets, the retooling of these major players is good news. It’s also good news for the Internet advertising industry in general – budgets are being choked by the recession, and anything that casts doubt on the ability to properly track ROI is deadly.
Also, all of the major players have developed workarounds for tracking Web 2.0 features. This helps everyone.
Budget solutions for thin wallets

For those without the finances to purchase the proprietary data produced by ComScore and Nielsen, the software-as-a-service of Omniture’s SiteCatalyst or the tracking and analysis services of WebTrends, there are still options available.
Log File Analysis — Your server is continuously logging a wealth of data about users on your site. You may not be able to sway potential advertisers with it (although advertisers and marketers, as noted above, insist their own server log files are gospel), but as a website owner, you can take full advantage of the data. Viewing raw log files is only for masochistic hyper-geeks, but an array of software slices and dices it into charts and graphs that can show page views, unique visitors, sessions, geographic location, operating systems, browser versions, most popular pages, top entry and exit pages, external links that brought the visitor to your site, search engine-initiated traffic and top search terms used to find your site.
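To give a sense of what such software does under the hood, here is a minimal Python sketch that turns raw log lines into three of the reports listed above. It is not how any particular product works; the log lines are invented, and they are assumed to follow the Combined Log Format, which adds the referrer and browser to each entry.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical lines in the Combined Log Format (not real traffic).
SAMPLE_LOG = """\
203.0.113.5 - - [01/Jun/2009:09:00:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "http://www.google.com/search?q=example" "Mozilla/4.0"
203.0.113.5 - - [01/Jun/2009:09:01:00 -0500] "GET /about.html HTTP/1.1" 200 2100 "http://example.com/index.html" "Mozilla/4.0"
198.51.100.7 - - [01/Jun/2009:09:02:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.0.2.44 - - [01/Jun/2009:09:03:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "http://www.google.com/search?q=sample" "Mozilla/5.0"
"""

LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

pages = Counter()           # most popular pages
referrer_hosts = Counter()  # external sites that sent visitors
visitors = set()            # distinct IP addresses seen

for line in SAMPLE_LOG.splitlines():
    match = LINE_RE.match(line)
    if not match:
        continue  # skip lines that do not fit the assumed format
    pages[match["path"]] += 1
    visitors.add(match["ip"])
    referrer = match["referrer"]
    if referrer != "-":  # "-" means the browser sent no referrer
        referrer_hosts[urlsplit(referrer).hostname] += 1

print("top pages:", pages.most_common(2))
print("top referrers:", referrer_hosts.most_common(2))
print("distinct IP addresses:", len(visitors))
```

Packaged tools add the charts, the geographic lookups and the handling of very large files, but the underlying job is the same counting exercise.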
If you are buying web hosting, you likely already have the popular (and free) AWStats or a similar graphical reporting tool as part of your package. If not, you can download it free.
If you can afford a bit of cash, SurfStats offers a nice SurfStats Site (Log) Traffic Analyzer and a tool for viewing live statistics called SurfStats Live. Both are available as 30-day trial downloads. Each costs $95 for the standard edition.
Third-party analytics – One of the ways to beef up and supplement log file analysis is through page tagging, in which a code snippet embedded in each page sends information to a third-party server for analysis. The biggest bargain on the Web is Google Analytics, which can be used free by anyone with a Google ID. Yahoo! Web Analytics has been rated as a better service by some industry experts, but it can only be used by advertising and hosted merchant clients. If you are a Yahoo! advertising client, it’s a great fringe benefit.
For others, though, Google Analytics offers an impressive array of information about your visitors and their habits. The administrative dashboard allows you to set up multiple websites for tracking, and provides a unique tracking code snippet for each site. The code can be placed into a sitewide template, such as a footer, or in specific pages, depending on what you’re trying to track.
Jon Donley, a 30-year newsroom veteran, founded three major metro newspaper websites. His specialty is community engagement and next-generation models of journalism. jon@dawnsinger.com

