Visibility metrics in the news media – a few observations

Before I present and briefly discuss the data, let me clarify a few notions.

By metric, I mean "some abstraction that can be
measured." What you weigh is a simple metric. How happy you are is
not. In this post we deal with a simple metric: how visible a
concept (say… football) is in the news media.

By measure I mean applying an instrument to a metric. Step on a
scale and we can take a reading of how much you weigh. In this post,
the instruments are news search engines.

By index I mean an aggregate of measures. The main purpose of
an index is to yield a more reliable figure. In this post I report on
the following news search engines: Ask, AllTheWeb, Bloglines, Google, Factiva, Live, Topix and Yahoo!

By reliable I mean that an instrument produces consistent
estimates of a metric for a specific concept, i.e. if you take the
measure twice, you should get the same value.

By robust I mean that an index is designed in such a way as to appropriately discount bogus values. More about this in a future post.

To follow on the sports example I introduced a couple of posts back, Table 1
shows the number of results returned by each news search engine for 5
sports. Whenever possible, the search specified a single day (October
1st), with no region or language restriction.

Two observations should be made. First, there are fairly large
differences across engines. Setting aside Ask's, Live's and Topix's
figures (see notes (1) and (3)), counts range from a low of 2,293
to a high of 6,740 news items for football. Second, all engines
produce consistent estimates: on successive requests, counts returned
by an engine generally do not vary by much.

Consistent yet different counts need not be a matter of concern if
counts vary merely by some fixed proportion (e.g. Yahoo! always
returning more items than Factiva). If that were not the case (e.g.
sometimes Yahoo! claims more items, sometimes Factiva does), then
there would be a problem. To check this, we must correlate instruments
(the search engines) across concepts (the sports).

Table 2 shows how results correlate across news search
engines. We can readily see that Ask and Live are poor indicators in
our example: their counts have hit their ceilings, and therefore
provide no information on relative visibility. According to Ask,
cricket is the most visible sport, a dubious result, as all other
engines but one put it at the bottom. We can also see that Topix's
overall stock of news items correlates as well as any other engine's.

Table 3 shows the customary reliability statistics for our
array of measures (which could become an index). The Cronbach's alpha
value (roughly comparable to the average correlation between items,
ranging from zero for pure noise to one for a perfectly reliable
index) can be very high: as high as 0.992 if we merely remove the counts returned by Ask and Live.
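For readers curious how such an alpha is obtained, here is a minimal sketch of the standard Cronbach's alpha formula, treating each engine as an "item" measured over the same set of concepts. The counts are invented for illustration; they are not the Table 1 figures, and a real computation would likely standardize the wildly different scales first.

```python
# Minimal Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of counts per engine, all over the same concepts."""
    k = len(items)
    totals = [sum(concept) for concept in zip(*items)]   # total score per concept
    item_variances = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variances / pvariance(totals))

# Three hypothetical engines, five sports each (placeholder numbers)
engines = [
    [6740, 3100, 1850, 900, 450],
    [2293, 1200,  700, 350, 180],
    [5100, 2400, 1500, 720, 390],
]
print(f"alpha = {cronbach_alpha(engines):.3f}")  # roughly 0.922 for these counts
```

Because the three made-up series move in rough proportion, alpha is high; an engine that ranked the sports differently would drag it down.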

So far so good, but not good enough as the concepts used in this
example are in no way representative of the search universe. In my next
post I’ll present results derived from a much larger and reasonably
diverse sample of concepts.


Table 1


Notes:
(1) Notice that Ask and Live cap results. Ask will not
return counts above about 800 pages; Live will not return counts above 1,100 pages or so.
(2) The results figure provided by Google News
appears to refer to the total stock of active news items (presumably a
month). If you click to list news items of the past week, day or hour,
the results count stays the same. The numbers I report are my own
estimates, based on the number of items available or the rate at which
the past 1000 items have been published.
(3) Topix doesn’t allow narrowing the search to a specific day or date. Figures refer to the stock of active news items.
(4) MAPA stands for Mean Absolute Percent Accuracy. It is computed as
(1 – MAPE), where MAPE is the well-known Mean Absolute Percent Error
routinely used to compare forecasting methods. These figures
were computed from 5 bursts of 50 consecutive requests made at
intervals of less than 1 second (counts taken at longer intervals
might differ because real changes have affected the underlying
metric, i.e. new news coming in, old news moving out).
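The MAPA figure described in note (4) can be sketched in a few lines of Python. One assumption here: the mean of a burst is taken as the reference value for the percent errors, since the note does not spell out the reference. The burst values are invented.

```python
# MAPA = 1 - MAPE over a burst of repeated counts for one concept,
# using the burst mean as the reference value (an assumption).
def mapa(counts):
    """Mean Absolute Percent Accuracy of repeated count measurements."""
    mean = sum(counts) / len(counts)
    mape = sum(abs(c - mean) / mean for c in counts) / len(counts)
    return 1 - mape

# Hypothetical burst of 5 consecutive requests (placeholder numbers)
burst = [2293, 2293, 2290, 2295, 2293]
print(f"MAPA = {mapa(burst):.4f}")
```

An engine that returns nearly identical counts on back-to-back requests, as in this made-up burst, scores a MAPA very close to 1.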

Table 2


Table 3


Note: (5) The 6-item index excludes Topix, Ask and Live; the 7-item index excludes Ask and Live.