Tagging is at the heart of Alacra Pulse – it’s the
technology that sifts through hundreds of thousands of stories and blog posts,
delivering just the relevant content to our users. In defining the requirements
for Alacra Pulse, we needed a technology which:
- Is adaptable, letting us move from a concept
like M&A to another like bankruptcies
- Is scalable, so we can deliver tagged news
stories in minutes, not hours, and provide the service to clients on a
cost-effective basis
- Can “understand” complex concepts like events
and not just return matches to keywords
- Is highly accurate; 60-70% accuracy may be sufficient for
search engines but would not meet the needs of our markets
A simple search engine can’t “understand” a concept like an
analyst comment on a company or differentiate between a new M&A rumor and a
story about a deal that occurred years prior. Due to the vagaries of language,
search engines cannot easily sense the difference between Company A buying
Company B and Company A buying a product from Company B.
The technology best suited for this type of task is semantic
tagging. This approach breaks apart sentences, looking to infer the meaning of
terms based upon the context in which they are used. Yet even the best semantic
taggers typically only achieve accuracy levels of 70-80%, which we knew would
be insufficient for this type of product.
Our solution to this challenge has three components:
1. We start with a state-of-the-art semantic tagging engine, which we can use to
identify both entities (such as companies or people) and events (such as
M&A transactions). The semantic tagger splits each document into sentences,
identifies the parts of speech (nouns, verbs, etc.), then seeks to match those
to known entities. Once it has identified the entities, it identifies relevant
events, based upon rules we have defined.
2. Next, we add what we consider our “secret sauce”: Alacra’s knowledge base, which includes
specific information about companies, people, deals and more. For example, if
the tagger sees the word “Apple” and needs to decide whether it refers to Apple, Inc.,
it can draw on the detailed company profile in the knowledge base to make that
determination. We know that Apple’s ticker is AAPL, its CEO is Steve
Jobs, it is headquartered in Cupertino, CA, it makes iPhones, iPods, MacBooks
and more; it’s in the digital media, computer and electronics industries; its
partners include AT&T, Rogers, Telus, Bell, Orange and Vodafone, and its
competitors include Microsoft, Google, Sony, Palm, RIM and others. Using all of
the information in the knowledge base, the tagger can assess whether this
mention is referring to Apple, Inc. or not.
Beyond simple company information, the
Alacra knowledge base contains information on analysts, their firms and their
coverage. We also maintain a proprietary database of M&A deals, so we can
accurately determine whether a story is about a real and current deal or not.
3. Finally, we rely upon human review of the results. While technology is critical, in that
it allows the product to scale, technology alone will only reach accuracy
levels of 80-85%. That may be sufficient for some products, but when we’re
pushing out alerts to users, we believe the bar is higher. So, we have a
24-hour team of editors who review the tagged events to ensure they are
accurate.
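
To make the second component more concrete, here is a minimal sketch of how knowledge-base clues could help decide whether a mention of “Apple” refers to Apple, Inc. It is an illustration only, not Alacra's actual implementation: the `KNOWLEDGE_BASE` structure, the function names `count_clues` and `resolve_mention`, and the `min_hits` threshold are all assumptions made for this example. Only the facts about Apple come from the description above.

```python
import re

# Hypothetical knowledge-base entry. The facts mirror the Apple example in
# the text; the dictionary layout and field names are assumptions.
KNOWLEDGE_BASE = {
    "Apple, Inc.": {
        "aliases": {"apple"},
        "clues": {
            "aapl", "steve jobs", "cupertino", "iphone", "ipod", "macbook",
            "at&t", "microsoft", "google", "sony", "palm", "rim",
        },
    },
}


def count_clues(text, clues):
    """Count how many distinct clues appear in the text as whole words."""
    lowered = text.lower()
    hits = 0
    for clue in clues:
        # Word boundaries stop "rim" from matching inside "primary";
        # the optional "s" lets "iphone" match "iPhones".
        pattern = r"(?<!\w)" + re.escape(clue) + r"s?(?!\w)"
        if re.search(pattern, lowered):
            hits += 1
    return hits


def resolve_mention(mention, text, min_hits=2):
    """Return the best-matching entity name, or None if the evidence is weak.

    min_hits is an arbitrary illustrative threshold, not a tuned value.
    """
    best_name, best_hits = None, 0
    for name, entry in KNOWLEDGE_BASE.items():
        if mention.lower() not in entry["aliases"]:
            continue
        hits = count_clues(text, entry["clues"])
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name if best_hits >= min_hits else None


tech_story = "Apple shares rose after Steve Jobs unveiled new iPhones in Cupertino."
fruit_story = "The apple harvest was the primary concern for growers this year."

print(resolve_mention("Apple", tech_story))
print(resolve_mention("Apple", fruit_story))
```

The first story mentions several things the knowledge base associates with the company, so the mention resolves to Apple, Inc. The second mentions none, so the sketch declines to tag it. A production tagger would weigh far more evidence than simple clue counts.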
Together, semantic tagging, the Alacra knowledge
base and skilled editors provide the highest level of accuracy. Recent tests
show this approach yields precision in the 92-95% range, enabling Pulse users
to consistently see results which are relevant, timely and accurate.
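
For readers unfamiliar with the term, precision is the share of tagged items that turn out to be correct. The sketch below shows one way it could be computed from editor review outcomes. The function name `precision` and the sample outcomes are made up for illustration; they are not Alacra's test data or methodology.

```python
def precision(review_outcomes):
    """Fraction of tagged events that editors confirmed as correct."""
    if not review_outcomes:
        raise ValueError("no reviewed events to measure")
    confirmed = sum(1 for ok in review_outcomes if ok)
    return confirmed / len(review_outcomes)


# Made-up outcomes for illustration only: True means an editor confirmed the tag.
sample = [True, True, True, False, True, True, True, True, True, True]
print(f"precision: {precision(sample):.0%}")
```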