Do you know Google Trends?

As Google Trends provides insights into broad search patterns, today I tried to search for “information retrieval” stats.

The results are very interesting:

Where:

ZyLAB Unveils Major Enhancements to XML-Based Document Management and Information Retrieval Solution
CMSWire – Mar 23 2004
A View On Google’s Patent: Information Retrieval Based On Historical Data
WebProNews – May 12 2005
Nomura selects Autonomy’s information retrieval technology
Finextra (press release) – Sep 12 2006
“Research Meets Practice” Information Retrieval Symposium 2008
MarketWatch – Nov 4 2008
Research and Markets: Global Outlook Report on the Information Retrieval Services Industry
Reuters – Nov 3 2009
Research and Markets: Information Retrieval – SciFinder, 2nd Edition
Financial Post – Jul 14 2010

The Top 5 Regions for “Information Retrieval” are:

1. South Korea
2. Taiwan
3. India
4. Iran
5. Malaysia

As for the languages:

1. Korean
2. Indonesian
3. Chinese
4. Greek
5. Thai
6. English

 Should I start to learn Korean? 아니면, 내가 뭐라고해야 정말 한국어 배우기 시작할까요?

UK internet users now spend 64% more time using search engines (31 million hours per month in April 2010) than they did 3 years ago.

(UKOM, May 2010)

We are currently crossing the 1 zettabyte mark for the total amount of the world’s digital information (that’s a million million gigabytes!). Digital information grew by 62% in 2009 to 800,000 petabytes (a petabyte is 1 million gigabytes). This amount could be stored on 75 billion iPads, and is the equivalent of a century’s worth of constant tweeting by every man, woman and child.

(The Guardian, May 2010)
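To make the units a bit more tangible, here is a quick back-of-the-envelope check of the figures above. It is only a sketch: it assumes decimal (SI) units, and the per-iPad number is simply what the quoted totals imply, not a figure taken from the article.

```python
# Decimal (SI) unit sizes in bytes. This is an assumption; the article
# does not say whether decimal or binary units are meant.
GIGABYTE = 10**9
PETABYTE = 10**15
ZETTABYTE = 10**21

# Total digital information at the end of 2009, as quoted above.
total_2009_bytes = 800_000 * PETABYTE

# How far along the way to 1 zettabyte is that?
fraction_of_zettabyte = total_2009_bytes / ZETTABYTE

# Unit relationships mentioned in the text.
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE   # a million million
gigabytes_per_petabyte = PETABYTE // GIGABYTE     # one million

# Storage per device implied by the "75 billion iPads" comparison.
ipads = 75_000_000_000
gigabytes_per_ipad = total_2009_bytes / ipads / GIGABYTE

print(f"2009 total: {fraction_of_zettabyte} ZB")
print(f"Implied storage per iPad: {gigabytes_per_ipad:.1f} GB")
```

So the 2009 total sits at 0.8 zettabytes, which is why further growth pushes the world past the 1 zettabyte mark.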

It is estimated that by 2012, 90% of data will be video.

(Cisco as cited by www.readwriteweb.com, July 2010)

It is estimated that globally, over 62 million consumers will have internet access in their cars by 2016. This compares to the 970,000 consumers with access in 2009.

(iSuppli Corporation as cited by eMarketer, June 2010)


Can computers learn to read? We think so. “Read the Web” is a research project that attempts to create a computer system that learns over time to read the web. Since January 2010, a computer system called NELL (Never-Ending Language Learner) has been running continuously, attempting to perform two tasks each day:

  • First, it attempts to “read,” or extract facts from text found in hundreds of millions of web pages (e.g., playsInstrument(George_Harrison, guitar)).
  • Second, it attempts to improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately.

At present, NELL has accumulated a knowledge base of 644,836 beliefs that it has read from various web pages. It is not perfect, but NELL is learning. You can track NELL’s progress on @cmunell on Twitter, browse and download its knowledge base, read more about our technical approach, or join the discussion group.

Source


Hi again,

When you’ve got a question and no clue about the answer, you always ask the internet. But where do you start?

For the majority of our questions we use Google. It’s simple, fast, and most of all it’s very effective.

Nevertheless, there are other, less well-known solutions:

Stack Exchange is a fast-growing network of 51 question and answer sites on diverse topics. Basically, it is an expert knowledge exchange place where physics researchers can ask each other about quantum entanglement, computer programmers can ask about JavaScript date formats, and photographers can share knowledge about taking great pictures in the snow.

After someone asks a question, members of the community propose answers. Others vote on those answers. Very quickly, the answers with the most votes rise to the top. You don’t have to read through a lot of discussion to find the best answer.

Like topics on Wikipedia, questions and answers on Stack Exchange can be edited. If someone writes the beginning of a great answer, someone else can embellish it and make it even better. Useful tool.

Quora is a question and answer community that has a goal of building answer pages for virtually every question you could think to ask. It is a continually improving collection of questions and answers created, edited, and organized by everyone who uses it.

If a particular topic interests you, or even another member of Quora, you can follow it/them to keep yourself up to date.


Hi everyone!

So, I’ve been working for a couple of days on a new feature for Question Answering (QA) systems regarding context.

The main idea is to keep track of a user’s topic of discussion, so that they can query the QA system with shortened questions.

Here is an example scenario:

  1. What is Walmart? (Walmart is …)
  2. and Tesco? (Tesco is …)
  3. and who is the president of it? (The president of Tesco is …)
  4. and how many stores has it? (Tesco has … stores …)

Basically, in order to perform this task, we only need to keep the Target Object and the Type of Question from each of the questions. Then, whenever a context question is detected (the question starts with the word “and”), two scenarios can happen:

  1. and <New Target Object>
  2. and <New Question Type where the new Target Object is he, she or it>

So, the work to be done is respectively:

  1. Query the system with the old Question Type, replacing the old Target Object with the new Target Object.
  2. Query the system with the new Question Type, keeping the old Target Object as the Target Object.

and voilà…
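The two scenarios above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `QUESTION_TYPES`, `Context`, `parse` and `resolve` are names I made up for this post, the prefix matching is a toy stand-in for a real question classifier, and none of it is taken from any QA framework.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PRONOUNS = {"he", "she", "it"}

# Hypothetical question-type patterns (toy prefix matching, for illustration only).
QUESTION_TYPES = [
    ("who is the president of", "PRESIDENT"),
    ("how many stores has", "STORE_COUNT"),
    ("what is", "DEFINITION"),
]


@dataclass
class Context:
    """What we remember between questions of the same user."""
    question_type: Optional[str] = None
    target: Optional[str] = None


def parse(question: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (question_type, target); question_type is None if no pattern matches."""
    text = question.strip().rstrip("?").strip()
    lowered = text.lower()
    for prefix, qtype in QUESTION_TYPES:
        if lowered.startswith(prefix):
            return qtype, text[len(prefix):].strip() or None
    return None, text or None


def resolve(question: str, ctx: Context) -> Tuple[Optional[str], Optional[str]]:
    """Update the context and return the (question_type, target) to query with."""
    text = question.strip()
    if text.lower().startswith("and "):
        qtype, target = parse(text[4:])
        if qtype is None:
            # Scenario 1: "and <New Target Object>" -> keep the old question type.
            ctx.target = target
        else:
            # Scenario 2: new question type; a pronoun keeps the old target.
            ctx.question_type = qtype
            if target is not None and target.lower() not in PRONOUNS:
                ctx.target = target
    else:
        # A regular question replaces the whole context.
        ctx.question_type, ctx.target = parse(text)
    return ctx.question_type, ctx.target


ctx = Context()
for q in ["What is Walmart?", "and Tesco?",
          "and who is the president of it?", "and how many stores has it?"]:
    print(q, "->", resolve(q, ctx))
```

Running the example scenario through `resolve` keeps Tesco as the target for the last two questions, even though they only say “it”.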

Another interesting feature that I tried within the scope of this project is a standard spell checker on the query system, so that:

  1. User: What is Xalmart?
  2. QA system: Since the system cannot answer that question, it asks the user whether they meant “What is Walmart?” (it searches dictionaries in order to perform a Query Reformulation)
  3. User: Yes
  4. QA system: Re-query the engine with the new query: What is Walmart?
  5. Answer: Walmart is …
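A rough sketch of that dialogue, using `difflib` from the Python standard library for the fuzzy matching. Everything here is an assumption made for illustration: `KNOWN_ENTITIES` stands in for the dictionaries mentioned above, and `answer_fn` and `confirm_fn` are hypothetical hooks for the QA engine and the “did you mean” prompt.

```python
import difflib
from typing import Callable, List, Optional

# Hypothetical dictionary of names the system knows about.
KNOWN_ENTITIES = ["Walmart", "Tesco", "Carrefour"]


def suggest_reformulation(question: str,
                          vocabulary: List[str] = KNOWN_ENTITIES,
                          cutoff: float = 0.8) -> Optional[str]:
    """Return a corrected question, or None if no word looks misspelled."""
    words = question.strip().rstrip("?").split()
    corrected = []
    changed = False
    for word in words:
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest dictionary entry, if it is similar enough to the word.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(word)
    if not changed:
        return None
    return " ".join(corrected) + "?"


def ask(question: str,
        answer_fn: Callable[[str], Optional[str]],
        confirm_fn: Callable[[str], bool]) -> Optional[str]:
    """Try to answer; on failure, propose a reformulation and re-query if confirmed."""
    answer = answer_fn(question)
    if answer is not None:
        return answer
    suggestion = suggest_reformulation(question)
    if suggestion is not None and confirm_fn(suggestion):
        return answer_fn(suggestion)
    return None


# Toy stand-ins for the QA engine and for the user saying "Yes".
fake_engine = {"What is Walmart?": "Walmart is ..."}
print(ask("What is Xalmart?", fake_engine.get, lambda suggestion: True))
```

The 0.8 cutoff is a guess: it is high enough that ordinary question words are left alone in this example, but a real system would need to tune it against its own dictionaries.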

I have already customized the Open Ephyra question answering framework with these features, and it works like a charm :)

Any suggestions for more context-related extensions of standard QA systems? Looking forward to hearing from you!


As new technologies and solutions appear, search queries also tend to evolve, as you can see in the voice search example.

With voice interface, users tend to pose queries that resemble natural language with many function words than just a sequence of keywords: ‘starbucks in chicago’ vs ‘starbucks chicago’. In fact, some popular prepositions (like ‘in’ and ‘at’) appear twice as frequently in the voice sample data set as in the other samples. Nevertheless, ease of input seems to make voice queries a lot more descriptive than typed queries. In terms of search query category, voice queries increased on retail (5.8%), local (2.9%), automotive (2.6%), and finance (1.2%) categories, while decreasing on electronic gadgets significantly, in comparison to typed mobile queries. Compared to computer web queries, voice queries increased on local (11.1%) and retail (1.6%), while decreasing on electronic gadgets (4.2%) and Health (1%) categories. We hypothesize this difference can be explained by the user behavior that mobile users may use voice queries when they have to, yet are distracted and can not type. Some examples of such situation could be where they are lost while driving, or where they need to find and call a place quickly.

Source
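The preposition comparison in the quote can be illustrated with a tiny sketch. The two query samples below are made up for this post (only the Starbucks pair comes from the quote), and `preposition_rate` is a hypothetical helper, so the numbers say nothing about the real data sets.

```python
from typing import Iterable

# The two prepositions named in the quoted passage.
PREPOSITIONS = {"in", "at"}


def preposition_rate(queries: Iterable[str]) -> float:
    """Share of query tokens that are one of the tracked prepositions."""
    tokens = [token for query in queries for token in query.lower().split()]
    if not tokens:
        return 0.0
    return sum(token in PREPOSITIONS for token in tokens) / len(tokens)


# Made-up samples, for illustration only.
voice_queries = ["starbucks in chicago", "pizza at union square"]
typed_queries = ["starbucks chicago", "pizza union square"]

print("voice:", preposition_rate(voice_queries))
print("typed:", preposition_rate(typed_queries))
```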

As technology evolves, the queries users pose tend to become more and more similar to natural language. So, shouldn’t we find solutions that better support those queries?


Global social network ad revenues make up about 10% of worldwide online ad spending, or about $8 billion. (Financial Times, March 2011)

52% of internet users search for a brand on a search engine in response to seeing a TV ad. (Thinkbox and Decipher “Tellyport”, as cited by New Media Age, December 2010)

Among people with internet access worldwide, 61% go online daily, while 54% watch TV daily, 36% listen to the radio and 32% read newspapers. (TNS as cited by Digital Strategy Consulting, October 2010)

Globally, there are a total of 1.8 billion Internet users. (royal.pingdom.com, July 2010)

Every day there are more than a billion searches on Google for information. Every week, more than a billion people search on Google for information. (Google data, September 2010)
