Archive for April, 2010

Many people are more comfortable formulating search queries in their own language but have difficulty typing these queries into Google (try typing नमस्ते on a keyboard with English letters). To overcome the difficulty they face in typing in their local language scripts, some people have resorted to copying and pasting from other sites and from [...]

In Exploring OpenEphyra – Query Generation, I gave a short overview of OpenEphyra’s Question Generation Model. Over the last few days I have been exploring the framework and reading some papers about it. After some experiments I got the first results, which impressed me. Let’s see more details … OpenEphyra is based on the assumption [...]

I’m still looking for the holy grail for converting natural language to RDF. Today I was reading some interesting material about the Cypher Natural Language to RDF/SPARQL transcoder. Cypher is an AI program that generates the .rdf (RDF graph) and .serql (SeRQL query) representations of plain language input, allowing users to speak plain language to update [...]

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika. A nice feature is the AutoDetectParser, which will automagically detect the [...]

W3C Cheat Sheet

Back in November, Dominique Hazaël-Massieux released a small W3C Cheat Sheet, a compact, mobile-friendly Web application that lets you look up keywords in various W3C specifications and access various guidelines and best practices. An updated version is now available, with a number of improvements: a new layout with improved user interactions. the [...]

NLP2RDF is a framework that integrates multiple NLP tools and linguistic ontologies in order to make the implicit meaning of natural language explicit by means of RDF/OWL descriptions. Natural language (a character sequence with implicit knowledge) is converted into a more expressive formalism – in this case OWL-DL – aiming to grasp the underlying meaning. This [...]

Google’s Public Data Explorer, a Google Labs data visualization tool, aims to make datasets easy to communicate and explore, allowing navigation between different views, comparisons of results and sharing of findings. Google has compiled a list of datasets that you can explore and embed into blogs and webpages now, such as [...]

“Microsoft posted strong results for the third quarter of its 2010 fiscal year, largely thanks to sales of Windows 7. But the company continues to suffer heavy losses in its Online Services Division as it tries to match Google in the online search and advertising market. … The division’s quarterly loss grew by 73 percent [...]

How can I index PDF documents using Apache Lucene? Lucene can index anything that can be converted to a String and fed to it through its API. So, to index PDF files, we first need to parse them in order to extract the text that we want to index. Here are some PDF parsers that can help [...]
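
The idea of "extract the text first, then index the string" can be sketched with a toy inverted index. This is only an illustration in plain Python, not Lucene's API: the function names `tokenize`, `build_index` and `search` are made up for this example, and the strings in `extracted` are hypothetical stand-ins for text that a PDF parser has already pulled out of the files.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical strings standing in for text already extracted from PDF files.
extracted = {
    "report.pdf": "Quarterly sales report for the retail division",
    "manual.pdf": "User manual: how to index documents",
}
index = build_index(extracted)
```

Calling `search(index, "index documents")` then looks up each query term and intersects the matching document sets. A real search library adds much more on top of this (analysis, scoring, on-disk storage), but the input is the same: plain text per document.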

Kaggle is a platform for forecasting and hosting data mining competitions. Predictions are critical to most organizations. Retailers predict their sales to optimize inventory; insurance companies predict which claims are candidates for fraud investigations; and fund managers predict asset prices to maximize their clients’ wealth. Kaggle provides a platform for data-prediction competitions allowing organizations to [...]
