Archive for April, 2010
Integrating virtual keyboards in Google search
Posted in information retrieval, tagged google, information on April 30, 2010 | Leave a Comment »
Many people are more comfortable formulating search queries in their own language but have difficulty typing these queries into Google (try typing नमस्ते on a keyboard with English letters). To overcome the difficulty they face in typing in their local language scripts, some people have resorted to copying and pasting from other sites and from [...]
OpenEphyra – Question Interpretation – PART I
Posted in question and answering, tools, tagged NLP question, OpenEphyra, Question generation, Question Interpretation, Question Parsing in Java on April 30, 2010 | 1 Comment »
In Exploring OpenEphyra – Query Generation, I gave a small overview of OpenEphyra’s Question Generation Model. Over the last few days I have been exploring the framework and reading some papers about it. After some experiments I got the first results, which impressed me a lot. Let’s see more details … OpenEphyra is based on the assumption [...]
Cypher Natural Language to RDF/SPARQL transcoder – Wanted! Dead or Alive!
Posted in ontology, Text Annotation, Text Extraction, tools, tagged Cyper, Help, NLP to RDF, RDF, SPARQL on April 29, 2010 | 1 Comment »
I’m still looking for the holy grail of converting natural language to RDF. Today I was reading about the Cypher Natural Language to RDF/SPARQL transcoder. Cypher is an AI program that generates the .rdf (RDF graph) and .serql (SeRQL query) representations of plain language input, allowing users to speak plain language to update [...]
Lucene: How to index pdf files? (part2) Apache Tika
Posted in information retrieval, tools, tagged apache, index, java, lucene, maven 2, pdf, tika on April 29, 2010 | Leave a Comment »
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika. A nice feature is the AutoDetectParser, which will automagically detect the [...]
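To give a feel for what automatic type detection involves, here is a minimal sketch that guesses a document type from the first bytes of a file. It is only an illustration of the general idea and not Apache Tika's API or implementation: the class name MagicSniffer and the method guessType are hypothetical, and only three well-known file signatures are checked.

```java
import java.nio.charset.StandardCharsets;

// Simplified illustration of content-type sniffing; NOT Apache Tika's API or implementation.
class MagicSniffer {
    // Guess a MIME type from the leading bytes of a file (hypothetical helper).
    static String guessType(byte[] head) {
        if (startsWith(head, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(head, new byte[] {0x50, 0x4B, 0x03, 0x04})) {
            return "application/zip"; // also the container for formats such as .docx and .odt
        }
        if (startsWith(head, new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47})) {
            return "image/png";
        }
        return "application/octet-stream"; // unknown: fall back to generic binary
    }

    // True when data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessType(head)); // prints application/pdf
    }
}
```

A real toolkit would combine many more signatures with other hints such as the file name, so treat this as a toy and not as a replacement for a detection library.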
W3C Cheat Sheet
Posted in news, tagged cheat sheet, opensearch, standarts, w3c on April 29, 2010 | 1 Comment »
Back in November, Dominique Hazaël-Massieux released a small W3C Cheat Sheet, a compact, mobile-friendly Web application that lets you look up keywords in various W3C specifications and access various guidelines and best practices. He has now released an updated version with a number of improvements: a new layout with improved user interactions. the [...]
Tools: NLP2RDF
Posted in natural language processing, ontology, tools, tagged natural language processing, nlp2rdf, OWL, owld DL, RDF, stanford to rdf, triple extraction on April 28, 2010 | 3 Comments »
NLP2RDF is a framework that integrates multiple NLP tools and linguistic ontologies in order to make the implicit meaning of natural language explicit by means of RDF/OWL descriptions. Natural language (a character sequence with implicit knowledge) is converted into a more expressive formalism – in this case OWL-DL – aiming to grasp the underlying meaning. This [...]
Public data explorer
Posted in news, Uncategorized, tagged dataset, google labs, public data explorer on April 28, 2010 | Leave a Comment »
Google’s Public Data Explorer is a Google Labs data visualization tool that aims to make datasets easy to communicate and explore, allowing navigation between different views, comparisons of results and sharing of findings. Google has compiled a list of datasets that you can explore and embed into blogs and webpages now, such as [...]
The feud of following the leader
Posted in news, tagged bing, google, microsoft on April 28, 2010 | Leave a Comment »
“Microsoft posted strong results for the third quarter of its 2010 fiscal year, largely thanks to sales of Windows 7. But the company continues to suffer heavy losses in its Online Services Division as it tries to match Google in the online search and advertising market. … The division’s quarterly loss grew by 73 percent [...]
Lucene: How to index pdf files?
Posted in information retrieval, tools, tagged apache, html, indexing, java, lucene, pdf on April 28, 2010 | 2 Comments »
How can I index PDF documents using Apache Lucene? Lucene can index anything that can be converted to a String and fed to it through its API. So, to index PDF files, we first need to parse them in order to extract the text that we want to index. Here are some PDF parsers that can help [...]
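The two-step idea (extract text first, then index the resulting String) can be sketched with a toy inverted index. This is a stand-in written with the Java standard library only and not Lucene's API: the class TinyTextIndex and its methods are hypothetical, and the sample strings stand in for text that a PDF parser would have extracted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a search index; this is NOT Lucene's API.
class TinyTextIndex {
    // term -> sorted set of ids of the documents containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index text that has already been extracted from a document (e.g. by a PDF parser).
    void add(String docId, String extractedText) {
        String lower = extractedText.toLowerCase(Locale.ROOT);
        // Split on anything that is not a letter or a digit.
        for (String term : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of documents containing the term (case-insensitive).
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyTextIndex index = new TinyTextIndex();
        // Hypothetical strings standing in for text pulled out of two PDF files.
        index.add("report.pdf", "Lucene indexes plain text");
        index.add("notes.pdf", "Extract text from the PDF first");
        System.out.println(index.search("text")); // prints [notes.pdf, report.pdf]
    }
}
```

The index never sees the PDF format itself, only plain strings, which is why any parser that produces text can feed it.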
Host your data mining competitions
Posted in news, tagged competition, Data Mining, kaggle, netflix, tex mining on April 28, 2010 | 1 Comment »
Kaggle is a platform for hosting forecasting and data mining competitions. Predictions are critical to most organizations. Retailers predict their sales to optimize inventory; insurance companies predict which claims are candidates for fraud investigations; and fund managers predict asset prices to maximize their clients’ wealth. Kaggle provides a platform for data-prediction competitions allowing organizations to [...]
