“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” - Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
The World Wide Web is the biggest repository of information ever created and, according to the W3C, it can reach its full potential only if data can be shared, processed, and understood by automated tools as well as by people. Consider the following statements:
- “Stairway to Heaven” is a song by the English rock band Led Zeppelin.
- Led Zeppelin were an English rock band formed in 1968 by Jimmy Page.
These statements are easily understood by people with a basic knowledge of English, but they are not trivial for computers. They follow a syntax (grammatical rules for specifying correct word order and inflectional structure in a sentence), but how can they convey semantics (rules for assigning meaning to a sentence)?
The core process centers on describing the properties of things (shape, colour, size) and the relationships between things (x is a member of y).
While XML (Extensible Markup Language) is accepted within the NLP (Natural Language Processing) community as the main format for describing data and the relationships between data items, especially when the emphasis is on simplicity, interoperability and usability, the use of Semantic Web Stack [Figure 1] technologies and tools such as RDF (Resource Description Framework), RDFS (RDF Schema), and OWL (Web Ontology Language) in NLP applications is still limited.
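As a rough illustration of describing properties and relationships, the two statements about Led Zeppelin can be broken into (subject, predicate, object) triples, which is the shape RDF uses for its data model. The sketch below uses plain Python tuples; the identifiers (`performedBy`, `formedIn`, and so on) are made up for this example and are not taken from any real vocabulary.

```python
# Each statement becomes one or more (subject, predicate, object) triples.
# All identifiers are hypothetical; real RDF would use IRIs instead of bare names.
triples = [
    ("StairwayToHeaven", "type", "Song"),
    ("StairwayToHeaven", "performedBy", "LedZeppelin"),
    ("LedZeppelin", "type", "RockBand"),
    ("LedZeppelin", "origin", "England"),
    ("LedZeppelin", "formedIn", 1968),
    ("LedZeppelin", "formedBy", "JimmyPage"),
]


def describe(subject, triples):
    """Collect every property recorded for one subject into a dict of lists."""
    properties = {}
    for s, p, o in triples:
        if s == subject:
            properties.setdefault(p, []).append(o)
    return properties


print(describe("LedZeppelin", triples))
```

Once the sentences are in this form, a program no longer needs to parse English to find out who formed the band: it only needs to look up the `formedBy` property of `LedZeppelin`.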
Figure 1 – Semantic Web stack.
The Web Ontology Language (OWL) is intended for use when information has to be processed by machines rather than only presented to humans. It represents the meaning of terms in vocabularies and the relationships between those terms; such a representation is known as an ontology.
OWL was designed to meet the need for a web ontology language, and it builds on the other layers of the Semantic Web stack:
- XML provides an essential syntax for structuring content within documents, yet attaches no semantic meaning or constraints to the document.
- XML Schema and DTD are schema languages. They provide a means of defining the structure, content and semantics of XML documents.
- RDF is a language for describing data models for objects (“resources”) and the relations between them, providing basic semantics. It can be expressed in XML syntax.
- RDF Schema, like XML Schema, provides a means of defining the structure and content of properties and classes in RDF-based resources. It has specific semantics for hierarchies of properties and classes.
- OWL extends RDFS by adding more advanced constructs to describe semantics of RDF statements, like relations between classes (e.g. disjointness), cardinality, equality, restrictions of values, richer typing of properties and characteristics of properties (e.g. symmetry and transitivity), and enumerated classes. It’s based on description logic and brings reasoning power to the semantic web.
- SPARQL is an RDF query language. It is used to retrieve semantic data and can query RDFS, OWL and any other RDF-based data.
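To make the upper layers more concrete, here is a minimal sketch of two ideas from the list: inferring new statements from a transitive property (as RDFS does for class hierarchies and OWL allows for arbitrary properties), and retrieving statements with a wildcard pattern, which is a very simplified analogue of a SPARQL triple pattern. The data, the function names and the use of `None` as a wildcard are all assumptions made for this example; it is not how an actual reasoner or SPARQL engine is implemented.

```python
from itertools import product

# Hypothetical toy data; the class and property names are illustrative only.
TRIPLES = {
    ("RockBand", "subClassOf", "Band"),
    ("Band", "subClassOf", "MusicalGroup"),
    ("LedZeppelin", "type", "RockBand"),
    ("JimmyPage", "memberOf", "LedZeppelin"),
}


def transitive_closure(triples, predicate):
    """Return the triples plus everything implied by treating `predicate` as transitive."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for s, p, o in closed if p == predicate]
        # If a -> b and b -> d both hold, then a -> d is implied.
        for (a, b), (c, d) in product(links, links):
            if b == c and (a, predicate, d) not in closed:
                closed.add((a, predicate, d))
                changed = True
    return closed


def match(triples, pattern):
    """Return the sorted triples that fit `pattern`; None matches anything."""
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


inferred = transitive_closure(TRIPLES, "subClassOf")
# "Which classes is RockBand a subclass of?"
print(match(inferred, ("RockBand", "subClassOf", None)))
```

The statement that `RockBand` is a subclass of `MusicalGroup` is never written down in the data; it only appears after the inference step, which is the kind of reasoning the schema and ontology layers are meant to enable.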
RIF (Rule Interchange Format), designed to enable interoperability among rule languages in general, is currently going through standardization. The “Unifying Logic” and “Proof” layers are under active research and have not been fully implemented.
The original vision of the Semantic Web has not yet been achieved, but progress is being made towards it. It already helps in producing more connectable, interoperable, and adaptable software, while keeping maintenance cheap and easy.
As researched by Leo Sauermann, positive results have been obtained from using the Semantic Web, namely increased profit (better customer satisfaction, shareholder value, user work support), data integration (a consistent data model to build upon, integrating content from different organizations, providers and departments, disparate data sources, and legacy data), better querying systems, and taxonomies (categorization, or classification, of things based on a predetermined system).
The main challenges for the Semantic Web are how to avoid or manage:
- Vastness – information grows at an exponential rate. Automated reasoning systems are required to manage such vast volumes of input.
- Vagueness – imprecisely defined concepts generate uncertainty in queries. Fuzzy logic can remove some of this uncertainty.
- Uncertainty – when variables have uncertain values, different results are obtained (each symptom in a medical report can be assigned a different probability, which can lead to different diagnoses). This can be mitigated with the use of probabilistic models and reasoning methods.
- Inconsistency – when large ontologies or diverse document corpora are used, contradictions between documents or within ontologies are bound to happen. Defeasible reasoning and paraconsistent reasoning are two techniques that can be employed to deal with inconsistency.
- Deceit – as the number of data sources rises, some producers will intentionally publish information that is misleading and damaging to consumers.
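As a small illustration of how fuzzy logic handles vagueness, a vague concept such as “long song” can be given a degree of membership between 0 and 1 instead of a hard yes/no answer. The sketch below is only one possible formulation: the linear ramp, the thresholds (4 and 8 minutes) and the use of `min` for a fuzzy “and” are assumptions chosen for this example.

```python
def ramp_membership(value, low, high):
    """Degree in [0, 1] to which `value` belongs to a fuzzy set.

    Membership is 0 at or below `low`, 1 at or above `high`,
    and rises linearly in between.
    """
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def fuzzy_and(a, b):
    """Combine two membership degrees; min is one common choice for fuzzy AND."""
    return min(a, b)


# Hypothetical query: "long songs" where "long" ramps from 4 to 8 minutes.
for minutes in (3, 6, 9):
    print(minutes, ramp_membership(minutes, low=4, high=8))
```

A query for “long songs” can then rank results by membership degree rather than rejecting everything that falls just under an arbitrary cut-off.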
The implementation of “unifying logic” and “proof” layers is still an ongoing process.