Digital repositories help users archive digital documents. However, semantic heterogeneity in their content makes it difficult to retrieve relevant documents (Alipanah et al., 2010; Rinaldi, 2009; Lee and Soo, 2005; Khan et al., 2004; Blasio et al., 2004). Semantic heterogeneity refers to similar data that are represented differently in a document, for example, the use of the word author versus the word writer. There are different semantic heterogeneity issues, such as polysemy and synonymy (Yang et al., 2011; Fang et al., 2005; Lee and Soo, 2005; Rodriguez and Egenhofer, 2003; Uschold and Gruninger, 2004). A synonym is a word that has the same meaning as another word; e.g., movie is a synonym of film. Polysemy refers to a word or phrase with multiple related meanings; e.g., bank can refer to a financial institution in one context and to the edge of a river in another. The main concern in information retrieval (IR) is to effectively retrieve relevant information from repositories.
A domain ontology provides a conceptual framework for the structured representation of context through a common vocabulary in a particular domain (Bonino et al., 2004; Fang et al., 2005). The vocabulary usually includes concepts, relationships between concepts, and definitions of these concepts and relationships. For example, in the statement “Bilal works in HSBC,” Bilal and HSBC are concepts, and works is a relationship between these concepts. Moreover, ontology rules and axioms are also defined to specify new concepts that can be introduced into the ontology and to apply logical inference (Ding et al., 2004). Semantic similarity refers to semantic closeness, proximity, or nearness. It indicates similarity between different concepts and their relationships. There are three types of semantic similarity: (a) surface, (b) structure, and (c) thematic similarity (Poole et al., 1995; Zhong et al., 2002; Zhu et al., 2002; Montes-Y-Gomez et al., 2000). Surface and structure similarity focus individually on concepts and relationships, respectively, whereas thematic similarity considers the pattern (i.e., combination) of concepts and the relationship that exists among them. In this paper, the term “keyword” refers to either a concept or a relationship of the domain ontology.
Typical existing semantic search systems (Bonino et al., 2004; Fang et al., 2005; Varelas et al., 2005) expand individual keywords through a domain ontology to deal with semantic heterogeneity challenges such as synonymy. For example, a search for the concept writer can be expanded through the domain ontology to the keywords writer and author. A search for the keyword writer alone may return fewer results than a search for both writer and author. The existing systems focus on matching the semantic similarity of individual keywords (i.e., they apply either surface or structure similarity) and apply Boolean operators if multiple keywords are given in a query. They ignore the semantic relationships that exist among the multiple keywords themselves.
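The keyword-by-keyword expansion described above can be illustrated with a minimal sketch. It is not the implementation of any cited system: the synonym table, the function names (expand_keyword, search), and the whitespace tokenization are all assumptions made for illustration. Each keyword is expanded independently, and a Boolean AND is then applied across the keywords.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each concept maps to its synonyms.
SYNONYMS: Dict[str, Set[str]] = {
    "writer": {"author"},
    "author": {"writer"},
    "movie": {"film"},
    "film": {"movie"},
}


def expand_keyword(keyword: str) -> Set[str]:
    """Return the keyword together with its synonyms from the toy ontology."""
    return {keyword} | SYNONYMS.get(keyword, set())


def search(documents: Dict[str, str], keywords: List[str]) -> List[str]:
    """Boolean AND across keywords, OR within each keyword's expansion.

    Each keyword is treated in isolation; relationships among the
    keywords themselves are not considered.
    """
    hits = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        if all(expand_keyword(k) & tokens for k in keywords):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "d1": "the writer of the film",
        "d2": "an author of a movie",
        "d3": "the river bank",
    }
    print(search(docs, ["writer"]))
    print(search(docs, ["writer", "movie"]))
```

Without expansion, a query for writer would match only the first document; with expansion, the document that uses author is also retrieved. The sketch does nothing with the relationship between writer and movie, which is the limitation noted above.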
More relevant documents can be retrieved for a multiple-keyword query if a system knows the meanings of, and the relationships among, the keywords in the query. By keywords pattern, we mean a combination of at least two concepts and their relationship that exists in the domain ontology. A pattern can represent the context/theme, that is, the circumstances in which something happens or should be considered. Because they do not consider such patterns, the existing systems (Bonino et al., 2004; Fang et al., 2005; Varelas et al., 2005; Rinaldi, 2009; Alipanah et al., 2010; Yang et al., 2011) cannot resolve the semantic heterogeneity issue of polysemy, which requires identification of the context of keywords to comprehend their actual semantics. Moreover, the existing systems also ignore other important relationships, such as semantic neighborhoods (Rodriguez and Egenhofer, 2003), that can also contribute to useful search results.
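The notion of a keywords pattern can be sketched as a lookup of (concept, relationship, concept) triples. The following is only an illustrative sketch under stated assumptions, not the method proposed in this paper: the triples, the label table, and the function name matching_patterns are hypothetical. It shows how the second keyword in a query can select the intended sense of the polysemous word bank.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (concept, relationship, concept)

# Hypothetical toy ontology with two senses of the word "bank".
ONTOLOGY: List[Triple] = [
    ("employee", "works_in", "financial_bank"),
    ("river", "flows_along", "river_bank"),
]

# Hypothetical surface labels: both bank concepts share the word "bank".
LABELS: Dict[str, str] = {
    "employee": "employee",
    "river": "river",
    "financial_bank": "bank",
    "river_bank": "bank",
}


def matching_patterns(keywords: List[str]) -> List[Triple]:
    """Return triples whose two concepts are both named in the query."""
    wanted = {k.lower() for k in keywords}
    matches = []
    for subj, rel, obj in ONTOLOGY:
        subj_label, obj_label = LABELS[subj], LABELS[obj]
        # A pattern needs two distinct query keywords, one per concept.
        if subj_label != obj_label and {subj_label, obj_label} <= wanted:
            matches.append((subj, rel, obj))
    return matches


if __name__ == "__main__":
    print(matching_patterns(["employee", "bank"]))
    print(matching_patterns(["river", "bank"]))
    print(matching_patterns(["bank"]))
```

With the single keyword bank, no pattern is found and the sense stays ambiguous; adding employee or river selects one triple and therefore one sense. A fuller treatment would also accept relationship names as keywords, which this sketch omits for brevity.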