Semantic memory

Human understanding of the world would be impossible without semantic memory, which stores structured representations of knowledge about entities in the world, concepts, and the relations between them. Semantic memory is permanent storage for conceptual data. “Permanent” means that data is collected throughout the whole lifetime of the system, even though old information can be overwritten or corrected by newer input. “Conceptual” means that this type of memory contains semantic relations between words and uses them to create concept definitions. In practical applications semantic memory should be a container for storage, efficient retrieval and information mining. Two approaches have been used here to realise it: the Collins and Quillian hierarchical model of semantic memory and the Collins and Loftus spreading activation model [8]. Our implementation is based on the connectionist part of this model and uses a relational knowledge base and a natural-language-processing toolkit.

ConceptNet, one of the main data sources, has been generated automatically from a large corpus of about 700,000 sentences collected in the Open Mind Common Sense Project, a World Wide Web-based collaboration in which over 14,000 authors typed all kinds of obvious “common sense” facts. The concise ConceptNet knowledge base has 200,000 assertions and the full base contains 1.6 million assertions. These assertions cover the spatial, physical, social, temporal, and psychological aspects of everyday life. They capture a wide range of commonsense concepts and relations in a simple, easy-to-use semantic network, like WordNet, which has been used as the third main source of data.
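As an illustration of how spreading activation may be realised over such a network, the Python sketch below propagates activation from a source concept through weighted relations. The graph structure, decay factor and threshold are illustrative assumptions, not details of the actual implementation.

```python
from collections import defaultdict

class SemanticNetwork:
    """Weighted semantic network with spreading activation retrieval."""

    def __init__(self):
        # adjacency list: node -> [(neighbour, relation type, weight), ...]
        self.edges = defaultdict(list)

    def add_relation(self, concept, relation, feature, weight=1.0):
        # relations are stored symmetrically so activation can flow both ways
        self.edges[concept].append((feature, relation, weight))
        self.edges[feature].append((concept, relation, weight))

    def spread_activation(self, source, decay=0.5, threshold=0.1):
        """Propagate activation outward from `source` until it fades away."""
        activation = {source: 1.0}
        frontier = [source]
        while frontier:
            next_frontier = []
            for node in frontier:
                out = activation[node] * decay
                if out < threshold:
                    continue
                for neighbour, _relation, weight in self.edges[node]:
                    value = out * weight
                    # keep only the strongest activation reaching each node
                    if value > activation.get(neighbour, 0.0):
                        activation[neighbour] = value
                        next_frontier.append(neighbour)
            frontier = next_frontier
        return activation

net = SemanticNetwork()
net.add_relation("canary", "IS-A", "bird")
net.add_relation("bird", "CapableOf", "fly")
net.add_relation("canary", "PropertyOf", "yellow")
print(net.spread_activation("canary"))
```

Activating “canary” yields decreasing activations for “bird”, “yellow” and then “fly”, mirroring how related concepts become primed in the Collins and Loftus model.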

WordNet is the largest hand-crafted project of its kind, with more than 200,000 word-sense pairs. It may be described as “a lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets”. ConceptNet is focused on concepts, while WordNet is focused more on words. ConceptNet has a more diverse relational ontology than WordNet, facilitating the creation of practical, context-oriented, commonsense inferences for processing real-world texts.

Information from each source was loaded separately into its own workspace, and functions are provided to combine and match it for further processing. The most basic workspace, used for most of the further calculations, is based on the IS-A relation imported from WordNet hypernymic relations (the “a kind of” relation). To save storage and processing time in the initial computational experiments, objects and keywords were limited to the animal kingdom only. Hyponym and meronym relations from WordNet were also added. Note that WordNet defines relations between synsets (synonym sets), not individual concepts; other dictionaries use only words, so for compatibility all WordNet data was converted into words before storing. This makes it possible to add this information to the relations stored in ConceptNet, from which relation types such as CapableOf, PropertyOf, PartOf and MadeOf have been imported. The ConceptNet IS-A relation and the Sumo/Milo ontology served as verification for the a priori WordNet hypernymic relations; the effect of this approach was to enhance the weighting factors of ontological relations, bringing out the most characteristic of them. WordNet and ConceptNet relations were then compared, and new types of relations were created, including only those (concept, keyword) pairs that were considered related in both dictionaries.
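The synset-to-word conversion and the cross-dictionary verification can be sketched as follows, here using NLTK’s WordNet interface; the paper does not name the toolkit actually used, and `conceptnet_isa` stands in as a hypothetical, pre-loaded set of ConceptNet IS-A assertions.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

ANIMAL = wn.synset('animal.n.01')

def wordnet_isa_pairs():
    """Yield word-level (word, keyword) IS-A pairs from WordNet hypernyms.

    WordNet relates synsets (synonym sets), not words, so each synset
    is expanded into its member lemma names before a pair is emitted.
    """
    for synset in wn.all_synsets('n'):
        # restrict to the animal kingdom, as in the initial experiments
        if ANIMAL not in synset.closure(lambda s: s.hypernyms()):
            continue
        for hypernym in synset.hypernyms():
            for word in synset.lemma_names():
                for keyword in hypernym.lemma_names():
                    yield (word, keyword)

# Hypothetical, pre-loaded (concept, keyword) IS-A assertions from
# ConceptNet, used to cross-validate the WordNet pairs.
conceptnet_isa = {("dog", "domestic_animal"), ("canary", "finch")}

# Keep only pairs considered related in both dictionaries.
verified_isa = {pair for pair in wordnet_isa_pairs() if pair in conceptnet_isa}
```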

The linguistic competence of systems using semantic memory depends greatly on the amount and quality of the information stored in it. The initial set of data was created from several available dictionaries that could be processed automatically. They do not, however, contain the full information necessary for more complex human-computer conversation. It is therefore necessary to extend the semantic network with new nodes (features and concepts) and with new relations between them. This task can be performed in two ways: by querying humans in a conversation (as sketched below), or by automatic extraction from text corpora.
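A minimal sketch of the conversational route is given below: simple declarative answers are matched against surface patterns and turned into relation triples. The patterns and relation labels are illustrative assumptions, not the system’s actual extraction rules.

```python
import re

# Surface patterns mapping simple declarative sentences to relation
# types; order matters, so the IS-A pattern is tried before PropertyOf.
PATTERNS = [
    (re.compile(r'^an? (\w+) is an? (\w+)$'), 'IS-A'),
    (re.compile(r'^an? (\w+) can (\w+)$'), 'CapableOf'),
    (re.compile(r'^an? (\w+) is (\w+)$'), 'PropertyOf'),
]

def extract_relation(sentence):
    """Return a (concept, relation, feature) triple, or None if no match."""
    text = sentence.strip().lower().rstrip('.')
    for pattern, relation in PATTERNS:
        match = pattern.match(text)
        if match:
            concept, feature = match.groups()
            return (concept, relation, feature)
    return None

print(extract_relation("A dolphin is a mammal"))  # ('dolphin', 'IS-A', 'mammal')
print(extract_relation("A canary can fly"))       # ('canary', 'CapableOf', 'fly')
```

The extracted triples could then be inserted into the semantic network, for instance with the add_relation method from the earlier sketch.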