gellish_databases

Differences

This shows you the differences between two versions of the page.

gellish_databases [2013/01/12 21:53]
andries
gellish_databases [2018/10/31 16:27] (current)
andries [1. Universal Semantic Databases]
Line 1: Line 1:
 ====== 1. Universal Semantic Databases ======
  
- +Gellish is a family of universally applicable languages. The standardized structure or syntax of Gellish expressions is system independent and universally applicable, thus allowing for the expression of any idea. This implies that Gellish enabled databases or data exchange messages (interfaces) are not application specific and thus do not require the development of dedicated data models or dedicated database designs. This means that Gellish enables the definition of universal semantic databases and interfaces and allows software to be reused in other applications. This is due to the Gellish syntax or universal data model and the standardization of the concept definitions and vocabulary.\\
-Gellish is a universally applicable language. A Gellish Semantic Database or Gellish Data Exchange Message consists of a collection of Gellish Expressions. Every Gellish expression is an expression of a 'main fact' (a main statement or proposition) and a number of 'accessory facts' that are relevant for the correct interpretation of the main fact. Together, the accessory facts form the "**//Gellish set of accessory facts//**". The set is comparable with the 'Dublin core'. The Gellish set is intended to be a complete set of accessory facts that are dedicated to an expression of one fact and do not imply additional main facts. +The [[https://github.com/AndriesSHP/Gellish|Gellish Communicator project]] on GitHub is an example of such a universal Gellish database system.
- +The uniform data structure ​or syntax ​of the expressions is described ​in [[gellish_expression_format|the previous section]]\\
- +
-The structure or syntax of Gellish expressions is also universally applicable and does not require a dedicated data model, nor a dedicated database design. Gellish expressions can be implemented in a simple syntax that consists of the structure of one universal Gellish Expression Table, whereas that table structure can be presented in various forms, such as flat Unicode files, Excel spreadsheet tables, SQL tables or RDF/XML triple stores. Each Gellish Database consists of one or more Gellish Expression Tables. Each of those tables has basically the same structure, which is standardised and application system independent. This is different from conventional databases, which usually have proprietary data structures and database tables that are all different. Each of the Gellish Database tables shall contain at least the obligatory columns of one of the subsets of columns that are defined in the Gellish Database Definition document, which is summarised below. \\  The content of Gellish Database tables shall be compliant with the grammar and the dictionary of the formal Gellish English language (or a Gellish variant in any other natural language). The standardised tables, combined with the formal Gellish language, make it possible to combine an arbitrary number of Gellish Database tables into one Database. Furthermore, such a database might be centralised, but can also be a //**distributed database**//. This also makes it possible to combine the results of a [[:querying_a_gellish_english_database|Gellish query]] on various //independent// data stores, which then act as a distributed database. \\  The various Gellish Database tables all have the same core of column definitions. Apart from that core, the tables may also have one or more of the optional columns. Preferred collections of columns are defined in standard Gellish database table subsets. +
- +
- +
-A Gellish Database may be implemented in various formats. It can be in the form of an SQL database, or in XML, or even in XLS (the form of Excel spreadsheet tables). +
  
 ====== 2. Limitations of conventional databases ======
  
- +Conventional databases typically consist of many tables, each of which is composed of a number of columns. The definition of those tables and columns determines the storage capabilities of the database, whereas the (implied and explicit) relations between the columns define the kinds of facts that can be stored in such a database. Those columns and relations determine the database structure that defines the expression capabilities of the database. Similar rules apply for the //structure// of data exchange files and thus for the information that is exchanged in electronic data files and interface messages. \\  This conventional database technology has some major constraints:
-Conventional databases typically consist of many tables, each of which is composed of a number of columns. The definition of those tables and columns determines the storage capabilities of the database, whereas the relations between the columns define the kinds of facts that can be stored in such a database. Those columns and relations determine the database structure that defines the expression capabilities of the database. Similar rules apply for the //structure// of data exchange files and thus for the information that is exchanged in electronic data files. \\  This conventional database technology has some major constraints: +
  
  
     * When data was not covered during the database design and thus is not included in the data model, such data cannot be stored in the database nor exchanged via such a data file structure.
     * Different databases have different data structures, so data in one database cannot be integrated with data from other databases, nor exchanged between databases, without dedicated data conversion.
-    * A database modification or extension requires redesign of the database structure, modification of software and data conversion, which makes it a relatively complicated and costly exercise. +    * A database modification or extension requires redesign of the database structure, modification of software and data conversion, which makes such modifications ​a relatively complicated and costly exercise.
- +
- +
-Another characteristic of conventional databases is that there are hardly any international standards available or used for the //content// of the databases, being the data that is entered by its users. This typically means that local conventions are applied to limit the diversity of data that may be entered in those databases. As local conventions usually differ from other local conventions, this has the disadvantage that data entered in one database cannot be compared or integrated with data in other databases, even if those database structures are the same and even if the application domain of the databases is the same. For example, within a company there may be various implementations of the same system at various sites for the storage of data about equipment, whereas the performance data about the same type of equipment still cannot be compared with the performance data in another location, because the equipment types have different names and the properties are also different. +
- +
- +
-====== 3. Characteristics of a Gellish Database ====== +
- +
- +
-A Gellish database does not have the semantic limitations that conventional databases have, because of the flexibility and openness of the Gellish language and because of its standard universal data structure (grammar), which is simple, computer and human interpretable. A Gellish database consists of one or more database tables, each of which has the same Gellish Expression Table structure (with standard column definitions). The fact that those Gellish Expression Tables are standardised and universally applicable makes a Gellish database application independent. A standardised Gellish Expression Table is universally applicable because it enables the application of the following fundamental principles:​ +
- +
- +
-    - Explicit classification of individual things or explicit specialisation of kinds of things (concepts, classes and types), with an unlimited number of kinds of things in a dictionary. +
-    - The Gellish Expression Table makes it possible to store any kind of statement about any kind of object, because any individual object can be introduced by specification of an explicit classification relation between the object and a kind of thing, whereas kinds of things can be selected from an unlimited dictionary. The dictionary can be the existing large Gellish Formal English Dictionary-Taxonomy (or a subset of that) or it can be a proprietary or public extension of it. This flexibility is fundamentally different from conventional databases that predefine the object types (classes) about which information can be stored by defining a limited number of entity types and attribute types in a fixed data model. +
-    - Explicit classification of relations (expressions of facts), by an extensible unlimited number of standardised relation types. +
-    - The Gellish Expression Table makes it possible to store any kind of statement about any kind of object, because any fact can be expressed by a relation, whereas those relations are explicitly classified by relation types that can be selected from the standardised collection of relation types that are defined in the Gellish Dictionary-Taxonomy or by relation types that are added to the dictionary as proprietary extensions. This flexibility is also fundamentally different from conventional databases that predefine a fixed and limited number of relation types between the columns in the database tables (whereas unfortunately those relation types are usually defined only in an implicit way). +
- +
- +
-As a consequence, a Gellish database does not need to be modified or extended when the scope of an application changes and facts from different Gellish databases can be merged and integrated whenever required without a need for a conversion exercise. \\  Furthermore, the content of a Gellish Database uses a common Gellish Dictionary for all its data, including for example, equipment types, property types, document types, activity types, etc. +
- +
- +
-==== 3.1 Gellish Expressions in a Gellish Database ==== +
- +
- +
-A Gellish Database is a database that contains one or more standardised Gellish Database tables. Each such table contains the same predefined columns and is suitable for the expression of virtually any kind of fact such that it is computer interpretable and system independent. The table can be implemented as an MSAccess database table, an SQL database table or simply as a standard table in a spreadsheet. The core of a Gellish Database table consists of three columns, just as is the case in RDF/Notation 3. Each row with those three columns in such a table expresses a main (binary) fact. For example, the fact that the Eiffel tower is located in Paris can be expressed as follows: +
- +
- +
-|**Left hand object** ​ |**Relation type** ​ |**Right hand object** ​ | +
-|The Eiffel tower |is located in |Paris | +
-|The Eiffel tower |is classified as a |tower | +
-|Paris |is classified as a |city | +
- +
- +
-The left hand objects and the right hand objects may either be selected from the Gellish English dictionary or may be new proprietary objects that are introduced by defining them on separate lines. If such a new object is an individual thing, then it shall be defined by a classification relation with a class, as is done in the above table, and if the new object is a class, then it shall be defined on a separate line by a specialisation relation with its direct supertype. The relation types (such as 'is located in' and 'is classified as a') shall be selected from the Gellish English dictionary, otherwise the expression cannot be called standard Gellish, but becomes a proprietary extension of Gellish English. +
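
As a rough illustration of the three-column core in this removed section, the example rows can be held as plain triples. This is only a sketch in Python and not part of Gellish itself: the names `Expression`, `KNOWN_RELATION_TYPES`, `nonstandard_relation_types` and `unclassified_individuals` are invented here, and the small set of relation types merely stands in for a lookup in the Gellish English dictionary.

```python
from typing import NamedTuple


class Expression(NamedTuple):
    """One row of the three-column core: a main (binary) fact."""
    left: str
    relation_type: str
    right: str


# Hypothetical stand-in for the relation types defined in the dictionary.
KNOWN_RELATION_TYPES = {"is located in", "is classified as a"}

TABLE = [
    Expression("The Eiffel tower", "is located in", "Paris"),
    Expression("The Eiffel tower", "is classified as a", "tower"),
    Expression("Paris", "is classified as a", "city"),
]


def nonstandard_relation_types(table):
    """Return relation types that are not in the assumed dictionary set."""
    return sorted({e.relation_type for e in table} - KNOWN_RELATION_TYPES)


def unclassified_individuals(table):
    """Return objects used in non-classification rows that lack a classification row."""
    classified = {e.left for e in table if e.relation_type == "is classified as a"}
    used = set()
    for e in table:
        if e.relation_type != "is classified as a":
            used.update((e.left, e.right))
    return sorted(used - classified)
```

The second check is deliberately naive: it treats every object in a non-classification row as a new individual thing and does not consult a dictionary, so objects that are already defined in the dictionary would also be reported.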
- +
- +
-==== 3.2 Multi-language support ==== +
- +
- +
-Furthermore,​ a Gellish database structure supports the simultaneous use of multiple languages. This is enabled because a Gellish database table contains a separate column for the language in which a fact is expressed (see the example table below). Thus a Gellish database supports the use of various natural language specific versions of Gellish. In principle, there is a Gellish variant language for each natural language, depending on the availability of a translation of the Gellish concepts. For example, the Gellish English Dictionary defines Gellish English, and contains partial translations to Gellish Deutsch (German) and Gellish Nederlands (Dutch). International terminology (such as most units of measure and mathematical concepts) is included as International Gellish. +
- +
- +
-==== 3.3 Unique identifiers,​ homonyms, synonyms and automatic translation ==== +
- +
- +
-A Gellish database uses a unique identifier for each thing, irrespective of whether it is a user object, a concept from the Gellish dictionary, a fact or a relation type. The following Gellish database table is an extended version of the above example and includes the language in which the fact is expressed as well as the identifiers of the objects. +
- +
- +
-|**Language** |**UID of left hand object** |**Name of left hand object** |**UID of fact** |**UID of relation type** |**Name of relation type** |**UID of right hand object** |**Name of right hand object** | +
-|English |1 |The Eiffel tower |101 |5138 |is located in |2700887 |Paris | +
-|English |1 |The Eiffel tower |102 |1225 |is classified as a |40903 |tower | +
-|Dutch |1 |De Eiffel toren |103 |4691 |is a translation of |1 |The Eiffel tower | +
- +
- +
-The unique identifiers enable the use of synonyms and homonyms and enable that a computer can automatically translate a Gellish expression in a certain language into a Gellish expression in another language. This is caused by the fact that the meaning of a Gellish expression is captured as a relation between the unique identifiers,​ so that the meaning is language independent. \\  This adds automatic translation capabilities to Gellish expressions,​ because a Gellish message can be created e.g. in Gellish English whereas computer software can present it in another Gellish variant, such as Gellish Dutch if a dictionary or a translation is available, such as on the third line in the above table. +
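
The idea that meaning lives in the UIDs rather than in the names can be sketched as follows. This is an illustrative Python fragment, not the Gellish software: `FACTS`, `NAMES`, `name_of` and `render` are hypothetical names, the UIDs and names are copied from the example table, and falling back to English when no translation exists is an assumption made here for the sketch.

```python
# Facts are stored as relations between UIDs, so they are language independent.
FACTS = [
    # (fact UID, left hand UID, relation type UID, right hand UID)
    (101, 1, 5138, 2700887),
    (102, 1, 1225, 40903),
]

# Names per language, taken from the example table.
# Only UID 1 has a Dutch name in that table.
NAMES = {
    "English": {
        1: "The Eiffel tower",
        5138: "is located in",
        1225: "is classified as a",
        2700887: "Paris",
        40903: "tower",
    },
    "Dutch": {1: "De Eiffel toren"},
}


def name_of(uid, language, fallback="English"):
    """Look up a name in the requested language, else use the fallback language."""
    return NAMES.get(language, {}).get(uid, NAMES[fallback][uid])


def render(fact, language):
    """Present one UID-based fact as text in the requested language."""
    _fact_uid, left, relation_type, right = fact
    return " ".join(name_of(uid, language) for uid in (left, relation_type, right))
```

Because the stored fact never changes, adding a translation only means adding entries to the name lookup; the rows with the UIDs stay as they are.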
- +
- +
-==== 3.4 Accessory facts ==== +
- +
- +
-A Gellish Expression Table has a number of additional columns that enable the expression of accessory facts or data about the main facts. For example, columns for: +
- +
- +
-    * a textual definition in natural language of the left hand object +
-    * the context in which the fact is valid +
-    * a unit of measure with its UID +
-    * the status of the fact (accepted, proposed, deleted, replaced, etc.) +
-    * the originator of the fact +
-    * the date of creation of the fact +
-    * etc. +
- +
- +
-These accessory facts are described in more detail in the next chapter. +
- +
- +
-====== 4. Gellish Expression Table Definition ====== +
- +
- +
-The document '[[http://www.gellish.net/downloads/file/33-definition-of-gellishdatatables.html|Definition of Universal Semantic Databases and Data Exchange Messages]]' defines the full set of columns in each Gellish Expression Table that is a part of a Gellish Database, Gellish Message or Gellish Query. The document also defines a number of standardized subsets for usage in applications that do not require the full number of columns. \\  One of those subsets, the Business Model subset, is suitable for nearly all database content and data exchange use cases that describe knowledge and propositions. Its application range includes business communication about both designs (imaginary objects) as well as real world objects (observed individual objects) during their lifecycle and about enquiries, answers, orders, confirmations, etc. This table is a superset (indicated in **bold**) of the product model table, so it can also be used for knowledge about classes of objects. \\  This subset consists of over 30 standard table columns. +
- +
- +
-A summary of the above document is given below. +
- +
- +
-==== 4.1 The Gellish Expression Table header definition ==== +
- +
- +
-Each Gellish Expression Table has in principle a table header, as illustrated in Figure 3, extended with additional columns as described in this paragraph. \\  A Gellish Expression Table can consist either of a complete set of columns or of one of a subset of columns. The document defines a number of standard subsets.\\ ​ Each column has a column ID and a column name and has a meaning as defined below. \\  Note that the presence of a value in a column field implies one or more relations with values in other columns. The semantics of these implied relations are specified in the definitions of the table columns. //Those relations define the (accessory) facts about the main fact!//  +
- +
- +
-If the table is implemented in a spreadsheet or ASCII or Unicode file, then the table starts with a header of three lines, as follows: +
- +
- +
-    * The first line contains a sequence of the following four fields A1, A2, A3 and A4, which shall contain the following text: +
- +
- +
-A1 = ’Gellish’ \\  A2 = Natural language of the expressions in the table. Default '​Formal English'​.\\ ​ A3 = ‘Version:​’ \\  A4 = version number of the applicable Gellish dictionary.\\ ​ A5 = date of the release of the facts in this table (optional). \\  followed by free text fields. +
- +
- +
-    * The second line contains the column IDs, which consist of standard numbers, although arbitrarily chosen. They allow the columns to be presented in a different sequence without loss of meaning (the numbers below correspond to those column IDs). +
-    * The third line contains human readable text in every column field providing a short name of the column. This name is free text. +
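
The three-line header described in this removed section could be read roughly as shown below. This is a hedged Python sketch, not the reference implementation: `parse_header` and `body_as_records` are invented names, the table is assumed to be already loaded as a list of rows, every column is assumed to carry an ID, and the version value `"0.0"` is a placeholder. The column IDs 1, 2, 60 and 15 are the ones this page lists for the fact, left hand object, relation type and right hand object UIDs.

```python
def parse_header(rows):
    """Split the three header lines of an expression table from its body rows."""
    first, column_ids, column_names = rows[0], rows[1], rows[2]
    if first[0] != "Gellish":
        raise ValueError("first field of the header must be 'Gellish'")
    meta = {
        "language": first[1] or "Formal English",  # field A2, with its default
        "dictionary_version": first[3],            # field A4
    }
    ids = [int(cell) for cell in column_ids]       # second line: column IDs
    names = dict(zip(ids, column_names))           # third line: free-text names
    return meta, ids, names, rows[3:]


def body_as_records(ids, body):
    """Key every body cell by its column ID, so column order no longer matters."""
    return [dict(zip(ids, row)) for row in body]


ROWS = [
    ["Gellish", "English", "Version:", "0.0"],  # placeholder version number
    [1, 2, 60, 15],
    ["Fact UID", "Left UID", "Relation type UID", "Right UID"],
    [101, 1, 5138, 2700887],
]

meta, ids, names, body = parse_header(ROWS)
records = body_as_records(ids, body)
```

Keying each cell by column ID rather than by position is what lets two tables with the same columns in a different sequence yield the same records.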
- +
- +
-==== 4.2 The Gellish Expression Table body column definitions ==== +
- +
- +
-The lines (rows) in a Gellish Expression Table are independent of each other and thus the lines may be sorted in any sequence, without loss of semantics (meaning). +
- +
- +
-Each line in the body of a Gellish Expression Table (which in a spreadsheet starts on the fourth line) expresses a group of facts, which consists of a '//main fact//' and a number of '//accessory facts//' that are defined as follows. +
- +
- +
-**Main fact.** \\  A main fact is expressed by a combination of the following objects (the column IDs are given in brackets): +
- +
- +
-    * A UID of a main fact (1) +
-    * A UID of a left hand object (2) +
-    * A UID of a relation type (60) +
-    * A UID of a right hand object (15) +
-    * A UID of a scale (unit of measure) (66) +
-    * A UID of an intention (5) +
- +
- +
-**Prime accessory facts.**  +
- +
- +
-The prime accessory facts are represented by the following table columns, each of which implies an expression by a triple of objects (which are implicitly classified). The table columns are: +
- +
- +
-    * A UID of a left hand kind of role (72) +
-    * A UID of a right hand kind of role (74) +
-    * A pair of left hand object cardinalities (44) +
-    * A pair of right hand object cardinalities (45) +
-    * A UID of the accuracy of a quantification (76) +
-    * A UID of a pick list for the qualification of aspects (70) +
-    * A UID of the validity context for a fact (19) +
-    * A partial textual definition of a concept or individual thing (65) +
-    * A full textual definition of a concept or individual thing (4) +
-    * A textual description of a main fact (42) +
-    * Remarks on the expression of a main fact (14) +
-    * Approval status of the expression of a main fact (8) +
- +
- +
-**Secondary accessory facts.**  +
- +
- +
-The secondary accessory facts are represented by the following ​ table columns, each of which implies a triple of classified objects. These accessory facts form the context for the validity of the UID’s and the names for objects that are identified by their UID’s:+
  
  
-    * A reason for latest change of status +Another characteristic of conventional databases is that there are hardly any international standards available or used for the //content// of the databases, being the data that is entered by its users. This typically means that local conventions are applied in order to limit the diversity of data that may be entered in those databases. As local conventions usually differ from other local conventions, this has the disadvantage that data entered in one database cannot be compared or integrated with data in other databases, even if those database structures are the same and even if the application domain of the databases is the same. For example, within a company there may be various implementations of the same system at various sites for the storage of data about equipment, whereas the performance data about the same type of equipment in different systems still cannot be compared with the performance data in other locations, because the equipment types have different names and the properties are also different.
-    * A UID of the successor of the fact, in cases it has the status '​replaced'​ +
-    * UID of creator of fact +
-    * Date-time of start of validity ​of the fact +
-    * Date-time of start of availability of the expression +
-    * Date-time of creation of copy +
-    * Date-time of latest change ​of the expression +
-    * UID of addressee of the expression +
-    * References +
-    * UID of the expression ​of the fact (Line UID) +
-    * UID of a collection of facts to which the fact belongs +
-    * A presentation sequence ​in which the expressions can be presented+
  
  
-The columns with UID's are accompanied by columns with a name for the thing that is represented by the UID. +====== 3. Characteristics of Gellish enabled databases ======
  
  
-**Field formats ​and optionality** ​+A Gellish database does not have the semantic limitations that conventional databases have, because of the flexibility ​and openness of the Gellish language and because of its standard universal data structure (syntax), which is simple and computer as well as human interpretable. \\  A Gellish database consists of a representation of the content of one or more tables in the Gellish Expression Format. For example, the Gellish Communicator software system includes an object-oriented database that is implemented in the form of a network of binary relations '​objects'​ (in which an '​object'​ may be anything), whereas each object definition includes all the Gellish Expressions that are applicable for that object. The fact that those Gellish Expression Tables are standardized and universally applicable makes a Gellish database application independent. \\
  
 +The Gellish enabled database is universally applicable because it enables the application of the following fundamental principles:
  
-Several columns contain unique identifiers (UID's). Each UID should preferably be represented by a 64-bit integer (8-byte, Int64 or bigint), whereas only positive values shall be used. It is not recommended to use an unsigned integer (which only allows positive values), because SQL only enables the bigint datatype, which is signed. \\  Most other columns contain character //string values//. For database implementations it is indicated whether they have a fixed or variable length (//nvarchar// or //varchar//) or whether the string is externally stored (data types //ntext// and //text//). In addition to that it is indicated whether the cells may contain Unicode. \\  Fields in columns that are indicated as optional may be left empty, in which case the indicated default value is applicable. Otherwise a field value is obligatory. +    - Explicit classification and specialization relations.\\  Explicit classification of individual things or explicit specialization of kinds of things (concepts, classes and types), with an unlimited number of kinds of things in a taxonomic dictionary.
 +    - Unlimited extendability, formal and open.\\  Gellish enables storing any kind of statement about any kind of thing, because any individual thing can be introduced by specifying an explicit classification relation between the individual thing and a kind, whereas kinds can be selected from an unlimited open dictionary. The dictionary consists of a core of generic concepts. This core can be used in combination with the large Gellish Taxonomic Dictionary (or a subset of that) and can be combined with proprietary or public extensions and translations. This flexibility is fundamentally different from conventional databases that predefine the object types (classes) about which information can be stored by defining a limited number of entity types and attribute types in a fixed data model.
 +    - Unlimited semantic expression capability.\\  ​The semantic expression capability ​is provided by explicit classification of relations ​(expressions of ideasby an extensible unlimited number of standardized relation ​types. 
 +    - Powerful standard Gellish set of contextual facts.\\  ​Gellish enables storing any kind of idea about any kind of object, because any idea can be expressed by one or more binary relationswhereas those relations are explicitly classified by relation types that can be selected from the standardized collection of relation types that are defined ​in the Gellish Taxonomic Dictionary or by relation types that are added to the dictionary as proprietary extensionsThis flexibility ​is also fundamentally different from conventional databases that predefine a fixed and limited number of kinds of relations between the columns in the database tables (whereas unfortunately those kinds of relations are usually defined only in an implicit way).
  
  
-Further details of the column definitions are given in the document '[[http://www.gellish.net/downloads/file/33-definition-of-gellishdatatables.html|Definition of Universal Semantic Databases and Data Exchange Messages]]' +As a consequence, Gellish enabled databases do not need to be modified or extended when the scope of an application changes. And expressions from different Gellish enabled databases can be merged and integrated whenever required, without a need for a conversion exercise. Furthermore, the content of Gellish enabled databases is standardized by using a common Taxonomic Dictionary for all its data, including for example, kinds of equipment, kinds of properties, kinds of documents, kinds of activities, etc.
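
The claim that expressions from different Gellish enabled databases can be merged without conversion can be pictured with a small sketch. It is written in Python under stated assumptions and is not how any particular Gellish system does it: `merge_tables` and the dictionary keys are hypothetical, each row is assumed to carry a fact UID that is unique across the databases, and conflicting rows for the same UID are not resolved here.

```python
def merge_tables(*tables):
    """Combine expression rows from several tables, keeping one row per fact UID."""
    merged = {}
    for table in tables:
        for row in table:
            # Assumption: the same fact UID always denotes the same expression,
            # so the first row seen for a UID is kept and later copies are skipped.
            merged.setdefault(row["fact_uid"], row)
    return list(merged.values())


# Two hypothetical stores that share the same column structure.
SITE_A = [
    {"fact_uid": 101, "left": 1, "relation_type": 5138, "right": 2700887},
]
SITE_B = [
    {"fact_uid": 101, "left": 1, "relation_type": 5138, "right": 2700887},
    {"fact_uid": 102, "left": 1, "relation_type": 1225, "right": 40903},
]

COMBINED = merge_tables(SITE_A, SITE_B)
```

No mapping step appears between the two stores because both use the same columns and the same UIDs; with conventional, differently structured databases such a step would come first.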
  
  
gellish_databases.1358024004.txt.gz · Last modified: 2017/11/15 11:05 (external edit)