mathinform

Mathematics and Computer Science


Data may be seen as an abstraction of a part of reality. They are valuable only if they help to answer questions. If you let data grow in your organization without management attention and guidance, you will probably end up in a mess of unclear ownership, caused by excessive relations and dependencies between data.

If you are planning to build a new IT system, it is essential to develop a rough vision of the complete collection of data well in advance of the implementation. Plenty of strategies and tools are available for these early stages of data management.

Experts recommend the entity relationship modeling approach, supported by a data dictionary. The dictionary may be seen as an electronic means of communication between users, system designers, and data and software engineers.

An entity is defined as a compound object of the business that carries information. The owners of such a piece of information are appointed by the business units. They are in charge of controlling the modeling of their data.

The entity relationship analysis starts with a short list of the names of the information pieces that are common in an organization. The owners must understand what it means to share data. The business units may then describe their own data in more detail and discover new data elements that ought to be shared.

The entity relationship approach works with three normal forms of data descriptions. The data are said to be in the first normal form if all entities are free of repeating data groups or elements. For the second normal form, each entity must have a key assigned, and every data element must depend on the whole key; groups of data elements that depend only on a part of the key have been separated into entities of their own. The third normal form has been reached if, in addition, those data elements that depend on other non-key elements rather than directly on the key have been separated.
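The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order record and all names in it (`order_no`, `customer`, `city`, `items`) are invented, and it is assumed that `city` depends on `customer` rather than on the order. The sketch follows the usual textbook convention, in which elements depending on only a part of the key are split off for the second normal form, and elements depending on other non-key elements for the third.

```python
# Hypothetical unnormalized records: 'items' is a repeating group.
raw_orders = [
    {"order_no": 1, "customer": "Miller", "city": "Bonn",
     "items": [("bolt", 10), ("nut", 20)]},
    {"order_no": 2, "customer": "Miller", "city": "Bonn",
     "items": [("bolt", 5)]},
]


def to_first_normal_form(orders):
    """Flatten the repeating group: one row per (order_no, article)."""
    return [
        {"order_no": o["order_no"], "customer": o["customer"],
         "city": o["city"], "article": article, "quantity": quantity}
        for o in orders
        for (article, quantity) in o["items"]
    ]


def to_second_normal_form(rows):
    """Split off the elements that depend only on order_no, a part of the key."""
    orders = {r["order_no"]: {"customer": r["customer"], "city": r["city"]}
              for r in rows}
    positions = {(r["order_no"], r["article"]): r["quantity"] for r in rows}
    return orders, positions


def to_third_normal_form(orders):
    """Split off city, assumed here to depend on customer, not on order_no."""
    customers = {o["customer"]: o["city"] for o in orders.values()}
    order_heads = {no: o["customer"] for no, o in orders.items()}
    return order_heads, customers


rows = to_first_normal_form(raw_orders)
orders, positions = to_second_normal_form(rows)
order_heads, customers = to_third_normal_form(orders)
```

After the last step each fact is stored exactly once: the city of a customer appears only in `customers`, so a single owner can be made responsible for it.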

The pragmatic benefit of this process is that ownership is much easier to assign for normalized data than for compound data.

Usually the owner of information is the person who has generated the data. Owners may delegate their rights to access, change, and transmit data.

Measures to maintain the data and keep them correct are necessary. These regulations depend on the value of the data for the administration, for the business, or for the plans of the private person involved.

Physical and logical access has to be limited. The probability of unauthorized actions must be controlled, and so must the handling of situations caused by human error and machine malfunction.

An entity is specified by attributes. They may vary in number and type, even for the same entity. As we have learned from the analysis of natural languages, attributes can be classified according to their context as local, temporal, conditional, causal, and modal types. Attributes can be used to refer from one entity to another. However, not all references that users may construct can be assumed to be shared. Therefore we must distinguish whether a relationship is just a personal view or an intrinsic part of the data.

If we look at attributes, entities, and relationships from a mathematical point of view, entities behave like sets that contain certain data elements, which we have called attributes. Relationships are defined as subsets of the product of sets. The condition that decides membership in such a subset is called a filter. If we omit some attributes to focus on a certain view of the data, we call the operation a projection.
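The set view can be tried out directly with Python sets. The following is a minimal sketch; the two entities and their contents are invented for illustration and are not taken from any real system.

```python
from itertools import product

# Hypothetical entities, modeled as sets of tuples:
# employees hold (name, department), departments hold (department, site).
employees = {("Ada", "R&D"), ("Bob", "Sales"), ("Eve", "R&D")}
departments = {("R&D", "Bonn"), ("Sales", "Kiel")}

# The product pairs every employee with every department.
pairs = set(product(employees, departments))

# A relationship is a subset of the product. The filter is the condition
# that decides which pairs belong to the subset.
works_at = {(e, d) for (e, d) in pairs if e[1] == d[0]}

# A projection omits attributes: only name and site are kept.
name_site = {(e[0], d[1]) for (e, d) in works_at}
```

Here `works_at` keeps three of the six possible pairs, and `name_site` is the view of that relationship reduced to two attributes.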

The query language SQL, which was developed at IBM, features a few phrases that express the operations mentioned above in order to select data.
SQL also provides syntax for inserting, deleting, and updating data in databases.
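To make these statements concrete, here is a small sketch that uses the `sqlite3` module from the Python standard library. SQLite is chosen only because it needs no server; the table `employee` and its contents are invented for illustration.

```python
import sqlite3

# An in-memory database that disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)"
)

# Inserting data.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "R&D", 5000), ("Bob", "Sales", 4000), ("Eve", "R&D", 4500)],
)

# Updating and deleting data; the WHERE clause is the filter.
conn.execute("UPDATE employee SET salary = salary + 500 WHERE name = 'Bob'")
conn.execute("DELETE FROM employee WHERE name = 'Eve'")

# Selecting data: the column list is the projection, WHERE is the filter.
rows = conn.execute(
    "SELECT name, salary FROM employee "
    "WHERE department = 'R&D' OR salary = 4500 ORDER BY name"
).fetchall()
conn.close()
```

The query returns only the two requested attributes for the rows that pass the filter.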

We now set up a grammar that describes a subset of SQL as follows:

SQL = SELECT Projection FROM data WHERE filter Semikolon;
SELECT = "Select";
FROM = "from";
WHERE = "where";
Projection = AttributName [Komma rAttr];
rAttr = AttributName [Komma rAttr];
data = EntityName [Komma rEnt];
rEnt = EntityName [Komma rEnt];
filter = logExpression;
Semikolon = ";";
Komma = ",";
logExpression = logTerm [rlogTerm];
rlogTerm = OR logTerm [rlogTerm];
logTerm = logFac [rlogFac];
rlogFac = AND logFac [rlogFac];
logFac = AttributName logRelation Expr;
logRelation = "=" | "contains";
AND = "and";
OR = "or";
Expr = term [rterm];
rterm = ("+" | "-") term [rterm];
term = factor [rfactor];
rfactor = ("*" | "/") factor [rfactor];
factor = number | string | "(" Expr ")";
AttributName = Bezeichner;
number = Zahl;
EntityName = Bezeichner;
string = Date;
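
A grammar of this shape maps naturally onto a recursive-descent recognizer with one method per rule. The sketch below is one possible reading, not a definitive implementation, and it rests on several assumptions that the grammar leaves open: Bezeichner is taken to be an identifier of letters, digits, and underscores, Zahl an unsigned integer, and a string a single-quoted literal; keywords are matched case-insensitively; and the attribute list is read as comma-separated in the same way as the entity list. The right-recursive helper rules (rAttr, rEnt, rlogTerm, rlogFac, rterm, rfactor) are written as loops, which accept the same sentences. The names `tokenize`, `Parser`, and `is_valid` are invented for this example.

```python
import re

# Assumed lexical rules (not given in the text): Bezeichner is an identifier,
# Zahl an unsigned integer, and a string a single-quoted literal.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|('[^']*')|(\S))")
KEYWORDS = {"select", "from", "where", "and", "or", "contains"}


def tokenize(text):
    """Split a query into (kind, value) pairs; keywords ignore case."""
    tokens = []
    for number, word, string, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("number", number))
        elif word.lower() in KEYWORDS:
            tokens.append(("keyword", word.lower()))
        elif word:
            tokens.append(("name", word))
        elif string:
            tokens.append(("string", string))
        else:
            tokens.append(("symbol", symbol))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def accept(self, kind, value=None):
        """Consume the next token if it matches; report whether it did."""
        end = ("end", "")
        tok = self.tokens[self.pos] if self.pos < len(self.tokens) else end
        if tok[0] == kind and (value is None or tok[1] == value):
            self.pos += 1
            return True
        return False

    def expect(self, kind, value=None):
        if not self.accept(kind, value):
            raise SyntaxError(f"expected {value or kind} at token {self.pos}")

    def sql(self):  # SQL = SELECT Projection FROM data WHERE filter Semikolon
        self.expect("keyword", "select")
        self.name_list()  # Projection
        self.expect("keyword", "from")
        self.name_list()  # data
        self.expect("keyword", "where")
        self.log_expression()  # filter
        self.expect("symbol", ";")
        self.expect("end")

    def name_list(self):  # a name, then any number of: Komma name
        self.expect("name")
        while self.accept("symbol", ","):
            self.expect("name")

    def log_expression(self):  # logTerm, then any number of: OR logTerm
        self.log_term()
        while self.accept("keyword", "or"):
            self.log_term()

    def log_term(self):  # logFac, then any number of: AND logFac
        self.log_fac()
        while self.accept("keyword", "and"):
            self.log_fac()

    def log_fac(self):  # AttributName logRelation Expr
        self.expect("name")
        if not self.accept("symbol", "="):
            self.expect("keyword", "contains")
        self.expr()

    def expr(self):  # term, then any number of: ("+" | "-") term
        self.term()
        while self.accept("symbol", "+") or self.accept("symbol", "-"):
            self.term()

    def term(self):  # factor, then any number of: ("*" | "/") factor
        self.factor()
        while self.accept("symbol", "*") or self.accept("symbol", "/"):
            self.factor()

    def factor(self):  # number | string | "(" Expr ")"
        if self.accept("number") or self.accept("string"):
            return
        self.expect("symbol", "(")
        self.expr()
        self.expect("symbol", ")")


def is_valid(query):
    """Return True if the query is a sentence of the grammar."""
    try:
        Parser(query).sql()
        return True
    except SyntaxError:
        return False
```

For example, `is_valid("Select name from employee where salary = (4000 + 500) * 2;")` accepts the query, while a query without the closing semicolon is rejected. Because a factor may only be a number, a string, or a parenthesized expression, the right-hand side of a comparison cannot be another attribute name.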

09 / 09 / 2018

W. Reiwer