Discovering Regularities in Databases Using Canonical Decomposition
of Binary Relations
A. Jaoua, S. Elloumi, A. Hasnah, J. Jaam, and I. Nafkha
Regularities in databases are directly useful for knowledge discovery
and data summarization. As a mathematical background, relational
algebra helped for discovering the main data structures and existing
dependencies between the different attributes in a relational database.
Functional, difunctional and other kinds of dependencies in a
relational database describe invariant regular structures that have
been used intensively for database decomposition, and for minimizing
redundancy. In this paper, we explain why "concepts" or "maximal
rectangles" should be considered as the atomic regular structure for
decomposing a binary relation which can be useful for different
applications. More specifically, we have noticed experimentally, that
"optimal concepts" contain pertinent information about data that we
have exploited positively for machine learning, dynamic and incremental
database organization, text summarization, data reduction, and even for
modeling human thinking. Operators on concepts need to be developed
because of their general usefulness in data and information
engineering. In this paper, we propose to work on a canonical
decomposition of binary relations based on two operators f and g, to model some important open
problems, as for example on how to put in equation the best optimal
conceptual coverage of a binary relation. We first develop an algorithm
to find a conceptual coverage of a binary relation. We then exploit
Riguet's difunctional relation to put in equation all isolated pairs in
a binary relation. Using iteratively these isolated pairs, we
find several varieties of efficient solutions for the canonical
decomposition problem.
Journal on Relation Methods in Computer Science, Vol. 1, pp. 217-234,
2004