Intelligent Parsers
Database Parser:
A data parser is software that breaks a block of data into smaller chunks by following
a set of rules, so that a computer can more easily interpret, manage, convert, or
transmit it. The goals are:
• Interpret
• Manage
• Convert
• Transmit
We plan to implement a number of parsers. The following is one of them,
which we call the "Long IDs" parser.
Our Long IDs Parser:
We first need to understand the scope of our parser. Assume that we will be
parsing the existing databases of an airline. We need to identify the following:
• Business type
• The size of the business and its operations
• The rules for running this business
• The buzzwords and the business vocabulary
• Business factors such as airports, weather, government regulations
• The clientele
• Business environment (manufacturing, services, retail, etc.)
• Misc
From these categories we would be able to identify:
• Possible databases, tables, and field names and types
• Business rules
• The ranges of values for these data fields
• Possible errors and exceptions
• Business operations and processes
• Misc
Retrieving and parsing every field in the airline's databases is a monumental task, so
we need a scheme for retrieving and categorizing each field. Our approach
is to create one or more indexes for each possible field and its value, and to use these indexes as IDs.
Looking at Java long:
Long.MAX_VALUE = 9223372036854775807 (19 digits)
Long.MIN_VALUE = -9223372036854775808
Note:
Mod and Div operations can be used to retrieve any digit, regardless of its position in the long integer.
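The note above can be illustrated with a minimal Java sketch. The class name DigitExtractor and the methods pow10, digitAt, and digitsAt are hypothetical names chosen for this example, not part of any existing library. Position 0 is assumed to be the rightmost digit, and only non-negative IDs are handled.

```java
final class DigitExtractor {
    private DigitExtractor() {}

    /** Returns 10^exponent for exponents 0..18, the range that fits in a long. */
    static long pow10(int exponent) {
        if (exponent < 0 || exponent > 18) {
            throw new IllegalArgumentException("exponent must be in 0..18");
        }
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= 10;
        }
        return result;
    }

    /** Returns the single digit at the given position; position 0 is the rightmost digit. */
    static int digitAt(long id, int position) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        // Div shifts the wanted digit to the units place, Mod isolates it.
        return (int) ((id / pow10(position)) % 10);
    }

    /** Returns count consecutive digits whose rightmost digit sits at the given position. */
    static long digitsAt(long id, int position, int count) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (count < 1 || count > 18 || position + count > 19) {
            throw new IllegalArgumentException("digit group out of range");
        }
        return (id / pow10(position)) % pow10(count);
    }

    public static void main(String[] args) {
        long id = 1234567890123456789L;
        System.out.println(digitAt(id, 0));      // 9
        System.out.println(digitAt(id, 18));     // 1
        System.out.println(digitsAt(id, 16, 3)); // 123
    }
}
```

Dividing by a power of ten discards the digits to the right of the wanted position, and the remainder discards the digits to its left, so no string conversion is needed.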
Our approach is to create an index as a Java long integer in which each group of
digit positions encodes one category. For example, the "Ticket Price Field" would have the following:
Database | Table    | Field    | Category | Value Range | Field Type | Error Type | Total
3 digits | 3 digits | 3 digits | 2 digits | 5 digits    | 2 digits   | 1 digit    | 19 digits
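Each digit has 0..9 possible values, so a 2-digit group has 0..99 possible values and a 3-digit group has 0..999.
One caveat: because Long.MAX_VALUE begins with 922, a full 19-digit index is guaranteed to fit in a long only while the leading 3-digit Database group stays within 0..921.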
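The layout above can be sketched as a small encoder and decoder. This is only an illustration under stated assumptions: the class name LongIdCodec, its methods, and the sample segment values are hypothetical, and the segments are assumed to be packed left to right in the order Database, Table, Field, Category, Value Range, Field Type, Error Type. Math.multiplyExact and Math.addExact are used so that a combination too large for a long is rejected rather than silently wrapped.

```java
final class LongIdCodec {
    // Segment widths in digits, left to right: Database, Table, Field,
    // Category, Value Range, Field Type, Error Type (19 digits in total).
    static final int[] WIDTHS = {3, 3, 3, 2, 5, 2, 1};

    private LongIdCodec() {}

    private static long pow10(int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= 10;
        }
        return result;
    }

    /** Packs the seven segment values into one long; throws ArithmeticException on overflow. */
    static long encode(int[] segments) {
        if (segments.length != WIDTHS.length) {
            throw new IllegalArgumentException("expected " + WIDTHS.length + " segments");
        }
        long id = 0;
        for (int i = 0; i < WIDTHS.length; i++) {
            long scale = pow10(WIDTHS[i]);
            if (segments[i] < 0 || segments[i] >= scale) {
                throw new IllegalArgumentException("segment " + i + " does not fit its width");
            }
            // Shift the digits collected so far to the left, then append this segment.
            id = Math.addExact(Math.multiplyExact(id, scale), (long) segments[i]);
        }
        return id;
    }

    /** Splits a non-negative ID back into its seven segment values. */
    static int[] decode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        int[] segments = new int[WIDTHS.length];
        // Peel segments off from the right using Mod and Div.
        for (int i = WIDTHS.length - 1; i >= 0; i--) {
            long scale = pow10(WIDTHS[i]);
            segments[i] = (int) (id % scale);
            id /= scale;
        }
        return segments;
    }

    public static void main(String[] args) {
        // Hypothetical values: database 12, table 34, field 56, category 7,
        // value range 1234, field type 3, error type 0.
        long id = encode(new int[] {12, 34, 56, 7, 1234, 3, 0});
        System.out.println(id); // 120340560701234030
        System.out.println(java.util.Arrays.toString(decode(id)));
    }
}
```

Leading zeros of the Database group are not stored in the long, so the printed value can have fewer than 19 digits. Decoding still works because it counts positions from the right.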
We can also create a number of these ID indexes based on personal, business, transaction, security,
and business-rule data. Together, these matrices or arrays map the database fields for further parsing
and analysis. Cross-referencing the matrices can help find errors and duplicate values.
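One possible reading of this cross-referencing step is sketched below. It assumes each index is held as an array of long IDs in the 19-digit layout described earlier, so that dropping the rightmost 10 digits (Category, Value Range, Field Type, Error Type) leaves a Database/Table/Field key. The class IdCrossReference and its methods are hypothetical names for illustration only.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class IdCrossReference {
    // 10^10: removes the 2 + 5 + 2 + 1 rightmost digits of an ID.
    static final long FIELD_KEY_DIVISOR = 10_000_000_000L;

    private IdCrossReference() {}

    /** Returns the Database/Table/Field part of an ID. */
    static long fieldKey(long id) {
        return id / FIELD_KEY_DIVISOR;
    }

    /** Returns every ID that occurs more than once in one index, in ascending order. */
    static Set<Long> duplicates(long[] ids) {
        Set<Long> seen = new HashSet<>();
        Set<Long> repeated = new TreeSet<>();
        for (long id : ids) {
            if (!seen.add(id)) {
                repeated.add(id);
            }
        }
        return repeated;
    }

    /** Returns the field keys that appear in both indexes, in ascending order. */
    static Set<Long> sharedFieldKeys(long[] first, long[] second) {
        Set<Long> firstKeys = new HashSet<>();
        for (long id : first) {
            firstKeys.add(fieldKey(id));
        }
        Set<Long> shared = new TreeSet<>();
        for (long id : second) {
            long key = fieldKey(id);
            if (firstKeys.contains(key)) {
                shared.add(key);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        long[] businessIndex = {120340560701234030L, 120340560701234030L, 120340570700050010L};
        long[] securityIndex = {120340570799999991L, 990010010100000010L};
        System.out.println(duplicates(businessIndex));                     // [120340560701234030]
        System.out.println(sharedFieldKeys(businessIndex, securityIndex)); // [12034057]
    }
}
```

A field key that shows up in two indexes with different trailing digits, as in the sample data, is the kind of entry one might then inspect for conflicting value ranges or error codes.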