Big data and the cloud sounds like David & Goliath, but with a good deployment model it could really be Romeo & Juliet. Wired differently and haphazardly, it could turn out to be “Brutus & Caesar”.
For the big data deep packet specifications consortium site, please visit here
There are two kinds of data, both human generated. It should be appreciated that machines themselves generate data, but that generation is initiated primarily by human activity. Data generated by machines and kick-started by human activity is a subject in itself; what we discuss here is human-generated data.
The two further classifications at the top hierarchical level are as follows:
- Designated Data Generation
- Forceful-flow Data Generation
Designated Data Generation
Examples of Designated Data Generation (DDG) would be transactional system data within databases, log files for requests coming in to capture transactions, or a post on a member-specific blog site such as this one. In each case a definite number of users is expected, just as in a transactional system where a definite number of users can come in and the minimum and maximum range of clickstream activity is pre-determined. Looked at closely, this can be both structured and unstructured data.
Forceful-flow Data Generation
Examples of Forceful-Flow Data Generation (FFG) are request logs coming into news sites, a picture published on a public site, or a query producing relevant search results. These requests do not arrive with pre-designed expectations: the system could never have anticipated a given request, and passers-by circling that particular area of the requesting station generate more of the same request data, again producing activity in the form of clickstreams. A close look shows that FFG data, too, can be either structured or unstructured.
Within these two realms, analytical modules should first and foremost categorize the data. Just as data is categorized as structured, semi-structured, or unstructured, modules coming in to capture it must first categorize it as DDG or FFG.
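As a minimal sketch of that first categorization step (the registry of designated sources and all names below are illustrative assumptions, not part of any standard), an ingestion module might tag each record as DDG or FFG before the usual format-level classification:

```python
# Hypothetical sketch: tag incoming records as DDG or FFG before the
# structured/semi-structured/unstructured classification is applied.
# The registry of designated sources is an assumption for illustration.

DESIGNATED_SOURCES = {"billing-db", "order-api", "member-blog"}  # bounded, expected user populations

def categorize(record: dict) -> str:
    """Return 'DDG' if the record comes from a pre-designated source
    with a definite, expected user population; otherwise 'FFG'."""
    if record.get("source") in DESIGNATED_SOURCES:
        return "DDG"
    return "FFG"

records = [
    {"source": "order-api", "payload": "txn#991"},       # transactional -> DDG
    {"source": "public-news-site", "payload": "GET /"},  # unplanned traffic -> FFG
]
tags = [categorize(r) for r in records]
print(tags)  # -> ['DDG', 'FFG']
```

In practice the DDG/FFG decision would rest on whatever the enterprise's conceptual model designates as a bounded-user system; a source registry is merely the simplest way to express it.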
Abstraction layers may be of some help, but with big data we often deal with design specifics of a domain. Likewise, we also have to build interactivity, which is something we cannot avoid incorporating into the analytics design. So we become more and more specific in approaching the raw big data. This categorization, and the specificity of designing an intelligence application that utilizes the full potential of big data, drives one to get the most out of it. Patterns can now be “designed to be detected”, and applications can be written to inspect such patterns or even provide event triggers depicting a forthcoming event: the burst of a viral video, the arrival of a booming season, an incoming natural calamity, or even the outbreak of an epidemic. This works because data collection is specific and can always be categorized. Unless you are collecting data created in the upper reaches of the exosphere from sound emitted at the earth's surface, you can funnel it down to the categorization mentioned above. Enterprises are advised to adhere to this categorization and to define a conceptual model and standardization.
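A pattern “designed to be detected” over FFG clickstream counts could be as simple as a threshold trigger; the window size and multiplier below are illustrative assumptions, not prescribed values:

```python
from collections import deque

def make_burst_trigger(window: int = 5, factor: float = 3.0):
    """Fire when the latest count exceeds `factor` times the average
    of the previous `window` counts: a naive burst detector of the
    viral-video kind. Window and factor are illustrative assumptions."""
    history = deque(maxlen=window)

    def on_count(count: int) -> bool:
        fired = bool(history) and count > factor * (sum(history) / len(history))
        history.append(count)
        return fired

    return on_count

trigger = make_burst_trigger()
hourly_views = [100, 110, 95, 105, 100, 900]  # last hour: a sudden burst
events = [trigger(c) for c in hourly_views]
print(events)  # -> [False, False, False, False, False, True]
```

Real detectors would use seasonality-aware baselines, but the point stands: once data is categorized as FFG clickstream, a plug-in module knows what shape of pattern to look for.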
Inference from above:
Apart from structured, unstructured, and semi-structured data, let there be the two further categorizations mentioned above (DDG, FFG), with subsequent standardization defined. A domain will already exist within enterprises. With these in place, vendors can design abstraction layers that can be seamlessly plugged into the system to detect patterns, provide analytics, or trigger events.
It has to be remembered that this standardization should be defined at the conceptual level, since enterprises already follow standards for writing to log files, for data transfers, for storage, and so on. The good part of the story is that standardization also exists at the business level when it comes to dealing with data: XML, JSON, and EDI formats are already in use. Therefore, standardization at the conceptual level for dealing with big data becomes rather easy. Once again, this conceptual-level standardization means that big data vendors providing intelligence and analytics services can not only define the required abstraction, but can also integrate it seamlessly.
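Since formats such as XML, JSON, and EDI already standardize the payload level, a conceptual-level standard could be as thin as a common envelope around the payload; the field names here are assumptions for illustration, not an existing specification:

```python
import json

def envelope(payload: str, generation: str, structure: str, source: str) -> str:
    """Wrap raw data in a hypothetical conceptual-level envelope so that
    vendor abstraction layers can plug in without inspecting the payload.
    generation: 'DDG' or 'FFG';
    structure: 'structured' | 'semi-structured' | 'unstructured'."""
    assert generation in ("DDG", "FFG")
    assert structure in ("structured", "semi-structured", "unstructured")
    return json.dumps({
        "generation": generation,   # the DDG/FFG categorization from above
        "structure": structure,     # the familiar format-level categorization
        "source": source,
        "payload": payload,         # the existing XML/JSON/EDI content, untouched
    })

msg = envelope("user=42 action=checkout", "DDG", "semi-structured", "order-api")
print(json.loads(msg)["generation"])  # -> DDG
```

The payload itself stays in whatever business-level format it already uses; the envelope only carries the conceptual categorization, which is why standardizing at this level remains cheap.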
The growth of business-to-business and then business-to-consumer commerce, and the further expansion of business processes into global markets, demanded that enterprise applications be integrated internally and externally. This presented a huge challenge in itself. Enterprises spent a long time writing, testing, and deploying such integrations, and coping with constant changes in endpoint or source systems created overhead for integration engineers. Gradually, enterprise integration vendors such as Tibco, WebMethods, Informatica, and IBM Message Broker, along with Apache open-source projects and others, proliferated during this first phase.
This development did more than help; it presented opportunities and paved the way for service orientation and what we are seeing today. Tools such as ESBs, web services, and modernized messaging systems evolved.
The state the system has reached, pushed there by natural requirements and demands, is due to nothing other than standardization. Standardization was infused into every aspect of integration: data transfers, communication between source and destination systems (such as API calls to web services), logging and error mechanisms, testing mechanisms, and more. Today, while traditional integration requirements still exist and are still being met, we are seeing that these standards are even helping to shape future directions.
The exponential growth of data has been overwhelmingly emphasized and repeated. This reiteration comes from sources that see the value in the data but, at the same time, cannot do much with it today. People realize there is gold beneath, but it is difficult to dig out.
Recommendations from the standards body
What the big data standards committee is calling for is compliance with standardization of existing data formats, so that these standard definitions are in place both for incoming data AND for applications coming in for analytics. Let this be the phase I approach.