Big data: observation, inference and actionable items, leading to substantial results.

Big data analytics presenting substantial results – infographic. Big data companies must strive to attain real results from big data analytics. Today, only generic reports are being provided by big data companies. Analytics derived from big data must be forward-looking.


Big Data analytics


Posted in Big Data BigData Cloud | Leave a comment

There should be “Intellectual Ratings” for content publishing.


As content creators small and big (and devilish) create and publish content like showers of flame erupting from the mouths of fire monsters, regulators WILL evolve and grow more powerful, much like governments formed at the beginning of the modern era. Regulations are important to begin with. But just as the freeways in many parts of the world were left open for speed, the internet must also open up. Net neutrality must not be the only opening: provisioning of content must be opened up far more heavily alongside net neutrality. But no common man or individual, through writings or propaganda, can bring regulation to content publishing in the modern digital age, where internet content dominates and more than 40% of knowledge and information sharing is kick-started through the sharing of internet content.

What can be done?
Crappy content, favored content, sponsored content and non-organic content, in the form of information favorably termed "infomercial", or rarest-of-rare content, or even absolutely useless stunts performed as unique tasks, will proliferate across the internet. Such stunts can range from eating era-old bread to an elephant wearing loose trousers to girls standing in the rain with no clothes, which to a great extent has already been shown. Societies will slowly start enjoying such useless content and will believe the sham is real. These kinds of social behaviors will transform an intellectual society into nothing but a useless, hollow society that shies away from risk and secludes into itself. Selfishness and lethargy will be the driving forces, and innovation will subside to nothing, creating a stagnated world.

Content provisioning through customized filtering to certain regions, geographical filtering, or even personalization (which is the new name for it), if measures are not taken, can confine individuals within those societies to limits on human thought. Isn't the "human mind a terrible thing to waste"? Regulations come in many different forms. Messages delivered through philosophy or godly teachings to the common man become more useful than enforcement based on punishments and consequences alone; while punishments must exist, persuasion is often the more useful of the two. Regulations, or simply governance, exist ONLY to set and pave the way to a better society, whatever form that takes. Therefore, let there be regulations on every type of content provisioning. Today, this can be easy: in order to guide the masses, place controls on the giant enterprises, and those enterprises will enforce regulations on the masses who use the enterprises' tools to publish content. Yes, openness is the ONLY way to have a free world. A ten commandments, or even fewer, that considers the global nature of the internet, with net neutrality taken into perspective and evaluated, will help. Let net neutrality not talk about access privileges in terms of speed of access ALONE, but also take into consideration the "visibility of content" to the common man.

May you see what you want to read, and write what others need to see. ~Sunny Menon

Posted in Big Data BigData Cloud | Leave a comment

World's deadliest animals: what can man do?

Check out who kills the most. A classic representation, and great data to share with the society we live in.
Infographic: The World’s Deadliest Animals  | Statista

You will find more statistics at Statista

Posted in Big Data BigData Cloud | Leave a comment

The word "Love": the Shakespeare story of #bigdata

  • A #Bigdata spin on the data of Shakespeare's classics provides new insights into his works and concepts. According to queries run on the words Shakespeare used, it appears that the famous play of love and romance contains the word "love" only 134 times. While this is a good number, note that in the "Sonnets" the word appears 157 times, higher than in "Romeo and Juliet". "A Midsummer Night's Dream" has 102 mentions and "Two Gentlemen of Verona" has 147. Note also that there is ONLY ONE MORE work in which the word appears more than a hundred times: "As You Like It". According to psychological factors and free-association principles, perhaps the more a word is used, the less feeling the human mind attaches to it. As you derive #Value from the bigdata of Shakespeare's works, should you interpret that the play so widely known as a classic of romance and love is NOT really about love? Or should you see an imaginative mind that stylized and exaggerated the term "Love"? Nevertheless, there is data, and now it is time to derive the value. Maybe we will come to know a different Shakespeare, who did not always write of love, and perhaps Romeo and Juliet was not really a story of love. "Wait until dark", because the blind know more when the light goes off and darkness falls. Now I hear the howling of jackals and animals from the jungle; a frog croaked, snakes hissed. Somewhere far off, a distant roar echoed from the dark jungle, where leaves shivered and rain fell cold… Enjoy the Halloween.
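Word tallies like those above can be derived with a simple sketch. This is a hedged illustration: the play texts below are tiny stand-ins, and with real e-texts the counts will vary by edition.

```python
import re
from collections import Counter

# Count a word's occurrences in plain-text copies of the plays.
# The texts here are tiny stand-ins, not the real works.

def count_word(text, word):
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)[word.lower()]

plays = {
    "Romeo and Juliet": "my love my love is deep",      # stand-in text
    "Sonnets": "love is not love which alters",         # stand-in text
}
for title, text in plays.items():
    print(title, count_word(text, "love"))
```

With full e-texts in place of the stand-ins, the same loop yields per-play tallies of any word.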
We are such stuff as dreams are made on, and our little life, is rounded with a sleep.
Better three hours too soon than a minute too late.
We know what we are, but know not what we may be. ~ Shakespeare 

Posted in Big Data BigData Cloud | Leave a comment

What if ? & What If There Are 26 Letters ?

Click the video. Slow readable view. IMAGINE with Patience.

Did you share the imagination ?

Posted in Big Data BigData Cloud | Leave a comment

#Bigdata may reveal insights into the John F. Kennedy assassination


More than 52 years after the 1963 assassination of John F. Kennedy, the humanitarian president, new technology, if not new evidence, may reveal more insights into his assassination and the plot behind it. With over 200 million search results spat out by … Continue reading

More Galleries | Leave a comment

The wrong turn of BigData – ten reasons, and how BigData will take a big leap!

Enterprises are taking every step to tap into the realms of bigdata. How big "big data" really is has already been debated. It seems there is valuable and futuristic information hidden within big data clusters, no doubt – old jungle saying. If so, then why is it that this valuable information cannot be retrieved? Speed is not a problem anymore, and even if it were, people may be ready to wait a while to know "The Future". Marketing cut-throats, greedy business owners and others who have nothing to lose and everything to earn have shifted the needs and the focus elsewhere in space. Technological erosion is due to this fact: personal needs and vested interests lead people to do deeds and buy things they don't need in the first place. A man buys material he never wanted and will never need in his lifetime, all through the influence of the powerful media. Nevertheless, the big data implementations that currently exist are nothing but retrieval of traditional reports. Projects ripple across CIOs but finally find their resting place within IT and become IT projects. Lost in translation, the CIO is then provided with traditional reports. Here are the top ten reasons BigData will take a second jump soon.

1. The requirement for experienced talent pops up. For some time, companies have been focusing on recruiting fresh talent. While this is a great "venture", it can also cause what might be called the "beginner's fall": not knowing the complexity of applications, fresh recruits can go ahead and create solutions that do not solve the actual problem. But this trend seems to be slowly diminishing, especially within the #bigdata world. People are looking for experienced hands. Conferences are filled with long-bearded masters of the domain, and requirements are flooded with minimum-eight-years-of-experience statements.

2. In the footsteps of giants: So far, traditional reports were being provided to the office of the CIO, to marketing, business and other stakeholders. With more experienced hands working on #bigdata, the look and feel of analytics seems fresh. There is still time to mature, but it appears to be on the right path. The giants have already reaped the fruits of #bigdata, because their investment was heavy and they saw the rise of big data years ahead of many. While they themselves are still learning, many lessons have come out.

3. Bigdata should not be an IT project. Infrastructure tools are plentiful, but the task of processing huge data, even with MapReduce, was not understood in reality. Distributed systems, and coding in that paradigm, have always been a challenge. With more experienced technical hands, this challenge is being met. People have understood the potential of MapReduce and other distributed processing systems, and multi-core programming concepts and capabilities are also becoming helpful.
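As a hedged illustration of the MapReduce paradigm mentioned above, here is the classic word count with the three phases run in-process; real frameworks such as Hadoop distribute these same phases across machines.

```python
from collections import defaultdict

# A minimal, local sketch of map -> shuffle -> reduce.
# Each phase runs in-process purely to illustrate the flow.

def map_phase(documents):
    # Emit (word, 1) pairs for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["to be or not to be", "to see or not to see"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["to"])  # prints 4
```

The same decomposition is what makes the job distributable: map and reduce touch only local data, and only the shuffle moves data between workers.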

4. Service-enablement API provisions. Data has always resided within several subsystems of an enterprise. With more integration between business processes, more robust API frameworks available and distributed processing capabilities, data retrieval, processing and analytics have become more seamless than ever before.

5. Technological challenges: A fair number of tools have already been made available to developers. Although infrastructure tools dominate, applications focused on domain-specific data are released from time to time, which helps organizations achieve #bigdata analytics. There is a lot of confusion here: with the big data hype in the market, all kinds of companies are embedding the keyword "big data", and the internet is proliferated with useless documents that have no relevance to big data. RSS syndication may be of some help with specialized searches, yet the challenge of accurate information retrieval remains.

6. Data scientists have entered the market with data mining knowledge, not much focused on big data. Although we see challenges in these data scientists' use of big data tools, many algorithms and complex data-science concepts are evolving that may provide better analytics. There is always a tendency to fall back to traditional reports as opposed to analytics; this trend needs to be overcome.

7. Data accessibility: For accurate analytics, technology alone will not suffice, nor will complex algorithms, mathematical models and visualization capabilities. The most important factor, the DATA, must be available within hand's reach. Extensive compliance needs, disparately residing data and, most importantly, unstructured and semi-structured data within enterprises make it hard for enterprises to give the data out for processing, even internally.

8. BigData as a service: Bigdata means analytics, not reports. This will be a huge challenge, given point 7 above. Still, getting the job done through a service-based approach is a huge benefit, because it reduces the tremendous research, design and development time otherwise put forth by enterprises; they can simply rely on third-party vendors who may have done the job anyway. With point 7 in perspective, though, the handover or usage of such software by enterprises will be an issue. And because of selling pressures among big data vendors, these SaaS models will turn into outright traditional licensing models very quickly. That change in licensing model may perhaps help take big data analytics to the next level.

9. Hidden data: "The secret of business is to know something that nobody else knows" ~ Aristotle. Given that the data resides in unstructured form, arriving within the enterprise through sensory devices, getting at what is not known to reside there requires expertise. Data scientists should be relied upon. Mathematical models and probability-based thought models may only be a starting point; the abstraction of probability has to be broken open to derive patterns or predictability, and predictability need not be the only derivation people go after.

10. Breaking open the technological challenge: Bigdata projects always spin through different business units. Most often they arise from the office of the CIO; many evolve from the demands of marketing and sales. Be that as it may, today it all ends up within the engineering/IT division. This flow needs to stop. BigData should not be an IT project; if it becomes one, what we will get back is nothing but reports. Tools must be designed and developed by IT, but the final attainment of results must be achieved by the stakeholder. Moreover, big data should be an enterprise-wide project, not segregated within certain business units. Infrastructure for dealing with big data should therefore be supported and mentored by engineering/IT ONLY, while the owners remain the actual INDEPENDENT users. The job must be delegated to and executed by the stakeholder rather than treating big data as an IT project.

Posted in Big Data BigData Cloud | Leave a comment

The Abstractionism of probability theory.

This discussion on "The Abstractionism of Probability" is perhaps one of the first in the world to be held publicly. It has to be understood that this discussion evolved out of various other discussions with mathematicians, philosophers, doctors and engineers, and with many other participants including rappers, mainstream musicians, artists, actors and actresses. Filmmakers were never touched: because this is the subject matter for a documentary, no filmmakers of any kind were interviewed, to keep its serenity and purity. The film is in the making. Please read on.

From time immemorial, probability theory has been in existence. Having its origin close to mathematics, and many a time being discussed alongside mathematics, the subject has been of great importance to the financial industry. As many giant thinkers have stated, with time concepts change, with time philosophies change, and with time what sounds like crap changes into meaningful truth, such as the "insanity" of Galileo Galilei, who claimed in the 17th century that the earth moves around the sun and was imprisoned for it. It is believed that, as he looked at a solar eclipse through the bars of his prison, he lost his eyesight. His was an insane statement that eventually turned into one of the greatest facts the world will ever live with. Many mathematicians, physicists, astronomers and what not have all juggled with probability theories. But times-are-a-changin': now, with #bigdata in perspective, probability has taken on a whole different meaning.

In faith there is enough light for those who want to believe and enough shadows to blind those who don't. ~ Blaise Pascal

The genius Pascal himself, who provided scientific interpretations during the early origins of probability, is here quoting on time. Nothing is a constant. Yeah! I know about change, and that is not the subject. What is abstract is that probability talks about the percentage of occurrence of an event. I don't need to define it; you know what this means, or please check Wikipedia. So all the above really means is that there is a greater likelihood of rain today.

What this means today, in a world of predictability, is that a tossed coin can land either heads or tails. Nothing more than that.

This mind game has driven mathematicians and philosophers equally, through its fanciness and its nature of attracting inquisitive minds. Being close in nature to numbers and mathematical formulas, it has drawn still more people toward it.

A detailed look at history and its origins reveals that many mathematicians, philosophers and physicists have even formed equations in an attempt, as I derive it, to drive closer to the predictability of future happenings.

Probability theories have always been related to future happenings or occurrences. Let us look at the most common example of probability: the tossed coin. It says the probability of the coin falling on heads or tails is 50%. In today's modern cognitive thought process, this has little meaning. Given the tremendous possibilities of linear and non-linear flows of occurrences, or simply happenings, probability theories take a step down, offering only a negligible amount of wisdom for analytics of any nature. However, getting rid of probability theories would mean "another caveman born yesterday". For this purpose, for the purpose of mankind, for the sake of sanity, and above all for the sake of the history and time that analytical minds have come through and the thought process man has evolved, let us give it an appropriate place in the universe and call it… a step prior to actual predictability of future occurrence.
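The tossed-coin example above can be made concrete with a small simulation, a hedged sketch of how the 50% figure emerges empirically, the law of large numbers at work:

```python
import random

# Estimate the tossed-coin probability by simulation. With enough
# tosses the empirical frequency converges on the theoretical 50%:
# probability as "a step prior to actual predictability".

random.seed(42)  # fixed seed so the sketch is reproducible

tosses = 100_000
heads = sum(random.random() < 0.5 for _ in range(tosses))
frequency = heads / tosses
print(f"empirical P(heads) = {frequency:.3f}")  # close to 0.500
```

Note that the simulation only ever tells us the long-run frequency; it says nothing about the outcome of the next single toss, which is the abstraction discussed here.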

Thank you.

Posted in Big Data BigData Cloud | 1 Comment

Low-cost BigData implementation: the money that lies beneath.

We are wrapping up a bigdata POC implementation project for a major retail organization, where we were also tasked with reviewing security and compliance issues. To the question of where the money resides buried within enterprises, we ask a question in return: what do you have to offer in #bigdata? Our approach to the problem has always been from the architecture standpoint. Architecture matters. There is a lot of data within enterprises. Especially with sensory data moving constantly and absorbing as much information as it can, enterprises are challenged with storing information that may make sense eventually, if not today. This in itself presents a great challenge. Data can be completely useless to the enterprise or, at the same time, very useful. Storing unwanted data causes overhead, especially when we talk about thousands of employees pinging their devices on a daily, even minute-by-minute, basis. Above all, imagine the movement of these subjects transmitting coordinates to base stations. All of the above opens doors and paves the way for tremendous possibilities. Although the retail company had completed phase I of its bigdata project, what was being sought was the second phase, which called for fraud detection. Bigdata plays a BIG role here. We began with a proof of concept. For a proof of concept with bigdata, unlike traditional application development, one huge factor that comes into play is the data itself: the main focus is manipulating bigdata. When we think of data, we now think of containers beyond ordinary database systems; NoSQL comes into play. When we think of data, we think about data transfers.
Data transfers must be considered because data not only resides as unstructured or semi-structured data; it resides in huge chunks in systems across different security zones or domain boundaries. Therefore, for data to be manipulated, it must be transferred closer to the application that uses it. That application is no longer a simple linear application. Non-linear applications spawn several threads across processes, take over memory segments in parallel execution mode and start processing data. For this reason, the design and flow of these systems must be carefully thought through, and they surpass traditional design methodologies. Today we see two different pages of flow for each process, and decision trees must be carefully designed to deal with outcomes. Responsiveness is another factor: although responsive programming can be set aside, it is a good idea to think about it during the design phase. Here are some excerpts from the big data implementation for the retailer. Here is where the money is.

  • Number of downstream systems touching the main application : 28
  • Repositories used in total : 4
  • NoSQL – MongoDB (Follow me on twitter to know how to work with MongoDB and scale it – @sunnymenon)
    BigData stack – Cloudera, Hortonworks, DataStax, MongoDB, and a set of other tools, some being evaluated and others being discussed with vendors; hence the flotation of clones. They will soon decide, and part of our work is to help them choose. The POC utilizes some of these vendor platforms, but we have stated the need to drill down. It is important that one of the vendors be alongside on the bigdata realization path. We will talk more.

OUR STACK: Apache Hadoop on Windows. Bigdata made easy. Talk to us to know more about OUR STACK, the Apache Hadoop stack on Windows made by us developers. Easy to roll out, difficult to neglect. No IT challenges. No questions asked. Nothing to sign up for.

  • Our team size: 5 people
  • Began with discussions with stakeholders – 12 hours over different days. A total of 35 people were interviewed, including CxOs, engineering managers, directors, developers, IT operations staff, network sysadmins, database sysadmins, data analysts, architects and consultants.
  • The primary layout of the composite was defined, with an 8-hour presentation of the layout for the POC. This targeted a known application where the major part of the data resides. The strategy was to utilize that data, look at analytical data, and at the same time pull in "some bigdata" from unstructured sources outside, thereby "proving" accessibility to voluminous data, retrieval functionality, streaming, etc.
  • The messaging layer played "big" time. Asynchronous messaging of bigdata has a different meaning. Kafka can be considered, but we found it challenging to get information: community strength is still growing, and Stack Overflow is NOT overflowing. Same for STORM. "One little Pig" can really be useful, and HIVE is where the honey might be. With all tools combined, the assembly should begin. Call it the ORACLE way: engineered systems. Good boy… good boy.
  • Visualization is the key. Go beyond clustering and graphs to new MEANS?
  • Interactivity another.
  • Equations come to the front end.
  • Probability theories are the favorite tool of a data scientist.
  • Enterprises should go beyond probabilities.
  • Security and compliance audit. We didn't touch on single sign-on. Look, ORACLE engineered systems are one way to go. But will ORACLE survive the greediness of innovation? Or will they survive bigdata-base in the cloud and NoSQL in the cloud?
  • Clustering algorithms may not make much sense, as the randomness of the data is too disparate, and a K-means fit may not be assessable.
  • Pure solution approach – credit from the CEO in a direct email, and a quote about us in the brown-bag meeting.
  • TOTAL TIME TAKEN: 8 weeks.
  • Evaluated result: POC, inventory list and portfolio. For the data visualization modules intended for actual usage, we provided code and training to business analysts. Excel and R.
  • Total charged to clients: <<ASK ME>>. This is the total cost of a bigdata project. I wouldn't recommend that enterprises or companies negotiate lower rates for a bigdata project. Look out for the gains in technology that vendors bring in, for the knowledge of enterprise infrastructure they bring, for an appropriate deployment model, and for how it serves a specific need. While cost savings are important, let them be a secondary aspect in the beginning stages. Invest in the foundation; reducing recurring costs should be the principle as far as bigdata is concerned.
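On the clustering point above: here is a hedged, toy sketch of K-means (Lloyd's algorithm) on 1-D data. Real work would use a library implementation; this version only illustrates the assign/update loop, and why disparate random data can produce clusters with little meaning.

```python
# Toy K-means on 1-D data: alternate assignment and update steps
# until the centers settle.

def kmeans_1d(points, centers, iterations=20):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (keeping the old center if a cluster comes up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 10.0])
print(centers)  # two centers, near 1.0 and 9.07
```

When the data is a single diffuse blob instead of well-separated groups, the same loop still returns k centers; they just carry no insight, which is the caveat raised in the bullet above.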

Should Cloudera or Hortonworks be your enterprise bigdata architecture? What about DataStax? Where has SPLUNK gone? Ask the question again: should Cloudera or Hortonworks be your enterprise bigdata architecture, or should it be MapR, or should you engage with Amazon?

Many users ask if any single product can provide all the features an enterprise needs for dealing with BigData. According to a whitepaper released by Computer Sciences Corporation (CSC), 2014 will leave us with two major players in the market; furthermore, they say, the others will either be acquired or make an exit. While the bigdata vendor wars appear to be moving in the direction depicted by CSC, we still have to wait and see what shape the necessity takes within the enterprise. Meaning: what will enterprises try to achieve? Will they rely on existing analytics and business intelligence only, using bigdata as just another component? Or will they move beyond ordinary predictive analytics and elevate themselves to real-time predictions, converting those analytics into actionable items? Forget business intelligence and reporting; think predictive.

"We do not learn; and what we call learning is only a process of recollection." ~ Plato

Hey, ya all have a great Friday, Saturday & Sunday, aal-right?

Posted in BigData Big Data Cloud | 2 Comments

BigData And The Cloud

BigData and the cloud sounds like David & Goliath, but could really be Romeo & Juliet if defined with a good deployment model. If wired differently and haphazardly, it could turn out to be "Brutus & Caesar".

For the Bigdata deep packet specifications consortium site, please visit here.

There are two kinds of data, both human generated. It has to be appreciated that machines themselves generate data, initiated primarily by human activity; data generated by machines and kick-started by human activities is a different subject in itself. What we discuss here is human-generated data.

The two further classifications at the top hierarchical level are as follows:

  • Designated Data Generation, and
  • Forceful-Flow Data Generation

Designated Data Generation
An example of Designated Data Generation (DDG) would be transactional system data within databases; log files of requests coming in for known transactions; or a post on a specific blog site such as this, where a definite number of users is expected, just as in a transactional system where a definite number of users can come in and the minimum and maximum range of clickstream is pre-determined. Looked at closely, this can be both structured and unstructured data.

Forceful-flow Data Generation
Examples of Forceful-Flow Data Generation (FFG) are request logs coming into news sites, a public picture published on a public site, a query resulting in relevant search results, etc. These requests do not come in with pre-designed expectations: the system never expected such a request, and passers-by circling that particular area of the requesting station provide more such request data, again generating activity based on clickstreams. A close look again shows that both can be either structured or unstructured.

Within these two realms, analytical modules should first and foremost categorize the data. Just as categorization is done against structured, semi-structured and unstructured data, prior to that categorization, modules coming in for the snatch must categorize the data as DDG or FFG.
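The DDG/FFG pre-categorization described above can be sketched as a first-pass filter. This is a hedged illustration: the record fields ("source", "expected_users") and the set of known designated sources are hypothetical, not part of any published standard.

```python
# A record counts as Designated Data Generation (DDG) when its
# source and audience are pre-determined, and as Forceful-Flow
# Data Generation (FFG) otherwise. Field names are illustrative.

KNOWN_DESIGNATED_SOURCES = {"transactional_db", "app_log", "blog_post"}

def categorize(record):
    designated = (record.get("source") in KNOWN_DESIGNATED_SOURCES
                  and record.get("expected_users") is not None)
    return "DDG" if designated else "FFG"

records = [
    {"source": "transactional_db", "expected_users": 500},
    {"source": "news_site_request_log", "expected_users": None},
]
print([categorize(r) for r in records])  # prints ['DDG', 'FFG']
```

Running this pass before the structured/semi-structured/unstructured split tags each record with its generation mode, which downstream analytics can then exploit.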

No matter what you do, abstraction layers may be of some help; however, when talking about bigdata, we often deal with designs specific to a domain. Likewise, we also have to build interactivity, something we cannot evade incorporating into the analytics design. So we are becoming more and more specific in approaching the raw big data. This categorization, and the specificity of designing an intelligence application that utilizes the full potential of bigdata, drives one to get the most out of it. Patterns can now be "designed to be detected", and applications can be written to inspect such patterns or even provide event triggers that depict a forthcoming event: the burst of a viral video, the onset of a booming season, an incoming natural calamity, or even the outbreak of an epidemic. Data collection is specific and can be categorized, no matter what; unless you are collecting data created at the upper surface of the exosphere from sound emitted off the earth's surface, you can funnel it down to the categorization mentioned above. Enterprises are advised to adhere to this categorization and to define a conceptual model and standardization.
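The event-trigger idea above can be sketched simply: watch a stream of per-minute view counts and fire a trigger when the current value spikes well above the recent moving average, e.g. the "burst of a viral video". The window size and 3x threshold here are illustrative assumptions, not fixed rules.

```python
from collections import deque

# Fire a trigger whenever a value exceeds `factor` times the
# moving average of the last `window` values.

def detect_bursts(stream, window=5, factor=3.0):
    recent = deque(maxlen=window)
    triggers = []
    for t, value in enumerate(stream):
        if len(recent) == window and value > factor * (sum(recent) / window):
            triggers.append(t)  # the event trigger fires here
        recent.append(value)
    return triggers

views = [10, 12, 11, 9, 13, 12, 90, 95, 11, 10]
print(detect_bursts(views))  # prints [6, 7]
```

In a real deployment the same logic would sit behind a streaming layer (Kafka, Storm and the like, as discussed earlier), with the trigger emitting an event rather than appending to a list.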

Inference from above:
Apart from structured, unstructured and semi-structured data, let there be the two further categorizations mentioned above (DDG, FFG), with subsequent standardization defined. A domain will already exist within enterprises. With these, vendors can design abstraction layers that can be seamlessly plugged into a system to detect patterns, provide analytics or trigger events.

It has to be remembered that this standardization should be defined at the conceptual level, as enterprises already follow standardization for etching or writing to log files, for data transfers, for storage, and so on. The good part of the story is that further standardization also exists at the business level when dealing with data: XML, JSON and EDI formats are already in use. Therefore, standardization at the conceptual level for dealing with bigdata becomes rather easy. Once again, standardization at the conceptual level means that bigdata vendors providing intelligence and analytics services can not only define the required abstraction, but also integrate it seamlessly.

Historical evidence:
The growth of business-to-business, and thereafter business-to-consumer, and the further expansion of business processes to global markets demanded that enterprise applications be integrated internally and externally. This presented a huge challenge in itself. Enterprises spent a long time not only writing and deploying code; testing such integrations and coping with constant changes in endpoint or source systems created overhead for integration engineers. Slowly, enterprise integration vendors such as TIBCO, webMethods, Informatica and IBM Message Broker, along with Apache open-source projects and others, proliferated the environment during phase I.
This development did not so much help as present opportunities, and thereby paved the way for service orientation and what we are seeing today. Tools such as ESBs, web services and modernized messaging systems evolved.
That the system has been pushed to where it is now, by virtue of natural requirements and demands, is due to nothing other than standardization. Standardization was infused into every aspect of integration: data transfers, communication between source and destination systems (such as API calls to web services), logging and error mechanisms, testing mechanisms and more. Standardization ruled. Today, while traditional integration requirements still exist and are being met, we see that those standardizations are even helping chart future directions.

The exponential growth of data has been overwhelmingly emphasized and repeated. This reiteration of the existing data and its observed growth comes from sources that see the value but, at the same time, cannot do much with it today. People realize there is gold beneath, but it is difficult to dig out.

Recommendations from the standards body.
What the bigdata standards committee is calling for is compliance with standardization of existing data formats, so that standards definitions are in place for incoming data AND for applications coming in for analytics. Let this be the phase I approach.

For more information on bigdata standards specifications and know-hows, please visit here. Business analysts/Data analysts click here.


Posted in Big Data BigData Cloud | 12 Comments