Statistical analysis tools may not be where the solution lies. Good lord, how long will these rituals continue?

As more doors open up in the financing world for ‘Big Data’ tools, the existing tools provided by SAS, by R, or by anything that spits tabular data into an Excel spreadsheet provide just statistical summaries. Moreover, these tools were not designed to accept the humongous data sets waiting to be consumed, arriving over high-speed pipes in the form of requests. Disruption has come about not by virtue of innovators and intelligent systems, but through the sheer demand that Big Data has put forth. Unable to handle the weight, tapes are still used in the traditional way, carried out like ritual acts. Yes, enterprises would love to forget about them; compliance and security requirements, however, will not allow it. With globalization spreading, confusion persisting among data carriers, and the challenges in telecommunications ever increasing, everything demanded disruption!

Oracle OpenWorld, which begins in San Francisco on September 29th, has attracted about 50,000 attendees from 123 countries, with roughly a million more expected online. Here is a little “big” data from the past event:

      3,570 speakers

      40,942 seats for sessions

      142,000 cups of coffee

      95,000 sodas

      63,000 lunches

      42,000 gallons of water

This year’s sessions include several on “Big Data”, along with MySQL-related sessions, Oracle Exalytics and the like, which can shed light on the issues of big data. Any evolution, whether in science or in religion, travels either the path to happiness or the path to evil. With the Big Data hype growing day by day, there are “much ado about nothing” companies floating about too. Expect discussions on big data and on the preparedness required to perceive the scams and debunk the myths. One interesting item on the session list is the fusion of Oracle TimesTen with Exalytics. Oracle TimesTen is a full-featured, memory-optimized relational database that can also serve as an in-memory cache.

Traditionally, predictive analytics pulls much-needed customer information out of data warehouse systems, and that is largely what today’s tools retrieve: help with upselling, peak purchase times, money matters. At times, patterns can be detected in the stored data. When big data is involved, though, it becomes a question of anaconda versus viper. A viper’s fangs are venomous; it can spit poison. Anacondas are nonvenomous; they win by sheer size. Put another way: when it comes to big data, it is Godzilla against a monkey. Not the code monkeys.
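To make “patterns on stored data” concrete, here is a minimal sketch of the kind of peak-time question a traditional tool answers. The purchases table below is invented for illustration; only the shape of the question matters:

```python
import pandas as pd

# Hypothetical purchase log: one row per transaction (invented sample data).
purchases = pd.DataFrame({
    "amount": [12.5, 40.0, 7.25, 99.0, 15.0, 22.0],
    "timestamp": pd.to_datetime([
        "2012-09-01 09:15", "2012-09-01 12:40", "2012-09-01 12:55",
        "2012-09-02 12:10", "2012-09-02 18:30", "2012-09-03 12:05",
    ]),
})

# Classic warehouse question: which hour of the day sees the most buying?
peak_hours = (purchases
              .groupby(purchases["timestamp"].dt.hour)["amount"]
              .agg(["count", "sum"])
              .sort_values("count", ascending=False))
print(peak_hours)  # hour 12 dominates in this toy sample
```

Big data does not change the question; it changes whether the table fits anywhere near a single warehouse.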

From what I read, today’s tools do not have the bandwidth to deal with Big Data. With Hadoop in existence and accepted within the development community, and with the NoSQL discussions under way, the big data “dealers” want more tools. Keep in mind, there are many players who have not dealt much with Big Data, or with data at all. Remember, data is kept secured, so how can most of them even get near it? Having designed a few databases does not by itself give you the insight to deal with the Big Data problem. To understand Big Data, you either need the core mind of a researcher or the blessing of actually seeing, and perhaps touching, it. Only those two kinds of people can see, visualize, or perceive the problem of big data.

Pardon the architects, for they do not know what they are doing – to misquote the Bible.

Welcome to the new age of code monkeys. Leave a comment and I will give you a chocolate. Hey, I’d like to hear what you have to say on big data, ok?

Nobody knew what would ever happen to the data slowly growing within enterprise walls.
Listen to these statements from the past….

      Man will not fly for fifty years – Wilbur Wright (1901)

      A rocket will never leave the Earth’s atmosphere – New York Times (1936)

      There is a world market for maybe five computers – IBM’s Thomas Watson (1943)

      640K ought to be enough for anybody – attributed to Bill Gates

Dear weatherman, once you told us it may or may not rain. Today you say there is a possibility of rain. I have a story I once dreamt, and I will narrate it. Twinkle, twinkle, little stars, I still wonder how big you are….


BigQuery – Google comes out from “INSIDE”; who else & what more will we really see through?

Google BigQuery’s public appearance makes it even more complicated to decide where to go for Big Data analytics. How big is really big? Can one hundred thousand records with mega-sized data such as images, or even one million records, be called big? As they asked: how big can a table be? How many fields? And so on.
When Oracle first shipped Version 2.0 as its first database release (there never was a Version 1), it was primarily to store unstructured information in a meaningful, queryable form. The CIA’s interest in software that could collect global information made it compelling enough to release Oracle and harden the product. It seems it was not clear to anyone then that there would one day be millions and millions of records needing to be analyzed and, perhaps, made sense of. Who could have foreseen the amount of data that would accumulate over time, for weather-related information for instance? Likewise for medical data, product purchases, credit card usage, travel, and so on. First it was a storage problem: how much can we store? Then things changed. More storage devices evolved, and they evolved cheap. Now it is a much bigger problem: how do we make sense of what has been stored?
Demystifying the noise in data is a good thought. But at this point the concept itself has created much “noise” in the equation. (Haha, a small laugh.) The truth is out there!
BigQuery is, simply put, an analysis tool for Big Data. A huge amount of content resides within Google’s walls: 60 hours of video uploaded per minute, 100 million gigabytes in the search index and, importantly, 425 million Gmail accounts, all producing and injecting data into Google’s thousands of servers across the world. This repository can be tapped with result-oriented queries in the discovery process. That is BigQuery.
While discovery is important, the time to retrieve results and aid the discovery process is also very important. This is different from the batch-oriented queries run by Hadoop. And “different” doesn’t mean “bad”. It is simply different.
BigQuery seems to have evolved from an internal system called Dremel. Dremel has now been externalized to run queries on big data sets, under the name BigQuery.
What is interesting is that Dremel, the internal BigQuery of Google, talks in terms of what they call a full table scan. This is supposedly done to span the search across hundreds and hundreds of tables residing on servers everywhere. According to what BigQuery experts generally say, this kind of search beats running a search on an indexed RDBMS. Hmmmm!
An example of such a query would be: what are the different applications running on Google’s servers for which a type is set to “Something”? Or something similar to that.
Google would have used this tool extensively across its own servers. With such huge content within their walls, the usage now that it is externalized will be interesting to watch. The question still remains: why wouldn’t an RDBMS suffice? Don’t indexing, statistics updates and optimization techniques already give databases fast retrieval? Note that BigQuery data sets have a limitation of 64KB for total field lengths per query.
Internally, BigQuery, that is Dremel, uses SQL-like statements to pull data for analysis, and it IS FASTER. But then, this is due to the underlying format of the data that is pulled for analytical purposes.
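That underlying format, per the Dremel paper, is columnar storage. A toy sketch of why scanning one field is cheaper when data is laid out by column rather than by row (the record layout here is invented):

```python
import numpy as np

N = 100_000

# Row layout: each record carries every field; scanning one field touches them all.
rows = [{"user": i, "bytes": i % 512, "url": "/x"} for i in range(N)]
total_row = sum(r["bytes"] for r in rows)  # walks every record object

# Column layout: each field is a contiguous array; a scan touches only that column.
cols = {"user": np.arange(N), "bytes": np.arange(N) % 512}
total_col = int(cols["bytes"].sum())  # one tight pass over one array

assert total_row == total_col
```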

So what is BigQuery? BigQ is all about Big Data residing in Google’s Cloud Storage, loaded into BigQuery and queried for results.
With the API, one can embed BigQuery within applications to let users fire queries: SQL-like statements run against a process that does a full table scan, as opposed to an indexed database. Supported languages include Java, Python, .NET, JavaScript and also Ruby. BigQ comes with a web-based UI, and pricing, as understood, will be charged on a “per query” basis. Examples of a BigQ query would be to look at the number of page views on Wikipedia in a month, and perhaps on what subjects, or to take a look at all the known works of Shakespeare. Interestingly, one month of Wikipedia page views is about 6 terabytes of data uncompressed, and BigQ runs against that volume, as heard from the horse’s mouth.

A small “BigQ” I ran from the examples was to find the biggest work of Shakespeare, biggest in terms of number of words. The query returned “Hamlet”, with 32,446 words “topping” the 42 published works of Shakespeare in the sample.
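For the curious, a sketch of what firing such a query from code might look like. This uses the present-day google-cloud-bigquery Python client and the public shakespeare sample table, so treat the client calls, dataset path and column names (corpus, word_count) as illustrative assumptions rather than the exact API of the day:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # project and credentials come from the environment

# Total words per work in the public Shakespeare sample; Hamlet should top the list.
query = """
    SELECT corpus, SUM(word_count) AS total_words
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus
    ORDER BY total_words DESC
    LIMIT 5
"""

for row in client.query(query).result():  # a full-scan query, no indexes involved
    print(f"{row.corpus}: {row.total_words} words")
```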

What more should I see…

Many are the things that man seeing must understand. Not seeing, how shall he know what lies in the hand of time to come? – Sophocles


Oracle adopts R Programming for Statistical Analysis

Oracle commits itself to Big Data once more and deepens its presence in the “large volume” space. Everything pertaining to large volumes of data residing disparately is an issue. Relational databases have fallen short and given rise to NoSQL. Slowly, a battle is rising between SQL and NoSQL, and the fight is worth watching.

While NoSQL enthusiasts have called the RDBMS “traditional” and a poor fit for the Big Data paradigm, the RDBMS/SQL camp has laid down numbers and argued that without complex joins and constraints, data cannot be made very meaningful. And even with all those constraints, large volumes have been stored in RDBMSs; so goes the argument.

Be that as it may, monstrous Big Data is a winner, as seen when Splunk forecast a one-cent loss and then, in their earnings report, actually posted a gain.

Amid these discussions, Oracle has taken another step in providing tools to solve the Big Data problem. It is like saying: I have various kinds of wrenches, choose one, although they all provide the same functionality. R, which grew out of John Chambers’s S language, is a command-line programming environment for working with statistical data; you use equations and models to derive statistical outcomes. It is supposed to work in sync with the Oracle TimesTen in-memory database and other databases including Oracle 11g. Look out for the “R” talk at http://www.oracle.com/us/corporate/press/1743599

The link to download is here: https://oss.oracle.com/ORD/

One more step for Oracle, another step for open source.


NASA plays a song on Mars… tuned to “Reach for the Stars”

The tune “Reach for the Stars” from rapper will.i.am was played on Mars. The amazing capability of the human brain to create and enjoy music was touched on in Spielberg’s movie A.I., when the beings of the future discover the AI lying deep in the ocean. Novel thinking.
The song “Reach for the Stars” by will.i.am sounds overtoned, or treated with “gain” special effects. Sound is known to vary with the medium it travels through, air for that matter. Experiments have demonstrated that sound distorts in proportion to the properties of a medium such as air. Thus, on Mars, sound can vary and may come out totally different from how it does on Earth. Raindrops may sound different, the blow of wind perhaps monstrous, and a human voice may squeak or boom. In general, music with a rhythmic flow of sound patterns may distort and sound as though overtones were added to the composition. How long such a flow can sustain itself there is yet to be heard.
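A rough back-of-the-envelope for why Mars would sound different, a sketch using the ideal-gas speed of sound; the temperatures and gas properties below are round approximations, not mission data:

```latex
% Speed of sound in an ideal gas:
c = \sqrt{\frac{\gamma R T}{M}}

% Earth (N2/O2): \gamma \approx 1.4,\ M \approx 0.029\ \text{kg/mol},\ T \approx 293\ \text{K}
c_{\text{Earth}} \approx \sqrt{\frac{1.4 \times 8.314 \times 293}{0.029}} \approx 343\ \text{m/s}

% Mars (mostly CO2): \gamma \approx 1.3,\ M \approx 0.044\ \text{kg/mol},\ T \approx 210\ \text{K}
c_{\text{Mars}} \approx \sqrt{\frac{1.3 \times 8.314 \times 210}{0.044}} \approx 227\ \text{m/s}
```

On top of the slower speed, the thin, cold CO2 atmosphere attenuates high frequencies strongly, so the same track would come through muffled and distorted.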
The will.i.am track is treated with “gain” effects, as you may hear, giving some semblance of how it might be heard standing on Mars.
The song was played as though it were coming from the NASA lab/station. I hope to hear how it would actually sound on Mars, though given that data from Mars travels a huge distance and takes hours to download, I am not sure that can be expected. Audio is “big data” in its own right, and reducing noise across all those hops; who knows?
Here is the video link to NASA listening to the music being played on Mars: http://mashable.com/2012/08/28/reach-for-stars-first-song-mars/


Big Data & The Big Players

Is it David & Goliath, or Goliath & David?

Big data has been gaining popularity in the world of tags and synonyms for some time now. More innovation has always been seen within small organizations. With Hadoop accepted among the developer community, despite a considerable learning curve, it is important that everyone involved in “drilling” down on any kind of voluminous data give importance to Hadoop. Now Sqoop’s entry has helped enterprises think harder about their disparate databases. Where one database field embeds many other pieces of information, such as the “outcome” of some action, retrieving that information, sunk deep in the caves of enterprise repositories, is becoming less and less of an issue. More thought, and joint thinking, has to go into what to do with the data; simply put, one has to start thinking about “how to make sense of this data”.

Large corporations such as Oracle, Google, IBM and others have begun bringing out tools for the “living dead”: the Big Data.

Google’s BigQuery seems interesting. Its availability to the public is even more interesting.

Amazon Elastic MapReduce is another effective way to extract information from massive volumes of data. How far it can scale is yet to be seen.

Similarly, IBM’s big data appliance is provided to retailers with Netezza: the IBM Netezza Customer Intelligence Appliance. The new Netezza appliance also seems to incorporate business-intelligence software. That could be novel thinking: retrieving the data and making sense of it in one place.

Amid all these, Oracle has brought out a set of services in the usual standardized Oracle way of rolling things out: an Out Of The Box solution. A set of tools and services has been rolled out, and how far it can be taken into the field depends mainly on the people involved. But here is a look at all their tools and services with a brief introduction, all on one page.

Oracle looks at the Big Data problem in three main areas, as it should be. Within a Big Data problem, you acquire the data, organize it, and then analyze and present it. So does Oracle:

Acquire: Oracle says data will be acquired through the Oracle NoSQL Database and Oracle Database 11g.

Organize: this part consists of the Oracle Big Data Appliance, the Connectors and Oracle Data Integrator.

Analyze: the components include Oracle Advanced Analytics, Oracle data warehousing, Exadata and the Exalytics In-Memory Machine. I mean, “Machine” 🙂 interesting anyway.

The Appliance: seems to integrate Cloudera’s Hadoop distribution and a JVM on a Linux base.

The Data Integrator: SOA modules and ETL.

The Connectors: include the HDFS connector and the Hadoop loaders, among them a Data Integrator application adapter for Hadoop (said to reduce MapReduce development effort), plus an OLTP connector.

There are other common Oracle components now extended for Big Data as well.

Thinking independently, this is not about drilling down into some data and replicating the data mining done ages ago. This is drilling all right, but drilling for the future.

Do you want to chat with me and become a mad thinker? Talk to me @ enterprise.architects AT Yahoo.com. I am online, yup.


Big data analytics using clustering

Big data (or bigdata) can fall along various dimensions. k-means can reproduce results that were previously observed, and it can surface certain patterns in available data spread across disparate units. However, to attain actual analysis of a large volume of data and extract valuable information, a preset collection of data points charted on a graph and watched for movement over, for instance, TIME cannot provide the desired results or “validation”. Sometimes “external” and “internal” evaluations can be looked at as a dialectical approach to problem solving, but that may not fit within the realms of huge data sets. Furthermore, the clustering quality may degrade at scale. Above all, the distances between centroids can cause issues in the analysis. This distance problem has already been foreseen, and a new set of algorithms has started emerging; the BIRCH approach could result in new algorithms being put to the test.
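For concreteness, a minimal sketch of plain k-means (Lloyd’s algorithm) in Python on invented data. Note how everything hinges on point-to-centroid distances, which is exactly the step that strains at big-data scale:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (the costly distance step).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Toy data: two Gaussian blobs in two dimensions.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers, labels = kmeans(pts, k=2)
print(centers)  # roughly (0, 0) and (5, 5)
```

The distance matrix is N-by-k per iteration; at billions of points that single step dominates, which is why summarizing schemes like BIRCH, which condense points into compact cluster features before clustering, start to look attractive.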

Approach II [In the process of writing]


The seven steps to seeking funding from a VC in a weird economy

1. Talk more and work more.
Complexity rules today’s environment. As opposed to earlier times, when things were simply database posts, today things are composite in nature. For ideas to ripple across, you need to talk more, explaining clearly what your product does, and use tools to communicate efficiently.

2. The product prototype.
Prototyping the product may not always seem feasible in a composite environment. This is when you either use third-party providers to tie your applications together or roll out your prototypes with them. This will enable you to define the composite and present your product to potential investors.

3. Talk the latest buzzwords, but do not let them remain mere buzzwords: make them work, tied to actual work. Today’s investors look for details. If you talk cloud, explain it well and demonstrate it if possible. Rolling out your prototype apps in a third-party provider’s hosted environment for demonstration purposes serves multiple needs.

4. Client lists are good things. But now provide more robust implementations for the VC, who is forever probing into your clientele.

5. Ask for a little money first, then move towards the bigger amount. Let the ask be very real, and support it with all kinds of numbers, such as:
a. The remaining work to be done.
b. The number of resources, man-hours and rates for each resource.
c. Software/hardware that needs to be purchased.
d. Facility costs; involve off-shore for lower costs.
e. Green and low-energy talking points.
Budget, go-to-market strategy and sales force are some other tags.

6. Talk extensively about the virality of the product, and highlight the cloud nature of the deployment model for scalability. Talk extensively about security now. Involve experts to give presentations on security matters, and involve specialists who have presented products to VCs before.

7. Be real: show code and execute the code. “Be live if needed” being the code word. Be confident. Remember, we have all seen a balloon filled with air; let your presentation acknowledge the past bubble burst and show a pot with gold.

Hey, I am here – enterprise.architect@yahoo.com


Seven simple steps towards agility

Now we know there is something called agility that one can adopt to simply make more money by reducing overhead and, most importantly, by using existing assets. Thinking openly, one can take seven steps towards agility. Organizations must practice non-linear methodical disciplines to attain total agility. Aligning business processes with software infrastructure components must be taken up as a separate initiative. All methodologies must be isolated out, and the agility initiative must be kept well focused, because agility is not an option: it must be built into the enterprise’s stream of business flow. Here are seven simple steps that can be followed to attain the required agility and tie managerial artifacts to the business processes.

1. Segregate business processes. Today, though segregation exists, code-level isolation does not. Having resources that do the task is not by itself sufficient. Integrating new business processes can become an overhead, and both parties involved will spend more time solving simple problems. This is the new issue residing within enterprises, and it calls for agility experts on the job.

2. Define enterprise efforts specifically around agility. Although such efforts exist, the initiatives are most often directed by the enterprise architecture team or by some business owners. In either case, nobody first looks at the existing methodologies already followed by the owners of the LOBs or of IT. This causes more confusion; the whole idea of risk mitigation gets the emphasis and agility is lost from sight.

3. Agility is not a linear process. Understand this, and tune your initiatives to use Agile methods for improving existing processes, tuning them to meet demand and thereby gaining management control over IT components.

4. Workloads must be iterative. Simple as this sounds, one must attain it with less repetitive, productive, move-forward workloads, and risk mitigation must be incorporated along the way.

5. Think outside the OOAD box. Enterprises have tons of IT components working together to serve business needs, and this has to be appreciated. Bringing in a new methodology such as Scrum to deliver agility can at times be a confined line of thinking, because enterprises do not run on OOAD alone: there is also Extreme Programming, RUP and, above all, financially focused methodologies, all of which must be appreciated and incorporated into the agility initiative.

6. Bringing agility does not mean meeting all the time. Again, it is not a linear process; expect to act in a non-linear way, and inculcate this habit among stakeholders.

7. Make use of tools. Tools for the living.


Unified Collaboration & Towards What?

With the advent of smartphones, and now smart devices that can make calls, connect to the internet and then suggest how fast you will reach your destination, the need for people to interact has also jumped towards attaining “close contact”. Pretty soon you could have a 3D image of your colleague or friend standing in front of you. People will exploit this functionality; it already exists. In such a scenario, what if a hacker appears in front of you while you are surfing and says, “I see your account number”, Huha!, pretending to be somebody like Al Pacino? While you may enjoy the Al Pacino imitation, you may lose your pants over the account details the hacker has seen.

Behind all these anecdotes, what is really happening is that unified collaboration is slowly moving towards smartphones. Today it originates mainly from a web conferencing toolkit, or simply from an online event being called. Shortly, people will use virtual environments to crunch huge amounts of data, or to carry out voluminous computing tasks, and smart devices will come in. Things will demand live sound and images, and a need may evolve to collaborate in a group where each member handles a specific task yet all must interact together. While each individual may spend less time than today, the need to interact will remain in the near future.

So… Unified collaboration will evolve from smart devices.

That was a quick jump, wasn’t it?


Suicidal Birds

In a village in India called Jatinga, in the state of Assam, at the end of the monsoon months, mysterious bird behavior puzzles bird watchers and enthusiasts. On moonless, foggy, dark nights, flying birds come crashing to the ground with no prior warning whatsoever. The local tribes first took this natural phenomenon to be spirits flying down from the sky to terrorize them. The phenomenon is not confined to a single species: Tiger Bittern, Black Bittern, Little Egret, Pond Heron, Indian Pitta and kingfishers are all affected.

Natural irregularities have been speculated as the reason for this abnormal behavior. The time is not far off when we will know the minds of the birds, or get kicked by a bird for bad behavior 🙂 Skydivers, beware!
