Crawling towards #BigData analytics: the 10 things to know.

What to do when you have a #bigdata project?

  1. Analytics and analysis have been around for many years. Enterprises have long been sucking in data from large monolithic systems called data warehouses. Once retrieved, the data went through phases of analysis for different purposes, including sales forecasting, weather prediction, medical analytics, and so forth, and the term "analytics" itself has gone through a paradigm shift: it is now viewed mostly from a predictive-analytics standpoint. Shifts take time. So predictive analytics did not arrive with big data; rather, with big data a new kind of analytics is evolving.
  2. A proliferation of big data tools and services has begun. Data warehouse as a service was already being talked about, but with data security still not embracing "openness," for obvious reasons, we have not yet seen success except from giants such as Amazon, Google, EMC, Rackspace, etc. When talking about big data, there is a secondary term that needs to be considered along with it, and that is existing analytical data. Please review point 1 on predictive analytics.
  3. Various tools since the inception of Hadoop provide some really cool value-adds. Pig, Hive, Google's NoSQL offerings, Cassandra, MongoDB, Sqoop, Gora, HBase, and Avro, combined with machine-learning systems, provide the pass-throughs to connect, search, filter, retrieve, and manipulate data for further analytics. Also check out the big data offerings from big vendors such as Oracle.
  4. Some have tried to explain the differences from business intelligence and to replace it with big data. While the attempt is understandable, the two are complementary. Many fail to realize that a huge amount of analytics work has already been done, so big data work need not be heavy lifting from scratch.
  5. While choosing tools, the existing infrastructure must be studied. The study must focus on the reports obtained from business intelligence; these can provide insights into data patterns.
  6. Reports are on existing data, but reports provide insights. Patterns at certain intervals or peak periods of data can be detected by running complex equation-based queries. Use these perceptions for trials; the perceptions here can turn out to be reality.
  7. The first task today, given the variety of tools within the enterprise and in the open-source world, is choosing the tools. While that is the less stressful part, once the tools are chosen, deployment and infrastructure setup become the first major challenge, because getting connected to the valuable data, and this time in massive volume, becomes genuinely challenging. This is where data warehouse as a service comes in, and where enterprises will lean more toward the giants. Please see point 2.
  8. A crucial part of big data analytics involves creating test data during development. When creating test data, try NOT to replicate data points, and avoid depleting them. Otherwise you can create blocks in moving ahead, especially when using PFP/Mahout for further analysis of the data. You never know what will help.
  9. Furthermore, the test system must be thought of as going beyond a single machine. Parallel processing most often involves multiple nodes, especially when processing terabytes of data. In either case, when uploading from data repositories, multiple systems get involved anyway, and that is what makes it big data.
  10. Before choosing a tool, understand it with respect to big data. There are many tools out there that carry the label but actually do something else. With the big data buzzword in action, many vendors are incorporating big data capabilities, which is good, but those tools might not be the ones you are looking for. Once a tool is chosen, do a proof of concept. Have your engineers rub shoulders with big data people. It is always good to collaborate. This is not a one-person meal, right?
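Point 8 above can be made concrete with a tiny sketch. The record layout below (a user id plus an amount) is made up purely for illustration; the only point is the guard against replicated data points:

```python
import random

def make_test_records(n, seed=42):
    """Generate n synthetic (user_id, amount) records with no duplicate
    data points, so downstream analysis (e.g. PFP/Mahout runs) isn't skewed."""
    rng = random.Random(seed)          # seeded, so test runs are repeatable
    seen = set()
    records = []
    while len(records) < n:
        point = (rng.randrange(1_000_000), round(rng.uniform(1, 500), 2))
        if point in seen:              # skip replicated points (see point 8)
            continue
        seen.add(point)
        records.append(point)
    return records

data = make_test_records(1000)
assert len(set(data)) == len(data)     # every data point is unique
```

The same idea scales up: seed per node, and keep a uniqueness check in whatever generator feeds your test cluster.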

    Thoughts beyond usual thinking.
    Many are the things that man seeing must understand. Not seeing, how shall he know what lies in the hand of time to come? – Sophocles

Posted in Big Data BigData Cloud | 11 Comments

7 steps for taking the #startup wagon up the hill of success

Slowly, the funding scene has become better. Seeking funding from a VC in today's market is no longer impossible. But getting funded requires a more rigid business model and a great product in today's crazy market. Not only that: for #startups to be successful, the team should work together to pull the wagon up the hill. These are easy steps, but they need to be executed with a strong will. Given the #disruptive nature of the technology atmosphere, once funding has been achieved, things only begin there. It is a pleasant beginning, but there is a little more to it. What is seen today is that #startups get too tied up in product building and, for that matter, think staff building should be given the highest priority. This takes them to situations where they spend time and effort scrutinizing candidates to the extent that either the candidates lose interest in the company or the #startups feel the candidates are not qualified enough. Here are simple steps to follow.

1. Have a big picture and make small modules adapt to suit the big picture.
CEOs, CXOs, and leaders should be bendable. Meaning, they should be able to adapt their strategies and business model to fuse with changing market conditions. For instance, you may have decided to build a machine that floats when needed. But when a product like a flying car has entered the market, it is imperative to change the product definition to beat the flying car.

2. Think of disrupting.
If the funding has been for a well-defined product where the strategy was not to #disrupt, this may not be possible. But it is still a good idea to see if there is something that can be broken... #disrupted. If so, why hesitate? Do it.

3. Invest on human resources. Don’t be picky.
To build a good product, you need good coders. Remember, unless you are building a new kind of rocket that can disappear by saying "Holo," you don't want to spend hours talking with a candidate. Let the onsite interview be short and sweet. C'mon, candidates are coming to your site because you have already talked to them, and now they are confident they can do the work. Unless somebody is a real idiot, they won't come to your site to spend three or four hours talking with your staff members without knowing their stuff. And if it is an idiot, you would have already known it from the phone conversation. Wouldn't you?

4. Embrace opensource.
There will be great support if you involve open source. When we say involve open source, it means bringing in not only tools and services that support open source, but also getting the community involved. This goes a long way.

5. Tie up left and right cautiously
Join hands and collaborate with business and technology partners carefully. Aggressive marketing strategies may sometimes take the product from one level to another where, by definition, it may all look good. But building a product that satisfies the needs of those marketing strategies may sometimes look like a fool's paradise. Avoid it. Say to the marketing people: let us keep it simple.

6. Sow seed and reap. Invest on experienced hands a little.
Let experienced consultants rub shoulders with your hired staff. Work with recruiters. Hire professional services companies to do some of the work for you. This not only gets the work done faster, but also shares their industry-gained knowledge with your staff. It increases professionalism among the newly hired, out-of-college members whom #startups these days heavily rely on. Spend time with the recruitment companies and professional service providers, educating them on what you want. Measure VALUE by looking at return on investment from a human resources perspective.

7. Be at ease.
Show ease in your proceedings with staff members. Projecting low available funds, or saying things like "we need to move our butts to succeed" or "we are not getting the right people," are simply negative statements. Do you agree?

Prosper and stay well.

Posted in Big Data BigData Cloud | Leave a comment

Nothing Much To Look At Through The Google Glass (For Now)

Google Glass, like any other act of a giant, gets praised and glorified, tweeted about with sore throats and posted on walls by many. It gets talked about in various blogs, publications, news channels, and what not. Everyone glorified the looks and talked about the value of Google Glass. It clearly looked like marketing. Some talked a little about awkward things, like walking into a public restroom or a cinema theater wearing one of the GGs, but none talked about its lack of value.
1. The cost is $1,500. No worry; it will be down to a few hundred pretty soon and stay at a cost of maybe $200 to $300 for a couple of years. After all, what does it have? A 5 MP camera, a Wi-Fi connectivity card (yes, it is small), and a projection display, which may cost money too. Yet what value does all this, integrated, bring altogether?
2. Of the many demos, the one demonstrated the most was capturing images and displaying them to a second person, making it feel like you are seeing through the eyes of the person wearing the GG. Some put it this way: you can find your way using GPS. This is a good nano-technology look and feel, yet how does it help? I can get the same with a good phablet, and it doesn't obstruct my eyes.
3. Some others said that if you are in an airport, you could ask the voice-activated GG to show you flight timings. Others demonstrated the same old demo Google showed in the first place: voice-activated searches for cats and dogs, or recording a video and searching for a video. Its text-to-speech with a built-in speaker will give you the information you asked for after you kick-start the prompt by saying "OK Glass."
4. What you did so far is exactly what one would do with a fairly good-display, Wi-Fi-enabled mobile device.
Let us say the cost has come down to $100. Would you buy it, and if so, for what?
5. Some were asking whether, if they wear prescription glasses, they could still use Google Glass on top of them. Some guy tried it, and it looked like the GG is designed to fit people with prescription glasses. Good, good.
6. This doesn't look anything like the transformation from PC to laptop, or laptop to handheld device. There is no "true disruption" that I could see.
Although the current release is being projected as an "exploratory" release, a few years from now, given the premise of its functionality, there isn't much to be anticipated. Other than the speed at which the GG brings back search results, more resolution on the camera, better battery life, and maybe some design changes, there isn't much to be seen "looking through the blue glass." There is a big news media publication working on delivering news headlines to Google Glass. Who would want that? Not sure. APIs for developers have started getting crowded, though, and so have the downloads of the mobile applications. Not sure what the people downloading the app are doing with it, because the number of downloads exceeds the number of glasses released by almost twenty or thirty times.

Irrespective of all this, I definitely see one good, really good application for Mr. GG.

“To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science.”
— Albert Einstein

Posted in Big Data BigData Cloud | Leave a comment

Big Data Visualization

Fun stuff. As visualization takes 'shape', more data is thrown at the wall and interpreted.
More applications are erupting from within the Big Data community, and irrespective of all the developments, the masses have simply been sitting and clicking the hell out of their machines. The containers have ruptured, and there are no more long legs to run for the evening. Just remembered Salvador Dalí.

Hadoop has been exhausted, and new algorithms are being talked about. Drill is gaining ground. For that matter, Drill's run at the problem of the ever-bloating mass of data has been on faster legs. Let us wait and see.

Research on Big Data using mobile devices is getting a human face. First read the Rick Smolan and Jennifer Erwitt story.

OK. Have fun here: the preview of The Big Data Visuals.

Remember the two key strokes. Pass it on. Give me a few if you can. It won't be heavy to take in. (See the preview to know what the two key strokes are.)

If I have seen further it is by standing on the shoulders of giants. – Isaac Newton

Thank you.


Posted in Big Data BigData Cloud | 1 Comment

Statistical analysis tools may not be where the solution is. Good lord, how long will these rituals continue?

As more doors open up in the financing world for 'Big Data' tools, the existing tools provided by SAS, R programming, or tools that spit tabular data into Excel spreadsheets provide just statistical data. Not only that: these existing tools are not designed to accept the humongous data sets waiting to be consumed, arriving over high-speed pipes in the form of requests. Disruption has taken place not by virtue of innovators and intelligent systems, but by the great demand that Big Data has put forth. Unable to handle the weight, tapes are still traditionally utilized, carried out like ritual acts. Yes, companies would like to forget about them; however, compliance and security requirements won't allow it. With more globalization occurring, confusion among data carriers, and the ever-increasing challenges in the telecommunications field, everything demanded disruption!

OracleOpenWorld, which begins in San Francisco on Sept. 29th, has attracted about 50,000 attendees from 123 countries, with about 1 million online attendees expected. Here is a little "big" data about the past event:

      There were 3,570 speakers

      40,942 Seats for sessions

      142,000 cups of coffee

      95,000 Sodas

      63,000 Lunches

      42,000 Gallons of Water

This year's sessions include "Big Data," a few of them alongside MySQL-related sessions, Oracle Exalytics, etc., which can shed light on the issues of big data. Any evolution in a field, of science or of religion, travels either the path to happiness or the path to evil. With the Big Data hype growing day by day, there are "much ado about nothing" companies floating around too. It is anticipated that there will be discussions on big data, on the preparedness required to perceive the scams, and on debunking the myths. One interesting item on the session list is the fusion of Oracle TimesTen with Exalytics. Oracle TimesTen is supposed to be an in-memory cache, or a full-featured, memory-optimized relational database.

Traditionally, predictive analytics provides much-needed customer information puffed out of data warehouse systems. Usually that is what current tools retrieve, beyond helping with, for instance, upselling items, peak times of purchase, or money matters. At times, patterns can be detected in the stored data. When big data is involved, it becomes a question of anaconda versus viper. Viper fangs are venomous; it can spit poison. Anacondas are nonvenomous. Well, put it this way: when it comes to big data, it is a question of Godzilla against a monkey. Not the code monkeys.

As per the readings, today's tools do not have the bandwidth to deal with Big Data. With the existence of Hadoop, its acceptance within the development community, and the NoSQL discussions, more tools are wanted by the big data "dealers." Keep in mind, there are many players who have not dealt much with Big Data, or with data at all. Remember, data is secured, so how can most of them deal with it? Having designed a few databases might not give you the insight to deal with the Big Data problem. To understand Big Data, you either need the core mind of a researcher, or you need the blessing of actually seeing and perhaps touching it. Only those two kinds of people can see, visualize, or perceive the problem of big data.

Pardon the architects, for they do not know what they are doing – Bible, misquoted.

Welcome to the new age of code monkeys. Leave a comment and I will give you a chocolate. Hey, I would like to hear what you have on big data, OK?

Nobody knew what would ever happen to the data slowly growing within enterprise walls.
Listen to these statements from the past….

      Man will not fly for fifty years – Orville Wright (1901)

      A Rocket will never leave the earth’s atmosphere – New York Times (1936)

      There is a world market for maybe five computers – IBM’s Thomas Watson (1943)

      640K may be enough for anybody – Bill Gates

Dear weatherman, once you told us it may or may not rain. Today, you say there is a possibility of rain. I have a story; I dreamt it and will narrate it. Twinkle, twinkle, little stars, I still wonder how big you are…

Posted in Big Data BigData Cloud | Tagged , , , | 43 Comments

BigQuery – Google comes out from "INSIDE"; who else, and what more, will we really see through?

Google BigQuery's public appearance makes it even more complicated to decide where to go for Big Data analytics. How big is really big? Can one hundred thousand records with mega-size data such as images, or even one million records, be called big? As they asked: how big can a table be? How many fields? And so on.
When Oracle first shipped Version 2.0 as its first database release (there was no version 1), it was primarily to store unstructured data in a meaningful form. The CIA's interest in software that could collect global information made it compelling enough to release Oracle and make the software more rigid. It seems it was not clear to anyone then that there would be a problem of millions and millions of records needing to be analyzed to perhaps make sense out of them. Who could have thought about the amount of data that would accumulate over the course of time, for weather-related information, for instance? Similarly for medical data, product purchases, credit card usage, traveling, etc. First it was the storage problem: how much can we store? Then things changed. More storage devices evolved, and they evolved cheap. Now it is a much bigger problem: how can we make sense of what has been stored?
Demystifying the noise in data is a good thought. But at this point the concept itself has created much "noise" in the equation. (Haha, a small laugh.) The truth is out there!
Simply put, BigQuery is an analysis tool for Big Data. A huge amount of content resides within Google's walls: 60 hours of video uploaded per minute, 100 million gigabytes in the search index, and, importantly, 425 million Gmail accounts producing and injecting a lot of data into Google's thousands of servers across the world. This repository can be tapped for result-oriented queries in the discovery process. This is BigQuery.
While discovery is important, the time taken to retrieve results and help the discovery process is also very important. This is different from the batch-oriented queries run by Hadoop. When it is said to be different, that doesn't mean "bad." It is simply different.
BigQuery seems to have evolved from an internal system called Dremel. Dremel has now been externalized to run queries on big data sets, and is called BigQuery.
What is interesting is that Dremel, the internal BigQuery of Google, talks about what they call a full table scan. This is supposedly done to span the search across hundreds and hundreds of tables residing on servers everywhere. According to what BigQuery experts generally say, this search is far better than running a search on an indexed RDBMS. Hmmm!
An example of such a query would be: what are the different applications running on Google servers for which a type is set to "Something", or something similar to the above question.
Google would have used this tool extensively across its own servers. With the huge content within its walls, the usage, once externalized, will be interesting. The question still remains: why wouldn't an RDBMS suffice? Don't indexing, statistics updates, and optimization techniques on databases help with fast retrievals? BigQuery data sets have a limitation of 64KB for total field lengths per query.
BigQuery, internally Dremel, uses SQL-like statements to pull data for analysis, and it IS FASTER. But then, this is due to the underlying format of the data that needs to be withdrawn for analytical purposes.

So what is BigQuery? BigQ is all about Big Data available within Google's Cloud Storage, which is loaded into BigQuery and queried for results.
With the API, one can embed it within applications to let users fire queries, with SQL-like statements run against a process that does a full table scan, as opposed to indexed databases. Supported scripts/languages include Java, Python, .NET, JavaScript, and also Ruby. BigQ comes with a web-based UI, and pricing, as understood, will be charged on a "per query" basis. Some examples of running a BigQ query would be looking at the number of page views of Wikipedia in a month, and on what subjects, or taking a look at all the known works of Shakespeare. Interestingly, the total page views for a month on Wikipedia come to about 6 terabytes of data uncompressed, and BigQ runs against that, as heard from the horse's mouth.

A small "BigQ" I ran from the examples was to find the biggest work of Shakespeare, biggest in terms of the maximum number of words. The query gave the result "Hamlet," with 32,446 words, "topping" the 42 published works of Shakespeare.
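For the curious, the aggregation behind that kind of query can be sketched in plain Python. The column layout below mirrors the public Shakespeare sample's (word, corpus, word_count) shape, but the rows and counts are made-up toy values, not the real dataset:

```python
from collections import defaultdict

# A tiny stand-in for BigQuery's public Shakespeare sample table:
# rows of (word, corpus, word_count). Values are illustrative only.
rows = [
    ("the", "hamlet",   993),
    ("to",  "hamlet",   745),
    ("the", "kinglear", 786),
    ("and", "kinglear", 737),
    ("of",  "macbeth",  426),
]

# Roughly the SQL-like query:
#   SELECT corpus, SUM(word_count) AS total
#   FROM shakespeare GROUP BY corpus ORDER BY total DESC LIMIT 1
totals = defaultdict(int)
for word, corpus, count in rows:
    totals[corpus] += count          # GROUP BY corpus, SUM(word_count)

biggest = max(totals.items(), key=lambda kv: kv[1])   # ORDER BY ... LIMIT 1
print(biggest)  # → ('hamlet', 1738) for this toy sample
```

BigQuery, of course, runs the same GROUP BY/SUM over the full table scan described above rather than over an in-memory list.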

What more should I see..

Many are the things that man seeing must understand. Not seeing, how shall he know what lies in the hand of time to come? – Sophocles

Posted in Big Data BigData Cloud | Leave a comment

Oracle adopts R programming for Statistical Analysis

Oracle commits itself to Big Data once more and deepens its presence in the "large volume" space. Everything pertaining to large volumes of data residing disparately is an issue. Relational databases have failed here and given rise to NoSQL. Slowly, the battle between SQL and NoSQL is rising. The fight is worth watching.

While the RDBMS has been called traditional by NoSQL enthusiasts, something that does not fit well within the Big Data paradigm, the RDBMS or SQL people have laid down numbers and said that, without complex joins and constraints, data cannot be made very meaningful. And even with all those constraints, large volumes have been stored in RDBMSs, so goes the argument.

Be that as it may, monstrous Big Data is a winner, as seen when SPLUNK forecast a cent of loss and then, in their earnings report, actually posted a gain.

Within these discussions, Oracle has taken another step in providing tools to solve the Big Data problem. It is like saying: I have various kinds of wrenches; choose one, although they all provide the same functionality. R, an open-source implementation of the S language created by John Chambers, is a command-line programming environment for working with statistical data. You use equations to derive statistical outcomes. It is supposed to work in sync with the Oracle TimesTen in-memory database and other databases, including Oracle 11g. Look out for the "R" talk at the event.
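To give a flavor of the equation-driven statistical work R is used for, here is the same idea hand-rolled in plain Python: an ordinary least-squares fit, which in R would be roughly `lm(y ~ x)`. This is a sketch of the math only, not of any Oracle R integration:

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares:
    b = cov(x, y) / var(x),  a = mean(y) - b * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Perfectly linear toy data: y = 2x + 1
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # → 1.0 2.0
```

The appeal of R, and of tools wrapping it, is that fits like this, plus diagnostics, come built in instead of being coded by hand.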

The link to download is here

One more step for Oracle, another step for open source.

Posted in Big Data BigData Cloud | Leave a comment

NASA plays a song on Mars... tuned to "Reach for the Stars"

The rapper's tune "Reach for the Stars" played on Mars. The amazing capability of the human brain to create and enjoy music was touched on in the movie A.I. by Spielberg, when the men of the future discover the AI lying deep in the ocean. Novel thinking.
The song "Reach for the Stars" is overtoned, or treated with "gain" special effects. Sound has been shown to vary with the presence of a medium, or for that matter, AIR. Experiments have demonstrated the distortion of sound in proportion to the medium present, such as air. Thus, on Mars, sound can vary and may sound totally different than on Earth. Raindrops may sound different, the blow of the wind would perhaps sound monstrous, and a human voice may squeak or sound monstrous. In general, music with a rhythmic flow of sound patterns may distort and sound as though overtones were added to the composition. The interval of time the flow can sustain is yet to be heard.
The song, as you may hear, is treated with "gain" effects: some approximation of how it might be heard on Mars.
The song was played as though it were being played from the NASA lab/station? Hope to hear how it would actually sound on Mars. Given that data from Mars travels a huge distance and takes hours to download, not sure this can be expected. Voice is "big data," and reducing the noise across all the hops; who knows?
Here is the video link to NASA listening to the music being played on Mars.

Posted in Big Data BigData Cloud | Leave a comment

Big Data & The Big Players

Is it David & Goliath or Goliath and David?

Big data has been gaining popularity in the tag world and in the world of synonyms for some time now. More innovation has always been seen within small organizations. With Hadoop being accepted among the developer community, despite a considerable learning curve, it is important that everyone involved in "drilling" down on any kind of voluminous data give importance to Hadoop. Now Sqoop's entry has helped enterprises think more about their disparate databases. In databases where one field embeds much other information, such as the "outcome" of an action, retrieving that information, sunk down in the deep caves of enterprise repositories, is becoming less and less of an issue. More thought, and joint thinking, has to go into what to do with the data; simply put, one has to start thinking about "how to make sense of this data."

Large corporations such as Oracle, Google, IBM and others have begun bringing out tools for the "living dead": Big Data.

Google's "BigQuery" seems interesting. Its availability to the public is even more interesting.

Amazon Elastic MapReduce is another effective way to extract information from massive volumes of data. How well it can scale is yet to be seen.
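The MapReduce model behind EMR and Hadoop can be sketched in a few lines of plain Python with the canonical word-count example. A real job would distribute the map and reduce phases across nodes, but the logic is the same:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(w.lower(), 1) for w in doc.split()]

def reduce_phase(pairs):
    # Reduce: sum the counts per word (the framework's "shuffle"
    # step, which groups pairs by key, is implicit in this dict).
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big plans", "big players"]
result = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(result["big"])  # → 3
```

On a cluster, each node runs `map_phase` over its own shard of documents, and the framework routes all pairs sharing a key to the same reducer.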

Similarly, IBM provides its big data appliance to retailers with Netezza: the IBM Netezza Customer Intelligence Appliance. The new Netezza appliance also seems to incorporate business-intelligence software. This could be novel thinking: retrieving the data and making sense of it.

Within all this, Oracle has brought out a set of services in Oracle's usual standardized way of rolling things out: an Out-Of-The-Box solution. A set of tools and services has been rolled out. How far it can be taken into the field depends mainly on the people involved. But here is a look at all their tools and services, with a brief introduction, all in one page.

Oracle looks at the Big Data problem in three main areas, as it should. Within the Big Data problem, you inspect and retrieve, analyze, and then present. So does Oracle:

The Acquire part, Oracle says, will acquire from the Oracle NoSQL Database and Oracle Database 11g.

The Organize part consists of the Oracle Big Data Appliance, the connectors, and the Oracle Data Integrator.

The Analyze components include Oracle Advanced Analytics, Oracle data warehousing, Exadata, and the Exalytics In-Memory Machine? I mean, "Machine"? Interesting anyway.

The Appliance: seems to be integrated with Cloudera, Hadoop, and a JVM in a Linux composite.

The Data Integrator: SOA modules and ETL

The Connectors: include the HDFS connectors and Hadoop loaders, including a data integrator application adapter for Hadoop (said to reduce MapReduce development effort), and OLTP.

Below are other common Oracle components that are now extended for Big Data.

To think independently: this is not about drilling down into some data and replicating the data mining done ages ago. This is drilling all right, but drilling for the future.

Do you want to chat with me and become a mad thinker? Talk to me @ enterprise.architects AT I am online, yup.

Posted in Big Data BigData Cloud | Leave a comment

Big data analytics using clustering.

Big data, or bigdata, can fall into various dimensions. k-means algorithms can provide results that confirm what was previously observed. They can also surface certain patterns in available data that has accumulated across disparate units. However, to achieve actual analysis of large volumes of data and extract valuable information, a preset of data units charted on a graph and watched for movement over, for instance, TIME, cannot provide the desired results or "validation." Sometimes "external" and "internal" evaluations can be looked at as a dialectical approach to "problem solving." However, that may not fit within the realms of huge volumes of data sets: big data; bigdata. Furthermore, the clustering quality may degrade. Above all, the distance between centroids can cause issues in the analysis. This distance problem has already been foreseen, and new sets of algorithms have started emerging. The BIRCH approach could result in new algorithms being put to the test.
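To ground the k-means discussion, here is a minimal single-machine sketch of Lloyd's iteration on toy one-dimensional data. The centroid-distance issues mentioned above live in exactly these assignment and update steps; at big data scale, both would be distributed (e.g. as MapReduce jobs):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 1-D points: assign each point to its
    nearest centroid, then move each centroid to its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    for _ in range(iters):
        # Assignment step: each point goes to the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to its cluster's mean
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two well-separated blobs, around 0 and around 100:
pts = [0.0, 1.0, 2.0, 99.0, 100.0, 101.0]
print(kmeans(pts, 2))  # → [1.0, 100.0]
```

BIRCH-style approaches avoid repeatedly scanning all points like this by summarizing them into clustering features first, which is exactly why they are attractive once the data stops fitting on one machine.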

Approach II [In the process of writing]

Posted in Big Data BigData Cloud | Leave a comment