The Wrong Turn of BigData: Ten Reasons Why BigData Will Take a Big Leap!

Enterprises are taking every step to tap into the realms of bigdata. How much of it is really big data has already been debated. There is valuable, futuristic information hidden within big data clusters, no doubt (old jungle saying). If so, then why is it that this valuable information cannot be retrieved? Speed is not a problem anymore. Even if speed were a problem, people might be ready to wait a while to know "The Future". Marketing cut-throats, greedy business owners and others who have nothing to lose and everything to gain have shifted the needs to something else and the focus to elsewhere in space. Technological erosion is due to this fact: personal needs and vested interests lead people into doing deeds and buying things they don't need in the first place. A man buys material he never wanted and will never need in his lifetime, all through the influence of the powerful media. Nevertheless, the big data implementations that currently exist are nothing but retrieval of traditional reports. Projects ripple across CIOs but finally find their resting place within IT and become IT projects. Lost in translation, the CIO is now provided with traditional reports. Here are the top ten reasons BigData will take a second jump soon.

1. Requirement for experienced talent pops up. For some time, companies have been focusing on recruiting fresh talent. While this is a great "venture", it can also cause what might be called a "beginner's fall". Not knowing the complexity of the applications, these fresh recruits can go ahead and create solutions that do not solve the actual problem. But this trend seems to be slowly diminishing, especially within the #bigdata world. People seem to be looking for experienced hands. Conferences are filled with long-bearded masters of the domain. Requirements are flooded with minimum-eight-years-of-experience statements.

2. In the footsteps of a giant: So far, traditional reports were being provided to the office of the CIO, to Marketing, and to business and other stakeholders. With more experienced hands working on #bigdata, the analytics look and feel fresh. There is still time to mature, but things appear to be on the right path. The giants have already reaped the fruit of #bigdata, because their investment was heavy and they saw the rise of big data years ahead of many. While they themselves are still learning, many lessons have come out.

3. Distributed processing is better understood. Infrastructure tools are plentiful, but the task of processing huge data, even with MapReduce, was not understood in reality. Distributed systems, and coding within that paradigm, have always been a challenge. With more experienced technical hands, this challenge is being met. People have understood the potential of MapReduce and other distributed processing systems. Multi-core programming concepts and capabilities are also becoming helpful.
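For readers new to the paradigm mentioned above, here is a minimal single-machine sketch of the MapReduce shape (map, shuffle, reduce) in plain Python. This is only an illustration of the programming model, not a Hadoop job:

```python
from collections import defaultdict

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

documents = ["big data is big", "data is valuable"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

The hard part in real distributed systems is that the shuffle and reduce run across machines; the mental model, though, is just this.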

4. Service enablement and API provisions. Data has always resided within several subsystems in an enterprise. With more integration between business processes, more robust API frameworks available, and distributed processing capabilities, data retrieval, processing and analytics have become more seamless than ever before.
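As an illustration of what such service enablement makes possible, here is a small sketch that joins records pulled from two subsystems. The payload shapes and field names are invented for the example; real subsystem APIs would of course differ:

```python
import json

# Hypothetical JSON payloads, as two subsystem APIs might return them.
crm_payload = json.dumps([{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}])
orders_payload = json.dumps([{"customer_id": 1, "total": 120.0},
                             {"customer_id": 1, "total": 80.0},
                             {"customer_id": 2, "total": 45.5}])

def merge_by_customer(crm_json, orders_json):
    """Join CRM records with order totals: the kind of cross-subsystem
    retrieval a service API layer makes seamless."""
    customers = {c["id"]: c["name"] for c in json.loads(crm_json)}
    totals = {}
    for order in json.loads(orders_json):
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0) + order["total"]
    return {customers[cid]: total for cid, total in totals.items()}

print(merge_by_customer(crm_payload, orders_payload))  # {'Acme': 200.0, 'Globex': 45.5}
```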

5. Technological challenges: A fair number of tools have already been made available to developers. Although infrastructure tools dominate, applications that focus on domain-specific data are being released from time to time. This helps organizations achieve #bigdataanalytics. There is a lot of confusion here, though. With the big data hype in the market, all kinds of companies are embedding the keyword "big data", and the internet is flooded with documents that have no relevance to big data at all. RSS syndication may be of some help with specialized searches, yet the challenge of accurate information retrieval remains.

6. Data scientists have entered the market with data mining knowledge, and not much focus on big data. Although we see challenges in the usage of big data tools by these data scientists, we still have many algorithms and complex data science concepts evolving that may provide better analytics. There is always a tendency to go back to traditional reports as opposed to analytics; that trend needs to be overcome.

7. Data accessibility: For achieving accurate analytics, technology alone, complex algorithms, mathematical models and visualization capabilities will not suffice. The most important factor is that the DATA must be within hand's reach. Extensive compliance needs, disparately residing data and, most importantly, unstructured and semi-structured data within enterprises make it hard for enterprises to give the data out for processing, even internally.

8. BigData as a service: Bigdata means analytics, not reports. This will be a huge challenge, given point 7 above. Still, getting the job done through a service-based approach is a huge benefit, because it saves the tremendous research, design and development time otherwise put forth by enterprises; they can simply rely on third-party vendors who may have done this job anyway. With point 7 in perspective, though, this handover or usage of such software by enterprises will be an issue. But because of selling pressures among big data vendors, these SaaS models will turn into outright traditional licensing models very quickly. This change in licensing model may perhaps help take big data analytics to the next level.

9. Hidden data: "The secret of business is to know something that nobody else knows" ~ Aristotle Onassis. Given that the data resides in unstructured form and arrives within the enterprise through sensory devices, getting at what is not yet known to reside there requires expertise. Data scientists should be relied upon. Mathematical models and probability-based thought models may only be a beginning point. The abstraction of probability has to be broken open to derive patterns or predictability. And predictability may not be the only derivation people should go after.

10. Breaking open the organizational challenge: Bigdata projects always spin through different business units. Most often they arise from the office of the CIO; many evolve from the demands of marketing and sales. Be that as it may, today it all ends up within the engineering/IT division. This flow needs to stop. BigData should not be an IT project. If it becomes an IT project, what we will get back will be nothing but reports. Tools must be designed and developed by IT, but the final attainment of results must be done by the stakeholder. Moreover, big data should be an enterprise-wide project, not segregated within certain business units. Infrastructure for dealing with big data should therefore be supported and mentored by engineering/IT ONLY, while the owners must be the actual INDEPENDENT users. The job must be delegated to and executed by the stakeholder, rather than looking at big data as an IT project.

Posted in Big Data BigData Cloud | Leave a comment

The Abstractionism of probability theory.

This discussion on "The Abstractionism of Probability" is perhaps one of the first in the world to be held publicly. It should be understood that this discussion has evolved out of various other discussions with mathematicians, philosophers, doctors, engineers and many other participants, including rappers, mainstream musicians, artists, actors and actresses. Filmmakers were never touched: because this is the subject matter for a documentary, to keep its serenity and purity, no filmmakers of any kind were interviewed. The film is in the making. Please read on.

From time immemorial, probability theory has been in existence. Having its origin close to mathematics, and often being discussed alongside mathematics, the subject has been of great importance to the financial industry. As many giant thinkers have stated: with time, concepts change; with time, philosophies change; with time, "crap" changes into meaningful terms. Consider the "insanity" of Galileo Galilei, who claimed in the 17th century that the earth revolves around the sun, and was imprisoned for it. Legend has it that, looking at a solar eclipse through the bars of his prison, he lost his eyesight. His "insane" statement eventually turned into one of the greatest facts the world will ever live with. Many mathematicians, physicists, astronomers and what not have all juggled with probability theories. Now, with #bigdata in perspective, probability has taken on a whole different meaning. But the times, they are a-changin'.

In faith there is enough light for those who want to believe and enough shadows to blind those who don't. ~ Blaise Pascal

The genius Pascal himself, who provided scientific interpretations in the early days of probability, is quoted above. Nothing is a constant. Yeah! I know about change, and that is not the subject. What is abstract is this: probability talks about the percentage of occurrence of an event. I don't need to define it; you know what this means... or please check Wikipedia. So what the above really means is that there is a greater likelihood of rain today.

What this means today, in a world of predictability, is that a tossed coin can come up either heads or tails. Nothing more than that.

This mind game has driven mathematicians and philosophers alike towards its fanciness and its nature of attracting inquisitive minds. Being so close to numbers and mathematical formulas, it has drawn ever more people towards it.

A detailed look at history and its origin reveals that many mathematicians, philosophers and physicists have even formed equations in an attempt to drive closer to the predictability of futuristic happenings, as I derive it.

Probability theories have always been related to futuristic happenings or occurrences. Let us look at the most common example of probability: the tossed coin. It says the probability of the coin falling on heads (or on tails) is 50%. In today's modern cognitive thought process this has little meaning. Given the tremendous possibilities of the linear and non-linear nature of the flow of occurrences, or simply happenings, probability theories take a step down and yield only a negligible amount of wisdom for analytics; analytics of any nature. However, getting rid of probability theories would mean "another caveman born yesterday". For this purpose, for the purpose of mankind, for the sake of sanity, and above all for the sake of the history and time that analytical minds have come through and the thought process man has evolved, let us give it an appropriate place in the universe and call it... a step prior to actual predictability of futuristic occurrence.
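The 50% figure in the coin example above is easy to see empirically. A quick Monte Carlo sketch in Python (a toy illustration, not a claim about any particular analytics tool):

```python
import random

def empirical_heads_probability(tosses, seed=42):
    """Simulate fair coin tosses and return the observed fraction of heads."""
    rng = random.Random(seed)          # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

# The observed frequency drifts towards the abstract 0.5 as tosses grow.
for n in (10, 1000, 100000):
    print(n, round(empirical_heads_probability(n), 3))
```

Which is exactly the abstraction the essay is pointing at: the number says something about the long run, and nothing at all about the next single toss.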

Thank you.


Low-Cost BigData Implementation: The Money That Lies Beneath.

We recently wrapped up a bigdata POC implementation project for a major retail organization, and were then tasked with reviewing security and compliance issues for the same client. To the question "where is the money residing and buried within enterprises?", we ask a question in return: what do you have to offer in #bigdata? Our approach to the problem has always been from the architecture standpoint. Architecture matters. There is a lot of data within enterprises. Especially with sensory data moving constantly and absorbing as much information as it can, enterprises are challenged with storing information that may make sense eventually, if not today. This by itself presents a great challenge: data can be completely useless to the enterprise or, at the same time, very useful. Storing unwanted data creates overhead, especially when we talk about thousands of employees pinging their devices on a daily, even minute-by-minute basis. Above all this, imagine the movement of these subjects transmitting coordinates to base stations. All of the above opens doors and paves the way for tremendous possibilities. Although the company had completed phase I of their bigdata project, what was being sought was the second phase, which called for fraud detection. Bigdata plays a BIG role here. We began with a proof of concept. For a proof of concept with bigdata, unlike traditional application development, one huge factor that comes into play is the data itself. This time the main focus is to manipulate bigdata. When we think of data, we now think of containers beyond ordinary database systems: NoSQL comes into play. When we think of data, we also think about data transfers.
Data transfers have to be thought about because data does not only reside as unstructured or semi-structured data; it resides as huge chunks in outside systems, in different security zones or different domain boundaries. Therefore, for data to be manipulated, it needs to be transferred closer to the application that utilizes it. That application is not a simple linear application anymore. Non-linear applications spawning several threads across processes take over memory segments in parallel execution mode and start processing data. For this reason, the design and flow of these systems need to be carefully thought out, and surpass traditional design methodologies. Today we will see two different pages of flow for each process, and decision trees need to be carefully designed to deal with the outcomes. Responsiveness is another factor; although responsive programming can be kept aside, it is a good idea to think about it during the design phase. Here are some excerpts of the big data implementation for the retailer. Here is where the money is.
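The fan-out/fan-in shape described above (chunks of transferred data processed on parallel threads, partial results combined afterwards) can be sketched with Python's standard thread pool. The chunked sum here is only a stand-in for real per-partition work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Stand-in for per-chunk work: here, just summing transaction amounts."""
    return sum(chunk)

# Pretend this arrived in huge chunks from another domain boundary.
data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# Non-linear execution: chunks run on parallel threads,
# and the partial results are combined afterwards.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

print(sum(partials))  # 5050
```

The design point the paragraph makes still holds: the hard part is not the pool, it is deciding how flows split, where decisions branch, and how outcomes recombine.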

  • Number of downstream systems touching the main application : 28
  • Repositories used in total : 4
  • NoSQL – MongoDB (Follow me on twitter to know how to play with MongoDB, and how to scale it: @sunnymenon)
    BigData stack – Cloudera, Hortonworks, DataStax, MongoDB, and a set of other tools, some being evaluated and others being discussed with vendors; hence the floatation of clones. They will soon be deciding, and part of our work is to help them choose. The POC utilizes some of these vendor platforms, but we have stated the need to drill down. It is important that one of the vendors be alongside on the bigdata realization path. We will talk more.

OUR STACK: Apache Hadoop on Windows. Bigdata made easy. Talk to us to know more about OUR STACK. The Apache Hadoop stack on Windows, made by us developers. Easy to roll out, difficult to neglect. No IT challenges. No questions asked. Nothing to sign up for.

  • Our team size: 5 people
  • Began with discussions with stakeholders – 12 hours on different days. A total of 35 people interviewed, including CxOs, engineering managers, directors, developers, IT operations staff, network sysadmins, database sysadmins, data analysts, architects and consultants.
  • The primary layout of the composite was defined; presentation of the layout for the POC – 8 hours. This targeted a known application where a major part of the data resides. The strategy was to utilize that data, look at the analytical data, and at the same time pull in "some bigdata" from outside, unstructured, thereby "proving" accessibility to voluminous data, retrieval functionality, streaming, etc.
  • The messaging layer played "big" time. Asynchronous messaging of bigdata has a different meaning. Kafka can be considered, but we found it challenging to get information: community strength is still growing, and Stack Overflow is NOT overflowing. STORM, the same. "One little pig" can really be useful, and with HIVE is where the honey might be. With all the tools combined, the assembly should begin. Call it the ORACLE WAY: engineered systems. Good boy, good boy.
  • Visualization is the key. Go beyond clustering and graphs to new MEANS?
  • Interactivity is another.
  • Equations come to the front end.
  • Probability theories are the favorite tool of a data scientist.
  • Enterprises should go beyond probabilities.
  • Security and compliance audit. We didn't touch on single sign-on. Look, ORACLE engineered systems are one way to go. But will ORACLE survive the greediness of innovation? Or will they survive Bigdata-base in the cloud and NoSQL in the cloud?
  • Clustering algorithms may not make much sense, as the randomness of the data is too disparate, and K-means applicability may be hard to assess.
  • Pure solution approach – Credit from CEO in a direct email and quote about us in the brown-bag meeting.
  • TOTAL TIME TAKEN 8 weeks.
  • Evaluated result: POC, inventory list and portfolio. Data visualization modules for actual usage; provided code and training to business analysts. Excel and R.
  • Total charged to client: <<ASK ME>>. This is the total cost of the bigdata project. I wouldn't recommend that enterprises or companies negotiate lower rates for a bigdata project. Look for the gains in technology that vendors bring in, the knowledge of your enterprise infrastructure they bring in, the appropriate deployment model, and how it serves a specific need. While cost savings are important, let them be a secondary aspect in the beginning stages. Invest in the foundation; reducing recurring costs should be the principle as far as bigdata is concerned.
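For reference, the K-means the list mentions is simple to sketch; whether it makes sense for a given dataset is exactly the judgment call raised above. A minimal 1-D version in plain Python (fixed iteration count and first-k initialization are simplifying assumptions):

```python
def kmeans(points, k, iterations=10):
    """Minimal 1-D K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its cluster, and repeat."""
    centroids = points[:k]                      # naive initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two well-separated groups; the centroids settle near 2 and 101.
points = [1, 2, 3, 100, 101, 102]
print(kmeans(points, 2))  # [2.0, 101.0]
```

On truly disparate, random data the centroids carry little meaning, which is the caveat the bullet list makes.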

Should Cloudera or Hortonworks be your enterprise bigdata architecture? What about DataStax? Where has SPLUNK gone? Ask that question again: should Cloudera or Hortonworks be your enterprise bigdata architecture, or should it be MapR, or should you engage with Amazon?

Many users ask if any single product can provide all the features necessary for the enterprise in dealing with BigData. According to a whitepaper released by Computer Sciences Corporation (CSC), 2014 will leave us with two major players in the market; furthermore, they say, the others will either be acquired or make an exit. While the bigdata vendor wars appear to be moving in the direction depicted by CSC, we still have to wait and see what shape the necessity takes within enterprises. Meaning: what will enterprises try to achieve? Will they rely on existing analytics and business intelligence only, and make use of bigdata as just another component? Or will they move beyond ordinary predictive analytics and elevate themselves to real-time predictions, converting those analytics into actionable items? Forget business intelligence and reporting... think predictive.

We do not learn; and what we call learning is only a process of recollection. ~ Plato

Hey, y'all have a great Friday, Saturday & Sunday. Aal-right?


BigData And The Cloud

BigData and the cloud sounds like David & Goliath, but it could really be Romeo & Juliet if defined with a good deployment model. If wired differently and haphazardly, it could turn out to be "Brutus & Caesar".

For Bigdata deep packet specifications consortium site,  please visit here

There are two kinds of data, both human-generated. It has to be appreciated that machines themselves generate data, but that data is initiated primarily by human activity. The type of data that is generated by machines and kick-started by human activities is a different subject by itself; what we discuss here is human-generated data.

The two further classifications at the top hierarchical level are as follows:

  • Designated Data Generation, and
  • Forceful-flow Data Generation

Designated Data Generation
An example of Designated Data Generation (DDG) would be transactional system data within databases, log files pertaining to requests coming in to snatch transactions, or a post on a blog-specific site such as this one, where a definite number of users is expected, just as in a transactional system a definite number of users can come in and the minimum and maximum range of clickstream is pre-determined. If you look closely, this can be both structured and unstructured data.

Forceful-flow Data Generation
Examples of Forceful-flow Data Generation (FFG) are request logs coming into news sites, a public picture published on a public site, a query resulting in relevant search results, etc. These requests do not come in with pre-designed expectations: the system could never have expected such a request, and passers-by circling around that particular area of the requesting station provide more such request data, again generating activities based on clickstreams. A close look again says that both can be either structured or unstructured.

Within these two realms, analytical modules should first and foremost categorize the data. Just as categorization is done against structured, semi-structured and unstructured data, prior to that categorization, modules coming in for the snatch must categorize it as DDG or FFG.
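A sketch of that first categorization step follows. DDG and FFG are this post's own terms; the rule used here, a whitelist of designated sources, is only an illustrative assumption about how a real module might decide:

```python
# Sources whose traffic is pre-designed and bounded (assumption for illustration).
DESIGNATED_SOURCES = {"transactional_db", "internal_blog", "batch_log"}

def categorize(record):
    """Tag a record DDG (Designated) or FFG (Forceful-flow) before any
    structured/semi-structured/unstructured classification is applied."""
    return "DDG" if record["source"] in DESIGNATED_SOURCES else "FFG"

records = [
    {"source": "transactional_db", "payload": "order #991"},
    {"source": "public_news_site", "payload": "request log line"},
]
print([categorize(r) for r in records])  # ['DDG', 'FFG']
```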

No matter what you do, abstraction layers may be of some help; however, when talking about bigdata, in many situations we deal with designs specific to a domain. Likewise, we also have to build interactivity: interactivity is something we cannot evade incorporating into the analytics design. So now we are becoming more and more specific in approaching the raw big data. This categorization, and the specificity of designing an intelligence application utilizing the full potential of bigdata, drives one to get the most out of it. Patterns can now be "designed to be detected", and applications can be written to inspect such patterns or even provide event triggers that depict a forthcoming event, such as the burst of a viral video, the incoming of a booming season, an incoming natural calamity, or even the outbreak of an epidemic. Data collection is specific and can be categorized, no matter what: unless you are collecting data created in the upper reaches of the exosphere from sound emitted off the earth's surface, you can funnel it down to the categorization mentioned above. Enterprises are advised herein to adhere to the above categorization and have a conceptual model and standardization defined.

Inference from above:
Apart from structured, unstructured and semi-structured data, let there be the two more categorizations mentioned above (DDG, FFG), with subsequent standardization defined. A domain will already exist within enterprises. With these, vendors can design abstraction layers that can be seamlessly plugged into the system to detect patterns, provide analytics or trigger events.

It has to be remembered that this standardization should be defined at the conceptual level, as enterprises already follow standardization for etching or writing to log files, for data transfers, for storage purposes, etc. The good part of the story is that further standardization also exists at the business level when it comes to dealing with data: XML, JSON and EDI formats are already in use. Therefore, standardization at the conceptual level for dealing with bigdata becomes rather easy. Once again, this standardization helps bigdata vendors providing intelligence and analytics services to not only define the required abstraction, but also integrate it seamlessly.
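One way such a conceptual standard might be enforced at the data layer is a simple conformance check on incoming records. The required fields below are invented for illustration; a real standards body would define its own:

```python
import json

# A conceptual standard: every incoming record, whatever its wire format,
# must carry these fields before an analytics module will accept it.
REQUIRED_FIELDS = {"source", "category", "timestamp", "payload"}

def conforms(raw_json):
    """Return True if a JSON record satisfies the conceptual standard."""
    try:
        record = json.loads(raw_json)
    except ValueError:
        return False
    return isinstance(record, dict) and REQUIRED_FIELDS <= record.keys()

good = ('{"source": "pos_terminal", "category": "DDG",'
        ' "timestamp": "2014-01-05T10:00:00", "payload": "sale"}')
bad = '{"source": "pos_terminal", "payload": "sale"}'
print(conforms(good), conforms(bad))  # True False
```

The same idea extends to XML or EDI inputs: normalize first, then check against the conceptual definition, so downstream abstraction layers can plug in without caring about the wire format.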

Historical Evidences:
The growth of business-to-business, then business-to-consumer, and the further expansion of business processes to global markets demanded that enterprise applications be integrated internally and externally. This presented a huge challenge by itself. Enterprises spent a long time not only writing and deploying code; testing such integrations and coping with constant changes in endpoint or source systems created overhead for integration engineers. Slowly, enterprise integration vendors such as Tibco, webMethods, Informatica and IBM Message Broker, along with Apache open-source projects and others, proliferated in the environment during phase I.
This development, rather than merely helping, presented opportunities and thereby paved the way for service orientation and what we are seeing today: tools such as ESBs and webservices, and the modernization of messaging systems.
The state the system has been pushed to reach, by virtue of natural requirements and demands, is because of nothing other than standardization. Standardization was infused into every aspect of integration: data transfers, communication between source and destination systems (like API calls to webservices), logging and error mechanisms, testing mechanisms and others. Standardization ruled. Today, while traditional integration requirements still exist and are being met, we are seeing that such standardizations are even helping shape future directions.

The exponential growth of data has been overwhelmingly emphasized and repeated. This reiteration keeps coming from sources that see the value in the data but, at the same time, are not able to do much with it today. People realize there is gold beneath, but it is difficult to dig out.

Recommendations from the standards body.
What the bigdata standards committee is calling for is to comply with standardization of existing data formats, and thereby have these standards definitions in place for incoming data AND for applications coming in for analytics. Let this be the phase I approach.

For more information on bigdata standards specifications and know-hows, please visit here. Business analysts/Data analysts click here.



Crawling towards #BigData analytics: the 10 things to know.

What to do when you have a #bigdata project ?

  1. Analytics & analysis have been around for many years. Enterprises have been sucking in data from large monolithic systems called data warehouses for many years. Once retrieved, the data went through phases of analysis for different purposes, including sales forecasting, weather prediction, medical analytics and so forth, so that the term analytics has itself gone through a paradigm shift and is now viewed more from a predictive analytics standpoint. A shift takes time. So predictive analytics did not arrive with bigdata; rather, with bigdata a new kind of analytics is evolving.
  2. The proliferation of bigdata tools and services has begun. Data warehouse as a service was being talked about, but with data security still not embracing "the openness", for obvious reasons, we have not yet seen success except from giants such as Amazon, Google, EMC, Rackspace, etc. When talking about bigdata, there is a secondary term that needs to be queried along with it, and that is existing analytical data. Please review point number 1 on predictive analytics.
  3. Various tools since the inception of Hadoop provide some real cool value-adds. Pig, Hive, Google's NoSQL, Cassandra and MongoDB, Sqoop, Gora, HBase and Avro, combined with machine learning systems, provide the pass-throughs to connect, search, filter, retrieve and manipulate data for further analytics. Also check out the big data offerings from big vendors such as Oracle.
  4. Some have tried to explain the differences with business intelligence and replace it with bigdata. While this attempt is good, the two are complementary. Many fail to realize that huge work has already been done on analytics, and so bigdata work need not necessarily be heavy weight-lifting.
  5. While choosing tools, the existing infrastructure must be studied. The study must focus on the reports attained from business intelligence; this can provide insights into data patterns.
  6. Reports are on existing data, but reports provide insights. Peak periods of data, or patterns at certain intervals, can be detected by running complex equation-based queries. Use these perceptions for trials; the perceptions here can be reality.
  7. The first challenge today, amid the various tools within the enterprise and in the open-source world, is to choose the tools. While that is the less stressful part, once chosen, the deployment and infrastructure setup is the first major challenge, because getting a connection to the valuable data, and this time a massive amount of it, becomes really challenging. This is where, with data warehouse as a service, enterprises will lean more towards the giants. Please see point number 2.
  8. A crucial part of bigdata analytics involves creating test data during development. When creating test data, try NOT to replicate data points, and avoid depleting them. Doing so can create blocks in moving ahead, especially when using PFP/Mahout for further analysis of the data. Who knows what can help.
  9. Furthermore, the test system must be thought of beyond a single machine. Most often, parallel processing involves multiple nodes, especially when processing terabytes of data. In either case, when uploading from data repositories, multiple systems get involved anyway; that is what makes it bigdata.
  10. Before choosing a tool, understand the tool with respect to bigdata. There are many tools out there that really do something else: with the bigdata buzzword in action, many vendors incorporate capabilities for bigdata, which is good, but that tool might not be the one you are looking for. Once chosen, do a proof of concept. Engage engineers in rubbing shoulders with bigdata people. It is always good to collaborate. This is not a one-person meal, right?
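Point 6's idea of detecting peak periods from existing report data can be sketched very simply. The threshold rule below (1.5 times the mean) is an illustrative stand-in for the "complex equation-based queries" mentioned there:

```python
def peak_intervals(daily_counts, factor=1.5):
    """Flag the days whose volume exceeds `factor` times the overall mean."""
    mean = sum(daily_counts) / len(daily_counts)
    return [day for day, count in enumerate(daily_counts) if count > factor * mean]

# Request volumes per day; days 3 and 4 form the peak period.
counts = [100, 110, 95, 400, 380, 105, 90]
print(peak_intervals(counts))  # [3, 4]
```

Those flagged intervals are exactly the "perceptions" worth trialing first when a bigdata project starts from existing BI reports.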

    Thoughts beyond usual thinking.
    Many are the things that man seeing must understand. Not seeing, how shall he know what lies in the hand of time to come? – Sophocles


7 steps for taking the #startup wagon up the hill of success

Slowly, the funding scene has become better. Seeking funding from a VC in today's market is not impossible any more, but getting funded requires a more rigid business model and a great product in today's crazy market. Not only that: for a #startup to be successful, the team should work together to pull the wagon up the hill. These are easy steps, but they need to be executed with a strong will. Given the #disruptive nature of the technology atmosphere, once funding has been achieved, things only begin there. It is a pleasant beginning, but there is a little more to it. What is seen today is that #startups are either too tied up with product building or think that staff building should be given the highest priority. The latter takes them to situations where they spend so much time and effort scrutinizing candidates that either the candidates lose interest in the company or the #startup feels that no candidate is qualified enough. Here are simple steps to follow.

1. Have a big picture and make small modules adapt to suit the big picture.
CEOs, CxOs and leaders should be bendable, meaning they should be able to adapt their strategies and business model to fuse with changing market conditions. For instance, you may have decided to build a machine that floats when needed; but when a product like a flying car has entered the market, it is imperative to change the product definition to beat the flying car.

2. Think of disrupting .
If the funding has been for a well-defined product where the strategy was not to #disrupt, this may not be possible. But it is still a good idea to see if there is something that can be broken... #disrupted. If so, why hesitate? Do it.

3. Invest on human resources. Don’t be picky.
To build a good product, you need good coders. Remember, unless you are building a new kind of rocket that can disappear by saying "Holo", you don't want to spend hours and hours talking with the candidate. Let the onsite interview be short and sweet. C'mon, candidates are coming to your site because you have already talked to them, and now they are confident that they can do the work. Unless somebody is a real idiot, they won't come to your site to spend three or four hours talking with your staff members without knowing his or her stuff. And if it is an idiot, you would have already known from your phone conversation. Wouldn't you?

4. Embrace open source.
There will be great support if you involve open source. When we say involve open source, we mean: bring in not only tools and services that support open source, but get the community involved. This goes a long way.

5. Tie up left and right cautiously.
Join hands and collaborate with business and technology partners carefully. Aggressive marketing strategies may take the product from one level to another where, by definition, it all looks good. But building a product that satisfies every need of the marketing strategy can sometimes turn out to be a fool’s paradise. Avoid it. Keep it simple for the marketing people.

6. Sow seed and reap. Invest a little in experienced hands.
Let experienced consultants rub shoulders with your hired staff. Work with recruiters. Hire professional services companies to do some of the work for you. This not only gets the work done faster, it also transfers their industry-gained knowledge to your staff. That raises professionalism among the newly hired, out-of-college members that #startups heavily rely on these days. Spend time educating the recruitment companies and professional service providers on what you want. Measure VALUE by looking at return on investment from the human resources perspective.

7. Be at ease.
Show ease in your dealings with staff members. Projecting low available funds, or saying things like “we need to move our butts to succeed” or “we are not getting the right people”, are simply negative statements. Do you agree?

Prosper and stay well.

Posted in Big Data BigData Cloud | Leave a comment

Nothing Much To Look Through The Google Glass. (For Now)

Google Glass, like any other act of a giant, gets praised and glorified, tweeted about with sore throats and posted on walls by many. It gets talked about in various blogs, publications, news channels and what not. All glorified the looks and talked about the value of Google Glass. It clearly looked like marketing. Some spoke a little badly about things like walking into a public restroom or a cinema theater wearing one of the GGs, but none talked about its “NO VALUE”.
1. It costs $1,500 – no worry. It will be down to a few hundred pretty soon and stay at maybe $200 to $300 for a couple of years. After all, what does it have? A 5 MP camera, a Wi-Fi card (yes, it is small) and a projection display; that may cost money too, yet what value does all this, integrated together, really bring?
2. Of the many demos, the one chiefly demonstrated was capturing images and displaying them to a second person, to make it feel like you are seeing through the eyes of the person wearing the GG. Some put it this way: you can find your way using GPS. This is a good nanotechnology look and feel; yet, how does it help? I can get the same from a good phablet, and it doesn’t obstruct my eyes.
3. Some others said that if you are in the airport, you could ask the voice-activated GG to show you flight timings. Others demonstrated the same old demo Google gave in the first place: voice-activated searches for cats and dogs, recording a video and searching for a video. Its text-to-speech, with an in-built speaker, gives you the information you asked for after you kick-start the prompt by saying “Ok Glass”.
4. All you did so far is exactly what one would do on a fairly good-display, Wi-Fi-enabled mobile device. Let us say the cost comes down to $100. Will you buy it, and if so, for what?
5. Some asked whether, if they wore prescription glasses, they could still use Google Glass on top of them. Some guy tried it, and it looked like the GG is designed to fit people with prescription glasses. Good, good.
6. This doesn’t look anything like the transformation from PC to laptop, or laptop to handheld device. There is no “true disruption” that I could see.
Although the current release is being projected as an “exploratory” release, a few years from now, given the premise of its functionality, there isn’t much to be anticipated. Other than the speed at which the GG brings back search results, more resolution on the camera, better battery life and maybe some design changes, there isn’t much to be seen ‘looking through the blue glass’. There is a big news media publication working on delivering news headlines to Google Glass. Who would want that? Not sure. APIs for developers have started getting crowded though, and so have the downloads of the mobile applications. Not sure what the people downloading the app are doing with it, because the number of downloads exceeds the number of glasses released by almost twenty or thirty times.

Irrespective of all this.. I definitely see one good, real good application for Mr. GG.

“To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science.”
— Albert Einstein


Big Data Visualization

Fun stuff. As visualization takes ‘shape’, more data is thrown at the wall and interpreted.
More applications are erupting from within the Big Data community and, irrespective of all the developments, the mass has been simply sitting and clicking the hell out of machines. The containers have ruptured and there are no more long legs to run for the evening. Just remembered Salvador Dalí.

Hadoop is getting exhausted and new algorithms are being talked about. Drill is gaining ground. For that matter, Drill’s run to solve the problem of the ever-bloating mass of data has been on faster legs. Let us wait and see.

Research on Big Data using mobile devices is taking on a human face. Read the Rick Smolan and Jennifer Erwitt story, The Human Face of Big Data, first.

Ok. Have fun here. The Preview to The Big Data Visuals.

Remember the two key strokes. Pass it on. Give me a few if you can. Won’t be heavy to take in. (See the preview to know what the two key strokes are.)

If I have seen further it is by standing on the shoulders of giants. – Isaac Newton

Thank you.



Statistical analysis tools may not be where the solution is. Good lord, how long will these rituals continue?

As more doors open up in the financing world for ‘Big Data’ tools, the existing tools provided by SAS, R programming, or tools that spit tabular data into Excel spreadsheets, provide just statistical data. Not only that, these existing tools are not designed to accept the humongous data sets waiting to be consumed, arriving over high-speed pipes in the form of requests. Disruption has taken place not by virtue of innovators and intelligent systems, but by the great demand that Big Data has put forth. Unable to handle the weight, companies still fall back on tapes, carried out traditionally like ritual acts. Yes, they would like to forget about the data, but compliance and security requirements won’t allow them to. With more globalization occurring, confusion among data carriers, and ever-increasing challenges in the telecommunications field, everything demanded disruption!

Oracle OpenWorld, which begins in San Francisco on September 29th, has attracted about 50,000 attendees from 123 countries and expects about one million online attendees. Here is a little “big” data about the past event:

      There were 3,570 speakers

      40,942 Seats for sessions

      142,000 cups of coffee

      95,000 Sodas

      63,000 Lunches

      42,000 Gallons of Water

This year’s sessions include “Big Data”, a few of them alongside MySQL-related sessions, Oracle Exalytics etc., which can shed light on the issues of big data. Any evolution in a field of science or religion travels either the path to happiness or the path to evil. With the Big Data hype growing day by day, there are “much ado about nothing” companies floating around too. It is anticipated that there will be discussions on big data, on the preparedness required to perceive the scams and to debunk the myths. One interesting item on the session list is the fusion of Oracle TimesTen with Exalytics. Oracle TimesTen is supposed to be an in-memory cache, or a full-featured, memory-optimized relational database.

Traditionally, predictive analytics provide much-needed customer information puffed out of data warehouse systems. Usually that is what current tools retrieve, beyond helping with upselling items, peak purchase times or money matters. At times, patterns can be detected in stored data. When big data is involved, it becomes a question of anaconda against viper. Viper fangs are poisonous; it can spit venom. Anacondas are nonvenomous. Well, put it this way: when it comes to big data, it is a question of Godzilla against a monkey. Not the code monkeys.

From my readings, today’s tools do not have the bandwidth to deal with Big Data. With the existence of Hadoop, its acceptance within the development community and the NoSQL discussions, more tools are wanted by the big data “dealers”. Keep in mind, there are many players who have not dealt much with Big Data, or with data at all. Remember, data is secured, so how can most of them deal with it? Having designed a few databases might not give you the insight to deal with the Big Data problem. To understand Big Data, you either need the core mind of a researcher or the blessing of actually seeing and perhaps touching it. Only these two can see, visualize or perceive the problem of big data.

Pardon the architects, for they do not know what they are doing – paraphrasing the Bible.

Welcome to the new age of code monkeys. Leave a comment and I will give you a chocolate. Hey, I’d like to hear what you have on big data, ok?

Nobody knew what would ever happen to the data slowly growing within enterprise walls.
Listen to these statements from the past….

      Man will not fly for fifty years – Orville Wright (1901)

      A Rocket will never leave the earth’s atmosphere – New York Times (1936)

      There is a world market for maybe five computers – IBM’s Thomas Watson (1943)

      640K ought to be enough for anybody – attributed to Bill Gates

Dear weatherman, once you told us it may or may not rain. Today, you say there is a possibility of rain. I have a story; I dreamt it and will narrate it. Twinkle twinkle little stars, I still wonder how big you are….


BigQuery – Google comes out from “INSIDE”; who else, and what more, will we really see through?

Google BigQuery’s public appearance makes it even more complicated to decide where to go for Big Data analytics. How big is really big? Can one hundred thousand records with mega-size data such as images, or even one million records, be called big? As they asked: how big can a table be? How many fields? Etc.
When Oracle first shipped Version 2.0 as their first database release, without a version 1, it was primarily to store unstructured data in a meaningful form. The CIA’s interest in software that could collect global information made it compelling enough to release Oracle and make the software more rigid. It seems it was not clear to anyone then that there would be a problem of millions and millions of records that needed to be analyzed and perhaps made sense of. Who could have thought about the amount of data that would accumulate over the course of time: weather-related information for instance, and similarly medical data, product purchases, credit card usage, traveling, etc. First it was a storage problem: how much can we store? Then things changed. More storage devices evolved, and they evolved cheap. Now it is a much bigger problem: how can we make sense of what has been stored?
Demystifying the noise among the data is a good thought. But at this point the concept itself has created much “noise” in the equation. (Haha – a small laugh.) The truth is out there!
Simply put, BigQuery is an analysis tool for Big Data. A huge amount of content resides within Google’s walls: 60 hours of video uploaded per minute, 100 million gigabytes in the search index and, importantly, 425 million Gmail accounts, all producing and injecting a lot of data into Google’s thousands of servers across the world. This repository can be tapped with result-oriented queries in the discovery process. This is BigQuery.
While discovery is important, the time to retrieve results and help the discovery process is also very important. This is different from the batch-oriented queries run by Hadoop. And when I say different, I don’t mean “bad”. It is simply different.
BigQuery seems to have evolved from an internal system called Dremel, now externalized to run queries on big data sets. What is interesting is that Dremel, the internal BigQuery of Google, talks about what they call a full table scan. This is supposedly done to span the search across hundreds and hundreds of tables residing on servers everywhere. According to what is generally said by BigQuery experts, this search is far better than running a search on an indexed RDBMS. Hmmmm!
An example of such a query would be: what are the different applications running on Google servers for which a type is set to “Something”, or something similar to that question.
Google would have used this tool extensively to run across their servers. With the huge content within their walls, its usage, once externalized, will be interesting. The question still remains: why wouldn’t an RDBMS suffice? Don’t indexing, statistics updates and optimization techniques on databases already enable fast retrieval? Note that BigQuery data sets have a limitation of 64KB for total field lengths per query.
BigQuery, internally Dremel, uses SQL-like statements to pull data for analysis, and it IS FASTER. But then, this is due to the underlying format of the data that needs to be withdrawn for analytical purposes.

So what is BigQuery? BigQ is all about Big Data available within Google’s Cloud Storage, which is loaded into BigQuery and queried for results.
With the API, one can embed it within applications to let users fire the queries, with SQL-like statements run against a process that does a full table scan, as opposed to indexed databases. Supported languages include Java, Python, .NET, JavaScript and also Ruby. BigQ comes with a web-based UI, and the pricing, as I understand it, is charged on a “per query” basis. Examples of running a BigQ query would be to look at things such as the number of page views of Wikipedia in a month, and on what subjects, OR to take a look at all the known works of Shakespeare. Interestingly, the total number of page views for a month on Wikipedia is about 6 terabytes of data uncompressed, and BigQ runs against that number, as heard from the horse’s mouth.
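As a hedged sketch of what firing such a query from code might look like: the snippet below builds a legacy-style SQL statement against a public Wikipedia sample table and shows, commented out, how a Python client could run it. The table name, field names and client call are my assumptions for illustration, not taken from this post.

```python
# Sketch only: table/field names ("publicdata:samples.wikipedia", title, views)
# and the google-cloud-bigquery client usage are assumptions for illustration.

def build_pageviews_query(title_prefix: str, limit: int = 5) -> str:
    """Build a SQL-like BigQuery statement summing Wikipedia page views."""
    return (
        "SELECT title, SUM(views) AS total_views "
        "FROM [publicdata:samples.wikipedia] "
        f"WHERE title CONTAINS '{title_prefix}' "
        "GROUP BY title "
        "ORDER BY total_views DESC "
        f"LIMIT {limit}"
    )

if __name__ == "__main__":
    sql = build_pageviews_query("Big_data")
    print(sql)
    # Actually running it needs credentials and a client library, e.g.:
    # from google.cloud import bigquery
    # client = bigquery.Client()
    # for row in client.query(sql).result():
    #     print(row.title, row.total_views)
```

The point is the shape of the statement: a plain SQL-like aggregation that BigQuery answers by scanning the full table rather than consulting an index.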

A small “BigQ” I ran from the examples was to find the biggest work of Shakespeare, biggest in terms of the maximum number of words. The query gave the result “Hamlet”, with 32,446 words, “topping” Shakespeare’s 42 published works.
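For a sense of what that query computes, here is a minimal local mimic in Python: in BigQuery’s public Shakespeare sample, each row is roughly (word, word_count, corpus), so the biggest work is the corpus with the largest SUM(word_count). The SQL string, column names and sample rows below are assumptions for illustration; the numbers are invented, not the real counts.

```python
from collections import defaultdict

# The kind of SQL-like statement fired at BigQuery (table/field names assumed):
SQL = (
    "SELECT corpus, SUM(word_count) AS total_words "
    "FROM [publicdata:samples.shakespeare] "
    "GROUP BY corpus ORDER BY total_words DESC LIMIT 1"
)

# (word, word_count, corpus) rows, invented for the sketch
rows = [
    ("the", 900, "hamlet"),
    ("to", 700, "hamlet"),
    ("the", 800, "macbeth"),
    ("blood", 40, "macbeth"),
]

def biggest_corpus(rows):
    """GROUP BY corpus, SUM(word_count), then take the max: what the SQL does."""
    totals = defaultdict(int)
    for _word, count, corpus in rows:
        totals[corpus] += count
    return max(totals.items(), key=lambda kv: kv[1])

print(biggest_corpus(rows))  # → ('hamlet', 1600)
```

On the real sample data the same aggregation is what surfaces “Hamlet” at the top.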

What more should I see..

Many are the things that man seeing must understand. Not seeing, how shall he know what lies in the hand of time to come? – Sophocles
