Crawling towards #BigData analytics, the 10 things to know.

What to do when you have a #bigdata project?

  1. Analytics and analysis have been around for many years. Enterprises have long pulled data out of large monolithic systems called data warehouses. Once retrieved, the data went through phases of analysis for different purposes, including sales forecasting, weather prediction, medical analytics and so forth. The term analytics has itself gone through a paradigm shift and is now viewed more from a predictive analytics standpoint. Such a shift takes time. So predictive analytics is not unique to bigdata; with bigdata, a new kind of analytics is evolving.
  2. A proliferation of bigdata tools and services has begun. Data warehouse as a service has been talked about for a while. But because data security concerns still keep enterprises from embracing “the openness”, for obvious reasons, we have not yet seen success except from giants such as Amazon, Google, EMC, Rackspace etc. When bigdata is discussed, there is a second term that needs to be asked about along with it: existing analytical data. Please review point number 1 on predictive analytics.
  3. Various tools that have appeared since the inception of Hadoop provide some real cool value adds. Pig, Hive, Google’s NoSQL, Cassandra and MongoDB, Sqoop, Gora, HBase and Avro, combined with machine learning systems, provide the pass-throughs to connect, search, filter, retrieve and manipulate data for further analytics. Also check out the big data offerings from big vendors such as Oracle.
  4. Some have tried to explain the differences between business intelligence and bigdata, and to replace the former with the latter. While the attempt is good, the two are complementary. Many fail to realize that a huge amount of work has already been done on analytics, so bigdata work need not be heavy lifting.
  5. When choosing tools, study the existing infrastructure. The study must focus on the reports obtained from business intelligence. These can provide insights into data patterns.
  6. Reports are built on existing data, but they provide insights. Patterns at certain intervals or peak periods in the data can be detected by running complex, equation-based queries. Use these perceptions for trials. Perceptions here can be reality.
  7. The first challenge today, whatever tools exist within the enterprise and in the open-source world, is choosing the tools. That part is the less stressful one. Once the tools are chosen, deployment and infrastructure setup become the first major challenge, because getting connected to the valuable data, and getting massive amounts of it this time, becomes really challenging. This is where, with data warehouse as a service, enterprises will lean more towards the giants. Please see point number 2.
  8. A crucial part of bigdata analytics involves creating test data during development. When creating test data, try NOT to replicate data points, and avoid depleting them. Either can block progress later, especially when using PFP/Mahout for further analysis of the data. Who knows what can help.
  9. Furthermore, the test system must be thought of beyond a single machine. Parallel processing most often involves multiple nodes, especially when processing terabytes of data. In either case, when uploading from data repositories, multiple systems get involved anyway, and that is where bigdata comes in.
  10. Before choosing a tool, understand it with respect to bigdata. There are many tools out there that carry the bigdata label but otherwise do something else. With the bigdata buzzword in action, many vendors incorporate capabilities for bigdata, which is good. But that tool might not be the one you are looking for. Once chosen, do a proof of concept. Engage engineers in rubbing shoulders with bigdata people. Always good to collaborate. This is not a one-person meal, right?
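
As a rough illustration of point 6, here is a minimal Python sketch of spotting peak periods in existing data. It assumes the report data is simply a list of event timestamps, and it treats an hour as a peak when its event count is more than one standard deviation above the hourly average. The function name peak_hours and the threshold are made up for this example and do not come from any particular BI tool.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def peak_hours(timestamps, sigmas=1.0):
    """Return the hours of day (0-23) whose event count is unusually high.

    An hour counts as a peak when its count exceeds
    mean + sigmas * standard deviation of the 24 hourly counts.
    Both the rule and the default threshold are assumptions for illustration.
    """
    counts = Counter(ts.hour for ts in timestamps)
    per_hour = [counts.get(hour, 0) for hour in range(24)]
    cutoff = mean(per_hour) + sigmas * pstdev(per_hour)
    return [hour for hour, count in enumerate(per_hour) if count > cutoff]


if __name__ == "__main__":
    # Hypothetical sample: two events in every hour, plus a burst at 09:00.
    sample = [datetime(2013, 1, 1, hour, 0) for hour in range(24) for _ in range(2)]
    sample += [datetime(2013, 1, 1, 9, 30) for _ in range(48)]
    print(peak_hours(sample))
```

On real data the same idea would more likely run as a query inside the warehouse or as a Hive or Pig job. The sketch only shows that a peak found this way can be a reasonable starting point for trials.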

    Thoughts beyond usual thinking.
    Many are the things that man seeing must understand. Not seeing, how shall he know what lies in the hand of time to come? – Sophocles

About Sunny Menon

Sunny Menon is a software engineer with over 18 years of experience in the design, architecture and development of high volume enterprise applications. He has experience enabling cloud environments for enterprise applications. He designed and developed a bigdata product which is currently in stealth mode. He has helped #startups evolve from conceptual stages through definition of the actual product by aligning them with industry requirements, developing proofs of concept and demonstrating the product, thereby helping them seek funding from financiers. He has extensive experience in the integration of large enterprise applications, middleware and modernization of enterprise applications centered around SOA/SaaS/PaaS/Cloud environments. He has an Android app available in the Android marketplace/Google Play called EasyImageSender, and an iOS app. He has also developed Android/iOS apps for the payment, medical and insurance industries. They can be searched with the key term "EasyImageSender". At night, he enjoys 'staring' at the night skies and sings, twinkle twinkle little star, how I STILL WONDER what you are.... He is a cruel poet who walks barefoot at times, to feel the beauty of the earth he sometimes sets foot on. He is a technical advisor to SOADevelopers.com.

11 Responses to Crawling towards #BigData analytics, the 10 things to know.

  1. Josh says:

    Looking at existing infra is a must, I guess.
    but otherwise good.

    best
    Josh

  2. singbo says:

    For POC I would follow the steps as long as there is time and money. A lot of times there is no money and, as you pointed out, the POC becomes important.
    Thanks for the share, Sunny. Good read.

    Brad

  3. Remi says:

    What about Cloudera, MapR etc? I think they may speed up things a little.
    Thanks for the share. One thing is I was not able to access your link for some time, could be that I hit it as soon as you shared it and there were other requests? pls check. Thanks – Remi

  4. Mohit says:

    Apache Lucene could be a good one too.
    http://lucene.apache.org/core/
    Thanks,
    Mohit

  5. Fujitsu says:

    Differences is good.
    Thanks for the share.
    Fujitsu

  6. Rezawat says:

    No Sqoop on it 🙂
    Thanks for the share.

    R

    • Sunny says:

      Yup. 🙂 No AKKA too ..Rezawat. Which part of the globe? Around? If so connect.

      • Rezawat says:

        Tuk Ba Ku ra bp — I am there now.
        Will be back home sweet home to Bupan anco ofs next week and at which time I will send you an email.
        uoy knaht
        Rezawat

  7. Sangeeta says:

    Great points. Some tools like Cloudera are very useful to quickly bring up an env and do MR programs.
    Thanks for sharing.

    Best
    Sangeeta

  8. Aks says:

    We did travel through the POC and failed to bring out the importance. Then came a vendor who wanted to sell all the tools in the world and nothing with bigD. We then went to a consultant who turned out to be a “one day stand”: he got another assignment near home and ditched us. Then we were left out in the dark, having nothing but tons of links which all point to the same docs. Ended up having to partner with different vendors, and today we are still rolling out several programs, some of them called MRs, and business is still asking us if we can do something with Mr. BigD?
    Anyways.. good to read.
    AKS

  9. Viral says:

    All points as somebody said here relate good with what we have here. Data that need to be brought together and challenges as in your terms—real pain in our terms. It is just that the whole of the data is in so many places and difficult to be brought together in one to start work. Looking forward to reading your thoughts on consolidation and methodology.. meaning what should be consolidated and strategy.

    Thanks for sharing.
    Viral
