Eight things to consider in the design of an Apache Spark Hadoop ecosystem

There are eight things to keep in mind while designing an Apache Spark enabled application. Porting to a Spark Hadoop ecosystem is an important step, usually dictated by the need for streaming capabilities and extreme speed of execution. Apache Spark is a cluster computing engine and can be used with HDFS, which makes it a composite architecture. Unless you understand the business process and the incoming data, building such an architecture will be inefficient. Remember, the value comes from big data volumes and NOT from traditional reports.

1. Spark relies on in-memory execution of tasks and in-memory storage. Design your system, and build your processes, with this in mind.

2. These days, writing in Java could be more efficient from a resource standpoint, and Java handles its own concurrency well. Having several APIs built on Scala does not necessarily speed up your execution. It is therefore worthwhile to think about writing in Java.

3. Whether your Spark architecture is in the cloud or standalone, it uses in-memory space for data and executors, so think about the heap size. Continuously increasing the heap size just to get a job to run may reduce efficiency, since larger heaps tend to mean longer garbage collection pauses.

4. Using User Memory is not recommended unless your architecture really demands it for core, extremely high speed streaming needs, such as detecting fraudulent activity across a huge segment or detecting the failure of a system within your application cluster.

5. Take advantage of Unified Memory Management, which requires Spark 1.6 or later. This type of management appears to use memory more dynamically: execution and storage share one region, and each can borrow from the other when needed rather than failing at a fixed limit.

6. Consider nodes as individual machines. This will help in your infrastructure planning, because every Spark executor in an application has the same fixed number of cores and the same fixed heap size.

7. Before using Mesos, consider using Hadoop YARN.

8. Architecture is an art. So imagine, understand, absorb, design, travel through the design, re-design and architect, test small, test big, implement by deploying it in the cloud (perhaps this is the ideal case), and go live.
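
The memory and sizing points in items 3 to 6 come down to a little arithmetic. Here is a minimal sketch in plain Python, not Spark API. It assumes the documented Spark 2.x defaults (300 MB reserved, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) and the default executor memory overhead on YARN of max(384 MB, 10% of the heap). The function names and the node sizes are hypothetical, chosen only for illustration.

```python
# Rough sizing helpers for unified memory and executor packing.
# Assumption: Spark 2.x defaults; check the values for your own version.

RESERVED_MB = 300  # memory Spark sets aside before dividing the heap


def memory_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Split one executor heap (MB) into the unified-memory regions."""
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved memory")
    unified = usable * memory_fraction    # spark.memory.fraction
    storage = unified * storage_fraction  # spark.memory.storageFraction
    return {
        "unified": unified,               # shared by execution and storage
        "storage": storage,               # part protected from eviction
        "execution": unified - storage,   # can borrow free storage memory
        "user": usable - unified,         # User Memory (item 4)
    }


def executors_per_node(node_cores, node_mem_mb, executor_cores,
                       executor_heap_mb, overhead_fraction=0.10,
                       min_overhead_mb=384):
    """How many identical executors fit on one node (item 6)."""
    # Each executor needs its heap plus off-heap overhead.
    overhead = max(min_overhead_mb, executor_heap_mb * overhead_fraction)
    by_cores = node_cores // executor_cores
    by_memory = int(node_mem_mb // (executor_heap_mb + overhead))
    # The scarcer resource decides.
    return min(by_cores, by_memory)


if __name__ == "__main__":
    for name, size in memory_regions(4396).items():
        print(f"{name}: {size:.1f} MB")
    print(executors_per_node(16, 65536, 4, 8192))
```

With a hypothetical node of 16 cores and 64 GB, and executors of 4 cores and an 8 GB heap, cores are the limiting resource and four executors fit. The sketch ignores the cores and memory you would normally leave for the operating system and cluster daemons, so treat the result as an upper bound.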

Thank you.

About Sunny Menon

Sunny Menon is a software engineer with over 18 years of experience in the design, architecture, and development of high volume enterprise applications. He has experience enabling cloud environments for enterprise applications. He has designed and developed a big data product which is currently in stealth mode. He has helped #startups evolve from the conceptual stage through the definition of the actual product by aligning them with industry requirements, developing proofs of concept, and demonstrating the product, thereby helping them seek funding from financiers. He has extensive experience in the integration of large enterprise applications, middleware, and the modernization of enterprise applications centered around SOA/SaaS/PaaS/Cloud environments. He has an Android app called EasyImageSender available in the Android marketplace/Google Play, and an iOS app. He has also developed Android/iOS apps for the payment, medical, and insurance industries. They can be found by searching for the key term "EasyImageSender". At night, he enjoys 'staring' at the night skies and sings, twinkle twinkle little star, how I STILL WONDER what you are.... He is a cruel poet who walks barefoot at times to feel the beauty of the earth he sometimes sets foot on. He is a technical advisor to SOADevelopers.com.

One Response to Eight things in the design of Apache Spark hadoop ecosystem.

  1. Sumitha says:

    Sir, I have a requirement where a new node has to be enabled every time. Sometimes I need to install two or three nodes as a new requirement comes in. Deploying them is a big pain, and for many tools I have to get approval from IT. What do you suggest as the best way? I am OK with writing custom scripts. I also looked at Cloudera, Hortonworks, and a couple of others who specifically do this kind of thing. Please also let me know which of these apps would be useful.
    Thank you,
    Sumitha