Technologist, author, and speaker are just a few of the hats Sujee Maniyam wears. In our interview with him, we discuss big data and Apache Spark, a fast, general-purpose open-source framework for large-scale data processing.
Sujee Maniyam has been developing software for more than 15 years and is the co-founder and principal consultant at Elephant Scale, a company that offers consulting and training for big data technologies. He focuses on big data, NoSQL, and cloud technologies. Maniyam is also an advisor and mentor for a couple of startups in the big data space.
SkilledUp: How would you describe Spark to someone who hasn’t heard of it?
Spark is the next big data platform, just like Hadoop was five years ago. That's where Spark is right now. It's getting a lot of momentum. The reason it's so popular is that it's a unified platform, meaning you can do a lot of things within Spark itself. Hadoop was always a bunch of components working with each other.
Spark is a "unified stack." It supports different programming paradigms under the same umbrella: batch processing, ad hoc queries with Spark SQL, machine learning, and stream processing.
What advice would you give someone who wants to master Spark?
The simplest advice, I would say, is to read the book ["Learning Spark," O'Reilly] to get a big-picture idea of the concepts, and then spend some time actually practicing with Spark.
If somebody is a manager and wants to get a higher-level understanding of Spark, reading the book is fine to give them an idea about what Spark is all about.
To become an expert, the best bet is to do as many hands-on exercises as possible. It is also a good idea to learn the Scala language. Spark and Scala go well together.
In your opinion, how can Spark be made more appealing to younger generations, such as college students and recent graduates?
People who know Spark when they graduate will have premium value. So, that’s a big motivation for someone trying to get into the technology field.
Another thing that makes it easier is that Spark, right out of the box, supports three programming languages: Java, Scala, and Python. What that means is it covers a lot of ground for developers.
For example, if I'm a data scientist and I know Python, I can actually write my code in Python and it will work in Spark.
Do they need to have prior experience with coding?
Some previous coding experience will go a long way because Spark has a bit of a learning curve, and if you already have a programming background you can pick it up pretty quickly. But if you’re just trying to learn programming in Spark, that might be a tough climb.
Are there any new advancements or major updates being made with Spark?
Spark is a very fast-moving target, and the company behind it, Databricks, comes out with new versions of Spark every quarter. So, it's pretty hard to pinpoint the latest features because they keep changing. Every month there is something new. The best way to track Spark is through the Spark website; follow its news to see the latest features.
Do Spark and Hadoop work together? Are they similar in a way?
Right now, they are complementary. Spark works well with Hadoop. However, we are sort of seeing some people moving away from Hadoop to Spark because it’s a bit simpler and more efficient.
What “sparked” your interest in Spark?
My company, Elephant Scale, specializes in big data technologies; we do consulting and training. We have been working in the Hadoop, NoSQL, and cloud ecosystems. We started with Hadoop and did a lot of consulting and training around that. Then we noticed Spark, the new "kid on the block," and we wanted to be part of it and leverage it. So, we tracked its progress and saw a lot of interest from our clients.
How did you master the skills necessary to start working with Spark? Was it easy because you already worked with Hadoop or did you have trouble working with Spark?
My familiarity with Hadoop helped quite a bit in getting up to speed. I was learning Scala already, so that helped a bit, too. Spark works a little bit differently, but for me it was very easy to pick up, because we already had experience with Hadoop. Once I understood the core concepts, I tried to do as many hands-on exercises as I could.
You’re doing a workshop on April 24 at the IoT Stream Conference. Can the attendees expect to learn about some of the topics that we covered or can they expect to learn something different? How are you going to approach the workshop?
The IoT workshop is a one-day workshop, and Spark is too big to cover in one day. So, what I'd like to focus on is giving attendees a basic knowledge of Spark so they can go and explore it on their own. We'll focus on some of the fundamentals of Spark, and we will also cover Spark Streaming.
Is there any aspect of your career that you want to accomplish or start working on but haven’t yet?
We've been focused on being big data experts in Hadoop, and we've figured out Hadoop and Spark as technologies. As we know, in the technology world, something can replace Spark five years from now. So, we want to stay up to date with technology. We want to do better. From a company point of view, we want to expand.
On a personal level, I want to get a little more active on the startup scene and I want to do more mentoring and advising. I work with a couple of companies in the big data area and I want to do more.
Are there any accomplishments from your career that you’re most proud of?
On a professional level, I am happy with building up my company, Elephant Scale. We are a well-known boutique company in the big data space. I really learned a lot from the challenges we faced in creating a small business, a startup from scratch, and I'm still learning.
My co-founder, Mark Kerzner, and I wrote an open-source book on Hadoop in 2013.
That was a novel idea and the book was well received. In a sense, it helped us launch the company and got us some recognition.
On a personal level, I am a dad of two girls. They keep me plenty busy. It is enormously satisfying for me to be able to be part of their budding lives.
Do you have any words of advice for someone who is struggling in this career, second-guessing themselves or struggling to learn the necessary material?
A career in tech can be particularly rewarding. However, one thing a lot of people are not prepared for in this career is that you have to be constantly learning. For example, for the last couple of years we were doing something with Hadoop; now we're working with Spark, and in the next two or three years it will be something else.
It’s sort of like you’re on a treadmill and you just keep running.
But this might not be everyone's idea of a career. Some people want to learn once, get their degree, and then keep working.
If you want a career at the coding level, you need to be open to learning new technologies, and you have to keep spending time on new things to advance your career.