The Hadoop of Happiness: An Interview with Big Data Architect Mark Kerzner

POSTED 07/31/2014
By Jordyn Lee

Mark Kerzner is an author, a trainer, a member of Mensa, and fluent in many languages. In our interview with him, however, we decided to focus primarily on big data and Apache Hadoop, open-source software that allows for the distributed processing of large data sets across clusters of computers.1

Mark Kerzner has developed software for over 20 years in a variety of technologies (enterprise, web, high performance computing) and for a variety of verticals (healthcare, obstetrics and gynaecology, legal, financial). He currently focuses on Apache Hadoop (commonly referred to as just Hadoop), Not Only SQL (NoSQL), and Amazon Cloud Services. Mark also conducts Hadoop training for individuals and corporations.

Most of us are new to Hadoop. What is it?

Hadoop is the “glue” that joins a number of computers and makes them work together as a Hadoop cluster. You write a task in Java or another language, and Hadoop distributes the task to every computer in your cluster. Now you can store and process practically unlimited amounts of data.
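
Editor's illustration: the idea of writing one task and having it applied to many pieces of data can be sketched with a word count in the map, shuffle, and reduce style that Hadoop is known for. This is a single-process simulation in plain Python, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values collected for one key into a single count."""
    return key, sum(values)


def word_count(chunks):
    # On a real cluster each chunk could be mapped on a different machine;
    # here the "machines" are just iterations within one process.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: two chunks standing in for two blocks of a file.
    print(word_count(["big data big clusters", "data on clusters"]))
```

Because each chunk is mapped independently and each key is reduced independently, the same task can in principle be spread across as many machines as there are chunks and keys, which is the property the answer above describes.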

Many people are trying to figure out their careers. What would you say to someone considering learning the skills to become a Hadoop programmer?

Mike Tuchen, the CEO of Talend, once said that as soon as a developer learns Hadoop, his salary doubles. I’d like to add something to that: And then the developer leaves to work for a better company. That’s a bit of a joke. But it’s true. The days of using just one computer are long gone. It just makes sense to program on a cluster. And Hadoop is the de facto standard for cluster programming, which is what makes Hadoop programmers so valuable.

What is the most effective way to learn Hadoop?

I suggest learning it the hard way. By that I mean start by doing. Don’t copy and paste instructions; instead, type them all in. Start with an overview such as Hadoop Illuminated,2 a free book that I co-wrote with my colleague, Sujee Maniyam. Then construct your own clusters on Amazon Cloud Services. Run some sample code. Then write your own code. Try to start a project on GitHub or another hosting service.

If you want a Hadoop-based project to learn from, one that uses most of the technologies a programmer would need to start with, you can use my FreeEed.3 It’s an electronic discovery project for lawyers. It enables them to search for documents that are relevant to a lawsuit, but people other than lawyers can use it too. It took me three years to create, and now it’s getting popular, which is great. Since it’s open source, it found its way into places where closed commercial products (by that I mean proprietary products) don’t venture.

Mark Kerzner uses a popcorn metaphor to explain how FreeEed relates to electronic discovery for attorneys. He says that a corn is a lawsuit and FreeEed is a popcorn maker.

How did you gain the skills you need to be successful in your current job?

I started in Hadoop by doing a legal project that outgrew one computer, so I looked around and found Hadoop, and it went on from there. Soon after that, big data became even more enticing to me.

But I gained my skills in Hadoop by failing. I was building a big Hadoop cluster, and it was taking too long, and I got fired. Since then, I’ve counted all of the clusters I’ve built, and it’s now over 75.

What does your typical day look like?

Around 8 a.m. I have my first scrum meeting with the financial startup that I’m working for. From then on it’s non-stop coding and meetings and email until about midnight. Not that I work all of the time though. I take breaks, including a ninety-minute workout each day except Saturday. On Saturday I rest.

What is your greatest accomplishment?

I have two. Creating FreeEed. And bringing the Houston Hadoop Meetup4 from zero to 330 people, which took me four years.

Do you spend much of your time acquiring new skills?

Absolutely. I like teaching courses about topics that I didn’t previously know about—that way I get to learn, write, and deliver.

What’s your opinion of using professional certificates or a degree to get ahead with Hadoop?

I definitely recommend it. Not because it guarantees that you’ll have the knowledge, but because it shows potential employers that you spent a lot of time and effort, and maybe money too, to further your skills. As an example, later today I’m taking a Cloudera Certified Administrator test5 so that I can be eligible to teach Hadoop to a certain company. Certifications and courses help, especially today when there seem to be few agreed-upon standards and not all that much practical experience.

What trends in Hadoop are you excited about?

When I teach, I usually make a prediction that within 30 days something new and unexpected will be announced in Hadoop. And I’m often right. One time it was Intel making a major investment in Cloudera.6 Another time it was secure Hadoop with Cloudera Sentry. There are frequently new things coming out, such as University of California, Berkeley’s Spark and Shark for real-time in-memory Hadoop,7 and that’s where a lot of attention is concentrated right now: real-time Hadoop.

If you could go back five years, what would you do differently? Or put another way, what would you tell a teenager who wants to be like you when they grow up?

Once I kept 46 Amazon servers up for 4 days, forgetting to shut them down. That’s probably my worst mistake, and if I could go back in time I would definitely not make that mistake again.

In terms of advice, everything you do is always harder than you think, so if you know how much work it will be up front, you probably won’t do it. So my advice is, just look at the day ahead of you and concentrate on that one task that you want to accomplish. Once in a while, glimpse ahead to the complete road in front of you, but don’t plan too far ahead.

Working hard is a given, at least for me, and maybe some other people are luckier, but I haven’t met anyone who has succeeded by not trying.

What are your three favorite work tools? Why?

I happen to like NetBeans as an integrated development environment (IDE), and although both Eclipse and IntelliJ IDEA are very close, I prefer NetBeans. I also cannot imagine developing without Git and Maven.

Who do you follow for inspiration and learning material?

I used to read around 300 RSS entries a day, but now that Google Reader is gone, I don’t feel the void. News percolates, and I absorb it by osmosis through friends. I follow a few people on Twitter, but one guy who always writes well is Matt Asay.8

What’s next for you?

There are two big things that I’m focused on. First, it’s constant learning. Our training practice forces us to learn non-stop. Our customers demand new courses. Sometimes they want courses that are customized to what they already know. And sometimes they want us to teach them something that they don’t know. So we have to dig up and deliver the new information to them. This keeps us on our toes, and it’s hard, but it’s worth it.

Second, and I can’t say too much about this, we’ve just been awarded a grant by a government defense agency to help them find the bad guys on the internet. We’re going to be using our product, FreeEed, which is exciting.

So, learning and products that implement our passion—that’s the future.

Anything else you’d like to share with those following in your footsteps?

When somebody asks if you can do something, first say yes, and then think of how you’re going to do it. Try it. See if it works. If it doesn’t, keep trying. And then teach everyone around you the same approach. But you have to do it first, to prove the point and to be the example. Then it becomes contagious.

Thank you, Mark Kerzner, for taking the time to share your experience and insight. Those who wish to follow up with Mark can connect with him via LinkedIn.

Footnotes
  1. What is Apache Hadoop?
  2. Mark Kerzner and Sujee Maniyam’s book, Hadoop Illuminated, is available free from hadoopilluminated.com.
  3. Mark Kerzner’s FreeEed is available free from github.com/markkerzner/FreeEed and freeeed.org/index.php/download.
  4. Mark Kerzner is one of the organizers of the Houston Hadoop Meetup Group.
  5. Search for Cloudera Hadoop courses on SkilledUp.
  6. For more info, refer to “Intel’s $740 Million Stake Puts Cloudera at $4.1 Billion,” by Serena Saitto, March 31, 2014, Bloomberg.
  7. For more info, refer to shark.cs.berkeley.edu.
  8. Matt Asay is VP of Marketing and Corporate Strategy at MongoDB, and a columnist for ReadWrite: Twitter.com/mjasay.