Tech Journey

MySQL Sqoop Import Fails on HDP 2.3


It looks like the SQOOP-1400 error has not been resolved in the HDP 2.3 distribution, which ships with Sqoop 1.4.6.

I tried to run a Sqoop MySQL import and got the exact same error.
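For reference, this is the shape of the import I was running (the host, database, credentials, and table below are placeholders, not my actual values):

sqoop import \
  --connect jdbc:mysql://mysqlhost:3306/testdb \
  --username sqoopuser -P \
  --table employees \
  --target-dir /user/sqoop/employees \
  -m 1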

Looking under the covers on both RHEL 6.7 and CentOS 6.7, I see the same problem persists.

The Sqoop library directory is /usr/hdp/current/sqoop-server/lib.

The MySQL connector jar there points to a symbolic link in /usr/share/java.
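You can see what the link currently resolves to with a quick listing (the exact target version on your box may differ):

ls -l /usr/hdp/current/sqoop-server/lib/mysql-connector-java.jar
ls -l /usr/share/java/mysql-connector-java.jar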


So now to the important question: how to fix it.

Step 1 - Download the MySQL connector. The current platform-independent version is 5.1.36:

https://dev.mysql.com/downloads/connector/j/

Step 2 - Untar the file and copy the jar to /usr/share/java.
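Roughly, the commands look like this; grab the exact download URL from the Connector/J page above, as the one below just reflects the pattern at the time of writing:

cd /tmp
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.36.tar.gz
tar -xzf mysql-connector-java-5.1.36.tar.gz
cp mysql-connector-java-5.1.36/mysql-connector-java-5.1.36-bin.jar /usr/share/java/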

Step 3 - Re-point the soft link that currently targets the older version:

[root@ash java]# ln -nsf mysql-connector-java-5.1.36-bin.jar mysql-connector-java.jar
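A quick ls -l afterwards should confirm that mysql-connector-java.jar now points to the 5.1.36 jar:

[root@ash java]# ls -l mysql-connector-java.jar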

Your MySQL Sqoop import should work fine now.

I would still like to know if I missed something in the installation or configuration, since the JIRA appears to be resolved and fixed.

TOAD for Hortonworks 2.3

TOAD is a tool that most folks who have done data analysis and database development are familiar with.
With the advent of Big Data technologies, some of that analysis has shifted to the Hadoop space.
So I was really excited when Toad for Hadoop became available for download. For a data analyst, this means not having to worry about what you are connecting to (a relational Oracle DB vs. the Hive metastore on Hadoop): you use the familiar TOAD interface to run SQL (or HQL, for the purists) against Hadoop.

The environment I have is a Hortonworks HDP 2.3 platform on CentOS 6.7. Officially, TOAD does not support HDP, but Hortonworks being a pure Hadoop platform with no customizations, it should work nevertheless.

Brad Wulf (@bradwulf) has a detailed post about how to set up Toad against HDP:

http://www.toadworld.com/products/toad-for-hadoop/b/weblog/archive/2015/07/24/toad-for-hadoop-does-39-nt-officially-support-hortonworks-here-39-s-how-to-connect-to-it


But I intend to go one level deeper, checking every config and port so you know what to look for in case you get an error, which is pretty much what I had to go through to get this up and running.

Step 1 : Verify your Environment 

Step 2 : Ensure the cluster is up and running by launching Ambari. Do this from the same machine you will run TOAD on; it confirms that your cluster is accessible from your machine and that no firewall issues exist. (Hint - check the Resource Manager and Job History UIs on ports 8088 and 19888 as well; see the quick probes below.)
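If you prefer to rule out firewall issues from the command line, a couple of curl probes from the TOAD machine work too (replace the hostnames with your own; an HTTP 200 back means the UI is reachable):

curl -s -o /dev/null -w "%{http_code}\n" http://resourcemanager-host:8088/cluster
curl -s -o /dev/null -w "%{http_code}\n" http://historyserver-host:19888/jobhistory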


Step 3 : Verify the Resource Manager port. This is the only setting that differs when configuring the TOAD ports. The default configuration sometimes picks port 8032, whereas 8088 is the default for Hortonworks. This can be verified from Ambari: go to YARN Service > Configurations > Advanced yarn-site > yarn.resourcemanager.webapp.address.
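If you have shell access to a cluster node, the same value can be read straight out of yarn-site.xml (assuming the standard HDP client config location):

grep -A 1 'yarn.resourcemanager.webapp.address' /etc/hadoop/conf/yarn-site.xml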

Step 4 : Launch the TOAD application and go through the configuration steps.

Once you get all the green checks, you are ready to roll.

You can set the execution engine to MapReduce or Tez using the commands below. This applies at the session level and will override what is set in Ambari. With the Tez engine proven to run faster in many cases, I use it by default and do see a difference in elapsed time. On the right-hand side you can see the properties of each query, including the engine it ran on.

set hive.execution.engine=mr;
set hive.execution.engine=tez;
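The same session-level override works from any Hive client, not just TOAD. For example, from a shell via beeline (the hostname is a placeholder; sample_07 is one of the demo tables that ships with the HDP sandbox):

beeline -u jdbc:hive2://hiveserver-host:10000/default \
  -e "set hive.execution.engine=tez; select count(*) from sample_07;"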

I love the IntelliSense-style help that TOAD gives me.

Happy HQL’ing :)

I will play with the data transfer next and post my notes.

My CBIP Experience

I got the opportunity to attend TDWI in Chicago in May 2015. The plan was finalized at the end of March (roughly 45 days prior), and that is when I decided to jump on the CBIP bandwagon.
Motivation
Status quo. Yes, this was my biggest motivation. Our daily jobs keep us pretty busy doing what we know best and trying to do it faster. Once you have spent a few years in the same technology, you tend to get good at it, and that puts you in a false sense of security. Between meeting deadlines and making personal time, the thing you give up most is personal development.
For me it was a self-evaluation process: to see where I stand, and whether I could commit myself to learning and passing an exam.
Why CBIP
TWO WORDS - Technology Agnostic.
There is an endless list of tool-based certifications in the data warehousing / business intelligence field offered by industry leaders like Microsoft, IBM, and Informatica, and every certification has its merits. The CBIP certification does not focus on any specific tool, but on the fundamentals of BI and DW along with the IS Core basics. In my opinion, that makes it the better signal: as a hiring manager, I would know the candidate has strong fundamentals and can quickly pick up any tool thrown at him/her.
Without much ado, I will go into what it took to get the certification and what it meant for me.
Preparation Time
Depends. Really :)
If your background is BI and DW with a few years under your belt, I would say 6-8 weeks at 7-8 hours per week. What I would suggest is to spend a week going through the syllabus on the TDWI website, google the topics, and see whether you feel comfortable. Each exam has 6 to 7 main topics; if you are familiar with them, it is safe to say that with some prep work you will be able to make it. If it all seems very alien, you will want to spend more time. Well, that is true for any exam, isn't it :)
Preparation Material
CBIP Exam Guide - An absolute must; it sets the context and the topics. You get it for free if you attend the TDWI CBIP Exam Preparation session, generally held on Day 1. However, if you plan to take the exams during a TDWI conference, order it beforehand; it costs $125.
IS Core - The hardest exam of the three, simply because the subject is too broad to "study". I strongly recommend investing in the study material offered by DAMA for IS Core. It gives you a concise guide to the topics and sets boundaries. If you want to dive deep into any specific topic, Google is your friend.

I started off on my own without the study guide, but soon realized the topic is too vast to really "study" from an exam standpoint.
Data Warehousing Core - I found this exam pretty straightforward, and I would think everyone who takes the CBIP feels the same way, since the intended audience definitely has a DW background. I skimmed The Data Warehouse Toolkit by Kimball; there are a lot of keywords in that book you should be familiar with. The fact of the matter is, this is an exam: there are plenty of DW concepts you practice daily without ever using the "terminology", and this book will familiarize you with all of it.
Business Intelligence Analytics - The specialty exam. I had hoped this would be an easy topic for me, but I was wrong. I really had to spend time preparing for it: lots of concepts from statistics, analytics, and so on.
Books for Reference
Business Intelligence: The Savvy Manager's Guide by David Loshin (Morgan Kaufmann)
Quantitative Methods for Business by David R. Anderson
Preparing for the Analytics exam took me to Wikipedia countless times, reading and understanding a ton of statistical terms.
CBIP Exam Preparation class at TDWI - A good finishing touch to your study. Typically held on Day 1 of any TDWI conference, it gives you a game-day prep.
Taking the Exam
You have the option of taking the exams proctored offsite or as part of a TDWI conference. I chose the latter. Reason 1: the exams were $50 cheaper. Reason 2: in the scenario that you don't clear, you get an extra attempt at one exam, which is a feel-good cushion. Lastly, the CBIP class I mentioned above really does help you judge where you stand.
Finally, a thank-you to the blogs that gave me good ideas along the way.
My Final Thoughts
Personally, the exam preparation helped me understand what I do not know, and it was humbling to say the least. It gave me a better perspective of the BI landscape and helped me delve into and read about topics I never even knew existed.
I know a lot of people would ask: would this certification guarantee a job? I don't know. However, I am sure it would help you differentiate yourself and, at the least, take your resume to the top of the pile. All that learning would also help you articulate BI better, and maybe make you appear a little smarter :)

PS - I will not discuss any specific questions, as that would be against the policy.

Hello

I have been thinking for a long time about setting up a blog where I can share what I have learned in my tech world and interact with like-minded folks in the community. So, better late than never, as I start my journey into the Hadoop world (technically a few months back... oh well!).

Coming from a warehousing/BI background, with SQL drilled into your brain, it is a bit unnerving to step into the open source world.

Through this blog I hope to document what I learn as I move forward and share my thoughts.