Awesome article. Read it!
Organize your JSON objects:
Stop your REGEX headaches:
VirtualENV is an absolute requirement if you’re planning to do any serious development or collaboration in Python. Essentially, you can use it to create a standalone Python install for a single project. First, it clones itself off of your source Python installation (the default, though you can specify another install if you want). Then you can pip install right into the version of Python you want. No more fussing with packages, install directories or what have you. I’m not going to produce yet another VENV tutorial, so see the links below for a lesson:
This is an oldie, and something obvious for anyone who uses Python a lot. You’ll find that, many times, Python installations conflict with one another and can cause a major headache. My personal best practice is as follows:
Measure your disk usage and other vital cluster stats using:
hadoop dfsadmin -report
hdfs dfs -du -h /user/hive/warehouse
"((www\\.|(http|https|ftp|news|file)+\\:\\/\\/)[_.a-z0-9-]+\\.[a-z0-9\\/_:@=.+?,##%&~-]*[^.|\\'|\\# |!|\\(|?|,| |>|<|;|\\)])"
Use with any R regex function, with perl = T.
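For anyone doing the same job in Python rather than R, here is a rough sketch using the standard re module. The pattern is a simplified adaptation of the expression above, not a character-for-character port, and the sample sentence is made up for illustration.

```python
import re

# Simplified adaptation of the URL pattern above (an assumption, not an exact port):
# a "www." or scheme prefix, a host, a dot, then path characters, ending on a
# character that is not trailing punctuation or whitespace.
URL_PATTERN = re.compile(
    r"(?:www\.|(?:https?|ftp|news|file)://)"
    r"[_.a-z0-9-]+\.[a-z0-9/_:@=.+?,#%&~-]*"
    r"[^\s.,;!?()<>'#]",
    re.IGNORECASE,
)


def find_urls(text):
    """Return every substring of text that looks like a URL."""
    return URL_PATTERN.findall(text)


if __name__ == "__main__":
    # Made-up sample text; note the trailing comma and period are not captured.
    sample = "See http://example.com/path?x=1, or www.test.org."
    print(find_urls(sample))
```

The groups are non-capturing so that findall returns whole matches instead of the inner group.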
Connecting to an MS SQL server. What an annoying process if you’re running Linux or Mac OS X. Setting aside that using MS SQL feels sort of odd to me these days, connecting is nonetheless required. Ultimately, our goal is to establish an ODBC connection to the database that we can then use from RODBC or pyODBC. Here I’ll cover the steps required to install all the necessary parts.
I’m going to cover the Mac OS X operations first, as they require one more crucial step. From there it will be easy to cover the Linux setup as well.
brew install unixodbc
This assumes you have Homebrew installed. If you don’t, there are plenty of guides out there to help with that. Note that Homebrew dumps all its install files in
/usr/local/Cellar. So when you install unixODBC, you should see a folder appear in the Cellar folder.
brew install freetds --with-unixodbc
As you might have guessed, this installs FreeTDS. The --with-unixodbc argument is crucial, as it sets up the links between FreeTDS and unixODBC, meaning that FreeTDS will know where to look for the ODBC drivers. If you don’t do this, it will drive you a little crazy and you’ll be forced to move config files around a bit.
Step 3: Setup freetds.conf
Now we need to set up the config file for FreeTDS (with Homebrew, it is found in
/usr/local/Cellar/freetds/version.number/etc). Go to this location and open the file in the editor of your choice (I’m using VIM so
The config setup should look like:
host = destination ip
port = 1433
tds version = 7 (or 8 depending on your MS SQL server)
Input your information as appropriate, then save and close the file (in vim, hit
i to enter insert mode. When done, hit
ESC to exit insert mode and then write/quit with
:wq and hit enter).
[EXNAME] represents the local DNS name we will establish for our destination server. You will see it referenced in the unixODBC config files and in connection strings.
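If you would rather generate the entry than type it, here is a minimal Python sketch that renders one server entry with the standard configparser module. It assumes the entry is headed by the bracketed name described above. The function name and the IP address are made up for illustration, and the default TDS version simply follows the value shown earlier.

```python
import configparser
import io


def render_freetds_entry(name, host, port=1433, tds_version="7"):
    """Return the text of one freetds.conf server entry (illustrative helper)."""
    config = configparser.ConfigParser()
    config[name] = {
        "host": host,
        "port": str(port),
        "tds version": tds_version,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # 10.0.0.5 is a placeholder address, not a real server.
    print(render_freetds_entry("EXNAME", "10.0.0.5"))
```

The output can be pasted into freetds.conf by hand, which keeps the rest of the file untouched.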
Step 4: Test the TDS setup
To test the TDS setup, we are going to try to connect to the server via a stripped-down SQL tool known as tsql. This comes with FreeTDS. To test the connection:
tsql -S EXNAME -U MyUserName -P MyPassWord
Hit enter. You should see a tsql prompt:
locale is .....
locale charset is .....
This means the connection was successful. To exit tsql:
Step 4b: Some deeper testing
This is optional. If you want to actually try to get some data out of the DB before setting up unixODBC the easiest way is actually to pipe a small table into a local file. Note that this step requires knowing the database name, a table name and so on. Also pick a small table.
freebcp MyDatabaseName.dbo.MyTableName out ~/foo.test -c -t '|' -S EXNAME:1433 -U MyUserName -P MyPassWord
Now, if everything (including your DB access permissions) is working you should see data in the file
~/foo.test (try looking with less:
less ~/foo.test )
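Once the file exists, you can also check it from Python. The sketch below assumes the file is plain text with '|' between fields, as requested by the -t '|' flag above. The helper name and the sample rows are invented for illustration.

```python
import csv
import io


def read_pipe_delimited(handle):
    """Parse '|'-separated text into a list of rows (each row a list of strings)."""
    # QUOTE_NONE: treat quote characters as ordinary data, since the export
    # is assumed not to quote its fields.
    return list(csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE))


if __name__ == "__main__":
    # Made-up rows standing in for the exported table.
    sample = io.StringIO("1|alice|2015-01-01\n2|bob|2015-01-02\n")
    for row in read_pipe_delimited(sample):
        print(row)
```

For the real file, pass an open handle instead of the sample, for example open(os.path.expanduser("~/foo.test"), newline="").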
Step 5: Setup unixODBC .ini files
Ok, now we need to set up the unixODBC files. Before we leave the FreeTDS folder, though, we’ll want to note the location of the TDS drivers, specifically
Check to make sure they are located in
/usr/local/Cellar/freetds/version.number/lib and note this filepath. In fact, change directory to that location and check the permissions on the driver with
ls -la ./libtdsodbc.so and make sure it is user executable. If it isn’t, make it (
chmod to the number of your choice.
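The same permission check can be sketched in Python with the standard os and stat modules. Both function names are made up for illustration. The second one only adds the owner-execute bit, which is one possible choice of "number" among many.

```python
import os
import stat


def is_user_executable(path):
    """Return True if the owner-execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def make_user_executable(path):
    """Add the owner-execute bit, leaving the other permission bits alone."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


if __name__ == "__main__":
    driver = "./libtdsodbc.so"  # run from the lib directory noted above
    if os.path.exists(driver) and not is_user_executable(driver):
        make_user_executable(driver)
```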
With this directory noted (you’ll need it later), change directory to the unixODBC install location and navigate to the
.ini file location:
You should see two files,
odbcinst.ini. Let’s start with
odbcinst.ini because we will need to reference it in
Description = FreeTDS
Driver = /usr/local/Cellar/freetds/version.number/lib/libtdsodbc.so
UsageCount = 1
Now we will point
odbc.ini to the
[FreeTDS] object we just created.
Driver = FreeTDS
Description = ODBC INI FILE
ServerName = EXNAME
UID = MyUserName
PWD = MyPassWord
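To see how the two files fit together, here is a hedged Python sketch that renders both with configparser. The [FreeTDS] section name comes from the text above. The odbc.ini section name MYDSN, the driver path, and the helper name are placeholders of mine, so substitute your own.

```python
import configparser
import io


def render_ini(sections):
    """Render {section: {key: value}} as ini text, preserving key case."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep "Driver", "UID", etc. from being lowercased
    config.read_dict(sections)
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # odbcinst.ini: registers the driver under the name "FreeTDS".
    print(render_ini({"FreeTDS": {
        "Description": "FreeTDS",
        "Driver": "/path/to/libtdsodbc.so",  # placeholder path
        "UsageCount": "1",
    }}))
    # odbc.ini: a data source ("MYDSN" is hypothetical) that points at that driver.
    print(render_ini({"MYDSN": {
        "Driver": "FreeTDS",
        "Description": "ODBC INI FILE",
        "ServerName": "EXNAME",
        "UID": "MyUserName",
        "PWD": "MyPassWord",
    }}))
```

The Driver value in odbc.ini has to match the section name in odbcinst.ini, which is the link the text describes.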
On Linux, essentially everything is exactly the same. The file paths will obviously be different, as you won’t be using brew. One thing to point out is that ODBC.ini will be a little different.
Description = FreeTDS
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
UsageCount = 1
Again, you will need to make sure to change the file permissions on both libtdsodbc.so and libtdsS.so to make them executable.
You should now be able to connect with isql,
isql DSN MyUserName MyPassWord and actually run some queries (DSN being the data source name defined in odbc.ini). Additionally, things like RODBC and pyODBC will now work fine. If you run into any issues, there is probably a typo somewhere or things are in the wrong place. To test, run osql,
osql -S DSN -U MyUserName -P MyPassWord and it will essentially tell you where you messed up.
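On the Python side, a connection through the new data source might look roughly like the sketch below. It assumes pyodbc is installed and that your data source name is the one you defined in odbc.ini. The function names and the SELECT 1 query are placeholders for illustration.

```python
def build_connection_string(dsn, user, password):
    """Build an ODBC connection string that refers to a data source by name."""
    return "DSN={};UID={};PWD={}".format(dsn, user, password)


def fetch_one(dsn, user, password, query="SELECT 1"):
    """Run a query through pyodbc and return the first row."""
    # Third-party package; imported lazily so the helper above works without it.
    import pyodbc

    connection = pyodbc.connect(build_connection_string(dsn, user, password))
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchone()
    finally:
        connection.close()


if __name__ == "__main__":
    # Prints the string only; call fetch_one(...) once your DSN is in place.
    print(build_connection_string("MYDSN", "MyUserName", "MyPassWord"))
```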
1.) Use Hue (see http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/)
2.) scp file from local to cluster:
scp foo.txt email@example.com:
2a.) Then use hdfs dfs -put (or -copyFromLocal)
3.) echo it via ssh (old school but so cool):
you@host-machine$ cat somefile | ssh hadoop-user@vm-ip-addr \
"hadoop/bin/hadoop fs -put - destinationfile"
Every few days (hopefully) I’m going to start posting QuickTips (the $ is a joke that we’ll pick up on in the very first one). These will be all about random stuff.
Ok so tip number one. You’re working in the shell and let’s face it, your computer’s folder structure is, well, horrible. You know deep down that if you realized how long it has taken you to just
cd to folders and
cp files from one place to another you’d slip into a deep state of technological depression and revert to storing things in a filing cabinet. I mean, who wants to have to type this:
cd /Users/me/myfolders/myfolders2/code/project1 (I hope you realize I’m exaggerating). No one!
The tip for today? Make custom environment variables for your frequently visited places:
user$ export MYPATH=/Users/me/myfolders/myfolders2/code/project1
user$ echo $MYPATH
user$ cd $MYPATH
Call all your environment variables with a ‘$’ (this is the joke, $QuickTip, get it?). One last thing of note: to remove a variable, pass its name without the ‘$’:
user$ unset MYPATH
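The same shortcut idea carries over to Python scripts through os.environ. This is only a sketch: the helper names are made up, and changes made this way affect the running Python process and its children, not the shell that launched it.

```python
import os


def set_shortcut(name, path):
    """Like `export NAME=path`, but only for this process and its children."""
    os.environ[name] = path


def expand(text):
    """Expand $NAME references in text, as the shell would for `cd $NAME`."""
    return os.path.expandvars(text)


def clear_shortcut(name):
    """Like `unset NAME`; does nothing if the variable is not set."""
    os.environ.pop(name, None)


if __name__ == "__main__":
    set_shortcut("MYPATH", "/Users/me/myfolders/myfolders2/code/project1")
    print(expand("$MYPATH/data"))
    clear_shortcut("MYPATH")
```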
All for now!