$QuickTip: Keep your stuff neat. Using VirtualENV…

VirtualENV is an absolute requirement if you’re planning to do any serious development or collaboration in python. Essentially you can use it to create a standalone python install for a single project. First, it clones itself off of your source python installation (by default but you specify another install if you want). Then you can pip install right to the version of python you want. No more fussing with packages, install directories or what have you. I’m not going to produce yet another VENV tutorial, so see the links below for a lesson:

http://docs.python-guide.org/en/latest/dev/virtualenvs/
http://hackercodex.com/guide/python-development-environment-on-mac-osx/

This is and oldie, and something obvious for anyone who uses python a lot. You’ll find that, many times, python installations conflict with one another and can cause a major headache. My personal best practice is as follows:

  1. Make sure you have a clean version of python (2.7.+) installed
  2. Install easy install (python setup-tools) and pip
  3. pip install numpy, scipy, scikit, and matplotlib
  4. NEVER touch this install again aside from two conditions, a.) you need to upgrade python, b.) you encounter a package that is a pain to install on a virtual environment (like scikit and matplotlib)
  5. create virtual environments on a project by project basis, pip install packages to these VENVs as needed.

 

 

Install and use FreeTDS, unixODBC and RODBC (or pyODBC)

Connecting to a MS SQL server. What an annoying process if you’re running Linux or Mac OSX. Looking past the fact that using MS SQL to me feels sort of odd these days connecting is non-the-less required. Ultimately our goal is to establish an ODBC connection to the database we can use to then leverage RODBC or pyODBC. Here I’ll cover the steps required to  install all the necessary parts.

Running on a Mac

I’m going to cover the Mac OSX operations first as they require one more crucial step. From here it will be easy to cover the linux setup as well.

Step 1: brew install unixodbc

This is, assuming you have homebrew installed. If you don’t there are plenty of guides out there to help with that. Note that homebrew dumps all its install files in /usr/local/Cellar. So when you install unixODBC you should see a folder appear in the cellar folder.

Step 2: brew install freetds --with-unixodbc

As you might have guessed this installs FreeTDS. The with argument is crucial as it sets up the links between FreeTDS and unixODBC, meaning that FreeTDS will know where to look for the ODBC drivers. If you don’t do this it will drive you a little crazy and you’ll be forced to move config files around a bit.

Step 3: Setup freetds.conf

Now we need to set up the config file for freetds (using homebrew found in /usr/local/Cellar/freetds/version.number/etc). Go to this location and open the file in the editor of your choice (I’m using VIM so vim freetds.conf)

The the config setup should look like:


[EXNAME]
host = destination ip
port = 1433
tds version = 7 (or 8 depending on your MS SQL server)

Input your information ass appropriate and close and save the file. (in vim hit i to enter insert mode. When done hit ESC to exit insert mode and then write/quit: :wq and hit enter)

The [EXNAME] represents the local DNS name we will establish for our destination server. You will see it referenced in the unixODBC config files and in connections strings.

Step 4: Test the TDS setup

To test the TDS setup we are going to try to connect to the server viz a stripped down SQL tool known as tsql. This comes with FreeTDS. To test the connection:

tsql -S EXNMAE -U MyUserName -P MyPassWord

Hit enter. You should see a tsql prompt:


locale is .....
locale charset is .....
using default....
1>

This means the connection was successful. To exit tsql: 1> exit

Step 4b: Some deeper testing

This is optional. If you want to actually try to get some data out of the DB before setting up unixODBC the easiest way is actually to pipe a small table into a local file. Note that this step requires knowing the database name, a table name and so on. Also pick a small table.


freebcp MyDatabaseName.dbo.MyTableName out ~/foo.test -c -t '|' -S EXNAME:1433 -U MyUserName -P MyPassWord

Now, if everything (including your DB access permissions) is working you should see data in the file ~/foo.test (try looking with less: less ~/foo.test )

Step 5: Setup unixODBC .ini files

Ok now we need to setup the unixODBC files. Before we leave the FreeTDS folder though we’ll want to note the location of the tds drivers, specifically libtdsodbc.so.

Check to make sure they are located in /usr/local/Cellar/freetsd/version.number/lib and note this filepath. In fact change directory to that location and check the permissions on the dirver la -la ./libtdsodbc.so and make sure it is user executable. If it isn’t make it (chmod to the number of your choice.

With this directory noted (you’ll need it later), change directory to the unixODBC install location and navigate to the .ini file location: /usr/local/Cellar/unixodbc/version.number/etc

You should see two files, odbc.ini and odbcinst.ini. Let’s start with odbcinst.ini because we will need to reference it in odbc.ini.

[FreeTDS]
Description = FreeTDS
Driver = /usr/local/Cellar/freetsd/version.number/lib/libtdsodbc.so
Setup= /usr/local/Cellar/freetsd/version.number/lib/libtdsodbc.so
UsageCount = 1

Now we will point odbc.ini to the [FreeTDS] object we just created.

[EXNAME]
Driver = FreeTDS
Description = ODBC INI FILE
ServerName = EXNAME
UID = MyUserName
PWD = MyPassWord

Running on Linux (Ubuntu 12.04 and 14.04 LTS)

Essentially everything is exactly the same. This filepaths will obviously be different as you won’t be using brew. One thing to point out is that ODBC.ini will be a little different.


[FreeTDS]
Description = FreeTDS
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup= /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
UsageCount = 1

Again, you will need to make sure to change the file permissions on both libtdsodbc.so and libtdsS.so to make them executable.

The end

You should now be able to connect with isql, isql DNS MyUserName MyPassWord and actually run some queries. Additionally, things like RODBC and pyODBC will now work file. If you run into any issues there is probably a typo somewhere or things are in the wrong place. To test run oslq, osql -S DNS -U MyUserName -P MyPassWord it will essentially tell you where you messed up.

Enjoy.

Using Hadoop: Moving data onto HDFS

1.) Use Hue (see http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/)

2.) scp file from local to cluster:scp foo.txt hadoop-user@192.168.190.128:

2a.) Then ude hdfs dfs -put (or -copyFromLocal)

3.) echo it via ssh (old school but so cool): you@host-machine$ cat somefile | ssh hadoop-user@vm-ip-addr \
"hadoop/bin/hadoop fs -put - destinationfile

cool man.

$QuickTip: BASH enviro vars. March 31 2014

Every few days (hopefully) I’m going to start to post QuickTips (the $ is a joke that we’ll pick up on in the very fist one). These will be all about random stuff.

Ok so tip number one. You’re working in the shell and let’s face it, your computer’s folder structure is, well, horrible. You know deep down that if you realized how long it has taken you to just cd to folders and cp files from one place to another you’d slip into a deep state of technological depression and revet to storing things in a filing cabinet. I mean who wants to have to type this cd /Users/me/myfolders/myfolders2/code/project1 (I hope you realize I’m exaggerating). No One!

The tip for today? Make custom environment variables for your frequently visited places:


user$ export MYPATH= /Users/me/myfolders/myfolders2/code/project1
user$ echo $MYPATH
user$ cd $MYPATH

Call all your environment variables with a ‘$’ (this is the joke, $QuickTip, get it?). Also one last thing of note is to remove a variable:

user$ unset $MYPATH

All for now!