Always a Student, Always Learning

Lessons from the martial arts world

There's a saying that jiu-jitsu practitioners repeat now and then: "a black belt is just a white belt that never stopped learning." That rings very true for me and the way I approach life. I don't claim to be an expert in anything. I am a lifelong student and will try my best to keep learning day by day.

But what does this really mean? Am I constantly just learning new things and never polishing my expertise in anything? Well, no. I mean that there will always be something to learn as long as you allow yourself to learn. Sticking with the martial arts theme, there is also the phrase "empty your cup": set aside what you think you already know so there is room for something new. If you consider yourself an expert in something, do you just sit back and stop learning about your field? No. Always look to innovate and always look to improve.

Teaching is also learning

Katie Cunningham gave a great closing talk at PyGotham this year. Hopefully the video will be posted soon; it was truly wonderful. My takeaways from it, though, mostly reinforced what I've already been trying to do: speak more and teach more. I'm quite the introvert, so I tend to shy away from conversation. But I challenge myself to speak to new people and to speak more at conferences and meetups. For an introvert, this is extremely scary. Beyond speaking more, I am also trying to be a better teacher. Okay, not your typical classroom teacher, but a teacher in the sense of transferring knowledge to someone else. For instance, at work we want more cross-functional teams and more open collaboration, so any of my work should be easy for any of my teammates to pick up. For that to happen, I need to be a good teacher, not one who just says, "here's some code, RTFM now!"

I've found myself actively seeking opportunities to improve my teaching skills. Whether it's a quick chat with a colleague about what's new in tech or just being a good role model for my kids, I aim for easy, relaxed conversations that everyone can enjoy, with no fear and no pressure.

Phrasing matters! One thing I've taken extra care with is the way I converse with others. For instance, publicly shaming someone is never a good idea. "This piece of code is wrong" versus "Can you tell me how this piece of code works? I think something looks odd here" is a good example of how different word choices can make you sound more amiable.

Student of life

Knowing that there will always be something new to learn is reassuring to me. That is exactly why I enjoy being an engineer: technology is always changing, and I am there learning to keep up. But of course, your occupation isn't the only part of your life where you'll constantly learn; growing up is really just a continuous learning process. Keep learning and tuning those dials. You, too, are a student of life, learning all sorts of new things daily.

By Adrian Cruz | Published July 31, 2016, 5:02 a.m. | tags: engineering advice, pygotham

Writing Out Files & Python UnicodeEncodeError Woes

A very common headache that I am sure every engineer has had to face at least once is character encoding. Oh yes, that fun topic! No, I do not have a solution for everyone. Sorry! But if you're in the Python world like I am and you keep getting UnicodeEncodeErrors when writing out files, try the approach below.

tl;dr Show me an example!

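import codecs
import json

# Python 2: json.dumps()'s encoding argument says how to decode any byte
# strings (str) inside data; it defaults to UTF-8
with codecs.open(filename, 'w', encoding='latin-1') as outfile:
    outfile.write('{}\n'.format(json.dumps(data, encoding='latin-1')))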

So, what? And why?

Okay, so I am writing a latin-1 encoded file. Yes, latin-1. Why? Well, I've chosen latin-1 here because latin-1 was giving me issues, so there! But really, if you need a different encoding, just swap it in.

The longer explanation, if you're curious, is that I am reading in latin-1 encoded data, and it was making my data ingestion jobs fail because I default to UTF-8. You don't need the json.dumps() bit if you aren't working with JSON (obviously!). But if you are writing JSON, you also need to pass the same encoding to json.dumps(). It is currently on my TODO list to see why that is the case.
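For comparison, here is a minimal Python 3 sketch of the same idea. In Python 3 the built-in open() accepts an encoding (and an errors policy) directly, and json.dumps() has no encoding argument at all; in Python 2 that argument told json how to decode byte strings inside data, defaulting to UTF-8, which is possibly why it had to match here. The helper name write_json_line() and the sample data are hypothetical. The sketch shows a latin-1 write succeeding, the same write raising UnicodeEncodeError for a character latin-1 cannot represent (the euro sign), and errors="replace" swapping that character for "?".

```python
import json
import os
import tempfile


def write_json_line(path, data, encoding="latin-1", errors="strict"):
    """Hypothetical helper: write data as one line of JSON in `encoding`."""
    with open(path, "w", encoding=encoding, errors=errors) as outfile:
        # ensure_ascii=False writes "é" as-is instead of "\u00e9", so the
        # file encoding (not JSON escaping) decides what ends up on disk
        outfile.write(json.dumps(data, ensure_ascii=False) + "\n")


out_dir = tempfile.mkdtemp()  # throwaway directory for this demo
cafe_path = os.path.join(out_dir, "cafe.json")
euro_path = os.path.join(out_dir, "euro.json")

write_json_line(cafe_path, {"name": "café"})  # "é" exists in latin-1

bad_char = None
try:
    write_json_line(euro_path, {"price": "5€"})  # "€" does not
except UnicodeEncodeError as exc:
    bad_char = exc.object[exc.start:exc.end]
    print("cannot encode:", ascii(bad_char))

# errors="replace" trades the exception for a "?" in the output
write_json_line(euro_path, {"price": "5€"}, errors="replace")
```

The ensure_ascii=False flag is the design choice that matters: with the default ensure_ascii=True, json.dumps() escapes every non-ASCII character, so the file's encoding would never actually be exercised.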

By Adrian Cruz | Published March 21, 2016, 9:50 p.m. | tags: python

Intro to Building Out Data Pipelines With Python and Luigi

A question I've been getting asked a lot is, "Luigi? What's that?" My usual short answer is that it's a project open sourced by Spotify to facilitate workflows and the dependencies between jobs. But to quote the Luigi ReadTheDocs page:

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.

Luigi Tasks & Targets

The main pieces of Luigi are built around a Task and a Target. A Task is exactly what it sounds like: a single step in your data pipeline. For example, you may have a task that reads in a CSV file and pulls out the specific values you want from it. The Target is the intended output of your Task. So, back to our CSV example, the output() Target for that task may be a cleaned-up CSV file containing the values you wanted.

Here Is Our Example Task

import luigi


class MyTask(luigi.Task):
    def output(self):
        # Luigi considers this task complete once this file exists
        return luigi.LocalTarget('my_output.csv')

    def run(self):
        cat_count = 0
        with open('input.csv', 'r') as in_file:
            for line in in_file:
                # each row looks like: animal,age,color
                animal, age, color = line.strip().split(',')
                if animal == 'cat':
                    cat_count += 1

        with self.output().open('w') as out_file:
            # write() expects a string, not an int
            out_file.write('{}\n'.format(cat_count))

That is a dumb-simple Task with no requirements. All it does is read in a file, count the cats in it, and write that count out to another file.

One thing to note is the importance of output(). Luigi checks whether the output() Target exists to decide whether this task is complete; that is Luigi's definition of complete. You can also override complete() if your task has no output, but for now, just assume every task needs an output.
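To make that definition of "complete" concrete, here is a toy, stdlib-only model of it. This is not Luigi's API or implementation: the ToyTask class, its output_path attribute, and the run_if_needed() helper are hypothetical names. The only behavior it mimics is that a task counts as complete once its output exists, so asking it to run again is a no-op.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a Luigi task: complete == output exists."""

    def __init__(self, output_path):
        self.output_path = output_path
        self.runs = 0  # how many times run() actually executed

    def complete(self):
        # models Luigi's default check: does the output target exist?
        return os.path.exists(self.output_path)

    def run(self):
        self.runs += 1
        with open(self.output_path, "w") as out_file:
            out_file.write("done\n")


def run_if_needed(task):
    """Run the task only if it is not already complete."""
    if task.complete():
        return False
    task.run()
    return True


out_dir = tempfile.mkdtemp()  # throwaway directory for this demo
task = ToyTask(os.path.join(out_dir, "my_output.csv"))
print(run_if_needed(task))  # True: no output yet, so it runs
print(run_if_needed(task))  # False: output exists, so it is skipped
```

This is also why deleting a task's output is the usual way to make it run again.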

"So, what is this good for?"

One big thing I purposely left out when describing Luigi is that it integrates really well with the Hadoop ecosystem! So let's take a step back and think about how we would run these batch data processing jobs without Luigi...

Let's say several jobs need to finish before the overall task can be considered complete. For example, maybe we need to process some data found in log files. The log files are currently stored in S3, so we'll have a job to fetch them locally. Maybe the logs need some cleanup, so we'll apply whatever filtering and other transformations are needed to cleanse the data. Next, we'll load the newly formatted logs into HDFS. After that, we can use Hive, so we'll want to create a table over those logs.

Data jobs in the past

In the past, we would have run each of these steps as its own cron job. But wait: depending on how much data we're working with, these jobs take varying amounts of time to complete! So the best you can do is observe how long they run and schedule them accordingly. For example, we know the data is pulled down from S3 within ~20 minutes, so we'll schedule the next job 30 minutes later, and do the same for the remaining jobs.

Now, with Luigi

With Luigi, you can easily create dependency chains between jobs by overriding the requires() method. Say you want your entire process to run in this order: Task0->Task1->Task2->Task3. You can now say that Task3 requires Task2, Task2 requires Task1, and so on. For Task3, that looks like the following:

import luigi


class Task3(luigi.Task):
    def requires(self):
        # Task3 only runs once Task2 is complete; Task2 in turn
        # requires Task1, and so on down the chain
        return [Task2()]

    # Other core code would go here as well, like run() and output()...

Now you have one single point to schedule and no need to guess when each job runs! Pretty neat, right? :)
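The scheduling idea behind requires() can be modeled with a toy, stdlib-only resolver. This is a sketch of the concept, not how Luigi is implemented: the ToyTask class, its name/deps/done fields, and build() are hypothetical. It walks requires() depth-first and skips anything already complete, so scheduling only Task3 still runs Task0 through Task3 in order.

```python
class ToyTask:
    """Hypothetical task: a name, its dependencies, and a done flag."""

    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)
        self.done = False

    def requires(self):
        return self.deps

    def complete(self):
        return self.done

    def run(self, log):
        log.append(self.name)
        self.done = True


def build(task, log):
    """Run a task's dependencies first (depth-first), then the task itself."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run(log)


task0 = ToyTask("Task0")
task1 = ToyTask("Task1", [task0])
task2 = ToyTask("Task2", [task1])
task3 = ToyTask("Task3", [task2])

log = []
build(task3, log)  # schedule only the last task in the chain
print(log)  # ['Task0', 'Task1', 'Task2', 'Task3']
```

Calling build(task3, ...) a second time runs nothing, because every task in the chain already reports itself complete; that is the property that lets one scheduled entry point replace the staggered cron jobs.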

This is obviously just an introduction; I've only scratched the surface of Luigi. But if you need to build out data pipelines and enjoy doing so in Python, I highly recommend checking out Luigi! Cheers!

By Adrian Cruz | Published May 31, 2015, 7:57 p.m. | tags: big data, hadoop, luigi, python

Squashing Git Commits Is Great Code Etiquette

Whether you collaborate with a team or work by yourself, tidy git commits make your git log much easier to search through.

Take this git log example:

$ git log
commit eeb0dae575e2ce9f055625f367c802060838d584
Author: Adrian
Date: Mon Apr 6 15:50:06 2015 -0400

added comments for decorator

commit 12cb1bdb3278c410cc21842c95da1066b3386b21
Author: Adrian
Date: Mon Apr 6 15:49:25 2015 -0400

added working decorator

commit 690b9668ed0a2f205ac0a535c89f089c4e2cca7b
Author: Adrian
Date: Mon Apr 6 15:33:54 2015 -0400

WIP: testing a decorator

commit 34eeee492343aad2936d1b89eff92ee04b818aef
Author: Adrian
Date: Mon Apr 6 15:27:53 2015 -0400

initial commit with README

I want to tidy things up before I put in a pull request, because if I were reviewing my code, I wouldn't want to wade through all of those commits. I also want to keep the git log clean so that when I or any other engineer looks through the history, there will be a definitive point where we can say, "okay, this is the commit from this pull request."

The basic steps I typically follow are: run git rebase -i {COMMIT_TO_REBASE_AFTER}, then edit the commits during the interactive rebase.

So, for my example, I want to rebase after my initial commit:

$ git rebase -i 34eeee492343aad2936d1b89eff92ee04b818aef

An editor opens with a todo list similar to the following:

pick 690b966 WIP: testing a decorator
pick 12cb1bd added working decorator
pick eeb0dae added comments for decorator

# Rebase 34eeee4..eeb0dae onto 34eeee4
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#

It is hopefully self-explanatory. I am going to reword the first commit and fix up the rest so I end up with one nice, detailed commit.

r 690b966 WIP: testing a decorator
f 12cb1bd added working decorator
f eeb0dae added comments for decorator
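If you squash branches often, this todo edit can be scripted. Git runs the command in the GIT_SEQUENCE_EDITOR environment variable on the todo file instead of your usual editor, so a small script can make the same change I made by hand. The sketch below is a hypothetical helper (squash_todo() and edit_todo_file() are my own names, not part of git), assuming the todo list uses pick lines as shown above: it keeps the first commit with the given action and turns every later pick into a fixup.

```python
def squash_todo(todo_text, first_action="reword"):
    """Keep the first picked commit (as first_action) and fixup the rest."""
    out = []
    seen_first = False
    for line in todo_text.splitlines():
        parts = line.split(None, 1)
        # blank lines, comments, and non-pick commands pass through untouched
        if not parts or parts[0] not in ("pick", "p"):
            out.append(line)
            continue
        rest = parts[1] if len(parts) > 1 else ""
        action = first_action if not seen_first else "fixup"
        out.append("{} {}".format(action, rest))
        seen_first = True
    return "\n".join(out) + "\n"


def edit_todo_file(path):
    """Rewrite a todo file in place, as a GIT_SEQUENCE_EDITOR script would."""
    with open(path) as todo_file:
        text = todo_file.read()
    with open(path, "w") as todo_file:
        todo_file.write(squash_todo(text))
```

For example, if a script that calls edit_todo_file(sys.argv[1]) were saved as squash_todo.py (a hypothetical name), GIT_SEQUENCE_EDITOR="python squash_todo.py" git rebase -i 34eeee4 would apply the edit automatically; git would still open your normal editor for the reworded commit's message.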

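After I save and close the todo list, git stops at the reworded commit and opens an editor so I can write a proper commit message. And now, my git log is nice and tidy!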

$ git log
commit 477232a042c308fff5e771757616130603207d85
Author: Adrian
Date: Mon Apr 6 15:33:54 2015 -0400

Added tmp.py with a decorator

decorator has awesome decorator-functionality


commit 34eeee492343aad2936d1b89eff92ee04b818aef
Author: Adrian
Date: Mon Apr 6 15:27:53 2015 -0400

initial commit with README

By Adrian Cruz | Published April 6, 2015, 7:56 p.m. | tags: git, source control management

Fighting Procrastination With Time Management

The Time Is Now

Now is always the best time to get something done and out of the way, provided there are no blockers. By blockers, I mean major tasks that take priority over the one you are trying to schedule.

I used to catch myself saying, "I'm going to get {insert personal project here} done this year." And oftentimes, I'd push it aside and never make enough time to even start working on it! I'm sure we're all good with our work projects because we have responsibilities to live up to (at least, I hope you're all good at managing work projects!). But let's explore our productivity outside of the workplace.

Non-work-related Technical Projects

Like any passionate technologist, I love exploring and learning new things. One obvious way to feed this fire is to pick up a personal project. Want to learn a new programming language? Want to see what all the fuss is about with that database system you heard about? Well, why not go ahead and start poking around with it? "Work is busy!" Yes, trust me, I know! "My children take up a lot of my time!" Okay, that's a tough one, but not impossible. "So, how?" you ask. A bit of time management magic, of course!

Cut Some Fat and Get Into a Routine

I hate to sound like one of those personal trainers selling 10-minute abs, but eh, whatever. Hopefully, you'll get the point!

Cutting the fat means cutting out those unwanted time killers. Does watching that re-run of {insert television show here} do anything for you? Outside of entertainment, probably not. The amount of time you can reclaim by cutting down on TV is amazing. If you really need some of that TV time (we all do), reward yourself with it after you've done X amount of productive work.

Start a routine! Try to build a habit of slicing off parts of your day to work on your project. For example, I set aside roughly one hour after work, 2-3 times a week, to get out of the office and find a cafe to cowork at. Yes, there are some weeks when work is horribly busy and I don't get as much personal work done, but you get the point: you're slicing up your time to find an hour here and there to get into the groove of your project.

A small tip from me: coworking in a cafe works wonders. If you're like me and catch yourself losing focus in the privacy of your own home (YouTube playlists, online shopping, etc.), working in a cafe helps because it gets you out of your comfort zone (literally) and makes you more aware of your surroundings. For instance, I don't want to be that guy watching some cat video on YouTube in a public place (or maybe sometimes I do).

Make Small Attainable Goals

Smaller goals mean a greater chance of feeling productive. Take, for example, the goal of "building a web app." That is a pretty big task, and you may lose focus along the way. Keep that goal in mind, but when you find a slice of time, have a smaller, attainable goal for it. So if you're taking one hour after work, a simple goal for that hour might be testing out a framework you think you'd like to use. It's as simple as that, as long as you're heading toward your ultimate goal.

Do What Works For YOU

I'm sure you've found yourself saying, "this won't work for me, because of X." Well, find what does work for you. This isn't a static formula you can use out of the box. Working at a cafe works for me because, well, there are cafes pretty much everywhere near my work; maybe the library is what works for you. All I'm saying is that with a bit of self-motivation and time management, you can be more productive at work, on personal projects, et cetera.

Cheers!

By Adrian Cruz | Published Nov. 3, 2014, 5:05 a.m. | tags: time management