Elixir Conditionals with Function Signatures

On Learning Elixir

So, I've been learning and doing a bit more Elixir lately. I'm only a couple months in, but Elixir has been the primary language I have been coding in at work. Happy days!

This is not an introductory write-up nor a tutorial, but rather a quick look at how conditionals and guards are done in Elixir. Onward.

Let's Take a Look at FizzBuzz

So, yes, FizzBuzz: the classic programming problem and still a favourite interview question. As a quick refresher, the game is played by counting from one through n, but saying "Fizz" for multiples of three, "Buzz" for multiples of five, and "FizzBuzz" for multiples of both. I think that is the simplest I can word it.

Let's have a look at an example in Python first.

def fizz_buzz(n):
    if 0 == (n % 3) and 0 == (n % 5):
        return "FizzBuzz"
    elif 0 == (n % 3):
        return "Fizz"
    elif 0 == (n % 5):
        return "Buzz"
    else:
        return str(n)
There is nothing crazy going on here. The logic is simple and it does exactly what we need it to, using the usual if/else logic that you would expect.
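To see the function play the whole game, here is a small sketch that runs it from one through a limit. The helper name play is made up for illustration; everything else is the function from above.

```python
def fizz_buzz(n):
    if 0 == (n % 3) and 0 == (n % 5):
        return "FizzBuzz"
    elif 0 == (n % 3):
        return "Fizz"
    elif 0 == (n % 5):
        return "Buzz"
    else:
        return str(n)


def play(limit):
    """Return the FizzBuzz words for 1 through limit (hypothetical helper)."""
    return [fizz_buzz(n) for n in range(1, limit + 1)]


if __name__ == "__main__":
    # Print the first fifteen turns of the game on one line.
    print(" ".join(play(15)))
```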

So, now let's look at a solution in Elixir.

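defmodule FizzBuzz do
  # Clauses are tried top to bottom, so the most specific guard comes first.
  def fizz_buzz(n) when 0 === rem(n, 3) and 0 === rem(n, 5) do
    "FizzBuzz"
  end
  def fizz_buzz(n) when 0 === rem(n, 3), do: "Fizz"
  def fizz_buzz(n) when 0 === rem(n, 5), do: "Buzz"
  # Catch-all clause; converts to a string to match the Python version.
  def fizz_buzz(n), do: Integer.to_string(n)
end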
If you've never done any Elixir before, I'm sure you may be a bit confused.

Function Signatures as Conditionals

So, let me first say that, yes, if/else does exist in Elixir. "Then why isn't it used here at all?", you ask. Well, Elixir encourages small functions that are explicit about what they do, and being able to pipe them together is very useful. I won't write much about the pipe operator here, but if you don't know much about it, I highly recommend learning about it.

Guards

So what we see in the Elixir FizzBuzz is a fizz_buzz/1 function defined with multiple clauses. As I'm sure you've worked out by now, yes, that is being done in lieu of if/else. The important piece is the when part of the function signature, which is called a guard. Elixir tries the clauses from top to bottom and runs the first one whose guard passes for the n we pass in. So each function clause lines up with a branch of the if/elif/else logic we wrote in Python. Pretty neat, eh?
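If guards are new to you, one rough way to picture clause selection is as an ordered list of (guard, body) pairs, tried in order, where the first passing guard wins. Here is that mental model sketched in Python. The names CLAUSES and dispatch are invented for this sketch, and it is only an analogy, not how Elixir actually implements clause matching.

```python
# Each "clause" is a (guard, body) pair, listed in the same order as the
# Elixir function clauses. All names here are illustrative only.
CLAUSES = [
    (lambda n: n % 3 == 0 and n % 5 == 0, lambda n: "FizzBuzz"),
    (lambda n: n % 3 == 0, lambda n: "Fizz"),
    (lambda n: n % 5 == 0, lambda n: "Buzz"),
    (lambda n: True, lambda n: str(n)),  # catch-all, like a clause with no guard
]


def dispatch(n):
    """Run the body of the first clause whose guard accepts n."""
    for guard, body in CLAUSES:
        if guard(n):
            return body(n)
    raise ValueError("no clause matched")


if __name__ == "__main__":
    print([dispatch(n) for n in (7, 9, 10, 15)])
```

The ordering is the part worth noticing: if the "Fizz" pair were listed first, 15 would stop there and never reach "FizzBuzz".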

By Adrian Cruz | Published March 14, 2017, 7:06 p.m. | Permalink | tags: elixir

Always a Student, Always Learning

Lessons from the martial arts world

There's this saying that jiu-jitsu practitioners say now and then that "a black belt is just a white belt that never stopped learning". That rings very true with me and the way I approach life. I don't claim to be an expert in anything. I am a lifelong student and will try my best to continue learning things day by day.

But what does this all mean really? Am I constantly just learning new things and not polishing my expertise in something? Well, no. I mean to say that there will always be something to learn as long as you allow yourself to learn. Going along with this martial arts theme, there is also the phrase "empty your cup". If you consider yourself to be an expert in anything, do you just sit back and stop learning about whatever field you are an expert in? No. Always look to innovate and always look to improve.

Teaching is also learning

So Katie Cunningham gave this wonderful closing for PyGotham this year. Hopefully the video of it will be up some time. It was truly wonderful. My takeaways from it were more of a reinforcement of what I've already been trying to do: speak more and teach more. I'm quite the introvert, so I tend to shy away from conversation. But I challenge myself to speak to new people and to speak more at conferences and meetups. For an introvert, this is extremely scary. And not only do I try to speak more, I am also trying to be a better teacher. Okay, not your typical classroom teacher, but a teacher in the sense that I am transferring knowledge to someone else. For instance, at work we want more cross-functional teams and more open collaboration, so any of my work should be easily picked up by any of my teammates. For that to happen, I need to be a good teacher. Not a teacher that just says, "here's some code, RTFM now!".

I've noticed that I've been seeking out opportunities to improve my teaching skills. Whether it be a small chit chat with a colleague on what's new in the world of tech or just being a good role model for my kids, I am pushing for a very easy, relaxed conversation that everyone will enjoy with no fear, no pressure.

Phrasing matters! One thing that I take extra care with is the way I converse with others. For instance, publicly shaming someone is never a good idea. "This piece of code is wrong" versus "Can you tell me how this piece of code works? I think something looks odd here" is a good example of the different language choices you can make to sound more amiable.

Student of life

Knowing that there will always be something new to learn is reassuring for me. That is exactly why I enjoy being an engineer; technology is always changing and I am there learning to keep up. But obviously, occupation isn't the only part of your life where you'll constantly learn; growing up is really just a continuous learning process. Constantly learn and tune those dials. You, too, are a student of life, learning all sorts of new things daily.

By Adrian Cruz | Published July 31, 2016, 5:02 a.m. | Permalink | tags: engineering advice, pygotham

Writing Out Files & Python UnicodeEncodeError Woes

A very common headache that I am sure every engineer has had to face at least once in their life is character encoding. Oh, yes, that fun topic! No, I do not have a solution for everyone. Sorry! But in case you are in the Python world like I am, and you are writing out files and getting a bunch of UnicodeEncodeErrors, try the example below.

tl;dr Show me an example!

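import codecs
import json

# Python 2 example; `filename` and `data` come from your own code.
with codecs.open(filename, 'w', encoding='latin-1') as outfile:
    # json.dumps() only accepts `encoding` on Python 2; Python 3 raises TypeError.
    outfile.write('{}\n'.format(json.dumps(data, encoding='latin-1')))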

So, what? And why?

Okay, so I am writing a latin-1 encoded file. Yes, latin-1. Why? Well, I've chosen latin-1 here because latin-1 was giving me issues, so there! But really, if you want to write to a different encoding, obviously just swap that out.

But the long explanation, if you were curious, is that I am reading in data that was latin-1 encoded and it was making my data ingestion jobs fail because I default to utf-8. The json.dumps() bit is actually not needed if you are not working with json (obviously!). But, I wanted to point out that in case you were writing json, you also need to set the encoding to whatever you choose there as well. It is currently on my TODO list to see why that is the case.
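The snippet above is Python 2. As a hedged sketch only, here is roughly how the same idea could look on Python 3, where json.dumps() no longer takes an encoding argument and the built-in open() accepts one directly. The function name write_json_line and the sample data are placeholders invented for this example.

```python
import json
import os
import tempfile


def write_json_line(path, data, encoding="latin-1"):
    """Write data as one JSON line into a file encoded with `encoding`."""
    # ensure_ascii=False keeps characters such as e-acute as real characters
    # instead of \uXXXX escapes, so the file encoding actually matters.
    with open(path, "w", encoding=encoding) as outfile:
        outfile.write(json.dumps(data, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Placeholder path and data, used only to exercise the sketch.
    path = os.path.join(tempfile.mkdtemp(), "out.json")
    write_json_line(path, {"name": "caf\u00e9"})
    with open(path, "rb") as raw:
        print(raw.read())
```

Characters that latin-1 cannot represent (the euro sign, for example) would still raise UnicodeEncodeError here; passing errors="replace" to open() is one way to trade exactness for a job that doesn't fail.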

By Adrian Cruz | Published March 21, 2016, 9:50 p.m. | Permalink | tags: python

Intro to Building Out Data Pipelines With Python and Luigi

A very common question that I have been getting asked is, "Luigi? What's that?". Well, the answer that I usually give, in brief, is that it is a project open sourced by Spotify to manage workflows and their dependencies. But to quote the Luigi ReadTheDocs page:

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.

Luigi Tasks & Targets

The main pieces of Luigi are built around a Task and a Target. A Task is exactly what you would think it would be: a single task in your data pipeline. So for example, you may have a task that reads in some csv file and pulls out specific values that you want from the file. The Target is the intended output for your Task. So, back to our csv file example, an output() Target for that task may be a cleaned up csv file with the values you wanted.

Here Is Our Example Task

import luigi


class MyTask(luigi.Task):
    def output(self):
        return luigi.LocalTarget('my_output.csv')

    def run(self):
        # Count the rows of input.csv whose first column is 'cat'.
        cat_count = 0
        with open('input.csv', 'r') as in_file:
            for line in in_file:
                animal, age, color = line.strip().split(',')
                if animal == 'cat':
                    cat_count += 1

        with self.output().open('w') as out_file:
            # write() needs a string, not an int.
            out_file.write('{}\n'.format(cat_count))

That is a [dumb] simple Task. This task has no requirements. All that it does is read in a file, parse it and write out its output to another file.

One thing to note is the importance of output(). Luigi checks whether the target returned by output() exists to decide if this task is complete. That is Luigi's definition of complete. You can also override complete() if you do not have an output, but for now just think that every task needs an output.
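To make that completeness rule concrete, here is a tiny stand-alone sketch in plain Python with no Luigi import. ToyTask and its methods are hypothetical stand-ins that only mimic the idea (a task is done when its output file exists); Luigi's real classes do a good deal more than this.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a task: done when its output file exists."""

    def __init__(self, output_path):
        self.output_path = output_path

    def complete(self):
        # The rule described above: the output exists, so the task is complete.
        return os.path.exists(self.output_path)

    def run(self):
        with open(self.output_path, "w") as out_file:
            out_file.write("done\n")


if __name__ == "__main__":
    task = ToyTask(os.path.join(tempfile.mkdtemp(), "my_output.csv"))
    print(task.complete())  # the output has not been written yet
    task.run()
    print(task.complete())  # the output now exists
```

One practical consequence of this rule is that a task whose output already exists is treated as finished and is not run again.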

"So, what is this good for?"

So one big thing that I purposely left out when describing Luigi is that it integrates really well into the whole Hadoop ecosystem! So, now let's take a step back and think about how we would run these batch data processing jobs without Luigi...

Let's say we have several jobs that need to be accomplished in order for the task you want to be considered complete. So for example, maybe we need to process some data that we have found in some log files. The log files are currently stored in S3, so we'll have a job to fetch those locally. Maybe the logs need to be cleaned up a little bit, so we will do whatever filtering and other transformations we need to cleanse the data. Next, we'll want the newly formatted logs loaded into HDFS. After that, we can utilize Hive, so we'll want to create a table with those logs.

Data jobs in the past

So in the past, we would just have these several jobs run as separate cron jobs. But wait, depending on how much data we're working with, these jobs will take varying lengths of time to complete! So, the best you can do is see how long these jobs run and schedule them accordingly. So for example, we know the data is pulled down from S3 within ~20 minutes, so we'll schedule the next job 30 minutes later, and do the same thing for the remaining jobs as well.

Now, with Luigi

With Luigi, you create dependency chains for jobs very easily by overriding the requires() method. Say you want your entire process to run in this order: Task0->Task1->Task2->Task3. You can now say that Task3 requires Task2 to run, Task2 requires Task1 to run, and so on. This looks like the following:

import luigi


class Task3(luigi.Task):
    def requires(self):
        # Task2 is assumed to be another luigi.Task defined elsewhere.
        return [Task2()]

    # Other core code would go here as well, like run(), output()...
Now, you have one single point to schedule and no need to guess when each job runs! Pretty neat right? :)
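For some intuition about what happens with requires(), here is a toy model in plain Python. It is not Luigi's scheduler, just a depth-first walk under simplified assumptions: ToyTask, build, and the in-memory done flag are all invented for this sketch.

```python
class ToyTask:
    """Hypothetical task with a name, its requirements, and a done flag."""

    def __init__(self, name, requires=()):
        self.name = name
        self._requires = list(requires)
        self.done = False

    def requires(self):
        return self._requires

    def run(self, log):
        # Record the run order instead of doing real work.
        log.append(self.name)
        self.done = True


def build(task, log):
    """Run every unfinished requirement first, then the task itself."""
    if task.done:
        return
    for dependency in task.requires():
        build(dependency, log)
    task.run(log)


if __name__ == "__main__":
    task0 = ToyTask("Task0")
    task1 = ToyTask("Task1", [task0])
    task2 = ToyTask("Task2", [task1])
    task3 = ToyTask("Task3", [task2])
    order = []
    build(task3, order)  # only the last task is asked for
    print(" -> ".join(order))
```

Asking for the last task is enough: the walk pulls in everything upstream in the right order, which is the point of declaring dependencies instead of guessing at cron timings.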

This is obviously just an introduction. I've only scratched the surface of Luigi. But, if you need to build out data pipelines and enjoy doing so in Python, I highly recommend checking out Luigi! Cheers!

By Adrian Cruz | Published May 31, 2015, 7:57 p.m. | Permalink | tags: big data, hadoop, luigi, python

Squashing Git Commits Is Great Code Etiquette

Whether you collaborate with a team of folks or work by yourself, having tidy git commits is useful for being able to search through your own git log.

Take this git log example:

$ git log
commit eeb0dae575e2ce9f055625f367c802060838d584
Author: Adrian
Date: Mon Apr 6 15:50:06 2015 -0400

added comments for decorator

commit 12cb1bdb3278c410cc21842c95da1066b3386b21
Author: Adrian
Date: Mon Apr 6 15:49:25 2015 -0400

added working decorator

commit 690b9668ed0a2f205ac0a535c89f089c4e2cca7b
Author: Adrian
Date: Mon Apr 6 15:33:54 2015 -0400

WIP: testing a decorator

commit 34eeee492343aad2936d1b89eff92ee04b818aef
Author: Adrian
Date: Mon Apr 6 15:27:53 2015 -0400

initial commit with README

I want to tidy things up before I put in a pull request, because if I were the person reviewing my code, I wouldn't want to have all of those commits to look through. I also want to keep the git log clean so that when I or any other engineer comes and looks at the history, there will be a definitive point where I can say, "okay, this is the commit from this pull request".

The basic steps that I typically follow are: run git rebase -i {COMMIT_TO_REBASE_AFTER}, and then edit the git commits during the interactive rebase.

So, for my example, I want to rebase after my initial commit:

$ git rebase -i 34eeee492343aad2936d1b89eff92ee04b818aef

A menu comes up similar to the following:

pick 690b966 WIP: testing a decorator
pick 12cb1bd added working decorator
pick eeb0dae added comments for decorator

# Rebase 34eeee4..eeb0dae onto 34eeee4
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#

It is hopefully self-explanatory. I am going to reword one commit and fixup the rest so I can have one nice, detailed commit.

r 690b966 WIP: testing a decorator
f 12cb1bd added working decorator
f eeb0dae added comments for decorator

And now, my git log is nice and tidy!

$ git log
commit 477232a042c308fff5e771757616130603207d85
Author: Adrian
Date: Mon Apr 6 15:33:54 2015 -0400

Added tmp.py with a decorator

decorator has awesome decorator-functionality


commit 34eeee492343aad2936d1b89eff92ee04b818aef
Author: Adrian
Date: Mon Apr 6 15:27:53 2015 -0400

initial commit with README

By Adrian Cruz | Published April 6, 2015, 7:56 p.m. | Permalink | tags: git, source control management