Merge Sort

Yesterday I decided to read up on sorting algorithms, a topic I’d previously assumed would be over my head. It turns out I was completely wrong, and I had a lot of fun writing my own merge sort algorithm!

  def merge(a,b)
    sorted_list=[]
    while !a.empty? && !b.empty?
      if a.first < b.first
        sorted_list << a.shift
      elsif a.first > b.first
        sorted_list << b.shift
      elsif a.first==b.first
        sorted_list< < a.shift
        b.shift
      end
    end
      ((sorted_list<<a)<<b).flatten!
  end

  def sort(list)
    if list.length <=1
      return list
    else
      middle = (list.length/2)-1
      a=list[0..middle]
      b=list[middle+1..list.length]
      merge(sort(a),sort(b))
    end
  end

How does it work?

Mergesort can be broken down into two parts, which are here divided into two different methods.

The merge method accepts two ordered lists, and combines them into one ordered list by comparing the first value of each of these two input lists and sequentially moving the lower value into a new list:

But if all we have is one unordered list which needs to be sorted, how will a method requiring two ordered lists help us? If we had two ordered lists that each contained half of the numbers of the original list, we could merge the two together to create one ordered list. We can start by breaking our single unordered list in half at the center of the list. Now have two unordered lists half the length of the original, but what we want is two ordered lists. We are faced with the same problem as before: our two unordered lists could be ordered if only we had two smaller ordered lists. If we continue to halve our lists further and further, we can eventually isolate each element of our large unsorted list, resulting in many lists containing only one element. A list containing only one element is inherrently ordered, so once we have broken down our original list to this level, we can begin to merge our many little lists back together.

The process of splitting the unsorted list in half recursively is very similar to a binary tree. In terms of the call stack, the mergesort algorithm decends all the way down each branch of the tree, one at a time, splititng lists at nodes as necessary. Although the original unsorted list is immediately split, the second resultant half-list isn’t touched again until sorting of the first half-list is complete, which may require further splitting, decending and merging. This is because of the way sort is recurssively called within itself: merge(sort(a),sort(b)), where sort(a) must be complete before sort(b). List a is now the input list being passed to sort(), and is itself split into component lists a and b, on which sort(a) and sort(b) will be called yet again. The recursive call is not executed when the length of the input list is 1 and cannot be split any further, at which point we move up a level in the call stack and either descend down another branch to sort another list, or merge two lists together. You can see this demonstrated in the following animation:

Building mergesort was easier than I’d expected but still a fun challenge. I hope to write out the code for some of the other sort algorithms in the coming week.

An Intro to Cron

Until recently, I knew very little about cron jobs. I’ve wanted to know how to schedule recurring events to run in the background, but did not know where to get started. About a week ago, I implemented the Whenever gem for the first time, a gem I had read about but did not feel confident in using. Whenever is a Ruby gem that makes it very easy to schedule cron jobs to run in the background at regular intervals. Whenever has its own DSL that is then transformed into a format that cron understands. Using the Whenever gem turned out out to be quite simple and promted me to learn more about how cron works.

What is cron?

Cron is a daemon that runs on your computer or on your app’s server that executes jobs at specific times. Between scheduled jobs, cron sleeps for the calculated interval until the next scheduled job. When it wakes, it checks the current time against the job’s scheduled time and executes the job as scheduled.

To add jobs to cron’s queue, you must to a crontab file. Cron automatically looks for a crontab file (or many, if you have more than one), to in order to find about events that should be scheduled.

Crontab File Syntax

Cron has its own syntax; it looks something like this:

59 23 * * * echo “This command is run daily at 11:59 pm”

Timing

This first five values (59 23 * * *) represent the time at which an event is scheduled to occur. The first value represents minutes, the second represents hours, the third is the day of the month, the fourth is the month and the fifth is the day of the week. These are all represented numerically, however the month and the day of the week can also be represented using abbreviations like JAN, FEB, SUN or MON.

If you set values for both the day of the month and the day of the week, the event will take place whenever either condition is met. For example if you set your event to take place on the 3rd day of the month and also on Tuesdays, the event will fire any day that is a Tuesday OR (not AND) the third day of the month.

An asterisk (*) indicates that the event should occur for any value in the given field. In the example above, the event is scheduled to take place any day of the month, any month, and any day of the week.

There are additional special characters that I have not had occasion to use, but which would definitely come in handy to anyone scheduling jobs at very specific intervals. The wikipedia entry on cron, though by no means an exhaustive resource for cron documentaion, does a good job explaining many–perhaps all–of the special characters you can use to indicate time in your crontab file. http://en.wikipedia.org/wiki/Cron#Special_characters

Give cron a command

What follows is the command to be executed at the indicated time. You can load a script saved in a file or enter commands directly into the crontab.If I want to check that my command works, I can highlight, copy and paste everything after my timing fields into terminal and execute the exact code that will be run at the time I scheduled.

For my project with the Whenever gem, the commands in my crontab file were more complicated than the simple echo command in the example above. Executing a cronjob in a Rails app requires loading your app’s environment into memory so that cron is aware of all your classes, methods and variables. This is where the Whenever gem comes in to help. Stay tuned for another post on setting up the Whenever gem to generate your crontab file for you in order to execute jobs from the context of your Rails app.

Ruby Iterators: Enumerable Methods, Part I

#each and #map

As someone new to Ruby who has a fairly limited lexicon of method names, I often have no idea that Ruby already contains a method to accomplish exactly what I am trying to achieve. Sometimes when I’m writing code, after constructing a sequence of method calls or wrapping some logic in my own method, I then find that Ruby has a built in method to abstract a common pattern, producing the same result in just one or two lines of code.

A few methods that I’ve recently come to understand are the map, inject and each_with_object methods. Anything that can be accomplished with these methods could also be accomplished in several more lines of code using a varation on the each method, so they are by no means absolutely necessary, but they have much more precise uses than the more general each method. Building these three methods from scratch using each helped me to understad (1) how they worked and (2) how easy it is to assign the result of some code to a variable and (3) how “yield” works. Today I’m going to examine each and map.

#each

each is our basic iterator. It pretty much does what the name implies. To print the elements of an array, we can write:

array = [1, 2, 3, 4, 5]
array.each do |i|
  puts i
end

the result:

1
2
3
4
5
 => [1, 2, 3, 4, 5] 

each, our iterator, takes each element from its reciever, the Enumerable array, and yields it to the block, the indented line between our each call and the end keyword. It does this five times, as there are 5 elements in the array. With each iteration, the element currently passed to the block will temporarily be assigned to i. end indicates the end of the code to be executed with each iteration and signals the start of the following iteration using the next element in the reciever.

Even the each method could be written in more basic terms:

array = [1, 2, 3, 4, 5]
i=0
while i < array.length
  puts array[i]
  i = i+1
end

produces:

1
2
3
4
5
 => nil 

This, isn’t a method, it’s just some code. Notice how both print the elements of the array, but our first code sample, using each returned the original array (as indicated by the hashrocket =>), while the second returned nil. each always returns the origal array on which it was called. In both the above cases, we were simply printing each element of the array, not trying to modify it in any way. But what if we wanted to transform the array in some way? For example:

array = [1, 2, 3, 4, 5]
array.each do |i|
  i + 5
end
=> [1, 2, 3, 4, 5] 

Once again, each returns the orginal array. In order to create an array with new values, each five more than the values of the original array, we would have to do the following:

array = [1, 2, 3, 4, 5]
result_array = []
array.each do |i|
  result_array << i + 5
end
 => [1, 2, 3, 4, 5] 
result_array
 => [6, 7, 8, 9, 10]

The value of the variable array has not changed, but we now have a new array, result_array containing the modified values. Note that the variable result_array must be declared and assigned the value of empty array before array.each is called. Then with each iteration, a new value i + 5is appended to result_array.

#map

Why should we have to declare an array outside the each loop and then individually add values to it? We can’t just assign the return of the each block to a new variable because, as we know, each always returns the original array. Ruby’s #map method is built to do exactly what we are asking each to do!

array = [1, 2, 3, 4, 5]
array.map do |i|
  i+5
end
 => [6, 7, 8, 9, 10] 

Unlike each, map returns an array whose elements are the result of whatever code was evaluated within the block, which is what we really want. Take note that map doesn’t “overwrite” the original array with the return array, but we can easily assign this return array to a variable.

array = [1, 2, 3, 4, 5]
result_array = array.map do |i|
  i+5
end
 => [6, 7, 8, 9, 10] 

array
 => [1, 2, 3, 4, 5] 

result_array
 => [6, 7, 8, 9, 10]

I sometimes forget how easy it is to assign the return value of an entire block of code to a variable because it’s not as intuitive as writing n = 1, but it’s really no different than assigning an integer, array, string or any other class of object to a variable. If you did not want to retain the original array and wanted to assign it the value of the new array, you could easily do so:

array = [1, 2, 3, 4, 5]
array = array.map do |i|
  i+5
end
 => [6, 7, 8, 9, 10] 

array
 => [6, 7, 8, 9, 10]

The difference between map and each is fairly straight-forward. We already say how certain uses of each are really begging for map. This was pretty easy to recognize, but building my own method that mimicked the functionality of map was a bit harder. The method can only use the each method within its definition to iterate over the reciever array. It also must be versatile–like map it must yield to a block that specifies exactly what should happen to each element in the reciever–I don’t always want to add 5 to every element. It must then return the modified array.

array = [1,2,3,4,5]

def my_map
  return_array = []
  self.each do |n|
    return_array << yield(n)
  end
  return_array
end

array.my_map do |i|
  i * 2
end
=> [2, 4, 6, 8, 10] 

The thing I really struggled to understand was how exactly yield makes this work. Whe we call array.my_map, Ruby immediately shifts to the first line of code within the definition of my_map. Before even interpreting do |i|, Ruby inserts a bookmark and starts reading the first line of the definition of my_app. On this line, a new empty array is initialized. On the next, the method each is called on self. In the context of a method definition, self is equal to the method’s reciever. In this case, the reciever is array. In this particular method call, this line of code means array.each do |n|.

Like always, each signals an iteration over its reciever where each element in that reciever will temporarily be assigned to a variable, in this case n, and then passed to the block that follows. In the first iteration, n is equal to 1.

Inside the block on the following line, we see that some value is appended to the array return_array, but Ruby does not yet knwo what that value will be. yield(n) puts another bookmark in our code, and tells Ruby to go back to where it left of earlier, just before do |i|, bringing the variable nwith it. The variable i is temporarily assigned the value of n. If this is our firt iteration, n is equal to 1. iis equal to n within this block. The block is then evaluated. The last line of code evaluated within the block is its implied return value.

In the first iteration, starting from the point where Ruby interprets yield(n), here’s an expanded version of what’s happening:

n = 1     
i = n
i * 2
 => 2 

After this, Ruby reads the keyword end and knows that it has reached the end of the block. Thus the return value of the block is equal to 2 because the last line of code in the block evaluates to 2. Think back to when Ruby placed its last bookmark. It was back inside a different block of code, within the my_map method. Ruby paused evaluation at this point because it interpreted instructions to yield. Now that the block has ended, Ruby returns to this point. yield(n)evaluates to the return value of the block to which it yielded. In this case, yield(n) where n = 1 evaluates to 2, as we saw. Ruby now knows what value to append to the reciever of the shovel method, return_array.

Upon interpreting the endkeyword on the following line, Ruby will perform the next iteration over array, again assigning n a value, passing that value to the block where my_map was originally called, it will evaluate that block and shovel its return value into return_array all over again and again for each array element. After the last iteration, Ruby will proceed to the line following the end key. Because we want the method to return the modified array, the last line of code in the method definition must be return_array. Whenever this function is called, it will return the updated array.

Abstracting behaviors that will be used again and again is always a good decision. Utilizing yield in this method allowed us to abstract everthing execept the exact way in which each element of the original array would be transformed. The distracting and confusing logic of iteration and return values is separated from the code where the method is called, where all we really want is the result of all that logic. Using yield makes the method highly versatile–the logic is standardized, but we can specify the way in which the elements of the original array should be transformed. So long as we want to iterate over an array and return a new modified array, this method is applicable. This is of course why the makers of Ruby included it!

There are other methods that can be called on Enumerable objects that behave in slightly different ways. They follow a similar model–iteration and return values are abstracted. These too can be built using each. Building these methods from scratch helps to understand the subtle differences between them and. For me, it is a useful way to practice using yield and to add another build in Ruby method to my vocabulary.

Small Pleasures

I managed to get something up on Heroku!


This past Friday, we built our first app that actually functions in a browser. It allows you to enter some text, save it to a database, and view all the entries in that database. For some reason I had a rough time putting this on Heroku on Friday, but today I had no problems. Here’s the original assignment. The files come with enough code to get you started, but leave enough out to require some thinking.

See my first lil appy here!

Why I’m Learning to Program

Here are my thoughts, in a more or less logical order, on why I’d like to be a programmer:

  • Technology and the internet dictates culture, society and behaviour. Those in control of technology hold great power.

  • I do not want to be a passive bystander, I want to be in control of the way I use techonology and how technology affects me. I watch my mom using her computer, limited to the ways in which her software permits her to use it, feeling that she has to sign up for Facebook because it has become the new way to communicate with colleagues and friends, and I know I don’t want to be left at the mercy of the decisions previous programmers who designed certain software or websites.

  • There are a lot of people or companies in control of technology – and thus of society and its behaviours – who are not doing a great job, sell a product for which they are simply creating an artificial demand, or whom I believe are affecting culture negatively. I feel compelled to bring to life the apps I wish exited so that perhaps the net effect of technology on the world will perhaps be just a little (or maybe a lot!) more positive.

  • Why wouldn’t I? I think everyone should (and someday will) have at least some understanding of how programming works. Being able to program will soon not be an exception to the rule, it will be the norm. Some day NOBODY will be in the position where technology simply happens to them. Some day programming will be a section on the SATs. But why wait until then?

  • As its name indicates, “the web” has transformed mass dissemination of information from a few-to-many model to a weblike, many-to-many model. (Think a few writers at a few newspapers distributed to many readers vs. Twitter, where everyone is both a writer and a reader). Technology affects every individual and the larger blob of society that all those individuals form, but decisions about technology with these far-reaching effects are made by a few huge tech companies and by a few individuals (relative to population) who know how to write code. When the day comes that everyone knows how to write code, this one-way flow will become a web of mutual exchange, just as communication has. Again, why wait until then? I want to realize and participate in that exchange now!

I’d like to point readers to a blog post by my (very smart) friend Casey Gollan: User-Generated Content

Abstract:

The way things are now is more like a pile up of metaphors and recycled code than laws of interaction which are set in stone. Designing computer systems is a strangely direct way of altering how people experience the world and relate to each other. Perhaps in the coming years artists will be able to create new platforms with the conceptual backbone that is lacking in today’s popular offerings.

Video: Wendy Chun - the Enduring Ephemeral, or the Future Is a Memory

Click Here

Wendy’s blurb about the lecture:

Key to the digital as the new is an ideological conflation of memory and storage that undermines and underlines digital media’s archival promise. Memory, with its constant degeneration, does not equal storage; although artificial memory has historically combined the transitory with the permanent, the passing with the stable, digital media complicates this relationship by making the permanent into an enduring ephemeral. As I explain in this talk, it does so not simply through some inherent technological feature, but rather because everyday usage and parlance seeks to arrest memory and its degenerative possibilities in order to support dreams of digital programmability, of the future unfolding predictably from memory. Unpacking the theoretical implications of constantly disseminated and regenerated digital content, this paper argues these dreams create, rather than solve, archival nightmares. They proliferate non-simultaneous enduring ephemerals.

I first watch this video maybe a year ago, and really enjoyed it. Wendy Chun teaches at Brown in the Modern Culture and Media department.