Sunday 1 December 2013

Sorting algorithms: selection sort, quick sort, merge sort, and efficiency.

A sorting algorithm is an algorithm that places elements in a list in a certain order.

Selection sort is an algorithm which finds the minimum value and swaps it with the value in the first position. It then scans for the next minimum value, and swaps position with the second position, and so on. It does at most n swaps, so is useful to use where swapping is expensive. Selection sort has an average time complexity of O(n2), thus is not very efficient for large lists.


Quicksort algorithm partitions an array by taking an element as a pivot with all elements smaller than the pivot moved before it and all greater elements coming after the pivot. The lesser and greater sublists are then recursively sorted. It has complexity of O(n log n).

Merge sort algorithm works by comparing every two elements and swaps if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four and so on until the last two lists are merged into the final sorted list. This algorithm has a worst-case running time of  O(n log n), so is efficient on very large lists.

Friday 8 November 2013

What's the difference between str() and repr()?

While scanning the BTNode class in lab today, I just realized that the __repr__ method takes str(self.data) but repr(self.left) and repr(self.right) !!

And I thought to myself what's the difference? Why not just be consistent and use str() or repr()?

So, I went back to read the Python documentation for the two functions and it states:

object.__repr__(self): called by the repr() built-in function and by string conversions (reverse quotes) to compute the “official” string representation of an object.
object.__str__(self): called by the str() build-in function and by the print statement to compute the “informal” string representation of an object.

I was still scratching my head after reading this. What did they mean by "official" and "informal"?

Digging into it further, I finally got the distinction!

For example, if we create a datetime object:
>>> import datetime
>>> today = datetime.datetime.now()
Using the str() function to display today gives this:
>>> str(today)
'2013-11-08 15:06:15.063764'
While the repr() function gives this output:
>>> repr(today)
'datetime.datetime(2013, 11, 8, 15, 6, 15, 63764)'
As shown above, the str() function displayed the date and time in a way that is easily read and understood. The repr() also returned a string, but was shown as the "official" representation of a datetime object, and when called by eval(), should return the same object.
So, the __str__ (the "informal' representation) may be preferred for cases like this as it is simpler and easier to read.

However, even though the official representation of an object is a bit of an eyesore compared to the informal one, it is unambiguous(more comprehensive info about the object) and more robust against bugs!

So ending this post off, when to use which?

Well, every class you implement should always have a __repr__ method.
And if you think it'd be beneficial, you can also implement a __str__ method.

And one more thing! The print statement (along with the str() built-in function) also uses __str__ to display the string representation of the object!

Anyway, if you were wondering the same thing, I hope that clarified the difference between the two! :)





Saturday 2 November 2013

Regular Expressions!

From assignment 2, I first came upon the jargon regular expression (regex). Regex is a sequence of characters that forms a search pattern, and is commonly used for pattern matching in strings.
I also recently found out what it means to scrub websites, which is just extracting all the information that you want to do whatever you want with it. And with regex, this is done quite easily!

For example, say you have a mountain of documents and you wanted to extract every street address from it. Doing this manually might take you several hours, but using regex it's a pretty simple matter.
If given that every house number is less than 5 digits in length, with the street name being exactly one word, followed by the word street or avenue and a period always following.
You can use the regular expression
\d{1,5} \s \w+ \s \w+ \.

where
\d{1, 5}  #represents digits 1 to 5 in length
\s            #followed by a space between
\w+   #w represents anything that's a character, plus sign represents one or more
\s         #space again
\w+         #series of characters
\.               #period
All regular expressions (as well as special characters) is preceded by a backslash!

There are many uses for regex and I look forward to learning more about it in future CSC courses!

Friday 25 October 2013

OOP and Recursion!

Object oriented programming (OOP) is a type of programming language model where everything is grouped into self sustainable "objects". An object is basically a "thing" that can perform a set of related activities. In OOP, objects are usually instances of classes which can interact with one another to design and create computer programs. (In python, everything is an object!) Also, in OOP, data tends to carry with it a set of functions, unlike procedural programming, in which data tends to be decoupled from the functions that operates on it. It is especially useful when there are several programmers contributing code, or when you want to reuse code between projects.

Recursion is when you take the original problem and divide it into smaller (more easily solved) instances of itself, solve those smaller instances (usually by using the same algorithm again) and then reassemble them into the final solution. Essentially, a recursion is a function that calls itself! Recursion is useful when you have data that is not always the same format or has multiple nuances, such as when you need to do iterative branching.