Hadoop example for Exim logs with Python.

The following is an example of parsing an exim_mainlog using Hadoop streaming. I’ve implemented both the mapper and the reducer in Python. The mapper and reducer don’t handle all of Exim’s log formats yet but this can be easily extended in the mapper and reducer if you actually end up using the output (this is just an example).

The following is the EximMapper.py file.

#!/usr/bin/python
import sys, re
date_time_id_rest = 
  '([0-9-]{10})\s([0-9:-]{8})\s([a-zA-Z0-9-]{16})\s(.*)'
def eximSplit(line):
    matches = re.match(date_time_id_rest, line)
    if matches is not None:
        key = matches.group(3)
        val = matches.group(0)
        print key + '\t' + val
        return
for line in sys.stdin:
    eximSplit(line)

The following is the EximReducer.py file. It may take a while to understand this one – the main intricacy is that the reducer script must remember state information between runs when run under Hadoop Streaming.

#!/usr/bin/python
import sys, re
keys = 0
(lastKey, lastVal) = (None, '')
for line in sys.stdin:
    (key, val) = line.strip().split('\t')
    if lastKey and lastKey != key:
        print lastKey + '\t' + lastVal
        (lastKey, lastVal) = (key, val)
        keys += 1
    else:
        (lastKey, lastVal) = (key, val + '\n' + lastVal)
if lastKey:
    print lastKey + '\t' + lastVal

That is it. Now we can run our mapper and reducer under bash to see if it will work. This is an easy way of visualizing a Hadoop work unit for those who are familiar with pipes in bash.

cat exim_mainlog | ./EximMapper.py | sort | ./EximReducer.py

Ideally that will organize all the entries in the exim_mainlog into transactions separated and arranged by their transaction ID. If that is the case, we’re onto running this thing under Hadoop. We’ll want to grab our hadoop-streaming.jar file and put it in the classpath like so. You’ll have to update the paths I used to be relative to your environment.

$HADOOP/bin/hadoop jar hadoop-0.20.2-streaming.jar \
   -input exim_mainlog \
   -output StreamingOutputDirectory \
   -mapper EximMapper.py \
   -reducer EximReducer.py
Posted in Programming | Tagged , , , | Comments closed

Facebook’s Connect on WordPress.

So after several hours of beating my head against the keyboard I’ve finally been able to integrate Facebook’s Connect feature to http://blog.gnucom.cc – mainly for the comments box. I decided not to go with a WordPress plugin and instead modified the comments.php section of my WordPress installation.

This URL made the process sound mindlessly easy – http://developers.facebook.com/blog/post/198 – and looking back on the process, I guess it was. I just had a few peculiarities: I was running the Connect feature on a subdomain and had typical WordPress URL rewriting going on behind the scenes. These two peculiarities cost me 4 hours.

These two Facebook resources were the winners for me. If they’re not for you, it would probably behoove you to dig around in their wiki.

Supporting subdomains isn’t as obvious as it sounded, so Facebook kindly provided a how/what/why here: http://wiki.developers.facebook.com/index.php/Supporting_Subdomains_In_Facebook_Connect

Apparently the cross domain communication channel (this is what breaks Connect on long URL with several directories) differed slightly between the one that Facebook’s development community originally suggested, and I found the fix on this page: http://wiki.developers.facebook.com/index.php/Cross_Domain_Communication_Channel.

Now Facebook can spy on my blog, too.

Posted in Wordpress | Tagged , | Leave a comment

Sorting methods in Java.

I made this during school and have used it many a times while preparing for an interview or checking my new work in new languages. The class has static methods for all of the classic sorting methods like selection sort, bubble sort, insertion sort, heap sort, merge sort and quick sort.

public class Sorts {
 
  /*SLECTIONSORT ALGORITHM*/
  //precondition:  supply instantiated array of size 
  // many elements
  public static <Type extends Comparable<? super Type>> 
  void selectionSort(Type[] arr, int size) {
    Type temp;
    int minIndex;
 
    for (int i = 0; i < size - 1; i++) {
      minIndex = i;
      for (int j = i + 1; j < size; j++) {
        if (arr[j].compareTo(arr[minIndex]) < 0) {
          minIndex = j;
        }
      }
      temp = arr[minIndex];
      arr[minIndex] = arr[i];
      arr[i] = temp;
    }
  }
  //postcondition:  returns parameter argument arr in 
  // ascending sorted order
 
 
  /*BUBBLESORT ALGORITHM*/
  //precondition:  supply instantiated array of size 
  // many elements
  public static <Type extends Comparable<? super Type>> 
  void bubbleSort(Type[] arr, int size) {
    boolean done = false;
    Type temp;
 
    while(!done) {
      done = true;
      for (int index = 0; index < size - 1; index ++) {
        if (arr[index].compareTo(arr[index + 1]) > 0) {
          temp = arr[index];
          arr[index] = arr[index + 1];
          arr[index + 1] = temp;
          done = false;
        }
      }
    }
  }
  //postcondition:  returns parameter argument arr in 
  // ascending sorted order
 
  /*INSERTIONSORT ALGORITHM*/
  //precondition:  supply instantiated array of size 
  // many elements
  public static <Type extends Comparable<? super Type>> 
  void insertionSort(Type[] arr, int size) {
    Type save;
    for (int i = 0; i < size; i++) {
      save = arr[i];
      int j = i;
      //move elements while they are greater than 
      // save value
      while (j > 0 && arr[j - 1].compareTo(save) > 0) {
        arr[j] = arr[j - 1];
        j--;
      }
      arr[j] = save;
    }
  }
  //postcondition:  returns parameter argument arr 
  // in ascending sorted order
 
 
  /*HEAPSORT ALGORITHM*/
  //precondition:  supply instantiated array of size 
  // many elements
  public static <Type extends Comparable<? super Type>> 
  void heapSort(Type[] arr, int size) {
    if (size > 0) {
      Type[] temp = (Type[])new Comparable[size];
      int tempLength = size;
 
      //step 1. buildHeap    
      for (int index = (size / 2) - 1; index >= 0; index--) {
        percolateDown(arr, size, index);
      }
      //step 2. sortList
      for (int index = 0; index <= size - 2; index++) {
        temp[index] = arr[0]; //save root to temp array
        arr[0] = arr[tempLength - 1]; //put last element into root
        tempLength--; //decrease array length
        percolateDown(arr, tempLength, 0); //percolate root node
      }
      temp[size - 1] = arr[0]; //handles last element
      //step 3. copyList
      System.arraycopy(temp, 0, arr, 0, size);
    }
  }
  //postcondition:  returns parameter argument arr in 
  // ascending sorted order
 
  //precondition:  supply instantiated array of size 
  // any elements and index of hole to be percolated down
  private static <Type extends Comparable<? super Type>> 
  void percolateDown(Type[] arr, int size, int x) {
    int hole = x;
    Type item = arr[x];
    int newHole = newHole(arr, size, hole, item);
    while(newHole != -1) {
      arr[hole] = arr[newHole];
      hole = newHole;
      newHole = newHole(arr, size, hole, item);
    }
    arr[hole] = item;
    return;
  }
  //postcondition:  percolates down an index to 
  // correct location to maintain heap quality
 
  //precondition:  supply instantiated array of 
  // size many elements, index of current hole, 
  // and item to be compared against possible new holes
  private static <Type extends Comparable<? super Type>> 
  int newHole(Type[] arr, int size, int hole, Type item) {
    int newHole = -1;
    //checks for children
    if (2*hole + 2 < size || 2*hole + 1 < size) { 
      //checks for invalid right child
      if (2*hole + 2 > size) { 
        //compare to left child
        if (item.compareTo(arr[2*hole + 1]) > 0) { 
          //this is the new hole
          hole = 2*hole + 1; 
        }
      }
      //checks for valid right child
      else if (2*hole + 2 < size){ 
        //if left is less than right
        if (arr[2*hole + 1].compareTo(arr[2*hole+2]) <= 0) { 
          //compares left to value
          if (item.compareTo(arr[2*hole + 1]) > 0) { 
            //this is the new hole
            newHole = 2*hole + 1; 
          }
        }
        else {
          //compares right child to value
          if (item.compareTo(arr[2*hole + 2]) > 0) { 
            //this is the new hole
            newHole = 2*hole + 2; 
          }
        }
      }
      else {
        //checks if value is greater than left child
        if (item.compareTo(arr[2*hole + 1]) > 0) { 
          //this is the new hole
          newHole = 2*hole + 1; 
        }
      }
    }
    return newHole;
  }
  //postcondition:  returns index of new hole
 
  /*MERGESORT ALGORITHM*/
  //precondition:  supply instantiated array 
  // of size many elements
  public static <Type extends Comparable<? super Type>> 
  void mergeSort(Type[] arr, int size) {
    mergesort(arr, 0, size - 1);
  }
  //postcondition:  returns parameter argument 
  // arr in ascending sorted order
 
  //precondition:  supply instantiated array of 
  // size many elements, index of first element, 
  // and index of last element
  private static <Type extends Comparable<? super Type>> 
  void mergesort(Type[] arr, int first, int last) {
    if (first < last) {
      int middle = (first + last)  / 2;
      //recursive mergesort on first half
      mergesort(arr, first, middle); 
      //recursive mergesort on second half
      mergesort(arr, middle + 1, last); 
      //merge both halves
      mergeSortedHalves(arr, first, middle, last);
    }
  }
  //postcondition:  returns parameter argument 
  // array in ascending sorted order
 
  //precondition:  supply instantiated array, 
  // index of left element, index of middle 
  // element, and index of right element
  private static <Type extends Comparable<? super Type>> 
  void mergeSortedHalves(Type[] arr, int left, int middle, int right) {
    Type[] temp = (Type[])new Comparable[right - left + 1];
 
    int index1 = left;
    int index2 = middle + 1;
    int index = 0;
    //while both halves have elements
    while (index1 <= middle && index2 <= right) {
      if (arr[index1].compareTo(arr[index2]) <= 0) {
        temp[index] = arr[index1];
        index1++;
      }
      else {
        temp[index] = arr[index2];
        index2++;
      }
      index++;
    }
    //if upper while loop finished but first half 
    // has more elements
    if (index1 <= middle) {
      while (index1 <= middle) {
        temp[index] = arr[index1];
        index++;
        index1++;
      }
    }
    //if upper while loop finished but second half 
    // has more elements
    else if (index2 <= right) {
      while (index2 <= right) {
        temp[index] = arr[index2];
        index++;
        index2++;
      }
    }
    System.arraycopy(temp, 0, arr, left, temp.length);
  }
  //postcondition:  returns a single sorted array from 
  // sub array[left to middle] and sub array[middle to right]
 
 
  /*QUICKSORT ALGORITHM*/
  //precondition:  supply instantiated array of 
  // size many elements
  public static <Type extends Comparable<? super Type>> 
  void quickSort(Type[] arr, int size) {
    quickSort(arr, 0, size - 1);
  }
  //postcondition:  returns parameter argument array 
  // in ascending sorted order
 
  //precondition:  supply instantiated array, 
  // index of first element, and index of last element
  private static <Type extends Comparable<? super Type>> 
  void quickSort(Type[] arr, int first, int last) {
    if (first < last) {
      setPivotToEnd(arr, first, last);
      int pivotIndex = splitList(arr, first, last);
      quickSort(arr, first, pivotIndex - 1);
      quickSort(arr, pivotIndex + 1, last);
    }
  }
  //postcondition:  returns parameter argument array 
  // in ascending sorted order
 
  //precondition:  supply instantiated array, index 
  // of left element, and index of right element
  private static <Type extends Comparable<? super Type>> 
  void setPivotToEnd(Type[] arr, int left, int right) {
    int center = (left + right) / 2;
    Type temp;
 
    if (arr[center].compareTo(arr[left]) < 0) {
      temp = arr[center];
      arr[center] = arr[left];
      arr[left] = temp;
    }
    if (arr[right].compareTo(arr[left]) < 0) {
      temp = arr[right];
      arr[right] = arr[left];
      arr[left] = temp;
    }
    if (arr[center].compareTo(arr[right]) < 0) {
      temp = arr[center];
      arr[center] = arr[right];
      arr[right] = temp;
    }
  }
  //postcondition:  median of three values at left, 
  // right, and center is swapped to right index
 
  //precondition:  supply instantiated array, index 
  // of left element, and index of right element
  private static <Type extends Comparable<? super Type>> 
  int splitList(Type[] arr, int left, int right) {
    int indexL = left;
    int indexR = right - 1;
    Type pivot = arr[right];
    Type temp;
 
    while (indexL <= indexR) { 
      //while indices have not run over
      while (arr[indexL].compareTo(pivot) < 0) { 
        //find element greater than pivot
        indexL++;
      }
      while (indexL <= indexR && arr[indexR].compareTo(pivot) > 0) { 
        //find element less than pivot
        indexR--;
      }
      if (indexL <= indexR) {
        //swap the two values
        temp = arr[indexL];
        arr[indexL] = arr[indexR];
        arr[indexR] = temp;
        indexL++;
        indexR--;
      }
    }
    //if indexL is not the right index swap them
    if (indexL != right) {
      temp = arr[indexL];
      arr[indexL] = arr[right];
      arr[right] = temp;
    }
    return indexL;
  }
  //postcondition:  returns index of final pivot position
}
Posted in Programming | Tagged , , | Leave a comment