Woodstock Blog

a tech blog for general algorithmic interview questions

[Question] Subarray With 0 Sum

Question

link

Given an array of positive and negative numbers, find if there is a subarray with 0 sum.

eg. 4, 2, -3, 1, 6 -> return true (2, -3, 1)

eg. 4, 2, 0, 1, 6 -> return true (0)

Solution

General case, idea of 前缀和:

iterate through the array and for every element arr[i], calculate sum of elements form 0 to i (this can simply be done as sum += arr[i]). If the current sum has been seen before, then there is a zero sum array.

O(n) time.

Special Case, if the input is all positive number, then this question can be solved using the idea from Subarray with Particular Sum, O(n) time.

[Question] Check Power of 2

Question

Requirement: O(1) time

Solution

Do a special handle for input = 0. source

bool IsPowerOfTwo(ulong x) {
    return (x & (x - 1)) == 0;
}

[LintCode] Trailing Zeros of Factorial

Question

link

Write an algorithm which computes the number of trailing zeros in n factorial.

Example: 11! = 39916800, so the out should be 2

Solution

Note that a trailing zero is produced by 2 * 5.

This question basically is couting the number of factor 5 (because factor 2 is always sufficient).

Code

public long trailingZeros(long n) {
    if (n <= 0) {
        return 0;
    }
    long d = 5;
    long result = 0;
    while (d <= n) {
        result += n / d;
        d *= 5;
    }
    return result;
}

[Question] the Skyline Problem

Question

link

Given n buildings, each building is an rectangle located on x-axis, and indicated by (x1, x2, height). Calculate the outline of all buildings. Output them in order.

Solution

Solution 1 is keeping a heightmap (Brute force). Keep an array of a certain size, and update max_height for every single point.

Solution 2 is using Sweeping line algorithm. Sweep from left to right, always try to find the largest height of the rectangle.

First make sure the rectangles are sorted. While sweeping, if sees an building-start, insert the height to the heap. If a building-end, remove from the heap. Then the current maximum height is the max point in the heap. A visualized explanation can be found here.

Total cost is O(nlogn), source.

Two solutions compared:

  1. In the brute force approach, doubling the width of the buildings will double the runtime cost. With the sweep line algorithm, it won’t.

  2. Sweep Line algorithm will work if the input is not discrete, whereas the heightmap approach requires integer coordinates.

Updated on Feb 1st, 2015: using floor 26 of this post, we can do insertion and deletion in max-heap, then peek the largest element in the heap.

把所有的turning points 放在一起,根据coordination从小到大sort 。

再用max-heap, 把所有的turning points扫一遍,遇到start turning point, 把 volume放入max-heap. 遇到end turning point,把对应的volume从max-heap中取出。

max-heap的max 值就是对应区间的最大volume

Code presented below.

Code

Brate force code from ucf:

public static void main(String[] args) throws IOException {

    Scanner fin = new Scanner(new File("skyline.in"));

    // Read in general information about the buildings and skyline.
    int numbuildings = fin.nextInt();
    int min = fin.nextInt();
    int max = fin.nextInt();
    int size = max - min;

    // Initialize the skyline.
    int[] mysky = new int[size];
    for (int i = 0; i < size; i++)
        mysky[i] = 0;

    // For each building, edit the skyline as necessary.
    for (int i = 0; i < numbuildings; i++) {
        int left = fin.nextInt();
        int height = fin.nextInt();
        int right = fin.nextInt();

        // Note how we have to offset the left and right boundaries
        // due to how we set up our array. Basically, at each segment
        // of the skyline that the current building is part of, we see
        // if this current building is the tallest of the ones there.
        // If so, we make the necessary update.
        for (int j = left - min; j < right - min; j++)
            if (height > mysky[j])
                mysky[j] = height;
    }

    // Beginning of the skyline.
    System.out.print(min + " ");
    int cnt = 0;

    // Keep on going until you get right before the end of the skyline.
    while (cnt < size - 1) {

        // Increment cnt as long as the height is the same.
        while (cnt < size - 1 && mysky[cnt] == mysky[cnt + 1])
            cnt++;

        // Print out information for this part of the skyline.
        System.out.print(mysky[cnt] + " " + (cnt + 1 + min) + " ");
        cnt++;
    }

    // This is the case where the last segment is unique.
    if (cnt == size - 1)
        System.out.print(mysky[size - 1] + " " + max);
    System.out.println();
    fin.close();
}

The heap solution:

public int[] skyline(List<Building> bds, int min, int max) {
    int[] output = new int[max - min];

    List<Edge>[] edges = new List[max - min];
    for (int i = 0; i < edges.length; i++) {
        edges[i] = new ArrayList<Edge>();
    }
    for (Building b : bds) {
        // put all edges into an array of edges
        edges[b.from].add(new Edge(b, true));
        edges[b.to].add(new Edge(b, false));
    }

    Queue<Building> heap = new PriorityQueue<Building>(100,
            new Comparator<Building>() {
                public int compare(Building b1, Building b2) {
                    return b2.height - b1.height;
                }
            });
    for (int i = 0; i < edges.length; i++) {
        // insert or remove each building at position i into max heap
        for (Edge e : edges[i]) {
            if (e.isEnter) {
                heap.add(e.building);
            } else {
                heap.remove(e.building);
            }
        }
        // then culculate the current hight, which is top of the heap
        if (!heap.isEmpty()) {
            output[i] = heap.peek().height;
        }
    }

    return output;
}

static class Edge {
    Building building;
    boolean isEnter;

    public Edge(Building bld, boolean enter) {
        building = bld;
        isEnter = enter;
    }
}

static class Building {
    int from;
    int to;
    int height;

    public Building(int a, int b, int c) {
        from = a;
        to = b;
        height = c;
    }
}

[Question] Min Stack

Question

link

Implement a stack, enable O(1) Push, Pop Top, Min. Where Min() will return the value of minimum number in the stack.

Solution

Using two stacks.

The first one is the regular stack.

The second one only store minimum numbers.

Eg:

Actual stack: (head) 16 15 29 19 18 (tail)

Auxiliary Stack: (head) 15 15 18 18 18 (tail)

Code

Psudu-code:

Push(int x):
1) push x to the first stack (the stack with actual elements)
2) compare x with the top element of the second stack (the auxiliary stack). Let the top element be y.
…..a) If x is smaller than y then push x to the auxiliary stack.
…..b) If x is greater than y then push y to the auxiliary stack.

Pop():
1) pop the top element from the auxiliary stack.
2) pop the top element from the actual stack and return it.

Code is skipped.

[Question] Median in a Stream of Integers

Question

link

Given that integers are read from a data stream. Find median of elements read so for in efficient way.

For simplicity assume there are no duplicates. For example, let us consider the stream 5, 15, 1, 3 … The stream of output is 5, 10, 1, 4 …

Solution

we can use a max heap on left side to represent elements that are less than effective median, and a min heap on right side to represent elements that are greater than effective median.

After processing an incoming element, the number of elements in heaps differ utmost by 1 element. source

So we know that java.util.PriorityQueue is a max-heap. The only question is, how to implement a min-heap? Someone sugests writing your own customized Comparator:

you use two instances of a java.util.PriorityQueue containing the same elements? The first instance would be passed a comparator which puts the maximum at the head, and the second instance would use a comparator which puts the minimum at the head.

Code

skipped

[Question] Implement Queue Using Stacks

Question

link

Implement a queue by two stacks. Support O (1) push, pop, top.

Solution

Q.Push(x): S1-Push(x)

Q.Pop(): if S2.empty -> S1->S2 S2.pop()

Q.Top(): Similar with Q.Pop()

Learn and compare with another question [Question] Implement Stack using Two Queues.

Code

skipped

[Question] Implement a HashMap

Question

Implement a HashMap

Hint:

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets. source

Load factor = n/k where n is the number of entries and k is the number of buckets.

Analysis

Some points to take note:

  1. So first of all, we would need 2 important parameters in a Hashmap: initial capacity and load factor. In Java, it’s default 0.75.

    For example the product of capacity and load factor = 16 * 0.75 = 12. This represents that after storing the 12th key – value pair into the HashMap , its capacity will become 32.

  2. Then we would need an Entry<Obj, Obj> Class to store key-value pairs. Important to note the Entry would have a ‘next’ pointer, in order to mimic a LinkedList.

  3. Except key, value and next pointer, Entry class also need to store the ‘hash’ value, because it will be used during comparison.

  4. The main bucket is an array of Entry objects. It should resize when it has to, and this is done in the addEntry() method. I skipped this part of code.

  5. The code for put() and get() method are similar, because it’s bascially calculate hash, then get array index, using that index to locate the head Entry object, and then do a linear search for key. If found, put() will replace it, and get() will return it. Note than after replaced, the old value is returned (Java designed in this way, I personally wouldn’t do this).

  6. Last important thing is the indexFor() method. It looks useless to me at first, then I realize that this method is mapping my hash value into the index in the array. Fine, but this bit operation still looks odd. Look at next section for more.

The Niubility Operation: h & (length - 1)

This actually equals to the value of (h % length). It converts the hash value into array index.

这个方法非常巧妙,它通过 h & (length - 1) 来得到该对象的保存位,而HashMap底层数组的长度总是 2 的 n 次方,这是HashMap在速度上的优化。

当length总是 2 的n次方时,h & (length - 1)运算等价于对length取模,也就是h % length,但是 & 比 % 具有更高的效率。source

That’s all about implementing a HashMap. It’s difficult, but having a general idea definitely helps.

What’s more, know the mod operation “h & (length - 1)” would definitely help you grab some attentions!

Code

public class MyHashMap {

    Pair[] table;
    float loadFactor;
    int size;

    public MyHashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0 || loadFactor <= 0) {
            return;
        }
        table = new Pair[initialCapacity];
        this.size = 0;
        this.loadFactor = loadFactor;
    }

    public String get(int key) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        Pair e = table[i];
        for (; e != null; e = e.next) {
            if (e.hash == hash && key == e.key) {
                return e.value;
            }
        }
        return null;
    }

    public String put(int key, String value) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        Pair e = table[i];
        for (; e != null; e = e.next) {
            if (e.hash == hash && key == e.key) {
                String oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }
        addEntry(hash, key, value, i);
        return null;
    }

    void addEntry(int hash, int key, String value, int i) {
        // in charge of resizing here (check the load factor)
        // resizing code is not written here
        Pair e = table[i];
        table[i] = new Pair(hash, key, value, e);
    }

    int indexFor(int h, int length) {
        return h & (length - 1);
    }

    final int hash(int code) {
        // this is the hash function from java 6 source code
        code ^= (code >>> 20) ^ (code >>> 12);
        return code ^ (code >>> 7) ^ (code >>> 4);
    }

    public static void main(String[] args) {
        MyHashMap map = new MyHashMap(16, 0.5f);
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        System.out.println(map.get(1));
        System.out.println(map.get(2));
        System.out.println(map.get(3));
        map.put(1, "Spider Man");
        map.put(4, "d");
        map.put(3, "Peter Parker");
        System.out.println(map.get(1));
        System.out.println(map.get(2));
        System.out.println(map.get(3));
        System.out.println(map.get(4));
        System.out.println(map.get(5));
    }

    class Pair {
        int hash;
        int key;
        String value;
        Pair next;

        Pair(int h, int k, String v, Pair n) {
            hash = h;
            key = k;
            value = v;
            next = n;
        }
    }
}

[Design] HashMap vs Hashtable vs HashSet

Overview

HashTable

  1. HashTable is thread safe synchronized. Only 1 thread can access at a time (compromise speed).
  2. HashTable does not allow null for key and value.

HashMap & HashSet

  1. HashMap is not sync, but better performance.
  2. HashMap allows null key or null value.
  3. HashSet is same as HashMap, just interface difference.

Conclusion

  1. HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.
  2. HashSet is same as HashMap. It’s not thread safe.

Implementation

  1. C++ Map is rbtree
  2. Java HashMap is buckets + entries (using Array + LinkedList)
  3. HashTable in Java is basically HashMap + lock

One more thing

HashMap is locked-out during rehashing, so it’s better to avoid rehashing (by setting initial size bigger).

[NineChap 9] Big Date, System Design and Resume (`)

Resume

  1. Do not write anything unrelated to CS.
  2. Do not write too long - 1 or 2 pages are fine. Senior engineer 3 pages.
  3. Do not write low GPA
  4. Never ever write “proficient in anything”

Big Data

Most classic question is “Frequent items” (refer to July’s blog).

Find top k hot queries in a daily access log of Google.

Variation:

  1. k = 1 vs k = 100000 - majority numbers
  2. low RAM vs sufficient RAM
  3. single machine vs multiple machines
  4. accurate vs inaccurate

Sufficient RAM

  1. HashTable + Heap (min-heap)
  2. Time O(n * logk), Space O(n)

Low RAM

  1. Split into 1000 (i.e. LOG/M) files by hash(query) % 1000
  2. Using HashTable + Heap to get top k for each files
  3. Collect 1000 top k queries and get global top k
  4. This method requires a lot of disk access and r/w, still slow.

Inaccurate (reduce memory from O(n) to O(k))

  1. Hash Count (only need to know this one) Limit the size of HashMap. The bigger the RAM, the more accurate is the result.
  2. Space Saving
  3. Lossy Counting
  4. Sticky Sampling
  5. Count Sketch

Bloom Filter

  1. Regular bloom filter - use 4 线性无关 formula
  2. Counting bloom filter - support delete
  3. Better DS than HashMap, but can loose some accuracy

Trie

Bitmap

Find all unique queries - use bigmap to store 3 types of states

System Design

Design a short url system

  1. Cache

to store hot urls

  1. Load Balance

Too many click in short time

  1. Storage balance

Hash value of an url and then store in individual machine

Expansibility?

  1. Consistent Hash

Node, can increase # of machines to store information

Migration process

  1. Router

check which machine response my query

light-weight calculations

what is router is down?

  1. Locale

url frequently access by China, then put the url storage in Beijing

Need-to-know Design patterns

  1. Singleton
  2. Factory
  3. Master-slave (esp. for relational DB)

MapReduce: Simplified Data Processing on Large Clusters

The Google File System

BigTable: A Distributed Storage System for Structured Data