Woodstock Blog

a tech blog for general algorithmic interview questions

[Apple] Calculate Area

Question

Calculate area:

Analysis

We are only able to come up with 2 equations using the obvious relationship between semicircle and square. But we have x, y and z variables.

How do we proceed?

Solution

Thanks to my girlfriend, who came up with the solution:

The tiny small shaded area can be calculated by substracting the isosceles triangle (等腰三角形) from a 1/12 circle.

And the tiny small shaded area is simply a part of the larger shaded area.

Do you own math.

[Question] Packing Rectangles

Question

link

Four rectangles are given. Find the smallest enclosing (new) rectangle into which these four may be fitted without overlapping. By smallest rectangle we mean the one with the smallest area.

Greedy

Greedy placement from large (area) to small.

  1. Put the largest rectangle remaining into your packed area.
  2. If it can’t fit anywhere, place it in a place that extends the pack region as little as possible.
  3. Repeat until you finish with the smallest rectangle.

Optimal Solution

There is a trade-off between implementation complexity/time and optimality, but there is a wide range of algorithms to choose from.

Below is quoted:

  1. First-Fit Decreasing Height (FFDH) algorithm
    FFDH packs the next item R (in non-increasing height) on the first level where R fits. If no level can accommodate R, a new level is created.
    Time complexity of FFDH: O(n·log n).
    Approximation ratio: FFDH(I)<=(17/10)·OPT(I)+1; the asymptotic bound of 17/10 is tight.

  2. Next-Fit Decreasing Height (NFDH) algorithm
    NFDH packs the next item R (in non-increasing height) on the current level if R fits. Otherwise, the current level is "closed" and a new level is created.
    Time complexity: O(n·log n).
    Approximation ratio: NFDH(I) <= 2·OPT(I)+1; the asymptotic bound of 2 is tight.

  3. Best-Fit Decreasing Height (BFDH) algorithm
    BFDH packs the next item R (in non-increasing height) on the level, among those that can accommodate R, for which the residual horizontal space is the minimum. If no level can accommodate R, a new level is created.

  4. Bottom-Left (BL) Algorithm
    BL first order items by non-increasing width. BL packs the next item as near to the bottom as it will fit and then as close to the left as it can go without overlapping with any packed item. Note that BL is not a level-oriented packing algorithm.
    Time complexity: O(n^2).
    Approximation ratio: BL(I) <= 3·OPT(I).

  5. Baker's Up-Down (UD) algorithm
    UD uses a combination of BL and a generalization of NFDH. The width of the strip and the items are normalized so that the strip is of unit width. UD orders the items in non-increasing width and then divides the items into five groups, each with width in the range (1/2, 1], (1/3,1/2], (1/4,1/3], (1/5,1/4], (0,1/5]. The strip is also divided into five regions R1, ··· , R5. Basically, some items of width in the range (1/i+1, 1/i], for 1 <= i <= 4, are packed to region Ri by BL. Since BL leaves a space of increasing width from top to bottom at the right side of the strip, UD takes this advantage by first packing the item to Rj for j = 1, ··· , 4 (in order) from top to bottom. If there is no such space, the item is packed to Ri by BL. Finally, items of size at most 1/5 are packed to the spaces in R1, ··· , R4 by the (generalized) NFDH algorithm. Again if there is no space in these regions, the item is packed to R5 using NFDH.
    Approximation ratio: UD(I) <= (5/4) · OPT(I)+(53/8)H, where H is the maximum height of the items; the asymptotic bound of 5/4 is tight.

  6. Reverse-fit (RF) algorithm
    RF also normalizes the width of the strip and the items so that the strip is of unit width. RF first stacks all items of width greater than 1/2. Remaining items are sorted in non-increasing height and will be packed above the height H0 reached by those greater than 1/2. Then RF repeats the following process. Roughly speaking, RF packs items from left to right with their bottom along the line of height H0 until there is no more room. Then packs items from right to left and from top to bottom (called reverse-level) until the total width is at least 1/2. Then the reverse-level is dropped down until (at least) one of them touches some item below. The drop down is somehow repeated.
    Approximation ratio: RF(I) <= 2·OPT(I).

  7. Steinberg's algorithm
    Steinberg's algorithm, denoted as M in the paper, estimates an upper bound of the height H required to pack all the items such that it is proved that the input items can be packed into a rectangle of width W and height H. They then define seven procedures (with seven conditions), each to divide a problem into two smaller ones and solve them recursively. It has been showed that any tractable problem satisfies one of the seven conditions.
    Approximation ratio: M(I) <= 2·OPT(I).

  8. Split-Fit algorithm (SF) SF divides items into two groups, L1 with width greater than 1/2 and L2 at most 1/2. All items of L1 are first packed by FFDH. Then they are arranged so that all items with width more than 2/3 are below those with width at most 2/3. This creates a rectangle R of space with width 1/3. Remaining items in L2 are then packed to R and the space above those packed with L1 using FFDH. The levels created in R are considered to be below those created above the packing of L1.
    Approximation ratio: SF(I) <= (3/2) ·OPT(I) + 2; the asymptotic bound of 3/2 is tight.

  9. Sleator's algorithm
    Sleater's algorithm consists of four steps:

    1. All items of width greater than 1/2 are packed on top of one another in the bottom of the strip. Suppose h0 is the height of the resulting packing All subsequent packing will occur above h0.

    2. Remaining items are ordered by non-increasing height. A level of items are packed (in non-increasing height order) from left to right along the line of height h0.

    3. A vertical line is then drawn in the middle to cut the strip into two equal halves (note this line may cut an item that is packed partially in the right half). Draw two horizontal line segments of length one half, one across the left half (called the left baseline) and one across the right half (called the right baseline) as low as possible such that the two lines do not cross any item.

    4. Choose the left or right baseline which is of a lower height and pack a level of items into the corresponding half of the strip until the next item is too wide.

    A new baseline is formed and Step (4) is repeated on the lower baseline until all items are packed.
    Time complexity: O(n ·log n).
    The approximation ratio of Sleator's algorithm is 2.5 which is tight.

[Question] Two Dimensional Knapsack Problem

Question

link

给定n种物品和一背包。物品i的重量是wi,体积是bi,其价值为vi,背包的容量为C,容积为D。问应如何选择装入背包中的物品,使得装入背包中物品的总价值最大?

在选择装入背包的物品时,对每种物品i只有两种选择,即装入背包或者不装入背包。不能将物品i装入背包多次,也不能只装入部分的物品i。试设计一个解此问题的动态规划算法,并分析算法的计算复杂性。

Analysis

This is a extended question from [Question] 0-1 Knapsack Problem.

Same solution, just use 3D-array for DP.

Solution

First of all, define a 2D array, Knapsack(n,W) denotes getting ‘n'th item, with weight 'W’. When n == 0 or W = 0, dp value is 0.

int[][][] dp = new int[n + 1][W + 1][B + 1];

Now if item ‘n’ is able to fit in:

dp[i][j][k] = max(dp[i-1][j][k] , dp[i-1][j-w[i]][k-b[i]] + v[i]);

If not able to fit in:

dp[i][j][k] = dp[i-1][j][k];

Code

Code from 绝对快乐一生:

int main()
{
   int i,j,k;
   int n,c,d;
   int w[MAX] = {0};   //重量
   int b[MAX] = {0};   //体积
   int v[MAX] = {0};   //价值
   cout<<"请输入物品个数:";
   cin>>n;
   cout<<"请输入背包的容量及容积:";
   cin>>c>>d;
   cout<<"请依次输入各个物品的重量,体积,价值:(共"<<n<<"个)"<<endl;
   for(i =1;i<n+1;i++)
   {
       cin>>w[i]>>b[i]>>v[i];
   }

   int dp[50][50][50]={0}; 
   //dp[i][j][k] i代表着第1到第i个物品,j代表的是重量,k代表的是容积,dp为最优价值

   for(i=1;i<n+1;i++)
       for(j =1;j <=c;j++)
           for(k = 1 ;k <= d ; k++)
           {
               if(w[i]<=j&&b[i]<=k)  
               //当前物品重量小于当前容量,且体积小于容积时 ,才可以考虑装入物品的问题
                   dp[i][j][k] = max(dp[i-1][j][k] , dp[i-1][j-w[i]][k-b[i]] + v[i]);
               else dp[i][j][k] = dp[i-1][j][k];
           }
   cout<<"背包能放物品的最大价值为:"<<dp[n][c][d]<<endl;
  int x[MAX] ={0};   //记录是否被选中
  for(i =n;i>1;i--)
       if(dp[i][c][d]==dp[i-1][c][d])x[i] =0;
      else {x[i]=1;c -= w[i];d -= b[i];}
   x[1]=(dp[1][c][d])?1:0;
   cout<<"被选入背包的物品的编号,质量和体积,价值分别是:"<<endl;
   for(i=1;i<</span>n+1;i++)
       if(x[i]==1)
           cout<<"第"<<i<<"个物品: "<<w[i]<<"  "<<b[i]<<"  "<<v[i]<<endl;

}

[Fundamental] Implement Trie and Suffix Tree

Trie Node

public class TrieNode {
    boolean isLeaf;
    TrieNode[] child;

    public TrieNode(boolean isLeaf) {
        this.isLeaf = isLeaf;
        this.child = new TrieNode[26];
    }

    public void insert(String str) {
        if (str == null || str.length() == 0) {
            this.isLeaf = true;
            return;
        }
        char cur = str.charAt(0);
        if (child[cur - 'a'] == null) {
            child[cur - 'a'] = new TrieNode(str.length() == 1);
        }
        child[cur - 'a'].insert(str.substring(1));
    }

    public boolean trieSearch(String str) {
        // have to consider leaf node
        if (str == null || str.length() == 0) {
            return isLeaf;
        }
        char cur = str.charAt(0);
        if (child[cur - 'a'] == null) {
            return false;
        }
        return child[cur - 'a'].trieSearch(str.substring(1));
    }

    public boolean suffixTreeSearch(String str) {
        // suffixTreeSearch don't consider leaf node
        // cuz we search for prefix of suffixes
        if (str == null || str.length() == 0) {
            return true;
        }
        char cur = str.charAt(0);
        if (child[cur - 'a'] == null) {
            return false;
        }
        return child[cur - 'a'].suffixTreeSearch(str.substring(1));
    }
}

Trie

public class Trie {
    TrieNode root;

    public Trie(String[] input) {
        root = new TrieNode(false);

        for (String str : input) {
            root.insert(str);
        }
    }

    public boolean search(String query) {
        return root.trieSearch(query);
    }
}

Suffix Tree

Suffix tree might also consider the List of indexes thing, which I do not take into consideration in my code.

public class SuffixTree {
    TrieNode root;

    public SuffixTree(String input) {
        root = new TrieNode(false);

        for (int i = 0; i < input.length(); i++) {
            root.insert(input.substring(i));
        }
    }

    public boolean search(String query) {
        return root.suffixTreeSearch(query);
    }
}

[Fundamental] Suffix Tree

ref

Suffix tree

Both KMP Algorithm and Rabin Karp Algorithm pre-process the pattern to make the pattern searching faster. The best time complexity that we could get by preprocessing pattern is O(n), where n is length of the text.

Now we will discuss an approach that pre-processes the text. A suffix tree is built of the text. After preprocessing text (building suffix tree of text), we can search any pattern in O(m) time where m is length of the pattern.

Though search is very fast - just proportional to length of the pattern, it may become costly if the text changes frequently. It is good for fixed text or less frequently changing text though.

Suffix Tree VS. Trie

A Suffix Tree is a compressed trie of all suffixes of the given text.

Compressed Trie

A Compressed Trie is obtained from standard trie by joining chains of single nodes.

Example, a standard trie:

A Compressed Trie:

build a Suffix Tree

  1. Generate all suffixes of given text.
  2. Consider all suffixes as individual words and build a compressed trie.

Eg.

banana\0
anana\0
nana\0
ana\0
na\0
a\0
\0

Example question: [CC150v4] 20.8 Full Text Search (Suffix Tree)

[Fundamental] Suffix Array

ref

Suffix Array

A suffix array is a sorted array of all suffixes of a given string.

Any suffix tree based algorithm can be replaced with an algorithm that uses a suffix array enhanced with additional information and solves the same problem in the same time complexity.

Naive algorithm for construction of suffix array is to consider all suffixes, sort them using a O(nLogn) sorting algorithm and while sorting, maintain original indexes. Time complexity is _O(n2 * Logn)__.

There is an advanced nLogn Algorithm algorithm available to read here. Basic idea is to “Sort according to first two characters” and then “according to first four character”.

Example question: [Facebook] Query Search (HashMap, suffix array).

[Fundamental] Prefix Tree

ref

Prefix tree

  1. Building a well balanced BST needs time proportional to M * log N, where M is maximum string length and N is number of keys in tree.
  2. Using trie, search in O(M) time.
  3. The penalty is on trie storage requirements.

Implementation

  1. Every node of trie consists of multiple branches.
  2. Each branch represents a possible character of keys.
  3. Mark the last node of every key as leaf node.

    struct trie_node { int value; / Used to mark leaf nodes / trie_node_t *children[ALPHABET_SIZE]; };

Note that a non-terminating node can also be marked as leaf. Eg.

                   root
                /   \    \
                t  'a'    b
                |   |     |
                h   n    'y'
                |   |  \    \
               'e'  s  'y'  'e'
             /  |   |
            i   r   w
           /    |   |
         'r'   'e'  e
                    |
                   'r'

The leaf nodes are quoted in ‘’.

Insert and search costs O(key_length), however the memory requirements of trie is O(ALPHABET_SIZE * key_length * N) where N is number of keys in trie.

Compressed trie, ternary search tree, etc. can be used to minimize memory requirements of trie.

[Fundamental] Pattern Searching Algorithms

ref

Overview

strStr() is a classic question in CS. There are various ways to solve (which we have discussed before), this is a summarization:

  1. naive - O(m * n)

  2. KMP - O(n) in worst case

  3. Rabin-Karp, rolling hash - between O(m * n) and O(m + n)

    Matches the hash value of the pattern with the hash value of pattern. If the hash values match, then only it starts matching individual characters.

  4. Modified naive algo, only work if pattern contains no duplicate characters.

    Only match the first char. This case is quite boring, can skip.

  5. Suffix tree

    Will discuss in details.

[Google] Generate Request ID

Question

link

Design a system to return an unique ID for each request.

(For most of requests, the ID value should increase as time goes, the system should handle 1000 requests per second at least. )

Note: Timestamps alone is not valid since there might be multiple requests with same timestamps.

Idea 1

Generating UUID

A universally unique identifier (UUID) is an identifier standard used in software construction. A UUID is simply a 128-bit value.

Generating unique IDs is easy. All modern OS'es have that functionality built in.

Detail: UUIDs may be generated from a combination of system time stamp, CPU/system ID, NIC MAC addresses, HBA WWNs, server id etc.

Clarifications

Let’s assume the following:

  1. What’s the length of the ID?

    <= 128 bits. In that case I’ll choose the standard 128 bit UUID format.

  2. The requirement states the system should handle 1000 request/s at least.

  3. What’s the average request rate?

    1000 < avg req. rate < 10,000

  4. What’s the max. burst rate the system must handle?

    < 1,000,000

  5. What’s the max. latency (wait time) for a request?

    1 ms

Solution

It’s a classical consumer-producer problem.

In this case we have many consumers of UUIDs and one producer. Let’s assume the OS provides an API to generate UUIDs and the max. running time of the API is 1ms.

  1. Pre-allocate 2 Mio UUIDs into an array / stack/ heap (depending on implementation) structure.

  2. If no overhead 2 Mio UUIDs would take ~ 32MB of RAM. Not a problem on today’s server system with many GB of RAM, but may be a problem on an embedded system.

  3. 2 Mio UUIDs ensures system can handle burst rate.

  4. The solution needs to ensure that its pool of UUIDs doesn’t run out as consumers request them and they are replenished.

Idea 2

This question is quite open-ended.

For starter, how about appending random(N) to the timestamp? N can be large enough to make collisions unlikely. If each server has an ID we can also include it to further reduce collisions.

If IDs must be unique, then I suppose you can design a counter that will return an ID and increment it at the same time. You will then need mutex to access this counter.

[Google] Make a Java Method Thread-safe

Question

link

Consider the following class:

class MyCounter {
    private static int counter = 0;

    public static int getCount() {
        return counter++;
    }
}

Is the method thread-safe? How to make it thread-safe?

Solution

No, it’s not.

The method is not thread-safe, because the counter++ operation is not atomic, which means it consists more than one atomic operations. In this case, one is accessing value and the other is increasing the value by one.

When Thread 1 accesses the method at t1, Thread 2 may not be done with the method. So the value returned to Thread 1 is the value that has not been increased.

Approach 1

Adding synchronized to this method. This will synchronize the instance of the static class.

class MyCounter {
    private static int counter = 0;

    public static synchronized int getCount() {
        return counter++;
    }
}

Approach 2

We actually can make count++ atomic by using AtomicInteger from the package “java.util.concurrent.atomic”.

import java.util.concurrent.atomic.AtomicInteger;

public class MyCounter {
    private static AtomicInteger counter = new AtomicInteger(0);

    public static int getCount() {
        return counter.getAndIncrement();
    }
}

Follow-up on thread stack

  1. Each thread has its own stack (never share stack).
  2. All local variables defined in a method will be allocated memory in stack.
  3. When execution completed by a thread, stack frame will be removed.