Woodstock Blog

a tech blog for general algorithmic interview questions

[Question] Push and Pop Sequences of Stacks

Question

link

Given two integer sequences, one of which is the push sequence of a stack, please check whether the other sequence is a corresponding pop sequence or not.

For example, if 1, 2, 3, 4, 5 is a push sequence, 4, 5, 3, 2, 1 is a corresponding pop sequence, but the sequence 4, 3, 5, 1, 2 is not.

Solution

Refer to this answer:

Construct the stack:

For each number X in the pop order:

If this number is not the same as the top of the stack (or the stack is empty), 
push numbers from the push order until you pushed X. 

If you pushed all numbers and didn't find X, there's no way to get the pop order.

Pop X

Code

public boolean validSequence(int[] input, int[] sequenc) {
    // keep a pointer p in the input[] array
    int len = input.length;
    int p = 0;
    Stack<Integer> stack = new Stack<Integer>();

    int i = 0;
    while (i < len) {
        if (stack.isEmpty()) {
            // just push an element to stack
            stack.push(input[p++]);
        } else {
            // stack got elements, then check top one
            if (stack.peek() == sequenc[i]) {
                // seq found, proceed to next number in seq
                stack.pop();
                i++;
            } else {
                // did not find seq, keep pushing to stack until done
                if (p == len) {
                    return false;
                } else {
                    stack.push(input[p++]);
                }
            }
        }
    }
    return i == len; // or just return true;
}

[Design] Distributed Caching - Memcached

What is Memcached?

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

  1. Free & open source
  2. high-performance, distributed memory object caching system
  3. generic in nature
  4. intended for use in speeding up dynamic web applications by alleviating database load

Definition from wiki:

Memcached is a general-purpose distributed memory caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read.

Who uses Memcached?

  1. Facebook
  2. YouTube
  3. Twitter
  4. Amazon
  5. Reddit
  6. Yahoo
  7. Zynga

Why Memcached?

Run memcached on one or more hosts and then use the shared cache to store objects. Because each host’s RAM is storing information, the access speed will be much faster than having to load the information from disk. This can provide a significant performance boost in retrieving data versus loading the data natively from a database.

Also, because the cache is just a repository for information, you can use the cache to store any data, including complex structures that would normally require a significant amount of effort to create, but in a ready-to-use format, helping to reduce the load on your MySQL servers.

FAQ

What is Memcached?

It is component which stored the data temporary for 1 Hour/ 6 Hour/1 Day etc. When we integrate the Memcached with our application, performance of application increased.

Memcached is open source, high-performance distributed memory object used for caching so that execution can be enhanced at nth level.

Where Memcached can be used?

• Social Networking -> Profile Caching

• Content Aggregation -> HTML/ Page Caching

• Ad targeting -> Cookie/profile tracking

• Relationship -> Session caching

• E-commerce -> Session and HTML caching

• Location-based services -> Data-base query scaling

• Gaming and entertainment -> Session caching

Why use Memcached?

• Speed up application processes

• It determines what to store and what not to

• Reduce the number of retrieval requests to the database

• Cuts down the I/O ( Input/Output) access (hard disk)

In what condition does retrieving cache fail?

• Memory allocated for the cache is exhausted

• Item from cache is deleted

• Individual item in the cache is expired

What is the drawback of Memcached?

• It is not a persistent data store

• Not a database

• It is not an application specific

• It cannot cache large object

Give more details about memcached failures

Memcached servers are indeed independent of each other. Memcached server is just an efficient key-value store implemented as in-memory hash table.

What makes memcached distributed is the client, which in most implementations can connect to a pool of servers. Typical implementations use consistent hashing, which means that when you add or remove server to/from a pool of N servers, you only have to remap 1/N keys.

Typically keys are not duplicated on various hosts, as memcached is not meant to be persistent store and gives no guarantees that your value will persist (for example when running out of assigned memory, memcached server drops least recently used (LRU) elements). Thus it’s assumed that your application should handle missing keys.

[Design] Design Google Suggest (Autocomplete)

Overview

Google Suggest was launched in 2004 as a 20% time project from Google engineer Kevin Gibbs. He developed the idea on a Google shuttle bus.

Design

  1. Use trie.
  2. Use cache (distributed cache)

Trie

Just make use of all keywords (including space) and build a trie out of it. Then you are good to go just search in Trie.

A comparison of speed between DB, Set and Trie can be found here. A lot of implementation code can be found here.

Memcached

Distributed caching + LRU replacement algorithm. Refer to [Design] Distributed Caching - memcached for more.

[Google] Top N Values From Sum of 2 Arrays

Question

link

给定两个数组A,B,长度均为n,求A[0]+B[0],…,A[0]+B[n-1],…,A[n-1]+B[0],…,A[n-1]+B[n]总共n2个数的最大的n个值。

Solution

Use a Heap and then iteratively pop 1 and push 2 elements. Until n values has been filled.

Code

public int[] topN(int[] arr1, int[] arr2, int n) {
    int[] ans = new int[n];
    int index = n - 1;

    PriorityQueue<Pair> heap = new PriorityQueue<Pair>(n,
            new SpecialComparator(arr1, arr2));
    Arrays.sort(arr1);
    Arrays.sort(arr2);

    Pair maxPair = new Pair(n - 1, n - 1);
    heap.add(maxPair);

    for (int i = 0; i < n; i++) {
        Pair next = heap.poll();
        ans[index--] = arr1[next.x] + arr2[next.y];
        if (next.y - 1 >= 0) {
            heap.add(new Pair(next.x, next.y - 1));
        }
        if (next.x - 1 >= 0) {
            heap.add(new Pair(next.x - 1, next.y));
        }
    }
    return ans;
}

class SpecialComparator implements Comparator<Pair> {

    int[] arr1, arr2;

    public SpecialComparator(int[] a1, int[] a2) {
        arr1 = a1;
        arr2 = a2;
    }

    @Override
    public int compare(Pair p1, Pair p2) {
        // note that larger value shall go up top of the heap, so -1 * ...
        return -1 * (arr1[p1.x] + arr2[p1.y] - arr1[p2.x] - arr2[p2.y]);
    }
}

[Java OOP] Java Vector and ArrayList

Difference

  1. Vectors are synchronized, ArrayLists are not.

  2. Data Growth Methods

    A Vector defaults to doubling the size of its array, while the ArrayList increases its array size by 50 percent.

Similarities

  1. Both Vector and ArrayList use growable array data structure.

  2. They both are ordered collection classes as they maintain the elements insertion order.

  3. Vector & ArrayList both allows duplicate and null values.

  4. They both grows and shrinks automatically when overflow and deletion happens.

source

History

The vector was not the part of collection framework - it has been included in collections later. It can be considered as Legacy code.

There is nothing about Vector which List collection cannot do. Therefore Vector should be avoided. If there is a need of thread-safe operation, make ArrayList synchronized.

[Java OOP] Interface and Abstract Classes

Definition

source

A abstract class is declared when it has one or more abstract methods.

An interface differs from an abstract class because an interface is not a class. An interface is essentially a type that can be satisfied by any class that implements the interface.

The short answer

  1. Abstract class is REAL parent.

  2. Abstr can have data members and non-abstract methods while interface only got constant variable and abstract methods (read below)

  3. class can implement multiple interface but only extend from 1 abstract class

does interface got ‘abstract’ methods?

An interface is “purely” abstract class: every method is an abstract method. We do not use ‘abstract’ keyword. eg.

interface Bicycle {
    void changeCadence(int newValue);
    void changeGear(int newValue);
    void speedUp(int increment);
    void applyBrakes(int decrement);
}

According to the Java Language Specification, the abstract keyword for interfaces is obsolete and should no longer be used. (Section 9.1.1.1)

The long answer

Abstract classes: strong relationship

Abstract classes are meant to be inherited from, and often indicates a strong relationship.

For instance, if we have an abstract base class called “Canine”, any deriving class should be an animal that belongs to the Canine family (like a Dog or a Wolf).

With an interface on the other hand, the relationship is not necessarily strong.

For example, if we have a class called “House”, that class could also implement an interface called “AirConditioning”. Having air conditioning not an essential part of a House.

A TownHouse is a type of House, that relationship is very strong, and would be more appropriately defined through inheritance instead of interfaces.

What’s more

  1. Multiple inheritance

    Java class can inherit from only one abstract class, but can implement multiple interfaces.

  2. Abstract method

    Abstract class can have non-abstract methods with actual implementation details.

When to use which

Use abstract class when:

  1. You want to share code among several closely related classes.

  2. you want to be able to declare non-public members. In an interface, all methods must be public.

  3. you think you will need to add methods in the future.

    Because if you add new method headings to an interface, then all of the classes that already implement that interface will have to be changed to implement the new methods. That can be quite a hassle.

Use interface when:

  1. You expect that unrelated classes would implement your interface. For example, the interfaces Comparable and Cloneable are implemented by many unrelated classes.

  2. you think that the API will not change for a while.

  3. you want to have something similar to multiple inheritance.

[Design] Difference Between HTTP and HTTPS

ref

1: What are benefits of using HTTPS over HTTP?

HTTPS means that you tunnel the HTTP protocol over TLS/SSL which encrypts the HTTP payload.

So the benefit is that HTTP requests and responses are transmitted securely over the wire, e.g. your Internet Service Provider does not know what you’re doing.

When Google switched Gmail to use HTTPS, no additional resources were required; no network hardware, no new hosts. It only increased CPU load by about 1%.

2: How to use HTTPS?

Enable it at your endpoint, in general a web server in front of your application server. Most web servers (e.g. IIS, Apache) support this by configuration. Depending on your confidentiality requirements this may not be enough.

3: Can we use HTTPS for only login purpose and then onwords HTTP?

Technically this is possible, but it introduces some security risks. Example: After a secured login you transmit session IDs identifying the user. If you transmit those session IDs unsecurely (no SSL), session hijacking becomes a risk (‘man-in-the-middle’)

4: What settings needs to be done for making website HTTPS?

See #2. In public internet scenarios you should request (buy) a certificate from a certain Certificate Authority (CA), so that end user clients can verify whether they should trust your certificate.

5: Is there any threat present in HTTPS?

In the protocol itself there is a slight risk of man-in-the-middle attacks. E.g. a proxy between the client and server could pretend to be the server itself (this requires a successful attack to network infrastructure, e.g. DNS). There are several other ‘more obscure’ risks that do not relate to the protocol itself, e.g.:

  1. usage of an outdated encryption key length (e.g. 256 bit)
  2. loss of private keys or unappropriate key management procedures (e.g. send via unencrypted email)
  3. certificate authority failure

6: Is processing time required for HTTPS is greater than HTTP?

Yes, key negotiation (handshaking) requires a lot CPU capacity.

Port Number

HTTP uses port 80 or 8080, while HTTPS uses TCP port 443.

The reason that some applications use 8080 (7080, 9080) instead of 80 is that on UNIX, port numbers below 1024 are reserved for super-user processes.

That’s why for OS compatibility reasons, some servers use other ports (greater than 1024). But they still have “80” inside the numner, eg. 7080, 8080, 9080.

[Palantir] Largest Basin Size in Matrix

Question

link

A group of farmers has some elevation data, and we’re going to help them understand how rainfall flows over their farmland.

We’ll represent the land as a two-dimensional array of altitudes and use the following model, based on the idea that water flows downhill:

If a cell’s neighboring cells all have higher altitudes, we call this cell a sink; water collects in sinks. Two cells are neighbors if the rows and columns each differ by at most one. Hence an interior cell will have eight neighbors, a cell on the edge will have five neighbors, and a cell in a corner will have three neighbors.

Otherwise, water will flow to the neighboring cell with the lowest altitude. If a cell is not a sink, you may assume it has a unique lowest neighbor and that this neighbor will be lower than the cell. Cells that drain into the same sink – directly or indirectly – are said to be part of the same basin.

Your challenge is to partition the map into basins. In particular, given a map of elevations, your code should partition the map into basins and output the sizes of the basins, in descending order.

Assume the elevation maps are square. Input will begin with a line with one integer, S, the height (and width) of the map. The next S lines will each contain a row of the map, each with S integers – the elevations of the S cells in the row. Some farmers have small land plots such as the examples below, while some have larger plots. However, in no case will the total size of the farmland exceed 1000x1000.

Note: The input uses unix line endings (\n). If you try to view the sample inputs on a windows machine with a program that does not convert line endings (like Notepad), you will see the input appear all on a single line.

Your code should output a space-separated list of the basin sizes, in descending order. (Trailing spaces are ignored.)

While correctness and performance are the most important parts of this problem, a human will be reading your solution, so please make an effort to submit clean, readable code. In particular, do not write code as if you were solving a problem for a competition.

A few examples are below.

Input:

       3
       1 5 2
       2 4 7
       3 6 9

Output:

            7 2

The basins, labeled with A’s and B’s, are:

       A A B
       A A B
       A A A

Input:

       1
       10

Output:

       1

There is only one basin in this case.

Input:

       5
       1 0 2 5 8
       2 3 4 7 9
       3 5 7 9 9
       1 2 5 5 3
       3 3 5 1 0

Output:

       10 8 7

The basins, labeled with A’s, B’s, and C’s, are:

       A A A A A
       A A A A A
       B B B C C
       B B C C C
       B B C C C

Input:

       4
       0 3 2 3
       3 2 1 4
       3 4 3 3
       5 5 2 0

Output:

       6 5 5

The basins, labeled with A’s, B’s, and C’s, are:

       A A B B
       A A B B
       A B C C
       A C C C

Solution

I did not find a ‘best’ solution online, but there’s a good enough explanation available here:

  1. First store index according to their heights.

  2. Then iterate from smallest height to largest height.

  3. If current index is not visited, make it Basin surface (where water can collect), and make all higher neighbours as Non-Basin surface.

  4. Repeat step 3 till all indexes are visited.

  5. DFS find each basin

Code

I post my solution written in 2013.

public class MeSolution2013 {

    public static void main(String args[]) {

        String inputFile;
        int testCaseNumber = 1;

        while (true) {
            inputFile = "input" + testCaseNumber + ".txt";
            BufferedReader in;
            try {
                URI uri = MeSolution2013.class.getResource(inputFile).toURI();
                in = new BufferedReader(new FileReader(new File(uri)));
            } catch (Exception e) {
                // there is not more test cases
                break;
            }

            Scanner sc = new Scanner(in);
            int length = sc.nextInt();
            int[][] elevation = new int[length][length];
            for (int i = 0; i < length; i++) {
                for (int j = 0; j < length; j++) {
                    elevation[i][j] = sc.nextInt();
                }
            }

            List<Pair> sinkList = new ArrayList<Pair>();

            // first find out all sink nodes
            for (int i = 0; i < length; i++) {
                for (int j = 0; j < length; j++) {
                    if (elevation[i][j] < lowestNeighbor(i, j, elevation)) {
                        // (i,j) is a sink point
                        sinkList.add(new Pair(i, j));
                    }
                }
            }

            // then for each sink node, return the count
            int[] basinSize = new int[sinkList.size()];
            for (int i = 0; i < sinkList.size(); i++) {
                basinSize[i] = count(sinkList.get(i).X, sinkList.get(i).Y,
                        elevation);
            }
            Arrays.sort(basinSize);

            for (int i = sinkList.size() - 1; i >= 0; i--) {
                System.out.print(basinSize[i]);
                if (i != 0)
                    System.out.print(" ");
            }
            System.out.println();
            testCaseNumber++;
        }
    }

    static int count(int x, int y, int[][] ele) {

        int num = 1;

        // if the neighbour is higher than current, add count to current count
        // if all neighbours are lower than current, return 1

        // diagonal neighbors
        if (withinBound(x - 1, y, ele) && rainFlowFrom(x, y, x - 1, y, ele)) { // upper
                                                                                // elment
            num += count(x - 1, y, ele);
        }
        if (withinBound(x + 1, y, ele) && rainFlowFrom(x, y, x + 1, y, ele)) { // lower
                                                                                // element
            num += count(x + 1, y, ele);
        }
        if (withinBound(x, y - 1, ele) && rainFlowFrom(x, y, x, y - 1, ele)) { // left
                                                                                // hand
                                                                                // side
            num += count(x, y - 1, ele);
        }
        if (withinBound(x, y + 1, ele) && rainFlowFrom(x, y, x, y + 1, ele)) { // right
                                                                                // hand
                                                                                // side
            num += count(x, y + 1, ele);
        }

        // diagonal neighbors
        if (withinBound(x - 1, y - 1, ele)
                && rainFlowFrom(x, y, x - 1, y - 1, ele)) { // upper elment
            num += count(x - 1, y - 1, ele);
        }
        if (withinBound(x + 1, y + 1, ele)
                && rainFlowFrom(x, y, x + 1, y + 1, ele)) { // lower element
            num += count(x + 1, y + 1, ele);
        }
        if (withinBound(x + 1, y - 1, ele)
                && rainFlowFrom(x, y, x + 1, y - 1, ele)) { // left hand side
            num += count(x + 1, y - 1, ele);
        }
        if (withinBound(x - 1, y + 1, ele)
                && rainFlowFrom(x, y, x - 1, y + 1, ele)) { // right hand side
            num += count(x - 1, y + 1, ele);
        }

        return num;
    }

    static Boolean withinBound(int x, int y, int[][] ele) {
        int bound = ele.length;
        return (x >= 0 && x < bound && y >= 0 && y < bound);
    }

    static Boolean rainFlowFrom(int a, int b, int c, int d, int[][] ele) {
        // rain in (c,d) flows into (a,b).
        if (ele[a][b] >= ele[c][d])
            // if (a,b) is higher than (c,d), impossible to flow this way
            return false;

        int lowest = lowestNeighbor(c, d, ele);

        return (ele[a][b] == lowest);
        // the question states "you may assume it has a unique lowest neighbor"
        // so if flow to (a,b), then that is the unique lowest height.
    }

    static int lowestNeighbor(int a, int b, int[][] ele) {
        int height = 9999999;

        // adjacent neighbor
        if (withinBound(a - 1, b, ele) && ele[a - 1][b] < height) {
            height = ele[a - 1][b];
        }
        if (withinBound(a + 1, b, ele) && ele[a + 1][b] < height) {
            height = ele[a + 1][b];
        }
        if (withinBound(a, b - 1, ele) && ele[a][b - 1] < height) {
            height = ele[a][b - 1];
        }
        if (withinBound(a, b + 1, ele) && ele[a][b + 1] < height) {
            height = ele[a][b + 1];
        }

        // diagonal neighbor
        if (withinBound(a - 1, b - 1, ele) && ele[a - 1][b - 1] < height) {
            height = ele[a - 1][b - 1];
        }
        if (withinBound(a + 1, b - 1, ele) && ele[a + 1][b - 1] < height) {
            height = ele[a + 1][b - 1];
        }
        if (withinBound(a - 1, b + 1, ele) && ele[a - 1][b + 1] < height) {
            height = ele[a - 1][b + 1];
        }
        if (withinBound(a + 1, b + 1, ele) && ele[a + 1][b + 1] < height) {
            height = ele[a + 1][b + 1];
        }
        return height;
    }
}

class Pair {

    public int X;
    public int Y;

    public Pair(int _x, int _y) {
        X = _x;
        Y = _y;
    }
}

[Design] Difference Between HTTP Protocol and TCP Protocol

ref

Short Version

TCP is a transport-layer protocol, and HTTP is an application-layer protocol that runs over TCP.

The layers

At the very bottom of the network stack is the physical layer. This is where electrical signals or light pulses or radio waves actually transmit information from place to place. The physical layer doesn’t really have protocols, but instead has standards for voltages, frequencies, and other physical properties. You can transmit information directly this way, but you need a lot of power or a dedicated line, and without higher layers you won’t be able to share bandwidth.

The next layer up is the link layer. This layer covers communication with devices that share a physical communications medium. Here, protocols like Ethernet, 802.11a/b/g/n, and Token Ring specify how to handle multiple concurrent accesses to the physical medium and how to direct traffic to one device instead of another. In a typical home network, this is how your computer talks to your home “router.”

The third layer is the network layer. In the majority of cases, this is dominated by Internet Protocol (IP). This is where the magic of the Internet happens, and you get to talk to a computer halfway around the world, without needing to know where it is. Routers handle directing your traffic from your local network to the network where the other computer lives, where its own link layer handles getting the packets to the right computer.

Now we are getting somewhere. We can talk to a computer somewhere around the world, but that computer is running lots of different programs. How should it know which one to deliver your message to? The transport layer takes care of this, usually with port numbers. The two most popular transport layer protocols are TCP and UDP. TCP does a lot of interesting things to smooth over the rough spots of network-layer packet-switched communication like reordering packets, retransmitting lost packets, etc. UDP is more unreliable, but has less overhead.

So we’ve connected your browser to the web server software on the other end, but how does the server know what page you want? How can you post a question or an answer? These are things that application-layer protocols handle. For web traffic, this is the HyperText Transfer Protocol (HTTP). There are thousands of application-layer protocols: SMTP, IMAP, and POP3 for email; XMPP, IRC, ICQ for chat; Telnet, SSH, RDP for remote administration; etc.

Ref: http://qr.ae/3pnJK

[Question] Check if Given Point Inside Polygon

Question

link

Given a polygon and a point ‘p’, find if ‘p’ lies inside the polygon or not. The points lying on the border are considered inside.

Solution

Prerequisite reading: [Question] Check if two line segments intersect.

Suggested by G4G, this is a simple idea to check:

  1. Draw a horizontal line to the right of each point and extend it to infinity

  2. Count the number of times the line intersects with polygon edges.

  3. A point is inside the polygon if either count of intersections is odd or point lies on an edge of polygon.

  4. Note the special case of point ‘g’.