Woodstock Blog

a tech blog for general algorithmic interview questions

[Design] Distributed Network Bottleneck

Question

link

一个distributed的环境,有很多机器,现在你发现性能有问题,可能是网络带宽造成的,你怎么解决? (你不能更换网络设备的前提下)

Answer

  1. Identify problem

首先得判定是否真的是网络造成的,就算是网络问题,哪些机器之间的网络问题? 这个得先大概了解high level component dependency relationship,

看看是不是cpu memory disk都没有问题。 可以profile几个机器看看是不是 a lot of time spent waiting for network calls.

  1. Locate the faulty component

判定是网络问题之后看是哪些components之间,或是某个component里面有很多网络通讯。

  1. Improvement

不能更换设备的话,能不能改network topology来让critical path machine之间的带宽有改善。

要是不能改topology就只能改程序了。还是先identify top offender,然后就只能慢慢改了

  1. What’s more

要还有时间的话就可以聊聊问啥不能换设备,是资金问题还是用的已经是top of the line了?

或者是在public cloud上?

Answer suggested by user remus.

[Google] Postorder Successor in Binary Tree

Question

link

Write an algorithm to find the ‘next’ post-order successor of a given node in a binary tree and binary search tree.

  1. where each node has a link to its parent.
  2. without parent pointer

Implement 2 versions of the algorithm: 1.) binary tree 2.) BST

Solution

In postorder, any node below current node shall be ‘predecessor’ of current node. Thus we only care about current node’s parent.

Suggested by a user:

1) parent pointer is given. 
- go to the parent of the current node. 
- if current node is the right child of its parent => print parent. 
- else return leftmost node of the right sub-tree of parent. 

2) parent pointer is not given. 
- traverse the tree and find the parent of the current node 
- do the same as method (1). 

[Design] Networks and TCP/IP

Overview

Internet protocol suite is a networking model and communications protocols used by the Internet. It is commonly known as TCP/IP.

Four abstraction layers:

  1. Application layer
  2. Transport layer
  3. Internet layer
  4. Link layer

The protocols are maintained by the Internet Engineering Task Force (IETF).

A summary of four-layer model

  1. Application: authentication, compression, and end user services
  2. Transport: handles the flow of data between systems and provides access to the network for applications via the BSD socket library
  3. Network: packet routing
  4. Link: Kernel OS/device driver interface to the network interface on the computer.

source

More details

The Application layer:

  1. where applications create user data and communicate with other applications on another host (make use of Transport Layer to provide reliable or unreliable pipes to other processes).
  2. different application architecture, such as the client-server model and peer-to-peer.
  3. SMTP, FTP, SSH, HTTP all operate above this layer.
  4. processes are addressed via ports.

Well-known official port numbers are in the range of 0 - 255. Port 256 - 1023 are reserved for other well-known services. 1024 - 65535 are neither in-use nor reserved. They are used when TCP/IP automatically assigns port numbers to client programs.

The Transport Layer:

  1. performs host-to-host communications.
  2. “UDP” provides unreliable datagram service.
  3. Transmission Control Protocol” provides flow-control, connection establishment, and reliable transmission of data.

The Internet layer:

  1. exchange datagrams across network boundaries.
  2. establishes internetworking thus establishes the Internet.
  3. Internet Protocol”, defines IP addresses and the routing structures used for the TCP/IP protocol suite.
  4. routing: transport datagrams to the next IP router that has the connectivity to a network closer to the final data destination.

The Link layer:

  1. defines the networking methods within local network link without intervening routers.
  2. describe local network topology and the interfaces needed to effect transmission of Internet layer datagrams to next-neighbor hosts.

Difference from OSI

The Internet protocol suite was in use before the OSI model, and is well established worldwide.

The OSI model, however, is a proven concept that is used in all other data communications protocols. It will continue to be used as a guideline for all other communications applications.

[Design] HTTP Headers

Overview

HTTP is an Application Layer protocol which stands for “Hypertext Transfer Protocol”. The entire World Wide Web uses it.

There’re a series of sessions in HTTP where client sends a request and server sends a reply message back to client including the request, an error message, or another piece of information.

HTTP headers are the core part of these HTTP requests and responses, and they carry information about the client browser, the requested page, the server and more.

Example HTTP request

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
Host: net.tutsplus.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120
Pragma: no-cache
Cache-Control: no-cache

Example HTTP response

HTTP/1.x 200 OK
Transfer-Encoding: chunked
Date: Sat, 28 Nov 2009 04:36:25 GMT
Server: LiteSpeed
Connection: close
X-Powered-By: W3 Total Cache/0.8
Pragma: public
Expires: Sat, 28 Nov 2009 05:36:25 GMT
Etag: "pub1259380237;gz"
Cache-Control: max-age=3600, public
Content-Type: text/html; charset=UTF-8
Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
X-Pingback: http://net.tutsplus.com/xmlrpc.php
Content-Encoding: gzip
Vary: Accept-Encoding, Cookie, User-Agent

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Top 20+ MySQL Best Practices - Nettuts+</title>
<!-- ... rest of the html ... -->

These HTTP requests are also sent and received for other things, such as images, CSS files, JavaScript files etc. That is why browser may send 40 or more HTTP requests for 1 article page.

Request Methods

The three most commonly used request methods are: GET, POST and HEAD.

GET: Retrieve a Document

Get an article site:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1

Submit a form information:

GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1

Each input is added in the query string.

POST: Send Data to the Server

Even though you can send data to the server using GET and the query string, in many cases POST will be preferable. Sending large amounts of data using GET is not practical and has limitations.

POST /foo.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/test.php
Content-Type: application/x-www-form-urlencoded
Content-Length: 43

first_name=John&last_name=Doe&action=Submit

The header contains no query string no more. The last line (data) is the query string.

HEAD: Retrieve Header Information

HEAD is identical to GET, except the server does not return the content in the HTTP response. It’s faster than GET.

[Google] Connect Graph Nodes and Avoid Intersect

Question

link

Draw a graph as a graph. Assume there is graphics library to draw lines and all. Just tell how will you order the vertices such that the edges don’t intersect and they seem ordered.

Solution

There’s no clear solution. Someone suggest the following:

  1. printing topological sort result (refer to post ‘Topology Sort’)
  2. edges should not intersect, finding if a graph is planar

There’s various Planarity Testing Algorithms.

[Question] Print Numbers Containing 5

Question

link

写一个function,对于参数n,输出从0到n之间所有含5的数字。eg. func(30) 应该输出5,15,25

this is Groupon interview question

Solution

Suggested by level 10 of this forum. The key is to consider numbers as string, not numbers. Why? The key of the question is to find “5” in a number. We do not really care about values. We care about characters instead!

用递归,基本思路是,

考虑第i位的数字,取得值可以是0-9;唯一要注意的是最后一位数字,如果5没有出现过,只能取5.

In other words, iterate thru the numbers one by one. Discard numbers that are larger than n. Discard numbers that don’t have “5”. We are left will all the valid output.

It’s surprising to me that this is a String + DFS question.

Code

written by me

public void mySolution(int num) {
    if (num >= 5) {
        String str = String.valueOf(num);
        helper(num, "", 0, str.length(), false);
    }
}

private void helper(int max, String prefix, int pos, int len, boolean have5) {
    if (pos == len) {
        int cur = Integer.parseInt(prefix);
        if (cur <= max) {
            System.out.print(cur + ", ");
        }
        return;
    }
    for (int i = 0; i < 10; i++) {
        if (pos == len - 1 && !have5 && i != 5) {
            continue;
        }
        helper(max, prefix + i, pos + 1, len, have5 || i == 5);
    }
}

[Google] Write a Random Number Generator

Question

link

Solution

Basically, the formula is as follows:

number = (previous_number * constant + other_constant) mod third_constant

The three constants are carefully selected, and a typical choice is:

number = (previous_number * 214013 + 2531011) mod 215

[Java OOP] Upcasting, Downcasting and Object Slicing

Upcasting and Downcasting

Java permits an object of a subclass type to be treated as an object of any superclass type. This is called upcasting. Upcasting is done automatically, while downcasting must be manually done.

Example:

Cat c1 = new Cat();
Animal a = c1; //automatic upcasting to Animal
Cat c2 = (Cat) a; //manual downcasting back to a Cat

Upcasting is always typesafe but can cause the so called “slicing” problem.

Object Slicing

Object slicing is defined as the conversion of an object into something with less information (typically a superclass)

More detail

If you upcast an object, it will lose all it’s properties, which were inherited. For example, if you cast a Cat to an Animal, it will lose properties inherited from Mammal and Cat. Note, that data will not be lost, you just can’t use it, until you downcast the object to the right level.

Example

class Base {
    void doSomething() {
        System.out.println("Base");
    }
}

class Derived extends Base {
    void doSomething() {
        System.out.println("Derived");
    }
}

public class JavaUpcasting {
    public static void main(String[] args) {
        Base instance = new Derived();
        instance.doSomething();
    }
}

This will output “Derived”.

[Google] Replace Question Mark With Number

Question

link

Given a string (for example: “a?bc?def?g”), write a program to generate all the possible strings by replacing ? with 0 and 1.

Input : a?b?c?

Output: a0b0c0, a0b0c1, a0b1c0, a0b1c1, a1b0c0, a1b0c1, a1b1c0, a1b1c1.

Solution

DFS search, but do not forget to set “letters[pos] = ‘?’;” at the end. I made this error once.

Code

public List<String> solution(String str) {
    List<String> result = new ArrayList<String>();
    helper(result, str.toCharArray(), 0);
    return result;
}

private void helper(List<String> result, char[] letters, int pos) {
    if (pos == letters.length) {
        result.add(String.valueOf(letters));
        return;
    } else if (letters[pos] != '?') {
        helper(result, letters, pos + 1);
        return;
    }
    for (char i = '0'; i <= '1'; i++) {
        // put char i in letters[] to replace the '?'
        letters[pos] = i;
        helper(result, letters, pos + 1);
        letters[pos] = '?';
    }
}

[Design] MapReduce Intro

An Overview

MapReduce is a software framework, or a distributed programming model intended for processing massive amounts of data in large clusters.

  1. Map, a function that parcels out work to different nodes in the distributed cluster.

  2. Reduce, another function that collates the work and resolves the results into a single value.

More Info

MapReduce isn’t intended to replace relational databases: it’s intended to provide a lightweight way of programming things so that they can run fast by running in parallel on a lot of machines. Google uses MapReduce for indexing every Web page they crawl.

MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.

From a Senior Software Engineer at Google:

The key to how MapReduce works is to take input as, conceptually, a list of records. The records are split among the different computers in the cluster by Map. The result of the Map computation is a list of key/value pairs. Reduce then takes each set of values that has the same key and combines them into a single value. So Map takes a set of data chunks and produces key/value pairs and Reduce merges things, so that instead of a set of key/value pair sets, you get one result. You can’t tell whether the job was split into 100 pieces or 2 pieces…

Why MapReduce

MapReduce is important because it allows ordinary developers to use MapReduce library routines to create parallel programs without having to worry about programming for intra-cluster communication, task monitoring or failure handling.