Trivia

leetcode-253. Meeting Rooms II

2016-09-03

Given an array of meeting time intervals consisting of start and end times [[s1,e1],[s2,e2],…] (si < ei), find the minimum number of conference rooms required.

For example,
Given [[0, 30],[5, 10],[15, 20]],
return 2.
原题地址

跟上一道题差不多的思路，我的代码如下：

/**
 * Definition for an interval.
 * struct Interval {
 *     int start;
 *     int end;
 *     Interval() : start(0), end(0) {}
 *     Interval(int s, int e) : start(s), end(e) {}
 * };
 */
 struct comp{
    bool operator() (Interval itv1, Interval itv2) {
        return itv1.start < itv2.start;
    }
 };
class Solution {
public:
    int minMeetingRooms(vector<Interval>& intervals) {
        sort(intervals.begin(), intervals.end(), comp());
        //building a vector indicates the ending time of each room
        //if a new inteval's start time < all ending time, add a new ending time to the vector
        vector<int> rooms;
        for (Interval& itv : intervals) {
            bool addRoom = true;
            for (int i = 0; i < rooms.size(); i++) {
                if (itv.start >= rooms[i]) {
                    rooms[i] = itv.end;
                    addRoom = false;
                    break;
                }
            }
            if (addRoom) {
                rooms.push_back(itv.end);
            }
        }
        return rooms.size();
    }
};

在网上看到一个更好的思路，统计当前时间有多少会议开着。更加简洁明了。

/**
 * Definition for an interval.
 * struct Interval {
 *     int start;
 *     int end;
 *     Interval() : start(0), end(0) {}
 *     Interval(int s, int e) : start(s), end(e) {}
 * };
 */
class Solution {
public:
    int minMeetingRooms(vector<Interval>& intervals) {
        vector<pair<int, int>> changes;
        for (auto i : intervals) {
            changes.push_back({i.start, 1});
            changes.push_back({i.end, -1});
        };
        sort(begin(changes), end(changes));
        int rooms = 0, maxrooms = 0;
        for (auto change : changes)
            maxrooms = max(maxrooms, rooms += change.second);
        return maxrooms;
    }
};

leetcode-252. Meeting Rooms

2016-09-03

Given an array of meeting time intervals consisting of start and end times [[s1,e1],[s2,e2],…](si < ei), determine if a person could attend all meetings.

For example,
Given [[0, 30],[5, 10],[15, 20]],
return false.
原题地址

一道easy题，然而我zz一直没有反应过来。首先我们按照start time进行排序，要保证没有重叠的时间段，那么每一次的开始时间不能小于前一次的结束时间，即：一个会开完才能开下一个会。好吧，就这么简单。

/**
 * Definition for an interval.
 * struct Interval {
 *     int start;
 *     int end;
 *     Interval() : start(0), end(0) {}
 *     Interval(int s, int e) : start(s), end(e) {}
 * };
 */
class Solution {
    // Define a compare function, sort intervals according to start time
    static bool comp(const Interval& itv1, const Interval& itv2) {
        return itv1.start < itv2.start;
    }
public:
    bool canAttendMeetings(vector<Interval>& intervals) {
        sort(intervals.begin(), intervals.end(), comp);
        for (int i = 1; i < intervals.size(); i++) {
            if (intervals[i].start < intervals[i-1].end) {
                return false;
            }
        }
        return true;
    }
};

leetcode-338. Counting Bits

2016-09-02

Given a non negative integer number num. For every numbers i in the range 0 ≤ i ≤ num calculate the number of 1’s in their binary representation and return them as an array.

Example:
For num = 5 you should return [0,1,1,2,1,2].
原题地址

这是一道很简单的找规律题，这里想说下如何寻找规律。核心的思想是去掉一位，因为去除一位之后的数我们是求过的。去除的方法有很多，这里举几个例子：

去除最低位：右移即可，然后判断末位是否为1. f[i] = f[i >> 1] + i & 1;
去除最低位的1：f[i] = f[i & (i - 1)] + 1;
去除最高位的1：维护一个2^n的数，使2^n <= i < 2^(n + 1)，那么f[i] = f[i - 2^n] + 1;

总之需要记住要跟去掉一位之后的数进行比较。

根据页面响应获取动态页面信息

2016-08-31

之前写了一篇文章写了如何用splash结合scrapy爬取动态页面的信息，不可否认，splash的功能是很强大的，能够方便的获得加载js的页面，但是在测试的过程中我发现了splash的一些问题。

由于网络的不稳定导致页面的响应时间可能非常长，因此整体爬虫的时间会大幅延长，且不可控。并且因此导致更新时间间隔加长。
长时间运行splash对服务器压力较大，加上由响应时间过长造成的任务积压，服务器会变得卡顿和不稳定。

在经过对网站功能和服务器资源的分析后，我摒弃使用splash，转而对每个需要访问的页面通过浏览器工具分析其页面的响应。因为我们所需爬取的页面数量是有限的，但是我们需要高频率地去访问这些页面，因此要尽可能减少爬取过程中消耗的时间和资源，把工作做在前面。可以利用Chrome或者Firefox的Debug工具，在network选项里查找页面载入时发送了哪些请求，然后找到我们需要的内容查看Request URL. 这样，我们只需要把scrapy里爬取的网页地址换成这个URL，然后根据响应信息的格式(通常是json)进行处理。如此一来，整个爬取页面的过程会大大加快，而服务器也能稳定运行了。

mysql-python中转义字符的问题

2016-08-26

在做爬虫时遇到一个转义字符的问题。在”cursor.execute”时有一条没有跑过，原因是某主播标题中有一个单引号，解决的方法是给字符串进行转义。在网上看到有一种做法是：

1 2	sql = "INSERT INTO TABLE_A (COL_A,COL_B) VALUES (%s, %s)" cursor.execute(sql, (val1, val2))

但是我如此做并没有成功。这里先mark一下，以后找一下解决方法。

What happens when you type in a URL in browser?

2016-08-24

It is actually a question from stackoverflow. Since it is pretty useful, I copy the answer here for convenience.

In an extremely rough and simplified sketch, assuming the simplest possible HTTP request, no proxies, IPv4 and no problems in any step:

browser checks cache; if requested object is in cache and is fresh, skip to #9
browser asks OS for server’s IP address
OS makes a DNS lookup and replies the IP address to the browser
browser opens a TCP connection to server (this step is much more complex with HTTPS)
browser sends the HTTP request through TCP connection
browser receives HTTP response and may close the TCP connection, or reuse it for another request
browser checks if the response is a redirect or a conditional response (3xx result status codes), authorization request (401), error (4xx and 5xx), etc.; these are handled differently from normal responses (2xx)
if cacheable, response is stored in cache
browser decodes response (e.g. if it’s gzipped)
browser determines what to do with response (e.g. is it a HTML page, is it an image, is it a sound clip?)
browser renders response, or offers a download dialog for unrecognized types

scrapy-splash爬取动态网站

2016-08-21

最近网站的进展受到了极大的阻挠，国内几大直播网站（斗鱼，虎牙，战旗，熊猫，龙珠）都用scrapy爬取成功了，但是还有两家（全民，火猫）一直都爬不出来。一开始我是很奇怪的，明明可以用firebug查看到、并且能用Xpath Checker找到所需的信息，为什么scrapy一直失败呢。问题的突破点在于，全民tv的spider并不是完全爬不出来，而是能爬出来一些我不知道是什么的东西。然而今天我无意中查看全民的源码时，发现源码里是没有当前直播的信息的，而恰好有scrapy爬出来的东西。顺藤摸瓜，我才意识的问题的关键在于全民和火猫是动态网站，更令我惊奇的是之前爬取的都是静态网站。好吧，原谅我这么菜，这么久才知道。

知道问题的原因就比较好解决了，关于scrapy爬取动态网站的方法网上有许多教程，这里就大概的说一下。主要有两种方法来获取js渲染之后的页面：

Selenium
Splash

第一种是模拟用户使用浏览器的操作，需要和浏览器配合使用，考虑到服务器上安装浏览器的问题，我选择了第二种。每次安装新的东西都要折腾一段时间，splash也是如此，反正先按官网的说明来，出现什么问题就搜什么问题。好在装好之后，配合scrapy还是比较简单的。首先在settings.py里添加一些属性，然后启动splash：

1	docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash

可以通过访问 (http://localhost:8050/) 来查看js渲染后的页面。
然后只需要在原有的spider上加上一段代码即可：

import scrapy
from scrapy_splash import SplashRequest
class MySpider(scrapy.Spider):
    # ...
    def start_requests(self):
        splash_args = {
            'wait': 0.5,
        }
        for url in self.start_urls:
            yield SplashRequest(url, self.parse, endpoint='render.html', args=splash_args)
    # ...

基本就是这样，接下来还要在服务器上再安装一下splash，祝我好运。

leetcode-340. Longest Substring with At Most K Distinct Characters

2016-08-17

Given a string, find the length of the longest substring T that contains at most k distinct characters.

For example, Given s = “eceba” and k = 2,

T is “ece” which its length is 3.

Show Company Tags
Show Tags
Show Similar Problems
原题地址

利用hash table统计substring里的字符，类似的以char为key的题都可以用数组来做，会快一些。

int lengthOfLongestSubstringKDistinct(string s, int k) {
    unordered_map<char, int> hash;
    int res = 0;
    int p_1 = 0;
    for (int p_2 = 0; p_2 < s.size(); p_2++) {
        hash[s[p_2]]++;
        while (hash.size() > k) {
            if (--hash[s[p_1]] == 0) {
                hash.erase(s[p_1]);
            }
            p_1++;
        }
        res = max(res, p_2 - p_1 + 1);
    }
    return res;
}

leetcode-325. Maximum Size Subarray Sum Equals k

2016-08-17

Given an array nums and a target value k, find the maximum length of a subarray that sums to k. If there isn’t one, return 0 instead.

Example 1:
Given nums = [1, -1, 5, -2, 3], k = 3,
return 4. (because the subarray [1, -1, 5, -2] sums to 3 and is the longest)

Example 2:
Given nums = [-2, -1, 2, 1], k = 1,
return 2. (because the subarray [-1, 2] sums to 1 and is the longest)

Follow Up:
Can you do it in O(n) time?
原题地址

利用hash table纪录nums[0]到nums[i]的和，寻找两个和的差等于k的最大距离。Time: O(n). Space: O(n).

int maxSubArrayLen(vector<int>& nums, int k) {
    unordered_map<int, vector<int>> hash;
    int run_sum = 0;
    for (int i = 0; i < nums.size(); i++) {
        run_sum += nums[i];
        hash[run_sum].push_back(i);
    }
    int res = 0;
    if (hash.count(k) > 0) {
        res = hash[k].back() + 1;
    }
    for (auto& x : hash) {
        if (hash.count(x.first + k) > 0) {
            res = max(res, hash[x.first + k].back() - x.second[0]);
        }
    }
    return res;
}

后来发现可以一边存入hash table一边比较，因为只需要和之前的数据求差即可。速度可以更快，代码更简洁。

int maxSubArrayLen(vector<int>& nums, int k) {
    unordered_map<int, int> hash;
    int run_sum = 0;
    int res = 0;
    for (int i = 0; i < nums.size(); i++) {
        run_sum += nums[i];
        if (run_sum == k) {
            res = i + 1;
        }
        else if (hash.count(run_sum - k) > 0) {
            res = max(res, i - hash[run_sum - k]);
        }
        if (hash.count(run_sum) == 0) {
            hash[run_sum] = i;
        }
    }
    return res;
}

leetcode-246. Strobogrammatic Number

2016-08-17

A strobogrammatic number is a number that looks the same when rotated 180 degrees (looked at upside down).

Write a function to determine if a number is strobogrammatic. The number is represented as a string.

For example, the numbers “69”, “88”, and “818” are all strobogrammatic.
原题地址

首尾对比，只有五种匹配的方式：0-0, 1-1, 8-8, 6-9, 9-6。把这些存入hash table会方便许多。

bool isStrobogrammatic(string num) {
    int p_1 = 0;
    int p_2 = num.size() - 1;
    unordered_map<char, char> hash;
    hash['0'] = '0';
    hash['1'] = '1';
    hash['8'] = '8';
    hash['6'] = '9';
    hash['9'] = '6';
    while (p_2 >= p_1) {
        if (hash.count(num[p_1]) == 0 || hash[num[p_1]] != num[p_2]) {
            return false;
        }
        p_1++;
        p_2--;
    }
    return true;
}