博文中会简要介绍Leetcode P0187题目分析及解题思路。
“Repeated DNA Sequences”这道题比较基础。利用HashTable可以比较容易地解决这个题目。
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC", "CCCCCAAAAA"]
根据题目需求我们其实是在给定的字符串s
里寻找长度为10并且重复出现多次的子串。最直接的想法就是我们记录下所有长度为10的子串,利用HashTable来记录每个子串出现的次数,然后次数大于1的子串便是我们想要找的子串。
以下是Java的题解代码实现。
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
Map<String, Integer> str2CountMap = new HashMap<>();
int left = 0, right = left+9, L = s.length();
while (right < L) {
String substr = s.substring(left, right+1);
str2CountMap.put(substr, str2CountMap.getOrDefault(substr, 0)+1);
++left;
++right;
}
List<String> repeated = new ArrayList<>();
for (Map.Entry<String, Integer> entry: str2CountMap.entrySet()) {
if (entry.getValue() > 1)
repeated.add(entry.getKey());
}
return repeated;
}
}
以下是C++的题解代码实现。
class Solution {
public:
vector<string> findRepeatedDnaSequences(string s) {
unordered_map<string, int> str2count;
int left = 0, right = left+9, L = s.length();
while (right < L) {
++str2count[s.substr(left, 10)];
++left;
++right;
}
vector<string> repeated;
for (auto entry: str2count) {
if (entry.second > 1)
repeated.push_back(entry.first);
}
return repeated;
}
};