[算法笔记]最长公共连续子串

Binarydog_Lee

2024-04-26 帮助1人

推导

学新通

对于两个串s和t，状态转移方程：

// 先都置零
// memset(dp,0,sizeof(dp));
dp[i][0]=1					// 若s[i]==t[0]（即边界上的特殊情况）
dp[0][j]=1					// 若s[0]==t[j]（即边界上的特殊情况）
[dp[i][j]=dp[i-1][j-1] 1	// 若s[i]==t[j]

其中d[i][j]表示的是s[0...i]和t[0...j]的公共连续子串长度（没说是最长的），然后在该数组中找到最大值即可。

学新通

输出子串：

生成dp时就用max保存最大值和它的行列。然后substr一下。上图就是发现max值为dp[5][5]处的3，随便看s或者t。以s为例，s串对应的是i=5，所以子串ret即是：

string ret = s.substr(i-max 1,max);

相关题目

[牛客]BM66 最长公共子串

// 能过但仅供参考思路
class Solution {
public:
    /**
     * longest common substring
     * @param str1 string字符串 the string
     * @param str2 string字符串 the string
     * @return string字符串
     */
     int dp[5000][5000];
    string LCS(string str1, string str2) {
        // write code here
        int max = -1;
        int s,d;
        memset(dp,0,sizeof(dp));
        for(int i = 0; i < str1.length();   i){
            if(str1[i]==str2[0]){
                dp[i][0]=1;
                max = 1;
                s = i;d = 0;
            }
        }
        for(int j = 0; j < str2.length();   j){
            if(str1[0]==str2[j]){
                dp[0][j]=1;
                max = 1;
                s = 0;d = j;
            }
        }
        for(int i = 1; i < str1.length();   i){
            for(int j = 1; j < str2.length();   j){
                if(str1[i]==str2[j]){
                    dp[i][j] = dp[i-1][j-1] 1;
                    if(dp[i][j] > max){
                        max = dp[i][j];
                        s = i;d = j;
                    }
                }
            }
        }
        string ret = str1.substr(s-max 1,max);
        return ret;
    }
};

代码解析

然后是大佬代码（是真的快）：

class Solution {
public:
    /**
     * longest common substring
     * @param str1 string字符串 the string
     * @param str2 string字符串 the string
     * @return string字符串
     */
    string LCS(string str1, string str2) {
        // write code here
        unordered_map<char, vector<int>> chIndex;
        int size1 = str1.size();
        int size2 = str2.size();
        int maxSize = 0, maxIndex = -1,curIndex,curCount;
        int i,j;
        for (i = 0; i < size1; i  ){
            chIndex[str1[i]].emplace_back(i);
        }
        for (i = 0; i < size2; i  ){
            if (chIndex.count(str2[i]) == 0) continue;
            for (int x: chIndex[str2[i]]){
                if ( (size1 - x) < maxSize || (size2 - i) < maxSize)
                    break;
                curIndex = i;
                curCount = 0;
                j = i;
                while (str1[x  ] == str2[j  ]){
                    curCount  ;
                }
                if (curCount > maxSize){
                    maxSize = curCount;
                    maxIndex = curIndex;
                }
            }
        }
        return maxSize>0?str2.substr(maxIndex,maxSize):string();
    }
};

首先是这个unordered_map。

C 中有两种键值对容器，即map与unordered_map。它们的底层和适用范围不尽相同：

map是基于红黑树实现，而unordered_map是基于hash_table实现。
key应当唯一（不可出现多次）
unordered_map查询单个key的时候效率比map高（适合单个数据的随机存储）

其他内容可以参考[cppreference]std::unordered_map

unordered_map<char, vector<int>> chIndex;

由上述代码可知，这里key是char，value是vector<int>

chIndex[str1[i]].emplace_back(i);

由于可以根据key找value，因此for循环中的上述语句，是对vector<int>使用emplace_back(i)。这个东西百度了一下说是C11的新特性。

push_back()和emplace_back()作用都是向vector容器尾部添加一个元素。区别在于底层实现的机制不同。前者是先创建再拷贝到尾部，而后者是直接在尾部创建，省去了拷贝的过程。

它支持按下标的方式初始化新的键值对，例如：

unordered_map<string, int> umap;
umap["as"] = 23;
cout << umap["as"];
// output:23

所以第一个for循环把str1的每个字母出现在该字符串的位置按先后顺序记录在了一个vector里（）

基于这个，我们至少可以知道

str1里面有哪些字符
str1字符出现的次数及其先后顺序
一个已出现的字符在 str1中的位置（位置从零开始计算）

然后就是这个代码的关键：

for (i = 0; i < size2; i  ) {
    if (chIndex.count(str2[i]) == 0) continue;
    for (int x : chIndex[str2[i]]) {
        if ((size1 - x) < maxSize || (size2 - i) < maxSize)
            break;
        curIndex = i;
        curCount = 0;
        j = i;
        while (str1[x  ] == str2[j  ]) {
            curCount  ;
        }
        if (curCount > maxSize) {
            maxSize = curCount;
            maxIndex = curIndex;
        }
    }
}

首先目的肯定是拿str2的内容和str1比较。第一个if将str2中未出现在str1的字符排除掉了。进行到该语句后面说明str2[i]出现在str1里了。

然后它在这个前提下遍历该字符出现的位置x：

首先if作用是从x处到末尾字符串要小于目前已知最大公共连续子串长度，就不去考虑了，反正比较到末尾就算成功，这个串的长度也不会大于已经发现的最大公共连续子串长度，自然也不会成为新的最大公共连续子串。

if ((size1 - x) < maxSize || (size2 - i) < maxSize)
	break;

然后就是

curIndex = i;
curCount = 0;
j = i;
while (str1[x  ] == str2[j  ]) {
    curCount  ;
}
if (curCount > maxSize) {
    maxSize = curCount;
    maxIndex = curIndex;
}

然后就是比较，看看连续子串长度（也就是curCount）能不能大于maxSize，能的话记录一下起始的新坐标（curIndex），和新长度（curCount）

奥，他这合着没按动归的状态缓存方法做。

这篇好文章是转载于：学新通技术网

[算法笔记]最长公共连续子串

推导

相关题目

代码解析

photoshop保存的图片太大微信发不了怎么办

word里面弄一个表格后上面的标题会跑到下面怎么办

《学习通》视频自动暂停处理方法

photoshop扩展功能面板显示灰色怎么办

Android 11 保存文件到外部存储，并分享文件

微信公众号没有声音提示怎么办

TikTok加速器哪个好免费的TK加速器推荐

excel下划线不显示怎么办

excel打印预览压线压字怎么办

怎样阻止微信小程序自动打开