How can I make my split work only on one real line and skip quoted parts of a string?

Date: 2023-02-12

Problem description

So we have a simple split:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

Or we could use Boost's split. We have a simple main like this:

int main() {
    const vector<string> words = split("close no \"\n matter\" how \n far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}

How can we make it produce output like this:

close
no
"\n matter"
how
end symbol found.

We want to introduce into split both structures that shall be kept unsplit and characters that shall end the parsing process. How can this be done?

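Recommended answer

The following code (it relies on the includes and `using namespace std;` from the question, plus <cassert> for assert):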

vector<string>::const_iterator matchSymbol(const string & s, string::const_iterator i, const vector<string> & symbols)
{
    vector<string>::const_iterator testSymbol;
    for (testSymbol=symbols.begin();testSymbol!=symbols.end();++testSymbol) {
        if (!testSymbol->empty()) {
            if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {
                return testSymbol;
            }
        }
    }

    assert(testSymbol==symbols.end());
    return testSymbol;
}

vector<string> split(const string& s, const vector<string> & delims, const vector<string> & terms, const bool keep_empty = true)
{
    vector<string> result;
    if (delims.empty()) {
        result.push_back(s);
        return result;
    }

    bool checkForDelim=true;

    string temp;
    string::const_iterator i=s.begin();
    while (i!=s.end()) {
        vector<string>::const_iterator testTerm=terms.end();
        vector<string>::const_iterator testDelim=delims.end();

        if (checkForDelim) {
            testTerm=matchSymbol(s,i,terms);
            testDelim=matchSymbol(s,i,delims);
        }

        if (testTerm!=terms.end()) {
            i=s.end();
        } else if (testDelim!=delims.end()) {
            if (!temp.empty() || keep_empty) {
                result.push_back(temp);
                temp.clear();
            }
            string::const_iterator j=testDelim->begin();
            while (i!=s.end() && j!=testDelim->end()) {
                ++i;
                ++j;
            }
        } else if ('"'==*i) {
            if (checkForDelim) {
                string::const_iterator j=i;
                do {
                    ++j;
                } while (j!=s.end() && '"'!=*j);
                checkForDelim=(j==s.end());
                //only a matched opening quote ends the token before it
                if (!checkForDelim && (!temp.empty() || keep_empty)) {
                    result.push_back(temp);
                    temp.clear();
                }
                temp.push_back('"');
                ++i;
            } else {
                //matched end quote
                checkForDelim=true;
                temp.push_back('"');
                ++i;
                result.push_back(temp);
                temp.clear();
            }
        } else if ('\n'==*i) {
            //newline not consumed as a delimiter (e.g. inside quotes): store it as the text \n
            temp+="\\n";
            ++i;
        } else {
            temp.push_back(*i);
            ++i;
        }
    }

    if (!temp.empty() || keep_empty) {
        result.push_back(temp);
    }
    return result;
}

int runTest()
{
    vector<string> delims;
    delims.push_back(" ");
    delims.push_back("\t");
    delims.push_back("\n");
    delims.push_back("split_here");

    vector<string> terms;
    terms.push_back(">");
    terms.push_back("end_here");

    const vector<string> words = split("close no \"\n end_here matter\" how \n far testsplit_heretest\"another split_here test\"with some\"mo>re", delims, terms, false);

    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
    return 0;
}

Produces:

close
no
"\n end_here matter"
how
far
test
test
"another split_here test"
with
some"mo

Based on the examples you gave, you seemed to want newlines to count as delimiters when they appear outside of quotes and to be represented by the literal \n when inside quotes, so that's what this does. It also adds the ability to have multiple delimiters, such as split_here, which I used in the test.
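The newline rule can be isolated in a much smaller sketch. This is not the answer's split: the name splitQuoted is hypothetical, delimiters are limited to single characters, terminators are left out, empty tokens are always dropped, and every quote is assumed to be matched. It only illustrates the state switch: outside quotes a newline is just another delimiter, inside quotes it is copied as the two characters \ and n.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified illustration (not the answer's split):
// single-character delimiters only, no terminators, and quotes are
// assumed to be matched.
std::vector<std::string> splitQuoted(const std::string& s,
                                     const std::string& delimChars) {
    std::vector<std::string> result;
    std::string token;
    bool inQuotes = false;
    for (std::string::size_type k = 0; k < s.size(); ++k) {
        const char c = s[k];
        if (c == '"') {
            if (!inQuotes && !token.empty()) {
                // An opening quote ends the token before it.
                result.push_back(token);
                token.clear();
            }
            token.push_back(c);
            if (inQuotes) {
                // A closing quote ends the quoted token.
                result.push_back(token);
                token.clear();
            }
            inQuotes = !inQuotes;
        } else if (inQuotes) {
            if (c == '\n') {
                token += "\\n";  // keep the newline visible as text
            } else {
                token.push_back(c);
            }
        } else if (delimChars.find(c) != std::string::npos) {
            // Outside quotes: any listed character, newline included, splits.
            if (!token.empty()) {
                result.push_back(token);
                token.clear();
            }
        } else {
            token.push_back(c);
        }
    }
    if (!token.empty()) {
        result.push_back(token);
    }
    return result;
}
```

Called with the question's input and the delimiter set " \n", this sketch yields five tokens: close, no, "\n matter", how and far.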

I wasn't sure whether you want unmatched quotes to be split the way matched quotes are, since the example you gave has the unmatched quote separated by spaces. This code treats an unmatched quote like any other character, but it should be easy to modify if this is not the behavior you want.
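The decision between a matched and an unmatched quote comes down to a look-ahead for a closing quote. A minimal sketch of that check, using std::string::find instead of the answer's iterator loop (the helper name hasClosingQuote is hypothetical):

```cpp
#include <string>

// Hypothetical helper: true if another '"' follows the quote at openPos.
// find returns npos when the start position is past the end, so an
// opening quote in the last position is simply reported as unmatched.
bool hasClosingQuote(const std::string& s, std::string::size_type openPos) {
    return s.find('"', openPos + 1) != std::string::npos;
}
```

If unmatched quotes should split tokens as well, this look-ahead is the condition to relax.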

The line:

if (0==testSymbol->compare(0,testSymbol->size(),&(*i),testSymbol->size())) {

will work on most, if not all, implementations of the STL, but it is not guaranteed to work. It can be replaced with the safer, but slower, version:

if (*testSymbol==s.substr(i-s.begin(),testSymbol->size())) {
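Another option, offered as a sketch rather than as part of the answer, is the std::string::compare(pos, len, str) overload. It clamps len to the characters that remain after pos, so it neither takes a raw pointer into the string nor builds the temporary that substr creates. The helper name startsWithAt is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `symbol` occurs in `s` starting at offset `pos`.
// compare(pos, len, str) clamps len to s.size() - pos, so a tail that is
// too short compares unequal instead of being read out of bounds.
bool startsWithAt(const std::string& s, std::string::size_type pos,
                  const std::string& symbol) {
    assert(pos <= s.size());  // compare would throw std::out_of_range otherwise
    if (symbol.empty()) {
        return false;         // mirrors matchSymbol, which skips empty symbols
    }
    return s.compare(pos, symbol.size(), symbol) == 0;
}
```

Inside matchSymbol the call would take the offset i - s.begin() and *testSymbol as arguments. For example, startsWithAt("test split_here", 5, "split_here") holds, while offset 12 does not match because only three characters remain.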
