
Simple-view C++_boost_regex



Author: apple.davinci  Source: CSDN  March 22, 2008

Keywords: Simple-view C++ C Linux


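Regular expressions are very powerful for text processing. If you are familiar with the Linux environment and often use tools such as grep, sed,
perl, emacs or vi, you already know what regex can do: it can greatly improve your productivity.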

Many languages support regular expressions, for example Java (java.util.regex) and Perl, and many scripting languages are especially useful precisely because of regex.

Now boost::regex brings regular-expression library support to C++ as well. It has also been included in TR1 and is headed for the next C++ standard.

One simple example is worth a thousand words.

How do you split a string into words?
Like this? (Sorry for my ugly code.)

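#include <cstring>
#include <iostream>
#include <string>

// Hand-written splitter; assumes the words are separated by single spaces.
void split_string()
{
    std::size_t beg = 0;
    std::size_t end = 0;
    const char* str = "davinci is a very good boy";
    const std::size_t len = std::strlen(str);   // compute once, not on every iteration
    std::string word;
    for (std::size_t i = 0; i < len; i++)
    {
        if (i >= 1 && str[i] != ' ' && str[i - 1] == ' ')
        {
            beg = i;              // a new word starts right after a space
        }
        if (str[i] == ' ')
        {
            end = i;              // a space ends the current word
        }
        else if (str[i + 1] == '\0')
        {
            end = i + 1;          // the last word ends at the terminator
        }
        if (beg < end)
        {
            word = std::string(str + beg, str + end);
            std::cout << word << std::endl;
        }
    }
}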
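But you can use boost::regex_split() instead:
template <class OutputIterator, class charT, class Traits1, class Alloc1>
std::size_t regex_split(OutputIterator out,
                        std::basic_string<charT, Traits1, Alloc1>& s);
It is, of course, overloaded: this two-argument form splits s on whitespace, and the other forms also take the regex to split on, match flags, and a maximum number of splits.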

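#include <boost/regex.hpp>
#include <boost/lambda/lambda.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>
#include <string>

int main()
{
    using boost::lambda::_1;

    std::list<std::string> ls;
    std::string str = "davinci is a very good boy";
    // Split on whitespace; note that regex_split consumes str.
    boost::regex_split(std::back_inserter(ls), str);
    // Print each word on its own line.
    std::for_each(ls.begin(), ls.end(), std::cout << _1 << "\n");
    return 0;
}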
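regex_split appears among the deprecated interfaces listed further down, so here is a sketch of the same whitespace split using C++11 std::sregex_token_iterator, the standardized counterpart of boost::regex_token_iterator (assuming a C++11 compiler; split_words is a hypothetical helper name). Passing -1 as the sub-match index asks the iterator for the text between matches of the separator, rather than the matches themselves.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Split `text` on runs of whitespace, like the two-argument regex_split.
// Sub-match index -1 yields the pieces *between* separator matches.
std::vector<std::string> split_words(const std::string& text)
{
    static const std::regex ws("\\s+");
    std::vector<std::string> words;
    std::sregex_token_iterator it(text.begin(), text.end(), ws, -1);
    std::sregex_token_iterator last;   // default-constructed = end of sequence
    for (; it != last; ++it)
    {
        if (it->length() > 0)          // skip the empty piece before leading whitespace
            words.push_back(it->str());
    }
    return words;
}
```

Unlike regex_split, this version leaves the input string untouched; with Boost the same code should work after replacing std:: by boost:: in the regex types.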
//-----------------------------------------------
On regex syntax:
Every tool's regex dialect differs slightly.

The special characters are: .[{()\*+?|^$

* : the preceding atom 0 or more times (0..n)
+ : 1 or more times (1..n)
? : 0 or 1 time
where n means any number of times.
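A quick way to check these quantifiers is a whole-string match. The sketch below uses C++11 std::regex_match (the standard counterpart of boost::regex_match, ECMAScript syntax by default); full_match is a hypothetical helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the *whole* of `text` matches `pattern`.
bool full_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

For example, ab*c accepts "ac" and "abbbc", ab+c rejects "ac", and ab?c rejects "abbc".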

? placed after a repeat makes it non-greedy: it matches as little as possible.
{n,m}? Matches the previous atom between n and m times, while consuming as little input as possible.
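The difference between greedy and non-greedy repeats shows up clearly on C-style comments. This sketch assumes C++11 std::regex_search in place of boost::regex_search; first_match is a hypothetical helper that returns the first match (what[0]) or an empty string.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Return the first substring of `text` matched by `pattern`, or "" if none.
std::string first_match(const std::string& text, const std::string& pattern)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(0);
    return "";
}
```

On "/*a*/ x /*b*/", the greedy /\*.*\*/ runs to the last */, while the non-greedy /\*.*?\*/ stops at the first one.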

| means "or": ab(c|d) matches abc or abd.
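Because the alternation sits inside parentheses, the group also records which branch matched. A minimal sketch with C++11 std::regex (assumed instead of boost::regex; matched_alternative is a hypothetical name):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Match `text` against ab(c|d) and return the alternative that matched
// (captured by group 1), or "" if the whole text does not match.
std::string matched_alternative(const std::string& text)
{
    static const std::regex re("ab(c|d)");
    std::smatch m;
    if (std::regex_match(text, m, re))
        return m.str(1);
    return "";
}
```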

[] matches any one of the characters inside it: [abc] matches a, b, or c.
[a-z] matches any single character from a to z.
[0-9] matches any digit. I know you know, and everyone knows it.
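Each bracket expression consumes exactly one character, which is easy to see by counting matches. This sketch assumes C++11 std::sregex_iterator (boost::sregex_iterator is the Boost equivalent); count_matches is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count the non-overlapping matches of `pattern` in `text`.
std::ptrdiff_t count_matches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);   // must outlive the iterators below
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```

For instance, [abc] hits four characters in "davinci is a very good boy" (two a's, one b, one c).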


. matches any single character


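Specifying an exact number of repetitions:

a{n} a repeated exactly n times

a{n,} a repeated n or more times

a{n,m} a repeated between n and m times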

A '^' character shall match the start of a line.

A '$' character shall match the end of a line.

Example:
^a{2,3}cb$ must start with a and end with b, with a repeated 2 to 3 times.
It matches:
aacb
aaacb
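The repeat counts and the ^/$ anchors can be checked together. This sketch assumes C++11 std::regex_search (default ECMAScript syntax, where ^ and $ anchor at the start and end of the input); line_matches is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` matches somewhere in `line`; with ^ and $ in the
// pattern, that means the whole line.
bool line_matches(const std::string& line, const std::string& pattern)
{
    return std::regex_search(line, std::regex(pattern));
}
```

So ^a{2,3}cb$ accepts exactly "aacb" and "aaacb", and rejects "acb", "aaaacb" and "xaacb".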

Back-references are also valid; \1 refers to the text captured by the first parenthesized group:
^(a*).*\1$


But the following is an error, because * has nothing to repeat:
a(*)
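Invalid patterns are rejected when the regex object is constructed: boost::regex and C++11 std::regex both report syntax errors by throwing an exception. The sketch below assumes std::regex and its std::regex_error; is_valid_pattern is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Return true if `pattern` compiles as an ECMAScript regex.
// std::regex throws std::regex_error on a syntax error.
bool is_valid_pattern(const std::string& pattern)
{
    try
    {
        std::regex re(pattern);
        return true;
    }
    catch (const std::regex_error&)
    {
        return false;
    }
}
```

The exact error code (for instance a bad repeat or an unbalanced parenthesis) may differ between library implementations, so the sketch only checks that construction fails.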

Example:
apple@40years:~$ cat regex
anjutaProjects
a.out
book
Desktop
happy
Mail
misc
myProject
Project
regex
regex_replace.cpp
study
test
tmp
apple@40years:~$ grep ^a < regex //lines starting with a
anjutaProjects
a.out
apple@40years:~$ grep t$ < regex //lines ending with t
a.out
myProject
Project
test
apple@40years:~$ grep [aj]o* < regex //lines containing a or j, followed by o zero or more times
anjutaProjects
a.out
happy
Mail
myProject
Project
regex_replace.cpp
apple@40years:~$
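The same filtering can be done in C++. This is a minimal sketch assuming C++11 std::regex_search (boost::regex_search behaves the same way here); grep_lines is a hypothetical name. regex_search is used rather than regex_match because grep reports a line when the pattern matches anywhere in it.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// A tiny grep: keep the lines that contain a match for `pattern`.
std::vector<std::string> grep_lines(const std::vector<std::string>& lines,
                                    const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> hits;
    for (const std::string& line : lines)
    {
        if (std::regex_search(line, re))
            hits.push_back(line);
    }
    return hits;
}
```

Fed the file names above, the patterns ^a, t$ and [aj]o* select the same lines as the grep runs.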


//---------------------------------
Yes, it is simpler and more readable, but the function is not very powerful, and it is even deprecated.
Now one more:
What if you want to find comments like /*smth*/ in a line, and more? How would you do it?

#include <boost/regex.hpp>
#include <cassert>
#include <iostream>
#include <map>
#include <string>

int main()
{
    using std::string;
    using std::endl;
    using std::cout;

    // find /*...*/ comments (the body must be a single word)
    string re = "/\\*\\w+\\*/";
    boost::regex e(re);
    string s = "int a = 33,int b,/*333*/ c ; /*efg*/;int k,/*sadfa*/";
    boost::match_results<std::string::const_iterator> result;
    int i = 0;
    while (boost::regex_search(s, result, e) && i++ < 4)
    {
        // *result.begin() is the whole match, *(result.end()-1) the last sub_match
        cout << *result.begin() << "\n" << *(result.end() - 1) << "\n";
        cout << "suffix=" << result.suffix() << endl;
        s = result.suffix().str();   // keep searching in the text after the match
    }
    if (i == 0)
    {
        cout << "error" << endl;     // nothing matched at all
    }
    /*****************************************/
    cout << "------------------------------------" << endl;
    // find quoted strings "" and wrap them in <font>...</font>
    typedef std::map<std::string, int, std::less<std::string> > map_type;
    std::string insert_beg = "<font>";
    std::string insert_end = "</font>";

    std::string str = "abc \"davinci\"23abca3 abcd ";
    // group 1: a quoted word; group 2: 'a' followed by a digit
    e = boost::regex("(\"\\w*\")\\w*(a\\d)");
    // e = boost::regex("\"\\w*\"");
    boost::match_results<std::string::const_iterator> what;
    std::string::size_type offset = 0;   // where the next search starts

    map_type m;
    for (;;)
    {
        std::string::const_iterator beg = str.begin() + offset;
        std::string::const_iterator end = str.end();
        if (!boost::regex_search(beg, end, what, e))
            break;
        cout << "what[0] = " << what[0] << endl;
        cout << "what[1] = " << what[1] << endl; // sub_match 1, (\"\\w*\")
        cout << "what[2] = " << what[2] << endl; // sub_match 2, (a\\d)
        // Save positions before editing str: insert() invalidates
        // every iterator held in 'what'.
        std::string::size_type pos_beg = offset + (what[1].first - beg);
        std::string::size_type pos_end = offset + (what[1].second - beg);
        m[what[1].str()] = static_cast<int>(pos_beg);

        str.insert(pos_end, insert_end);   // insert at the later position first
        str.insert(pos_beg, insert_beg);
        assert(pos_end < str.size());

        // update region: resume just after the inserted </font>
        offset = pos_end + insert_beg.size() + insert_end.size();
        cout << str << endl;
    }
    return 0;
}
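Re-searching the suffix by hand works, but an iterator can walk successive matches for you. This sketch assumes C++11 std::sregex_iterator (boost::sregex_iterator is the Boost equivalent) and uses the same comment pattern; find_comments is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collect every /*word*/ comment in `code`, in order of appearance.
std::vector<std::string> find_comments(const std::string& code)
{
    const std::regex comment("/\\*\\w+\\*/");
    std::vector<std::string> found;
    for (std::sregex_iterator it(code.begin(), code.end(), comment), last;
         it != last; ++it)
    {
        found.push_back(it->str());   // str() with no argument is the whole match
    }
    return found;
}
```

The input string is never modified, so no iterator is invalidated along the way.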
//----------------------
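match_results is a collection of sub_match objects, and sub_match derives from std::pair<BidiIt, BidiIt>.
match_results<std::string::const_iterator> what;
cout<<what[0]<<endl;//prints the text of the first sub_match (the whole match)
what[0] is a sub_match, i.e. a pair of iterators, for which operator<< is overloaded.

So what[0] is not a std::string.
what[0].str() returns the matched text as a string.
what[0].first and what[0].second are bidirectional iterators, so std::string(what[0].first, what[0].second) also builds the string.
what[1].first and what[1].second delimit the text matched by the first sub-expression.

what[0] is the whole matched string
what[1] is the text matched by the first sub-expression
what[2] is the text matched by the second sub-expression
what[n] is the text matched by the n-th sub-expression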
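The three ways of reading a sub_match described above should always agree. A sketch under C++11 std::smatch (the standard counterpart of match_results<std::string::const_iterator>); group_text is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Return the text of sub-expression n of the first match of `re` in `text`,
// reading it three equivalent ways; returns "" if there is no match.
std::string group_text(const std::string& text, const std::regex& re, std::size_t n)
{
    std::smatch what;
    if (!std::regex_search(text, what, re))
        return "";
    const std::ssub_match& sub = what[n];        // a pair of iterators, not a string
    const std::string a = sub.str();             // 1) sub_match::str()
    const std::string b(sub.first, sub.second);  // 2) from the iterator pair
    const std::string c = what.str(n);           // 3) match_results::str(n)
    assert(a == b && b == c);
    return a;
}
```

With the quoted-string pattern from the code above, group 0 is the whole match, group 1 the quoted word and group 2 the "a" plus digit.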
//---------------------------------------
So, an overview:

The three most important algorithms all start with the word regex:
boost::regex_search()
boost::regex_match()
boost::regex_replace()
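The three algorithms differ in what they do: regex_match requires the whole input to match, regex_search accepts a match anywhere, and regex_replace rewrites every match. As a sketch (assuming the C++11 std:: versions; the helper names are hypothetical), regex_replace can do the <font> wrapping from the earlier example in one call, with $1 in the format string referring to capture group 1.

```cpp
#include <cassert>
#include <regex>
#include <string>

// regex_match: the whole string must match the pattern.
bool matches_whole(const std::string& s, const std::regex& re)
{
    return std::regex_match(s, re);
}

// regex_search: a match anywhere in the string is enough.
bool matches_part(const std::string& s, const std::regex& re)
{
    return std::regex_search(s, re);
}

// regex_replace: wrap every double-quoted word in <font>...</font>.
std::string wrap_quoted(const std::string& text)
{
    static const std::regex quoted("(\"\\w*\")");
    return std::regex_replace(text, quoted, "<font>$1</font>");
}
```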

Types
syntax_option_type
error_type
match_flag_type
class regex_error
class regex_traits
class template basic_regex
class template sub_match
class template match_results
Algorithms
regex_match
regex_search
regex_replace
Iterators
regex_iterator
regex_token_iterator
Typedefs
regex     [ = basic_regex<char> ]
wregex     [ = basic_regex<wchar_t> ]
cmatch     [ = match_results<const char*> ]
wcmatch     [ = match_results<const wchar_t*> ]
smatch     [ = match_results<std::string::const_iterator> ]
wsmatch     [ = match_results<std::wstring::const_iterator> ]
cregex_iterator     [ = regex_iterator<const char*>]
wcregex_iterator     [ = regex_iterator<const wchar_t*>]
sregex_iterator     [ = regex_iterator<std::string::const_iterator>]
wsregex_iterator     [ = regex_iterator<std::wstring::const_iterator>]
cregex_token_iterator     [ = regex_token_iterator<const char*>]
wcregex_token_iterator     [ = regex_token_iterator<const wchar_t*>]
sregex_token_iterator     [ = regex_token_iterator<std::string::const_iterator>]
wsregex_token_iterator     [ = regex_token_iterator<std::wstring::const_iterator>]
Deprecated interfaces
POSIX API Compatibility Functions
class regbase
class template reg_expression
Algorithm regex_grep
Algorithm regex_format
Algorithm regex_merge
Algorithm regex_split
class RegEx
For more information, visit
http://www.boost.org/libs/regex/doc/index.html
It has many examples, which are wonderful.
There are Chinese-language sites too; please google them.
I hope you will love boost::regex.
BTW, when you compile the source code, you must link against the boost_regex library,
like this:
apple@40years:~/test/boost$ cd ..
apple@40years:~/test$ g++ -g -Wall split_string.cpp -lboost_regex
apple@40years:~/test$ ./a.out
davinci
is
a
very
good
boy
apple@40years:~/test$

That's all, thanks.
