HTML和CSS的C++解析器 htmlcxx
本文导语: htmlcxx 是一个 C++ 的 HTML 解析器和 CSS1 的解析器。The parsing politics attempt to mimic the behavior of Mozilla Firefox, so you should expect parse trees similar to those created by Firefox. However, it does not insert nonexistent stuff in your HTML. Therefore, serializing the DOM tr...
htmlcxx 是一个 C++ 的 HTML 解析器和 CSS1 的解析器。The parsing politics attempt to mimic the behavior of Mozilla Firefox, so you should expect parse trees similar to those created by Firefox. However, it does not insert nonexistent stuff in your HTML. Therefore, serializing the DOM tree gives exactly the same output as the original HTML document. Another key feature is an STL-like tree navigation API provided by the tree.hh template library.
示例代码:
#include
...
//Parse some html code
string html = "hey";
HTML::ParserDom parser;
tree dom = parser.parseTree(html);
//Print whole DOM tree
cout parseAttributes();
cout attributes("href");
}
}
//Dump all text of the document
it = dom.begin();
end = dom.end();
for (; it != end; ++it)
{
if ((!it->isTag()) && (!it->isComment()))
{
cout text();
}
}