I want to process several HTML pages that contain tables.
The pages:
- contain several tables without a class or id, so the only way to identify the correct one is by its content
- the needed table has the value "Content" in its 1st cell
Question: how can I find the correct table on the page based on its cell value, with Web::Scraper, Scrappy, or some other tool?
Example code:
#!/usr/bin/env perl
use 5.014;
use warnings;
use Web::Scraper;
use YAML;

# Slurp the sample page from the __DATA__ section below.
my $html = do { local $/; <DATA> };

my $table = scraper {
    # the easy way - table with a class, an id, or any other attribute
    #process 'table.xxx > tr', 'rows[]' => scraper {
    # unfortunately, the table has no class="xxx", so :(
    process 'NEED_HELP_HERE > tr', 'rows[]' => scraper {
        process 'th', 'header' => 'TEXT';
        process 'td', 'cols[]' => 'TEXT';
    };
};

my $result = $table->scrape( $html );
say Dump($result);
__DATA__
<html>
<head><title>title</title></head>
<body>
<table><tr><th class="inverted">header</th><td>value</td></tr></table>
<!-- several other tables follow here (their number varies) -->
<table> <!-- would be easy with some class="xxx" -->
<tr>
<th class="inverted">Content</th> <!-- Need this table - 1st cell == "Content" -->
<td class="inverted">col-1</td>
<td class="inverted">col-n</td>
</tr>
<tr>
<th>Date</th>
<td>2012</td>
<td>2001</td>
</tr>
<tr>
<th>Banana</th>
<td>val-1</td>
<td>val-n</td>
</tr>
</table>
</body>
</html>
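
One possible direction, as a rough sketch rather than a confirmed answer: Web::Scraper's `process` accepts an XPath expression as well as a CSS selector, so the `NEED_HELP_HERE` placeholder could be replaced by an XPath that selects a table by the text of its first cell. The variable name `$rows_xpath` and the shortened inline copy of the sample page are illustrative. The sketch assumes the first cell's text is exactly "Content" once surrounding whitespace is trimmed.

```perl
#!/usr/bin/env perl
use 5.014;
use warnings;
use Web::Scraper;
use YAML;

# Shortened copy of the sample page from the question.
my $html = <<'HTML';
<html><head><title>title</title></head>
<body>
<table><tr><th class="inverted">header</th><td>value</td></tr></table>
<table>
<tr><th class="inverted">Content</th><td class="inverted">col-1</td><td class="inverted">col-n</td></tr>
<tr><th>Date</th><td>2012</td><td>2001</td></tr>
<tr><th>Banana</th><td>val-1</td><td>val-n</td></tr>
</table>
</body></html>
HTML

# Assumption: the wanted table is the one whose first row starts with a
# cell (th or td) whose trimmed text is exactly "Content".
# The expression then returns every row of that table.
my $rows_xpath = '//table[.//tr[1]/*[1][normalize-space(.)="Content"]]//tr';

my $table = scraper {
    # An expression starting with "/" is treated as XPath, not CSS.
    process $rows_xpath, 'rows[]' => scraper {
        process 'th', 'header' => 'TEXT';
        process 'td', 'cols[]' => 'TEXT';
    };
};

my $result = $table->scrape($html);
say Dump($result);
```

With the sample page this should leave only the rows of the second table in `$result->{rows}`, each with a `header` and a `cols` list. If the page nests tables inside one another, the `.//tr` and `//tr` steps would also match rows of inner tables, so the expression may need tightening for real pages.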