These functions parse a Robots Exclusion Standard file (robots.txt) into a data structure for easy access.
◆ parse_record_field
#define parse_record_field(d, f)
Value: parse_record_field(d, f, sizeof(f) - 1)
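The macro forwards to a three-argument function of the same name and passes `sizeof(f) - 1` as the field length. That expression equals the string length only when `f` is a string literal or a char array, not a pointer. The sketch below illustrates the pattern; the function body, its parameter types and its return value are assumptions made for illustration, since the real helper is not shown here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not libwget's code): returns 1 if `data` starts with
 * the field name `field`, compared ASCII case-insensitively, else 0. */
static int parse_record_field(const char *data, const char *field, size_t field_len)
{
	for (size_t i = 0; i < field_len; i++) {
		/* a 0-byte in data never equals a field character, so we stop in time */
		if (tolower((unsigned char) data[i]) != tolower((unsigned char) field[i]))
			return 0;
	}
	return 1;
}

/* A function-like macro is not expanded again inside its own expansion, so
 * the three-argument call below reaches the function defined above.
 * sizeof(f) - 1 is the literal's length without the trailing 0-byte. */
#define parse_record_field(d, f) parse_record_field(d, f, sizeof(f) - 1)
```

With this in place, a call such as `parse_record_field(line, "disallow:")` gets the length computed at compile time. Passing a `char *` variable as `f` would silently yield the pointer size minus one instead.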
◆ wget_robots_parse()
int wget_robots_parse(wget_robots **_robots, const char *data, const char *client)
- Parameters
| [in] | data | Memory with robots.txt content (with trailing 0-byte) |
| [in] | client | Name of the client / user-agent |
- Returns
- The signature returns an int and takes the output parameter _robots, so the allocated wget_robots structure is presumably stored in *_robots, with the return value indicating success or failure.
The function parses the robots.txt data in accordance with https://www.robotstxt.org/orig.html#format and produces a wget_robots structure that includes a list of the disallowed paths and a list of the sitemap files.
The wget_robots structure has to be freed by calling wget_robots_free().
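To make the parsing step concrete, here is a simplified, self-contained sketch of collecting Disallow paths for one client plus all Sitemap URLs. It is not libwget's implementation: `struct robots_sketch`, `field_value`, `store` and `parse_robots_sketch` are hypothetical names, the arrays are fixed-size, and leading whitespace on a line is not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16
#define MAX_LEN 128

/* Hypothetical result type for this sketch; not libwget's wget_robots. */
struct robots_sketch {
	char paths[MAX_ENTRIES][MAX_LEN];
	int npaths;
	char sitemaps[MAX_ENTRIES][MAX_LEN];
	int nsitemaps;
};

/* If `line` starts with `field` (ASCII case-insensitive), return a pointer to
 * the value after the field name and any blanks; otherwise return NULL. */
static const char *field_value(const char *line, const char *field)
{
	size_t n = strlen(field);

	for (size_t i = 0; i < n; i++)
		if (tolower((unsigned char) line[i]) != tolower((unsigned char) field[i]))
			return NULL;
	line += n;
	while (*line == ' ' || *line == '\t')
		line++;
	return line;
}

/* Copy the value up to whitespace, end of line, or a '#' comment.
 * Empty values, over-long values and overflowing entries are dropped. */
static void store(char list[][MAX_LEN], int *count, const char *v)
{
	size_t n = strcspn(v, " \t\r\n#");

	if (n == 0 || n >= MAX_LEN || *count >= MAX_ENTRIES)
		return;
	memcpy(list[*count], v, n);
	list[*count][n] = '\0';
	(*count)++;
}

static void parse_robots_sketch(struct robots_sketch *r, const char *data, const char *client)
{
	int applies = 0;        /* inside a record addressed to `client` or "*" */
	int prev_was_agent = 0; /* previous line was also a User-agent line */
	const char *v;

	memset(r, 0, sizeof(*r));
	for (const char *line = data; *line; ) {
		if ((v = field_value(line, "user-agent:")) != NULL) {
			size_t n = strcspn(v, " \t\r\n#");
			int match = (n == 1 && *v == '*') ||
				(n == strlen(client) && field_value(v, client) != NULL);
			/* consecutive User-agent lines share one record */
			applies = prev_was_agent ? (applies || match) : match;
			prev_was_agent = 1;
		} else {
			prev_was_agent = 0;
			if (applies && (v = field_value(line, "disallow:")) != NULL)
				store(r->paths, &r->npaths, v);
			else if ((v = field_value(line, "sitemap:")) != NULL)
				store(r->sitemaps, &r->nsitemaps, v); /* applies to every client */
		}
		line += strcspn(line, "\n"); /* advance to the end of this line */
		if (*line == '\n')
			line++;
	}
}
```

In this sketch, records for `*` and for the named client accumulate into one list, and Sitemap lines are collected regardless of which record they appear in. Whether the library makes the same choices is not stated above.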
◆ wget_robots_free()
void wget_robots_free(wget_robots **robots)
- Parameters
| [in,out] | robots | Pointer to pointer to wget_robots structure |
wget_robots_free() frees the formerly allocated wget_robots structure.
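Taking a pointer to a pointer lets a free function reset the caller's variable as well as release the memory. The text above does not say whether wget_robots_free() does so, although the [in,out] annotation suggests it. The following sketch shows the general pattern with a hypothetical stand-in type, `struct demo_robots`, and a hypothetical function, `demo_robots_free`.

```c
#include <stdlib.h>

/* Hypothetical stand-in type; the real wget_robots layout is not shown here. */
struct demo_robots {
	char **paths;
	int npaths;
};

/* Free everything owned by *robots, then reset the caller's pointer.
 * Tolerates robots == NULL and *robots == NULL. */
static void demo_robots_free(struct demo_robots **robots)
{
	if (!robots || !*robots)
		return;
	for (int i = 0; i < (*robots)->npaths; i++)
		free((*robots)->paths[i]);
	free((*robots)->paths);
	free(*robots);
	*robots = NULL; /* a second call or a later NULL check is now safe */
}
```

Resetting the pointer makes an accidental second call harmless and turns a use-after-free into an easily detected NULL dereference.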
◆ wget_robots_get_path_count()
int wget_robots_get_path_count(wget_robots *robots)
- Parameters
| robots | Pointer to instance of wget_robots |
- Returns
- Returns the number of paths listed in robots
◆ wget_robots_get_path()
wget_string *wget_robots_get_path(wget_robots *robots, int index)
- Parameters
| robots | Pointer to instance of wget_robots |
| index | Index of the wanted path |
- Returns
- Returns the path at index or NULL
◆ wget_robots_get_sitemap_count()
int wget_robots_get_sitemap_count(wget_robots *robots)
- Parameters
| robots | Pointer to instance of wget_robots |
- Returns
- Returns the number of sitemaps listed in robots
◆ wget_robots_get_sitemap()
const char *wget_robots_get_sitemap(wget_robots *robots, int index)
- Parameters
| robots | Pointer to instance of wget_robots |
| index | Index of the wanted sitemap URL |
- Returns
- Returns the sitemap URL at index or NULL
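The count and get functions form a pair: callers read the count, then fetch entries by index, and receive NULL when the index cannot be served. The sketch below shows that contract with a hypothetical stand-in type and hypothetical functions (`struct demo_sitemaps`, `demo_get_sitemap_count`, `demo_get_sitemap`). Treating a NULL container as empty and rejecting negative indices are assumptions, not documented behavior of the library.

```c
#include <stddef.h>

/* Hypothetical stand-in for a parsed result; not the real wget_robots. */
struct demo_sitemaps {
	const char *const *urls;
	int count;
};

/* Number of stored URLs; a NULL container counts as empty (assumption). */
static int demo_get_sitemap_count(const struct demo_sitemaps *r)
{
	return r ? r->count : 0;
}

/* Returns the URL at `index`, or NULL if r is NULL or index is out of range. */
static const char *demo_get_sitemap(const struct demo_sitemaps *r, int index)
{
	if (!r || index < 0 || index >= r->count)
		return NULL;
	return r->urls[index];
}
```

A typical caller would loop `for (int i = 0; i < demo_get_sitemap_count(r); i++)` and still check the returned pointer before using it.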