Crawl

Web scraping lets you collect data from web pages across the internet. It's also called web crawling or web data extraction.

Selectors

You can query the DOM with CSS selectors. Currently supported: *, tagname, tagname#id, #id, tagname.classname, .classname, tagname.classname.classname2, .classname.classname2, tagname[attribute-selector], [attribute-selector], "div, p", div p, div > p, div + p and p ~ ul.
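A common way to support CSS selectors in PHP is to translate each selector into an XPath expression and evaluate it with the built-in DOMXPath class. Whether Crawl works this way internally is an assumption. The sketch below only hand-writes XPath equivalents for a few of the listed selectors to show the mapping. The helper `countMatches` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: evaluate an XPath expression against an HTML string
// and return how many nodes match. Uses only PHP's built-in DOM extension.
function countMatches(string $html, string $xpath): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($doc))->query($xpath)->length;
}

$html = '<!DOCTYPE html><html><body>'
      . '<div class="a b"><span>One</span><b><span>Two</span></b></div>'
      . '<p>Three</p><ul><li>x</li></ul>'
      . '</body></html>';

// Hand-written XPath equivalents for some of the supported CSS selectors.
$equivalents = [
    'div span'   => '//div//span',                            // descendant combinator
    'div > span' => '//div/span',                             // child combinator
    'div + p'    => '//div/following-sibling::*[1][self::p]', // adjacent sibling
    'p ~ ul'     => '//p/following-sibling::ul',              // general sibling
    '.a.b'       => "//*[contains(concat(' ', normalize-space(@class), ' '), ' a ')"
                  . " and contains(concat(' ', normalize-space(@class), ' '), ' b ')]",
];

foreach ($equivalents as $css => $xpath) {
    echo str_pad($css, 12), ' => ', countMatches($html, $xpath), PHP_EOL;
}
```

The class test pads @class with spaces so that `.a` does not accidentally match a class such as `ab`.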

Load content

Load HTML from a URL.

Crawl::url("http://example.com/");

Load HTML from file.

Crawl::file("public/index.html");

Load HTML from a string.

Crawl::string('<!DOCTYPE html><html><body>Hello</body></html>');
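All three loaders presumably end with a parsed DOM. For comparison, here is a plain-PHP sketch of the same three paths using DOMDocument, assuming only the built-in DOM extension. The function names `documentFromString`, `documentFromFile` and `documentFromUrl` are hypothetical and not part of Crawl.

```php
<?php
// Parse an HTML string into a DOMDocument.
function documentFromString(string $html): DOMDocument
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // HTML5 tags would otherwise raise warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc;
}

// Read a local file, then parse it.
function documentFromFile(string $path): DOMDocument
{
    // file_get_contents() returns false on failure, so check before parsing.
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return documentFromString($html);
}

// Fetch a URL, then parse it (requires allow_url_fopen in php.ini).
function documentFromUrl(string $url): DOMDocument
{
    return documentFromFile($url);
}

$doc = documentFromString('<!DOCTYPE html><html><body>Hello</body></html>');
echo $doc->getElementsByTagName('body')->item(0)->textContent, PHP_EOL; // Hello
```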

Source

Returns the HTML source of the loaded document.

Crawl::string('<!DOCTYPE html><html><body>Hello</body></html>');

echo Crawl::source();
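If a library keeps a parsed DOMDocument, one way to get the source back is DOMDocument::saveHTML(). This is a sketch of that technique, not a description of Crawl::source(), which may instead return the original string unchanged.

```php
<?php
// Re-serialize a parsed document with DOMDocument::saveHTML().
// libxml normalizes the markup, so the result can differ in whitespace
// or structure from the raw string that was loaded.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<!DOCTYPE html><html><body>Hello</body></html>');
libxml_clear_errors();

$source = $doc->saveHTML();
echo $source;
```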

First

Returns the first document element matching the selector.

Crawl::string('<!DOCTYPE html><html><body>Hello</body></html>');

$crawl = Crawl::first("body");
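A "first match" lookup can be expressed with PHP's DOMXPath by taking item(0) of the result list. The helper `firstMatch` below is hypothetical; it also shows the case Crawl's documentation does not cover, where nothing matches and the sketch returns null.

```php
<?php
// Hypothetical helper: return the first node matching an XPath expression,
// or null when there is no match (or the expression is invalid).
function firstMatch(DOMDocument $doc, string $xpath): ?DOMNode
{
    $nodes = (new DOMXPath($doc))->query($xpath);
    // DOMNodeList::item() returns null when the index is out of range.
    return $nodes === false ? null : $nodes->item(0);
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<!DOCTYPE html><html><body>Hello</body></html>');
libxml_clear_errors();

$body = firstMatch($doc, '//body');
echo $body !== null ? $body->textContent : 'no match', PHP_EOL; // Hello
```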

Get

Returns a list of document elements matching the selector.

Crawl::string('<!DOCTYPE html><html><body><h1>Hello</h1><h1>Kiaan</h1><div class="content">This is some text</div></body></html>');

$crawl = Crawl::get("h1");

Returns the item at the specified index.

Crawl::string('<!DOCTYPE html><html><body><h1>Hello</h1><h1>Kiaan</h1><div class="content">This is some text</div></body></html>');

echo Crawl::get("h1")->item(0)->text();

Returns the number of matching items.

Crawl::string('<!DOCTYPE html><html><body><h1>Hello</h1><h1>Kiaan</h1><div class="content">This is some text</div></body></html>');

echo Crawl::get("h1")->count();
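For comparison, PHP's own DOMNodeList (returned by DOMXPath::query()) offers item($i), a length property and, since PHP 7.2, count(). The sketch below mirrors the get()/item()/count() calls above with those built-ins. It is not Crawl's own code.

```php
<?php
// Query all <h1> elements and inspect the resulting DOMNodeList.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<!DOCTYPE html><html><body><h1>Hello</h1><h1>Kiaan</h1>'
    . '<div class="content">This is some text</div></body></html>');
libxml_clear_errors();

$headings = (new DOMXPath($doc))->query('//h1');

echo $headings->item(0)->textContent, PHP_EOL; // Hello
echo $headings->length, PHP_EOL;               // 2
echo count($headings), PHP_EOL;                // 2 (DOMNodeList is Countable)
```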

Attributes

Returns the value for the attribute name specified.

Crawl::string('<!DOCTYPE html><html><body><a href="www.google.com">Google</a></body></html>');

echo Crawl::first("a")->attribute("href");

Returns an array containing all attributes.

Crawl::string('<!DOCTYPE html><html><body><a id="google" href="www.google.com">Google</a></body></html>');

print_r(Crawl::first("a")->attributes());
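With PHP's DOMElement, a single attribute comes from getAttribute(), and all attributes can be collected by iterating the element's attributes map. The helper `attributesOf` is hypothetical; it sketches what an attributes() method could return, assuming a name-to-value array.

```php
<?php
// Hypothetical helper: collect every attribute of an element into
// a name => value array, in document order.
function attributesOf(DOMElement $el): array
{
    $result = [];
    // $el->attributes is a DOMNamedNodeMap of DOMAttr nodes.
    foreach ($el->attributes as $attr) {
        $result[$attr->name] = $attr->value;
    }
    return $result;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<!DOCTYPE html><html><body>'
    . '<a id="google" href="www.google.com">Google</a></body></html>');
libxml_clear_errors();

$link = $doc->getElementsByTagName('a')->item(0);
echo $link->getAttribute('href'), PHP_EOL; // www.google.com
print_r(attributesOf($link));
```

Note that getAttribute() returns an empty string, not null, when the attribute is missing.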

Values

Returns the element's current node value.

Crawl::string('<!DOCTYPE html><html><body><a id="google" href="www.google.com">Google</a></body></html>');

echo Crawl::first("a")->value();

Returns the element's current text content.

Crawl::string('<!DOCTYPE html><html><body><a id="google" href="www.google.com">Google</a></body></html>');

echo Crawl::first("a")->text();

Returns the element's current HTML content.

Crawl::string('<!DOCTYPE html><html><body><a id="google" href="www.google.com">Google</a></body></html>');

echo Crawl::first("a")->html();
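The difference between these accessors shows up once an element has child markup. The sketch uses PHP's DOM extension on a slightly modified link; the helper `innerHtml` is hypothetical. Whether Crawl's html() returns inner or outer HTML is not stated here, so both are shown.

```php
<?php
// Hypothetical helper: inner HTML = serialized child nodes, without the element's own tag.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<!DOCTYPE html><html><body>'
    . '<a id="google" href="www.google.com"><b>Goo</b>gle</a></body></html>');
libxml_clear_errors();

$link = $doc->getElementsByTagName('a')->item(0);

echo $link->nodeValue, PHP_EOL;      // Google (PHP returns text content for elements)
echo $link->textContent, PHP_EOL;    // Google
echo innerHtml($link), PHP_EOL;      // <b>Goo</b>gle
echo $doc->saveHTML($link), PHP_EOL; // outer HTML, including the <a> tag
```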

Selector and XPath queries

You can query the document with a CSS selector:

Crawl::url("http://example.com/");

Crawl::selector("#Web > table");

or with an XPath expression:

Crawl::url("http://example.com/");

Crawl::xpath('//*[@id="Web"]/table');
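The CSS selector #Web > table and the XPath expression //*[@id="Web"]/table select the same nodes: a table that is a direct child of the element whose id is Web. The sketch checks this with PHP's DOMXPath on made-up markup, without assuming anything about Crawl's internals. Wrapping the expression in single quotes lets it contain double quotes without escaping.

```php
<?php
// One table inside #Web, one outside it; only the first should match.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<!DOCTYPE html><html><body>'
    . '<div id="Web"><table><tr><td>inside</td></tr></table></div>'
    . '<table><tr><td>outside</td></tr></table></body></html>');
libxml_clear_errors();

// Equivalent of the CSS selector "#Web > table".
$tables = (new DOMXPath($doc))->query('//*[@id="Web"]/table');

echo $tables->length, PHP_EOL;                     // 1
echo trim($tables->item(0)->textContent), PHP_EOL; // inside
```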
