Helpers

HTTP helpers

class crawlster.helpers.RequestsHelper

Helper for making HTTP requests using the requests library

delete(url, data=None, query_params=None, headers=None)

Makes a DELETE request

get(url, query_params=None, headers=None)

Makes a GET request

initialize()

Initializes the session used for making requests

open(http_request: crawlster.helpers.http.request.HttpRequest)

Opens a given HTTP request.

Parameters: http_request (HttpRequest) – The crawlster.helpers.http.request.HttpRequest instance with the required info for making the request
Returns: crawlster.helpers.http.response.HttpResponse

options(url, query_params=None, headers=None)

Makes an OPTIONS request

patch(url, data=None, query_params=None, headers=None)

Makes a PATCH request

post(url, data=None, query_params=None, headers=None)

Makes a POST request

HTTP requests

class crawlster.helpers.http.request.HttpRequest(url, method='GET', data=None, query_params=None, headers=None)

Class representing an HTTP request
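As a rough illustration of how the constructor arguments fit together, the sketch below uses a stand-in dataclass with the same five fields and appends query_params to the URL using the standard library. RequestSketch and full_url are hypothetical names for this example and are not part of crawlster; the real class may encode parameters differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional
from urllib.parse import urlencode


@dataclass
class RequestSketch:
    """Hypothetical stand-in mirroring the HttpRequest constructor arguments."""

    url: str
    method: str = "GET"
    data: Optional[Any] = None
    query_params: Optional[Dict[str, str]] = None
    headers: Optional[Dict[str, str]] = None

    def full_url(self) -> str:
        """Return the URL with query_params appended as a query string."""
        if not self.query_params:
            return self.url
        # Reuse an existing query string if the URL already has one.
        separator = "&" if "?" in self.url else "?"
        return self.url + separator + urlencode(self.query_params)


request = RequestSketch(
    "http://example.com/search",
    query_params={"q": "crawler", "page": "2"},
)
print(request.method, request.full_url())
```

Keeping the query parameters separate from the URL until the request is sent means the same base URL can be reused with different parameter sets.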

class crawlster.helpers.http.request.GetRequest(url, query_params=None, headers=None)

An HTTP GET request

class crawlster.helpers.http.request.PostRequest(url, data=None, query_params=None, headers=None)

An HTTP POST request

class crawlster.helpers.http.request.JsonRequest(url, method='GET', data=None, query_params=None, headers=None)

A generic JSON request.

The data must be an object that can be safely encoded as JSON.

Examples

JsonRequest('http://example.com', 'POST', data={'hello': 'world'})
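Whether a value can be "safely encoded as JSON" can be checked with the standard json module before building the request. The helper below is a hypothetical sketch, not a crawlster function, and it assumes the library serializes data in a way comparable to json.dumps.

```python
import json
from datetime import datetime


def is_json_safe(data) -> bool:
    """Return True if json.dumps can serialize data (hypothetical helper)."""
    try:
        json.dumps(data)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference.
        return False
    return True


# Dicts, lists, strings, numbers, booleans and None encode without help.
print(is_json_safe({"hello": "world"}))
# datetime objects are not serializable by default.
print(is_json_safe({"when": datetime(2020, 1, 1)}))
```

Values that fail the check, such as dates or sets, need converting to strings or lists first.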

class crawlster.helpers.http.request.XhrRequest(url, method='GET', data=None, query_params=None, headers=None)

An XHR request (the method defaults to GET)

HTTP responses

class crawlster.helpers.http.response.HttpResponse(request, status_code, headers, body)

Class representing an HTTP response

body_str

Returns the decoded content of the response, if possible.

May raise UnicodeDecodeError if the body is not a valid Unicode-encoded byte sequence.
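Calling code may want to guard against that error. The sketch below shows one way to do so with plain bytes. It assumes the body is a bytes object and that UTF-8 is the expected encoding, neither of which is stated above, and decode_body is a hypothetical helper rather than part of crawlster.

```python
def decode_body(body: bytes, encoding: str = "utf-8") -> str:
    """Decode a response body, falling back to replacement characters."""
    try:
        return body.decode(encoding)
    except UnicodeDecodeError:
        # Invalid byte sequences become U+FFFD instead of raising.
        return body.decode(encoding, errors="replace")


print(decode_body("café".encode("utf-8")))
# These Latin-1 bytes are not valid UTF-8, so the fallback path is used.
print(decode_body(b"caf\xe9"))
```

Replacement characters lose information, so for binary payloads such as images it is usually better to work with the raw body instead.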

content_type

Returns the response content type if available

server

Returns the server header if available

Extract helpers

class crawlster.helpers.ExtractHelper

css(text, selector, attr=None, content=None)

Extracts data using a CSS selector.

See Content.css for more info.

Utility classes

class crawlster.helpers.extract.Content(raw_data)

Content wrapper that provides common data extraction methods

css(pattern, get_attr=None, get_text=False)

Extracts data using a CSS selector

Returns a list of elements (as strings) with the extracted data

Parameters:
  • pattern (str) – the CSS selector
  • get_attr (str or None) – if present, returns a list of the attributes of the extracted items
  • get_text (bool) – if True, returns only the text content of the matched elements
Returns:

If get_attr and get_text are not specified, returns a list of strings with the matches.

If get_attr is specified, returns a list with the values of the specified attribute, if present. Elements that match the query pattern but do not have that attribute are ignored.

If get_text is specified, returns a list with the text from the matched elements (direct children that are not nested tags).
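The get_attr and get_text rules can be illustrated without the library. The sketch below is a simplified stand-in built on the standard html.parser module: it matches elements by tag name only instead of a full CSS selector, and it does not cover the default mode. TagCollector and extract are hypothetical names, and treating "direct children that are not nested tags" as the element's own text nodes is an interpretation of the description above.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records attributes and direct text for each element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.matches = []  # one dict per matched element
        self._stack = []   # currently open elements as (name, match or None)

    def handle_starttag(self, tag, attrs):
        match = None
        if tag == self.tag:
            match = {"attrs": dict(attrs), "text": ""}
            self.matches.append(match)
        self._stack.append((tag, match))

    def handle_endtag(self, tag):
        # Pop back to the element being closed.
        while self._stack:
            name, _ = self._stack.pop()
            if name == tag:
                break

    def handle_data(self, data):
        # Text is credited only to the innermost open element.
        if self._stack and self._stack[-1][1] is not None:
            self._stack[-1][1]["text"] += data


def extract(html, tag, get_attr=None, get_text=False):
    """Mimic the get_attr and get_text modes for a single tag name."""
    collector = TagCollector(tag)
    collector.feed(html)
    collector.close()
    if get_attr is not None:
        # Elements without the attribute are skipped.
        return [m["attrs"][get_attr] for m in collector.matches if get_attr in m["attrs"]]
    if get_text:
        return [m["text"].strip() for m in collector.matches]
    raise NotImplementedError("the default mode is not covered by this sketch")


html = '<ul><li><a href="/one">One <em>new</em></a></li><li><a>Two</a></li></ul>'
print(extract(html, "a", get_attr="href"))
print(extract(html, "a", get_text=True))
```

In this sample the second link has no href and is dropped from the attribute list, while the text inside the nested em tag is left out of the first link's text.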

parsed_data

Access the underlying bs4.BeautifulSoup instance

This property is provided for more advanced usage.