Helpers
Http helpers
- class crawlster.helpers.RequestsHelper
Helper for making HTTP requests using the requests library
delete(url, data=None, query_params=None, headers=None)
Makes a DELETE request
get(url, query_params=None, headers=None)
Makes a GET request
initialize()
Initializes the session used for making requests
open(http_request: crawlster.helpers.http.request.HttpRequest)
Opens a given HTTP request.
Parameters: http_request (HttpRequest) – The crawlster.helpers.http.request.HttpRequest instance with the required info for making the request
Returns: crawlster.helpers.http.response.HttpResponse
options(url, query_params=None, headers=None)
Makes an OPTIONS request
patch(url, data=None, query_params=None, headers=None)
Makes a PATCH request
post(url, data=None, query_params=None, headers=None)
Makes a POST request
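As a rough illustration of what a `query_params` argument contributes to a request, the sketch below appends a dict of parameters to a URL using only the standard library. `build_url` is a hypothetical helper written for this example and is not part of crawlster; the helper class itself relies on the requests library, so its actual encoding behaviour may differ in detail.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(url, query_params=None):
    """Hypothetical helper: append query_params to url as a query string."""
    if not query_params:
        return url
    parts = urlsplit(url)
    encoded = urlencode(query_params)
    # Preserve any query string already present in the URL.
    query = f"{parts.query}&{encoded}" if parts.query else encoded
    return urlunsplit(parts._replace(query=query))


print(build_url("http://example.com/search", {"q": "web crawler", "page": 2}))
# http://example.com/search?q=web+crawler&page=2
```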
Http requests
- class crawlster.helpers.http.request.HttpRequest(url, method='GET', data=None, query_params=None, headers=None)
Class representing an HTTP request
- class crawlster.helpers.http.request.GetRequest(url, query_params=None, headers=None)
An HTTP GET request
- class crawlster.helpers.http.request.PostRequest(url, data=None, query_params=None, headers=None)
An HTTP POST request
- class crawlster.helpers.http.request.JsonRequest(url, method='GET', data=None, query_params=None, headers=None)
A generic JSON request.
The data must be an object that can be safely encoded as JSON.
Examples
JsonRequest('http://example.com', 'POST', data={'hello': 'world'})
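The requirement that `data` be JSON-encodable can be checked ahead of time with the standard library. The sketch below is not crawlster's implementation: `encode_json_body` is a hypothetical helper, and the `Content-Type` header it sets is an assumption about what a JSON request would typically send.

```python
import json


def encode_json_body(data, headers=None):
    """Hypothetical helper: serialize data to JSON bytes and set a JSON content type."""
    headers = dict(headers or {})
    # json.dumps raises TypeError for objects it cannot serialize (e.g. sets).
    body = json.dumps(data).encode("utf-8")
    headers.setdefault("Content-Type", "application/json")
    return body, headers


body, headers = encode_json_body({"hello": "world"})
print(body)     # b'{"hello": "world"}'
print(headers)  # {'Content-Type': 'application/json'}

try:
    encode_json_body({"tags": {"a", "b"}})
except TypeError as exc:
    print("not JSON-encodable:", exc)
```

A set is used for the failing case because it is a common Python container with no JSON equivalent.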
- class crawlster.helpers.http.request.XhrRequest(url, method='GET', data=None, query_params=None, headers=None)
An XHR POST request
Http responses
- class crawlster.helpers.http.response.HttpResponse(request, status_code, headers, body)
Class representing an HTTP response
body_str
Returns the decoded body of the response, if possible.
May raise UnicodeDecodeError if the body is not a validly encoded Unicode sequence.
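The failure mode described for `body_str` is ordinary `bytes.decode` behaviour. The following sketch assumes the body is held as bytes and decoded as UTF-8; both the function name `decode_body` and the choice of encoding are assumptions made for illustration, since the encoding crawlster uses is not stated here.

```python
def decode_body(body, encoding="utf-8"):
    """Hypothetical stand-in for a body_str-style accessor."""
    # bytes.decode raises UnicodeDecodeError on bytes that are invalid in the encoding.
    return body.decode(encoding)


print(decode_body(b"<html>hello</html>"))

try:
    decode_body(b"\xff\xfe\x00binary")
except UnicodeDecodeError as exc:
    print("body is not valid text:", exc.reason)
```

Callers that may receive binary responses (images, archives) would likely want to guard the access in a similar `try`/`except`.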
content_type
Returns the response content type if available
server
Returns the server header if available
Extract helpers
Utility classes
- class crawlster.helpers.extract.Content(raw_data)
Content wrapper that provides common data extraction methods
css(pattern, get_attr=None, get_text=False)
Extracts data using a CSS selector
Returns a list of elements (as strings) with the extracted data
Parameters:
- pattern (str) – the CSS selector
- get_attr (str or None) – if given, the name of the attribute whose values are returned for the matched elements
- get_text (bool) – whether to return only the text content of the matched elements
Returns: If neither get_attr nor get_text is specified, returns a list of strings with the matches.
If get_attr is specified, returns a list with the values of the specified attribute. Elements that match the selector but do not have that attribute are ignored.
If get_text is specified, returns a list with the text of the matched elements (direct text children only; text inside nested tags is excluded).
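The three return modes of `css` can be approximated directly with BeautifulSoup. The sketch below is not crawlster's implementation: `css_sketch` is a hypothetical function, the `html.parser` backend is an arbitrary choice, and treating "direct children that are not nested tags" as "text nodes directly under the element" is an interpretation of the description above.

```python
from bs4 import BeautifulSoup


def css_sketch(html, pattern, get_attr=None, get_text=False):
    """Hypothetical approximation of the three return modes described above."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.select(pattern)
    if get_attr is not None:
        # Skip matched elements that lack the attribute.
        return [el[get_attr] for el in matches if el.has_attr(get_attr)]
    if get_text:
        # Join only the text nodes that are direct children of each element.
        return ["".join(el.find_all(string=True, recursive=False)) for el in matches]
    return [str(el) for el in matches]


html = '<p><a href="/one">One <b>bold</b></a><a>Two</a></p>'
print(css_sketch(html, "a"))                   # the two <a> elements as strings
print(css_sketch(html, "a", get_attr="href"))  # ['/one']
print(css_sketch(html, "a", get_text=True))    # ['One ', 'Two']
```

In this sketch the second link has no `href` and so is dropped from the attribute list, and the text inside `<b>` is left out of the text result.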
parsed_data
Provides access to the underlying bs4.BeautifulSoup instance
This property is provided for more advanced usage.