Integration with Scrapy

DictWrapper

class save_to_db.core.utils.dict_wrapper.DictWrapper(item)[source]

Bases: dict

This class serves as a temporary wrapper around an ItemBase instance for compatibility with Scrapy projects.

Scrapy is an open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way. [1]

[1] Citation from the https://scrapy.org site.

When parsing a page with Scrapy, you cannot yield instances of arbitrary classes, but you can yield a dict instance, which Scrapy treats as an item and passes on to the item pipelines.

Calling to_dict() and then load_dict() on an ItemBase instance is expensive, because the item is fully converted into a dict and then the dict back into an item. The dict_wrapper() method of ItemBase instead just wraps the item in a DictWrapper instance (a subclass of dict); you can later retrieve the wrapped item from the wrapper in a Scrapy pipeline.
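The wrapping idea can be sketched without the library. This is only an illustration, not the actual save_to_db implementation: SketchItem and SketchWrapper are hypothetical stand-ins for ItemBase and DictWrapper, and the only assumption is that the wrapper is a dict subclass holding a reference to the original item.

```python
class SketchItem:
    """Hypothetical stand-in for an ItemBase subclass (not the library class)."""

    def __init__(self, **fields):
        self.fields = fields

    def dict_wrapper(self):
        # No field-by-field conversion: the wrapper only keeps a reference.
        return SketchWrapper(self)


class SketchWrapper(dict):
    """Hypothetical stand-in for DictWrapper (a dict subclass)."""

    def __init__(self, item):
        super().__init__()
        self._item = item

    def get_item(self):
        # Return the very same object that was wrapped.
        return self._item


item = SketchItem(title="Example", price=10)
wrapped = item.dict_wrapper()

print(isinstance(wrapped, dict))   # the wrapper passes as a dict
print(wrapped.get_item() is item)  # the original item is recovered unchanged
```

Because nothing is copied or converted, wrapping and unwrapping cost the same regardless of how large the item is, which is the point of preferring the wrapper over a to_dict() and load_dict() round trip.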

Scrapy items and this library's items are not completely compatible, so you need separate item pipelines for this library's items. If you want to use more than one pipe in a pipeline, each pipe must accept and return DictWrapper instances.
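The accept-and-return rule can be illustrated with a small, library-free sketch. Everything here is hypothetical: WrapperSketch stands in for DictWrapper, a plain dict stands in for the wrapped ItemBase instance, and run_pipes is a simplified substitute for the way Scrapy hands each pipe the return value of the previous one.

```python
class WrapperSketch(dict):
    """Hypothetical stand-in for DictWrapper."""

    def __init__(self, item):
        super().__init__()
        self._item = item

    def get_item(self):
        return self._item


class NormalizePipeline:
    def process_item(self, item, spider):
        inner = item.get_item()
        inner["title"] = inner["title"].strip()
        return item  # pass the wrapper on, not the unwrapped item


class CollectPipeline:
    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        # Works only because the previous pipe returned the wrapper.
        self.seen.append(item.get_item()["title"])
        return item


def run_pipes(pipes, item, spider=None):
    # Simplified: each pipe receives what the previous pipe returned.
    for pipe in pipes:
        item = pipe.process_item(item, spider)
    return item


collector = CollectPipeline()
wrapped = WrapperSketch({"title": "  Example  "})
result = run_pipes([NormalizePipeline(), collector], wrapped)
```

If NormalizePipeline returned the unwrapped item instead, the call to get_item() in CollectPipeline would fail, which is why every pipe in the chain returns the wrapper.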

Here is an example of a Scrapy pipeline that saves items to a database:

from django.db import transaction
from save_to_db import Persister
from save_to_db.adapters import DjangoAdapter

persister = Persister(DjangoAdapter(adapter_settings={}))

class DbPipeline(object):

    def process_item(self, item, spider):
        # Unwrap the original ItemBase instance from the DictWrapper.
        stdb_item = item.get_item()

        with transaction.atomic():
            persister.persist(stdb_item)

        return item  # return wrapped item for the next pipe in line

Parameters: item – An instance of ItemBase to be wrapped in a dictionary.

get_item()[source]

Returns the originally wrapped item instance.

Returns: An instance of the ItemBase class.