Integration with Scrapy
DictWrapper
class save_to_db.core.utils.dict_wrapper.DictWrapper(item)
Bases: dict
This class is a temporary wrapper around an ItemBase instance, used for compatibility with Scrapy projects.

Scrapy is "an open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way." [1]

[1] Citation from the https://scrapy.org site.

When parsing a page with Scrapy you cannot yield instances of arbitrary classes, but you can yield an instance of dict, which is treated as a Scrapy item and properly sent to the item pipelines.
Calling to_dict() and then load_dict() on an ItemBase instance is expensive, because the item is fully transformed into a dict and then the dict is transformed back into an item. The dict_wrapper() method of ItemBase instead just wraps the item in a DictWrapper instance (a subclass of dict); later you can get the wrapped item back from the wrapper in a Scrapy pipeline.

Scrapy items and this library's items are not completely compatible, so you need separate item pipelines for this library's items. These pipelines must accept and return DictWrapper instances if you want to use more than one pipe in a pipeline.

Here is an example of a Scrapy pipeline that saves items to a database:
    from django.db import transaction

    from save_to_db import Persister
    from save_to_db.adapters import DjangoAdapter

    persister = Persister(DjangoAdapter(adapter_settings={}))


    class DbPipeline(object):

        def process_item(self, item, spider):
            stdb_item = item.get_item()
            with transaction.atomic():
                persister.persist(stdb_item)
            return item  # return wrapped item for the next pipe in line
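For Scrapy to call a pipeline like the one above, the pipeline has to be enabled in the project's ITEM_PIPELINES setting, which maps class paths to an integer order (lower values run first). The fragment below is a minimal sketch of such a setting. The module path myproject.pipelines and the second pipe StatsPipeline are hypothetical names used only for illustration; they are not part of this library.

```python
# settings.py of a Scrapy project (sketch; module and class names are hypothetical)

ITEM_PIPELINES = {
    # Runs first: persists the wrapped item and returns the wrapper unchanged.
    "myproject.pipelines.DbPipeline": 300,
    # Runs second: receives the same DictWrapper returned by DbPipeline.
    "myproject.pipelines.StatsPipeline": 400,
}
```

Because every pipe in this chain receives whatever the previous pipe returned, each one should return the wrapper rather than the unwrapped item.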
Parameters: item – An instance of ItemBase to be wrapped in a dictionary.
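The wrapping idea described above can be illustrated without the library itself. The sketch below is not the library's implementation: SketchItem, SketchDictWrapper and both pipeline classes are hypothetical stand-ins, and only the get_item() name is taken from the pipeline example. It shows a dict subclass that merely holds a reference to an item and is passed through two pipes in sequence.

```python
class SketchItem:
    """Hypothetical stand-in for an ItemBase instance."""

    def __init__(self, **fields):
        self.fields = fields


class SketchDictWrapper(dict):
    """A dict subclass that only carries a reference to the wrapped item.

    No field data is copied into the dict, so wrapping is cheap.
    """

    def __init__(self, item):
        super().__init__()
        self._item = item

    def get_item(self):
        # Hand back the very same object that was wrapped.
        return self._item


class ValidatePipeline:
    """First pipe: checks the wrapped item, then passes the wrapper on."""

    def process_item(self, item, spider):
        if "title" not in item.get_item().fields:
            raise ValueError("missing title")
        return item


class CollectPipeline:
    """Second pipe: stores the unwrapped item (stands in for a DB save)."""

    def __init__(self):
        self.saved = []

    def process_item(self, item, spider):
        self.saved.append(item.get_item())
        return item


original = SketchItem(title="Example")
wrapper = SketchDictWrapper(original)
collector = CollectPipeline()

# Emulate Scrapy handing the result of one pipe to the next.
result = wrapper
for pipe in (ValidatePipeline(), collector):
    result = pipe.process_item(result, spider=None)
```

Since the wrapper passes an isinstance(..., dict) check, it can travel through code that expects a dict, while each pipe recovers the original object with get_item() instead of rebuilding it from dictionary data.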