Item configuration and usage¶
ItemBase¶
-
class
save_to_db.core.item_base.
ItemBase
[source]¶ This is an abstract item class that serves as a base class for single item and bulk item classes.
Parameters: **kwargs – Values that will be saved as item data. -
as_list
()[source]¶ Returns a list of items. For single items simply returns a list containing only self, for bulk items returns list of all items.
-
dict_wrapper
()[source]¶ This method is used for integration with Scrapy projects. When parsing pages in Scrapy, you can yield an item as a
DictWrapper
(a subclass of dict) and then use the get_item()
method to get the original item. Returns: A DictWrapper
class instance.
-
get_proxy
()[source]¶ Returns an instance of
ProxyObject
for this item.
-
load_dict
(data)[source]¶ Loads data from dictionary into the item.
Parameters: data – Dictionary with item data (see to_dict()
method). Returns: The item itself.
-
pprint
(as_json=True, revert=None, *args, **kwargs)[source]¶ Pretty prints the item.
Parameters: - as_json – If True (default), a JSON representation of the item is printed; otherwise a dictionary representation is printed using the pprint function from the pprint module.
- revert – Whether to convert all values that are not JSON serializable into serializable ones.
- *args – Positional arguments that are passed to pprint.pprint or to json.dumps.
- **kwargs – Keyword arguments that are passed to pprint.pprint or to json.dumps.
-
process
()[source]¶ Converts all set values to the appropriate data types, sets default values if needed, and calls the
process()
method on all referenced items. Returns: Dictionary with single item classes as keys and lists of single item instances as values.
-
revert
()[source]¶ Converts all field values into JSON serializable values in such a way that the
process()
method converts them back to the original values.
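The round trip between revert() and process() can be illustrated with a small stand-alone sketch. It does not use the library: revert_value and process_value are hypothetical stand-ins, and the single date format is an assumption that mirrors the default 'date_formats' value.

```python
import datetime
import json

DATE_FORMAT = "%Y-%m-%d"  # assumed format, mirrors the default 'date_formats'


def revert_value(value):
    """Turn a date into a JSON serializable string; pass other values through."""
    if isinstance(value, datetime.date):
        return value.strftime(DATE_FORMAT)
    return value


def process_value(value):
    """Parse a date string back into a date; pass other values through."""
    if isinstance(value, str):
        return datetime.datetime.strptime(value, DATE_FORMAT).date()
    return value


original = {"published": datetime.date(2020, 1, 31), "views": 10}
reverted = {key: revert_value(value) for key, value in original.items()}
encoded = json.dumps(reverted)  # possible because every value is serializable
restored = {key: process_value(value) for key, value in json.loads(encoded).items()}
```

The point of the sketch is only the invariant: processing the reverted data yields the original values again.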
-
select
(key)[source]¶ Calls the
select()
function from the save_to_db.core.utils.selector
module with self as the first argument. See also the
save_to_db.core.utils.selector.select()
function.
-
Item¶
-
class
save_to_db.core.item.
Item
(**kwargs)[source]¶ Bases:
save_to_db.core.item_base.ItemBase
This class is used to collect data for a single item in order to create or update corresponding row or rows in a database.
Note
If you want to create an item class that will be used as a base for other item classes, you can use the abstract argument:
    class CustomItemBase(Item, abstract=True):
        pass  # shared stuff goes here

    class ItemA(CustomItemBase):
        pass

    class ItemB(CustomItemBase):
        pass
To configure an item class, use the following class variables:
Variables: - model_cls –
Reference to an ORM model class of one of the supported ORM libraries.
Note
You can use the
print_info()
function from the save_to_db.info
module to print to the console all supported ORM library names and their configurations for use with this library.
- batch_size –
Maximum number of models that can be pulled from the database with one query. Default: None (defined by the ORM library in use).
See also
BATCH_SIZE class constant of the
AdapterBase
class.
- defaults –
Dictionary with default values, where each value can be either a raw value or a callable that accepts an item as an argument and returns a field value.
Note
Defaults are applied in order of the number of ‘__’ separators their keys contain; keys with fewer separators are prioritized. This is because default values can themselves be other items.
Warning
When using other item instances as default values, do not forget about possible item scoping (see the
Scope
class): make sure that related items assigned by default are from the same scope. You can use the fact that items are automatically created when accessed:
    defaults = {
        # to make default values compatible with scoping,
        # use this:
        'item_1': lambda item: item['some_key'](field='value'),
        # instead of this:
        'item_2': lambda item: OtherItem(field='value'),
        # or this:
        'item_3': OtherItem(field='value'),
    }
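The ordering rule for defaults can be sketched without the library. The function name order_default_keys and the sample keys are hypothetical; only the idea of sorting keys by the number of ‘__’ separators comes from the note above.

```python
def order_default_keys(defaults):
    """Return default keys sorted by how many '__' separators they contain."""
    # sorted() is stable, so keys with the same depth keep their original order.
    return sorted(defaults, key=lambda key: key.count("__"))


defaults = {
    "author__country__name": "N/A",
    "title": "untitled",
    "author__name": "anonymous",
}
ordered = order_default_keys(defaults)
```

With this ordering, a default for a related item is in place before defaults for that item's own fields are applied.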
- creators –
List of groups of fields (sets) of an item. When values for all fields of a group are present in an item, a new record can be created in a database using that group.
See also
creators_autoconfig and autoinject_creators settings.
- creators_autoconfig –
Sets auto configuration for creators fields, possible values:
- True - auto configuration is on (adds to creators if they were manually configured);
- False - auto configuration is off;
- None - auto configuration is on if creators was not configured manually, off otherwise (Default).
See also
creators and autoinject_creators settings.
- autoinject_creators –
If True and creators_autoconfig is also set to True, then model fields that cannot be null are added to each group in the creators list. Default: True.
See also
creators and creators_autoconfig settings.
- getters –
List of groups of fields (sets) of an item. When values for all fields of a group are present in an item, then the group can be used to look for a record in a database.
Note
If allow_merge_items is True, getters are considered to contain all the unique keys during merging.
- getters_autoconfig – Same as creators_autoconfig but for getters fields.
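Both creators and getters rely on the same rule: a group can be used only when the item has values for every field in that group. A minimal stand-alone sketch of that check follows; usable_groups and the sample data are hypothetical, and the item is represented by a plain dict.

```python
def usable_groups(groups, data):
    """Return the groups whose fields are all present in the item data."""
    return [group for group in groups if group.issubset(data)]


getters = [{"id"}, {"isbn"}, {"title", "author"}]
data = {"title": "Dune", "author": "Herbert", "year": 1965}
matching = usable_groups(getters, data)
```

Here only the {"title", "author"} group could be used to look up a record, because neither "id" nor "isbn" is present in the data.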
- nullables –
Set of field names whose value must be set to null, or to an empty list (in the case of an x-to-many relationship), when saving if the value is not present in the item.
Note
If you set this value to True, all fields are listed automatically.
- remove_null_fields –
Set of field names that must be removed from an item upon processing if their value is None (or an empty list in the case of an x-to-many relationship).
Warning
If a field name is listed in nullables, then it will not be deleted.
- remove_null_fields_autoconfig –
Sets auto configuration for remove_null_fields fields, possible values:
- True - auto configuration is on (adds to remove_null_fields if it was manually configured);
- False - auto configuration is off;
- None - auto configuration is on if remove_null_fields was not configured manually, off otherwise (Default).
Note
Only fields that cannot be null are automatically added to remove_null_fields.
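The interaction between remove_null_fields and nullables can be sketched as follows. This is an illustration under assumptions, not the library's code: drop_null_fields is a hypothetical helper and the item is represented by a plain dict.

```python
def drop_null_fields(data, remove_null_fields, nullables=frozenset()):
    """Drop None or empty-list values for listed fields, unless the field is nullable."""
    result = {}
    for key, value in data.items():
        is_null = value is None or value == []
        if is_null and key in remove_null_fields and key not in nullables:
            continue  # the field is removed from the item
        result[key] = value
    return result


data = {"title": None, "subtitle": None, "tags": [], "price": 0}
cleaned = drop_null_fields(
    data,
    remove_null_fields={"title", "subtitle", "tags"},
    nullables={"subtitle"},
)
```

The "subtitle" field survives because it is listed in nullables, and "price" survives because 0 is neither None nor an empty list.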
- relations –
Dictionary describing foreign keys (relations). Example:
    relations = {
        'one': {
            'item_cls': ItemClassOne,
            'replace_x_to_many': False,  # default value
        },
        'two': ItemClassTwo,  # if we are only interested in the item class
    }
Keys are the fields that reference other items (foreign key columns in the database), and values are dictionaries with the following keys and values:
- 'item_cls' - item class used to create the related item;
- 'replace_x_to_many' - Only applicable to x-to-many relationships. If True, then old values for the field are removed from the database when an item is saved. Default: False.
- aliases – A dict with field aliases. Dictionary keys are used as aliases, and values are used as actual item field names. Aliases and field names can contain double underscores (“__”).
- conversions –
A dictionary that describes how string values must be converted to proper data types. Default conversions value:
    conversions = {
        'decimal_separator': '.',
        'boolean_true_strings': ('true', 'yes', 'on', '1', '+',),
        'boolean_false_strings': ('false', 'no', 'off', '0', '-',),
        # Format values for dates and times used as arguments for
        # `datetime.datetime` function
        'date_formats': ('%Y-%m-%d',),  # can be multiple formats
        'time_formats': ('%H:%M:%S',),
        'datetime_formats': ('%Y-%m-%d %H:%M:%S',),
        # Function conversions work only on values that are not of
        # string type and not already of the required type.
        # They have priority over date and time formats.
        # If they return a string, it will be processed using the date
        # and time formats.
        'date_func': __some_conversion_function__,
        'time_func': __some_conversion_function__,
        'datetime_func': __some_conversion_function__,
        # example timezone: `datetime.timezone.utc`
        'default_timezone': __some_timezone__,  # default: `None`
    }
If a value is absent from the conversions dictionary, the default value is used.
Note
For “date_formats”, “time_formats” and “datetime_formats”, when only a single format is needed, you can use that value directly instead of a tuple (or a list).
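How such a conversions dictionary might drive string parsing can be sketched in plain Python. The helpers as_tuple, to_bool and to_date are hypothetical, and trimming and lowercasing the input before comparison is an assumption rather than documented behavior.

```python
import datetime

conversions = {
    "boolean_true_strings": ("true", "yes", "on", "1", "+"),
    "boolean_false_strings": ("false", "no", "off", "0", "-"),
    "date_formats": "%Y-%m-%d",  # a single value used instead of a tuple
}


def as_tuple(value):
    """Accept either one format string or a tuple/list of format strings."""
    return (value,) if isinstance(value, str) else tuple(value)


def to_bool(text, conversions):
    lowered = text.strip().lower()  # assumption: comparison ignores case
    if lowered in conversions["boolean_true_strings"]:
        return True
    if lowered in conversions["boolean_false_strings"]:
        return False
    raise ValueError(f"not a boolean string: {text!r}")


def to_date(text, conversions):
    # Try each configured format in turn and use the first one that parses.
    for fmt in as_tuple(conversions["date_formats"]):
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"no date format matches: {text!r}")
```

For example, to_bool("Yes", conversions) gives True and to_date("2021-03-04", conversions) gives datetime.date(2021, 3, 4).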
- allow_multi_update – If True then an instance of this class can update multiple models. Default: False.
- allow_merge_items – If True then all items that can potentially pull the same model from a database when persisting are merged into one item. Default: False.
- update_only_mode –
If True then a new model in a database will not be created if it does not exist. Default: False.
Note
Can be overwritten on an instance.
- get_only_mode –
If True then the item data is used only to pull models from database in order to load other models through relationships. Default: False.
Note
Can be overwritten on an instance.
- norewrite_fields –
Dictionary with fields that cannot be changed. Example:
    from save_to_db import RelationType

    norewrite_fields = {
        # always can rewrite
        'field_1': None,
        # rewrite value if it equals to `None`
        'field_2': True,
        # never rewrite, set only when model is created
        'field_3': False,
        # you can use tuples as keys
        ('field_4', 'field_5'): None,
        # referring to all many-to-many fields
        # (relation type can be used as a key)
        RelationType.MANY_TO_MANY: None,
        True: False,  # `True` for all the other fields
    }
Note
- You can use a tuple as a key to set many fields at once.
- You can use the True key for all fields not listed, including relations.
- You can use values from the
RelationType
enumeration for relation fields.
Note
If the norewrite value is True for a field (a None value can be rewritten), then for many-to-x relationships newly created related models can still be added to the “many” side, even if it already existed.
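The meaning of the three rule values (None, True, False) can be sketched in plain Python. Both helpers are hypothetical, relation type keys are left out, and the lookup precedence (exact key, then tuple key, then the True fallback) is an assumption made for illustration.

```python
def find_rule(norewrite_fields, field):
    """Look up the rule for a field: exact key, then tuple key, then the True key."""
    if field in norewrite_fields:
        return norewrite_fields[field]
    for key, rule in norewrite_fields.items():
        if isinstance(key, tuple) and field in key:
            return rule
    return norewrite_fields.get(True)  # None (always rewrite) if there is no fallback


def can_rewrite(rule, current_value):
    """Apply a rule: None always rewrites, True only rewrites None, False never does."""
    if rule is None:
        return True
    if rule is True:
        return current_value is None
    return False


rules = {
    "title": None,
    "slug": False,
    ("created", "source"): True,
    True: False,  # all other fields
}
```

With these rules, "created" can be filled in while it is still None but not changed afterwards, and any field that is not listed is never rewritten.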
- fast_insert – If True then models will not be pulled from database when persisting to database, new models will be created without trying to update. Default: False.
- deleter_selectors – Set of item field names that are going to be collected from processed (created and updated) ORM models and used as selectors. See deleter_keepers for more details.
- deleter_keepers –
Set of item field names that are going to be collected from processed (created and updated) ORM models and used as keepers. Default value: None if deleter_selectors is also None, set of primary key fields otherwise.
After you finish working with all items, you can call the
execute_deleter()
method or the execute_scope_deleter()
method of the Persister
class to delete all models whose field values match those collected using deleter_selectors but do not match those collected using deleter_keepers.
See also
deleter_execute_on_persist setting.
See also
ModelDeleter
class. An instance of the class is created and stored in the item_cls.metadata['model_deleter'] class variable (where item_cls is a subclass of this class).
- deleter_execute_on_persist – If True, then the model deleter is executed upon item persistence. Default: False.
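The selector and keeper logic can be sketched with plain dictionaries standing in for database rows. The function rows_to_delete and the sample data are hypothetical; the sketch only illustrates the rule "matches collected selector values, but not collected keeper values".

```python
def rows_to_delete(rows, processed, selectors, keepers):
    """Pick rows that match collected selector values but not collected keeper values."""

    def project(row, fields):
        return tuple(row[field] for field in fields)

    # Values collected from the models that were created or updated.
    selected = {project(row, selectors) for row in processed}
    kept = {project(row, keepers) for row in processed}
    return [
        row
        for row in rows
        if project(row, selectors) in selected and project(row, keepers) not in kept
    ]


rows = [
    {"id": 1, "shop": "a"},
    {"id": 2, "shop": "a"},
    {"id": 3, "shop": "b"},
]
processed = [{"id": 1, "shop": "a"}]  # only this row was seen in the current run
stale = rows_to_delete(rows, processed, selectors=["shop"], keepers=["id"])
```

Row 2 belongs to a shop that was processed but was not itself seen, so it is selected for deletion; row 3 is untouched because its shop was never processed.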
- unref_x_to_many –
A dictionary containing x-to-many relationship fields as keys, and model deleter settings as values.
Example:
    unref_x_to_many = {
        'field_1_x_first': {
            'selectors': ['field_1', 'field_2'],
            # default for keepers: set of primary key fields
            'keepers': ['field_a', 'field_b'],
        },
        # if you want to use default keepers (primary key fields),
        # you can use a shortcut
        'field_1_x_second': ['field_x', 'field_y'],
    }
Upon saving, models referenced by these fields are unreferenced (removed from the relation) if they can be selected from the database by selectors but cannot be selected using keepers.
Selector and keeper values are taken from the items being added to relations.
See also
deleter_selectors and deleter_keepers.
See also
ModelDeleter
class. Instances of the class are created and stored in the item_cls.metadata['model_unrefs'] class dictionary (where item_cls is a subclass of this class). Dictionary keys are x-to-many field names, and dictionary values are instances of the ModelDeleter
class.
-
complete_setup
()¶ This method validates manual configuration of the item and automatically completes configuration based on available data.
-
classmethod
Bulk
(*args, **kwargs)[source]¶ Creates a
BulkItem
instance for this item class. Parameters: - *args – Positional arguments that are passed to the bulk item constructor.
- **kwargs – Keyword arguments that are passed to bulk item constructor.
Returns: BulkItem
instance for this item class.
-
after_model_save
(model)[source]¶ A hook method that is called after updating matching ORM model with item data and saving the model to a database.
Parameters: model – Model that was updated.
-
after_process
()[source]¶ A hook method that is called immediately after all fields have been processed.
-
as_list
()[source]¶ Returns a list of items. For single items simply returns a list containing only self, for bulk items returns list of all items.
-
before_model_update
(model)[source]¶ A hook method that is called before updating matching model with item data.
Parameters: model – Model that was pulled from database or freshly created (in case there were no matching models).
-
classmethod
get_item_cls
()[source]¶ Returns class reference of a single item class that this item works with.
-
load_dict
(data)[source]¶ Loads data from dictionary into the item.
Parameters: data – Dictionary with item data (see to_dict()
method). Returns: The item itself.
-
process
()[source]¶ Converts all set values to the appropriate data types, sets default values if needed, and calls the
process()
method on all referenced items. Returns: Dictionary with single item classes as keys and lists of single item instances as values.
-
classmethod
process_field
(key, value, aliased=True)[source]¶ Converts value to the appropriate data type for the given key.
Parameters: - key – Key by which the proper value type can be determined. This value can contain double underscores to reference relations.
- value – Value to process.
- aliased – If it’s True then key contains field aliases.
Returns: Value converted to proper type.
-
resolve_model
(models)[source]¶ This method is called during persisting when two or more models match the same item and the allow_multi_update setting is False. It must return a single model to be used instead or raise a
save_to_db.exceptions.item_persist.MultipleModelsMatch
exception.
Warning
If the allow_multi_update option is True, this method is ignored.
Parameters: models – List of models that this item matches. Returns: Single model to be updated.
-
revert
()[source]¶ Converts all field values into JSON serializable values in such a way that the
process()
method converts them back to the original values.
-
classmethod
revert_field
(key, value, aliased=True)[source]¶ Converts field into JSON serializable field in such a way that
process_field()
method converts it back to the original value. Parameters: - key – Key by which the proper value type can be determined. This value can contain double underscores to reference relations.
- value – Value to process.
- aliased – If it’s True then key contains field aliases.
Returns: Value converted to proper type.
ItemMetaclass¶
BulkItem¶
-
class
save_to_db.core.bulk_item.
BulkItem
(item_cls, **kwargs)[source]¶ Bases:
save_to_db.core.item_base.ItemBase
This class deals with instances of
Item
in chunks. It can create or update multiple database rows using a single query, i.e. it can persist multiple items at once.
Note
You can get values from a bulk item like this:
- bulk[number] returns the item at the given index number;
- bulk[string] returns the default value for the key string;
- bulk[slice] (e.g. bulk[1:2]) returns a Python list containing the specified items.
Note
Defaults are applied in order of the number of ‘__’ separators their keys contain; keys with fewer separators are prioritized. This is because default values can themselves be other items.
Parameters: - item_cls – A subclass of
Item
that this class deals with.
- **kwargs – Values that will be saved as default item data.
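The three kinds of indexing described in the note above can be sketched with a small stand-in class. BulkSketch is hypothetical and is not the library's BulkItem; it only shows one way to dispatch on the key type.

```python
class BulkSketch:
    """Toy container: integer and slice keys address items, string keys address defaults."""

    def __init__(self, **defaults):
        self.items = []
        self.defaults = dict(defaults)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self.defaults[key]  # default value for the key
        return self.items[key]  # int returns one item, slice returns a list


bulk = BulkSketch(country="NL")
bulk.items.extend([{"name": "a"}, {"name": "b"}, {"name": "c"}])
```

Here bulk[0] is the first item, bulk["country"] is the default value "NL", and bulk[1:2] is a list with the second item.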
-
add
(*items)[source]¶ Adds items to the bulk.
Parameters: *items – List of instances of ItemBase
class to be added to the bulk.
-
add_at_index
(index, *items)[source]¶ Adds items to the bulk at the index position. If an item is already in the bulk, nothing happens.
Parameters: - index – Starting position in the bulk list.
- *items – List of instances of
ItemBase
class to be added to the bulk.
-
as_list
()[source]¶ Returns a list of items. For single items simply returns a list containing only self, for bulk items returns list of all items.
-
gen
(*args, **kwargs)[source]¶ Creates an
Item
instance and adds it to the bulk. Parameters: - *args – Positional arguments that are passed to the item constructor.
- **kwargs – Keyword arguments that are passed to the item constructor.
Returns: Item
instance.
-
load_dict
(data)[source]¶ Loads data from dictionary into the item.
Parameters: data – Dictionary with item data (see to_dict()
method). Returns: The item itself.
-
model_cls
¶ Property that returns model_cls attribute of the item_cls class.
-
process
()[source]¶ Converts all set values to the appropriate data types, sets default values if needed, and calls the
process()
method on all referenced items. Returns: Dictionary with single item classes as keys and lists of single item instances as values.
-
remove
(*items)[source]¶ Removes items from the bulk.
Parameters: *items – List of instances of ItemBase
class to be removed from the bulk.
Persister¶
-
class
save_to_db.core.persister.
Persister
(db_adapter, autocommit=False)[source]¶ This class is used to persist items to a database, or to save them to and load them from files.
Parameters: - db_adapter – Instance of a subclass of
AdapterBase
used to deal with items and ORM models.
- autocommit – If True, commits changes to the database each time an item is persisted.
-
create_merge_policy
(policy, defaults=None)[source]¶ Creates an instance of the
MergePolicy
class. Parameters: - policy – policy argument for the
MergePolicy
class constructor.
- defaults – defaults argument for the
MergePolicy
class constructor.
Returns: An instance of the
MergePolicy
class.
-
dump
(item, fp)[source]¶ Saves an item into a file.
Note
This method also saves the size of the encoded item, so it is possible to save multiple items one after another into the same file and load them later.
Parameters: - item – An item to be saved.
- fp – File-like object to save item into.
-
dumps
(item)[source]¶ Converts an item into bytes.
Parameters: item – An instance of ItemBase
class. Returns: Encoded item as bytes.
-
execute_deleter
(item_cls, commit=None)[source]¶ Deletes models according to deleter_selectors and deleter_keepers. See their description in the
Item
configuration. Parameters: - item_cls –
Item
instance for which deletion must be executed.
- commit – If True, commits changes to the database. If None, then the autocommit value (initially set at creation time) is used.
-
execute_scope_deleter
(collection_id, commit=None)[source]¶ Calls the
execute_deleter()
method for all item classes in the scope. Parameters: - collection_id – An ID of an
ItemCollection
(Scope
is a subclass of it).
- commit – If True, commits changes to the database. If None, then the autocommit value (initially set at creation time) is used.
-
load
(fp)[source]¶ Loads and decodes one item from a file-like object.
Parameters: fp – File-like object to read from. Returns: One item read from fp, or None if there is no more data to read.
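Storing the size of each encoded item is what allows several items to share one file. A minimal sketch of that idea, assuming a 4-byte length prefix and JSON payloads (neither is confirmed to be the format the library uses), might look like this; dump_record and load_record are hypothetical names.

```python
import io
import json
import struct

HEADER = struct.Struct(">I")  # assumed layout: 4-byte big-endian payload size


def dump_record(data, fp):
    """Write the payload size followed by the payload itself."""
    payload = json.dumps(data).encode("utf-8")
    fp.write(HEADER.pack(len(payload)))
    fp.write(payload)


def load_record(fp):
    """Read one record, or return None when nothing is left to read."""
    header = fp.read(HEADER.size)
    if not header:
        return None
    (size,) = HEADER.unpack(header)
    return json.loads(fp.read(size).decode("utf-8"))


buffer = io.BytesIO()
dump_record({"title": "first"}, buffer)
dump_record({"title": "second"}, buffer)
buffer.seek(0)
loaded = [load_record(buffer), load_record(buffer), load_record(buffer)]
```

The third read finds no data and yields None, which mirrors how load() signals that there is nothing more to read.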
-
loads
(data)[source]¶ Decodes bytes data into an instance of
ItemBase
. Parameters: data – Encoded item as bytes. Returns: An instance of ItemBase
.
-
merge_models
(models, commit=None, merge_policy=None, ignore_fields=None)[source]¶ Calls the
merge_models()
method of the AdapterBase
internal instance. Parameters: commit – If True, commits changes to the database. If None, then the autocommit value (initially set at creation time) is used. Returns: The first model from models, into which all other models are merged.
-
persist
(item, commit=None)[source]¶ Saves item data into a database by creating or updating the appropriate database records.
Parameters: - item – An instance of
ItemBase
to persist.
- commit – If True, commits changes to the database. If None, then the autocommit value (initially set at creation time) is used.
Returns: Item list and corresponding list of ORM model lists.
ItemCollection¶
-
class
save_to_db.core.utils.item_collection.
ItemCollection
(collection_id)[source]¶ Class for managing collections of
ItemBase
subclasses. Variables: collection – Set of all item classes in the collection. -
add
(*item_classes)[source]¶ Adds item classes to the collection.
Parameters: *item_classes – List of item classes to add.
-
get_all_item_classes
()[source]¶ Returns all item classes in the collection.
Returns: All item classes in the collection.
-
Scope¶
-
class
save_to_db.core.scope.
Scope
(fixes, collection_id)[source]¶ Bases:
save_to_db.core.utils.item_collection.ItemCollection
Class for scoping
Item
classes. Parameters: - fixes – A dictionary whose keys are item classes (or None for all classes not in the dictionary) and whose values are class attributes to be replaced.
- collection_id –
Scope ID; it can be a value of any type as long as it can be used as a dictionary key.
See also
get_collection_id()
method of the ItemBase
class.
Example usage:
    class TestItem(Item):
        model_cls = SomeModel

    scope = Scope(
        {
            TestItem: {
                'conversions': {
                    'date_formats': '%m/%d/%Y',
                },
            },
            # for item classes not listed
            None: {
                'conversions': {
                    'date_formats': '%d.%m.%Y',
                },
            },
            # for all item classes
            True: {
                'conversions': {
                    'datetime_formats': '%Y-%m-%d %H:%M:%S.%f',
                },
            },
        },
        collection_id="some_collection",  # arbitrary unique value
    )

    ScopedTestItem = scope[TestItem]  # or `scope.get(TestItem)`
When an item is scoped, other items that use the original item in relations are also scoped and their relation data is fixed.
-
get
(*item_classes)[source]¶ Accepts non-scoped items and returns the corresponding scoped items, or the original items for items not present in the scope.
Parameters: item_classes – Items for which scoped versions are going to be returned. Returns: List of scoped items. If the scope does not have a corresponding scoped version of an item class, the original class is used.
ItemClsManager¶
-
class
save_to_db.core.item_cls_manager.
ItemClsManager
(autogenerate=False)[source]¶ Bases:
save_to_db.core.utils.item_collection.ItemCollection
This class manages all known item classes (see
Item
base class). Normally only one instance of this class is used. Parameters: autogenerate – If True, then getting a missing item by model class will generate a new item class with all parameters auto configured. -
autocomplete_item_classes
()[source]¶ If self.autogenerate is True, automatically generates missing item classes for the model classes that already registered item classes need via relations.
-
get_by_model_cls
(model_cls, collection_id=None)[source]¶ Gets item classes using ORM model class.
Parameters: - model_cls – An ORM model class that an item is using to persist data.
- collection_id – Scope ID of the item class.
Returns: List of item classes.
-
get_by_path
(path, relative_to=None)[source]¶ Gets item classes using path (module path and class name separated by dot).
Parameters: - path – Item path. Multiple initial dots are allowed, e.g. '...some_module.SomeClass'.
- relative_to – If the path argument is relative, this argument contains the path it is relative to.
Returns: List of item classes.
-
-
save_to_db.core.item_cls_manager.
item_cls_manager
= <ItemClsManager()>¶ Normally only this instance of
ItemClsManager
is used.
selector¶
-
save_to_db.core.utils.selector.
select
(item, key)[source]¶ Returns all values under the key from item. If item contains references to an x-to-many field, the items in that field are traversed. Example:
    item_one = ItemOne()
    # 'two_1_x' is a one-to-many relation
    item_two = item_one['two_1_x'].gen()
    # 'one_x_x' is a many-to-many relation
    item_two['one_x_x'].gen(integer_value=1)
    item_two['one_x_x'].gen(integer_value=2)
    result = item_one.select('two_1_x__one_x_x__integer_value')
    print(result)  # outputs: `[1, 2]`
Returns: List of collected values.
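The traversal can be sketched over plain dictionaries and lists, which stand in here for items and bulk items. The function select_values is hypothetical and is not the library's select(); it only illustrates how a key with ‘__’ separators fans out across x-to-many fields.

```python
def select_values(data, key):
    """Collect every value reachable by following the '__'-separated key."""
    current = [data]
    for part in key.split("__"):
        collected = []
        for value in current:
            if part not in value:
                continue  # nothing stored under this part of the key
            child = value[part]
            if isinstance(child, list):
                collected.extend(child)  # x-to-many: visit every related entry
            else:
                collected.append(child)
        current = collected
    return current


item_one = {
    "two_1_x": [
        {"one_x_x": [{"integer_value": 1}, {"integer_value": 2}]},
    ],
}
result = select_values(item_one, "two_1_x__one_x_x__integer_value")
```

In this sketch result is [1, 2], matching the output shown in the example above.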
ModelDeleter¶
-
class
save_to_db.core.model_deleter.
ModelDeleter
(model_cls, selector_fields, keeper_fields)[source]¶ This class keeps track of new items and is able to find those items in the database that are not new in order to delete or unreference them (from x-to-many relationships).
Parameters: - model_cls – ORM model class used by items that instance of this class deals with.
- selector_fields – List of fields that are collected from items. The values are used later to pull models from database.
- keeper_fields – Similar to selector_fields. If a model can be pulled from database using values from keeper_fields, then this model is ignored by this class.
-
collect_model
(model)[source]¶ Saves selector_fields from model.
Parameters: model – ORM Model to collect field values from.
-
execute_delete
(db_adapter)[source]¶ Deletes rows from database that can be selected (pulled from database) using values from selectors but excluding those that can be selected by keepers.
Parameters: db_adapter – Database adapter.
-
execute_unref
(parent_model, fkey, db_adapter)[source]¶ Removes references in the database from the x-to-many relationship from parent_model to child models through fkey if the referenced models can be selected using selectors values, excluding those that can be selected by keepers.
Parameters: - parent_model – Parent ORM model.
- fkey – Foreign key field of parent_model.
- db_adapter – Database adapter.
ProxyObject¶
-
class
save_to_db.core.utils.proxy_object.
ProxyObject
(item)[source]¶ Bases:
object
ProxyObject allows getting the contents of instances of
ItemBase
using dot notation. Example:
    # using item directly
    item_a = ItemA()
    item_a["field_x"] = 10
    item_a["item_b__field_z"] = 20

    # using proxy object and dot notation
    proxy_a = item_a.get_proxy()  # returns instance of `ProxyObject`
    proxy_a.field_x = 20
    proxy_a.item_b__field_z = 20
    proxy_a.item_b.field_z = 30  # overwrites previous value
    proxy_a["field_x"] = 20  # using as a dictionary is also possible

    # for reference (all print `True`)
    print(proxy_a() is item_a)  # call proxy itself to get an item
    print(proxy_a.item_b() is item_a["item_b"])  # proxy returns proxy
    print(item_a.get_proxy() is proxy_a)  # always same proxy
    print(item_a["item_b"].get_proxy() is proxy_a.item_b)
Note
You can call proxy object itself to get the instance of item.
See also
get_proxy()
method of ItemBase
. Parameters: item – An instance of ItemBase
to be proxied.
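The dot-notation idea can be sketched with a small proxy over nested dictionaries. DictProxy is a hypothetical stand-in, not the library's ProxyObject: unlike the real class, it creates a new proxy on each attribute access and does not create missing items automatically.

```python
class DictProxy:
    """Toy proxy: attribute access reads and writes keys of the wrapped dict."""

    def __init__(self, data):
        # Bypass our own __setattr__ so the wrapped dict lives on the instance.
        object.__setattr__(self, "_data", data)

    def __getattr__(self, name):
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Nested dicts are wrapped too, so dots can be chained.
        return DictProxy(value) if isinstance(value, dict) else value

    def __setattr__(self, name, value):
        self._data[name] = value

    def __call__(self):
        return self._data  # calling the proxy returns the wrapped object


data = {"field_x": 10, "item_b": {"field_z": 20}}
proxy = DictProxy(data)
proxy.field_x = 20
proxy.item_b.field_z = 30
```

After these assignments the wrapped dictionary holds field_x == 20 and item_b["field_z"] == 30, and proxy() returns the original dictionary.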