:py:mod:`bw2data.backends.base`
===============================

.. py:module:: bw2data.backends.base


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   bw2data.backends.base.Database


Attributes
~~~~~~~~~~

.. autoapisummary::

   bw2data.backends.base.SQLiteBackend
   bw2data.backends.base._VALID_KEYS
   bw2data.backends.base.monitor


.. py:class:: Database(name=None, *args, **kwargs)

   Bases: :py:obj:`peewee.Model`

   .. autoapi-inheritance-diagram:: bw2data.backends.base.Database
      :parts: 1
      :private-bases:

   A base class for SQLite backends.

   Subclasses must support at least the following calls:

   * ``load()``
   * ``write(data)``

   In addition, they should specify their backend with the ``backend`` attribute (a unicode string).

   The base class provides the following methods, which should not normally need to be modified:

   * ``rename``
   * ``copy``
   * ``find_dependents``
   * ``random``
   * ``process``

   For new classes to be recognized by the ``DatabaseChooser``, they need to be registered with the ``config`` object, e.g.:

   .. code-block:: python

      config.backends['backend type string'] = BackendClass

   Instantiation does not load any data. If this database is not yet registered in the metadata store, a warning is written to ``stdout``.

   The data schema for databases in voluptuous is:

   .. code-block:: python

      exchange = {
          Required("input"): valid_tuple,
          Required("type"): basestring,
      }
      exchange.update(uncertainty_dict)
      lci_dataset = {
          Optional("categories"): Any(list, tuple),
          Optional("location"): object,
          Optional("unit"): basestring,
          Optional("name"): basestring,
          Optional("type"): basestring,
          Optional("exchanges"): [exchange]
      }
      db_validator = Schema({valid_tuple: lci_dataset}, extra=True)

   where:

   * ``valid_tuple`` is a dataset identifier, like ``("ecoinvent", "super strong steel")``
   * ``uncertainty_fields`` are fields from an uncertainty dictionary
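   As an illustration of this schema, a minimal valid ``data`` dictionary might look as follows (a sketch; the database name, codes, and values are hypothetical):

   .. code-block:: python

      data = {
          ("example_db", "steel-code"): {
              "name": "super strong steel",
              "unit": "kilogram",
              "location": "GLO",
              "exchanges": [
                  # Production of the dataset's own reference product
                  {"input": ("example_db", "steel-code"), "amount": 1.0, "type": "production"},
                  # A technosphere input from another hypothetical dataset
                  {"input": ("example_db", "electricity-code"), "amount": 0.5, "type": "technosphere"},
              ],
          },
      }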
   Processing a Database produces two parameter arrays: one for the exchanges, which make up the technosphere and biosphere matrices, and a geomapping array which links activities to locations.

   :param name: Name of the database to manage.
   :type name: str

   .. py:property:: _metadata

   .. py:property:: filename

      Remove filesystem-unsafe characters and perform unicode normalization on ``self.name`` using :func:`.filesystem.safe_filename`.

   .. py:property:: metadata

   .. py:property:: node_class

   .. py:property:: registered

   .. py:attribute:: backend

   .. py:attribute:: depends

   .. py:attribute:: dirty

   .. py:attribute:: extra

   .. py:attribute:: filters

   .. py:attribute:: geocollections

   .. py:attribute:: name

   .. py:attribute:: order_by

   .. py:attribute:: searchable

   .. py:attribute:: validator

   .. py:method:: _add_indices()

   .. py:method:: _drop_indices()

   .. py:method:: _efficient_write_dataset(index, key, ds, exchanges, activities)

   .. py:method:: _efficient_write_many_data(data, indices=True)

   .. py:method:: _get_filters()

   .. py:method:: _get_order_by()

   .. py:method:: _get_queryset(random=False, filters=True)

   .. py:method:: _iotable_edges_to_dataframe() -> pandas.DataFrame

      Return a pandas DataFrame with all database exchanges. DataFrame columns are:

      * ``target_id``: int
      * ``target_database``: str
      * ``target_code``: str
      * ``target_name``: Optional[str]
      * ``target_reference_product``: Optional[str]
      * ``target_location``: Optional[str]
      * ``target_unit``: Optional[str]
      * ``target_type``: Optional[str]
      * ``source_id``: int
      * ``source_database``: str
      * ``source_code``: str
      * ``source_name``: Optional[str]
      * ``source_product``: Optional[str] (note the different label)
      * ``source_location``: Optional[str]
      * ``source_unit``: Optional[str]
      * ``source_categories``: Optional[str] (tuple concatenated with ``::``, as in ``bw2io``)
      * ``edge_amount``: float
      * ``edge_type``: str

      The target is the node consuming the edge; the source is the node or flow being consumed. The terms target and source were chosen because they also work well for biosphere edges.

      As IO Tables are normally quite large, the DataFrame is built directly from NumPy arrays, and therefore special formatters are not supported in this function.

      Returns a pandas ``DataFrame``.

   .. py:method:: _set_filters(filters)

   .. py:method:: _set_order_by(field)

   .. py:method:: _sqlite_edges_to_dataframe(categorical: bool = True, formatters: Optional[List[Callable]] = None) -> pandas.DataFrame

   .. py:method:: add_geomappings(data)

   .. py:method:: backup()

      Save a backup to the ``backups`` folder.

      :returns: File path of the backup.

   .. py:method:: clean_all()
      :classmethod:

   .. py:method:: copy(name)

      Make a copy of the database. Internal links within the database will be updated to match the new database name, i.e. ``("old name", "some id")`` will be converted to ``("new name", "some id")`` for all exchanges.

      :param name: Name of the new database. Must not already exist.
      :type name: str

   .. py:method:: datapackage()

   .. py:method:: delete_data(keep_params=False, warn=True)

      Delete all data from the SQLite database and the Whoosh search index.

   .. py:method:: delete_duplicate_exchanges(fields=['amount', 'type'])

      Delete exchanges which are exact duplicates. Useful if you accidentally ran your input data notebook twice.

      To determine uniqueness, we look at the exchange input and output nodes, and at the exchange values for the fields in ``fields``.

   .. py:method:: delete_instance()

   .. py:method:: deregister()

      Legacy method to remove an object from the metadata store. Does not delete any data.

   .. py:method:: dirpath_processed()

   .. py:method:: edges_to_dataframe(categorical: bool = True, formatters: Optional[List[Callable]] = None) -> pandas.DataFrame

      Return a pandas DataFrame with all database exchanges. Standard DataFrame columns are:

      * ``target_id``: int
      * ``target_database``: str
      * ``target_code``: str
      * ``target_name``: Optional[str]
      * ``target_reference_product``: Optional[str]
      * ``target_location``: Optional[str]
      * ``target_unit``: Optional[str]
      * ``target_type``: Optional[str]
      * ``source_id``: int
      * ``source_database``: str
      * ``source_code``: str
      * ``source_name``: Optional[str]
      * ``source_product``: Optional[str] (note the different label)
      * ``source_location``: Optional[str]
      * ``source_unit``: Optional[str]
      * ``source_categories``: Optional[str] (tuple concatenated with ``::``, as in ``bw2io``)
      * ``edge_amount``: float
      * ``edge_type``: str

      The target is the node consuming the edge; the source is the node or flow being consumed. The terms target and source were chosen because they also work well for biosphere edges.

      ``categorical`` will turn each string column into a `pandas Categorical Series <https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html>`__. This takes 1-2 extra seconds, but saves around 50% of the memory consumption.

      ``formatters`` is a list of callables that modify each row. These functions must take the following keyword arguments, and use the `Wurst internal data format <https://wurst.readthedocs.io/#internal-data-format>`__:

      * ``node``: The target node, as a dict
      * ``edge``: The edge, including attributes of the source node
      * ``row``: The current row dict being modified.

      The functions in ``formatters`` don't need to return anything; they modify ``row`` in place.

      Returns a pandas ``DataFrame``.
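      For example, a formatter that post-processes one of the standard columns might look like the following (a minimal sketch; ``upper_location`` is a hypothetical helper, and only the documented ``row`` columns are assumed):

      .. code-block:: python

         def upper_location(node, edge, row):
             # Formatters are called with the target node, the edge, and the
             # row dict under construction; they modify ``row`` in place.
             if row.get("target_location"):
                 row["target_location"] = row["target_location"].upper()

         # Hypothetical usage:
         # df = Database("example_db").edges_to_dataframe(formatters=[upper_location])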
   .. py:method:: exchange_data_iterator(sql, dependents, flip=False)

      Iterate over exchanges and format for ``bw_processing`` arrays.

      ``dependents`` is a set of dependent database names.

      ``flip`` means flip the numeric sign; see the ``bw_processing`` docs.

      Uses raw sqlite3 to retrieve data for a ~2x speed boost.

   .. py:method:: exists(name)
      :classmethod:

   .. py:method:: filename_processed()

   .. py:method:: filepath_intermediate()

   .. py:method:: filepath_processed(clean=True)

   .. py:method:: find_dependents(data=None, ignore=None)

      Get a sorted list of direct dependent databases (databases linked from exchanges).

      :param data: Inventory data
      :type data: dict, optional
      :param ignore: List of database names to ignore
      :type ignore: list
      :returns: List of database names

   .. py:method:: find_graph_dependents()

      Recursively get a list of all dependent databases.

      :returns: A set of database names

   .. py:method:: get_node(code=None, **kwargs)

   .. py:method:: graph_technosphere(filename=None, **kwargs)

   .. py:method:: load(*args, **kwargs)

   .. py:method:: make_searchable(reset=False)

   .. py:method:: make_unsearchable()

   .. py:method:: new_activity(code, **kwargs)

   .. py:method:: new_node(code=None, **kwargs)

   .. py:method:: nodes_to_dataframe(columns: Optional[List[str]] = None, return_sorted: bool = True) -> pandas.DataFrame

      Return a pandas DataFrame with all database nodes. Uses the provided node attributes by default, such as name, unit, and location.

      By default, returns a DataFrame sorted by name, reference product, location, and unit. Set ``return_sorted`` to ``False`` to skip sorting.

      Specify ``columns`` to get custom columns. For further customization you will need to write your own function; the possibilities are endless.

      Returns a pandas ``DataFrame``.

   .. py:method:: process(csv=False)

      Create structured arrays for the technosphere and biosphere matrices.

      Uses ``bw_processing`` for array creation and metadata serialization.

      Also creates a ``geomapping`` array, linking activities to locations. Used for regionalized calculations.

      Uses a raw SQLite3 cursor instead of Peewee for roughly a 2x speed advantage.

   .. py:method:: query(*queries)

      Search through the database.

   .. py:method:: random(filters=True, true_random=False)

      True random requires loading and sorting data in SQLite, and can be resource-intensive.

   .. py:method:: register(write_empty=True, **kwargs)

      Legacy method to register a database with the metadata store. Writing data automatically sets the following metadata:

      * *depends*: Names of the databases that this database references, e.g. "biosphere"
      * *number*: Number of processes in this database.

   .. py:method:: relabel_data(data, new_name)

      Relabel database keys and exchanges.

      In a database which internally refers to itself, update keys to the new database name ``new_name``. Needed to copy a database completely or to cut out a section of a database.

      For example:
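      A usage sketch (the database name, query, and filter value are hypothetical; note that a custom ``boosts`` dictionary must include all five fields, as described above):

      .. code-block:: python

         db = Database("example_db")
         results = db.search(
             "steel*",  # wildcard query
             limit=10,
             # Emphasize matches in the name field even more strongly
             boosts={"name": 10, "comment": 1, "product": 3, "categories": 2, "location": 3},
             filter={"location": "GLO"},  # only datasets located in "GLO"
         )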
      .. code-block:: python

         data = {
             ("old and boring", 1): {
                 "exchanges": [
                     {"input": ("old and boring", 42), "amount": 1.0},
                 ]
             },
             ("old and boring", 2): {
                 "exchanges": [
                     {"input": ("old and boring", 1), "amount": 4.0}
                 ]
             },
         }
         print(relabel_data(data, "shiny new"))
         >> {
             ("shiny new", 1): {
                 "exchanges": [
                     {"input": ("old and boring", 42), "amount": 1.0},
                 ]
             },
             ("shiny new", 2): {
                 "exchanges": [
                     {"input": ("shiny new", 1), "amount": 4.0}
                 ]
             },
         }

      In the example, the exchange to ``("old and boring", 42)`` does not change, as it is not part of the updated data.

      :param data: The data to modify
      :type data: dict
      :param new_name: The name of the modified database
      :type new_name: str
      :returns: The modified data

   .. py:method:: rename(name)

      Rename a database. Modifies exchanges to link to the new name.

      :param name: New name.
      :type name: str
      :returns: self (for backwards compatibility)

   .. py:method:: search(string, **kwargs)

      Search this database for ``string``.

      The search includes the following fields:

      * name
      * comment
      * categories
      * location
      * reference product

      ``string`` can include wildcards, e.g. ``"trans*"``.

      By default, the ``name`` field is given the most weight. The full weighting set is called the ``boost`` dictionary, and the default weights are::

         {
             "name": 5,
             "comment": 1,
             "product": 3,
             "categories": 2,
             "location": 3
         }

      Optional keyword arguments (a usage sketch follows this list):

      * ``limit``: Number of results to return.
      * ``boosts``: Dictionary of field names and numeric boosts; see the default boost values above. New values must be in the same format, but can have different weights.
      * ``filter``: Dictionary of criteria that search results must meet, e.g. ``{'categories': 'air'}``. Keys must be one of the above fields.
      * ``mask``: Dictionary of criteria that exclude search results. Same format as ``filter``.
      * ``facet``: Field to facet results. Must be one of ``name``, ``product``, ``categories``, ``location``, or ``database``.
      * ``proxy``: Return ``Activity`` proxies instead of raw Whoosh documents. Default is ``True``.

      Returns a list of ``Activity`` datasets.

   .. py:method:: set_dirty(name)
      :classmethod:

   .. py:method:: set_geocollections()

      Set the ``geocollections`` attribute for databases which don't currently have it.

   .. py:method:: validate(data)

   .. py:method:: write(data, process=True)

      Write ``data`` to the database.

      ``data`` must be a dictionary of the form::

         {
             ('database name', 'dataset code'): {dataset}
         }

      Writing a database will first delete all existing data.

   .. py:method:: write_exchanges(technosphere, biosphere, dependents)

      Write IO data directly to processed arrays.

      Product data is stored in SQLite as normal activities. Exchange data is written directly to NumPy structured arrays.

      Technosphere and biosphere data have the format ``(row id, col id, value, flip)``.

.. py:data:: SQLiteBackend

.. py:data:: _VALID_KEYS

.. py:data:: monitor
   :value: True
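As a closing illustration, a minimal sketch of the write-and-process flow described above (the database name and dataset are hypothetical, and ``Database`` is assumed to be importable from the top-level ``bw2data`` package):

.. code-block:: python

   from bw2data import Database

   db = Database("example_db")
   db.register()  # legacy registration with the metadata store
   db.write(
       {
           ("example_db", "steel-code"): {
               "name": "super strong steel",
               "unit": "kilogram",
               "exchanges": [],
           }
       }
   )  # write() deletes any existing data first, and processes by default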