bw2data#

Subpackages#

Submodules#

Package Contents#

Classes#

DataStore

Base class for all Brightway2 data stores. Subclasses should define:

Database

A base class for SQLite backends.

IndexManager

JsonWrapper

Normalization

LCIA normalization data - used to transform meaningful units, like mass or damage, into "person-equivalents" or some such thing.

ProcessedDataStore

Brightway2 data stores that can be processed to NumPy arrays.

Searcher

Weighting

LCIA weighting data - used to combine or compare different impact categories.

Functions#

extract_brightway_databases(database_names[, ...])

Extract a Brightway2 SQLiteBackend database to the Wurst internal format.

get_activity([key])

Support multiple ways to get exactly one activity node.

get_id(key)

get_node(**kwargs)

prepare_lca_inputs([demand, method, weighting, ...])

Prepare LCA input arguments in Brightway 2.5 style.

set_data_dir(dirpath[, permanent])

Set the Brightway2 data directory to dirpath.

Attributes#

Edge

Node

calculation_setups

config

databases

dynamic_calculation_setups

geomapping

mapping

methods

normalizations

parameters

preferences

projects

weightings

class bw2data.DataStore(name)[source]#

Base class for all Brightway2 data stores. Subclasses should define:

  • metadata: A serialized-dict instance, e.g. databases or methods. By convention, each type of data store has its own metadata store, so the data store Foo would have a metadata store foos.

  • validator: A data validator. Optional. See bw2data.validate.
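
A minimal sketch of this subclass convention, using a hypothetical metadata store foos built on bw2data.serialization.SerializedDict; real subclasses (e.g. Method, Normalization) follow the same pattern inside bw2data itself:

from bw2data.data_store import DataStore
from bw2data.serialization import SerializedDict

class Foos(SerializedDict):
    # Hypothetical metadata file; SerializedDict persists this dict as JSON.
    filename = "foos.json"

foos = Foos()

class Foo(DataStore):
    _metadata = foos   # each data store type points at its own metadata store
    validator = None   # optional; see bw2data.validate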

property filename#

Remove filesystem-unsafe characters and perform unicode normalization on self.name using filesystem.safe_filename().

property registered#
_intermediate_dir = 'intermediate'#
_metadata#
metadata#
validator#
_get_metadata()#
_set_metadata(value)#
backup()#

Save a backup to backups folder.

Returns

File path of backup.

copy(name)#

Make a copy of this object with a new name.

This method only changes the name, but not any of the data or metadata.

Parameters

name (*) – Name of the new object.

Returns

The new object.

deregister()#

Remove an object from the metadata store. Does not delete any files.

load()#

Load the intermediate data for this object.

Returns

The intermediate data.

register(**kwargs)#

Register an object with the metadata store. Takes any number of keyword arguments.

validate(data)#

Validate data. Must be called manually.

write(data)#

Serialize intermediate data to disk.

Parameters

data (*) – The data

class bw2data.Database(name=None, *args, **kwargs)#

Bases: peewee.Model

A base class for SQLite backends.

Subclasses must support at least the following calls:

  • load()

  • write(data)

In addition, they should specify their backend with the backend attribute (a unicode string). The base class also provides the following methods, which should not normally need to be overridden:

  • rename

  • copy

  • find_dependents

  • random

  • process

For new classes to be recognized by the DatabaseChooser, they need to be registered with the config object, e.g.:

config.backends['backend type string'] = BackendClass

Instantiation does not load any data. If this database is not yet registered in the metadata store, a warning is written to stdout.

The data schema for databases in voluptuous is:

exchange = {
        Required("input"): valid_tuple,
        Required("type"): basestring,
        }
exchange.update(uncertainty_dict)
lci_dataset = {
    Optional("categories"): Any(list, tuple),
    Optional("location"): object,
    Optional("unit"): basestring,
    Optional("name"): basestring,
    Optional("type"): basestring,
    Optional("exchanges"): [exchange]
}
db_validator = Schema({valid_tuple: lci_dataset}, extra=True)
where:
  • valid_tuple is a dataset identifier, like ("ecoinvent", "super strong steel")

  • uncertainty_fields are fields from an uncertainty dictionary.

Processing a Database actually produces two parameter arrays: one for the exchanges, which make up the technosphere and biosphere matrices, and a geomapping array which links activities to locations.

Parameters

name (unicode string) – Name of the database to manage.
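
A minimal usage sketch; the project name "sandbox" and database name "example db" are placeholders:

from bw2data import Database, databases, projects

projects.set_current("sandbox")      # hypothetical project
db = Database("example db")          # instantiation loads no data
if "example db" not in databases:
    db.register()                    # adds the database to the `databases` metadata store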

property _metadata#
property filename#

Remove filesystem-unsafe characters and perform unicode normalization on self.name using filesystem.safe_filename().

property metadata#
property node_class#
property registered#
backend#
depends#
dirty#
extra#
filters#
geocollections#
name#
order_by#
searchable#
validator#
_add_indices()#
_drop_indices()#
_efficient_write_dataset(index, key, ds, exchanges, activities)#
_efficient_write_many_data(data, indices=True)#
_get_filters()#
_get_order_by()#
_get_queryset(random=False, filters=True)#
_iotable_edges_to_dataframe() pandas.DataFrame#

Return a pandas DataFrame with all database exchanges. DataFrame columns are:

target_id: int
target_database: str
target_code: str
target_name: Optional[str]
target_reference_product: Optional[str]
target_location: Optional[str]
target_unit: Optional[str]
target_type: Optional[str]
source_id: int
source_database: str
source_code: str
source_name: Optional[str]
source_product: Optional[str]  # Note different label
source_location: Optional[str]
source_unit: Optional[str]
source_categories: Optional[str]  # Tuple concatenated with "::" as in bw2io
edge_amount: float
edge_type: str

Target is the node consuming the edge, source is the node or flow being consumed. The terms target and source were chosen because they also work well for biosphere edges.

Because IO tables are normally quite large, the DataFrame is built directly from NumPy arrays; special formatters are therefore not supported in this function.

Returns a pandas DataFrame.

_set_filters(filters)#
_set_order_by(field)#
_sqlite_edges_to_dataframe(categorical: bool = True, formatters: Optional[List[Callable]] = None) pandas.DataFrame#
add_geomappings(data)#
backup()#

Save a backup to backups folder.

Returns

File path of backup.

classmethod clean_all()#
copy(name)#

Make a copy of the database.

Internal links within the database will be updated to match the new database name, i.e. ("old name", "some id") will be converted to ("new name", "some id") for all exchanges.

Parameters

name (*) – Name of the new database. Must not already exist.

datapackage()#
delete_data(keep_params=False, warn=True)#

Delete all data from SQLite database and Whoosh index

delete_duplicate_exchanges(fields=['amount', 'type'])#

Delete exchanges which are exact duplicates. Useful if you accidentally ran your input data notebook twice.

To determine uniqueness, we look at the exchange input and output nodes, and at the exchange values for the fields given in fields.

delete_instance()#
deregister()#

Legacy method to remove an object from the metadata store. Does not delete any data.

dirpath_processed()#
edges_to_dataframe(categorical: bool = True, formatters: Optional[List[Callable]] = None) pandas.DataFrame#

Return a pandas DataFrame with all database exchanges. Standard DataFrame columns are:

target_id: int
target_database: str
target_code: str
target_name: Optional[str]
target_reference_product: Optional[str]
target_location: Optional[str]
target_unit: Optional[str]
target_type: Optional[str]
source_id: int
source_database: str
source_code: str
source_name: Optional[str]
source_product: Optional[str]  # Note different label
source_location: Optional[str]
source_unit: Optional[str]
source_categories: Optional[str]  # Tuple concatenated with "::" as in bw2io
edge_amount: float
edge_type: str

Target is the node consuming the edge, source is the node or flow being consumed. The terms target and source were chosen because they also work well for biosphere edges.

Args:

categorical will turn each string column into a pandas Categorical Series. This takes 1-2 extra seconds, but saves around 50% of the memory consumption.

formatters is a list of callables that modify each row. These functions must take the following keyword arguments, and use the Wurst internal data format:

  • node: The target node, as a dict

  • edge: The edge, including attributes of the source node

  • row: The current row dict being modified.

The functions in formatters don’t need to return anything; they modify row in place (see the sketch below).

Returns a pandas DataFrame.
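
A minimal formatter sketch, with a hypothetical extra column name and database name:

from bw2data import Database

def add_target_comment(node, edge, row):
    # `node` and `edge` are dicts in the Wurst internal format; `row` is the
    # DataFrame row dict, modified in place (no return value needed).
    row["target_comment"] = node.get("comment")

df = Database("example db").edges_to_dataframe(formatters=[add_target_comment])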

exchange_data_iterator(sql, dependents, flip=False)#

Iterate over exchanges and format for bw_processing arrays.

dependents is a set of dependent database names.

flip means flip the numeric sign; see bw_processing docs.

Uses raw sqlite3 to retrieve data for ~2x speed boost.

classmethod exists(name)#
filename_processed()#
filepath_intermediate()#
filepath_processed(clean=True)#
find_dependents(data=None, ignore=None)#

Get sorted list of direct dependent databases (databases linked from exchanges).

Parameters
  • data (*) – Inventory data

  • ignore (*) – List of database names to ignore

Returns

List of database names

find_graph_dependents()#

Recursively get list of all dependent databases.

Returns

A set of database names

get_node(code=None, **kwargs)#
graph_technosphere(filename=None, **kwargs)#
load(*args, **kwargs)#
make_searchable(reset=False)#
make_unsearchable()#
new_activity(code, **kwargs)#
new_node(code=None, **kwargs)#
nodes_to_dataframe(columns: Optional[List[str]] = None, return_sorted: bool = True) pandas.DataFrame#

Return a pandas DataFrame with all database nodes. Uses the provided node attributes by default, such as name, unit, location.

By default, returns a DataFrame sorted by name, reference product, location, and unit. Set return_sorted to False to skip sorting.

Pass columns to select custom columns. For more customization, write your own function; there are endless possibilities here.

Returns a pandas DataFrame.
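
A minimal usage sketch, assuming a hypothetical database name:

from bw2data import Database

df = Database("example db").nodes_to_dataframe(
    columns=["name", "reference product", "location", "unit"]
)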

process(csv=False)#

Create structured arrays for the technosphere and biosphere matrices.

Uses bw_processing for array creation and metadata serialization.

Also creates a geomapping array, linking activities to locations. Used for regionalized calculations.

Uses a raw SQLite3 cursor instead of Peewee for a roughly 2x speed advantage.

query(*queries)#

Search through the database.

random(filters=True, true_random=False)#

True random requires loading and sorting data in SQLite, and can be resource-intensive.

register(write_empty=True, **kwargs)#

Legacy method to register a database with the metadata store. Writing data automatically sets the following metadata:

  • depends: Names of the databases that this database references, e.g. “biosphere”

  • number: Number of processes in this database.

relabel_data(data, new_name)#

Relabel database keys and exchanges.

For a database whose exchanges refer to the database itself, update those internal references to the new database name new_name.

Needed to copy a database completely or cut out a section of a database.

For example:

data = {
    ("old and boring", 1):
        {"exchanges": [
            {"input": ("old and boring", 42),
            "amount": 1.0},
            ]
        },
    ("old and boring", 2):
        {"exchanges": [
            {"input": ("old and boring", 1),
            "amount": 4.0}
            ]
        }
    }
print(relabel_data(data, "shiny new"))
>> {
    ("shiny new", 1):
        {"exchanges": [
            {"input": ("old and boring", 42),
            "amount": 1.0},
            ]
        },
    ("shiny new", 2):
        {"exchanges": [
            {"input": ("shiny new", 1),
            "amount": 4.0}
            ]
        }
    }

In the example, the exchange to ("old and boring", 42) does not change, as this is not part of the updated data.

Parameters
  • data (*) – The data to modify

  • new_name (*) – The name of the modified database

Returns

The modified data

rename(name)#

Rename a database. Modifies exchanges to link to new name.

Parameters

name (*) – New name.

Returns

self # Backwards compatibility

search(string, **kwargs)#

Search this database for string.

The searcher includes the following fields:

  • name

  • comment

  • categories

  • location

  • reference product

string can include wild cards, e.g. "trans*".

By default, the name field is given the most weight. The full weighting set is called the boost dictionary, and the default weights are:

{
    "name": 5,
    "comment": 1,
    "product": 3,
    "categories": 2,
    "location": 3
}

Optional keyword arguments:

  • limit: Number of results to return.

  • boosts: Dictionary of field names and numeric boosts - see default boost values above. New values must be in the same format, but with different weights.

  • filter: Dictionary of criteria that search results must meet, e.g. {'categories': 'air'}. Keys must be one of the above fields.

  • mask: Dictionary of criteria that exclude search results. Same format as filter.

  • facet: Field to facet results. Must be one of name, product, categories, location, or database.

  • proxy: Return Activity proxies instead of raw Whoosh documents. Default is True.

Returns a list of Activity datasets.
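
A minimal search sketch, with hypothetical database name, query string, and filter values:

from bw2data import Database

results = Database("example db").search(
    "trans*",
    limit=10,
    boosts={"name": 10, "comment": 1, "product": 3, "categories": 2, "location": 3},
    filter={"location": "ch"},
)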

classmethod set_dirty(name)#
set_geocollections()#

Set geocollections attribute for databases which don’t currently have it.

validate(data)#
write(data, process=True)#

Write data to database.

data must be a dictionary of the form:

{
    ('database name', 'dataset code'): {dataset}
}

Writing a database first deletes all existing data.
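
A minimal write sketch, with a hypothetical database name and dataset matching the schema above:

from bw2data import Database

db = Database("toy db")
db.write({
    ("toy db", "steel"): {
        "name": "steel production",
        "unit": "kilogram",
        "location": "GLO",
        "exchanges": [
            {"input": ("toy db", "steel"), "amount": 1.0, "type": "production"},
        ],
    },
})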

write_exchanges(technosphere, biosphere, dependents)#

Write IO data directly to processed arrays.

Product data is stored in SQLite as normal activities. Exchange data is written directly to NumPy structured arrays.

Technosphere and biosphere data have the format (row id, col id, value, flip).

class bw2data.IndexManager(database_path, dir_name='whoosh')#
_format_dataset(ds)#
add_dataset(ds)#
add_datasets(datasets)#
create()#
delete_database()#
delete_dataset(ds)#
get()#
update_dataset(ds)#
class bw2data.JsonWrapper[source]#
classmethod dump(data, filepath)#
classmethod dump_bz2(data, filepath)#
classmethod dumps(data)#
classmethod load(file)#
classmethod load_bz2(filepath)#
classmethod loads(data)#
class bw2data.Normalization[source]#

Bases: bw2data.ia_data_store.ImpactAssessmentDataStore

Inheritance diagram of bw2data.Normalization

LCIA normalization data - used to transform meaningful units, like mass or damage, into “person-equivalents” or some such thing.

The data schema for IA normalization is:

Schema([
    [valid_tuple, maybe_uncertainty]
])
where:
  • valid_tuple is a dataset identifier, like ("biosphere", "CO2")

  • maybe_uncertainty is either a number or an uncertainty dictionary
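
A minimal sketch with hypothetical names and flow keys:

from bw2data import Normalization

norm = Normalization(("EU27", "person equivalents"))
norm.register()
norm.write([
    [("biosphere3", "co2-flow-code"), 1.2e-4],  # (flow key, factor or uncertainty dict)
])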

_metadata#
matrix = 'normalization_matrix'#
validator#
process_row(row)#

Given (flow key, amount), return a dictionary for array insertion.

class bw2data.ProcessedDataStore(name)[source]#

Bases: DataStore

Inheritance diagram of bw2data.ProcessedDataStore

Brightway2 data stores that can be processed to NumPy arrays.

In addition to metadata and (optionally) validator, subclasses should override add_geomappings. This method takes the entire dataset, and loads objects to geomapping as needed.

matrix = 'unknown'#
add_geomappings(data)#

Add objects to geomapping, if necessary.

Parameters

data (*) – The data

datapackage()#
dirpath_processed()#
filename_processed()#
filepath_processed()#
process(**extra_metadata)#

Process intermediate data from a Python dictionary to a stats_arrays array, which is a NumPy Structured Array. A structured array (also called record array) is a heterogeneous array, where each column has a different label and data type.

Processed arrays are saved in the processed directory.

If the uncertainty type is no uncertainty, undefined, or not specified, then the ‘amount’ value is used for ‘loc’ as well. This is needed for the random number generator.

Doesn’t return anything, but writes a file to disk.

abstract process_row(row)#

Translate data into a dictionary suitable for array inputs.

See bw_processing documentation.
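
A hypothetical subclass sketch showing the kind of dictionary process_row might return; exact keys and fields depend on the bw2data version and matrix (see e.g. Normalization.process_row), and metadata store setup is omitted:

from bw2data import ProcessedDataStore, get_id

class MyStore(ProcessedDataStore):
    matrix = "my_matrix"  # hypothetical matrix label

    def process_row(self, row):
        # `row` is assumed here to be a (node key, amount) pair.
        key, amount = row
        return {
            "row": get_id(key),  # integer node id used as the matrix row index
            "amount": amount,
        }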

validate(data)#

Validate data. Must be called manually.

write(data, process=True)#

Serialize intermediate data to disk.

Parameters

data (*) – The data

class bw2data.Searcher(database)#
search(string, limit=25, facet=None, proxy=True, boosts=None, filter=None, mask=None, node_class=None)#
class bw2data.Weighting[source]#

Bases: bw2data.ia_data_store.ImpactAssessmentDataStore

Inheritance diagram of bw2data.Weighting

LCIA weighting data - used to combine or compare different impact categories.

The data schema for weighting is a one-element list:

Schema(All(
    [uncertainty_dict],
    Length(min=1, max=1)
))
_metadata#
matrix = 'weighting_matrix'#
validator#
process_row(row)#

Return an empty tuple (as dtype_fields is empty), and the weighting uncertainty dictionary.

write(data)#

Because of DataStore assumptions, data must be a one-element list.
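
A minimal sketch with a hypothetical name; per the schema above, the data is a one-element list containing a single uncertainty dictionary:

from bw2data import Weighting

weighting = Weighting(("economic",))
weighting.register()
weighting.write([{"amount": 0.6}])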

bw2data.extract_brightway_databases(database_names, add_properties=False, add_identifiers=False)[source]#

Extract a Brightway2 SQLiteBackend database to the Wurst internal format.

database_names is a list of database names. You should already be in the correct project.

Returns a list of dataset documents.
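
A minimal usage sketch, with hypothetical project and database names:

from bw2data import extract_brightway_databases, projects

projects.set_current("my project")
data = extract_brightway_databases(["example db"])
# `data` is a list of plain-dict dataset documents in the Wurst internal format.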

bw2data.get_activity(key=None, **kwargs)[source]#

Support multiple ways to get exactly one activity node.

key can be an integer or a key tuple.
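
A minimal usage sketch, with a hypothetical key tuple:

from bw2data import get_activity

act = get_activity(("example db", "some-code"))  # key tuple
same = get_activity(act.id)                      # or the integer node id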

bw2data.get_id(key)#
bw2data.get_node(**kwargs)[source]#
bw2data.prepare_lca_inputs(demand=None, method=None, weighting=None, normalization=None, demands=None, remapping=True, demand_database_last=True)[source]#

Prepare LCA input arguments in Brightway 2.5 style.
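
A minimal sketch, assuming act is an activity node and the method tuple exists in the current project; roughly, the return value is an id-indexed demand dictionary, a list of datapackages, and remapping dictionaries suitable for bw2calc in 2.5 style:

from bw2data import get_activity, prepare_lca_inputs

act = get_activity(("example db", "some-code"))  # hypothetical key
demand, data_objs, remapping = prepare_lca_inputs(
    demand={act: 1},
    method=("IPCC 2013", "climate change", "GWP 100a"),  # hypothetical method tuple
)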

bw2data.set_data_dir(dirpath, permanent=True)[source]#

Set the Brightway2 data directory to dirpath.

If permanent is True, then set dirpath as the default data directory.

Creates dirpath if needed. Also creates basic directories, and resets metadata.
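
A minimal usage sketch, with a hypothetical directory path:

from bw2data import set_data_dir

set_data_dir("/srv/brightway-data", permanent=False)  # session-only change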

bw2data.Edge[source]#
bw2data.Node[source]#
bw2data.calculation_setups[source]#
bw2data.config[source]#
bw2data.databases[source]#
bw2data.dynamic_calculation_setups[source]#
bw2data.geomapping[source]#
bw2data.mapping[source]#
bw2data.methods[source]#
bw2data.normalizations[source]#
bw2data.parameters#
bw2data.preferences[source]#
bw2data.projects[source]#
bw2data.weightings[source]#