Sparrow's data models

All imported data lives in the Sparrow database as tables and relationships upon those tables. From these tables and relationships, Sparrow's backend extracts data models with the help of Sqlalchemy and Marshmallow. In tandem with Sqlalchemy and Marshmallow, Sparrow maps over the database and generates python classes for every data model it extracts from the database. For instance the Sample model represents the Sample table and it's corresponding relationships. These python class data models can be used to search for and create data in Sparrow. In the following tutorial we will explore these data models and learn how to do some basic searching and creating with them.

Sqlalchemy tutorial with Sparrow

In order to follow along with this tutorial you will need a working version of sparrow running locally. If you don't have one currently working you can follow along as I will try to show as much of the expected output as possible. Lets begin!

First go to the command line and type in:

sparrow shell

This command will put you in an iPython environment in the backend directory of Sparrow. We will be able to import and run functions similar to how one would in a Jupyter Notebook.

Once you are in the shell run

db.model

The output you should see is below. This is a list of all the available data models in sparrow. Many of these have names resembling their database table name or their schema and database table name. You can see many models but the most commonly used are: project, sample, session, analysis, datum.

ModelCollection: [
  vocabulary_parameter,
  vocabulary_parameter_link,
  vocabulary_material,
  vocabulary_material_link,
  vocabulary_analysis_type,
  vocabulary_entity_type,
  vocabulary_unit,
  vocabulary_error_metric,
  vocabulary_entity_reference,
  vocabulary_method,
  tags_tag,
  datum,
  analysis,
  standard_sample,
  sample,
  spatial_ref_sys,
  researcher,
  publication,
  geo_entity,
  instrument,
  sample_geo_entity,
  instrument_session,
  attribute,
  data_file_type,
  constant,
  data_file,
  enum_date_precision,
  data_file_link,
  favorite_rock,
  user,
  project,
  session,
  datum_type,
  core_view_datum
]

Here we will look at the sample data model for this tutorial. The first thing we will do is store the sample data model reference in a variable. We'll name it Sample. NOTE: when accessing models from db.model use the attribute "dot" notation, db.model.sample.

Sample = db.model.sample
Sample

If you run just Sample you should see the following output. This means that the Sample data model is an extension of a sqlalchemy python class.

sqlalchemy.ext.automap.sample

Okay, now that we have the Sample model we'll do some searching of the models.

db.session.query(Sample).all()
db.session.query(Sample.name).all()
db.session.query(Sample).filter_by(name="sample_name").all()

Above are 3 commands that will return arrays of the Sample models or empty arrays If no data exist to match the query. Notice how each one begins with db.session.query? db.session is a pythonic link to say, "time to do things on the database." And the method that comes after, query() means we will be querying the database. query() then takes in either the data model, Sample or an attribute of the data model, Sample.name, Sample.material, etc. There are other functions we can chain onto this, like the filter_by method which will filter data models via a **kwarg. Lastly, each query needs an ending method to let the query know the amount of data to bring back, i.e all(), first(). A database query needs this ending function otherwise the query will not run.

Next we will create a simple sample data model with only a name.

sample_1 = Sample(name="fake sample 1")
print(sample_1)
print(sample_1.name)

Creating a new database model is similar to creating a python class instance. In the above code we create an instance of Sample with name fake sample 1 and save it in the variable sample_1. If we print(sample_1) we see a instance of the sample data model without an id. this is because the unqiue id is set when it is added to the database. At this point we have transiently created a fake sample in a local variable. You can access and set attributes on sample_1 via the "dot" notation as well. To view all the attributes and methods you can run help(sample_1).

sample(id: –)
"fake sample 1"

The next two lines will add the sample to the database! If you don't want to add this fake sample to your database DO NOT RUN THEM.

db.session.add(sample_1)
db.session.commit()

In this block we have added our sample, sample_1, to the database transaction session. The next command commits all of the changes in the database transaction session to the database. You can view uncommited things in the session by running db.session.new to see new models, or db.session.dirty to view edited models. If you wish to empty the session without commiting you can run db.session.rollback(). This basically resets the database transaction block.

Summary:

Sparrow uses Sqlalchemy and Marshmallow to map all database tables and relationships in the python classes used to represent the different data models in Sparrow. With these data models we can search for, edit existing, and create new data models.

Learning sqlalchemy syntax and functions takes time. Below are a list of some resources we've used as well as a link to some python tests that heavily use sqlalchemy models.