Direct Database Access

Getting Started

It is often convenient to directly access the database for for research purposes or generating custom reports. This guide will help get you started. The first steps are to get an account and install pgAdmin.

A database account may be used to extract any data from the database, which can then be fed to the tool of your choice. Access is limited to read-only.

The database can be accessed over the internet from any host. Connections are encrypted and password protected.

Getting an Account

Although the user ids are the same, accounts to access the database are separate from accounts to use the web site, and have a different password.

To request a database account, send an email to the site manager.

If approved, you will receive a reply with a password for the account.

pgAdmin

pgAdmin is a database client that provides both GUI and command line interfaces to Postgresql databases. The GUI can be used to see the general layout of the database and get quick answers. The command line is often used to generate CSV (comma separated value) files that can be consumed by a variety of programs including Excel.

You can vist the main website or directly download the Windows installer.

Run the installer and start pgAdmin.

pgAdmin has a hierarchy of:

  •         Server Group
  •                 Server
  •                         Database

From the GUI use Object -> Create -> Server Group and create a group called Servers.

Right-click on Servers and use Create -> Server to create a server.

Name the server AQA. Click the Connection tab and for the Host name/address enter:

        aqa.cek8wjwn06iu.us-west-2.rds.amazonaws.com

The other fields should be filled as shown below, substituting your login for myDbUserName:

Click Save and then double-click on AQA. It will prompt for your password. After entering the password, you may optionally save it on the computer where pgAdmin is installed.

To see the tables and columns, expand: AQA -> Databases -> Schemas -> public -> Tables

To look at data, right-click on the table and click View Data.

Refer to the pgAdmin documentation for more details.

The screen shot below shows a typical session:

Command Line Access

pgAdmin provides a psql which may be invoked from the command line. To invoke it, add it to your Windows PATH and enter:
psql postgresql://aqa.cek8wjwn06iu.us-west-2.rds.amazonaws.com:5432/AQA myDbUserName
From this point standard SQL queries will work. Some quick tips:


List all tables.
\dt


List columns in given table.
\d+ epid


List all rows in the epid table (remember the terminating ;).
select * from epid;


General help
\help


Specific help
\help select


Quit psql
\q

Examples

Listing raw table contents can appear cryptic because of the heavy use of integer primary keys used. The following are some queries that return results in more user friendly formats. These are intended as a starting point for users to copy and modify to meet specific needs.

If you need help or have a suggestion for a query to add to this document, please send it to the manager.


List all treatment machines


select
    "institution"."name" as "Institution",
    "machine"."id" as "Machine ID",
    "machine"."machinePK",
    "machineType"."version" as "Machine Version",
    "multileafCollimator"."model" as "Collimator",
    "epid"."model" as "EPID"
from
    "machine", "institution", "machineType", "multileafCollimator", "epid"
where
    ( "machine"."institutionPK" = "institution"."institutionPK" ) and
    ( "machine"."machineTypePK" = "machineType"."machineTypePK" ) and
    ( "machine"."multileafCollimatorPK" = "multileafCollimator"."multileafCollimatorPK" ) and
    ( "machine"."epidPK" = "epid"."epidPK" )
order by
    "institution"."name", "machine"."id"
;

          
Results of query in pgAdmin:


List Outputs


-- List up to 10 outputs (runs of procedures) ordered by data acquisition date for procedure 'Phase2'

select
    "output"."dataDate",
    "output"."outputPK",
    "procedure"."name" as "Procedure",
    "institution"."name" as "Institution",
    "machine"."id" as "Machine",
    "output"."status"
from
    "output",
    "procedure",
    "institution",
    "machine"
where
    ( "output"."machinePK" = "machine"."machinePK")              and
    ( "output"."procedurePK" = "procedure"."procedurePK")        and
    ( "procedure"."name" = 'Phase2')                             and
    ( "institution"."institutionPK" = "machine"."institutionPK") and
    (
        ( "output"."status" = 'pass' ) or
        ( "output"."status" = 'fail' ) or
        ( "output"."status" = 'done' )
    )
order by
    "output"."dataDate"
asc
limit 10;
          
Results of query in pgAdmin from command line:


List LOC Leaf Data


-- List the transmission values for one run of procedure 'LOC' for one leaf

select
    "leafTransmission"."section",
    "leafTransmission"."leafIndex",
    "leafTransmission"."transmission_fract"
from
    "leafTransmission"
where
     "leafTransmission"."leafIndex" = 21
order by
    "leafTransmission"."section", "leafTransmission"."leafIndex"
;

          
Results of query in pgAdmin from command line:


Make a CSV File

To show results in comma separated value (csv) format that can be read by Excel, put the above query in a file called leaf_transmission.sql and run the command:
psql --quiet --no-align --tuples-only --field-separator=, --file=leaf_transmission.sql --output leaf_transmission.csv postgresql://aqa.cek8wjwn06iu.us-west-2.rds.amazonaws.com:5432/AQA myDbUserName 
          

The results will be put in leaf_transmission.csv, and look like the following. This file can be opened with Excel.
1,21,0.01367995
2,21,0.01355522
3,21,0.01349938
4,21,0.01367342
5,21,0.01286042
          
And when opened with Excel:

Database Structure

As a general note, all primary keys are 64 bit auto-incrementing integers.

Although the meaning of much of database is obvious, some less obvious are:

Procedure

A procedure (aka test) that the user would run against data. Procedures have version numbers to track changes in analysis code.

Input

Each time a user uploads data, an Input row is created and the files are scanned for the dataDate (the time when the data was acquired) and put in their own directory.

Output

When a user runs a procedure, an Output row is created and the files generated by the procedure are put in a separate output directory as a child of the input.


Below is a graphical view of the entire database. In general, the left hand side is for machine description, and the physics data is on the right.



Programmatic Access

It is possible to access the database from a program. Many software platforms such as MATLAB, .NET, Python, or Power BI support direct connections to Postgresql databases. Refer to the respective documentation for details.