Getting Started
It is often convenient to directly access the database for for research purposes or
generating custom reports. This guide will help get you started. The
first steps are to get an account and install pgAdmin.
A database account may be used to extract any data from the
database, which can then be fed to the tool of your choice. Access is
limited to read-only.
The database can be accessed over the internet from any host.
Connections are encrypted and password protected.
Getting an Account
Although the user ids are the same, accounts to access the database
are separate from accounts to use the web site, and have a
different password.
To request a database account, send an email to the site
manager.
If approved, you will receive a reply with a password for the account.
pgAdmin
pgAdmin is a database client that provides both GUI and command line
interfaces to Postgresql databases. The GUI can be used to see the
general layout of the database and get quick answers. The command line
is often used to generate CSV (comma separated value) files that can
be consumed by a variety of programs including Excel.
You can vist the
main website or
directly download the
Windows
installer.
Run the installer and start pgAdmin.
pgAdmin has a hierarchy of:
- Server Group
-
Server
-
Database
From the GUI use
Object ->
Create ->
Server Group
and create a group called
Servers.
Right-click on
Servers and use
Create ->
Server
to create a server.
Name the server
AQA. Click the
Connection tab and for
the
Host name/address enter:
aqa.cek8wjwn06iu.us-west-2.rds.amazonaws.com
The other fields should be filled as shown below, substituting your
login for
myDbUserName:
Click
Save and then double-click on
AQA. It will prompt
for your password. After entering the password, you may optionally
save it on the computer where pgAdmin is installed.
To see the tables and columns, expand:
AQA ->
Databases ->
Schemas ->
public ->
Tables
To look at data, right-click on the table and click
View Data.
Refer to the
pgAdmin documentation for more details.
The screen shot below shows a typical session:
Command Line Access
pgAdmin provides a
psql which may be invoked from the command line. To invoke it, add it
to your Windows
PATH and enter:
psql postgresql://aqa.cek8wjwn06iu.us-west-2.rds.amazonaws.com:5432/AQA myDbUserName
From this point standard SQL queries will work. Some quick tips:
List all tables.
\dt
List columns in given table.
\d+ epid
List all rows in the epid table (remember the terminating
;).
select * from epid;
General help
\help
Specific help
\help select
Quit psql
\q
Examples
Listing raw table contents can appear cryptic because of the heavy use of integer primary keys used.
The following are some queries that return results in more user friendly formats. These are intended as
a starting point for users to copy and modify to meet specific needs.
If you need help or have a suggestion for a query to add to this document, please send it to the
manager.
List all treatment machines
select
"institution"."name" as "Institution",
"machine"."id" as "Machine ID",
"machine"."machinePK",
"machineType"."version" as "Machine Version",
"multileafCollimator"."model" as "Collimator",
"epid"."model" as "EPID"
from
"machine", "institution", "machineType", "multileafCollimator", "epid"
where
( "machine"."institutionPK" = "institution"."institutionPK" ) and
( "machine"."machineTypePK" = "machineType"."machineTypePK" ) and
( "machine"."multileafCollimatorPK" = "multileafCollimator"."multileafCollimatorPK" ) and
( "machine"."epidPK" = "epid"."epidPK" )
order by
"institution"."name", "machine"."id"
;
Results of query in pgAdmin:
List Outputs
-- List up to 10 outputs (runs of procedures) ordered by data acquisition date for procedure 'Phase2'
select
"output"."dataDate",
"output"."outputPK",
"procedure"."name" as "Procedure",
"institution"."name" as "Institution",
"machine"."id" as "Machine",
"output"."status"
from
"output",
"procedure",
"institution",
"machine"
where
( "output"."machinePK" = "machine"."machinePK") and
( "output"."procedurePK" = "procedure"."procedurePK") and
( "procedure"."name" = 'Phase2') and
( "institution"."institutionPK" = "machine"."institutionPK") and
(
( "output"."status" = 'pass' ) or
( "output"."status" = 'fail' ) or
( "output"."status" = 'done' )
)
order by
"output"."dataDate"
asc
limit 10;
Results of query in pgAdmin from command line:
List LOC Leaf Data
-- List the transmission values for one run of procedure 'LOC' for one leaf
select
"leafTransmission"."section",
"leafTransmission"."leafIndex",
"leafTransmission"."transmission_fract"
from
"leafTransmission"
where
"leafTransmission"."leafIndex" = 21
order by
"leafTransmission"."section", "leafTransmission"."leafIndex"
;
Results of query in pgAdmin from command line:
Make a CSV File
To show results in comma separated value (csv) format that can be read by Excel, put the above query in a file called
leaf_transmission.sql and run the command:
psql --quiet --no-align --tuples-only --field-separator=, --file=leaf_transmission.sql --output leaf_transmission.csv postgresql://aqa.cek8wjwn06iu.us-west-2.rds.amazonaws.com:5432/AQA myDbUserName
The results will be put in
leaf_transmission.csv, and look like the following. This file can be opened with Excel.
1,21,0.01367995
2,21,0.01355522
3,21,0.01349938
4,21,0.01367342
5,21,0.01286042
And when opened with Excel:
Database Structure
As a general note, all primary keys are 64 bit auto-incrementing integers.
Although the meaning of much of database is obvious, some less obvious are:
Procedure
A procedure (aka test) that the user would run against data. Procedures have
version numbers to track changes in analysis code.
Input
Each time a user uploads data, an
Input row is created and the files
are scanned for the
dataDate (the time when the data was acquired) and
put in their own directory.
Output
When a user runs a procedure, an
Output row is created and the files
generated by the procedure are put in a separate output directory as a
child of the input.
Below is a graphical view of the entire database. In general, the left hand side
is for machine description, and the physics data is on the right.
Programmatic Access
It is possible to access the database from a program.
Many software platforms such as MATLAB, .NET, Python, or Power BI support
direct connections to Postgresql databases. Refer to the respective
documentation for details.