This article provides a step by step explanation of how to work with AWS Neptune Graph database. We will understand
the basic concept and use-case of graph databases and then create a graph database instance as well as access it
with Jupyter notebooks.
Introduction to Graph Databases
A graph database is a specialized type of NoSQL database that stores data in the form of nodes (vertex) and
relationships (edges). Graph database models are majorly of two types – Labeled Property Graph (LPG) and Resource
Description Framework (RDF). These models are queried using the Gremlin and SPARQL query languages. Graph databases
are generally used when the value in the complexity of relationships between data is extremely complex and of
maximum value. AWS provides a graph database service in the form of a managed service named AWS Neptune that
supports LPG as well as RDF models, and Gremlin as well as SPARQL query languages. In this article, we will learn
how to create an Amazon Neptune database instance and access it with the Jupyter Notebook instance connected to it.
AWS Neptune Database Setup
It is assumed that an AWS account is already in place and one has required privileges on AWS Neptune. Log on to the
AWS Console and type Neptune. A menu item will appear in the search result, click on it and that would navigate to
Neptune home page as shown below.
Click on Launch Amazon Neptune button to initiate the creation of the Neptune graph database. Shown below is how the
database creation wizard would look. The first step is to select the engine version. The default value is the latest
version of the database engine. The next step is to provide the database cluster identifier. Provide a suitable
identifier for the database.
The next setting is to select the template. For this exercise, we can select the development and testing option as
it will result in default settings suitable for a lower non-production environment. The database instance size
selected by default is db.t3.medium, which is good enough for this exercise.
Generally, production databases are deployed in more than one availability zones. But lower environment databases do
not need multi availability zones. So we would continue with No as the option, as shown below. The database must be
hosted inside a VPC, so select a VPC. The default selection is the default VPC as shown below.
Click on the Additional connectivity configuration section to expand it. This section provides options to configure
the network and port related settings for the database. The default port for the AWS Neptune database is 8182. We
can continue with the default options for now. The next section is to add Tags optionally, if required.
Click on the Additional configuration section below the Tags section, and it should look as shown below. This
section provides options to configure the database related options like parameter group, IAM authentication
settings, backups etc. We can continue with the default options here as well.
The next set of options are related to encryption, logs, and maintenance options. Check the Enable Encryption option
as well as the Audit Log option to encrypt the data that would be hosted in the database as well as to generate
Finally, the last property to configure is related to the maintenance of the database. The default setting is No
preference. Ideally, it is advisable to select and specify a window when the maintenance should be performed on the
database. Click on the Create database button to start the database creation process.
Once the database creation process starts, it will be in creating state and will move to Available status. Neptune
cluster by default will have one writer node under the Neptune cluster as shown below.
AWS Neptune Notebook Setup
The next step is to access the newly created Amazon Neptune database. One can directly connect to the database using
the endpoint and the port. Another recommended option is to access the database by creating a notebook instance. We
would now create a notebook instance that would connect to the Neptune database. Click on the Notebooks link on the
left pane, to open the Notebooks section as shown below.
Click on the Create Notebook button to initiate the notebook instance creation wizard, which would look as shown
below. One can select a pre-existing cluster or create a new cluster from here itself. As we have already created a
cluster, we will just select the existing cluster here. The next step is to select the instance size of the
notebook, which is ml.t3.medium by default. This instance size should be enough for this exercise as we just intend
to execute a few commands from the notebook. After instance size selection, provide a relevant name and description
for the notebook instance as shown below.
The next step is to select an IAM role for the instance, so that the instance can connect to the AWS Neptune cluster
and the service and execute commands on the database. Lifecycle configuration allows to execute scripts and setting
to prepare the instance. One has the option to use the default configuration or create a new one as well. For this
exercise, we can continue with the default Neptune configuration. One can optionally configure the network-related
settings for the instance by expanding the Network configuration section. Once all the configuration is done, click
on the Create notebook button to start the Notebook instance creation process.
Once the notebook instance is ready, it would look as shown below in Ready status. This instance is already
configured to connect with the AWS Neptune cluster that we created earlier. Select the cluster and click on the Open
notebook button to open the Jupyter notebook.
Once the Open Notebook button is clicked, the Jupyter notebook interface would show up in a new tab as shown below.
It would contain a folder named Neptune, which would have subfolders and notebooks containing tutorials on executing
commands on the AWS Neptune graph database.
Show below is one such notebook, which contains basic commands to test the cluster and add as well as modify data in
the jupyter notebook instance. Execute the %status command to check the health of the cluster. Click on the cell
that has this command listed and click on the Run button from the toolbar. This will execute the command against the
Amazon Neptune cluster, and the output would look as shown below.
The result shows that the cluster state is healthy. It will list the database engine version, gremlin version,
sparql version, and other details as listed below. The rest of the notebook contains commands to enter data in the
database and perform different data manipulation operations on the same.
Consider trying out the different notebooks to learn about how to use Gremlin as well as SPARQL to work with the AWS
Neptune graph database. Once the exercise is over and if there is no further need of the database instance as well
as notebook instance, consider stopping the notebook instance and then deleting it from the same page from where the
notebook instance was created. After the instance is deleted, consider deleting the Amazon Neptune database cluster
as well, to avoid incurring any further charges.
In this way, by creating an Amazon Neptune cluster as well as a Jupyter notebook instance, one can get quickly
started working with the graph database on AWS.
In this article, we learned how to create an AWS Neptune graph database cluster. We created a Jupyter notebook
instance connected with the database instance. We learned how to configure these instances as well as operating on
the database by executing queries from the notebook, to get started with the Amazon Neptune graph database.