Improving performance with indexes

Creating a database and managing it, if you have about 100 users, isn’t terribly difficult. The trouble comes when the number of users that you have grows exponentially. Suddenly, you realise there needs to be little room for potential error in the design and more importantly you realise the importance of space. Every little counts! Finally, your website’s performance is of paramount priority. If you’re drawing on a ‘student’ table (like we were in one of our applications) and it’s taking 10 seconds instead of 1….something needs to change.

Using indexes can usually help. But they can be incredibly harmful when used incorrectly. (I’ll explain)

Why are we interested in indexing? It’s an easy way of improving performance of your database.

What is an index?

An index is a structure that exists independent of the data in your table that improves the query optimiser’s ability to execute your query quickly.

An analogy- think of the index that you find at the back of a textbook. The analogy is not perfect, but it aids understanding. Say you needed to find every instance of the word ‘moose’ in a very large book about mooses.

You could start at word 1, line 1, page 1 and go through the entire book finding all the instances. Or, you could look up the index, if it exists. You would go to the back of the book and in the index, go to the letter ‘M’, and then find the word. In this magical place, you would be shown every instance of the word ‘moose’ as well as a direct pointer to where the word can be found in every case. It’s efficient and saves you a load of time! Now, imagine if the author decided to add a couple of pages to the middle of the book. What happens to all the index values? That’s right – they’d have to change too. Hopefully this simple analogy helps you see both the advantages and the potential difficulties that can arise with the use of indexes.

The execution in databases is different but the concept is similar.

So, how does indexing work?

Remember an index is just a pointer to the data in a table. It often has to do with ordering by a particular field. This allows more efficient ‘searching (eliminating the need to scan). You could also think of an index as something that represents the book (in our example) in keyword or subject order. The index will basically limit the number of records that we have to look at to satisfy our query. If you have a student table with student information, you could create an index (a separate structure) ordered by ID. Each ID corresponds to data in the student table (a pointer defines this)

The problem however, is that adding or deleting data will affect the index!

Indexes therefore will drastically speed up some queries but can also slow down the insertion and deletion of data. So, generally data retrieval is improved but data modification can be made slower and less efficient.

The three main types of indexes:

Single column, Composite (more than one column is used for the index e.g. lastname and fistname) and Unique (doesn’t allow duplicate values).

So when should you create an index?

  1. Only create an index on columns that are frequently used in the WHERE clause.
  2. Only create an index on columns that contain a high number of unique values

When should you avoid indexes?

  1. If you have a small table! (<8000 rows)
  2. If you have column values are not highly unique (e.g. M/F, Y/N etc). There’s no point if half of the values are just ‘M’ and the other are ‘F’.
  3. If you have columns that are being frequently updated.
  4. If you have large composite indexes.

Sql server has a tool called the ‘index tuning wizard‘ which can be very useful in the building of indexes.

Databases: Sqlite, MySql or PostgreSql?

Most functional and large websites have databases. At the most basic level, you may use a database to store information (like username and password) about users.

As a web developer, you have plenty of options for databases. Django comes with SqlLite inbuilt, but this will not be powerful enough if your application takes off and your’e dealing with hundreds of thousands of customers. The good news is that as a beginner, you don’t have to worry about the installation of a database to begin with. You can start working with the inbuilt database right away and simply use the python manage.py makemigrations and python manage.py migrate commands to commit the changes you make in the models to the database. The other most popular options, and the ones you are most likely to come across are MySql and PostgreSQL.

The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

I found this article which provides a nice comparison of all three, with key features, advantages and disadvantages. https://stackshare.io/stackups/mysql-vs-postgresql-vs-sqlite

How to install (say MySql) on your localhost.

Now that you’ve explored the alternatives, you need to know how to actually work with a database on your local host.

Since PHP and MySQL are both server-side languages, you will need a server to practice it.

To make a local host on your computer, you can install either XAMPP or WAMP.

You can download XAMPP here: https://www.apachefriends.org/download.html and another useful tutorial for working with MySql is here: https://www.mysqltutorial.org/install-mysql/

Django and Geolocation

Technology means we have the whole world within reach. Geolocation (think Google maps) has changed the way we think, work, search, and is an integral part of several sites and apps. Consider a website like bookatable …the first step involves searching for restaurants near you (by postcode). This involves searching by location. Check out https://www.uber.com as another example and analyse how it works.

Image result for django geolocation

On googling geolocation, it is obvious that Django has objects, wrappers and libraries that cover this but it was less than obvious to know where to start. I first found this,

https://docs.djangoproject.com/en/1.11/ref/contrib/gis/geoip/

realizing it was a module that is now deprecated. I then found this, which is specifically for IP-based geolocation: https://docs.djangoproject.com/en/2.2/ref/contrib/gis/geoip2/

But there’s also: GeoDjango – GeoDjango intends to be a world-class geographic Web framework. Its goal is to make it as easy as possible to build GIS Web applications and harness the power of spatially enabled data. The official documentation for this can be found here: https://docs.djangoproject.com/en/2.2/ref/contrib/gis/

As you research geolocation and Django, you’ll realise that to create a proper functional app, you’ll need some sort of database to keep track of things too.

For instance, you could use a spatial database (PostgreSQL and PostGIS) to get benefits from the spatial features and easily implement location-aware web apps.

A term you’ll come across a lot is: GIS. GIS stands for Geographic Information System. It’s an information system (an organized system for the collection, organization, storage, and communication of information) designed for processing and manipulating data that have geographic or spatial characteristics.

GeoDjango also provides Python bindings to popular spatial libraries such as GEOSGDAL, and GeoIP, which can be used separately without Django in any Python application or interactively in the shell.

GeoDjango basically makes it easier to work with geospatial data, in other words data that identifies the geographic location of natural or artificial features on Earth and is stored as coordinates and topologies.

Finally, a few more useful notes: GeoDjango integrates beautifully with the Django ORM and provides a set of geometry fields defined by the Open Geospatial Consortium (OGS) that can be used to map to different types of geometries in geospatial databases: It is handy to be aware of the following, that you’d come across a lot if working with GeoDjango.

What essentials do I need to work with GeoDjango?

To be able to work with GeoDjango, you’ll need to have two things: a spatial database and geospatial libraries. A spatial database is a database that is optimized for storing and querying data that represents objects defined in a geometric space.

A good geolocation and Django tutorial: – The following tutorial will point you in the right direction, with some great explanations, for creating a location based app with geodjango.

https://realpython.com/location-based-app-with-geodjango-tutorial/

As ever, it’s useful to have a look at projects actually created using GeoDjango, and here are a few links that may be useful.

Projects – Downloads

GeographicAdmin – A sample project that highlights the Django Admin and Databrowse with GeoDjango spatial features (embedded OpenLayers maps) is available, with installations instructions at GeographicAdminQuickStart. This project includes sample data.

GeoIP – A sample project that utilizes the GeoDjango wrapper around the GeoIP python library, used for mapping IP addresses is available, with installations instructions at GeoIPQuickStart.

For further offerings look in SVN Trunk to see what is available.

Finding relevant Github projects

When starting out, a valuable thing to do is to find sutiable or similar Github projects (ideally open source) that you can play with. It isn’t wholly obvious how to do an advanced search, so I’ve included a link below.

https://github.com/search/advanced

Use the search options and filters to look for exactly what you want. Specify the language and Django version too, if necessary.

Two interesting projects I found are linked to below. One, a content management system and the other a simple chatroom.

Chat app: https://github.com/narrowfail/django-channels-chathttps://github.com/narrowfail/django-channels-chat
Mezzanine -content management system: https://github.com/stephenmcd/mezzaninehttps://github.com/stephenmcd/mezzanine

Django – instant communication apps

Everything on the web is moving toward instant communication With the rise of whatsapp and facebook messenger, it’s the new way we ‘speak’. Before long, you’ll realise that there’s a Django-equivalent for whaetver it is you want to do.

On looking up chat and Django, I came across Django Channels, for which the official documentation is here: https://channels.readthedocs.io/en/latest/

Channels is a project that takes Django and extends its abilities beyond HTTP – to handle WebSockets, chat protocols, IoT protocols, and more. It’s built on a Python specification called ASGI

There’s an interesting youtube video which shows the potential of this sort of technology in a uber-type website with instant messaging and geolocation enabled.

Additionally, this is the result (see web link below) when you follow the Django Channels tutorial and set up a chatroom.

https://django-channels-example.herokuapp.com/

Understanding Github

To the beginner github, which most developers will assume you just use and understand, can appear quite confounding. You’ll hear words like ‘repository’, ‘commit’, ‘pull’ and as complicated as it sounds, it would only take an hour or so of your time to set up an account and completely understand how it all works.

Github is a version control system. That’s a fancy phrase for simply having some code, and keeping track of the changes you make to it. This is handy, especially for big projects, and if you are collaborating. It’s also useful for deployment (which is the word we use for when you’re actually making your project live on the web and putting it on a web host (more on those in a future post).

The best tutorial I’ve come across on Github is produced by Github itself.

https://guides.github.com/activities/hello-world/

Check out the link and tutorial above and start by following the instructions to create (and play with) your own repository.

Once you understand the basic terminology you can look into doing the same actions using a shell.

First steps with Django – what you’ll learn

The learning curve with Django, if you’re an absolute beginner, will leave you buzzing with various key words. Among them: virtual environment, MVC framework, models, views, routes, server, browser, local host, requests, return, rendering, URL. If you’re reading these words and none of them make any sense, you need to start by understanding some of the basics. Look up each word, and briefly read about it to gain a context.

In essence though, try and understand this.

Your end goal, when creating a website, is for a user to go to a URL (e.g. they type in http://www.learndjango.io) and on doing so ‘view’ your page, right? If you want it to be even more exciting, you’d hope your web page would allow some sort of interactivity.

On your end – as the web developer – you (1) set up the route to allow this to happen (this is defined in the urls.py files). (2) You also define the views – these basically return the view (often an html file) which the user will see in the browser. (3). Obviously, if the view is an html file, you need to create the html file! HTML files today are rarely on their own. They come accompanied by style sheets (CSS) which make them pretty and create custom formatting. (4) if you want your website to actually do something, you would need to have a database and define it using the models.py file. Finally, (5) you’d need to work with forms, and in order to do so you’d need to learn Django Templating Language (an odd sort of language that involves curly brackets and the percent sign) and this goes in your html forms to create the necessary logic to allow for interactivity and drawing from your database.

Getting started with Django: Get to know the command prompt!

Getting ready to get ready to get ready to get started ….

You probably don’t remember being three years old and learning to read, but getting started with Django is a bit like that! One does not simply begin to read – we first go through the decomposition stage (unconsciously perhaps) and realise the things that we need to know before we can begin to learn to read. For instance, the magic that is reading cannot happen unless you know the alphabet, have some understanding of phonics, and so on.

What is Django?

So … Django. Django is the ‘web framework for perfectionists with deadlines”. Essentially, it will allow you to work with the python programming language 9and this is the distinguishing feature) in the back-end and make a beautiful website that everyone can interact with. Everyone’s talking about Django and sites like Instagram were built using it. Job sites confirm that Django is on the rise, offering freelancers and contract developers in excess of £500 per day for Django related skills.

The thing no one explicitly tells you need to know …

So what do you need to know to get started? Well, the truth is, for the absolute absolute beginner, you can get started with knowing very little python (or none at all) and again, the most basic understanding of the front-end (html and css). What you do need to know however, is how to use the command line prompt (CMD). No one tells you this explicitly and an absolute beginner gets lost (or scared) trying to get past this first stage, which is actually very simple. So first things first, before you head over to the official Django tutorial, understand what the CMD is, how it works and a few basic commands. The rest will follow ….

What is the Command Prompt?

The Command Prompt in Windows provides access to over 280 commands! These commands are used to do certain operating system tasks from a command line interface instead of the graphical Windows interface we use most of the time.

For example, Command Prompt commands let you copy data to a different folder, format an entire disk, back up your files, send messages to other computers, restart your own computer, and much more. There are also several Command Prompt tricks and hacks that utilize some of these commands. 

Here’s a great site to get started: https://www.lifewire.com/command-prompt-tricks-and-hacks-2618104

A little about us …

A little about us: I’ve  been teaching Computing for 10+ years, co-founded and run the following sites, http://www.teachyourselfpython.com, http://www.teachingcomputing.com and http://www.testandtrack.io and continue to teach in a secondary school in the UK. My husband works part time as a lawyer and part time as a preacher, and together we spend our spare time running businesses and learning new things. Python is wonderful, and finding out about Django was even more wonderful. This blog will detail our journey with Django and web development, including information about absolutely everything you may need to know if you’re thinking of starting an online business or creating the next big app ………..