REVIEW:
In the Database 101 post we discussed how databases are used for tracking and managing objects using records. Sometimes those records are digital objects like .pdf documents stored directly in the database file. Sometimes the records in the database simply hold metadata about objects such as employees who exist outside of the database file. We also learned about how databases store the records in a file on disk but use smaller sorted indexes stored in RAM for fast searching. Indexes contain only one small piece or metadata as well as the location of the complete record on disk that the index refers to.In this post we will discuss two acronyms that stand for properties that databases demonstrate called CRUD and ACID.
CRUD:
CRUD is an acronym for Create, Read, Update and Delete. These are the four basic operations that the DBMS must allow on the records in the database. It may seem obvious that if you have a collection of records for managing objects, you may want to:1. create new records
2. read back records that you stored in the database file
3. update those records when some property of the tracked object changes
4. delete a record for an object that you no longer want in the collection.
An example of CRUD for an database of employee records would be the following. You will want to CREATE a record for each employee in the company. Issue a query to READ back all employee records who work in a particular office. UPDATE an employee record when their location moves to a new office as well as DELETE employees when the leave the company.
SMITH indexes point to SMITH record locations |
Databases files by themselves simply store records. It's the DBMS application that presents an interface to the user to display and modify those records. For relational DBMSes there is an agreed upon method or language for how the users communicates with the DBMS. One such language is SQL or Structured Query Language. As you can imagine the language has words for all four basic CRUD operations.
Example of SQL statements |
MULTIPLE USERS AND CONCURRENCY:
If you are creating a database that only has one user who issues SQL statements one at a time, your database implementation would be quite simple. However, most DBMSes are the backend record storage for front-end applications that have multiple users modifying records at the same time. This presents several problems that ACID attempts to define and solve.ACID:
ACID stands for Atomicity pronounced "Atom-miss-city", Consistency, Isolation and Durability. Let's look at each one-by-one.A brief explanation follows:
- Atomicity - pronounced "atom-miss-city". By "atomic" we mean "as a single atom or unit". This is a method of grouping data together as single unit so that the atomic unit of data is either entered into the database or not. It's either "All or nothing" as partially updating a record in the database can have bad results in real-life. Imagine you deposit $1M dollars into your bank account but the deposit transaction only updated the database with your account number but not the deposit amount. You would be very unhappy if you deposited $1M but the database transaction to record that failed before increasing your balance. If you were the bank teller who was using the software application that allows for entering a check deposit, you need instant application feedback about the success or failure when you hit submit after entering such things as the account number and deposit amount. The teller needs confirmation that the transaction was entire transaction completed successfully or failed completely and nothing about that deposit was added to the database. When failure happens for any reason because the transaction was an all or nothing the teller is free to attempt the transaction again without fear that he or she is adding a second $1M deposit.
- Consistency - can be thought of as... will an update to a database record field leave the database in a good or consistent state? Imagine if you designed your US customer database to store a customer's phone number. Here in the US, we use a 10 digit phone number that consists of a 3 digit area code followed by a 3 digit local prefix and a 4 digit number to uniquely identify a phone endpoint in that locality. For example (605-475-6968). Imagine if a request to create a record with only a four digit number, 6968, came into the database. This erroneous piece of data would leave that particular record in a logically in-consistent or corrupted state.
- Isolation - Because databases allow multiple updates to be occurring at the same time, each transaction must be isolated from the others to prevent unwanted side-effects of transactions "stepping" on one another. An example of when this can be tricky is if a database were to delete a record and before the associate index could be removed, a second transaction to find records matching a certain criteria found the index for that record and went to retrieve the record that was just deleted. This would cause an error because the two transactions stepped on one another in time.
- Durability - Once a transaction is stored in the database it must be durable in the sense of it will exist even in the face of a computer power outage (reboot). Modern databases often store the received data in RAM due to it's speed matches that of the CPU. However, RAM is volatile if power is lost. To be durable the transaction is often written to a database log file before being acknowledged as "written" back to the application which asked to store data.
TAKEAWAYS:
Modern DBMSs store metadata about entities that front-end applications submit using SQL. DBMSs store their data in a database file for persistence and provide abilities such as query, indexing, CRUD, ACID and concurrency. In future posts we will discuss aspects of different databases in terms of CRUD and ACID.