Storage engines:
Engines--> the word is kind of synonymous with motors in vehicles.First i want you to know that there are many storage engines. For example, mysql has
- innoDB
- BDB
- myISAM
- Heap
- archive etc...
- myISAM is the default storage engine for a table in mysql. It allows fast storage and retrieval. myISAM type tables are non transactional in nature.
- innoDB is another popular storage engine which is used to create transaction safe tables. It also supports foreign key referential constraints.
- Archive is used when large amount of data is to be stored which doesn’t need indexing and are not retrieved often. They leave only small footprint (memory footprint--> amount of RAM used)
For those wondering what is transaction safe, here is the explanation
Transactional vs Non-transactional:
Transaction--> a single logical operation on data. It is to be noted that it's not a single database operation instead it's a single logical operation !
For example, booking a ticket is a logical operation. It consists of database operations like updating status of a ticket, deducting money in your account and then adding it to the railway’s account.
A table created by a storage engine which ensures reliable transactions are called transactional tables. Tables which do not guarantee reliable transactions are called non-transactional tables.
For reliable transactions ACID property is to be followed,
- A - Atomicity: If any one part of transaction fails then the whole transaction must fail.
- C - Consistency: The database should always remain at some valid state. In other words a transaction must change the database from one valid state to another.
- I - Isolation: Concurrent execution and serial execution must result in same final state.This is the most relaxed property and is executed using locks.
- D - Durability: A completed transaction should remain completed even if the database crashes after the completion of the transaction. In other words a completed transaction must be permanent.
It means creation of new data structure for reading records of a database faster. Any column or combination of columns can be indexed. It mostly involves creation of B-trees in the case of indexing integers. (B-trees allow retrieval of any data in logarithmic time)
Indexing has its disadvantages like,
- occupies large space
- makes insertions and updations costlier (since indexes have to be modified each time)
My experience:
Today we discussed as a team about creating a database schema for a real time application. The application chosen was a social interaction app. It has 6 main modules namely,
- messaging - both private and public
- feed
- groups
- file sharing
- events
- posts
I was allotted the messaging module.
The messaging module we designed.
Note: We- me and my partner.
We had two choices. - To have one single table where all messages could be stored. or
- One table for each user having his messages alone.
Can u guess the reason why it is bad?
It's because when we have 1000s of users accessing their messages, 1000 tables should be opened. This puts enormous amount of pressure on the memory.
So we decided to go for one table approach.
After many minutes of discussion we came up with an idea of following schema,
Table name: messages
Attributes: msgid (primary key) , msgtext, datetime, sender_id, receiver_id, type
Table name: deleted_msgs
Attributes: msgid (foreign key referencing msgid of messages table) , user_id (foreign key referencing user_id of user table)
Table name: multicast_msgs
Attributes: msgid (foreign key referencing msgid of messages table) , user_id (foreign key referencing user_id of user table)
idea was,
receiver id has
- 0 if the message is to be broadcasted,
- -1 if the message is sent to a many people
- user_id/group_id if the message is sent to a specific person or group
- 0 if the message is for a user
- 1 if the message is for a group
Problems faced:
1. on deletion of a particular message from a user’s inbox the message will also get deleted from sender’s sent box.
A new table deleted_msgs was introduced to solve this problem. deleted_msgs table will be updated if a user deletes a msg from his inbox or sent box. The particular msg_id and the user_id will be inserted into deleted_msgs table. So when retrieving messages for a user it will be checked if that particular message is marked deleted for that user.
2. When sending message to multiple persons each person has to be informed , to whom else the message has been sent.
A new table multicast_msgs was introduced to solve this problem. New entries will be inserted in the multicast_msgs table if a user sends one message to more than one person. The msg_id along with the id of the person whom the message is intended for will be inserted into this table.
This avoids duplication of the message content. At the same time all recipients of a particular message can be obtained from this table.
Now the most queried columns (columns after WHERE clause) would be sender_id , receiever_id and type.
Index for sender_id is created separately , Combined index of receiver_id and type is then created.
In other two tables user_id and msg_id are indexed separately.
So that’s it for today. You can see whether the schema for messaging module is correct in tomorrow’s post. Thanks for reading. See you in next post.
Trained @ Sourcebits
No comments:
Post a Comment