Search This Blog

Friday, July 20, 2012

DBMS day-4 working with tables

Today i will be explaining some DBMS syntax and concepts based on my experience with mysql.
Read my previous post before reading further to have better understanding.
So for the messaging module we have created a good schema. Now it's time for creating tables and inserting values into them.

Creating tables:

create table messages (msg_id int auto_increment primary key, msg_text text,sender_id int, sent_time datetime not null, sent_type enum('public','private') default 'private',delete_status boolean  default 0,foreign key(sender_id) references users(user_id)) type='innodb';

create table messages_users( msg_id int , rcvr_id int,read_status boolean default 0, delete_status boolean default 0,foreign key(msg_id) references messages(msg_id), foreign key(rcvr_id) references users(user_id)) type='innodb';

the above two statements create messages and message_users table with required properties.
It is to be noted that unless you use innoDB as storage engine you can’t use foreign key constraints !

Inserting values:

insert into messages values (default,'Hi Raj',1,current_timestamp,'private',default);

Here note the usage of the word default. It's usage allows the default value for that field to be substituted automatically.
And current_timestamp automatically gets the current date and time and inserts in the field having type datetime.

Here i give description about the two tables am querying,

messages table:
+---------------+--------------------------+------+-----+---------+----------------+
| Field         | Type                     | Null | Key | Default | Extra          |
+---------------+--------------------------+------+-----+---------+----------------+
| msg_id        | int(11)                  | NO   | PRI | NULL    | auto_increment |
| msg_text      | text                     | YES  |     | NULL    |                |
| sender_id     | int(11)                  | YES  | MUL | NULL    |                |
| sent_time     | datetime                 | NO   |     | NULL    |                |
| sent_type     | enum('public','private') | YES  |     | private |                |
| delete_status | tinyint(1)               | YES  |     | 0       |                |
+---------------+--------------------------+------+-----+---------+----------------+


messages_users table:

+---------------+------------+------+-----+---------+-------+
| Field         | Type       | Null | Key | Default | Extra |
+---------------+------------+------+-----+---------+-------+
| msg_id        | int(11)    | YES  | MUL | NULL    |       |
| rcvr_id       | int(11)    | YES  | MUL | NULL    |       |
| read_status   | tinyint(1) | YES  |     | 0       |       |
| delete_status | tinyint(1) | YES  |     | 0       |       |
+---------------+------------+------+-----+---------+-------+


RETRIEVE ALL PUBLIC MESSAGES:

select first_name,msg_text,sent_time from messages inner join users on messages.sender_id=users.user_id and sent_type='public';

The above query uses inner join. Usage of inner queries or nested queries is not preferred since mysql makes good optimisations when using joins. So usage of join (like the above query) is better.

RETRIEVE ALL ALIVE MESSAGES(messages that are not deleted):

select msg_text, sent_time from messages inner join messages_users on messages.msg_id=messages_users.msg_id and messages.delete_status=0 and messages_users.delete_status=0;




USING LEFT JOIN (selects public messages also) :

select msg_text from messages left join messages_users on messages.id=messages_users.msg_id where messages_users.msg_id is NULL or (messages.delete_status=0 and messages_users.delete_status=0);

RETRIEVE ALL USER MESSAGES:


(all messages except public messages)
select msg_text from messages inner join messages_users on messages.msg_id=messages_users.msg_id and (sender_id=2 or rcvr_id=2);

(including public messages)

select msg_text from messages inner join messages_users on messages.msg_id=messages_users.msg_id and (sender_id=4 or rcvr_id=4) or sent_type='public' group by messages.msg_id;




USING LEFT JOIN


select msg_text from messages left join messages_users on messages.id=messages_users.msg_id WHERE messages_users.msg_id is NULL or (messages.user_id=2 or messages_users.user_id=2);

(only received with public messages using left join)


select msg_text from messages LEFT join messages_users on (messages.id=messages_users.msg_id) where messages_users.msg_id is NULL  or messages_users.user_id=1;

Select all read messages and unread messages user_specific.

select msg_text from messages inner join messages_users on messages.msg_id=messages_users.msg_id and messages_users.read_status=0 and rcvr_id=1;

This query gets all unread/read msgs of an user.

Receiver has read ur messages or not?

select msg_text,read_status from messages inner join messages_users on messages.msg_id=messages_users.msg_id and sender_id=1;

Conversation:

select msg_text,sent_time as Time from messages inner join messages_users on messages.msg_id=messages_users.msg_id and (sender_id between 1 and 2) and (rcvr_id between 1 and 2) order by sent_time;

Select messages for today alone:

select count(*) from messages where sent_time rlike current_date;



The above queries can be taken as examples for learning joins, and clause , or clause , count , NULL , NOT NULL , like , order by and few other database concepts.


I will explain more concepts which are not dealt in examples above here,


LIMIT --> used to limit the number of records being selected.
For example, select * from table1 limit 10;
This will display only first 10 records of the table.


OFFSET --> used to set the starting point for limit.
For example, select * from table2 LIMIT 10 OFFSET 5
This will display 10 records after the 5th record.


It is to be noted that , select * from table1 LIMIT 5,10 does the same as above example. (notice the change in order of numbers!)


UNION --> Union combined result set of two sql queries.
For example, select * from t1 union select * from t2.
For this to happen t1 and t2 must have same number of columns and the data type of columns from t1 should match with t2.
It is to be noted that union automatically removes duplicate values. Only distinct values are shown.

SUB QUERIES --> They are very helpful in certain situations. It involves usage of select query inside some other query.

For example, You have a employee table and club member table. You need to find the employees who are member of the club.
select * from employee where employee_id in (select id from club);
This query does the job easily.
However it is not advised to use sub queries when other methods are possible since they are inefficient. The above query can be replaced by using a single join query.


That’s it for today see you in next blog !


Trained @ Sourcebits

Thursday, July 19, 2012

DBMS day-3


Today i would tell you about how our database schema got shaped at the hands of an experienced person and other basic operations in dbms ! Read previous post of mine before reading this post for better understanding.

Relations between tables are important:

  • It's very important to understand how tables are related. In previous post i have explained about the database schema me and my partner created. We didn’t look into relations deeply. That created some problems.
  • The messages table is related with user table. A user can create many messages and a message can be sent or viewed by many users.
  • So the relation is many to many here.
  • When we have many to many relationship we need to have some intermediate table for proper database design.
  • The concept of sending messages to groups has been removed to make things simple.

New database schema designed by guidance:

Table name: messages
Attributes: msg_id, sender_id, msg_text, datetime, send_type, deleted_status

Table name: user_messages
Attributes: msg_id, receiver_id, delete_status

the table user_messages acts as an intermediate table between messages and user.

send_type - public/private. Public messages can be viewed by everyone.  
Now we have two simple tables instead of three. This seems a good database schema design.

Note: if many to many create intermediate table

The other modules of the application we are designing were also designed based on the relationship among tables.

Some basic operations and concepts in database:

create table messages (
msg_id int auto_increment primary key,
msg_text text,
created_time datetime not null,
send_type enum('public','private') default 'private',
delete_status bool default false );


alter table messages add sender_id int references user(user_id) after msg_id;

create table user_messages (
msg_id references messages(msg_id),
user_id references user(user_id),
delete_status bool default false,
primary key(msg_id,user_id)
);
  
From the above sql statements,
you can get an idea about create table,alter table, primary key,foreign key, auto_increment, text, not null, default etc..
Explaining about each syntax is not necessary. There are many good sites for reading about those concepts.


Thanks for reading this post of mine. See you in next blog entry !



Trained @ Sourcebits

Wednesday, July 18, 2012

DBMS day-2


Today i will be explaining more concepts about dbms like storage engines, transactions and i will also tell about my experiences for the day.

Storage engines:
Engines--> the word is kind of synonymous with motors in vehicles.First i want you to know that there are many storage engines. For example, mysql has


  • innoDB
  • BDB
  • myISAM
  • Heap
  • archive etc...


  1. myISAM is the default storage engine for a table in mysql. It allows fast storage and retrieval. myISAM type tables are non transactional in nature.
  2. innoDB is another popular storage engine which is used to create transaction safe tables. It also supports foreign key referential constraints.
  3. Archive is used when large amount of data is to be stored which doesn’t need indexing and are not retrieved often. They leave only small footprint (memory footprint--> amount of RAM used)

For those wondering what is transaction safe, here is the explanation

Transactional vs Non-transactional:
Transaction--> a single logical operation on data. It is to be noted that it's not a single database operation instead it's a single logical operation !
For example, booking a ticket is a logical operation. It consists of database operations like updating status of a ticket, deducting money in your account and then adding it to the railway’s account.
A table created by a storage engine which ensures reliable transactions are called transactional tables. Tables which do not guarantee reliable transactions are called non-transactional tables.

For reliable transactions ACID property is to be followed,

  • A - Atomicity: If any one part of transaction fails then the whole transaction must fail.
  • C - Consistency: The database should always remain at some valid state. In other words a transaction must change the database from one valid state to another.
  • I - Isolation: Concurrent execution and serial execution must result in same final state.This is the most relaxed property and is executed using locks.
  • D - Durability: A completed transaction should remain completed even if the database crashes after the completion of the transaction. In other words a completed transaction must be permanent.
Indexing:
It means creation of new data structure for reading records of a database faster. Any column or combination of columns can be indexed. It mostly involves creation of B-trees in the case of indexing integers. (B-trees allow retrieval of any data in logarithmic time)
Indexing has its disadvantages like,

  • occupies large space
  • makes insertions and updations costlier (since indexes have to be modified each time)

My experience:
Today we discussed as a team about creating a database schema for a real time application. The application chosen was a social interaction app. It has 6 main modules namely,

  • messaging - both private and public
  • feed
  • groups
  • file sharing
  • events
  • posts
Apart from this the user module must also be designed.
I was allotted the messaging module.

The messaging module we designed.

Note: We- me and my partner.
We had two choices.
  1. To have one single table where all messages could be stored. or
  2. One table for each user having his messages alone.
We soon found out that creating one table for each user is a very bad approach and shouldn’t be followed at all.
Can u guess the reason why it is bad?
It's because when we have 1000s of users accessing their messages, 1000 tables should be opened. This puts enormous amount of pressure on the memory.
So we decided to go for one table approach.
After many minutes of discussion we came up with an idea of following schema,

Table name: messages
Attributes: msgid (primary key) , msgtext, datetime, sender_id, receiver_id, type

Table name: deleted_msgs
Attributes: msgid (foreign key referencing msgid of messages table) , user_id (foreign key referencing user_id of user table)

Table name: multicast_msgs
Attributes: msgid (foreign key referencing msgid of messages table) , user_id (foreign key referencing user_id of user table)

idea was,
receiver id has

  • 0 if the message is to be broadcasted,
  • -1 if the message is sent to a many people
  • user_id/group_id if the message is sent to a specific person or group
type has
  • 0 if the message is for a user
  • 1 if the message is for a group

Problems faced:
1. on deletion of a particular message from a user’s inbox the message will also get deleted from sender’s sent box.

A new table deleted_msgs was introduced to solve this problem. deleted_msgs table will be updated if a user deletes a msg from his inbox or sent box. The particular msg_id and the user_id will be inserted into deleted_msgs table. So when retrieving messages for a user it will be checked if that particular message is marked deleted for that user.

2. When sending message to multiple persons each person has to be informed , to whom else the message has been sent.

A new table multicast_msgs was introduced to solve this problem. New entries will be inserted in the multicast_msgs table if a user sends one message to more than one person. The msg_id along with the id of the person whom the message is intended for will be inserted into this table.
This avoids duplication of the message content. At the same time all recipients of a particular message can be obtained from this table.

Now the most queried columns (columns after WHERE clause)  would be sender_id , receiever_id and type.
Index for sender_id is created separately , Combined index of receiver_id and type is then created.
In other two tables user_id and msg_id are indexed separately.

So that’s it for today. You can see whether the schema for messaging module is correct in tomorrow’s post. Thanks for reading. See you in next post.



Trained @ Sourcebits

Tuesday, July 17, 2012

Introduction to DBMS day-1


Dbms introduction - audio blog
DBMS- Whenever i hear this word i don’t feel comfortable within. It seems some complex system. ( It's certainly not one of my strong areas !)
Truth is it's just a software package ( Windows xp is an example of software package ) . The function of DBMS software package is to maintain databases.
So what are databases then?

Database:
It's just collection of data in digital form.
Without databases it's difficult to imagine how the world can function. If railway network has no database ( ie no digital data storage ) then operating one train per day would be some big achievement.

Dbms vs Rdbms:
In DBMS we can store and retrieve data using queries but there is no relation between tables.
However in RDBMS (Relational DBMS) relations can be formed between tables and it's inturn maintained by the system. Because of this reason RDBMS doesn’t accept flat file database ( example for flat file--> one txt file having one record per line). We mostly use RDBMS since most applications require forced relation between tables in the form of foreign keys.

Mysql:
MySql is a RDBMS. It's an open source software package.



My experience:
After discussion about dbms it was time for me to have some hands on experience. I have earlier worked on Oracle’s dbms. Now i tried installing mysql in my system running Ubuntu OS. The installation was easy and smooth.One simple command,
sudo apt-get install mysql-server
installed mysql server in my system.
So now i had the software package,
Opening it was simple.. typing mysql -u user_name -p opened mysql and prompted for your password.(Security !! ). On entering the password which i set during the installation, mysql opened up.
It was my playground then , I created databases , dropped(deleted) them, created , updated , altered , dropped tables inside a database.
So i got a feel about mysql (same feel i got with oracle dbms too).

In coming days i will work on some real time application and data and post about them.
See you in next blog.



Trained @ Sourcebits