๐ ๏ธ Tools #007 - SQL 01 - Introduction to Databases & SQL
Why learn SQL? Why not just Excel? Won't AI do it for me? Types of DBs
Hello people,
Welcome back! Today we are starting a new tool in our TOOLS series & our eyes are set on SQL! But why SQL you ask?
Well to start with if you are an old timer at the TheWeekendFreelancer, you might remember this edition -
where I went analyzed job posted in USA for Analyst roles to see which roles are in high demand & guess which was the tool/skill required for almost every job ? ๐
SQL (Structured Query Language) really serves as the cornerstone of a successful career in analytics, playing a pivotal role in accessing, manipulating, and analyzing data stored in databases. With the exponential growth of data in today's digital age, organizations rely heavily on SQL to extract valuable insights from vast datasets.
By mastering SQL, individuals can unlock a wide range of career opportunities and contribute significantly to the success of organizations in today's data-driven world.
Letโs talk about the big elephant in the room as well!
With AI knocking on my door, Do I still need to learn SQL?
The short & sweet answer is - SQL remains an essential skill despite advancements in AI. While AI offers automation and efficiency, it lacks the nuanced judgment, creativity and adaptability of humans.
However, rather than seeing AI as a threat, it can be leveraged to enhance productivity in SQL development. By combining AI's automation capabilities with human expertise, developers can tackle more complex tasks, innovate and solve problems more effectively, ultimately leading to greater productivity and efficiency in data management.
I will give an example from my career in fact. In Early 2023 when I started using ChatGPT, At the same time I was building a dashboard in Quicksight which is Amazonโs BI tool & we had other analysts in the team building dashboards for the teams they worked for. The flow of the work would be extraction of data from AWS database using SQL. Creating custom tables, metrics using SQL that would be used for visualization. I was working with relatively large amount of data (500k+ rows). Thanks to ChatGPT I was able to significantly reduce the time to create SQL queries and in the end I reached the visualization stage in almost 70 % less time than other analysts which was probably the first time I realized the potential impact of AI in life of an Analyst.
My point is that in my team I was not the most experienced with SQL but AI allowed me to save time in looking for syntaxes and functions that I never used before, which overall helped the project move faster. Everyone in the team was impressed and I got to give demos using AI to help with daily tasks at my job.
The not stretch this point too much. If you want AI to help you become more productive with SQL, you need to know how SQL works!
Letโs move on to the meat & potatoes of the edition!
What are Databases ?
Databases are structured collections of data that are organized in a way that enables efficient storage, retrieval, and management of information. They serve as central repositories for storing and managing data in various formats, making it easier to access and manipulate information as needed.
Hmmโฆ Thatโs fine but letโ s find a way to make sure the explanation sticks -
Imagine you have a digital library filled with books, each book containing different kinds of information. Now, think of a database as that digital library, but instead of books, it stores data in organized tables.
In a database, each table is like a virtual spreadsheet with rows and columns. Each row represents a single record, like a book in the library, and each column holds a specific type of information, such as a title, author, or publication date.
Just like in a library where books are sorted by genre or author, databases organize data into tables based on its type or category. This organization makes it easy to search, retrieve, and manipulate information efficiently.
But why do we need Databases? Canโt we just use Excel?
Wellโฆ
While Excel is a versatile tool for organizing and analyzing data, it falls short when used as a database for several reasons:
Limited Scalability: Excel sheets have size limitations, making it challenging to handle large volumes of data efficiently. As your data grows, Excel can become slow, prone to errors, and difficult to manage.
Excel has a limit of 1,048,576 rows and 16,384 columns per sheet.
Data Integrity Risks: Excel lacks robust data validation features, increasing the risk of data entry errors, duplication and inconsistencies. Without proper controls in place, maintaining data integrity becomes a significant challenge.
Limited Security Measures: Excel files are susceptible to unauthorized access, accidental deletion and version control issues. Unlike dedicated database management systems, Excel offers limited security features, putting sensitive data at risk.
Lack of Collaboration Features: Collaborating on Excel files can be cumbersome, especially when multiple users need simultaneous access. Version control becomes problematic, leading to confusion and potential data loss.
Limited Automation: Excel's automation capabilities are limited compared to dedicated database management systems. Performing complex calculations, generating reports and automating workflows are more challenging and prone to errors in Excel.
So no Excel as a database please! ๐
Types of Databases
Each type of database has its unique features and advantages, catering to specific use cases and data management requirements. Here are the main types -
Relational Databases:
Explanation: Relational databases organize data into tables with rows and columns, and establish relationships between datasets using foreign keys.
Key Feature:
ACID which is a set of properties that ensure database transactions are reliable and maintain data integrity:
Atomicity: Atomicity ensures that each transaction is treated as a single unit of work, which either completes entirely or fails entirely. In other words, if one part of the transaction fails, the entire transaction is rolled back, leaving the database in its original state.
Consistency: Consistency ensures that only valid data is written to the database. This means that transactions must adhere to all defined rules and constraints, such as foreign key constraints, uniqueness constraints, and data type constraints. If a transaction violates any of these rules, the database will reject the entire transaction, maintaining the integrity of the data.
Isolation: Isolation ensures that the concurrent execution of transactions results in a state that is equivalent to executing them serially, one after the other. This prevents interference between transactions that are executing concurrently.
Durability: Durability guarantees that once a transaction is committed, its changes are permanently stored in the database and will survive system failures such as power outages or crashes.
Document Databases:
Explanation: Document databases store and retrieve data in JSON or BSON format, providing flexibility in schema design and accommodating evolving data structures.
JSON (JavaScript Object Notation) is a lightweight data interchange format commonly used to transmit data between a server and a web application. JSON is widely adopted due to its simplicity, flexibility, and compatibility with many programming languages.
BSON (Binary JSON) is a binary-encoded serialization format used primarily for storing and transmitting data in MongoDB, a NoSQL database. Unlike JSON, which uses plain text, BSON represents JSON-like data in binary form, offering benefits like improved storage efficiency and faster data access.
Key Feature: Flexible schema allows for dynamic data models, making them suitable for content management systems, product catalogs, and real-time analytics.
Graph Databases:
Explanation: Graph databases represent data as nodes and edges, allowing for efficient storage and retrieval of complex relationships between entities. Nodes represent entities, such as people, places, or things, while edges denote the relationships or connections between these entities. Unlike traditional relational databases, where relationships are typically represented through foreign keys or join tables, graph databases store relationships directly as part of the data model.
Key Feature: Native graph storage and traversal capabilities enable the modeling of intricate relationships, making them ideal for scenarios like social networks, fraud detection, and recommendation systems.
Analytical Databases:
Explanation: Analytical databases are optimized for storing, querying, and analyzing large volumes of data for business intelligence and decision-making purposes.
Key Feature: Columnar storage and optimized query processing support complex analytical workloads, facilitating tasks such as data warehousing, business intelligence, and data mining.
Key-Value Stores:
Explanation: Key-value stores store data as simple key-value pairs, offering fast retrieval and storage of unstructured or semi-structured data.
Key Feature: High-speed data access and scalability make them suitable for use cases like caching, session management, and real-time analytics.
Columnar Databases:
Explanation: Columnar databases store data in columns rather than rows, optimizing query performance and compression, especially for analytical workloads.
Key Feature: Columnar storage reduces I/O operations and improves compression, enhancing query performance for analytical tasks such as data warehousing and business intelligence.
Here is a quick snapshot for you with condensed information related to types of databases, related DBMS & the languages used to query data -
Letโs get equipped with the right tools
To get the best worth of this series, I would recommend highly to install a DBMS (Database Management System), a software that allows users to interact with databases, facilitating tasks such as data storage, retrieval, modification, and administration. DBMSs come in various types, each tailored to different data storage and retrieval needs as you can see in the table above. For this series we are going to primarily focus on Relational DBMS for which our options are-
MySQL, PostgreSQL, Oracle Database or Microsoft SQL Server.
Here is a table given by Asfaw Gedamu in his article on Linkedin where he compares the 4 choices -
I would recommend installing any of the open source DBMS of your choice. From next edition, we are going to start diving into the concepts of SQL.
I hope you are as excited as I am for this new journey! Let me know your thoughts or any particular request you might have related to learning SQL.
Raghunandan ๐ฏ
P.S. - โThe Weekend Freelancerโ is a reader backed publication. Share this newsletter with your friends and relatives & consider becoming a free or paid member of this newsletter. Every subscription gives me an extra ounce of motivation to keep going! ๐ช