👩🏼‍🏫 Teaching technical Skills

Oct 24, 2023·
Armstrong Foundjem
Armstrong Foundjem
· 9 min read

This blog page written in Org-mode teaches data science skills to computer science students, with code snippets and mathematical reasoning included:

Basic skills in Data Science

  • Data Structures

1. Arrays

An array is one of the simplest data structures, but it’s fundamental in most computer science problems. It’s a collection of elements, each identified by an index or a key.

Here’s how you would declare an array in Python:

# Creating an array of integers
arr = [10, 20, 30, 40, 50]

# Accessing an element
print(arr[2])  # Outputs: 30

The time complexity for accessing an element is O(1), meaning it takes constant time to get an element given its index.

2. Linked Lists

A linked list is a linear data structure where elements (called nodes) are connected using pointers. Unlike arrays, linked lists do not store elements in contiguous memory locations.

Here’s how a simple singly linked list would look in Python:

class Node:
    def __init__(self, data=None):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
    
    def append(self, data):
        new_node = Node(data)
        if not self.head:
            self.head = new_node
        else:
            last = self.head
            while last.next:
                last = last.next
            last.next = new_node
    
    def display(self):
        current = self.head
        while current:
            print(current.data, end=" -> ")
            current = current.next
        print("None")

# Creating a linked list and appending values
ll = LinkedList()
ll.append(10)
ll.append(20)
ll.append(30)
ll.display()

3. Stacks and Queues

Both stacks and queues are abstract data types that store elements. They differ in how elements are inserted and removed.

  • Stack: Last-In-First-Out (LIFO)
  • Queue: First-In-First-Out (FIFO)

Here’s an example of a stack implemented using Python:

stack = []

# Push elements onto the stack
stack.append(1)
stack.append(2)
stack.append(3)

# Pop an element
print(stack.pop())  # Outputs: 3

For queues, Python’s collections.deque is ideal because it allows O(1) operations for appending and popping:

from collections import deque

queue = deque()

# Enqueue elements
queue.append(1)
queue.append(2)
queue.append(3)

# Dequeue an element
print(queue.popleft())  # Outputs: 1
  • Algorithms

1. Sorting Algorithms

Sorting is a fundamental problem, and there are several algorithms used for sorting elements in an array. Let’s take a look at two well-known sorting algorithms: Bubble Sort and Merge Sort.

Bubble Sort (O(n²) time complexity):

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

arr = [64, 34, 25, 12, 22, 11, 90]
bubble_sort(arr)
print(arr)

Merge Sort (O(n log n) time complexity):

def merge_sort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        left_half = arr[:mid]
        right_half = arr[mid:]

        merge_sort(left_half)
        merge_sort(right_half)

        i = j = k = 0

        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            k += 1

        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1

        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1

arr = [38, 27, 43, 3, 9, 82, 10]
merge_sort(arr)
print(arr)

2. Searching Algorithms

Binary Search (O(log n) time complexity) is used to find the position of a target value within a sorted array.

def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

arr = [1, 3, 5, 7, 9, 11, 13]
target = 5
print(binary_search(arr, target))  # Outputs: 2
  • Mathematical Reasoning

Understanding the time complexity of algorithms is essential for analyzing their efficiency. One of the most important concepts in this area is Big-O notation, which describes the upper bound of an algorithm’s runtime.

Let’s discuss Big-O using an example:

  • O(1): Constant time, meaning the operation will take the same time regardless of the size of the input.
  • O(n): Linear time, meaning the operation will take longer as the size of the input grows.
  • O(n²): Quadratic time, meaning the operation will take much longer as the size of the input increases.

Example:

# O(n)
def print_elements(arr):
    for elem in arr:
        print(elem)

# O(n²)
def print_pairs(arr):
    for i in arr:
        for j in arr:
            print(i, j)

In the second function, print_pairs, we see an O(n²) time complexity because we are iterating over the array twice.

  • Conclusion

In this post, we’ve covered essential data structures, algorithms, and mathematical reasoning concepts needed to excel in computer science. As an undergraduate, building a solid foundation in these areas will prepare you for more complex topics as you progress in your studies.

Stay curious, keep experimenting with code, and always challenge yourself to understand the underlying principles of the algorithms and data structures you encounter.

Happy coding!

Final Thoughts

Mastering technical skills in computer science is a continuous journey. Start by focusing on clean code, understanding data structures and algorithms, and applying mathematical reasoning. Don’t forget to engage in hands-on practice by building projects and solving real-world problems. Over time, you’ll gain a deeper understanding and will be better equipped for more complex challenges.

Good luck, and happy coding!


Additional Resources:


#+TITLE: Data Science for Computer Science Students: A Comprehensive Guide
#+AUTHOR: Dr. John Doe
#+DATE: 2025-02-13
#+OPTIONS: num:t toc:t

* Introduction

Welcome to this comprehensive guide on learning **Data Science** for Computer Science students. Data Science is a field that blends statistical analysis, machine learning, and computer science to extract meaningful insights from data. As a Computer Science student, mastering data science will expand your toolkit to solve complex, data-driven problems.

In this guide, we'll cover the basics of data science, mathematical reasoning, and introduce you to common tools and algorithms. Along the way, we'll integrate code examples, particularly in **Python**, to help you build a hands-on understanding of these concepts.

---

* Understanding the Fundamentals of Data Science

At the heart of data science are the **three pillars**: **Statistics**, **Machine Learning**, and **Data Visualization**. These are the tools you'll use to analyze, model, and communicate insights from your data.

**Statistics** plays a crucial role in understanding data distributions, hypothesis testing, and drawing conclusions. Machine Learning (ML) is about creating predictive models from data. Data Visualization helps you present your findings clearly, making it easier to communicate insights.

**Mathematical Reasoning**

In data science, we rely heavily on mathematics, particularly linear algebra, calculus, probability theory, and optimization. Let's start with a key concept in data science:

- **Linear Regression**: A simple yet powerful technique used to model relationships between variables.

Let's say we have a dataset with **n** observations and **p** features. The goal of linear regression is to model the relationship between the input features and the output target variable.

The equation for a **linear regression model** is:

$$ y = X \beta + \epsilon $$

Where:
- \( y \) is the output variable,
- \( X \) is the matrix of input features,
- \( \beta \) represents the regression coefficients,
- \( \epsilon \) is the error term.

We can solve for the coefficients \( \beta \) using the **Ordinary Least Squares (OLS)** method:

$$ \hat{\beta} = (X^T X)^{-1} X^T y $$

This equation minimizes the sum of squared errors between the predicted values and the true values.

---

* Getting Started with Data Science Tools

In this section, we'll cover the main tools you'll need to begin your data science journey.

**1. Python**

Python is a widely-used programming language in data science, primarily due to its rich ecosystem of libraries. You'll use Python to manipulate data, train models, and visualize results.

Let's start by installing the essential libraries:

```bash
pip install numpy pandas matplotlib scikit-learn seaborn

Here is a basic example of using Pandas to load and explore a dataset:

import pandas as pd

# Load a CSV file
data = pd.read_csv('your_dataset.csv')

# Show the first few rows
print(data.head())

# Get summary statistics
print(data.describe())

2. NumPy

NumPy is a powerful library for numerical computations. It allows you to perform operations on large arrays and matrices efficiently. Here’s how you can use NumPy to perform basic mathematical operations:

import numpy as np

# Create a vector of numbers
vector = np.array([1, 2, 3, 4, 5])

# Perform element-wise operations
squared = np.square(vector)
print(squared)

3. Data Visualization with Matplotlib and Seaborn

Data visualization is essential to convey your findings. The following example shows how to plot a simple scatter plot using Matplotlib:

import matplotlib.pyplot as plt

# Example data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a scatter plot
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot Example')
plt.show()

Seaborn is a higher-level library built on top of Matplotlib that makes it easier to create beautiful statistical graphics:

import seaborn as sns

# Load an example dataset
tips = sns.load_dataset('tips')

# Create a box plot
sns.boxplot(x='day', y='total_bill', data=tips)
plt.title('Box Plot Example')
plt.show()

  • Machine Learning: A Key Tool in Data Science

Machine Learning is the process of using algorithms to find patterns in data and make predictions. Here’s an example of Linear Regression using the scikit-learn library:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Split data into features (X) and target (y)
X = data[['feature1', 'feature2']]
y = data['target']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

  • Conclusion

In this guide, we’ve covered the essential concepts of Data Science, from the fundamental mathematical principles to the necessary tools and libraries. We’ve also included practical code snippets to help you get started with Python, pandas, NumPy, and machine learning models.

As you progress, remember that data science is a vast and evolving field. Keep exploring, experimenting, and learning through practice. Your ability to analyze and interpret data will be a valuable asset as you continue your journey as a Computer Science student.

Next Steps:

  • Explore more advanced machine learning algorithms like Support Vector Machines (SVM), Random Forests, and Neural Networks.
  • Work on real-world datasets to build projects that demonstrate your data science skills.
  • Study specialized topics like Natural Language Processing (NLP) and Deep Learning to further enhance your knowledge.

Good luck on your journey to mastering Data Science!


  • References
  1. “Python for Data Analysis” by Wes McKinney
  2. “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurélien Géron
  3. “The Elements of Statistical Learning” by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

This blog aims to provide the foundational knowledge and practical tools that will be necessary for your success in computer science. Keep learning, experimenting, and building!

Did you find this page helpful? Consider sharing it 🙌