{"cells":[{"cell_type":"markdown","metadata":{"id":"dgDLrTo_ygbJ"},"source":["# COMP3314 Tutorial (1:30-2:20 p.m., Wendsday, Oct 11, 2023)\n","\n","Welcome to our first tutorial! The material overall aims to help you\n","- practice some useful tools/libraries in Python for studying machine learning,\n","- implement classic learning algorithms taught in the lecture to strengthen understanding, and\n","- use some common frameworks to build simple machine learning applications.\n","\n","## Overview for this tutorial\n","1. introduce NumPy and Pytorch;\n","2. implement the logistic regression algorithm by hand."]},{"cell_type":"markdown","metadata":{"id":"0OSR5kEYLdMa"},"source":["## NumPy\n","\n","This is the tool we will use entensively. [NumPy](https://numpy.org/) provides a lot of utilities that are greatly helpful for us to manipulate arrays, matrices, or tensors."]},{"cell_type":"markdown","metadata":{"id":"Z36R1PoPh7Bz"},"source":["### Basics"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"hwEAA5TvcqXu"},"outputs":[],"source":["import numpy as np\n","\n","# this line is only used for presenting the results in a more readable way\n","np.set_printoptions(precision=3)\n","\n","# set the seed for the random number generator\n","# this is only used for reproducibility of the results\n","np.random.seed(seed=3314)"]},{"cell_type":"markdown","metadata":{"id":"Y3Z2E5KPh7B1"},"source":["#### Create arrays"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"hGen8Q8fMYMI"},"outputs":[],"source":["np.zeros(5) # create a vector of 5 zeros"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VARZtWqOh7B1"},"outputs":[],"source":["np.zeros((5,5)) # create a 5x5 matrix of zeros"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_b811khCMYMJ"},"outputs":[],"source":["np.ones(3) # create a vector of 3 ones"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"duuipe32h7B2"},"outputs":[],"source":["np.full((5,5),3) # 
create a 5x5 matrix filled with 3s"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"1FRu--d9h7B2"},"outputs":[],"source":["np.eye(4) # create a 4 x 4 identity matrix"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"KvbCVfdSMYML"},"outputs":[],"source":["print(np.array([[1,2],[3,4],[5,6]])) # A 3 x 2 matrix with arbitrary elements."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Fl_S8DI9h7B2"},"outputs":[],"source":["np.random.random((2,4)) # Create a 2 x 4 matrix filled with random floating-point values in [0, 1)"]},{"cell_type":"markdown","metadata":{"id":"qB2x3k16h7B3"},"source":["#### Basic arithmetic operations"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_tz4U8Tvh7B3"},"outputs":[],"source":["x = np.array([[1,2],[3,4]], dtype=np.float64)\n","y = np.array([[5,6],[7,8]], dtype=np.float64)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"IZWg4ul5h7B3"},"outputs":[],"source":["# Elementwise sum, producing a matrix of the same size\n","print(x + y)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"D0L6olEXh7B3"},"outputs":[],"source":["# The same as above\n","print(np.add(x, y))"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"LypMGoz0h7B4"},"outputs":[],"source":["# Elementwise difference, producing a matrix of the same size\n","print(x - y)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"jWl3nqzoh7B4"},"outputs":[],"source":["# The same as above\n","print(np.subtract(x, y))"]},{"cell_type":"markdown","metadata":{"id":"HR_hbiDjMYMO"},"source":["### Array access, or indexing"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"P9ldQFrWh7B4"},"outputs":[],"source":["array_1 = np.array([\n","    [1, 2, 3],\n","    [4, 5, 6],\n","    [7, 8, 9]\n","]) # Create a 3 x 3 
matrix"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gxgQu5YDKwsE"},"outputs":[],"source":["print(array_1)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"cXpAUBymMYMO"},"outputs":[],"source":["array_1[1,1] # Access an element"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"0Q6LGGm7h7B5"},"outputs":[],"source":["array_1[1,:] # Equivalent to array_1[1]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"NWRsTmuUMYMP"},"outputs":[],"source":["array_1[:,1] # Access a column"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"r8L4iN7ah7B5"},"outputs":[],"source":["array_1[1:2,:] # Similar to array_1[1,:], but returns a 2D matrix instead of a 1D array\n","print(array_1[1:2,:].shape)\n","print(array_1[1,:].shape)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"8IerbmdWMYMP"},"outputs":[],"source":["array_1[0:2,:] # Access the first and second rows"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"3pmm1AyIh7B5"},"outputs":[],"source":["array_1[:,0:2] # Access the first and second columns"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Y8tvXnUHh7B6"},"outputs":[],"source":["array_1[1:3,0:2] # Access the intersection of (second and third rows) and (the first and second columns)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ShoDYC2fh7B6"},"outputs":[],"source":["array_1[[0,2],:] # Access the first and third rows"]},{"cell_type":"markdown","metadata":{"id":"oAINz9twMYMU"},"source":["### Dot product / matrix multiplication"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"0vZZRj-9MYMU"},"outputs":[],"source":["# Vector with vector\n","a = np.array([1,2,3])\n","b = np.array([4,5,6])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"BFmwnIgMh7B6"},"outputs":[],"source":["print(np.dot(a, b)) # computes a^T b = a1b1 + a2b2 + 
a3b3"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"0M0_xCOUh7B7"},"outputs":[],"source":["# Or:\n","print(a.dot(b))"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"K-VNxL3uh7B7"},"outputs":[],"source":["# Or:@ is the matrix multiplication operator\n","print(a @ b)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"9wBPz6rUh7B7"},"outputs":[],"source":["# Note: element-wise multiplication is not matrix multiplication\n","print(a * b)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_VJiI98YMYMU"},"outputs":[],"source":["# Matrix with matrix\n","c = np.array([\n"," [1, 2],\n"," ]) # shape [1, 2]\n","d = np.array([\n"," [3],\n"," [4]\n"," ]) # shape [2, 1]\n","print(np.dot(c, d)) # [1, 2], [2, 1] -> [1, 1]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"UZpR47Y4h7B8"},"outputs":[],"source":["print(np.dot(d, c)) # [2, 1], [1, 2] -> [2, 2]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"MhsVmTZ8h7B8"},"outputs":[],"source":["# Matrix with vector\n","e = np.array([\n"," [1, 2],\n"," [3, 4]\n"," ]) # shape [2, 2]\n","f = np.array(\n"," [[5, 6]]\n"," ) # shape [1, 2]\n","print(np.dot(f, e)) # [1, 2], [2, 2] -> [1, 2]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"GAIi4lv9h7B9"},"outputs":[],"source":["# .T here returns the transpose of the matrix\n","print(np.dot(e, f.T)) # [2, 2], [2, 1] -> [2, 1]"]},{"cell_type":"markdown","metadata":{"id":"9MJ8dUcVMYMV"},"source":["### Widely used operations"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"RBlYT2E2h7B9"},"outputs":[],"source":["array_5 = np.random.random((2,4))"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"l-JW--TJh7B-"},"outputs":[],"source":["print(array_5)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Ae6Ip2SlMYMV"},"outputs":[],"source":["array_5.sum(axis=0) # or equivalently np.sum(array_5, axis=0)\n","# this corresponds to summing the matrix over each 
tow"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"IkqTGuOeMYMV"},"outputs":[],"source":["array_5.sum(axis=1) # or equivalently np.sum(array_5, axis=1)\n","# this corresponds to summing the matrix over each column"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"7QHBN7Sch7B_"},"outputs":[],"source":["# dimensionality can be preserved by using --keepdims=True option\n","print(array_5.sum(axis=1).shape)\n","print(array_5.sum(axis=1, keepdims=True).shape)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"e1C8NdP3MYMV"},"outputs":[],"source":["array_5.sum()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"02uVEgzjMYMV"},"outputs":[],"source":["array_5.mean(axis=0)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"vF5-ow77MYMW"},"outputs":[],"source":["array_5.std(axis=0)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"wkci-d20MYMW"},"outputs":[],"source":["array_5.max(axis=0)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"iRkvh3sVNZ67"},"outputs":[],"source":["array_1 = np.zeros((2,3))\n","array_2 = np.ones((2,3))\n","np.concatenate([array_1, array_2], axis=1)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"2bU12_WlNkiY"},"outputs":[],"source":["np.concatenate([array_1, array_2], axis=0)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"qo5pB1TcPhyi"},"outputs":[],"source":["# turn a vector into a diagonal matrix:\n","array_3 = np.array([1,2,3])\n","mat_3 = np.diag(array_3)\n","mat_3"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ZKC2T5E68DTq"},"outputs":[],"source":["# turn a diagonal matrix back into a vector\n","np.diag(mat_3)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"drnPnDqURqy8"},"outputs":[],"source":["np.arange(10)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"p-soozFwRtXX"},"outputs":[],"source":["np.arange(10).reshape(2,5)"]},{"cell_type":"markdown","source":["## 
PyTorch\n","\n","- https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html\n"],"metadata":{"id":"I-NsuaAkiWm6"}},{"cell_type":"markdown","source":["### Tensors & Autograd"],"metadata":{"id":"Zlb0sRbIjnFy"}},{"cell_type":"code","source":["import torch\n","import numpy as np"],"metadata":{"id":"fWxxZLVTkLDe","executionInfo":{"status":"ok","timestamp":1696873943744,"user_tz":-480,"elapsed":12,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}}},"execution_count":9,"outputs":[]},{"cell_type":"code","source":["data = [[1, 2],[3, 4]]\n","x_data = torch.tensor(data)\n","x_data"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"FYt-mizvkPGl","executionInfo":{"status":"ok","timestamp":1696873943744,"user_tz":-480,"elapsed":7,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"2af72ae7-031c-450d-8893-1fcf5390470b"},"execution_count":10,"outputs":[{"output_type":"execute_result","data":{"text/plain":["tensor([[1, 2],\n"," [3, 4]])"]},"metadata":{},"execution_count":10}]},{"cell_type":"code","source":["tensor = torch.rand(3,4)\n","\n","print(f\"Shape of tensor: {tensor.shape}\")\n","print(f\"Datatype of tensor: {tensor.dtype}\")\n","print(f\"Device tensor is stored on: {tensor.device}\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Dr6iInFvkQeX","executionInfo":{"status":"ok","timestamp":1696873951313,"user_tz":-480,"elapsed":4,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"4e43afb2-02dd-4cd9-f0a7-d7389128dc7f"},"execution_count":11,"outputs":[{"output_type":"stream","name":"stdout","text":["Shape of tensor: torch.Size([3, 4])\n","Datatype of tensor: torch.float32\n","Device tensor is stored on: cpu\n"]}]},{"cell_type":"code","source":["import torch\n","\n","x = torch.ones(5) # input tensor\n","y = torch.zeros(3) # expected output\n","w = torch.randn(5, 3, requires_grad=True)\n","b = torch.randn(3, requires_grad=True)\n","z = 
torch.matmul(x, w)+b\n","loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)"],"metadata":{"id":"rLzbh-kFkbPu","executionInfo":{"status":"ok","timestamp":1696874334223,"user_tz":-480,"elapsed":7,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}}},"execution_count":12,"outputs":[]},{"cell_type":"code","source":["print(f\"Gradient function for z = {z.grad_fn}\")\n","print(f\"Gradient function for loss = {loss.grad_fn}\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Mps5A04Cl41x","executionInfo":{"status":"ok","timestamp":1696874349650,"user_tz":-480,"elapsed":8,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"f2b5dfab-692a-402f-e5e4-399cbdc06d15"},"execution_count":13,"outputs":[{"output_type":"stream","name":"stdout","text":["Gradient function for z = \n","Gradient function for loss = \n"]}]},{"cell_type":"code","source":["loss.backward()\n","print(w.grad)\n","print(b.grad)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"7yzXXkckl8YD","executionInfo":{"status":"ok","timestamp":1696874362698,"user_tz":-480,"elapsed":4,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"6c8fd110-4262-4cc5-e93d-092cc2546de7"},"execution_count":14,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[0.0658, 0.3004, 0.1272],\n"," [0.0658, 0.3004, 0.1272],\n"," [0.0658, 0.3004, 0.1272],\n"," [0.0658, 0.3004, 0.1272],\n"," [0.0658, 0.3004, 0.1272]])\n","tensor([0.0658, 0.3004, 0.1272])\n"]}]},{"cell_type":"code","source":["z = torch.matmul(x, w)+b\n","print(z.requires_grad)\n","\n","with torch.no_grad():\n"," z = torch.matmul(x, w)+b\n","print(z.requires_grad)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"6meEg-dDl_fS","executionInfo":{"status":"ok","timestamp":1696874373159,"user_tz":-480,"elapsed":4,"user":{"displayName":"Zhiheng 
LYU","userId":"00686232478614332837"}},"outputId":"e9653bdf-0e5b-4f42-d907-f37ab7f6c3dd"},"execution_count":15,"outputs":[{"output_type":"stream","name":"stdout","text":["True\n","False\n"]}]},{"cell_type":"code","source":["z = torch.matmul(x, w)+b\n","z_det = z.detach()\n","print(z_det.requires_grad)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"s9aMfzWgmCZf","executionInfo":{"status":"ok","timestamp":1696874387403,"user_tz":-480,"elapsed":687,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"3dc94027-2651-4f63-80b8-11cd46cccd4e"},"execution_count":16,"outputs":[{"output_type":"stream","name":"stdout","text":["False\n"]}]},{"cell_type":"markdown","source":["### Toy Model"],"metadata":{"id":"7XtDWAz5jq57"}},{"cell_type":"code","source":["import torch\n","from torch import nn\n","from torch.utils.data import DataLoader\n","from torchvision import datasets\n","from torchvision.transforms import ToTensor"],"metadata":{"id":"eQLqo7XSiYWK","executionInfo":{"status":"ok","timestamp":1696873705655,"user_tz":-480,"elapsed":4990,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}}},"execution_count":1,"outputs":[]},{"cell_type":"code","source":["# Download training data from open datasets.\n","training_data = datasets.FashionMNIST(\n"," root=\"data\",\n"," train=True,\n"," download=True,\n"," transform=ToTensor(),\n",")\n","\n","# Download test data from open datasets.\n","test_data = datasets.FashionMNIST(\n"," root=\"data\",\n"," train=False,\n"," download=True,\n"," transform=ToTensor(),\n",")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Vh7bdzEVjcid","executionInfo":{"status":"ok","timestamp":1696873786648,"user_tz":-480,"elapsed":2203,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"f1569508-d044-40dd-9dca-5f741070cf82"},"execution_count":2,"outputs":[{"output_type":"stream","name":"stdout","text":["Downloading 
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz\n","Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz\n"]},{"output_type":"stream","name":"stderr","text":["100%|██████████| 26421880/26421880 [00:00<00:00, 120880728.56it/s]\n"]},{"output_type":"stream","name":"stdout","text":["Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw\n","\n","Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz\n","Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz\n"]},{"output_type":"stream","name":"stderr","text":["100%|██████████| 29515/29515 [00:00<00:00, 41155213.62it/s]"]},{"output_type":"stream","name":"stdout","text":["Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw\n","\n","Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz\n","Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz\n"]},{"output_type":"stream","name":"stderr","text":["\n","100%|██████████| 4422102/4422102 [00:00<00:00, 59506880.60it/s]\n"]},{"output_type":"stream","name":"stdout","text":["Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw\n","\n","Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz\n","Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz\n"]},{"output_type":"stream","name":"stderr","text":["100%|██████████| 5148/5148 [00:00<00:00, 2410121.33it/s]"]},{"output_type":"stream","name":"stdout","text":["Extracting 
data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw\n","\n"]},{"output_type":"stream","name":"stderr","text":["\n"]}]},{"cell_type":"code","source":["batch_size = 64\n","\n","# Create data loaders.\n","train_dataloader = DataLoader(training_data, batch_size=batch_size)\n","test_dataloader = DataLoader(test_data, batch_size=batch_size)\n","\n","for X, y in test_dataloader:\n"," print(f\"Shape of X [N, C, H, W]: {X.shape}\")\n"," print(f\"Shape of y: {y.shape} {y.dtype}\")\n"," break"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"pK8mlwYQjygK","executionInfo":{"status":"ok","timestamp":1696873800555,"user_tz":-480,"elapsed":4,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"1a4b3383-ad20-476a-a162-4cf60fe7045a"},"execution_count":3,"outputs":[{"output_type":"stream","name":"stdout","text":["Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])\n","Shape of y: torch.Size([64]) torch.int64\n"]}]},{"cell_type":"code","source":["# Get cpu, gpu or mps device for training.\n","device = (\n"," \"cuda\"\n"," if torch.cuda.is_available()\n"," else \"mps\"\n"," if torch.backends.mps.is_available()\n"," else \"cpu\"\n",")\n","print(f\"Using {device} device\")\n","\n","# Define model\n","class NeuralNetwork(nn.Module):\n"," def __init__(self):\n"," super().__init__()\n"," self.flatten = nn.Flatten()\n"," self.linear_relu_stack = nn.Sequential(\n"," nn.Linear(28*28, 512),\n"," nn.ReLU(),\n"," nn.Linear(512, 512),\n"," nn.ReLU(),\n"," nn.Linear(512, 10)\n"," )\n","\n"," def forward(self, x):\n"," x = self.flatten(x)\n"," logits = self.linear_relu_stack(x)\n"," return logits\n","\n","model = NeuralNetwork().to(device)\n","print(model)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"_Y5fdgUmj2Kw","executionInfo":{"status":"ok","timestamp":1696873809111,"user_tz":-480,"elapsed":368,"user":{"displayName":"Zhiheng 
LYU","userId":"00686232478614332837"}},"outputId":"15849f34-1c5f-4947-ed81-94e0575ced1a"},"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["Using cpu device\n","NeuralNetwork(\n"," (flatten): Flatten(start_dim=1, end_dim=-1)\n"," (linear_relu_stack): Sequential(\n"," (0): Linear(in_features=784, out_features=512, bias=True)\n"," (1): ReLU()\n"," (2): Linear(in_features=512, out_features=512, bias=True)\n"," (3): ReLU()\n"," (4): Linear(in_features=512, out_features=10, bias=True)\n"," )\n",")\n"]}]},{"cell_type":"code","source":["loss_fn = nn.CrossEntropyLoss()\n","optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)"],"metadata":{"id":"yahykRX5j4Zw","executionInfo":{"status":"ok","timestamp":1696873827824,"user_tz":-480,"elapsed":432,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}}},"execution_count":5,"outputs":[]},{"cell_type":"code","source":["def train(dataloader, model, loss_fn, optimizer):\n"," size = len(dataloader.dataset)\n"," model.train()\n"," for batch, (X, y) in enumerate(dataloader):\n"," X, y = X.to(device), y.to(device)\n","\n"," # Compute prediction error\n"," pred = model(X)\n"," loss = loss_fn(pred, y)\n","\n"," # Backpropagation\n"," loss.backward()\n"," optimizer.step()\n"," optimizer.zero_grad()\n","\n"," if batch % 100 == 0:\n"," loss, current = loss.item(), (batch + 1) * len(X)\n"," print(f\"loss: {loss:>7f} [{current:>5d}/{size:>5d}]\")"],"metadata":{"id":"FzLkIsDuj9LL","executionInfo":{"status":"ok","timestamp":1696873855900,"user_tz":-480,"elapsed":4,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}}},"execution_count":6,"outputs":[]},{"cell_type":"code","source":["def test(dataloader, model, loss_fn):\n"," size = len(dataloader.dataset)\n"," num_batches = len(dataloader)\n"," model.eval()\n"," test_loss, correct = 0, 0\n"," with torch.no_grad():\n"," for X, y in dataloader:\n"," X, y = X.to(device), y.to(device)\n"," pred = model(X)\n"," test_loss += loss_fn(pred, 
y).item()\n"," correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n"," test_loss /= num_batches\n"," correct /= size\n"," print(f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\")"],"metadata":{"id":"ylxpGcVyj-vM","executionInfo":{"status":"ok","timestamp":1696873857263,"user_tz":-480,"elapsed":3,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}}},"execution_count":7,"outputs":[]},{"cell_type":"code","source":["epochs = 5\n","for t in range(epochs):\n"," print(f\"Epoch {t+1}\\n-------------------------------\")\n"," train(train_dataloader, model, loss_fn, optimizer)\n"," test(test_dataloader, model, loss_fn)\n","print(\"Done!\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"uFLtq9-lkAqH","executionInfo":{"status":"ok","timestamp":1696873942616,"user_tz":-480,"elapsed":84990,"user":{"displayName":"Zhiheng LYU","userId":"00686232478614332837"}},"outputId":"3aa45238-44a0-42dd-ad53-15c89866771c"},"execution_count":8,"outputs":[{"output_type":"stream","name":"stdout","text":["Epoch 1\n","-------------------------------\n","loss: 2.305302 [ 64/60000]\n","loss: 2.291522 [ 6464/60000]\n","loss: 2.274945 [12864/60000]\n","loss: 2.261444 [19264/60000]\n","loss: 2.249897 [25664/60000]\n","loss: 2.220935 [32064/60000]\n","loss: 2.213316 [38464/60000]\n","loss: 2.192012 [44864/60000]\n","loss: 2.184059 [51264/60000]\n","loss: 2.143566 [57664/60000]\n","Test Error: \n"," Accuracy: 50.7%, Avg loss: 2.146675 \n","\n","Epoch 2\n","-------------------------------\n","loss: 2.160890 [ 64/60000]\n","loss: 2.146107 [ 6464/60000]\n","loss: 2.092633 [12864/60000]\n","loss: 2.101786 [19264/60000]\n","loss: 2.055113 [25664/60000]\n","loss: 1.994345 [32064/60000]\n","loss: 2.003105 [38464/60000]\n","loss: 1.938258 [44864/60000]\n","loss: 1.937648 [51264/60000]\n","loss: 1.851753 [57664/60000]\n","Test Error: \n"," Accuracy: 55.9%, Avg loss: 1.858043 \n","\n","Epoch 
3\n","-------------------------------\n","loss: 1.899182 [ 64/60000]\n","loss: 1.861465 [ 6464/60000]\n","loss: 1.750047 [12864/60000]\n","loss: 1.781448 [19264/60000]\n","loss: 1.674668 [25664/60000]\n","loss: 1.627754 [32064/60000]\n","loss: 1.631553 [38464/60000]\n","loss: 1.551061 [44864/60000]\n","loss: 1.571000 [51264/60000]\n","loss: 1.458675 [57664/60000]\n","Test Error: \n"," Accuracy: 59.3%, Avg loss: 1.482376 \n","\n","Epoch 4\n","-------------------------------\n","loss: 1.557708 [ 64/60000]\n","loss: 1.516632 [ 6464/60000]\n","loss: 1.373604 [12864/60000]\n","loss: 1.441728 [19264/60000]\n","loss: 1.331347 [25664/60000]\n","loss: 1.322534 [32064/60000]\n","loss: 1.327261 [38464/60000]\n","loss: 1.265662 [44864/60000]\n","loss: 1.298002 [51264/60000]\n","loss: 1.204652 [57664/60000]\n","Test Error: \n"," Accuracy: 63.2%, Avg loss: 1.227786 \n","\n","Epoch 5\n","-------------------------------\n","loss: 1.308701 [ 64/60000]\n","loss: 1.285495 [ 6464/60000]\n","loss: 1.125294 [12864/60000]\n","loss: 1.233331 [19264/60000]\n","loss: 1.118041 [25664/60000]\n","loss: 1.130267 [32064/60000]\n","loss: 1.148233 [38464/60000]\n","loss: 1.095161 [44864/60000]\n","loss: 1.132829 [51264/60000]\n","loss: 1.059234 [57664/60000]\n","Test Error: \n"," Accuracy: 64.6%, Avg loss: 1.074156 \n","\n","Done!\n"]}]},{"cell_type":"markdown","metadata":{"id":"Q27znqH5yb_O"},"source":["## Logistic Regression\n","\n","We now demonstrate how to use NumPy to implement a simple logistic regression model. 
To this end, we consider a binary classification task, where we create a custom toy dataset $D = \\{(\\boldsymbol{x}_n, y_n)\\}_{n=1}^N$ with $\\boldsymbol{x}_n \\in \\mathbb{R}^2$ being the feature vector and $y_n \\in \\{0,1\\}$ being the label.\n","\n","For illustration, let us assume that any $\\boldsymbol{x}_n$ with $y_n = 0$ follows a Gaussian distribution $\\mathcal{N}(\\mu_0, \\Sigma_0)$, while any $\\boldsymbol{x}_n$ with $y_n = 1$ follows $\\mathcal{N}(\\mu_1, \\Sigma_1)$."]},{"cell_type":"markdown","metadata":{"id":"dgdm3xw4CNSD"},"source":["### Create a toy dataset"]},{"cell_type":"markdown","metadata":{"id":"BRmF4Wv-UCfy"},"source":["We start by creating the training data and storing it in NumPy arrays."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"0Ga_zmCw16vB"},"outputs":[],"source":["import matplotlib.pyplot as plt # import matplotlib for plotting figures\n","plt.style.use('ggplot')\n","\n","def create_toy_data(mu_0, mu_1, Sigma_0, Sigma_1, N_0, N_1):\n","\n","    # sample from the two Gaussian distributions\n","    # np.random.multivariate_normal also returns numpy arrays\n","    x_0 = np.random.multivariate_normal(mu_0, Sigma_0, size=N_0)\n","    x_1 = np.random.multivariate_normal(mu_1, Sigma_1, size=N_1)\n","\n","    # create corresponding labels\n","    # .astype(int) converts the data type to integer\n","    y_0 = np.zeros(N_0).astype(int)\n","    y_1 = np.ones(N_1).astype(int)\n","\n","    # we concatenate the arrays\n","    # the data points x have shape [N, 2]\n","    # the labels y have shape [N]\n","    return np.concatenate([x_0, x_1], axis=0), np.concatenate([y_0, y_1], axis=0)\n","\n","\n","mu_0 = [-1.8, -1.0] # specify the mean of the first class\n","mu_1 = [2.0, 3.0] # specify the mean of the second class\n","Sigma_0 = [\n","    [0.8, 0.0],\n","    [0.0, 0.8]\n","    ] # specify the covariance matrix of the first class\n","Sigma_1 = [\n","    [0.5, 0.0],\n","    [0.0, 0.5]\n","    ] # specify the covariance matrix of the second class\n","N_0 = 100 # specify 
the number of samples from the first class\n","N_1 = 150 # specify the number of samples from the second class\n","\n","_x, _y = create_toy_data(mu_0, mu_1, Sigma_0, Sigma_1, N_0, N_1)\n","\n","# matplotlib (abbr.: plt) code to visualize the data\n","plt.scatter(_x[:,0], _x[:,1], c=_y)\n","plt.xlim(-5, 5)\n","plt.ylim(-5, 5)\n","plt.gca().set_aspect('equal', adjustable='box')\n","plt.show()"]},{"cell_type":"markdown","metadata":{"id":"bUAQQ3XFh7CC"},"source":["### Overview\n","\n","The logistic regression model corresponds to a Bernoulli distribution\n","$$\n","p(y | \\boldsymbol{x} ; \\boldsymbol{\\theta})=\\operatorname{Ber}\\left(y | \\sigma\\left(\\boldsymbol{w}^{\\top} \\boldsymbol{x}+b\\right)\\right),\n","$$\n","where $\\boldsymbol{w}$ and $b$ are parameters and $\\sigma$ is the sigmoid function $\\sigma(a) = 1/(1+e^{-a})$. Given a dataset $D = \\{(\\boldsymbol{x}_n, y_n)\\}_{n=1}^N$, we hope to find the values of $\\boldsymbol{w}$ and $b$ that best fit the data."]},{"cell_type":"markdown","metadata":{"id":"_LpqgS2O04TE"},"source":["#### An equivalent representation\n","\n","To simplify the notation, we can merge the parameters $\\boldsymbol{w}$ and $b$ into a single vector without changing the model itself:\n","$$\n","\\boldsymbol{w}^\\top \\boldsymbol{x} + b = \\begin{bmatrix}\n","w_1 & w_2\n","\\end{bmatrix} \\begin{bmatrix}\n","x_1 \\\\\n","x_2\n","\\end{bmatrix} + b = \\begin{bmatrix}\n","w_1 & w_2 & b\n","\\end{bmatrix}\\begin{bmatrix}\n","x_1 \\\\\n","x_2 \\\\\n","1\n","\\end{bmatrix}.\n","$$\n","We simply write $\\boldsymbol{w} = [w_1, w_2, b]$ for the merged model parameter, and correspondingly append a constant 1 to each $\\boldsymbol{x}$.\n"]},{"cell_type":"markdown","metadata":{"id":"u7o1eYR0tC5H"},"source":["### Training\n","As introduced in the lecture, a common way to estimate the parameters $\\boldsymbol{w}$ in our logistic regression model is to perform maximum likelihood estimation (MLE).\n","\n","\n","We first denote\n","$$\n","\\mu_{n} = \\sigma\\left(\\boldsymbol{w}^{\\top} 
\\boldsymbol{x}_n\\right).\n","$$\n","For Bernoulli distributions, the average (that is, scaled by $1/N$) negative log likelihood function is as follows:\n","$$\n","\\begin{aligned}\n","\\operatorname{NLL}(\\boldsymbol{w}) &=-\\frac{1}{N} \\log p(\\mathcal{D} \\mid \\boldsymbol{w})=-\\frac{1}{N} \\log \\prod_{n=1}^{N} \\operatorname{Ber}\\left(y_{n} \\mid \\mu_{n}\\right) \\\\ &=-\\frac{1}{N} \\sum_{n=1}^{N} \\log \\left[\\mu_{n}^{y_{n}} \\times\\left(1-\\mu_{n}\\right)^{1-y_{n}}\\right] \\\\ &=-\\frac{1}{N} \\sum_{n=1}^{N}\\left[y_{n} \\log \\mu_{n}+\\left(1-y_{n}\\right) \\log \\left(1-\\mu_{n}\\right)\\right]\n","\\end{aligned}\n","$$\n","\n","To maximize the log likelihood (equivalently, to minimize the negative log likelihood), we hope to find $\\boldsymbol{w}$ such that\n","$$\n","\\nabla_{\\boldsymbol{w}} \\operatorname{NLL}(\\boldsymbol{w}) = 0.\n","$$\n","Unfortunately, no closed-form solution is available. To estimate appropriate parameter values, we resort to gradient descent to minimize the negative log likelihood function.\n","\n","After some algebra, the gradient turns out to take the following form:\n","$$\n","\\begin{aligned}\n","\\nabla_{\\boldsymbol{w}} \\mathrm{NLL}(\\boldsymbol{w}) =\\frac{1}{N} \\sum_{n=1}^{N}\\left(\\mu_{n}-y_{n}\\right) \\boldsymbol{x}_{n}\n","\\end{aligned}\n","$$\n","A gradient descent step just updates the parameter as\n","$$\n","\\boldsymbol{w}_{t+1}=\\boldsymbol{w}_{t}-\\eta_{t} \\nabla_{\\boldsymbol{w}} \\mathrm{NLL}\\left(\\boldsymbol{w}_{t}\\right),\n","$$\n","where $\\eta_{t}$ is often referred to as the **learning rate** or **step size**."]},{"cell_type":"markdown","metadata":{"id":"iD4CQCTDh7CD"},"source":["### Procedures\n","\n","Putting these together, training the logistic regression model should proceed as follows:\n","1. randomly initialize the weight $\\boldsymbol{w}_0$;\n","2. for each data point $(\\boldsymbol{x}_n, y_n)$, compute $\\mu_{n} = \\sigma\\left(\\boldsymbol{w}_t^{\\top} \\boldsymbol{x}_n\\right)$;\n","3. 
compute the gradient $\\nabla_{\\boldsymbol{w}} \\mathrm{NLL}(\\boldsymbol{w}) =\\frac{1}{N} \\sum_{n=1}^{N}\\left(\\mu_{n}-y_{n}\\right) \\boldsymbol{x}_{n}$;\n","4. update the weight as $\\boldsymbol{w}_{t+1}=\\boldsymbol{w}_{t}-\\eta_{t} \\nabla_{\\boldsymbol{w}} \\mathrm{NLL}\\left(\\boldsymbol{w}_{t}\\right)$;\n","5. check whether the weight has converged ($\\boldsymbol{w}_{t+1}$ is very close to $\\boldsymbol{w}_{t}$):\n","    - If converged, output the estimated parameter value $\\widehat{\\boldsymbol{w}} = \\boldsymbol{w}_{t+1}$;\n","    - If not, go back to step 2.\n","\n","Note that we use **batch** rather than **stochastic** gradient descent here: the entire dataset is used to compute the gradient at each step. This is often too expensive in practice, but we do it here for simplicity; the code can easily be modified into the stochastic version."]},{"cell_type":"markdown","metadata":{"id":"MhFQWNBiYUEj"},"source":["### Testing\n","\n","To predict the label of a test data point, we just use the estimated parameter $\\widehat{\\boldsymbol{w}}$ and compute the label prediction probability\n","\n","$$\n","\\mu = \\sigma\\left(\\widehat{\\boldsymbol{w}}^{\\top} \\boldsymbol{x}\\right).\n","$$\n","- If $\\mu > 0.5$, then $p(y=1| \\boldsymbol{x}) > p(y=0| \\boldsymbol{x})$ and we predict the label as 1;\n","- If $\\mu \\le 0.5$, we predict the label as 0."]},{"cell_type":"markdown","metadata":{"id":"1NaZA8kX3M2R"},"source":["### Implementing the classification model\n","We define the classification model as a Python class, including\n","- initialization inside `__init__()`,\n","- training in the `fit()` method,\n","- testing in the `predict()` method.\n","\n","This style of defining a learning model is similar to that of popular libraries such as Keras and scikit-learn."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"JDU_86WbnGZ8"},"outputs":[],"source":["# define the sigmoid function\n","def sigmoid(x):\n","    return 
1/(1+np.exp(-x))\n","\n","class LogisticRegression:\n"," '''\n"," Logistic Regression\n","\n"," '''\n"," def __init__(self, D):\n"," # step 1; remember that we merge w and b into one single vector\n"," # so that the weight has D+1 dimensions.\n"," self.weight = np.random.randn(D+1)\n","\n"," def fit(self, x, y, learning_rate=0.01, max_iter=500, tol=1e-3):\n"," '''\n"," :param x: data with shape [M x D]\n"," :param y: label with shape [M]; its elements belong to {0,1}\n"," :param learning_rate: the step size of each gradient descent step\n"," :param max_iter: maximum number of iterations\n"," :param tol: error tolerance to terminate training\n"," '''\n"," M, D = np.shape(x)\n"," # change x to the compact representation\n"," x = np.concatenate([np.ones((M, 1)), x], axis=1)\n"," i = 0\n","\n"," # we only implement full-batch gradient descent here due to the relatively small scale\n"," # of this dataset. you will meet stochastic gradient descent in your assignment :)\n"," while i < max_iter:\n"," # step 2: compute mu.\n"," mu = sigmoid(x @ self.weight)\n","\n"," # step 3: compute the gradient of the NLL w.r.t. the weight.\n"," gradient = np.sum(np.dot(np.diag(mu - y), x), axis=0)/M\n","\n"," # step 4: update the parameter\n"," new_weight = self.weight - gradient * learning_rate\n","\n"," # step 5: check convergence (note: compare before overwriting self.weight)\n"," # delta = np.mean((new_weight - self.weight)**2)\n"," # if delta < tol:\n"," # print(\"the training has converged at iteration {}!\".format(i))\n"," # break\n"," self.weight = new_weight\n","\n"," # we replace step 5 here with a pre-defined maximum number of\n"," # iterations. 
Otherwise, it would cost too much time to terminate;\n"," # this is also a standard practice in machine learning.\n"," i += 1\n","\n"," # also compute the NLL function as a diagnostic to check that gradient descent\n"," # indeed works.\n"," nll = -np.mean(y * np.log(mu) + (1 - y) * np.log(1-mu))\n","\n"," # print the objective information every 5 iterations\n"," if i % 5 == 0:\n"," print(\"Iteration {:3d}/{:3d} : NLL is {}\".format(i, max_iter, nll))\n","\n"," def predict(self, x):\n"," '''\n"," :param x: test datapoints, M x D ndarray\n"," :return t: predicted label, M ndarray\n"," '''\n"," M, D = x.shape\n"," x = np.concatenate([np.ones((M, 1)), x], axis=1)\n","\n"," # compute the predicted probability\n"," y_pred = sigmoid(np.dot(x, self.weight))\n","\n"," # for any y_pred such that y_pred >= 0.5,\n"," # we predict the label of this data point as 1 and 0 otherwise;\n"," # np.where(condition, x, y) returns x where condition is true and y otherwise.\n"," y_pred = np.where(y_pred >= 0.5, 1, 0)\n"," return y_pred"]},{"cell_type":"markdown","metadata":{"id":"RsgueHp0h7CD"},"source":["### Run the logistic regression model\n","\n","We first create the toy dataset, and then run the logistic regression model to fit the data."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"nSFuNpDLA2th"},"outputs":[],"source":["mu_0 = [-1.8, -1.0]\n","mu_1 = [2.0, 3.0]\n","Sigma_0 = [\n"," [0.8, 0.0],\n"," [0.0, 0.8]\n"," ]\n","Sigma_1 = [\n"," [0.5, 0.0],\n"," [0.0, 0.5]\n"," ]\n","# the same as above\n","num_classes = 2 # number of classes\n","D = 2 # feature dimensionality\n","N_0 = 100 # number of datapoints for class 0\n","N_1 = 200 # number of datapoints for class 1\n","\n","\n","x_train, y_train = create_toy_data(mu_0, mu_1, Sigma_0, Sigma_1, N_0, N_1)\n","print(x_train.shape, type(x_train))\n","print(y_train.shape, 
type(y_train))"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"JPPLdaLRPebM"},"outputs":[],"source":["# initialize the model\n","model = LogisticRegression(D)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"odIJLoSW8hzQ"},"outputs":[],"source":["print(\"Start training the model ...\\n\")\n","# train the model\n","model.fit(x_train, y_train, learning_rate=0.1, max_iter=100)\n","\n","print(\"Training finished!\")"]},{"cell_type":"markdown","metadata":{"id":"RLBeB6BGQ9oQ"},"source":["The following code snippet tests the trained model over the entire 2D grid."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"WcLaKA7uQ57M"},"outputs":[],"source":["def test_model(model):\n"," # our test datapoints will be the entire 2D grid spanning the $[-5, 5] x [-5, 5]$ plane;\n"," # note, however, that this is not the usual test setting.\n","\n"," # build 2D grid\n"," x1_test, x2_test = np.meshgrid(np.linspace(-5, 5, 100), np.linspace(-5, 5, 100))\n"," x_test = np.stack([x1_test, x2_test], axis=-1).reshape(10000, 2)\n","\n"," # call the `predict()` method to\n"," # predict the label for each test data point.\n"," y_predicted = model.predict(x_test)\n","\n"," # plot the prediction result\n"," pc = plt.contourf(x1_test, x2_test, y_predicted.reshape(100,100), alpha=0.2)\n","\n"," # also plot the original data points\n"," plt.scatter(x_train[:,0], x_train[:,1], c=y_train)\n","\n"," # set equal scale for both x/y axis\n"," plt.gca().set_aspect('equal', adjustable='box')\n"," plt.show()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"883kwLJV8izo"},"outputs":[],"source":["print(\"\\nStart testing the model ...\\n\")\n","# test the model\n","test_model(model)"]},{"cell_type":"markdown","metadata":{"id":"kDPgx7JZh7CE"},"source":["- All the purple-shaded regions are predicted as 0, and all the yellow regions are predicted as 1.\n","- The decision boundary clearly separates these two 
classes."]},{"cell_type":"markdown","metadata":{"id":"UQqWAHLSS4Yf"},"source":["For illustration purposes, we slightly change the logistic regression model class so that you can see how the decision boundary evolves over the training iterations."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"0zDDgTlUS4LA"},"outputs":[],"source":["from IPython.display import clear_output\n","from time import sleep\n","\n","class IllustrativeLogisticRegression(object):\n"," '''\n"," Illustrative Logistic Regression\n","\n"," '''\n"," def __init__(self, D):\n"," self.weight = np.random.randn(D+1) # we merge w and b into one single vector.\n","\n"," def fit(self, x, y, learning_rate=0.01, max_iter=500, tol=1e-3):\n"," '''\n"," :param x: data with shape [M x D]\n"," :param y: label with shape [M]; its elements belong to {0,1}\n"," :param learning_rate: the step size of each gradient descent step\n"," :param max_iter: maximum number of iterations\n"," :param tol: error tolerance to terminate training\n"," '''\n"," M, D = np.shape(x)\n"," x = np.concatenate([np.ones((M, 1)), x], axis=1)\n"," i = 0\n","\n"," # we only implement full-batch gradient descent here due to the relatively small scale\n"," # of this dataset. you will meet stochastic gradient descent in your assignment :)\n"," while i < max_iter:\n","\n","\n"," ########################### Modified here ################################\n"," # invoke the test function and plot the decision boundary every 5 iterations\n"," if i % 5 == 0:\n"," # print(\"Iteration {:3d}/{:3d} : NLL is {}\".format(i, max_iter, nll))\n"," test_model(self)\n"," sleep(0.5)\n"," clear_output(wait=True)\n","\n"," # we first compute mu.\n"," mu = sigmoid(x @ self.weight)\n","\n"," # compute the gradient w.r.t. 
the weight.\n"," gradient = np.sum(np.dot(np.diag(mu - y), x), axis=0)/M\n","\n"," # update the parameter\n"," new_weight = self.weight - gradient * learning_rate\n"," self.weight = new_weight\n","\n"," i += 1\n","\n","\n"," def predict(self, x):\n"," '''\n"," :param x: test datapoints, M x D ndarray\n"," :return t: predicted label, M ndarray\n"," '''\n"," M, D = x.shape\n"," x = np.concatenate([np.ones((M, 1)), x], axis=1)\n","\n"," # compute the predicted probability\n"," y_pred = sigmoid(np.dot(x, self.weight))\n","\n"," # this should be read as follows:\n"," # for any y_pred such that y_pred >= 0.5,\n"," # we predict the label of this data point as 1 and 0 otherwise.\n"," y_pred = np.where(y_pred >= 0.5, 1, 0)\n"," return y_pred\n","\n","_model = IllustrativeLogisticRegression(D)\n","_model.fit(x_train, y_train, learning_rate=0.05, max_iter=500)"]},{"cell_type":"markdown","metadata":{"id":"5jwBBn4sU9RW"},"source":["## Remark\n","\n","- As you can see from the figure, the decision boundary quickly evolves from a random guess to a line that clearly separates the two classes.\n","- Also note that the current logistic regression model yields a **linear decision boundary**. This may be suitable for this toy problem; however, it would fail for more complicated cases. Alternatively, we could transform the input feature vector $\\boldsymbol{x}$ in a non-linear way so that the model could fit the data more easily. You will learn more powerful techniques later in this course."]},{"cell_type":"markdown","metadata":{"id":"XJpWqTHThFDv"},"source":["# Conclusion\n","\n","We arrive at the conclusion that\n","- the NumPy package is handy for implementing basic learning algorithms;\n","- but even implementing such simple algorithms (compared to those covered in later lectures) requires a lot of code. This is where machine learning frameworks come into play. 
Since much of this code (including learning algorithms, data collection & pre-processing, the logic of training/testing, and so on) can be modularized and reused across machine learning applications, a convenient approach is to write it into libraries so that we can import the corresponding interfaces whenever some functionality is required in our application/research. We will learn to use these tools (including but not limited to [PyTorch](https://pytorch.org/)) in our tutorials."]}],"metadata":{"colab":{"provenance":[]},"kernelspec":{"display_name":"code","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.9"},"vscode":{"interpreter":{"hash":"95b60c6a3f7a97bdda5c3dbe975961ec5f2e947e1b665affcecd30d407e7e6d7"}}},"nbformat":4,"nbformat_minor":0}