{"id":136320,"date":"2024-09-20T11:13:44","date_gmt":"2024-09-20T05:43:44","guid":{"rendered":"https:\/\/www.vskills.in\/certification\/tutorial\/?page_id=136320"},"modified":"2024-09-20T11:13:45","modified_gmt":"2024-09-20T05:43:45","slug":"applying-k-means-to-real-world-data-mnist","status":"publish","type":"page","link":"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/","title":{"rendered":"Applying K-Means to real-world data: MNIST"},"content":{"rendered":"\n<p>The MNIST dataset, a collection of handwritten digits, is a popular benchmark for machine learning algorithms. In this guide, we will explore how to apply K-means clustering to the MNIST dataset and evaluate the results.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Understanding the MNIST Dataset<\/strong><\/h3>\n\n\n\n<p>The MNIST dataset consists of 70,000 grayscale images, each representing a handwritten digit from 0 to 9. Each image is 28&#215;28 pixels, resulting in a total of 784 features.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Loading the MNIST Dataset<\/strong><\/h3>\n\n\n\n<p>To load the MNIST dataset, we can use the <code class=\"\">keras.datasets<\/code> module:<\/p>\n\n\n\n<p>Python<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from keras.datasets import mnist\n\n(X_train, y_train), (X_test, y_test) = mnist.load_data()\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Preprocessing the Data<\/strong><\/h3>\n\n\n\n<p>Before applying K-means, we need to preprocess the data:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Flatten the images:<\/strong> Convert each 28&#215;28 image into a one-dimensional vector of 784 pixels.<\/li>\n\n\n\n<li><strong>Normalize the pixel values:<\/strong> Scale the pixel values to the range [0, 1] to improve numerical stability.<\/li>\n<\/ol>\n\n\n\n<p>Python<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>X_train = X_train.reshape(X_train.shape&#91;0], -1).astype('float32') \/ 
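255\nX_test = X_test.reshape(X_test.shape&#91;0], -1).astype('float32') \/ 255\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Applying K-Means<\/strong><\/h3>\n\n\n\n<p>Create a K-means model and fit it to the training data:<\/p>\n\n\n\n<p>Python<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from sklearn.cluster import KMeans\n\nn_clusters = 10  # Assuming we want 10 clusters for the 10 digits\n# n_init=10 runs 10 initializations and keeps the one with the lowest inertia\nkmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)\nkmeans.fit(X_train)\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Evaluating the Clustering Results<\/strong><\/h3>\n\n\n\n<p>We can evaluate the clustering results using various metrics, such as purity and the Davies-Bouldin index. Since we have the ground truth labels for the MNIST dataset, we can also compare the cluster assignments with the true labels. The cluster IDs that K-means assigns are arbitrary (cluster 3 need not contain the digit 3), so before computing accuracy we map each cluster to the most common digit among its training samples. The adjusted Rand index does not depend on how the clusters are numbered, so it is computed on the raw cluster assignments.<\/p>\n\n\n\n<p>Python<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\nfrom sklearn.metrics import accuracy_score, adjusted_rand_score\n\n# Raw cluster IDs assigned by K-means\ncluster_train = kmeans.predict(X_train)\ncluster_test = kmeans.predict(X_test)\n\n# Map each cluster to the most common true digit among its training samples\ncluster_to_digit = np.zeros(n_clusters, dtype=int)\nfor c in range(n_clusters):\n    members = y_train&#91;cluster_train == c]\n    if members.size:\n        cluster_to_digit&#91;c] = np.bincount(members).argmax()\n\ny_pred_train = cluster_to_digit&#91;cluster_train]\ny_pred_test = cluster_to_digit&#91;cluster_test]\n\naccuracy_train = accuracy_score(y_train, y_pred_train)\naccuracy_test = accuracy_score(y_test, y_pred_test)\n# ARI is invariant to label permutation, so it uses the raw cluster IDs\nari_train = adjusted_rand_score(y_train, cluster_train)\nari_test = adjusted_rand_score(y_test, cluster_test)\n\nprint(\"Accuracy (train):\", accuracy_train)\nprint(\"Accuracy (test):\", accuracy_test)\nprint(\"Adjusted Rand Index (train):\", ari_train)\nprint(\"Adjusted Rand Index (test):\", ari_test)\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Visualization<\/strong><\/h3>\n\n\n\n<p>We can also visualize the clustering results, for example by reshaping each cluster centroid back into a 28&#215;28 image, or by plotting the data points and the cluster centroids in a two-dimensional projection. 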
This can provide insights into the quality of the clustering and the separation between clusters.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Limitations and Considerations<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Number of clusters:<\/strong> The choice of the number of clusters (K) is crucial for the performance of K-means. In this example, we assumed 10 clusters, one per digit, but the same digit can be written in several distinct styles, so a larger K may produce purer clusters.<\/li>\n\n\n\n<li><strong>Initialization:<\/strong> The initial centroids can affect the clustering results, because K-means only converges to a local optimum. K-means++ initialization, which scikit-learn's <code class=\"\">KMeans<\/code> uses by default, and running several initializations both help to reduce this sensitivity.<\/li>\n\n\n\n<li><strong>Scalability:<\/strong> K-means can be computationally expensive for large datasets, since every iteration compares each sample with each centroid. Consider using techniques like mini-batch K-means (for example, scikit-learn's <code class=\"\">MiniBatchKMeans<\/code>) or distributed K-means for scalability.<\/li>\n<\/ul>\n\n\n\n<p>By following these steps, you can apply K-means clustering to the MNIST dataset and measure how well the resulting clusters correspond to the digit labels.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The MNIST dataset, a collection of handwritten digits, is a popular benchmark for machine learning algorithms. In this guide, we will explore how to apply K-means clustering to the MNIST dataset and evaluate the results. 
Understanding the MNIST Dataset The MNIST dataset consists of 70,000 grayscale images, each representing a handwritten digit from 0 to&#8230;<\/p>\n","protected":false},"author":16,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-136320","page","type-page","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Applying K-Means to real-world data: MNIST - Tutorial<\/title>\n<meta name=\"description\" content=\"Explore the application of K-Means clustering to real-world data using the MNIST dataset. Learn how this algorithm group handwritten digits.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Applying K-Means to real-world data: MNIST - Tutorial\" \/>\n<meta property=\"og:description\" content=\"Explore the application of K-Means clustering to real-world data using the MNIST dataset. Learn how this algorithm group handwritten digits.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/\" \/>\n<meta property=\"og:site_name\" content=\"Tutorial\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/vskills.in\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-20T05:43:45+00:00\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/\",\"name\":\"Applying K-Means to real-world data: MNIST - Tutorial\",\"isPartOf\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\"},\"datePublished\":\"2024-09-20T05:43:44+00:00\",\"dateModified\":\"2024-09-20T05:43:45+00:00\",\"description\":\"Explore the application of K-Means clustering to real-world data using the MNIST dataset. Learn how this algorithm group handwritten digits.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Applying K-Means to real-world data: MNIST\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"name\":\"Tutorial\",\"description\":\"Vskills - A initiative in elearning and 
certification\",\"publisher\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\",\"name\":\"Vskills\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"contentUrl\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"width\":73,\"height\":55,\"caption\":\"Vskills\"},\"image\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/vskills.in\/\",\"https:\/\/x.com\/vskills_in\",\"https:\/\/www.linkedin.com\/company-beta\/1371554\/\",\"https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Applying K-Means to real-world data: MNIST - Tutorial","description":"Explore the application of K-Means clustering to real-world data using the MNIST dataset. 
Learn how this algorithm group handwritten digits.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/","og_locale":"en_US","og_type":"article","og_title":"Applying K-Means to real-world data: MNIST - Tutorial","og_description":"Explore the application of K-Means clustering to real-world data using the MNIST dataset. Learn how this algorithm group handwritten digits.","og_url":"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/","og_site_name":"Tutorial","article_publisher":"https:\/\/www.facebook.com\/vskills.in\/","article_modified_time":"2024-09-20T05:43:45+00:00","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/","name":"Applying K-Means to real-world data: MNIST - Tutorial","isPartOf":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website"},"datePublished":"2024-09-20T05:43:44+00:00","dateModified":"2024-09-20T05:43:45+00:00","description":"Explore the application of K-Means clustering to real-world data using the MNIST dataset. 
Learn how this algorithm group handwritten digits.","breadcrumb":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/applying-k-means-to-real-world-data-mnist\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.vskills.in\/certification\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Applying K-Means to real-world data: MNIST"}]},{"@type":"WebSite","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","name":"Tutorial","description":"Vskills - A initiative in elearning and certification","publisher":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization","name":"Vskills","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","contentUrl":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","width":73,"height":55,"caption":"Vskills"},"image":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image
\/"},"sameAs":["https:\/\/www.facebook.com\/vskills.in\/","https:\/\/x.com\/vskills_in","https:\/\/www.linkedin.com\/company-beta\/1371554\/","https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw"]}]}},"_links":{"self":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/136320","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/comments?post=136320"}],"version-history":[{"count":1,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/136320\/revisions"}],"predecessor-version":[{"id":136321,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/136320\/revisions\/136321"}],"wp:attachment":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/media?parent=136320"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/categories?post=136320"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/tags?post=136320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}