Crafting Cosine Similarity Calculations in TypeScript: A Comprehensive Guide

Introduction

In data science and machine learning, cosine similarity is a metric that computes the cosine of the angle between two vectors. It reflects how closely the two vectors point in the same direction, regardless of their magnitudes, and it’s used extensively in areas like text analysis and recommendation systems. This post walks through implementing cosine similarity in TypeScript.

Understanding Cosine Similarity and Its Applications in AI

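Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It is defined as the cosine of the angle between them, which is calculated using this formula:

similarity = sum(Ai*Bi) / sqrt(sum(Ai*Ai) * sum(Bi*Bi))

Here, Ai and Bi are the i-th components of vectors A and B, respectively. The result ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). The related quantity 1 - similarity is the cosine distance, which gets smaller as the vectors become more similar.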
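To make the cosine of the angle concrete, here is a minimal sketch that evaluates the dot product divided by the product of the magnitudes for three hand-picked pairs of vectors. The helper name `cosineOf` and the sample vectors are illustrative assumptions.

```typescript
// Hypothetical helper: dot product divided by the product of the magnitudes.
function cosineOf(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / Math.sqrt(normA * normB);
}

const sameDirection = cosineOf([1, 2, 3], [2, 4, 6]); // parallel vectors
const perpendicular = cosineOf([1, 0], [0, 1]);       // orthogonal vectors
const oppositeDirection = cosineOf([1, 2], [-1, -2]); // opposite vectors

console.log(sameDirection, perpendicular, oppositeDirection);
```

Parallel vectors score 1, perpendicular vectors score 0, and opposite vectors score -1, regardless of how long the vectors are.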

In Artificial Intelligence, cosine similarity has a wide range of applications. Natural Language Processing (NLP) often uses it to measure the similarity between documents or sentences. This can be useful in systems like search engines, where you want to rank documents by their relevance to a query.
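As a rough sketch of the search-engine idea, the snippet below ranks a few documents by their cosine similarity to a query. The vocabulary, the word counts, and the names `cosineScore`, `queryVector`, and `documentVectors` are all made up for illustration.

```typescript
// Hypothetical helper: cosine of the angle between two equal-length vectors.
function cosineScore(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / Math.sqrt(normA * normB);
}

// Assumed vocabulary order: ["typescript", "vector", "cooking", "recipe"]
const queryVector = [1, 1, 0, 0];
const documentVectors = [
    { title: "Doc A", vector: [2, 1, 0, 0] },
    { title: "Doc B", vector: [0, 0, 3, 1] },
    { title: "Doc C", vector: [1, 0, 1, 0] },
];

// Score every document against the query, then sort from most to least relevant.
const rankedDocuments = documentVectors
    .map(d => ({ title: d.title, score: cosineScore(queryVector, d.vector) }))
    .sort((x, y) => y.score - x.score);

console.log(rankedDocuments.map(d => d.title));
```

Doc A shares both query terms and ranks first, Doc C shares one, and Doc B shares none, so it ends up last with a score of 0.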

In recommendation systems, cosine similarity can suggest items “similar” to what a user has already liked or purchased. This is often seen in e-commerce platforms where the “You might also like” section is populated using similarity measures.
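One way to find “similar” items is to compare the columns of a user-item matrix, so that two items count as similar when the same users liked them. The sketch below assumes a tiny hand-made matrix; `likes`, `cosineBetween`, `itemColumn`, and `mostSimilarItem` are hypothetical names.

```typescript
// Hypothetical user-item matrix: rows are users, columns are items (1 = liked).
const likes: number[][] = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
];

// Cosine of the angle between two equal-length, non-zero vectors.
function cosineBetween(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / Math.sqrt(normA * normB);
}

// Column `itemIndex` of the matrix: which users liked that item.
function itemColumn(matrix: number[][], itemIndex: number): number[] {
    return matrix.map(row => row[itemIndex]);
}

// Index of the item whose like-pattern is closest to that of `itemIndex`.
function mostSimilarItem(matrix: number[][], itemIndex: number): number {
    const target = itemColumn(matrix, itemIndex);
    let bestIndex = -1, bestScore = -Infinity;
    for (let j = 0; j < matrix[0].length; j++) {
        if (j === itemIndex) continue; // skip the item itself
        const score = cosineBetween(target, itemColumn(matrix, j));
        if (score > bestScore) {
            bestScore = score;
            bestIndex = j;
        }
    }
    return bestIndex;
}

console.log(mostSimilarItem(likes, 0));
```

Here item 0 and item 1 were liked by mostly the same users, so item 1 is the closest match for item 0.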

Implementing Cosine Similarity in TypeScript

Now that we understand what cosine similarity is and its applications in AI, let’s dive into how we can implement this in TypeScript.

// Define the function to calculate cosine similarity
function cosineSimilarity(A: number[], B: number[]): number {
    // Initialize the sums
    let sumAiBi = 0, sumAiAi = 0, sumBiBi = 0;

    // Iterate over the elements of vectors A and B
    for (let i = 0; i < A.length; i++) {
        // Calculate the sum of Ai*Bi
        sumAiBi += A[i] * B[i];
        // Calculate the sum of Ai*Ai
        sumAiAi += A[i] * A[i];
        // Calculate the sum of Bi*Bi
        sumBiBi += B[i] * B[i];
    }

    // Calculate and return the cosine similarity: the dot product divided by
    // the product of the magnitudes (assumes equal-length, non-zero vectors)
    return sumAiBi / Math.sqrt(sumAiAi * sumBiBi);
}

In this function, we’re iterating over the elements of vectors A and B, calculating the sums of Ai*Bi, Ai*Ai, and Bi*Bi, and then using these sums to calculate the cosine similarity.
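The loop above assumes both vectors have the same length and that neither is all zeros. A more defensive variant might look like the sketch below; the name `safeCosineSimilarity` and the choice to return 0 for a zero vector are assumptions, since the similarity is mathematically undefined in that case.

```typescript
// Hypothetical defensive variant of the cosine similarity function.
function safeCosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) {
        throw new Error(`Vector lengths differ: ${a.length} vs ${b.length}`);
    }
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    const denominator = Math.sqrt(normA * normB);
    // Assumed convention: a zero vector yields 0 instead of NaN
    return denominator === 0 ? 0 : dot / denominator;
}

console.log(safeCosineSimilarity([1, 0], [1, 0]));
console.log(safeCosineSimilarity([0, 0], [1, 2]));
```

Throwing on a length mismatch surfaces vectorization bugs early, whereas silently reading past the end of the shorter array would turn the result into NaN.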

Applying Cosine Similarity in Real-World Scenarios

Cosine similarity can be used in a variety of real-world scenarios. Let’s explore a few examples.

Example 1: Document Similarity

Let’s say we have two documents and want to measure their similarity. First, we need to convert these documents into vectors. For simplicity, we’ll use a basic bag-of-words model for text representation.

// Split a text into lowercase word tokens, ignoring whitespace and punctuation
function tokenize(text: string): string[] {
    return text.toLowerCase().match(/\w+/g) ?? [];
}

// Function to get all unique words from multiple texts
function getUniqueWords(...texts: string[]): string[] {
    const words = new Set<string>();
    texts.forEach(text => tokenize(text).forEach(word => words.add(word)));
    return Array.from(words);
}

// Modified bag-of-words model for text representation
function textToVector(text: string, uniqueWords: string[]): number[] {
    const wordMap = new Map<string, number>();
    uniqueWords.forEach(word => wordMap.set(word, 0));
    tokenize(text).forEach(word => {
        // Count only words in the shared vocabulary so every vector has the same length
        const count = wordMap.get(word);
        if (count !== undefined) wordMap.set(word, count + 1);
    });
    return Array.from(wordMap.values());
}

// Function to calculate cosine similarity between all pairs of documents
function calculateDocumentSimilarities(docVectors: number[][]): number[][] {
    let similarities: number[][] = [];
    for (let i = 0; i < docVectors.length; i++) {
        let docSimilarities: number[] = [];
        for (let j = 0; j < docVectors.length; j++) {
            docSimilarities.push(cosineSimilarity(docVectors[i], docVectors[j]));
        }
        similarities.push(docSimilarities);
    }
    return similarities;
}

// Function to find the most similar document to a given document
function findMostSimilarDocument(docIndex: number, docSimilarities: number[][]): number {
    let maxSimilarity = -Infinity, mostSimilarDocIndex = -1;
    for (let i = 0; i < docSimilarities[docIndex].length; i++) {
        if (i !== docIndex && docSimilarities[docIndex][i] > maxSimilarity) {
            maxSimilarity = docSimilarities[docIndex][i];
            mostSimilarDocIndex = i;
        }
    }
    return mostSimilarDocIndex;
}

// Convert the documents to vectors
let doc1 = "The quick brown fox jumps over the lazy dog";
let doc2 = "The dog is brown and the fox is quick";
let doc3 = "The fox is quick and the dog is lazy";
let doc4 = "The lazy dog is jumped over by the quick fox";

let uniqueWords = getUniqueWords(doc1, doc2, doc3, doc4);
let docVectors = [doc1, doc2, doc3, doc4].map(doc => textToVector(doc, uniqueWords));

// Calculate the cosine similarity between all pairs of documents
let docSimilarities = calculateDocumentSimilarities(docVectors);

// Find the most similar document to Doc 1
let mostSimilarDocIndex = findMostSimilarDocument(0, docSimilarities);
console.log(`The most similar document to Doc 1 is: Doc ${mostSimilarDocIndex + 1}`);

In this example, we first convert all documents to vectors. Then, we calculate the cosine similarity between all pairs of documents. Finally, we find the most similar document to a given document.

Please note that this is still a simplified example. In a real-world scenario, you would likely use more sophisticated methods for text representation, such as TF-IDF or word embeddings. Cosine similarity itself already normalizes for vector magnitude, so a longer document is not favored merely because its word counts are larger.
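As a rough illustration of TF-IDF weighting, the sketch below builds one weighted vector per text. It assumes one common variant (term frequency = count / text length, inverse document frequency = ln(N / document frequency)); the names `splitIntoWords` and `tfIdfVectors` and the sample texts are hypothetical.

```typescript
// Split a text into lowercase word tokens.
function splitIntoWords(text: string): string[] {
    return text.toLowerCase().match(/\w+/g) ?? [];
}

// Build one TF-IDF vector per text over a shared vocabulary.
// Assumed variant: tf = count / text length, idf = ln(N / document frequency).
function tfIdfVectors(texts: string[]): { vocabulary: string[]; vectors: number[][] } {
    const tokenLists = texts.map(splitIntoWords);

    // Collect the vocabulary in order of first appearance
    const seen = new Set<string>();
    tokenLists.forEach(tokens => tokens.forEach(token => seen.add(token)));
    const vocabulary = Array.from(seen);

    // Number of texts that contain each vocabulary term
    const documentFrequency = vocabulary.map(
        term => tokenLists.filter(tokens => tokens.indexOf(term) !== -1).length
    );

    const vectors = tokenLists.map(tokens =>
        vocabulary.map((term, k) => {
            if (tokens.length === 0) return 0; // empty text: no evidence for any term
            const tf = tokens.filter(token => token === term).length / tokens.length;
            const idf = Math.log(texts.length / documentFrequency[k]);
            return tf * idf;
        })
    );
    return { vocabulary, vectors };
}

const sample = tfIdfVectors(["the cat sat", "the dog sat", "the bird flew"]);
console.log(sample.vocabulary);
console.log(sample.vectors);
```

Because “the” appears in every sample text, its inverse document frequency is ln(1) = 0 and it contributes nothing to any vector, while rarer words such as “cat” carry more weight. The resulting vectors can be passed to a cosine similarity function in place of raw word counts.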

Example 2: Recommendation Systems

In recommendation systems, cosine similarity can suggest items “similar” to what a user has already liked or purchased. Here’s a simplified example:

// User-item matrix (for simplicity, we're using a binary preference: 1 for liked, 0 for not liked)
let userItemMatrix = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0], // User 1
    [1, 1, 0, 1, 0, 0, 1, 0, 1, 1], // User 2
    [0, 1, 1, 1, 0, 1, 0, 1, 0, 1],  // User 3
    [1, 0, 1, 0, 1, 1, 0, 1, 0, 0],  // User 4
    [0, 1, 0, 1, 0, 0, 1, 1, 1, 1]   // User 5
];

// Calculate the cosine similarity between all users
let userSimilarities: number[][] = [];
for (let i = 0; i < userItemMatrix.length; i++) {
    let similarities: number[] = [];
    for (let j = 0; j < userItemMatrix.length; j++) {
        similarities.push(cosineSimilarity(userItemMatrix[i], userItemMatrix[j]));
    }
    userSimilarities.push(similarities);
}

// Function to recommend items for a user based on the preferences of similar users
function recommendItems(userIndex: number, userSimilarities: number[][], userItemMatrix: number[][]): number[] {
    let recommendations: number[] = [];
    let similarUsers = userSimilarities[userIndex];
    for (let i = 0; i < userItemMatrix[0].length; i++) {
        if (userItemMatrix[userIndex][i] === 0) { // If the user hasn't liked the item yet
            let weightedSum = 0, sumSimilarities = 0;
            for (let j = 0; j < userItemMatrix.length; j++) {
                if (j !== userIndex) { // Don't include the user himself/herself
                    weightedSum += userItemMatrix[j][i] * similarUsers[j];
                    sumSimilarities += similarUsers[j];
                }
            }
            if (weightedSum / sumSimilarities > 0.5) { // If the average preference of similar users is more than 0.5
                recommendations.push(i);
            }
        }
    }
    return recommendations;
}

// Recommend items for User 1
let recommendations = recommendItems(0, userSimilarities, userItemMatrix);
console.log(`Recommended items for User 1: ${recommendations}`);

In this example, we first calculate the cosine similarity between all users. Then, we define a function to recommend items for a user based on similar users’ preferences. If the average preference of similar users for an item (weighted by their similarity to the user) is more than 0.5, and the user hasn’t liked it yet, we recommend it.
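To see the weighted-average rule with concrete numbers, here is a small sketch that computes the score for a single item. The function name `weightedPreference`, the similarity values, and the one-column matrix are invented for illustration.

```typescript
// Similarity-weighted average of the other users' preferences for one item.
function weightedPreference(
    userIndex: number,
    itemIndex: number,
    similarities: number[], // similarity of `userIndex` to every user
    matrix: number[][]      // rows are users, columns are items
): number {
    let weightedSum = 0, sumSimilarities = 0;
    for (let j = 0; j < matrix.length; j++) {
        if (j === userIndex) continue; // skip the user's own row
        weightedSum += matrix[j][itemIndex] * similarities[j];
        sumSimilarities += similarities[j];
    }
    // Assumed convention: no similar users means no evidence, so the score is 0
    return sumSimilarities === 0 ? 0 : weightedSum / sumSimilarities;
}

// Hypothetical data: user 0 is compared against users 1 and 2 for item 0.
const exampleSimilarities = [1, 0.8, 0.2];
const exampleMatrix = [
    [0], // user 0 has not liked the item
    [1], // user 1 (similarity 0.8) liked it
    [0], // user 2 (similarity 0.2) did not
];

const itemScore = weightedPreference(0, 0, exampleSimilarities, exampleMatrix);
console.log(itemScore, itemScore > 0.5);
```

The score works out to (0.8 × 1 + 0.2 × 0) / (0.8 + 0.2) = 0.8, which is above the 0.5 threshold, so the item would be recommended.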

Example 3: Social Media Feed-Like Algorithm

This algorithm aims to show posts from users that are similar to the current user. We’ll use cosine similarity to measure user similarity based on liking patterns.

// Define the function to calculate cosine similarity
function cosineSimilarity(A: number[], B: number[]): number {
    // Initialize the sums
    let sumAiBi = 0, sumAiAi = 0, sumBiBi = 0;

    // Iterate over the elements of vectors A and B
    for (let i = 0; i < A.length; i++) {
        // Calculate the sum of Ai*Bi
        sumAiBi += A[i] * B[i];
        // Calculate the sum of Ai*Ai
        sumAiAi += A[i] * A[i];
        // Calculate the sum of Bi*Bi
        sumBiBi += B[i] * B[i];
    }

    // Calculate and return the cosine similarity: the dot product divided by
    // the product of the magnitudes (assumes equal-length, non-zero vectors)
    return sumAiBi / Math.sqrt(sumAiAi * sumBiBi);
}

// Users and posts
let users = [
    { id: 1, name: "User 1" },
    { id: 2, name: "User 2" },
    { id: 3, name: "User 3" },
    { id: 4, name: "User 4" },
    { id: 5, name: "User 5" }
];

let posts = [
    { id: 1, content: "Post 1" },
    { id: 2, content: "Post 2" },
    { id: 3, content: "Post 3" },
    { id: 4, content: "Post 4" },
    { id: 5, content: "Post 5" },
    { id: 6, content: "Post 6" },
    { id: 7, content: "Post 7" },
    { id: 8, content: "Post 8" },
    { id: 9, content: "Post 9" },
    { id: 10, content: "Post 10" }
];

// User-post interaction matrix (for simplicity, we're using a binary preference: 1 for liked, 0 for not liked)
let userPostMatrix = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0], // User 1
    [1, 1, 0, 1, 0, 0, 1, 0, 1, 1], // User 2
    [0, 1, 1, 1, 0, 1, 0, 1, 0, 1],  // User 3
    [1, 0, 1, 0, 1, 1, 0, 1, 0, 0],  // User 4
    [0, 1, 0, 1, 0, 0, 1, 1, 1, 1]   // User 5
];

// Calculate the cosine similarity between all users
let userSimilarities: number[][] = [];
for (let i = 0; i < userPostMatrix.length; i++) {
    let similarities: number[] = [];
    for (let j = 0; j < userPostMatrix.length; j++) {
        similarities.push(cosineSimilarity(userPostMatrix[i], userPostMatrix[j]));
    }
    userSimilarities.push(similarities);
}

// Function to generate a feed for a user based on the posts liked by similar users
function generateFeed(userIndex: number, userSimilarities: number[][], userPostMatrix: number[][]): { id: number; content: string }[] {
    let feed: { id: number; content: string }[] = [];
    let similarUsers = userSimilarities[userIndex];
    for (let i = 0; i < userPostMatrix[0].length; i++) {
        if (userPostMatrix[userIndex][i] === 0) { // If the user hasn't liked the post yet
            let weightedSum = 0, sumSimilarities = 0;
            for (let j = 0; j < userPostMatrix.length; j++) {
                if (j !== userIndex) { // Don't include the user himself/herself
                    weightedSum += userPostMatrix[j][i] * similarUsers[j];
                    sumSimilarities += similarUsers[j];
                }
            }
            if (weightedSum / sumSimilarities > 0.5) { // If the average preference of similar users is more than 0.5
                feed.push(posts[i]);
            }
        }
    }
    return feed;
}

// Generate a feed for User 1
let feed = generateFeed(0, userSimilarities, userPostMatrix);
console.log(`Feed for User 1:`, feed);

In this example, each user and post is represented as a plain object with an id and a name or content field, respectively. The generateFeed function returns a feed of post objects that the user hasn’t liked yet but that similar users have liked. A post is included in the feed if the average preference of the other users (weighted by their similarity to the current user) is more than 0.5.

Please note that this is a very simplified example. In a real-world scenario, a social media feed algorithm would consider many other factors, such as the recency of the posts, the interactions between the users, the overall popularity of the posts, and so on. The algorithm would also likely use more sophisticated methods for user representation and similarity calculation.
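As one illustration of how recency could enter the picture, the sketch below multiplies a similarity-based score by an exponential decay on the post’s age. The 24-hour half-life, the multiplicative blend, and the names `recencyWeight` and `feedScore` are all assumptions chosen for the example.

```typescript
// Hypothetical decay: a post's weight halves every `halfLifeHours`.
function recencyWeight(ageHours: number, halfLifeHours: number = 24): number {
    return Math.pow(0.5, ageHours / halfLifeHours);
}

// Assumed blend: multiply the similarity-weighted score by the recency weight.
function feedScore(similarityScore: number, ageHours: number, halfLifeHours: number = 24): number {
    return similarityScore * recencyWeight(ageHours, halfLifeHours);
}

// Hypothetical candidates that already passed the 0.5 similarity threshold.
const candidates = [
    { postId: 2, similarityScore: 0.9, ageHours: 72 },
    { postId: 6, similarityScore: 0.6, ageHours: 2 },
];

// Sort from highest to lowest blended score.
const rankedFeed = candidates
    .map(c => ({ postId: c.postId, score: feedScore(c.similarityScore, c.ageHours) }))
    .sort((x, y) => y.score - x.score);

console.log(rankedFeed);
```

With these numbers the two-hour-old post outranks the three-day-old one even though its similarity score is lower, which is the kind of trade-off a feed has to tune.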

Conclusion

Cosine similarity is a powerful tool in data science and artificial intelligence. It allows us to quantify the similarity between vectors that represent a wide range of entities, from documents in a text analysis task to users in a recommendation system.

In this post, we’ve explored how to implement cosine similarity checks in TypeScript. We’ve also delved into some practical applications of cosine similarity, including document similarity checks, recommendation systems, and even a simplified social media feed algorithm.

It’s important to note that while these examples provide a good starting point, real-world applications often require more sophisticated methods and considerations. For instance, text analysis tasks may benefit from more advanced text representation methods like TF-IDF or word embeddings, and recommendation systems often go beyond the simple user-based collaborative filtering shown here, using techniques like matrix factorization.

Nevertheless, understanding the basics of cosine similarity and how to implement it in code is a crucial first step. As you continue your journey in data science and AI, you’ll find that this concept is a valuable tool in your toolkit.


Dwayne
