Abstract

Digital content keeps growing while attention spans and reading comprehension are under pressure. SnipyAI responds to this with a Chrome extension that brings Large Language Models (LLMs) directly into web browsing.

By utilising Meta Llama 3.1 through Modus and Hypermode's infrastructure, this extension empowers users to elevate their reading comprehension with a simple Cmd + K interface. Users can select any text to receive AI-powered summaries, explanations, and insights, making complex digital content more accessible and understandable.

Deliverables :

Chrome Extension - wxt.dev, React, Shadcn UI, Framer Motion
GraphQL API - Modus, Hypermode, Meta Llama 3.1
Source Code : https://github.com/MahendraDani/snipy
Launch Video : https://youtu.be/aNhvGUf8-JU

Blog Series :

During the hackathon, I wrote a series of blog posts documenting my journey, from joining the event to gaining a deeper understanding of Modus and developing this project. This post is the final submission for the hackathon and summarizes the project's outcome. For a detailed account of my progress, challenges, and insights, I invite you to explore the previous blogs in this series. The posts in the series have already received over 200 page views:

Title	Description	Date	Link
ModusHack: KickOff	First impressions of Hypermode and Modus CLI	02/11/2024	https://mahendra09.hashnode.dev/kickoff
ModusHack: Research	Read articles, documentation and watched YouTube tutorials for understanding Modus.	10/11/2024	https://mahendra09.hashnode.dev/modushack-the-research
ModusHack: Idea	Identifying problem statements and finally landing on a project idea.	14/11/2024	https://mahendra09.hashnode.dev/modushack-the-idea

The Problem

In today's world, we spend most of our time online exploring new technologies and learning from various resources. Be honest—how much time do you actually spend reading articles, blogs, documentation, and papers compared to watching videos and shorts? How often do you repeatedly ask LLMs the same questions whenever you're stuck?

Reading and comprehending large texts is not easy.

Whenever I get stuck while reading blogs on Hashnode, I find myself copy/pasting chunks of text into LLMs (ChatGPT, Claude, etc.) and asking the same repetitive questions: “What is the meaning of the above text?”, “Rewrite the text in a casual tone”, “Summarise in 30-40 words”. So what is the problem here?

Copy/pasting chunks of text and juggling multiple tabs breaks the reading flow.
Attention spans are declining.
I repeat the same steps when I revisit an article or resource after a few days.

What’s the big deal here?

In an American Psychological Association podcast, psychologist Gloria Mark said that over the past couple of decades the average attention span has declined from two and a half minutes to 47 seconds, which may be one reason many of us spend hours watching reels and shorts.
Scott H. Young, author of the books “Ultralearning” and “Get Better At Anything,” reported that “College graduates read an average of about six fewer books in 2021 than they did between 2002 and 2016.” With 2025 just around the corner, what will the numbers look like now?
An OECD survey from 2018, reported by The Hindustan Times, showed that reading performance fell by 10 points on average in OECD countries. The study found that on average across the OECD, one out of four 15-year-olds tested as a low performer in maths, reading and science.

Blogging platforms like Hashnode, Medium, and Dev.to are facing a decline in both readers and writers. Overall, audiences are understanding textual content less effectively.

The Solution

You've likely noticed that many websites (Linear, Hashnode, Shadcn, etc.), especially documentation sites, offer a CMD + K shortcut for global search. This feature lets users quickly find what they're looking for.

Hear me out: what if LLMs were accessible on every website on the internet with just Cmd + K? Readers could then chat with LLMs directly and quickly get the output they want, without opening multiple tabs to understand different concepts.

Don’t run for LLMs, let them run for you globally and anywhere!

I developed a Chrome extension that adds a CMD+K-like interface to any website, allowing users to easily query an LLM (Meta Llama 3.1) through a GraphQL API powered by Modus. This makes accessing information faster and more convenient than ever!

The Development

With the deliverables clearly defined, I began by developing the API, followed by building the Chrome extension, and finally integrating both services.

As the working diagram above shows, the user selects a chunk of text as context for the LLM, hits CMD+K to open SnipyAI, and queries the AI model with one of the default commands or a custom prompt. The request goes through the API, which is built with Modus.

API Development

The API is developed using the Modus CLI and deployed on Hypermode. I decided to use AssemblyScript for building the API service.

Configuration

The most important file is the app manifest, modus.json, which holds the configuration of the API.

{
  "$schema": "https://schema.hypermode.com/modus.json",
  "endpoints": {
    "default": {
      "type": "graphql",
      "path": "/graphql",
      "auth": "bearer-token"
    }
  },
  "models": {
    "text-generator": {
      "sourceModel": "meta-llama/Meta-Llama-3.1-8B-Instruct",
      "provider": "hugging-face",
      "connection": "hypermode"
    }
  }
}

The manifest exposes the AssemblyScript functions as a GraphQL API and specifies the LLM (Meta Llama 3.1), provided by Hugging Face and hosted on Hypermode’s infrastructure. Furthermore, the API endpoint is secured with bearer token authentication to protect it against misuse.

The API serves a single GraphQL query askAI that handles both the default commands and custom prompts and returns LLM output based on the prompt type.

Model Invoking

Modus really reduces the pain of getting an AI model up and running quickly. In a few lines of code, we can invoke an LLM through the auto-generated GraphQL API.

import { models } from "@hypermode/modus-sdk-as"
import {
  OpenAIChatModel,
  SystemMessage,
  UserMessage,
} from "@hypermode/modus-sdk-as/models/openai/chat"

// Must match the model name declared under "models" in modus.json.
const modelName: string = "text-generator"

// Sends a system instruction and a user prompt to the model and returns its reply.
export function generateText(instruction: string, prompt: string): string {
  const model = models.getModel<OpenAIChatModel>(modelName)
  const input = model.createInput([
    new SystemMessage(instruction),
    new UserMessage(prompt),
  ])

  // Moderate randomness in the generated text.
  input.temperature = 0.7

  const output = model.invoke(input)
  // Return the first completion with surrounding whitespace removed.
  return output.choices[0].message.content.trim()
}

In the project, however, we offer 7 default commands for the most repetitive tasks, along with custom prompts. So the most important bit was to write a SystemMessage for each prompt type to get appropriate behaviour from the LLM.

Prompt Engineering

To achieve the best results, I studied prompt engineering from various sources, including the Prompt Engineering Guide, the Prompt Engineering Guide by OpenAI, Prompt Engineering Best Practices: Tips, Tricks and Tools by DigitalOcean, and Prompt Engineering vs Blind Prompting. The key takeaways from these guides, docs and blogs are:

Provide detailed context: subject matter, scope
Provide the desired format: a list, report, summary, md, xml, etc.
Specify the output length: 3 paragraphs, 250 words or 2 sentences
Specify the tone and style: formal, persuasive, informational, etc.
Ask for examples and comparisons: analogies, comparisons, etc.
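The takeaways above can be sketched as a small helper that assembles a system message from its parts. This is only an illustration in TypeScript: the `PromptSpec` shape and the `buildSystemMessage` function are hypothetical names, not part of the SnipyAI source, where the system messages were written by hand.

```typescript
// Hypothetical shape covering the five takeaways; not taken from the SnipyAI code.
interface PromptSpec {
  context: string;           // what the model will be given
  task: string;              // what it should do with it
  format: string;            // desired output format
  length: string;            // desired output length
  tone?: string;             // optional tone and style
  includeExamples?: boolean; // whether to ask for analogies and comparisons
}

// Joins the individual instructions into one system message.
function buildSystemMessage(spec: PromptSpec): string {
  const sentences: string[] = [
    spec.context,
    `Your task is to ${spec.task}.`,
    `Answer within ${spec.length} in ${spec.format} format.`,
  ];
  if (spec.tone) {
    sentences.push(`Use a ${spec.tone} tone.`);
  }
  if (spec.includeExamples) {
    sentences.push("Use examples, analogies or comparisons where they help.");
  }
  return sentences.join(" ");
}

const shortSummary = buildSystemMessage({
  context: "You will be provided with a text or multiple paragraphs.",
  task: "summarise the provided text",
  format: "plain text",
  length: "30-50 words",
  tone: "concise",
});
console.log(shortSummary);
```

Keeping context, format, length and tone as separate fields makes it easy to vary one of them per command while the rest stays fixed.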

I iteratively wrote a SystemMessage for each prompt type and cross-checked the model's output with ChatGPT and Claude until I found the results satisfactory. The system messages for each command are:

Prompt	System Message
Explain Like I am five years old	You will be provided with a text or multiple paragraphs. Your task is to explain the text to a person who has minimal or no knowledge of the subject. Use a friendly tone and explain the answer within 50-80 words in plain text format. Please use examples and analogies or comparisons to explain the topic in a more concise and clear way.
Explain the topic from the text.	You will be provided with a text or multiple paragraphs. Your task is to understand the text and extract the core subject or topic. Explain the core subject or topic in 40-60 words briefly, providing examples and analogies and return the output in plain text format.
List key take aways	You will be provided with a text or paragraph. Follow these steps to answer the user queries. Step 1: First understand the context, subject, tone and style of writing in the provided text. Step 2: Based on the findings of step 1, list out in 4-5 points the key points of understanding from the text. Please provide with relevant points and avoid using the same words as in text. Please output only the points in step 2.
Write a longer summary	You will be provided with a text or multiple paragraphs. Please summarise the provided text based on the subject and topics explained in the text within 100-150 words. Explain the core topics in-depth and how they are used in the provided text. The summary should be in-depth and detailed based on the context provided in the text. Please provide bullet-points and analogies if necessary for better understanding in plain text format.
Write a short summary	You will be provided with a text or multiple paragraphs. Please summarise the provided text based on the subject and topics explained in the text within 30-50 words in plain text format. The summary should be short, concise and easy to understand.
Rephrase to use as a reference	You will be provided with a text or multiple paragraphs. The given text is to be used as an reference in a paper, article or blog. Your task is deduce a conclusion from the provided text and rephrase it within 20-30 words in assertive tone in plain text format
Rewrite in a single paragraph	You will be provided with text or multiple paragraphs. Your task is to convert the given text into a single paragraph. Please keep the paragraph small (30-50 words), concise and clear and output in plain text format.
Write a custom prompt	You will be provided with a text or multiple paragraphs. Based on the user query write an appropriate response. The response should be clear, concise and easy to understand and output in plain text. Answer within 60-100 words.
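One way a single askAI query could pick the right system message is a lookup keyed by prompt type. The sketch below is an assumption about the dispatch and is written in TypeScript, not the AssemblyScript used by the API. `EXPLAIN_LIKE_FIVE` is the only prompt type named in this post; `SHORT_SUMMARY`, the `systemMessageFor` helper, and the fallback to the custom-prompt message are hypothetical.

```typescript
// System message for custom prompts, copied from the table above.
const CUSTOM_PROMPT_MESSAGE =
  "You will be provided with a text or multiple paragraphs. Based on the user query write an appropriate response. The response should be clear, concise and easy to understand and output in plain text. Answer within 60-100 words.";

const SYSTEM_MESSAGES = new Map<string, string>([
  // The only prompt type identifier that appears in this post.
  [
    "EXPLAIN_LIKE_FIVE",
    "You will be provided with a text or multiple paragraphs. Your task is to explain the text to a person who has minimal or no knowledge of the subject. Use a friendly tone and explain the answer within 50-80 words in plain text format. Please use examples and analogies or comparisons to explain the topic in a more concise and clear way.",
  ],
  // Hypothetical identifier for the "Write a short summary" command.
  [
    "SHORT_SUMMARY",
    "You will be provided with a text or multiple paragraphs. Please summarise the provided text based on the subject and topics explained in the text within 30-50 words in plain text format. The summary should be short, concise and easy to understand.",
  ],
]);

// Returns the system message for a prompt type. Treating unknown types as
// custom prompts is an assumption made for this sketch.
function systemMessageFor(promptType: string): string {
  return SYSTEM_MESSAGES.get(promptType) ?? CUSTOM_PROMPT_MESSAGE;
}

console.log(systemMessageFor("SHORT_SUMMARY"));
```

With a lookup like this, adding a command only means adding one entry; the model invocation code stays the same.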

API Testing

The most crucial part is to test whether the API works as expected. I used Postman to run queries against the GraphQL API generated by Modus. For testing, I ran queries for all commands on chunks of text on various topics to see the LLM’s output. Here’s the output for the command “Explain like I am five years old”, run against the Next.js documentation on fetching data on the client.

Query :

API Response :

The commands are designed to deliver clear and concise responses to queries on various topics and concepts, using examples and analogies to enhance understanding.

Building Chrome Extension

Chrome extensions are mostly developed using HTML, CSS and JavaScript, but as a React developer I did not find that easy. So I found a way to write the code in React, bundle it down to HTML, CSS and JS, and load it unpacked into the browser as an extension.

While there are a few alternatives like crxjs, I chose the wxt.dev framework. I am using React, Shadcn UI, Tailwind CSS and Framer Motion to develop the extension. This really speeds up development, and the bundling step is offloaded to the wxt.dev framework.

To inject custom HTML, CSS, and JavaScript into websites via a Chrome extension, I found that it's necessary to use content scripts. The SnipyAI extension does this with a content script that mounts its UI inside a shadow root (shadowRootUI), so that the host website's UI remains unaffected.

For developing the CMD+K interface, I am utilizing the cmdk library alongside Shadcn UI. The key file in this setup is extension/content/CommandModal.tsx, which monitors the selection change event in the browser. When a user selects text on the screen, a button is inserted to open the command modal, and the hotkeys CMD+K and CMD+J are enabled to allow direct access to the command modal.
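The hotkey check and the selection handling described above can be sketched as two small pure functions. This is not the code from CommandModal.tsx: the names `isPaletteHotkey`, `selectionAsContext` and `openCommandModal` are hypothetical, and accepting Ctrl in place of Cmd is an assumption.

```typescript
// Only the KeyboardEvent fields the check reads, so it can run without a DOM.
interface KeyPress {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
}

// True for Cmd+K and Cmd+J. Accepting Ctrl as well is an assumption for
// keyboards without a Cmd key.
function isPaletteHotkey(event: KeyPress): boolean {
  const hasModifier = event.metaKey || event.ctrlKey;
  const key = event.key.toLowerCase();
  return hasModifier && (key === "k" || key === "j");
}

// The selected text becomes the prompt context only if something is left
// after trimming whitespace.
function selectionAsContext(selectedText: string): string | null {
  const trimmed = selectedText.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Browser wiring, left as a comment so the sketch also runs outside a
// browser (openCommandModal is hypothetical):
//
// document.addEventListener("keydown", (event) => {
//   if (!isPaletteHotkey(event)) return;
//   event.preventDefault();
//   openCommandModal(selectionAsContext(window.getSelection()?.toString() ?? ""));
// });
```

Keeping the checks free of DOM access makes them easy to test without loading the extension into a browser.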

Please refer to the CommandModal.tsx file to see how I implemented the UI in about 400 lines of code. I will try to write a blog post specifically explaining the development of the UI in depth after the hackathon.

Fetch Requests to GraphQL API

Modus auto-generates a GraphQL API from AssemblyScript functions. Apollo Client is commonly used to query GraphQL APIs, but a GraphQL API can also be queried with JavaScript’s built-in Fetch API, which is widely used to call HTTP/HTTPS endpoints.

The GraphQL query for model invocation, askAI, is:

query AskAI {
    askAI(
        promptType: "EXPLAIN_LIKE_FIVE"
        prompt: "We recommend first attempting to fetch data on the server-side.  However, there are still cases where client-side data fetching makes sense. In these scenarios, you can manually call fetch in a useEffect (not recommended), or lean on popular React libraries in the community (such as SWR or React Query) for client fetching.  app/page.tsx TypeScript  TypeScript  'use client'   import { useState, useEffect } from 'react'   export function Posts() {   const [posts, setPosts] = useState(null)     useEffect(() => {     async function fetchPosts() {       let res = await fetch('https://api.vercel.app/blog')       let data = await res.json()       setPosts(data)     }     fetchPosts()   }, [])     if (!posts) return <div>Loading...</div>     return (     <ul>       {posts.map((post) => (         <li key={post.id}>{post.title}</li>       ))}     </ul>   ) }"
    )
}

The GraphQL query can be made for all prompts with the following function, which passes the GraphQL query and its variables in the request body.

const queryAI = async (promptType: string, prompt: string) => {
  // Reset the UI state before each request.
  setLoading(true);
  setResponse(null);
  setError(null);

  try {
    const res = await fetch("http://localhost:8686/graphql", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${import.meta.env.WXT_API_KEY}`,
      },
      // The query text is fixed; the values travel as GraphQL variables.
      body: JSON.stringify({
        query: `
          query AskAI($promptType: String!, $prompt: String!) {
            askAI(promptType: $promptType, prompt: $prompt)
          }
        `,
        variables: { promptType, prompt },
      }),
    });

    const result = await res.json();
    // GraphQL servers report query failures in an `errors` array in the response body.
    if (result.errors) {
      throw new Error(result.errors[0].message);
    }

    saveResponseToLocalStorage(result.data.askAI, promptType, prompt);
    setResponse(result.data.askAI);
  } catch (error) {
    console.error("Error:", error);
    setError("Oops! Something went wrong, please try again!");
  } finally {
    setLoading(false);
  }
};
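The body of saveResponseToLocalStorage is not shown in this post. As a rough sketch of what such a helper could look like, the code below stores each response under a key derived from the prompt type and prompt, so a repeated query on the same text can be answered from storage. Everything here (`StorageLike`, `SavedResponse`, `cacheKey`, `saveResponse`, `loadResponse`, `MemoryStorage`, and the key scheme) is hypothetical and not taken from the SnipyAI source.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface SavedResponse {
  promptType: string;
  prompt: string;
  response: string;
  savedAt: number; // milliseconds since the Unix epoch
}

// Hypothetical key scheme: one entry per (promptType, prompt) pair.
function cacheKey(promptType: string, prompt: string): string {
  return `snipy:${promptType}:${prompt}`;
}

function saveResponse(
  storage: StorageLike,
  response: string,
  promptType: string,
  prompt: string,
): void {
  const entry: SavedResponse = { promptType, prompt, response, savedAt: Date.now() };
  storage.setItem(cacheKey(promptType, prompt), JSON.stringify(entry));
}

function loadResponse(
  storage: StorageLike,
  promptType: string,
  prompt: string,
): string | null {
  const raw = storage.getItem(cacheKey(promptType, prompt));
  if (raw === null) return null;
  try {
    return (JSON.parse(raw) as SavedResponse).response;
  } catch {
    return null; // corrupted entry: treat it as a cache miss
  }
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStorage implements StorageLike {
  private items = new Map<string, string>();
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

const demoStorage = new MemoryStorage();
saveResponse(demoStorage, "A short answer.", "EXPLAIN_LIKE_FIVE", "Some selected text");
console.log(loadResponse(demoStorage, "EXPLAIN_LIKE_FIVE", "Some selected text"));
```

In the extension, `window.localStorage` could be passed in place of `MemoryStorage`, and `loadResponse` could be checked before calling the API.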

Usage

Once the modal is opened, users can effortlessly run queries using both default commands and their custom prompts. This approach streamlines the interaction with LLMs, and the CMD + K interface makes it universally accessible on any website across the internet.

The images above show the consistent behaviour of the extension’s CMD + K interface across blogs on Hashnode, the GraphQL documentation and Aaron Swartz’s essays. This shows that the Chrome extension can be used on any website in the browser, with the same consistency and styles.

But does this come in light/dark mode theming? Heck Yeah!

Let’s have some fun, shall we? Let’s ask SnipyAI to explain quantum mechanics to a five-year-old:

Watch the demo video here : https://dub.sh/PRjqkH9

Self Hosting SnipyAI

There are over 1.5 billion websites on the internet, and on average a user visits more than 100 web pages every day. Suppose a user runs 5 queries with SnipyAI on each of 20 web pages per day.

The AI model is then invoked approximately 5 × 20 = 100 times a day for a single user. As the number of users grows, the number of model invocations grows with it, and so does the token cost. To avoid huge bills and allow the most customisation, the API service can easily be self-hosted. This lets users manage their AI model invocations themselves while still retaining all the features of SnipyAI.
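The estimate can be written as a small function. The defaults are the assumptions from the text (5 queries on each of 20 pages per day); the function name `dailyInvocations` is only illustrative.

```typescript
// Estimated model invocations per day. The defaults are the assumptions
// stated in the text: 5 queries on each of 20 pages per user per day.
function dailyInvocations(
  users: number,
  queriesPerPage: number = 5,
  pagesPerDay: number = 20,
): number {
  return users * queriesPerPage * pagesPerDay;
}

console.log(dailyInvocations(1));    // 100
console.log(dailyInvocations(1000)); // 100000
```

Under these assumptions the total scales in proportion to the number of users, so a thousand users already means a hundred thousand invocations a day.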

The steps for setting up the project locally are provided in INSTALLATION.md.

Conclusion

Through observing patterns in my reading habits, I identified a pressing issue: the decline in attention spans and the growing preference for short-form video content over traditional text. This realization became the foundation of my journey in the ModusHack Hackathon, where I documented my experience of leveraging Modus and Hypermode to develop SnipyAI.

SnipyAI is an AI-powered Chrome extension designed to simplify content comprehension. With a universal interface accessible through CMD + K, it enables users to run queries on LLMs directly from any webpage, transforming how we interact with textual content across the web.

This project can directly benefit readers on blogging platforms like Hashnode, Medium, etc. by helping them understand rich blogs without any hassle, with LLMs accessible through just CMD + K.

Hence, I can say that with SnipyAI,

Reading and comprehension become easier and LLMs help you with just CMD + K.

I want to thank Hashnode Team for organising this amazing hackathon and Hypermode for making such an amazing tool that helps build complex projects so easily.

If you’ve come this far, I invite you to self-host SnipyAI (see the Self Hosting SnipyAI section), use it yourself, and share your feedback. I am determined to improve the tool and turn it into a published Chrome extension.

Connect with me:

Website : https://mahendradani.vercel.app
Linkedin : https://linkedin.com/in/mahendra-dani
GitHub : https://github.com/MahendraDani
Hashnode : https://hashnode.com/@Mahendra09

SnipyAI - If Reading was easier and LLMs could help with just Cmd + K
