[2023-12-16 Sat 08:46]
Related: DataScienceJournal PersonalKnowledgeManagementNotes MyCommentary BlogPublished? ProductivityJournal
KooPingShung talks about curation being one potential way to leverage LLMs, but notes that familiarity with a topic is needed to counter hallucination. He also mentions that this familiarity is built by identifying the reliable experts in a field, and perhaps then cross-validating the LLM's claims against their content.
This seems to be the model followed by many newsletters on Substack, wherein experts offer exclusive insight by supposedly reading thousands of papers, blog articles and 'everything on the web' to produce a single newsletter that typically claims to amplify your consciousness in AI. There are also those who simply share thoughts based on their own experience, rather than proclaiming it comes from consuming the web.
Let's assume that among the multitude of such newsletters offering exclusive content only to paid subscribers, at least a fraction have great content. This is not for everyone, since subscribing to them all would likely cost several hundred dollars per year by itself. The next and significantly more critical issue is actually consuming all their content systematically, at a depth beyond 'skimming', so that it leads to insight or wisdom.
This is one area where an LLM could be useful. However, what guarantee is there that the LLM will not hallucinate and project incorrect concepts even from this refined source of information? It would help if the tool could specifically cite the source of each piece of information, for example.
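As a concrete illustration of what 'citing the source' could look like, here is a minimal sketch in Python of a retrieve-then-cite pattern: passages are pulled from known sources, numbered, and the model is instructed to answer only from them. The function names and the toy keyword-overlap ranking are hypothetical conveniences for illustration, not any particular tool's API.

```python
# Sketch of citation-grounded querying: retrieve passages from known
# sources first, then ask the model to answer ONLY from them, citing the
# numbered source of each claim. All names here are hypothetical; no
# specific LLM API is assumed.

def retrieve(query: str, sources: dict[str, str], k: int = 3) -> list[tuple[str, str]]:
    """Rank sources by naive keyword overlap with the query (toy ranking)."""
    terms = set(query.lower().split())
    scored = sorted(
        sources.items(),
        key=lambda item: len(terms & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Number each passage so the model can cite [1], [2], ... verbatim."""
    numbered = "\n".join(
        f"[{i}] ({name}) {text}" for i, (name, text) in enumerate(passages, 1)
    )
    return (
        "Answer using ONLY the sources below. Cite the source number after "
        "every claim; say 'not in sources' otherwise.\n\n"
        f"{numbered}\n\nQuestion: {query}"
    )

# Toy usage with two made-up newsletter excerpts.
sources = {
    "newsletter-a.txt": "Curation counters hallucination when experts vet claims.",
    "newsletter-b.txt": "Familiarity with a topic comes from reading primary papers.",
}
query = "How does curation counter hallucination?"
print(build_prompt(query, retrieve(query, sources)))
```

Anchoring every claim to a numbered, checkable source is essentially what makes the tool's output auditable rather than something to be taken on faith.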
The process of research is what provides familiarity. A meaningful familiarity is more likely to be produced when one digs into the material, summarises it and forms notes connecting various concepts; this fundamental familiarity, especially with a new topic, is not aided by an AI assistant or LLM producing a mish-mash of research notes (of varying quality, with or without citations).
With an LLM gathering research notes, I think it becomes increasingly important to distinguish what (and how much) was generated by the LLM and which notes are our own distillations. Perhaps, in this kind of discourse with an AI assistant, hallucination is not restricted to the LLM; it could apply to the human operator and their biases as well.
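One low-tech way to keep that distinction honest, sketched below, is to tag every note with its provenance at capture time and filter on it later. The Note layout and the 'origin' field are my own convention for illustration, not a standard.

```python
# Keep LLM-generated notes distinguishable from one's own distillations by
# recording provenance when the note is captured. The 'origin' field and
# note layout are a made-up convention, purely illustrative.

from dataclasses import dataclass

@dataclass
class Note:
    title: str
    body: str
    origin: str  # "human" for my own distillations, "llm" for generated text

notes = [
    Note("Curation vs hallucination", "My summary after reading the thread.", "human"),
    Note("Survey of citation tooling", "Draft produced by an assistant, unchecked.", "llm"),
]

# Before trusting the garden, check how much of it is actually mine.
own = [n for n in notes if n.origin == "human"]
print(f"{len(own)}/{len(notes)} notes are my own distillations")
```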
The interest in 'AI assisted tools' predates the current LLM phase, and we will likely have some kind of LanguageModel? based tool in common access at some point, like NotebookLM? from Google, which Wired has reported on. This excerpt provides an introduction:
NotebookLM? ... starts by creating a data set of your source material, which you drag into the tool from Google Docs or the clipboard. After the app has digested it all, you can then ask NotebookLM? questions about your material, thanks to Google’s large language model technology—partly powered by its just-released upgrade Gemini. The answers reflect not only what’s in your source material but also the wider general understanding of the world that Gemini has. A critical feature is that every answer to your queries comes with a set of citations reporting where exactly the information came from, so users can check the accuracy of its output. - Wired Article and Readwise highlights
It is worth noting that the author mentions one does not necessarily have to hold a discourse with the app, as in ChatGPT?, though NotebookLM? in its current form is apparently quite chatty. While its opinions are restricted to the material provided and come with citations, one can also request a critical or alternate view of the material, which promotes reflection and thinking.
The author ultimately notes that it was familiarity with a topic which helped him distinguish 'which opinion belonged to whom', and 'form his own informed opinion' on the topic.
The Zettelkasten approach, explained by Ahrens in his book ((Ahrens, Sonke, 2017)), is fundamentally about building a digital garden of notes. The digital garden seems to have become popular in recent years, and there are folks like SimonSpati who share their interconnected ZettelkastenNotes online. The emphasis of the approach, however, is that one has to curate those notes with a 'presence of mind' or 'intention' in order to actually understand the material, and to keep linking each note to other notes in the garden. These other notes may be on directly or indirectly related topics, and it is such connections that give birth to 'insight' and discovery, on top of a robust base of knowledge and the wisdom derived from it.
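The linking mechanics themselves are simple enough to sketch: scan a folder of notes for [[wikilink]] references and invert them into a backlink index, so that opening any note also surfaces the notes pointing at it. The [[...]] syntax is one common digital-garden convention; Ahrens' book does not prescribe any particular tool, and the folder name below is hypothetical.

```python
# Build a backlink index over a folder of plain-text notes: find every
# [[wikilink]] and invert the direction, so each concept lists the notes
# that reference it. The [[...]] syntax and the "notes" folder are
# assumptions, not part of the Zettelkasten method itself.

import re
from collections import defaultdict
from pathlib import Path

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def backlink_index(garden: Path) -> dict[str, list[str]]:
    index: dict[str, list[str]] = defaultdict(list)
    for note in garden.glob("*.md"):
        for target in LINK.findall(note.read_text(encoding="utf-8")):
            index[target].append(note.stem)  # note.stem links to target
    return index

# Usage: each entry shows which notes reference a given concept; these
# unplanned adjacencies are where insight tends to surface.
for target, referrers in backlink_index(Path("notes")).items():
    print(f"{target} <- {', '.join(referrers)}")
```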
I think the connections that best translate into insight are formed through an intimate relationship with the research notes, which is facilitated by gathering them personally, as opposed to relying only on a machine. However, it also makes sense that with networked knowledge one can potentially reach that stage of familiarity quickly on some topics. In such cases, an AI assistant could be invaluable in extending said research sensibly, in a controlled manner.