Episode Transcript
[00:00:15] Speaker A: Mark, it's good to see you. Thank you for coming on the podcast.
[00:00:19] Speaker B: It's my honor.
[00:00:21] Speaker A: So today we're talking about AI and ChatGPT.
[00:00:28] Speaker B: Yes.
[00:00:29] Speaker A: And in particular, I know you've been doing some research on prompt engineering.
[00:00:37] Speaker B: Yeah. And it's a new field for me.
[00:00:43] Speaker A: What is prompt engineering like?
[00:00:47] Speaker B: The potential of ChatGPT 4.0 is unlimited, and we need to find out how to make it work better.
[00:01:00] Speaker A: I love that explanation, because usually when we think of prompts, it's like telling ChatGPT what you want it to do, which basically is what a prompt is. But you framed it in a way where we want to know ChatGPT's potential, so we are kind of programming it to reach its potential. Yeah, love it. So one of the types of prompts that we're talking about is chain of density.
And what is chain of density?
[00:01:49] Speaker B: Okay. First, "chain" means that it's an iterative enhancement, and "density" means the entity-to-token ratio: the ratio that lets you understand an article that contains a lot of information.
[00:02:17] Speaker A: Okay, so it's a way to summarize.
[00:02:22] Speaker B: Yeah.
[00:02:22] Speaker A: And I think in the article that introduced this, they were summarizing a news article.
[00:02:30] Speaker B: Yeah. The chain of density notion was initially for summarizing short news.
[00:02:37] Speaker A: Okay. And you said some things earlier about tokens and density, but what does density mean in this context?
[00:02:53] Speaker B: If we want to understand a long paragraph, then we need to know the many entities in it. But if there are too many entities, maybe we can't understand it well. So the density needs to strike a balance between readability and information.
[00:03:17] Speaker A: Okay. And you mentioned entity too. What's an entity?
[00:03:23] Speaker B: In this research, an entity is defined as a name or a particular, specific word. In a short news item, for example about an F1 race, it might be the racing team, the driver, the race name (the grand prix), how many laps, or what the penalty was.
Those are missing entities.
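As a rough illustration of the entity-per-token idea of density, here is a small Python sketch that uses spaCy's named-entity recognizer as a stand-in for the broader notion of entity described above (teams, drivers, race names, and so on). The example texts and the choice of spaCy are illustrative assumptions, not part of the original research.

import spacy

# spaCy's NER is only a stand-in for the looser notion of "entity" above;
# the small English model is installed separately (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

def entity_density(text: str) -> float:
    """Named entities per token: a rough proxy for the entity/token ratio."""
    doc = nlp(text)
    return len(doc.ents) / max(len(doc), 1)

sparse = "The race took place on Sunday and was exciting to watch."
dense = "Verstappen won the Monaco Grand Prix for Red Bull after 78 laps."
print(entity_density(sparse))  # low: few named entities
print(entity_density(dense))   # higher: driver, race, team, lap count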
[00:04:01] Speaker A: Okay, so it can be names, places, concepts, those sorts of things we're calling entities.
[00:04:09] Speaker B: Yeah.
[00:04:10] Speaker A: Okay.
What is this type of prompt trying to accomplish then?
What's the end goal?
[00:04:21] Speaker B: The end goal is to summarize short news iteratively, and in each iteration it enhances the summary.
And the final goal is a summary with less lead bias, so that you can understand the article very well.
[00:04:54] Speaker A: Okay, so it's not enough then to just ask ChatGPT, and we're on version 4.0 as of today.
It's not enough just to tell ChatGPT, summarize this news article, please.
[00:05:12] Speaker B: Yeah, we call that a vanilla prompt. You just ask GPT to summarize in concise, short words, and then GPT will give you a sparse summary.
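For concreteness, here is a minimal sketch of that vanilla prompt using the OpenAI Python client. The model name, word limit, and input file are illustrative assumptions; the point is just that the whole instruction is a single, unstructured request.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

article = open("news_article.txt").read()  # hypothetical input file

# The entire "vanilla prompt": one unstructured request, no steps, no definitions.
vanilla_prompt = (
    "Summarize the following news article in concise, short words "
    "(about 80 words):\n\n" + article
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": vanilla_prompt}],
)
print(response.choices[0].message.content)  # typically sparse and lead-biased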
[00:05:29] Speaker A: Okay, so it's really not that great.
[00:05:31] Speaker B: The result is just so-so.
[00:05:35] Speaker A: Okay, and what else is it?
I read in the article, or somewhere, that when you use a plain vanilla prompt, the summaries often don't cover the whole article; for whatever reason, they pick certain places in the article to summarize.
[00:05:57] Speaker B: Yeah, if there are three paragraphs in the article, then maybe the vanilla prompt will just take the third paragraph of the article, and it's biased.
It has lead bias.
[00:06:15] Speaker A: Okay, so this chain of density technique fixes that? What were the results? Is the summary more spread out?
[00:06:26] Speaker B: Yeah, it takes information equally from the first paragraph to the last. And because we set requirements, a guideline for the summary, like indicating the missing entities and putting a limit on the summary length, this chain of density approach can perform even better than a human summary.
[00:07:04] Speaker A: Cool. So what does the prompt look like? What are the elements? I mean, you mentioned defining entities and that sort of thing, but what does the prompt kind of look like?
[00:07:18] Speaker B: It's like programming. First you set a goal, like asking GPT to generate increasingly concise, entity-dense summaries. That's the end goal. Then you need to set many steps, because it's a chain of density, so you set it up step by step. The first step is to identify one to three informative entities.
And the second step is to write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the missing entities. And then you need to define what a missing entity is. I think this work is fantastic.
You could change the term "missing entity" into any other term. If you have defined it well, then any term will work.
In the initial research it's like this: the missing entity is relevant to the main story; it's specific, descriptive, yet concise, defined as five words or fewer; and it should be novel.
That means it didn't appear in the previous summary. It should also be faithful, meaning it's present in the initial article, and the missing entity can be located anywhere in the article. So there's a specific definition of the term missing entity, and GPT will follow this guideline to generate iterations of the summary. You should also set the guideline that the first summary can be long and sparse, and that by fusing or rewriting the previous summary, GPT makes space for the new missing entities and never drops any entity from the previous summary.
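Putting those pieces together, here is a sketch of what such a Chain of Density prompt can look like as a Python template. The wording paraphrases the elements described above (the goal, the two repeated steps, the five-part definition of a missing entity, and the guidelines); the exact phrasing, iteration count, and word limit are assumptions to be tuned, not the verbatim prompt from the original research.

def chain_of_density_prompt(article: str, iterations: int = 5, word_limit: int = 80) -> str:
    """Build a Chain of Density style prompt for the given article.

    iterations and word_limit are tuning knobs: roughly 4 rounds and 80 words
    for a short news item, 5 rounds and 200 words for a longer article.
    """
    return f"""Article: {article}

You will generate increasingly concise, entity-dense summaries of the article above.

Repeat the following two steps {iterations} times.
Step 1: Identify 1-3 informative entities from the article that are missing from the previous summary.
Step 2: Write a new, denser summary of identical length (about {word_limit} words) that covers every entity and detail from the previous summary plus the missing entities.

A missing entity is:
- Relevant: to the main story.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not in the previous summary.
- Faithful: present in the article.
- Anywhere: it can be located anywhere in the article.

Guidelines:
- The first summary should be long yet sparse, containing little information beyond the entities marked as missing.
- Make space for new entities by fusion, compression, and rewriting the previous summary.
- Never drop an entity that appeared in a previous summary.

Answer with the {iterations} summaries, one per iteration."""

The resulting string can be sent as a single user message with the same chat call as in the vanilla sketch above.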
[00:09:47] Speaker A: That's fascinating.
[00:09:48] Speaker B: Yeah.
[00:09:49] Speaker A: So defining the missing entities and telling it to repeat five times, is that it?
[00:09:59] Speaker B: Oh yeah, but you can set the number of iterations.
It depends on how long your article is, I think.
[00:10:06] Speaker A: And you also set the length of the summary?
[00:10:10] Speaker B: Yeah, it also depends. If it's a short news item, maybe 80 words is enough. But if you need to generate a summary of a four-page article, then maybe 200 words is good.
[00:10:28] Speaker A: Okay.
How does it do after each repeat? Let's say we're doing five repeats: one, two, three, four, five. Did the research say how it performs after each repeat? After each iteration?
[00:10:50] Speaker B: Yeah, there's a trade-off between informativeness and readability.
And the research finds that after five iterations the summary is too informative and becomes somewhat unreadable. So I think it depends on the length of the initial article you want ChatGPT to summarize.
[00:11:24] Speaker A: Yeah, it seems intuitive. So I mean, if you are asking for a short summary, after a certain number of iterations the missing entities will just...
[00:11:36] Speaker B: Include every entity in the initial article, and that makes the summary unreadable.
[00:11:45] Speaker A: So have you tried this?
[00:11:46] Speaker B: Yeah.
[00:11:47] Speaker A: What do you think?
[00:11:49] Speaker B: I think for a short paragraph, maybe 80 words and four iterations is enough.
But for a long article, maybe 200 words and something like five iterations is good.
[00:12:09] Speaker A: And I know that you've experimented with extending this beyond newspaper reports.
[00:12:18] Speaker B: Oh yeah. The initial research is for short news, where a lot of the information fits in fewer than five words. But I wanted to use this notion to summarize a long article, and as we know, legal articles contain a lot of reasoning, and reasoning definitely runs longer than five words. So I changed the original indicator from missing entity to missing notion, meaning one to three sentences, and defined the missing notion much like the missing entity: relevant, specific, novel, faithful, and anywhere. If you use missing entity to summarize a long article, my experience is that the summary will include many names, like the author's name, the school's name, or the paper's name, but not what you really want.
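A sketch of that substitution, continuing the template above: only the definition block changes, swapping the five-words-or-fewer entity for a one-to-three-sentence notion. The exact wording is an assumption reconstructed from the description, not the speaker's verbatim prompt.

# Drop-in replacement for the "missing entity" definition when summarizing
# long articles (for example, legal reasoning that spans several sentences).
MISSING_NOTION_DEFINITION = """A missing notion is:
- Relevant: to the main argument or reasoning of the article.
- Specific: informative yet concise, expressed in one to three sentences.
- Novel: not in the previous summary.
- Faithful: present in the article.
- Anywhere: it can be located anywhere in the article.
"""

Everything else in the prompt (the two repeated steps, the guidelines, the iteration count) stays the same; in practice a longer word limit goes with it, for example around 200 words for a four-page article.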
[00:13:42] Speaker A: That's fascinating. And so creative too, "missing notions." Right. And then extending that from a number of words to a number of sentences, and using it for case summarization. Because like you said, cases contain more than just entities, right? They're notions. There's an argument, a logic that may cross over a number of sentences, so you have to grab those notions for the summary. So I'm assuming that your summary window is more than 80 words; you're making it longer than that.
[00:14:32] Speaker B: Yeah, it depends on the length of the article. Like, if it's a four page article, then maybe 200 words will be suitable.
[00:14:41] Speaker A: Okay. And did you experiment with the missing notion idea?
[00:14:47] Speaker B: Yeah, it works better.
I used the chain of density approach to summarize the Chain of Density paper itself. At first, using missing entity, the output was things like the author's name and the school's name. Then I tried to fix this problem and changed entity to notion, and ChatGPT could somehow figure out what I wanted. If I define a missing notion as one to three sentences that are informative, then the output is better and the summary is readable, too.
[00:15:40] Speaker A: And did you change anything about the definition of missing, like faithful or novel? Did those change? Okay, so those stay the same.
[00:15:51] Speaker B: Yeah. Okay.
[00:15:57] Speaker A: So you're finding that it's summarizing the article, and it's more readable than with the missing entity.
[00:16:03] Speaker B: Yeah. Because in an article, like a legal article or this paper, we don't really care about the author's name. So I think the main point is to define what you want.
It doesn't matter what you call the information you want; if you define it clearly, then GPT will fulfill your request.
[00:16:36] Speaker A: So we have short articles, we have long articles. Anything else you've experimented with in this chain of density?
[00:16:44] Speaker B: I've tried translation.
The main point of the chain is iterative enhancement, so I tried it in translation. The main goal is to generate a translation that fits Taiwanese traditional characters and our own words.
Sometimes, with a vanilla translation prompt like "translate into Chinese," GPT will translate into simplified characters and some mainland China words. So the end goal is to generate a suitable Taiwanese translation, and there are a few steps. The first step is to translate word by word.
Then it will come out very clear, but very...
[00:18:02] Speaker A: Ordinary.
[00:18:02] Speaker B: Yeah, very ordinary translation.
[00:18:06] Speaker A: Ordinary meaning maybe not bad, but not good enough, or something like that.
[00:18:11] Speaker B: Yeah, but it's correct. The meaning is correct. And then we need to teach GPT to enhance this translation into Taiwanese traditional characters and Taiwanese words. So I asked GPT to play a role: you're a professional journalist, you've been doing economics translation for the Traditional Chinese (Taiwan) version of the New York Times, and the style of the translation should be like that. And then, in the second step, you need to find five words that are not coherent with Taiwanese usage.
And step three is that you enhance this and rewrite it until it's coherent with Taiwanese words.
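Reconstructed from that description, the translation chain might look like the sketch below: a literal first pass, then a role-played revision loop toward Taiwanese Traditional Chinese usage. The role description, the five-word threshold, and all wording are assumptions based on the conversation, not the exact prompt.

# Hypothetical template for the iterative translation chain described above.
TAIWANESE_TRANSLATION_PROMPT = """Text: {text}

Goal: produce a translation suited to readers in Taiwan, using Traditional Chinese characters and Taiwanese word choices, while keeping the original meaning.

Step 1: Translate the text word by word into Chinese. The result may read as plain and literal, but it must be accurate.
Step 2: You are a professional journalist who has long translated economics coverage for the Traditional Chinese (Taiwan) edition of the New York Times. In that style, identify five words or phrases in the Step 1 translation that are not coherent with Taiwanese usage.
Step 3: Rewrite the translation, replacing those words, and repeat Steps 2 and 3 until the whole passage reads naturally in Taiwanese Traditional Chinese."""

def build_translation_prompt(text: str) -> str:
    """Fill the template with the source text to translate."""
    return TAIWANESE_TRANSLATION_PROMPT.format(text=text)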
[00:19:22] Speaker A: That's fascinating. And the result?
[00:19:25] Speaker B: And the result is that we can't tell it's a GPT-translated paragraph.
[00:19:33] Speaker A: Really?
[00:19:34] Speaker B: Yeah.
[00:19:34] Speaker A: So you're taking the chain of density idea, the iterative approach, and kind of modifying it, just asking it to be more in the style of Taiwanese writing. And it accomplishes that?
[00:19:52] Speaker B: Yeah, because with the vanilla prompt we could see that the output was GPT-translated. There are many words where
the writing style is just like English; it's not in the Taiwanese writing style. But if you ask GPT to play a role and stay consistent with the original meaning, then the output is fantastic.
[00:20:25] Speaker A: And are you going to publish those prompts somewhere?
[00:20:30] Speaker B: What do you mean?
[00:20:32] Speaker A: I don't know. This sounds like a good topic for an article.
The work you've done on extending chain of density into this missing notion idea and even the translation, I think it's fascinating and the world needs to know about it.
[00:20:49] Speaker B: Thank you very much. It's my honor.
[00:20:51] Speaker A: Anything else?
If not, thank you so much.
[00:20:57] Speaker B: Thanks a lot.
[00:20:58] Speaker A: Take care.
[00:20:58] Speaker B: Thank you.