A set of all possible tweets |
| Apr17-12, 09:43 PM | #1 |
|
|
Hello PF,
I have a question. I am an artist working with big numbers and language. I created a Twitter account that will post all possible tweets. (Edit: link removed) I am interested in calculating the probability of specific tweets. Example: what is the probability that my next tweet will be "End here. Us then. Finn, again! Take. Bussoftlhee, mememormee!" or just gibberish like "j^F9c@# 64l["? Is this doable? I want to use this to reduce other tweets on Twitter to probabilities; I like the idea of converting meaningful language into a number. Also, how do I calculate the probability of a specific sequence of tweets? And OK... I realize that these numbers are really, really big and that the probability is basically zero, but I am not interested in reality. THANKS! |
| Apr19-12, 06:44 AM | #2 |
|
|
|
| Apr19-12, 12:05 PM | #3 |
|
|
Even if we assume that every character is equally likely to occur and that each character is independent of the others, we still need to consider tweets with fewer than 140 characters. E.g. there are [itex]k^{100}[/itex] 100-character tweets (with k the number of allowed characters).
Therefore, the total number of possible non-empty tweets is [itex]\sum_{i=1}^{140} k^i[/itex], and the probability of observing any given tweet is then [itex]1/\sum_{i=1}^{140} k^i[/itex]. |
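A minimal sketch of the count above, assuming every non-empty tweet of up to 140 characters is equally likely and that successive tweets are drawn independently. The function names and the alphabet size k = 70 used in the usage note are illustrative assumptions, not Twitter's actual rules.

```python
from fractions import Fraction


def count_tweets(k: int, max_len: int = 140) -> int:
    """Number of non-empty strings of length 1..max_len over k characters."""
    return sum(k ** i for i in range(1, max_len + 1))


def uniform_tweet_probability(k: int, max_len: int = 140) -> Fraction:
    """Exact probability of one specific tweet if all tweets are equally likely."""
    return Fraction(1, count_tweets(k, max_len))


def sequence_probability(k: int, n_tweets: int, max_len: int = 140) -> Fraction:
    """Probability of one specific ordered sequence of n_tweets tweets,
    ASSUMING independent draws (with repetition allowed)."""
    return uniform_tweet_probability(k, max_len) ** n_tweets


if __name__ == "__main__":
    k = 70  # assumed alphabet size, for illustration only
    total = count_tweets(k)
    print("digits in the total count:", len(str(total)))
    print("P(one given tweet) is 1 /", total)
```

Using `Fraction` keeps the probabilities exact; converting them to floating point would lose almost all of the digits.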
| Apr19-12, 12:32 PM | #4 |
|
|
This is now a much more interesting problem, since you have to calculate the probability that a random string contains a syntactically correct URL. |
| Apr19-12, 07:04 PM | #5 |
|
|
|
| Apr19-12, 07:51 PM | #6 |
|
|
|
| Apr19-12, 08:03 PM | #7 |
|
|
|
| Apr19-12, 08:15 PM | #8 |
|
|
Take a look at grammars and at constructing realizations of a particular grammar. You can then supply a dictionary and a set of conditional probability distributions, and generate the output by driving those distributions and the grammar's rule choices with a random number generator.
The grammars will be Markovian in nature, and how specific they are will depend on how you arrange the tags, how they are linked together, and their internal structure vs. global structure. |
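One way to picture this is a toy probabilistic grammar. The sketch below is only an illustration: the rules, words and probabilities in `GRAMMAR` are made up, and the function `generate` is a hypothetical helper, not anything specified in this thread.

```python
import random

# Hypothetical toy grammar: each tag (nonterminal) maps to a list of
# (probability, expansion) pairs. Symbols that are not keys are terminals.
GRAMMAR = {
    "TWEET": [(0.7, ["SUBJECT", "VERB"]), (0.3, ["SUBJECT", "VERB", "OBJECT"])],
    "SUBJECT": [(0.5, ["Finn"]), (0.5, ["we"])],
    "VERB": [(0.6, ["end"]), (0.4, ["take"])],
    "OBJECT": [(1.0, ["here"])],
}


def generate(symbol, rng):
    """Expand `symbol` at random.

    Returns (words, probability), where probability is the product of the
    probabilities of every rule chosen along the way.
    """
    if symbol not in GRAMMAR:
        return [symbol], 1.0  # terminal: emitted as-is
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    p, expansion = rng.choices(rules, weights=weights, k=1)[0]
    words, prob = [], p
    for child in expansion:
        child_words, child_prob = generate(child, rng)
        words.extend(child_words)
        prob *= child_prob
    return words, prob


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so that runs are repeatable
    words, prob = generate("TWEET", rng)
    print(" ".join(words), "with derivation probability", prob)
```

Because this toy grammar has exactly one derivation per sentence, the derivation probability is also the probability of the sentence; an ambiguous grammar would need the probabilities of all derivations summed.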
| Apr19-12, 08:26 PM | #9 |
|
|
|
| Apr19-12, 08:31 PM | #10 |
|
|
Don't confuse grammar with English grammar: it's a general grammatical structure. |
| Apr20-12, 02:35 AM | #11 |
|
|
|
| Apr20-12, 03:25 AM | #12 |
|
|
The grammar can be as detailed and as complex as you want. Greater complexity allows possibilities you could not otherwise have and gives you more control than lower complexity would, but again it's all in the grammar definition. As an example of what I mean at its most extreme, you could put every possible sentence in its own tag and then create an output tag that is basically an XOR statement of all the possible leaf tag definitions. The point is that the grammatical structure can generate whatever you want it to generate.
Of course you wouldn't do this: you would get a linguist to specify the semantic and syntactic issues and generate a final grammar that is optimal in terms of description. In other words, you want a grammar of minimum complexity that retains all the semantic and syntactic information for the valid realizations, so you are solving a kind of optimization problem with the constraints determined by the syntax, semantics and other relevant information that a linguistic specialist would supply.
This is a language-independent phenomenon: you could apply it to Mandarin, to representing the data structure of a bitmap, or to specifying English text. As long as the alphabet is quantized (and you could extend it to a non-quantized alphabet in terms of the idea involved), the idea doesn't change. |
| Apr20-12, 06:14 AM | #13 |
|
|
I don't know what characters are allowed in a tweet, so I'll guess that there are 70 of them (26 lowercase letters, 26 uppercase letters, 10 digits, a few non-alphabetic symbols). So there are 70^140 ≈ 2.05932837 × 10^258 possible 140-character tweets. For comparison, the current age of the universe (≈13.7 billion years) is less than 10^18 seconds. If you could generate a billion tweets per second for 13.7 billion years, you would have generated about 4.32 × 10^26 messages. That's a lot, right? But the number of tweets you still haven't generated is approximately 2.05932837 × 10^258 - 4.32 × 10^26 = 2.05932837 × 10^258. So the number of tweets you still haven't generated is essentially unchanged...after a billion tweets per second for 13.7 billion years.
How is this possible? 2.05932837 × 10^258 is a 259-digit number that starts with 2059328370000000000 (240 more zeroes after that). The computer has obviously rounded off to 9 significant figures. When we subtract the 27-digit number 4.32 × 10^26 from that, we get a 259-digit number that starts with 20593283699999999999 and then has nothing but nines until the last 28 digits. So when the computer displays the answer of the subtraction, it rounds off 2.0593283699...(220 more nines, followed by 28 more digits) to 2.05932837. The error introduced by this roundoff is completely insignificant compared to the error that was introduced by keeping only 9 significant figures in the original calculation of 70^140. We would have had to keep at least 232 significant figures just to see that the number of remaining tweets will be smaller after 13.7 billion years.
I haven't tried to calculate this, but I think the probability that any of the tweets generated in those 13.7 billion years will make sense is extremely small (if they are generated randomly). |
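The subtraction can be repeated with exact integers instead of rounded ones. This is a small sketch that reuses the guessed 70-character alphabet and the 4.32 × 10^26 figure from the post; the helper `leading_digits` is a hypothetical name introduced here.

```python
ALPHABET_SIZE = 70   # guess taken from the post, not Twitter's real character set
TWEET_LENGTH = 140

# Python integers have arbitrary precision, so nothing is rounded here.
total = ALPHABET_SIZE ** TWEET_LENGTH

# About a billion tweets per second for 13.7 billion years (4.32 x 10^26).
generated = 432 * 10 ** 24

remaining = total - generated


def leading_digits(n: int, digits: int = 9) -> str:
    """First `digits` decimal digits of a positive integer (truncated, not rounded)."""
    return str(n)[:digits]


if __name__ == "__main__":
    print("digits in total:     ", len(str(total)))
    print("digits in remaining: ", len(str(remaining)))
    print("leading digits equal:", leading_digits(total) == leading_digits(remaining))
    print("remaining < total:   ", remaining < total)
```

With exact arithmetic the remaining count really is smaller, but the difference only shows up far beyond the first nine digits.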
| Apr20-12, 09:08 AM | #14 |
|
|
|
| Apr24-12, 06:00 PM | #15 |
|
|
|
| Apr24-12, 08:27 PM | #16 |
|
|
|
| Apr25-12, 07:01 AM | #17 |
|
|
For example, Twitter tweets are not independent at all, so calculating the probabilities gets infinitely more complicated, as you have to factor in external events. Basically, without investing millions of dollars into research, I don't see how what you want to do can be done. |
| Tags |
| big numbers, twitter |