python users - Forum - Virtual Popstar

python users

Warrior
World famous

Warrior wrote:
this is probably a long shot but I'm so fucked if I don't figure this out

so ...

I've taken a programming course this semester and I am way out of my depth here. However, I have to pass the class to continue my education. (Never mind that my field of study has nothing to do with programming; I just chose it because I thought it would be 'fun'... I was wrong.)

I have to 'compare the top 100 most frequent words in a specific data set with and without stopwords'

I think I have all the key functions basically given to me by the teacher, but I have no idea how to actually use them?? Like, they're programmed correctly in my Python file, but they do *nothing* when I run the program.
I am pretty damn sure that I'm missing something very *obvious* but I've stared at it for hours and tried so many things, yet I don't get it.

This is the code:
ibb.co/rHj6HLf
ibb.co/MGWDk4J
ibb.co/bF4H6Yv

ICantWhistle
International star

ICantWhistle wrote:
okay, I had a quick look. idk what your .csv looks like, but I believe you have to pass your texts to word_count() twice, once with the parameter stopword set to True and once with it set to False, like this:

# should contain all the words in texts and how many times each appeared, e.g. [('astrology', 11), ('Gemini', 5), ...] (or formatted differently lol)
counter_no_stopwords = word_count(texts, stopword=False)

# same as above but this one includes all the stopwords
counter_with_stopwords = word_count(texts, stopword=True)

since you only want the top 100 words, and the function already returns the list sorted (the default is sort=True), you can slice the two lists:

# this is the syntax to get the first 100 elements (store the result, otherwise it's thrown away)
counter_no_stopwords = counter_no_stopwords[:100]

# or do it in one step with the code above
counter_with_stopwords = word_count(texts, stopword=True)[:100]

now you have the top 100 for both cases and can compare them
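One way to "compare them" is to check which words show up in both top-100 lists and which show up in only one. Here's a minimal sketch, assuming word_count() returns a list of (word, count) pairs sorted from most to least frequent (that format is a guess; the teacher's function might return something else). compare_top() and the sample data are hypothetical:

```python
def compare_top(with_stopwords, no_stopwords, n=100):
    """Compare the top-n words of two sorted lists of (word, count) pairs."""
    # Slicing returns a new list; keep only the words from the first n pairs.
    top_with = {word for word, _count in with_stopwords[:n]}
    top_without = {word for word, _count in no_stopwords[:n]}
    # Set operations give "in both" and "only in one" directly.
    return {
        "in_both": top_with & top_without,
        "only_with_stopwords": top_with - top_without,
        "only_without_stopwords": top_without - top_with,
    }


# Hypothetical, already-sorted results standing in for word_count() output.
counter_with_stopwords = [("the", 40), ("astrology", 11), ("and", 9), ("Gemini", 5)]
counter_no_stopwords = [("astrology", 11), ("Gemini", 5), ("stars", 3)]

result = compare_top(counter_with_stopwords, counter_no_stopwords, n=3)
for key, words in result.items():
    # sorted() just makes the printed output stable and easy to read.
    print(key, sorted(words))
```

With the real data you'd call compare_top(counter_with_stopwords, counter_no_stopwords) and leave n at its default of 100.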

word_count() does a lot: it splits up your text using the tokenizer function, loops through the words, increments the counter for every word, and spits out the sorted list.
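For reference, here's roughly what a function like that does, as a minimal sketch. None of this is the teacher's code: tokenizer(), the tiny STOPWORDS set and the sample texts are hypothetical, and it assumes (as read above) that stopword=True keeps stopwords and stopword=False drops them; the real function may work the other way around. The main() at the bottom also shows why a file can "do nothing": defining functions doesn't run them, you have to call them and print the results.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}


def tokenizer(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_count(texts, stopword=True, sort=True):
    counter = Counter()
    for text in texts:
        for word in tokenizer(text):
            # Assumed meaning: stopword=False skips every word in STOPWORDS.
            if not stopword and word in STOPWORDS:
                continue
            counter[word] += 1
    if sort:
        # List of (word, count) pairs, highest count first.
        return counter.most_common()
    return list(counter.items())


def main():
    # Hypothetical sample data; in the assignment, texts would come from the .csv file.
    texts = ["The stars and the Gemini", "Gemini is the best"]
    # Calling the functions and printing the results is what makes output appear.
    print(word_count(texts, stopword=True)[:100])
    print(word_count(texts, stopword=False)[:100])


if __name__ == "__main__":
    main()
```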

Warrior
World famous

Warrior wrote:

you are a lifesaver!! thank you so much!!
