Python users
Warrior
World famous



This is probably a long shot, but I'm so fucked if I don't figure this out.

so ...

I've taken a programming course this semester and I am way out of my depth here. However, I have to pass the class to continue my education. (Never mind that my field of study has nothing to do with programming; I just chose it because I thought it would be 'fun'... I was wrong.)

I have to 'compare the top 100 most frequent words in a specific data set with and without stopwords'.

I think the teacher basically gave me all the key functions, but I have no idea how to actually use them?? They're written correctly in my Python file, but they do *nothing* when I run the program.
I'm pretty damn sure I'm missing something very *obvious*, but I've stared at it for hours and tried so many things, and I still don't get it.


This is the code:
ibb.co/rHj6HLf
ibb.co/MGWDk4J
ibb.co/bF4H6Yv
ICantWhistle
International star



Okay, I had a quick look. I don't know what your .csv looks like, but I believe you have to pass your texts to word_count(), once with the parameter stopword set to True and once with it set to False, like this:

# should contain a list of all the words in texts and how many times each one appeared, e.g. [('astrology', 11), ('Gemini', 5), ...] (or formatted differently lol)
counter_no_stopwords = word_count(texts, stopword=False)

# same as above, but this one includes all the stopwords
counter_with_stopwords = word_count(texts, stopword=True)
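If `texts` isn't built yet, here's a minimal sketch of loading it from the .csv with Python's standard csv module. Since I haven't seen your file, this assumes (hypothetically) a header row with a column called `text` and that word_count() wants a list of strings; `load_texts` and `SAMPLE_CSV` are made-up names just for this example.

```python
import csv
import io

# Hypothetical sample data standing in for the real .csv file.
# Assumes a header row and a column named "text" (rename to match your file).
SAMPLE_CSV = """id,text
1,The stars are bright tonight
2,Gemini season and astrology talk
"""


def load_texts(csv_file, column="text"):
    """Return the values of one column as a list of strings."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader]


# With the real file you would do something like:
#   with open("your_data.csv", newline="", encoding="utf-8") as f:
#       texts = load_texts(f)
texts = load_texts(io.StringIO(SAMPLE_CSV))
print(texts)  # ['The stars are bright tonight', 'Gemini season and astrology talk']
```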

Since you only want the top 100 words, and the function returns the list already sorted (the default is sort=True), you can slice the two lists:

# this is the syntax to get the first 100 elements (assign the result, otherwise it's computed and thrown away)
counter_no_stopwords = counter_no_stopwords[:100]

# or do it in one step with the code above
counter_with_stopwords = word_count(texts, stopword=True)[:100]
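To show what the slicing does, here's a small sketch with made-up numbers, assuming word_count() hands back a list of (word, count) pairs sorted from most to least frequent. If it actually returns a dict, slicing won't work on it, so the second part sorts the dict's items first.

```python
# Made-up output shaped like a sorted word_count() result.
counter_no_stopwords = [("astrology", 11), ("gemini", 5), ("stars", 3), ("moon", 1)]

# Slicing returns a new list with the first N elements (here N=2 to keep it short).
top_no_stopwords = counter_no_stopwords[:2]

# A bare expression like counter_no_stopwords[:2] in a script is computed and
# discarded, so print (or assign) the result if you want to see it.
print(top_no_stopwords)  # [('astrology', 11), ('gemini', 5)]

# [:100] on a shorter list simply returns the whole list, no error.
print(len(counter_no_stopwords[:100]))  # 4

# If the function returns a dict instead, slicing raises TypeError,
# so sort the items by count (highest first) and then slice.
counts = {"gemini": 5, "astrology": 11, "moon": 1}
top_from_dict = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:2]
print(top_from_dict)  # [('astrology', 11), ('gemini', 5)]
```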

Now you have the top 100 for both cases and can compare them.
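One way to do the comparison, sketched with made-up top lists in the same (word, count) shape: turn each list into a set of words and use set operations to see which words only show up when stopwords are kept, which only show up once they're removed, and which are in both. The set approach is just my suggestion; your assignment may want a different kind of comparison.

```python
# Made-up top lists shaped like word_count() output.
top_with_stopwords = [("the", 40), ("and", 25), ("astrology", 11), ("gemini", 5)]
top_no_stopwords = [("astrology", 11), ("gemini", 5), ("stars", 3)]

words_with = {word for word, _ in top_with_stopwords}
words_without = {word for word, _ in top_no_stopwords}

# Words that only make the top list when stopwords are kept
only_with = words_with - words_without
# Words that only make the top list once stopwords are removed
only_without = words_without - words_with
# Words that appear in both top lists
shared = words_with & words_without

print("only with stopwords:", sorted(only_with))        # ['and', 'the']
print("only without stopwords:", sorted(only_without))  # ['stars']
print("in both:", sorted(shared))                       # ['astrology', 'gemini']
```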


word_count() does a lot: it splits your text into words using the tokenizer function, loops through them, increments the counter for every word, and spits out the sorted list. @Warrior
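I can't read the exact code from the screenshots, so here's a hypothetical sketch of what a word_count() like that might look like, just to show the idea. The tokenizer regex, the tiny STOPWORDS set, and the meaning of stopword=True as "keep stopwords" are all my assumptions; the teacher's version may differ.

```python
import re
from collections import Counter

# Hypothetical tiny stopword list; the real assignment probably supplies its own.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "of", "to", "in"}


def tokenizer(text):
    """Lower-case the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_count(texts, stopword=True, sort=True):
    """Count words across a list of texts.

    stopword=True keeps stopwords in the counts, stopword=False drops them
    (this is an assumption about the parameter's meaning).
    """
    counter = Counter()
    for text in texts:
        for word in tokenizer(text):
            if not stopword and word in STOPWORDS:
                continue  # skip stopwords when they should be excluded
            counter[word] += 1
    if sort:
        # list of (word, count) pairs, most frequent first
        return counter.most_common()
    return counter


texts = ["The stars are bright", "Gemini and the stars"]
print(word_count(texts, stopword=True)[:3])   # [('the', 2), ('stars', 2), ('are', 1)]
print(word_count(texts, stopword=False)[:3])  # [('stars', 2), ('bright', 1), ('gemini', 1)]
```

Note that nothing happens until the function is actually called and its result printed or stored; just defining it (or evaluating it without using the result) produces no output.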
Warrior
World famous



You are a lifesaver!! Thank you so much!!