python users
Warrior
World Famous



 this is probably a long shot but I'm so fucked if I don't figure this out

so ...

I've taken a programming course this semester and I am way out of my depth here. However, I have to pass the class to continue my education. (never mind that my field of study has nothing to do with programming, I just chose it cause I thought it would be 'fun'... I was wrong)

I have to 'compare the top 100 most frequent words in a specific data set with and without stopwords'

I think I have all the key functions basically given to me by the teacher, but I have no idea how to actually use them?? Like, they're written correctly in my python file, but they do *nothing* when I run the program.
I am pretty damn sure that I'm missing something very *obvious*, but I've stared at it for hours and tried so many things, and I still don't get it.


This is the code:
ibb.co/rHj6HLf
ibb.co/MGWDk4J
ibb.co/bF4H6Yv
ICantWhistle
International Star



okay i had a quick look, idk what your .csv looks like but I believe you have to pass your texts to word_count(), once with the parameter stopword set to True and once with it set to False. like this:

# should contain all the words present in texts and the number of times each one appeared, like { 'Gemini': 5, 'astrology': 11, ..} (or differently formatted lol)
counter_no_stopwords = word_count(texts, stopword=False)

# same as above but this one includes all the stopwords
counter_with_stopwords = word_count(texts, stopword=True)

since you only want the top 100 words, and the function returns the list already sorted (the default is sort=True), you can slice the two lists:

# this is the syntax to get the first 100 elements
counter_no_stopwords[:100]

# or do it in one step with the code above
counter_with_stopwords = word_count(texts, stopword=True)[:100]

now you have the top 100 for both cases and can compare them 
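
one way the comparison itself could look, as a rough sketch: this assumes each entry in the sliced lists is a (word, count) pair, which I can't confirm from the screenshots. The sample data and the helper name compare_top_words are made up for illustration; the real lists would come from word_count(...)[:100].

```python
# Hypothetical sample results; in the real program these would be
# word_count(texts, stopword=True)[:100] and word_count(texts, stopword=False)[:100].
# Assumption: each entry is a (word, count) tuple.
top_with_stopwords = [("the", 120), ("and", 80), ("gemini", 5)]
top_no_stopwords = [("astrology", 11), ("gemini", 5), ("moon", 4)]


def compare_top_words(with_stop, without_stop):
    """Group the words by which of the two top lists they appear in."""
    words_with = {word for word, _ in with_stop}
    words_without = {word for word, _ in without_stop}
    return {
        "in_both": sorted(words_with & words_without),
        "only_with_stopwords": sorted(words_with - words_without),
        "only_without_stopwords": sorted(words_without - words_with),
    }


comparison = compare_top_words(top_with_stopwords, top_no_stopwords)
for label, words in comparison.items():
    print(label, words)
```

the words that only show up in the list with stopwords are the ones the stopword filter pushed out of the top 100, which is probably the point of the exercise.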


word_count() does a lot: it splits up your text using the tokenizer function, loops through the tokens, increments the counter for every word, and returns the sorted list @Warrior
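
to make that concrete, here is a minimal sketch of what a function like that might do. This is NOT your teacher's code: the stopword list, the regex tokenizer, and the exact meaning of the stopword parameter are all assumptions on my part. The important bit is at the bottom: a function that is only defined does nothing until you actually call it and use what it returns.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; the course material probably supplies a fuller one.
STOPWORDS = {"the", "a", "is", "and", "of"}


def tokenizer(text):
    """Lowercase the text and split it into simple word tokens (assumed behaviour)."""
    return re.findall(r"[a-z']+", text.lower())


def word_count(texts, stopword=True, sort=True):
    """Count words across all texts.

    Assumption: stopword=True keeps stopwords, stopword=False drops them.
    Returns a list of (word, count) pairs, most frequent first when sort=True.
    """
    counter = Counter()
    for text in texts:
        for word in tokenizer(text):
            if not stopword and word in STOPWORDS:
                continue  # skip stopwords when they are not wanted
            counter[word] += 1
    if sort:
        return counter.most_common()
    return list(counter.items())


# Defining the functions above does nothing on its own; they have to be called.
texts = ["The moon is in Gemini", "Gemini and the moon"]
counter_with_stopwords = word_count(texts, stopword=True)[:100]
counter_no_stopwords = word_count(texts, stopword=False)[:100]
print(counter_with_stopwords)
print(counter_no_stopwords)
```

if your real word_count() returns a dict instead of a list, the [:100] slice won't work directly and you'd have to sort the items first.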
Warrior
World Famous



you are a lifesaver!! thank you so much!!