Preprocessing

`process_corpora`

process_corpora(data_path, isZip=True, save_file=False, save_path='', name='')

Process the corpora data for model training. Takes data_path, isZip, save_file, save_path and name.

@param save_file (bool): Determine if the data should be saved on disk.

@param data_path (string): Path to the dataset to process.

@param save_path (string): Path where the processed dataset should be saved.

@param isZip (bool): True if the data is in a zipped file.

@param name (string): Name of the model, used to name the saved files.

Returns train_x, test_x, train_y, test_y.

train_x, test_x, train_y, test_y = process_corpora("../data/small_dataset/", name="test")

Max. Sentence # words: 527
Min. Sentence # words: 6
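
The two figures above appear to report the word counts of the longest and shortest sentences in the processed data. As a rough illustration of how such statistics can be computed, here is a minimal sketch. It assumes whitespace tokenization, and `sentence_length_stats` is a hypothetical helper, not part of this module.

```python
def sentence_length_stats(sentences):
    """Return (max_words, min_words) over a list of sentence strings."""
    if not sentences:
        raise ValueError("sentences must not be empty")
    # Assumption: words are separated by whitespace.
    lengths = [len(sentence.split()) for sentence in sentences]
    return max(lengths), min(lengths)


sample = ["the cat sat on the mat", "hello world", "a b c"]
max_words, min_words = sentence_length_stats(sample)
print(f"Max. sentence # words: {max_words}")
print(f"Min. sentence # words: {min_words}")
```

The maximum is typically the useful number when sentences are later padded or truncated to a fixed length.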

`vectorize_sentences`

vectorize_sentences(sentences)

Input: List of strings to be vectorized.

Output: List of vectorized strings, in the same order as the input.
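
The page does not say which vectorization scheme `vectorize_sentences` uses. To make the input/output contract concrete, the sketch below maps each sentence to a list of integer word ids and keeps the input order. The vocabulary scheme, the `<unk>` id and both function names (`build_vocab`, `vectorize_sketch`) are assumptions for illustration only, not this module's implementation.

```python
def build_vocab(sentences):
    """Assign an integer id to every distinct lowercase word (hypothetical helper)."""
    vocab = {"<unk>": 0}  # id 0 is reserved for words not in the vocabulary
    for sentence in sentences:
        for word in sentence.lower().split():
            # setdefault only assigns a new id the first time a word is seen
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize_sketch(sentences, vocab):
    """Map each sentence to a list of word ids, preserving input order."""
    return [
        [vocab.get(word, 0) for word in sentence.lower().split()]
        for sentence in sentences
    ]


sentences = ["The cat sat", "the dog sat"]
vocab = build_vocab(sentences)
vectors = vectorize_sketch(sentences, vocab)
print(vectors)  # one vector per input sentence, in the same order
```

Because the output list is built by iterating over the input once, `vectors[i]` always corresponds to `sentences[i]`, which is the ordering guarantee described above.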