Tuning sklearn model parameters with GridSearchCV

Dataframe:

    id    review                                          name      label
    1     it is a great product for turning lights on.    Ashley    1
    2     plays music and have a good sound.              Alex      1
    3     I love it, lots of fun.                         Peter     0

The aim is to classify the text; if the review is about the functionality of the product (e.g. turn the light on, music), label=1, otherwise label=0.

I am running several sklearn models to see which one works best:

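    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import SGDClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier

    # Naïve Bayes:
    text_clf_nb = Pipeline([('tfidf', TfidfVectorizer()),
                            ('clf', MultinomialNB())])

    # Linear Support Vector Classifier:
    text_clf_lsvc = Pipeline([('tfidf', TfidfVectorizer()),
                              ('clf', LinearSVC(loss='hinge', penalty='l2', max_iter=50))])

    # SGDClassifier:
    text_clf_sgd = Pipeline([('tfidf', TfidfVectorizer()),
                             ('clf', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3,
                                                   random_state=42, max_iter=50, tol=None))])

    # Random Forest:
    text_clf_rf = Pipeline([('tfidf', TfidfVectorizer()),
                            ('clf', RandomForestClassifier())])

    # Neural network (MLPClassifier):
    text_clf_mlp = Pipeline([('tfidf', TfidfVectorizer()),
                             ('clf', MLPClassifier())])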

Problem: How to tune models using GridSearchCV? What I have so far:

    from sklearn.model_selection import GridSearchCV

    parameters = {'vect__ngram_range': [(1, 1), (1, 2)],
                  'tfidf__use_idf': (True, False),
                  'clf__alpha': (1e-2, 1e-3)}

    gs_clf = GridSearchCV(text_clf_nb, param_grid=parameters, cv=2,
                          scoring='roc_auc', n_jobs=-1)
    gs_clf = gs_clf.fit((X_train, y_train))

This gives the following error on running gs_clf = gs_clf.fit((X_train, y_train)):

    ValueError: Invalid parameter C for estimator Pipeline(memory=None,
             steps=[('tfidf',
                     TfidfVectorizer(analyzer='word', binary=False,
                                     decode_error='strict',
                                     dtype=<class 'numpy.float64'>,
                                     encoding='utf-8', input='content',
                                     lowercase=True, max_df=1.0, max_features=None,
                                     min_df=1, ngram_range=(1, 1), norm='l2',
                                     preprocessor=None, smooth_idf=True,
                                     stop_words=None, strip_accents=None,
                                     sublinear_tf=False,
                                     token_pattern='(?u)\\b\\w\\w+\\b',
                                     tokenizer=None, use_idf=True,
                                     vocabulary=None)),
                    ('clf',
                     MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))],
             verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
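The error message itself suggests checking `estimator.get_params().keys()`. A minimal sketch of that check for the Naive Bayes pipeline follows. The `candidate_keys` list is an assumption: it holds the grid keys shown above plus a bare `C`, standing in for whichever key actually triggered the message (the grid shown contains no `C`, so the failing run presumably used a different grid).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

text_clf_nb = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])

# Keys to check: the grid from the question plus a bare 'C'
# (assumed stand-in for the key named in the error message).
candidate_keys = ['vect__ngram_range', 'tfidf__use_idf', 'clf__alpha', 'C']

# Every name the pipeline accepts, including nested '<step>__<param>' names.
valid_keys = set(text_clf_nb.get_params().keys())

# Keys that GridSearchCV would reject for this pipeline.
invalid_keys = [key for key in candidate_keys if key not in valid_keys]
print(invalid_keys)
```

Under these assumptions, `vect__ngram_range` is rejected because the pipeline has no step named `vect` (the vectorizer step is called `tfidf`), and `C` is rejected because it is neither a pipeline parameter nor a parameter of `MultinomialNB`.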

I would appreciate any suggestions. Thanks.
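One possible fix, sketched under assumptions: name each grid key `<step name>__<parameter>` using the step names from the pipeline (`tfidf` and `clf`), and pass `X_train` and `y_train` to `fit` as two arguments rather than as a single tuple. The toy reviews and labels below are hypothetical and only stand in for the real training data; `n_jobs` is left at its default here and can be set back to `-1`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for X_train / y_train (not from the question).
X_train = [
    "it is a great product for turning lights on",
    "plays music and has a good sound",
    "turns the lights off on a schedule",
    "sets timers and plays the radio",
    "I love it, lots of fun",
    "bought it as a gift for my mother",
    "arrived quickly and well packaged",
    "my kids enjoy it a lot",
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

text_clf_nb = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])

# Each key is '<step name>__<parameter>'; the step names are 'tfidf' and 'clf'.
parameters = {
    'tfidf__ngram_range': [(1, 1), (1, 2)],
    'tfidf__use_idf': (True, False),
    'clf__alpha': (1e-2, 1e-3),  # alpha is a MultinomialNB parameter
}

gs_clf = GridSearchCV(text_clf_nb, param_grid=parameters, cv=2, scoring='roc_auc')
gs_clf.fit(X_train, y_train)  # X and y as two arguments, not one tuple

print(gs_clf.best_params_)
print(gs_clf.best_score_)
```

Each pipeline needs its own grid, since the `clf__...` keys must match the classifier in that pipeline. For `text_clf_lsvc`, for example, the regularization key would be `clf__C` rather than `clf__alpha`.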
