In [2]:
#1. Linear Regression
#2. Score
#3. Lasso
#4. Score
#5. Ridge
#6. Score
In [94]:
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
In [95]:
df=pd.read_csv("patient_health_data.csv")
df.head()
Out[95]:
age bmi blood_pressure cholesterol glucose insulin heart_rate activity_level diet_quality smoking_status alcohol_intake health_risk_score
0 58 24.865215 122.347094 165.730375 149.289441 22.306844 75.866391 1.180237 7.675409 No 0.824123 150.547752
1 71 19.103168 136.852028 260.610781 158.584646 13.869817 69.481114 7.634622 8.933057 No 0.852910 160.320350
2 48 22.316562 137.592457 177.342582 178.760166 22.849816 69.386962 7.917398 3.501119 Yes 4.740542 187.487398
3 34 22.196893 153.164775 234.594764 136.351714 15.140336 95.348387 3.192910 2.745585 No 2.226231 148.773138
4 62 29.837173 92.768973 276.106498 158.753516 17.228576 77.680975 7.044026 8.918348 No 3.944011 170.609655
In [96]:
# map avoids the FutureWarning that replace() now raises for silent downcasting
df["smoking_status"] = df["smoking_status"].map({"No": 0, "Yes": 1})
In [97]:
df["smoking_status"]
Out[97]:
0      0
1      0
2      1
3      0
4      0
      ..
245    0
246    0
247    0
248    0
249    1
Name: smoking_status, Length: 250, dtype: int64
In [98]:
df[["smoking_status"]].dtypes
Out[98]:
smoking_status    int64
dtype: object
In [99]:
X=df.drop(columns=["health_risk_score"])
y=df[["health_risk_score"]]
In [100]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)
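With `test_size=0.2`, 20% of the rows are held out for testing, and a fixed `random_state` makes the split reproducible. A minimal sketch on toy arrays (`X_demo`/`y_demo` are illustrative, not the patient data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 rows, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# test_size=0.2 -> 2 of 10 rows go to the test set
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# The same random_state reproduces the identical split
Xtr2, Xte2, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
assert (Xte == Xte2).all()
```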
In [101]:
lm=LinearRegression()
In [102]:
lm.fit(X_train,y_train)
Out[102]:
LinearRegression()
In [103]:
y_test.head()
Out[103]:
health_risk_score
142 142.378299
6 180.270293
97 124.210564
60 105.188048
112 128.443865
In [104]:
lm.predict(X_test.head())
Out[104]:
array([[157.87195261],
       [172.62188521],
       [127.52072532],
       [111.53729062],
       [134.27687977]])
In [105]:
y_test.head().values.flatten() - lm.predict(X_test.head()).flatten()
Out[105]:
array([-15.49365311,   7.64840809,  -3.31016102,  -6.34924232,
        -5.83301507])
In [106]:
lm.score(X_test,y_test)
Out[106]:
0.7552011116606276
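The `.score()` value above is R², the same metric `r2_score` (imported earlier but otherwise unused) computes from predictions. A minimal check on synthetic data (`X_demo`, `y_demo`, and `model` are illustrative names, not from the notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X_demo, y_demo)

# .score() and r2_score() agree: both report R^2
assert np.isclose(model.score(X_demo, y_demo),
                  r2_score(y_demo, model.predict(X_demo)))
```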
In [124]:
# Predictions for the first five test rows
y_predict = lm.predict(X_test.head())

# Matching actual values from y_test
y_actual = y_test.head()

plt.figure()
plt.plot(y_actual.values, label="Actual")
plt.plot(y_predict, label="Predicted")

plt.xlabel("Observations")
plt.ylabel("Target Value")
plt.title("Actual vs Predicted Values")
plt.legend()
plt.show()
[Image: line plot of actual vs predicted values]
In [108]:
lsm = Lasso()
In [109]:
lsm.fit(X_train,y_train)
Out[109]:
Lasso()
In [110]:
y_test.head()
Out[110]:
health_risk_score
142 142.378299
6 180.270293
97 124.210564
60 105.188048
112 128.443865
In [111]:
lsm.predict(X_test.head())
Out[111]:
array([160.11053281, 172.29794385, 128.91242291, 111.1273273 ,
       132.56074787])
In [112]:
y_test.head().values.flatten() - lsm.predict(X_test.head()).flatten()
Out[112]:
array([-17.73223331,   7.97234945,  -4.70185861,  -5.939279  ,
        -4.11688317])
In [113]:
abs(y_test.values.flatten() - lsm.predict(X_test).flatten()).mean()
Out[113]:
np.float64(8.49163379591994)
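The hand-rolled mean absolute error in the cell above matches sklearn's `mean_absolute_error`. A quick sketch with made-up numbers (`y_true`/`y_pred` are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual/predicted values (illustrative only)
y_true = np.array([142.4, 180.3, 124.2])
y_pred = np.array([160.1, 172.3, 128.9])

# Hand-rolled MAE, as in the cell above
manual_mae = abs(y_true - y_pred).mean()

assert np.isclose(manual_mae, mean_absolute_error(y_true, y_pred))
```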
In [114]:
lsm.score(X_test,y_test)
Out[114]:
0.7711217595735915
In [122]:
# Predictions for the first five test rows
y_predict = lsm.predict(X_test.head())

# Matching actual values from y_test
y_actual = y_test.head()

plt.figure()
plt.plot(y_actual.values, label="Actual")
plt.plot(y_predict, label="Predicted")

plt.xlabel("Observations")
plt.ylabel("Target Value")
plt.title("Actual vs Predicted Values")
plt.legend()
plt.show()
[Image: line plot of actual vs predicted values]
In [115]:
lsm.score(X_train, y_train)
Out[115]:
0.8661955368140437
In [116]:
rid=Ridge()
In [117]:
rid.fit(X_train,y_train)
Out[117]:
Ridge()
In [118]:
y_test.head()
Out[118]:
health_risk_score
142 142.378299
6 180.270293
97 124.210564
60 105.188048
112 128.443865
In [73]:
rid.predict(X_test.head())
Out[73]:
array([157.87616582, 172.61576099, 127.52236415, 111.5386635 ,
       134.26888828])
In [123]:
# Predictions for the first five test rows
y_predict = rid.predict(X_test.head())

# Matching actual values from y_test
y_actual = y_test.head()

plt.figure()
plt.plot(y_actual.values, label="Actual")
plt.plot(y_predict, label="Predicted")

plt.xlabel("Observations")
plt.ylabel("Target Value")
plt.title("Actual vs Predicted Values")
plt.legend()
plt.show()
[Image: line plot of actual vs predicted values]
In [74]:
rid.score(X_train, y_train)
Out[74]:
0.8679267298654003
In [121]:
rid.score(X_test,y_test)
Out[121]:
0.7552806598840451
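To compare the three models at a glance, one hedged sketch (run on synthetic data with illustrative names like `X_demo`, not on patient_health_data.csv) that collects train and test R² side by side:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(250, 5))
coef = np.array([3.0, -2.0, 1.5, 4.0, -1.0])
y_demo = X_demo @ coef + rng.normal(scale=0.5, size=250)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Fit each model once and record train/test R^2
scores = {
    name: {"train_r2": m.fit(Xtr, ytr).score(Xtr, ytr),
           "test_r2": m.score(Xte, yte)}
    for name, m in [("LinearRegression", LinearRegression()),
                    ("Lasso", Lasso()),
                    ("Ridge", Ridge())]
}
print(pd.DataFrame(scores).T)
```

A large gap between train and test R² (as with the scores above: ~0.87 on train vs ~0.76 on test) is the usual sign to watch for overfitting.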
In [ ]: