import profileimage from "../../../../assets/images/card2/p.png";
import Title_image from "../../../../assets/images/card2/ttl.jpg";
import pandasimage from "../../../../assets/images/card2/cat.jpg";
import ProfilePublisher from "../../../../components/shared/profilefooter/profile";
const Discretization = () => {
  return (
    <div>
      <div className="blog_container">
        <h1 className="blog_container_title">DISCRETIZATION</h1>
        <img
          src={Title_image}
          alt="Discretization"
          className="blog_container_modelimage"
        />
        <p>
          Data discretization converts a large number of continuous data values
          into a smaller number of intervals or categories, so that the data
          become easier to evaluate and manage.
        </p>

        <p>
          For example, ages can be grouped into age groups: Child (0-14), Young
          (14-30), Mature (30-55), and Old (&#62;55).
        </p>
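        <p>
          A minimal sketch of this grouping with the pandas function pd.cut,
          assuming a hypothetical Series of sample ages. pd.cut closes each
          interval on the right by default, so a boundary age such as 14 falls
          into the lower group (Child), and include_lowest=True keeps age 0 in
          the first group.
        </p>
```python
import numpy as np
import pandas as pd

# Hypothetical sample ages (illustration only)
ages = pd.Series([0, 5, 14, 15, 29, 30, 42, 55, 56, 80])

# Edges for Child (0-14), Young (14-30), Mature (30-55) and Old (over 55);
# intervals are right-closed, e.g. (14, 30]
age_groups = pd.cut(
    ages,
    bins=[0, 14, 30, 55, np.inf],
    labels=["Child", "Young", "Mature", "Old"],
    include_lowest=True,  # make the first interval include 0
)

print(pd.DataFrame(dict(age=ages, group=age_groups)))
```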
        <p>Similarly, bank accounts can be categorized based on their limits, as shown below.</p>
        <figure>
          <img src={pandasimage} alt="Binning with Python (pandas)" />
          <figcaption>Python (pandas): binning.</figcaption>
        </figure>
        <p>
          Discretization is the process of transforming continuous variables,
          models, or functions into a discrete form. We do this by creating a
          set of contiguous intervals (bins or buckets) that span the range of
          the variable, model, or function we want to discretize.
        </p>
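        <p>
          The idea of contiguous intervals can be sketched with NumPy, assuming
          hypothetical measurements and four equal-width bins: np.linspace
          builds edges that span the whole range, and np.digitize maps every
          continuous value to the index of the bin it falls into.
        </p>
```python
import numpy as np

# Hypothetical continuous measurements
values = np.array([2.3, 4.7, 5.0, 6.1, 8.8, 9.4, 12.0])

# Five edges define four contiguous bins covering [min, max]
edges = np.linspace(values.min(), values.max(), num=5)

# Use only the inner edges so that every value (including the maximum)
# gets a bin index from 0 to 3
bin_index = np.digitize(values, edges[1:-1])

print(edges)
print(bin_index)
```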
        <p>
          <strong>Continuous data</strong> is measured, while{" "}
          <b>discrete data</b> is counted.
        </p>
        <h3>Why Discretization is Important</h3>
        <p>
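          Continuous data can take infinitely many values, so a mathematical
          problem posed on continuous data has infinite degrees of freedom. For
          many purposes, data scientists therefore need to discretize the data.
          Discretization can also improve the signal-to-noise ratio, because
          grouping values into bins smooths out small fluctuations.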
        </p>
        <ol>
          <li>It fits the problem statement.</li>
          <li>It makes features easier to interpret.</li>
          <li>Some models and methods are incompatible with continuous data.</li>
          <li>It can improve the signal-to-noise ratio.</li>
        </ol>
        <h1>Degrees of Freedom</h1>
        <ul>
          <li>
            Degrees of freedom are the maximum number of logically independent
            values in a data sample, that is, values that are free to vary.
          </li>
          <li>
            For example, once the mean of a sample is fixed, only n - 1 of its n
            values can vary freely, so the degrees of freedom are the number of
            items in the sample minus one.
          </li>
        </ul>
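        <p>
          A small NumPy sketch, assuming a hypothetical four-value sample, shows
          where n - 1 comes from: the deviations from the sample mean always sum
          to zero, so only n - 1 of them are free to vary, which is why the
          sample variance divides by n - 1 (ddof=1 in NumPy).
        </p>
```python
import numpy as np

# Hypothetical sample with n = 4 values
x = np.array([4.0, 7.0, 9.0, 10.0])
n = len(x)

deviations = x - x.mean()
# The deviations sum to 0, so once n - 1 of them are known the last one is fixed
print(deviations.sum())

# Sample variance uses n - 1 degrees of freedom
sample_var = (deviations ** 2).sum() / (n - 1)
print(sample_var, np.var(x, ddof=1))
```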
        <h1>Approaches to Discretization</h1>
        <p>Unsupervised:</p>
        <ul>
          <li>Equal-Width</li>
          <li>Equal-Frequency</li>
          <li>K-Means</li>
        </ul>
        <p>Supervised:</p>
        <ul>
          <li>Decision Trees</li>
        </ul>
        <h2>Some Well-Known Techniques of Data Discretization</h2>
        <h3>Histogram Analysis</h3>
        <p>
          A histogram is a plot of the underlying frequency distribution of a
          continuous data set: the value range is split into bins and the number
          of values falling into each bin is shown. Histograms help in
          inspecting the data distribution, for example to spot outliers,
          skewness, or an approximately normal distribution.
        </p>
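        <p>
          A histogram can be computed without plotting, as in this minimal
          NumPy sketch with hypothetical data: np.histogram returns the number
          of values in each bin together with the bin edges. The single large
          value (40) shows up as an isolated count in the last bin, which is how
          a histogram reveals an outlier.
        </p>
```python
import numpy as np

# Hypothetical data with one outlier (40)
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 40])

# Five equal-width bins spanning the range of the data
counts, edges = np.histogram(data, bins=5)

print(counts)
print(edges)
```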
        <h3>Binning</h3>
        <p>
          Binning is a data smoothing technique that groups a large number of
          continuous values into a smaller number of bins. It can also be used
          for data discretization and for building concept hierarchies.
        </p>
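        <p>
          Binning as a smoothing technique can be sketched as follows, assuming
          hypothetical sorted prices and three equal-frequency bins: every
          value is replaced by the mean of its bin (smoothing by bin means), so
          the nine values are reduced to three distinct levels.
        </p>
```python
import numpy as np

# Hypothetical sorted values (e.g. prices)
prices = np.array([4, 8, 15, 21, 21, 24, 25, 28, 34])

# Split the sorted values into 3 equal-frequency bins of 3 values each
bins = np.array_split(np.sort(prices), 3)

# Smoothing by bin means: every value in a bin is replaced by the bin mean
smoothed = np.concatenate([np.full(len(b), b.mean()) for b in bins])

print(bins)
print(smoothed)
```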
        <h3>Cluster Analysis</h3>
        <p>
          Cluster analysis is another form of data discretization. A clustering
          algorithm partitions the values of an attribute x into clusters of
          similar values, and each cluster is then treated as one discrete
          interval of x.
        </p>
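        <p>
          A minimal sketch of cluster-based discretization, assuming the
          scikit-learn KMeans estimator and a hypothetical attribute with three
          natural groups: the cluster of each value becomes its discrete
          interval. KMeans numbers its clusters arbitrarily, so the sketch
          relabels them in order of their centers.
        </p>
```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical attribute x with three well-separated groups of values
x = np.array([1.0, 1.5, 2.0, 10.0, 11.0, 12.0, 30.0, 31.0, 33.0])

# KMeans expects a 2-D array: one row per value, one column for x
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x.reshape(-1, 1))

# Cluster ids are arbitrary, so rank the centers: interval 0 = lowest center
order = np.argsort(km.cluster_centers_.ravel())
rank = np.empty_like(order)
rank[order] = np.arange(len(order))
intervals = rank[km.labels_]

print(intervals)
```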
        <h3>Data discretization using decision tree Analysis</h3>
        <p>
          Decision tree analysis discretizes data with a top-down splitting
          technique. It is a supervised procedure. To discretize a numeric
          attribute, you first select the split point that gives the least
          entropy, and then apply the same step recursively to each resulting
          interval.
        </p>
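        <p>
          The core step, choosing the split point with the least weighted
          entropy, can be sketched in plain NumPy. entropy and best_split are
          hypothetical helpers written for this illustration; the sketch assumes
          candidate thresholds at midpoints between sorted values and performs
          a single split, whereas a full method would recurse on each side
          until a stopping rule is met.
        </p>
```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1 / p)))


def best_split(values, labels):
    """Return (threshold, weighted entropy) of the best single binary split."""
    order = np.argsort(values)
    v, y = values[order], labels[order]
    candidates = []
    # Candidate thresholds: midpoints between consecutive distinct sorted values
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue
        threshold = float((v[i - 1] + v[i]) / 2)
        left, right = y[:i], y[i:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        candidates.append((weighted, threshold))
    # Least weighted entropy = purest partition into two intervals
    best_entropy, best_threshold = min(candidates)
    return best_threshold, best_entropy


# Hypothetical numeric attribute and class labels
values = np.array([1, 2, 3, 4, 10, 11, 12, 13])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(best_split(values, labels))  # → (7.0, 0.0)
```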
        <h3>Data discretization using correlation Analysis</h3>
        <p>
          Discretization by correlation analysis is also a supervised procedure,
          but it works bottom-up: the best-matching neighboring intervals are
          found (for example, with linear regression or a chi-square test) and
          merged, and this is repeated so that small intervals are combined into
          larger ones until the final set of intervals is reached.
        </p>
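        <p>
          The passage above does not name a specific algorithm. One well-known
          correlation-based method that merges neighboring intervals is
          ChiMerge, which compares neighbors with the chi-square statistic. The
          sketch below is a simplified version under stated assumptions:
          chi2_pair and chimerge are hypothetical helpers, every distinct value
          starts as its own interval, and merging stops at a fixed number of
          intervals (ChiMerge itself usually stops at a chi-square
          significance threshold).
        </p>
```python
import numpy as np


def chi2_pair(a, b):
    """Chi-square statistic for the class counts of two adjacent intervals."""
    table = np.array([a, b], dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0) / table.sum()
    # Cells whose expected count is 0 also have an observed count of 0; skip them
    used = expected != 0
    return float((((table - expected) ** 2)[used] / expected[used]).sum())


def chimerge(values, labels, max_intervals):
    """Simplified ChiMerge: merge the most similar neighbors until max_intervals remain.

    Returns the lower bound of each final interval.
    """
    classes = np.unique(labels)
    # Initial intervals: one per distinct value, stored as [lower bound, class counts]
    intervals = [
        [float(v), np.array([np.sum(labels[values == v] == c) for c in classes])]
        for v in np.unique(values)
    ]
    for _ in range(max(0, len(intervals) - max_intervals)):
        # Chi-square between each pair of neighboring intervals
        chis = [
            chi2_pair(intervals[i][1], intervals[i + 1][1])
            for i in range(len(intervals) - 1)
        ]
        i = int(np.argmin(chis))  # lowest chi-square = most similar class mix
        # Merge interval i + 1 into interval i (keep i's lower bound, add counts)
        intervals[i][1] = intervals[i][1] + intervals[i + 1][1]
        del intervals[i + 1]
    return [lower for lower, _ in intervals]


# Hypothetical values whose class label changes at 4 and again at 7
values = np.array([1, 2, 3, 4, 5, 6, 7, 8])
labels = np.array([0, 0, 0, 1, 1, 1, 0, 0])
print(chimerge(values, labels, max_intervals=3))  # → [1.0, 4.0, 7.0]
```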
        <h3>Some sklearn Techniques</h3>
        <p className="modules">
          from sklearn.preprocessing import KBinsDiscretizer
        </p>
        <h3>Equal-Frequency Discretization</h3>
        <p className="modules">
          from feature_engine.discretisation import EqualFrequencyDiscretiser
          <br />
          discretizer = EqualFrequencyDiscretiser(q=10, variables=['var1',
          'var2'])
        </p>
        <ul>
          <li>Equal-frequency binning improves the value spread.</li>
          <li>It can handle outliers.</li>
          <li>It can be combined with categorical encoding.</li>
        </ul>
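        <p>
          To contrast equal-frequency with equal-width binning, here is a
          minimal sketch with scikit-learn KBinsDiscretizer on hypothetical
          right-skewed data: strategy='uniform' produces equal-width bins and
          strategy='quantile' produces equal-frequency bins. On skewed data the
          equal-width bins are filled very unevenly, while each quantile bin
          holds about a quarter of the values.
        </p>
```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical right-skewed data (e.g. account balances), as a single column
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1000).reshape(-1, 1)

counts = dict()
for strategy in ["uniform", "quantile"]:  # equal-width vs equal-frequency
    disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy)
    bin_ids = disc.fit_transform(X).ravel().astype(int)
    counts[strategy] = np.bincount(bin_ids, minlength=4)
    print(strategy, "edges:", disc.bin_edges_[0].round(2), "counts:", counts[strategy])
```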
        <h3>K-Means Discretization</h3>
        <p className="modules">
          from sklearn.preprocessing import KBinsDiscretizer
          <br />
          discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal',
          strategy='kmeans')
        </p>
        <ul>
          <li>It can handle outliers; however, a centroid bias may exist.</li>
          <li>It can be combined with categorical encoding.</li>
        </ul>
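        <p>
          A runnable sketch of the K-Means snippet, assuming hypothetical
          one-dimensional data with an outlier (100). With strategy='kmeans',
          KBinsDiscretizer runs one-dimensional k-means and places the inner bin
          edges halfway between neighboring cluster centers (see bin_edges_).
          Here the outlier ends up alone in the top bin, so the two lower groups
          share a single bin, which is one way a centroid bias can show up.
        </p>
```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical values: three tight groups plus one outlier (100)
X = np.array([1, 2, 3, 20, 21, 22, 40, 41, 42, 100], dtype=float).reshape(-1, 1)

disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="kmeans")
labels = disc.fit_transform(X).ravel().astype(int)

print(disc.bin_edges_[0])  # inner edges lie midway between neighboring centers
print(labels)
```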
        <h3>Discretization with Decision Trees</h3>
        <p className="modules">
          from sklearn.model_selection import train_test_split
          <br />
          from feature_engine.discretisation import DecisionTreeDiscretiser
          <br />
          treeDisc = DecisionTreeDiscretiser(cv=10, scoring='accuracy',
          variables=['var1', 'var2'], regression=False,
          param_grid=&#123;'max_depth': [1, 2, 3], 'min_samples_leaf': [10, 4]&#125;)
        </p>
        <ul>
          <li>Decision Tree does not improve the value spread.</li>
          <li>It handles outliers well, because trees are robust to outliers.</li>
          <li>Creates monotonic relationships.</li>
        </ul>
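        <p>
          The idea behind tree-based discretization can be sketched with
          scikit-learn alone, assuming a hypothetical feature and binary target.
          This is not how feature_engine implements DecisionTreeDiscretiser
          (which by default replaces values with the tree predictions); instead,
          the split thresholds of a shallow DecisionTreeClassifier are read from
          tree_.threshold and used as bin edges.
        </p>
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature x and binary target y
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])

# A shallow tree: max_depth=2 allows at most three split thresholds
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(x.reshape(-1, 1), y)

# Internal (split) nodes have a left child; leaves are marked with -1
is_split = tree.tree_.children_left != -1
edges = np.sort(tree.tree_.threshold[is_split])

# Bin index = number of thresholds at or below each value
bins = np.digitize(x, edges)
print(edges)  # bin edges chosen by the tree
print(bins)
```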
        <p>
          Thank you for reading all the way down here! I hope this article was
          helpful in your learning journey. Happy Learning!
        </p>
        {/* for profile card */}
        <ProfilePublisher
          image={profileimage}
          name={"Shekar Samurai"}
          role={"Data Scientist"}
        />
      </div>
    </div>
  );
};
export default Discretization;
