machine learning

machine learning materials

Last updated 5 years ago

Here I am writing a collection of notes I have about data science and machine learning. I am starting with the approach used by Guilherme Silveira in a short course (in Portuguese).

Understanding data science

A good way to understand it is through a simple example. Let us say we are observing some animals on a farm that has pigs and dogs. In this observation, we can easily check three features of any animal: whether it has long fur, whether it has short legs and whether it says woof. We will use zeros and ones to describe each feature. In supervised learning we have to give the machine a set of samples together with the right answers; in our case, we have to say whether each sample is a pig or a dog.

In this notebook, we can see the very basic approach for this problem, which is a classification problem, since we want to tell whether an animal belongs to the category of dogs or of pigs. Our dataset has six animals: three are pigs and three are dogs. Only one of the observed pigs has long fur, all of them have short legs and one was observed woofing: pigs = [[0,1,0],[0,1,1],[1,1,0]]. As for the dogs, one of them does not have long fur, one of them does not have short legs and all of them said woof: dogs = [[0,1,1],[1,0,1],[1,1,1]]. In the given notebook we have trained our machine and asked it to predict which animal a [1,1,1] should be.
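
The idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the notebook's actual model is not described here, so a 1-nearest-neighbour rule is swapped in as a stand-in, and the label encoding (1 for pig, 0 for dog) and the helper names are choices made for this sketch.

```python
# Feature order: [has long fur, has short legs, says woof]
pigs = [[0, 1, 0], [0, 1, 1], [1, 1, 0]]
dogs = [[0, 1, 1], [1, 0, 1], [1, 1, 1]]

# Assumption of this sketch: 1 means pig, 0 means dog.
train_x = pigs + dogs
train_y = [1, 1, 1, 0, 0, 0]


def hamming(a, b):
    """Count the features in which two animals differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(sample):
    """Return the label of the closest training sample (1-nearest neighbour).

    Ties are resolved in favour of the sample that appears first in train_x.
    """
    closest = min(range(len(train_x)), key=lambda i: hamming(train_x[i], sample))
    return train_y[closest]


mystery = [1, 1, 1]  # long fur, short legs, says woof
print("pig" if predict(mystery) == 1 else "dog")  # prints "dog"
```

Note that [0,1,1] appears once as a pig and once as a dog, so no model can classify every training sample correctly; with this tie rule the sketch labels it a pig.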

In the next notebook, we started with the same dataset and trained the machine in the same way. This time we also gave it a small dataset of new observations for which we know the right category of each sample. The objective was to calculate the accuracy of our model.
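
Accuracy here means the fraction of the new observations that the model labels correctly. A minimal sketch follows, assuming the same 1 = pig, 0 = dog encoding; the expected and predicted values are made up for illustration and are not taken from the notebook.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the known answers."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical known answers for four new animals (1 = pig, 0 = dog).
expected = [1, 0, 0, 1]
# Hypothetical output of a trained model for the same four animals.
predicted = [1, 0, 1, 1]

print(accuracy(predicted, expected))  # 0.75, three hits out of four
```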

With this simple application we can see the potential of machine learning. In this case it is a simple categorization between two species of animals, but the same approach could be applied to animals whose exact species is hard to tell even for humans, or to anything else, such as telling whether an email is potentially risky, whether a patient has some disease or whether a customer is likely to buy.

Working with a bit larger dataset

In another notebook we are using pandas to retrieve a dataset from a gist file. The data is in CSV format. It is a set of records simulating customers that have visited a website. Each record combines the pages the customer has visited with whether this customer has bought a product or not. The idea is to build a model that predicts, from the pages a user visits, whether he or she is likely to buy a product.

It is not too different from the example of pigs and dogs; we just have a larger dataset. The point of this example is to show that we need to split the dataset into training and test sets in order to calculate the model's accuracy.
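
The split itself can be sketched without any library. The function name `split_rows`, the 25% test fraction, the seed and the records below are all assumptions made for illustration; the real dataset and the split used in the notebook may differ.

```python
import random


def split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed makes the shuffle, and therefore the split, repeatable.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: ([visited page A, visited page B, visited page C], bought)
rows = [
    ([1, 1, 0], 0),
    ([1, 1, 0], 0),
    ([1, 1, 0], 0),
    ([1, 1, 1], 1),
    ([1, 0, 1], 1),
    ([1, 1, 1], 1),
    ([0, 1, 0], 0),
    ([0, 0, 1], 1),
]

train, test = split_rows(rows)
print(len(train), len(test))  # 6 2
```

The model is trained only on `train`, and accuracy is measured only on `test`, so the score reflects samples the model has never seen.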

Diving deeper on data science

The final notebook of this series is about data visualization.

This list of exercises shows a more comprehensive example. It concerns a dataset from a survey done with students of a hypothetical university. The resolution using python and R is available. Another list has more data exploration.

I did some coding of my own based on the inspiring works published at covid-19 dashboards. Here is a previous study comparing trajectories of covid-19 cases and deaths by Brazilian state using the Brazilian Ministry of Health's original data. Unfortunately I have abandoned this version because I did not find a way to automate updates. In order to see updated data it is necessary to fork the repository and update the file arquivo_geral.csv manually. The latest version makes the same comparison of trajectories of covid-19 cases and deaths by Brazilian state, with some extra data adding mean, median and total. It uses a permanent link to get new data provided by brasil.io, which should work for a while. There is also a second study using a linear scale just for new cases. It is "almost" using fastpages as suggested in the contributing documentation, but it does not bring any relevant contribution yet. I was planning to show balloons or some special mark to indicate the exact date on which some government action was taken, to check for some causal relation. Another nice thing to do, which should be easy, is to give the regions different colour palettes.

Some other references

  • Understanding neural networks with TensorFlow Playground
  • 15 Steps to Implement a Neural Net
  • How to select the Right Evaluation Metric for Machine Learning Models: Part 1 Regression Metrics
  • Getting started with Machine learning on Linux with Python 3 and Scikit-learn
  • Build a gender classifier in Python using Scikit-learn

Python

  • repl.it

Statistical databases

  • United nations databases
  • Mammographic image database LAPIMO EESC/USP
  • University of South Florida Digital Mammography

Famous quotes

  • Top 10 Artificial Intelligence Quotes That Will Inspire You