Since the default addresses for pip and anaconda are very slow to access in China, adding domestic mirrors for acceleration is necessary.……
阅读全文
This article will guide you through a hands-on implementation of a powerful random forest machine learning model. It aims to complement my conceptual explanation of random forests, but as long as you have a basic understanding of decision trees and random forests, you can fully read it. Later, we will discuss how to improve the model built here.……
阅读全文
Today I will introduce to you various string segmentation methods that can be used in Python. They are……
阅读全文
‘Many domain enthusiasts scour forums and websites frantically searching for and snatching up suitable domains, even spending heavily to buy desired domains from their owners. International domain management bodies adopt a “first-to-apply, first-to-register, first-to-use” policy. Since domains only require a small annual registration fee, continuous registration grants you the right to use the domain. Because of this, many domain resellers (commonly known as “domaining pros”) often spend heavily on short, easy-to-remember domains. I used to think about buying shorter domains for building scraping sites, but unfortunately, both snatching and buying from others were very expensive. Since it"s first-come, first-served, we can also acquire good domains by registering them before the current owner forgets to renew.’……
阅读全文
This is an article analyzing a problem from the coding practice site LeeCode.……
阅读全文
In Venn diagrams of two sets, there can be two (or more) overlapping circles representing sets of different sizes, but the circles are the same size. Actually, the circles should be proportional to the size of the sets, and the overlapping area should also be proportional to the data overlap.……
阅读全文
The Gini coefficient and Lorenz curve are widely used to represent data inequality, especially wealth inequality. However, currently in Python, there isn’t a very good function to directly plot the Lorenz curve. Since the current project requires it, this article records how to use numpy, pandas, matplotlib, and other packages to calculate the Gini coefficient and plot the Lorenz curve for practical use.……
阅读全文
Bayesian theory provides a principled method for calculating conditional probabilities. With it, we can easily compute conditional probabilities for events where intuition often fails.……
阅读全文
Historical stock data is a very important kind of time series data, playing a significant role in data science. Let’s start learning how to handle time series data, preparing for future stock prediction and analysis.……
阅读全文
After being contained in China, the COVID-19 pandemic became increasingly severe worldwide. Countries and regions publish daily new infection and death data to help fight the pandemic globally.……
阅读全文
My previous article, “Getting Free Oracle Cloud Servers and Automating Deployment with Scripts,” explained how to use the CLI to acquire free Oracle machines. Oracle’s free tier offers a total of 100GB of disk space, but you can also mount an additional 20GB of object storage as a local file system. This is especially convenient for data migration, as the data can be easily mounted to another instance. This article will show you how to enable free Oracle Object Storage and mount it as a local drive on a Linux system.……
阅读全文
In Python, to check whether an array or tuple is empty, there are three methods: comparing with an empty list, checking the length, and using an if statement.……
阅读全文