春江暮客

春江暮客: a personal site for sharing what I learn

Python Native Lists vs. NumPy Arrays

2021-12-22 Technology

In Python you can store collection data in several built-in or standard-library types, including list, tuple, dict, and the `array` module's array. Among these, the list is the most flexible: it can hold objects of any type and is mutable, which makes it widely applicable. For scientific computing and purely numerical data, however, NumPy arrays have become the de facto standard and have largely replaced lists. So how do the two differ, how large are the differences, and how should each be used in practice?
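To make the flexibility point concrete, here is a small sketch (the variable names are illustrative) showing that a list keeps each element's own Python type, while `np.array` coerces every element to a single dtype:

```python
import numpy as np

# A list holds arbitrary Python objects, each keeping its own type.
mixed = [1, 2.5, "three", [4]]
print([type(x).__name__ for x in mixed])  # ['int', 'float', 'str', 'list']

# Lists are mutable and can grow in place with any kind of object.
mixed.append(None)

# A NumPy array stores one fixed dtype for all of its elements.
ints = np.array([1, 2, 3])               # an integer dtype
floats = np.array([1, 2.5, 3])           # ints are upcast to float64
strings = np.array([1, 2.5, "three"])    # everything becomes a string
print(ints.dtype.kind, floats.dtype, strings.dtype.kind)  # i float64 U
```

This homogeneity is what lets NumPy store values in a compact buffer and run arithmetic in compiled code, at the cost of the list's "anything goes" flexibility.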

Of course, using practical examples is the best way to illustrate the differences.

That said, the timings here are best treated as small illustrative microbenchmarks rather than universal performance laws. Real results depend on data size, dtype, conversion cost, and whether the code is actually vectorized.


Comparison of Operation Speed

Let’s compare two simple reductions, summation and repeated multiplication, over the integers from 1 to 10,000.

First, Summation

# Build the list of integers 1..10000
mylist = list(range(1, 10001))

from time import time
import numpy as np

# list + built-in sum()
# (print() sits inside the timed region, as in the original measurements)
start = time()
total = sum(mylist)
print(total)
end = time()
print(f"total:{end-start}s")
## 50005000
## total:0.0003197193145751953s

# NumPy array + np.sum()
myarray = np.array(mylist)  # the list -> array conversion is NOT timed
start = time()
total = np.sum(myarray)
print(total)
end = time()
print(f"total:{end-start}s")
## 50005000
## total:0.00041031837463378906s

# NumPy array + built-in sum(): iterates the array element by element
start = time()
total = sum(myarray)
print(total)
end = time()
print(f"total:{end-start}s")
## 50005000
## total:0.0012726783752441406s

As you can see, for the sum the native list takes about 0.0003 seconds and NumPy’s np.sum about 0.0004 seconds. Calling Python’s built-in sum() on a NumPy array is the slowest at about 0.0013 seconds, roughly three to four times slower than either, because it iterates over the array one element at a time and wraps each value in a NumPy scalar object. None of these timings includes converting the list to an array. So in this small, single-run test, the list with the built-in sum() is at least as fast as NumPy. Many other articles compare against explicit Python loops, which are indeed slower, but that does not reflect the speed of the built-in functions.
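A single `time()` call around one run is noisy at this scale. As a sketch, the standard library's `timeit` module can repeat each variant and keep the best run. The labels and the choice of 50 calls × 5 repeats are arbitrary assumptions, and the last entry also counts the list-to-array conversion mentioned above. Absolute numbers depend on the machine and NumPy version, so no output is shown.

```python
import timeit
import numpy as np

mylist = list(range(1, 10001))
myarray = np.array(mylist)

# Each candidate is a zero-argument callable that timeit can call repeatedly.
candidates = {
    "built-in sum(list)": lambda: sum(mylist),
    "np.sum(array)": lambda: np.sum(myarray),
    "built-in sum(array)": lambda: sum(myarray),
    # Include the list -> array conversion for a like-for-like comparison
    "np.sum(np.array(list))": lambda: np.sum(np.array(mylist)),
}

NUMBER = 50  # calls per timing run
results = {}
for name, fn in candidates.items():
    # Best of 5 runs is less sensitive to background noise than a single run.
    best = min(timeit.repeat(fn, number=NUMBER, repeat=5)) / NUMBER
    results[name] = best
    print(f"{name:<24} {best * 1e6:8.1f} µs per call")
```

Taking the minimum rather than the mean is a common convention for microbenchmarks, since slower runs usually reflect interference from other processes rather than the code itself.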

Next, Product

Using the same mylist data as a base, let’s compare the speeds again.

# list: manual loop with exact, arbitrary-precision Python ints
from time import time
start = time()
total = 1
for i in mylist:
    total *= i
end = time()
print(f"total:{end-start}s")
# total is now exactly 10000!, a number with tens of thousands of digits;
# printing it in full may exceed Python 3.11+'s default int-to-str digit limit

# numpy np.prod: multiplies in the array's fixed-width integer dtype
import numpy as np
myarray = np.array(mylist)
start = time()
total = np.prod(myarray)
end = time()
print(f"total:{end-start}s")
print(total)  # prints 0: 10000! overflows the integer dtype and wraps around

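For repeated multiplication, a list has no vectorized reduction like np.prod, although the standard library does offer math.prod (Python 3.8+), which gives the same exact result as the manual loop. np.prod runs in compiled code and is typically much faster, but it computes in the array’s fixed-width dtype and overflows silently: for the integers 1 to 10,000 it returns 0 instead of 10000!. So np.prod is well suited to array-oriented numeric work when the values stay within the dtype’s range, while Python’s own integers are the safer choice when an exact large result is needed.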
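The overflow is easy to check. This sketch uses the standard library's `math.prod` for the exact result and forces an `int64` array so the behaviour does not depend on the platform's default integer size:

```python
import math
import numpy as np

mylist = list(range(1, 10001))

# math.prod works on any iterable of numbers and uses Python's
# arbitrary-precision ints, so the result is exactly 10000!.
exact = math.prod(mylist)
# Print the size in bits instead of the digits: converting a number this
# large to a decimal string may hit Python 3.11+'s default digit limit.
print(exact.bit_length())

# np.prod multiplies in int64 and silently wraps on overflow.
wrapped = np.prod(np.array(mylist, dtype=np.int64))
print(wrapped)  # 0

# For inputs whose product fits in int64, both agree.
small = list(range(1, 21))  # 20! = 2432902008176640000 < 2**63
print(math.prod(small) == np.prod(np.array(small, dtype=np.int64)))  # True
```

The wrapped result is exactly 0 because 10000! contains far more than 64 factors of 2, so it is a multiple of 2**64. Other overflowing products will usually give a nonzero but equally wrong value, which is harder to spot.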


Conclusion

This article compared Python’s built-in lists and NumPy arrays from a practical point of view. In some small summation cases, NumPy may not automatically win, especially when conversion overhead is included. But for batch numeric operations, vectorized workflows, scientific computing, and machine learning, NumPy remains the more natural and powerful choice.
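As a sketch of what "vectorized" means for the four basic arithmetic operations (the variable names are illustrative), element-wise arithmetic on lists needs an explicit Python-level loop, while NumPy applies each operator to whole arrays at once:

```python
import numpy as np

a_list = list(range(1, 10001))
b_list = list(range(10000, 0, -1))

# Lists: element-wise arithmetic needs a loop (here a comprehension over zip).
sums_list = [a + b for a, b in zip(a_list, b_list)]
ratios_list = [a / b for a, b in zip(a_list, b_list)]

# NumPy: each operation is a single expression over whole arrays.
a = np.array(a_list)
b = np.array(b_list)
sums = a + b
diffs = a - b
products = a * b
ratios = a / b  # true division of integer arrays yields float64

# Beware: `+` on two lists concatenates them instead of adding elements.
print(len(a_list + b_list), sums.shape)  # 20000 (10000,)
print(np.array_equal(sums, sums_list), np.allclose(ratios, ratios_list))  # True True
```

The operator-overloading difference is worth remembering: the same `+` that concatenates lists performs element-wise addition on arrays.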

In short, lists are flexible and general-purpose, while NumPy arrays are built for numeric workloads and scale much better in the scientific Python ecosystem.
