Bringing Numpy's Type-Completeness Score to Nearly 90%

Posted3 months agoActive2 months ago

todsacerdoti

59 points

34 comments

pyrefly.orgTechstory

calmmixed

Debate

60/100

Python TypingNumpyType Annotations

Key topics

Python Typing

Numpy

Type Annotations

The article discusses the efforts to improve NumPy's type completeness, with the community sharing their experiences and challenges with Python's typing system, particularly with scientific data structures.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

Peak period

Days 9-10

Avg / period

Comment distribution34 data points

Loading chart...

Based on 34 loaded comments

Key moments

01Story posted
Oct 7, 2025 at 5:32 AM EDT
3 months ago
Step 01
02First comment
Oct 15, 2025 at 6:46 PM EDT
9d after posting
Step 02
03Peak activity
33 comments in Days 9-10
Hottest window of the conversation
Step 03
04Latest activity
Oct 29, 2025 at 6:49 AM EDT
2 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (34 comments)

Showing 34 comments

tecoholic

3 months ago

4 replies

> That's it! CanIndex was an unknown symbol and was probably mistyped, and replacing it with the correct SupportsIndex one brought NumPy's overall type-completeness to over 80%!

Let’s take a pause here for a second - the ‘CanIndex’ and ‘SupportsIndex’ from the looks are just “int”. I have a hard time dealing with these custom types because they are so obscure.

Does anyone find these useful? How should I be reading them? I understand custom types that package data like classes. But feel completely confused for this yadayava is string, floorpglooep is int style types.

CaliforniaKarl

3 months ago

1 reply

> Let’s take a pause here for a second - the ‘CanIndex’ and ‘SupportsIndex’ from the looks are just “int”.

The PR for the change is https://github.com/numpy/numpy/pull/28913 - The details of files changed[0] shows the change was made in 'numpy/__init__.pyi'. Looking at the whole file[1] shows SupportsIndex is being imported from the standard library's typing module[2].

Where are you seeing SupportsIndex being defined as an int?

> I have a hard time dealing with these custom types because they are so obscure.

SupportsIndex is obscure, I agree, but it's not a custom type. It's defined in stdlib's typing module[2], and was added in Python 3.8.

[0]: https://github.com/numpy/numpy/pull/28913/files

[1]: https://github.com/charris/numpy/blob/c906f847f8ebfe0adec896...

[2]: https://docs.python.org/3/library/typing.html#typing.Support...

tecoholic

3 months ago

1 reply

Thanks for pointing those out. Your comment and the others who responded tell me I don’t typically work with complex software. I am a web dev who also had to wrangle data at times, the data part is where I face the issues.

I looked at the function in the article taking in an int and I stopped there. And then I read the docs for SupportsIndex, I just can’t think of an occasion I would have used it.

ameliaquining

3 months ago

1 reply

SupportsIndex makes it so that, if you have a value that's not a Python int but can be implicitly and losslessly converted to one, you can use it to index into or slice a NumPy ndarray, without first explicitly converting it to a Python int. There are two particular data types that it's especially useful to be able to do this with: NumPy's fixed-width integer types (analogous to the ones in C, exposed because numeric calculations on ndarrays of them are often much faster than on Python's arbitrary-precision int), and zero-dimensional ndarrays (which can't be indexed into or sliced, and are morally equivalent to a single scalar). Programmers often wind up with one of these as the output of some operation, think of them as ints, and don't know or care that they're actually a different data type, so NumPy tries to make it so that you can treat them like ints and get the same result, this being the point of a dynamic language like Python. Using SupportsIndex instead of int as an argument type ensures that type checkers won't reject code that does this.

This is admittedly fairly arcane, but only people delving into the guts of NumPy or the Python type system are expected to have to deal with it. The library code is complicated so that your code can be simple: you can just write NumPy code the way you're used to, and the type checker will accept it, without you having to know or care about the above details. That's the goal, anyway.

parhamn

3 months ago

> SupportsIndex makes it so that, if you have a value that's not a Python int but can be implicitly and losslessly converted to one

Just to be clear for folks who care more about the general python around this: any class that has a method __index__ that returns an int can be used as an index for array access. e.g:

    class FetchIndex:
        def __index__(self):
            return int(requests.get("https://indextolookup.com").text.strip())
            

    a = ["no", "lol", "wow"]
    print(a[FetchIndex()])

The builtin `int` also has a __index__ method that returns itself.

Dunders are part of what makes typing in python so hard.

SAI_Peregrinus

3 months ago

1 reply

Think of them like units of measurement. `°C` is a type, distinct from `°F`, and also distinct from `kilometer`, but all those are just `float` in Python's numeric types. You want separate types because they're not interchangeable. Sometimes two types are subtypes of the same parent type, `°C` and `°F` are subtypes of some generic `temperature` type, `kilometer` is a subtype of `distance`, etc. Types with the same supertype can be converted between one another, but unrelated types can't be. There's no meaningful way to convert `50.3km` into `°C`, even though both are just `float`s.

tecoholic

2 months ago

Thank you. I know it's been almost 2 weeks since you posted this comment. But I got to it only today and this makes a lot of sense and helps understand the type system a bit more. Thank you.

masspro

3 months ago

* “which of the 3 big data structures in this part of the program/function/etc is this int/string key an index into?”

* some arithmetic/geometry problems for example 2D layout where there are several different things that are “an integer number of pixels” but mean wildly different things

In either case it can help pick apart dense code or help stay on track while writing it. It can instead become anti-helpful by causing distraction or increasing the friction to make changes.

jkingsman

3 months ago

I find it especially helpful during refactors — understanding the significance/meaning of the variable and what its values can be (and are constrained to) and not just the type is great, but if typing is complete, I can also go and see everywhere that type can be ingested, knowing every surface that uses it and that might need to be refactored.

strbean

3 months ago

1 reply

Typing related NumPy whine:

The type annotations say `float32`, `float64`, etc. are type aliases for `Floating[32]` and `Floating[64]`. In reality, they are concrete classes.

This means that `isinstance(foo, float32)` works as expected, but pisses off static analysis tools.

refactor_master

3 months ago

I hate when libraries do that, or create a `def ClassLike`.

I’ve found the library `beartype` to be very good for runtime checks built into partially annotated `Annotated` types with custom validators, so instead of instance checks you rely directly on signatures and typeguards.

jofer

3 months ago

7 replies

One of my biggest gripes around typing in python actually revolves around things like numpy arrays and other scientific data structures. Typing in python is great if you're only using builtins or things that the typing system was designed for. But it wasn't designed with scientific data structures particularly in mind. Being able to denote dtype (e.g. uint8 array vs int array) is certainly helpful, but only one aspect.

There's not a good way to say "Expects a 3D array-like" (i.e. something convertible into an array with at least 3 dimensions). Similarly, things like "At least 2 dimensional" or similar just aren't expressible in the type system and potentially could be. You wind up relying on docstrings. Personally, I think typing in docstrings is great. At least for me, IDE (vim) hinting/autocompletion/etc all work already with standard docstrings and strictly typed interpreters are a completely moot point for most scientific computing. What happens in practice is that you have the real info in the docstring and a type "stub" for typing. However, at the point that all of the relevant information about the expected type is going to have to be the docstring, is the additional typing really adding anything?

In short, I'd love to see the ability to indicate expected dimensionality or dimensionality of operation in typing of numpy arrays.

But with that said, I worry that typing for these use cases adds relatively little functionality at the significant expense of readability.

efavdb

3 months ago

1 reply

Would a custom decorator work for you?

jofer

3 months ago

1 reply

Unless I'm missing something entirely, what would that add? You still can't express the core information you need in the type system.

efavdb

3 months ago

1 reply

I meant only that you can insist a parameter has some quality when passed.

jofer

3 months ago

1 reply

Good point, but I think we're talking past each other a bit.

Typing in python within the scientific world isn't ever used to check types. It's _strictly_ only documentation.

Yes, MyPy and whatnot exist, but not meaningfully. You literally can't use them for anything in this domain (they wont' run any of the code in question).

Types (in this subset of python) are 100% about documentation, 0% about enforcement.

We're setting up a _documentation_ system that can't express the core things it needs to. That worries me. Setting up type _checks_ is a completely different thing and not at all the goal.

efavdb

3 months ago

I see makes sense. Thanks.

dwohnitmok

3 months ago

1 reply

This isn't static, but jaxtyping gives you at least runtime checks and also a standardized form of documenting those types. https://github.com/patrick-kidger/jaxtyping

jofer

3 months ago

2 replies

It actually doesn't, as far as I know :) It does get close, though. I should give it a deeper look than I have previously, though.

"array-like" has real meaning in the python world and lots of things operate in that world. A very common need in libraries is indicating that things expect something that's either a numpy array or a subclass of one or something that's _convertible_ into a numpy array. That last part is key. E.g. nested lists. Or something with the __array__ interface.

In addition to dimensionality that part doesn't translate well.

And regardless, if the type representation is not standardized across multiple libraries (i.e. in core numpy), there's little value to it.

dwohnitmok

3 months ago

1 reply

Does it not?

E.g. `UInt8[ArrayLike, "... a b"]` means "an array-like of uint8s that has at least two dimensions". You are opting into jaxtyping's definition of an "array-like", but even though the general concept as you point out is wide spread, there isn't really a single agreed upon formal definition of array-like.

Alternatively, even more loosely as anything that is vaguely container-shaped, `UInt8[Any, "... a b"]`.

jofer

3 months ago

Ah, fair enough! I think I misread some things around the library initially awhile back and have been making incorrect assumptions about it for awhile!

nerdponx

3 months ago

I wonder if we should standardize on __array__ like how Iterable is standardized on the presence of __iter__, which can just return self if the Iterable is already an Iterator.

HPsquared

3 months ago

2 replies

Have you looked at nptyping? Type hints for ndarray.

https://github.com/ramonhagenaars/nptyping/blob/master/USERD...

jofer

3 months ago

1 reply

That one's new to me. Thanks! (With that said, I worry that 3rd party libs are a bad place for types for numpy.)

nerdponx

3 months ago

1 reply

Numpy ships built-in type hints as well as a type for hinting arrays in your own code (numpy.typing.NDArray).

The real challenge is denoting what you can accept as input. `NDArray[np.floating] | pd.Series[float] | float` is a start but doesn't cover everything especially if you are a library author trying to provide a good type-hinted API.

jofer

3 months ago

I'm aware of that. As explained in the original comment, not being able to denote dimensionality in those is a major limitation.

kccqzy

3 months ago

What is this project actually for?

Its FAQ states:

> Can MyPy do the instance checking? Because of the dynamic nature of numpy and pandas, this is currently not possible. The checking done by MyPy is limited to detecting whether or not a numpy or pandas type is provided when that is hinted. There are no static checks on shapes, structures or types.

So this is equivalent to not using this library and making all such types np.ndarray/np.dtype etc then.

So we expend effort to coming up with a type system for numpy, and tools cannot statically check types? What good are types if they aren't checked? Just a more concise documentation for humans?

stared

3 months ago

Some time ago I created a demo of named dimensions for Pytorch, https://github.com/stared/pytorch-named-dims

In the same line, I would love to have more Pandas-Pydantic interoperability at the type level.

relatall

3 months ago

wisdom in the community. people who can excel at things don’t go to big tech companies because they won’t be appreciated , they will get pipped if they can’t wear any of those chains shackles well

hamasho

3 months ago

I also had a very hard time to annotate types in python few years ago. A lot of popular python libraries like pandas, SQLAlchemy, django, and requests, are so flexible it's almost impossible to infer types automatically without parsing the entire code base. I tried several libraries for typing, often created by other people and not maintained well, but after painful experience it was clear our development was much faster without them while the type safety was not improved much at all.

davnn

3 months ago

I am using runtime type and shape checking and wrote a tiny library to merge both into a single typecheck decorator [1]. It‘s not perfect, but I haven‘t found a better approach yet.

[1] https://github.com/davnn/safecheck

t1129437123

3 months ago

1 reply

"ArrayLike" is not a type. The entire Python "type system" is really hackish compared to Julia or typed array languages.

Does this "type system" ensure that you can't get a runtime type error? I don't think so.

jofer

3 months ago

Yes, python is not statically typed. It shouldn't be. Don't expect static typing behavior and typing in python is _not_ static typing in any way. It's documentation only, not static typing.

CaliforniaKarl

3 months ago

I've recently been writing a Python SDK for an internal API that's maintained by a different group. I've been making sure to include typing for everything in the SDK, as well as attempting to maximize unit test coverage (using mocked API responses).

It's a heck of a lot of work (easily as much work as writing the actual code!), but typing has already paid dividends as I've been starting to use the SDK for one-off scripts.

View full discussion on Hacker News

ID: 45501088Type: storyLast synced: 11/20/2025, 7:45:36 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN