Crimes with Python's Pattern Matching (2022)

5 months ago

3 replies

I've never understood why Python's pattern-matching isn't more general.

First, "case foo.bar" is a value match, but "case foo" is a name capture. Python could have defined "case .foo" to mean "look up foo as a variable the normal way" with zero ambiguity, but chose not to.

Second, there's no need to special-case some builtin types as matching whole values. You can write "case float(m): print(m)" and print the float that matched, but you can't write "case MyObject(obj): print(obj)" and print your object. Python could allow "..." or "None" or something in __match_args__ to mean "the whole object", but didn't.

Aefiam

5 months ago

2 replies

case .foo is explicitly mentioned in https://peps.python.org/pep-0622/ :

> While potentially useful, it introduces strange-looking new syntax without making the pattern syntax any more expressive. Indeed, named constants can be made to work with the existing rules by converting them to Enum types, or enclosing them in their own namespace (considered by the authors to be one honking great idea)[...] If needed, the leading-dot rule (or a similar variant) could be added back later with no backward-compatibility issues.

second: you can use case MyObject() as obj: print(obj)

zahlman

5 months ago

1 reply

I don't think I've written a match-case yet. Aside from not having a lot of use cases for it personally, I find that it's very strange-feeling syntax. It tries too hard to look right, with the consequence that it's sometimes quite hard to reason about.

PokestarFan

5 months ago

I just use match case as a traditional switch statement.

5 months ago

2 replies

> > While potentially useful, it introduces strange-looking new syntax without making the pattern syntax any more expressive. Indeed, named constants can be made to work with the existing rules by converting them to Enum types, or enclosing them in their own namespace (considered by the authors to be one honking great idea)[...]

Yeah, and I don't buy that for a microsecond.

A leading dot is not "strange" syntax: it mirrors relative imports. There's no workaround because it lets you use variables the same way you use them in any other part of the language. Having to distort your program by adding namespaces that exist only to work around an artificial pattern matching limitation is a bug, not a feature.

Also, it takes a lot of chutzpah for this PEP author to call a leading dot strange when his match/case introduces something that looks lexically like constructor invocation but is anything but.

The "as" thing works with primitives too, so why do we need int(m)? Either get rid of the syntax or make it general. Don't hard-code support for half a dozen stdlib types for some reason and make it impossible for user code to do the equivalent.

The Python pattern matching API is full of most stdlib antipatterns:

* It's irregular: matching prohibits things that the shape of the feature would suggest are possible because the PEP authors couldn't personally see a specific use case for those things. (What's the deal with prohibiting multiple _ but allowing as many __ as you want?)

* It privileges stdlib, as I mentioned above. Language features should not grant the standard library powers it doesn't extend to user code.

* The syntax feels bolted on. I get trying to reduce parser complexity and tool breakage by making pattern matching look like object construction, but it isn't, and the false cognate thing confuses every single person who tries to read a Python program. They could have used := or some other new syntax, but didn't, probably because of the need to build "consensus".

* The whole damn thing should have been an expression, like the if/then/else ternary, not a statement useless outside many lexical contexts in which one might want to make a decision. Why is it a statement? Probably because the PEP author didn't _personally_ have a need to pattern match in expression context.

And look: you can justify any of these technical decisions. You can find a way to justify anything you might want to do. The end result, however, is a language facility that feels more cumbersome than it should and is applicable to fewer places than one might think.

Here's how to do it right: https://www.gnu.org/software/emacs/manual/html_node/elisp/pc...

> If needed, the leading-dot rule (or a similar variant) could be added back later with no backward-compatibility issues.

So what, after another decade of debate, consensus, and compromise, we'll end up with a .-prefix-rule but one that works only if the character after the dot is a lowercase letter that isn't a vowel.

PEP: "We decided not to do this because inspection of real-life potential use cases showed that in vast majority of cases destructuring is related to an if condition. Also many of those are grouped in a series of exclusive choices."

I find this philosophical stance off-putting. It's a good thing when users find ways to use your tools in ways you didn't imagine.

PEP: In most other languages pattern matching is represented by an expression, not statement. But making it an expression would be inconsistent with other syntactic choices in Python. All decision making logic is expressed almost exclusively in statements, so we decided to not deviate from this.

We've had conditional expressions for a long time.

Jtsummers

5 months ago

1 reply

> (What's the deal with prohibiting multiple _ but allowing as many __ as you want?)

What do you mean "prohibiting multiple _"? As in this pattern:

  match [1,2]:
    case [_, _]: print("A list of two items")

That works fine.

https://semgrep.dev/playground/new

5 months ago

1 reply

> An irrefutable case block is a match-all case block. A match statement may have at most one irrefutable case block, and it must be last.

There is no reason to have this restriction except that some people as a matter of opinion think unreachable code is bad taste and the language grammar should make bad taste impossible to express. It's often useful to introduce such things as a temporary state during editing. For example,

    def foo(x):
        match x:
            case _:
                log.warning("XXX disabled for debugging")
                return PLACEHOLDER
            case int():
                return bar()
            case str():
                return qux()
            case _:
                return "default"

Why should my temporary match-all be a SyntaxError???? Maybe it's a bug. Maybe my tools should warn me about it. But the language itself shouldn't enforce restrictions rooted in good taste instead of technical necessity.

I can, however, write this:

    def foo(x):
        match x:
            case _ if True:
                log.warning("XXX disabled for debugging")
                return PLACEHOLDER
            case int():
                return bar()
            case str():
                return qux()
            case _:
                return "default"

Adding a dummy guard is a ridiculous workaround for a problem that shouldn't exist in the first place.

Jtsummers

5 months ago

I don't disagree, it should be a warning but not an error. Thanks for clarifying, your original remark was ambiguous there.

depressedpanda

5 months ago

1 reply

Agreed.

After starting my new job and coming back to Python after many years I was happy to see that they had added `match` to the language. Then I was immediately disappointed as soon as I started using it as I ran into its weird limitations and quirks.

Why did they design it so poorly? The language would be better off without it in its current hamstrung form, as it only adds to the already complex syntax of the language.

> PEP: In most other languages pattern matching is represented by an expression, not statement. But making it an expression would be inconsistent with other syntactic choices in Python. All decision making logic is expressed almost exclusively in statements, so we decided to not deviate from this.

> We've had conditional expressions for a long time.

Also, maybe most other languages represent it as an expression because it's the sane thing to do? Python doing its own thing here isn't the win they think it is.

rpcope1

5 months ago

The Python core team has kind of run the language off the rails post 3.7 or 3.8 or so. There's been so much crap bolted on to the language for dubious reasons, and oftentimes it comes with whole new sets of weird problems without really making life easier (async was a quintessential example of this in my mind). There's a lot of design choices core to the language itself that make it a poor choice for many tasks, but that never stops anyone from doing it anyways and bolting on lots of chintzy "features" along the way.

orbisvicis

5 months ago

1 reply

I've given up on matching as I'm tired of running into its limitations.

That said, I don't think OP's antics are a crime. That SyntaxError though, that might be a crime.

And a class-generating callable class would get around Python caching the results of __subclasshook__.

hwayne

5 months ago

Now I'm mad I didn't remember the word "antics". It's so much more evocative than "crimes"!

rpcope1

5 months ago

After doing Erlang and Scala pattern matching, the whole Python implementation just feels really ugly and gross. They should have cribbed a lot more of how Scala does it.

vlade11115

5 months ago

3 replies

While the article is very entertaining, I'm not a fan of the pattern matching in Python. I wish for some linter rule that can forbid the usage of pattern matching.

siddboots

5 months ago

1 reply

Can you explain why? Genuinely curious as a lover of case/match. My only complaint is that it is not general enough.

kurtis_reed

5 months ago

1 reply

Double indentation

maleldil

5 months ago

So? Other languages with pattern matching similarly have such double indentation. C-style switch with unindented cases is weird.

jbmchuck

5 months ago

1 reply

Should be easily doable with a semgrep rule, e.g.:

    ~> cat semgrep.yaml
    rules:
      - id: no-pattern-matching
        pattern: |
          match ...:
        message: |
          I'm not a fan of the pattern matching in Python
        severity: ERROR
        languages:
          - python

...

    ~> cat test.py
    #!/usr/bin/env python3

    foo = 1
    match foo:
      case 1:
        print("one")

...

    ~> semgrep --config semgrep.yaml test.py   


     no-pattern-matching
          I'm not a fan of the pattern matching in Python
                                                         
            4┆ match foo:
            5┆   case 1:
            6┆     print("one")

(exits non-0)

PokestarFan

5 months ago

1 reply

You need to make that exclude match = ... since match can also be a variable name. This is because people used to write code like match = re.search(...)

TheDong

5 months ago

The existing pattern suggested above, "match ...:", will not match 'match = ...'.

Presumably the reason the parent comment suggested semgrep, not just a grep, is because they're aware that naive substring matching would be wrong.

You could use the playground to check your understanding before implying someone is an idiot.

smcl

5 months ago

If you're experienced enough with Python to say "I want to eliminate pattern matching from my codebase" you can surely construct that as a pre-commit check, no?

purplehat_

5 months ago

4 replies

Could someone explain just what's so bad about this?

My best guess is that it adds complexity and makes code harder to read in a goto-style way where you can't reason locally about local things, but it feels like the author has a much more negative view ("crimes", "god no", "dark beating heart", the elmo gif).

xg15

5 months ago

1 reply

Maybe I have too much of a "strongly typed language" view here, but I understood the utility of isinstance() as verifying that an object is, well, an instance of that class - so that subsequent code can safely interact with that object, call class-specific methods, rely on class-specific invariants, etc.

This also makes life directly easier for me as a programmer, because I know in what code files I have to look to understand the behavior of that object.

Even linters use it for that purpose, e.g. resolving call sites by looking at the last isinstance() check to determine the type.

__subclasshook__ puts this at risk by letting a class lie about its instances.

As an example, consider this class:

  from abc import ABC

  class Everything(ABC):

    @classmethod
    def __subclasshook__(cls, C):
      # Claim every class as a virtual subclass.
      return True

    def foo(self):
      ...

You can now write code like this:

  if isinstance(x, Everything):
    x.foo()

A linter would pass this code without warnings, because it assumes that the if block is only entered if x is in fact an instance of Everything and therefore has the foo() method.

But what really happens is that the block is entered for any kind of object, and objects that don't happen to have a foo() method will throw an exception.

cauthon

5 months ago

4 replies

You _can_ write pathological code like the Everything example, but I can see this feature being helpful if used responsibly.

It essentially allows the user to check if a class implements an interface, without explicitly inheriting ABC or Protocol. It’s up to the user to ensure the body of the case doesn’t reference any methods or attributes not guaranteed by the subclass hook, but that’s not necessarily bad, just less safe.

All things have a place and time.

dragonwriter

5 months ago

1 reply

> It essentially allows the user to check if a class implements an interface, without explicitly inheriting ABC or Protocol.

Protocols don't need to be explicit superclasses for compile time checks, or for runtime checks if they opt in with @runtime_checkable, but Protocols are also much newer than __subclasshook__.
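
A minimal sketch of the runtime side, assuming Python 3.8+. The names `HasDistance`, `Point`, and `measure` are made up for illustration; only `Protocol` and `runtime_checkable` come from the standard library.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasDistance(Protocol):
    """Hypothetical protocol: anything with a distance() method."""

    def distance(self) -> float: ...


class Point:
    """Never inherits from HasDistance, only happens to fit its shape."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5


def measure(obj):
    # isinstance() works here only because of @runtime_checkable.
    if isinstance(obj, HasDistance):
        return obj.distance()
    return None
```

Note that the runtime check only looks for the presence of the method, not its signature or return type, so it is weaker than what a static type checker verifies.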

cauthon

5 months ago

TIL, thanks!

(I love being wrong on HN, always learn something)

Doxin

5 months ago

A good example being stuff like isinstance(x, Iterable) and friends. Figuring out if something is iterable is a bit of a palaver otherwise.
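
A quick sketch of that, using only `collections.abc.Iterable` from the standard library; the `Countdown` and `Opaque` classes are made up for illustration.

```python
from collections.abc import Iterable


class Countdown:
    """Defines __iter__ but never mentions Iterable."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


class Opaque:
    """No __iter__, so it is not recognised as Iterable."""


def is_iterable(obj):
    # Iterable recognises any class that defines __iter__,
    # without that class registering or inheriting anything.
    return isinstance(obj, Iterable)
```

One caveat: this check does not recognise objects that are only iterable through the old `__getitem__` sequence protocol, so it is not quite the same as "can I call iter() on it".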

Someone

5 months ago

But the moment you use a third party library, you cannot use it “responsibly” because that library, too, might use it “responsibly”, and then, you can easily get spooky interaction at a distance, with bugs that are hard or even impossible to fix.

kccqzy

5 months ago

I don't think so. I think the other code should just stop using isinstance checks and switch to some custom function. I personally think isinstance checks benefit from having its behavior simpler and less dynamic.

> check if a class implements an interface, without explicitly inheriting ABC or Protocol

This really doesn't sound like a feature that belongs in the language. Go do something custom if you really want it.

danudey

5 months ago

1 reply

TL;DR having a class that determines if some other class is a subclass of itself based off of arbitrary logic and then using that arbitrary logic to categorize other people's arbitrary classes at runtime is sociopathic.

Some of these examples are similar in effect to what you might do in other languages, where you define an 'interface' and then you check to see if this class follows that interface. For example, you could define an interface DistancePoint which has the fields x and y and a distance() method, and then say "If this object implements this interface, then go ahead and do X".

Other examples, though, are more along the lines of if you implemented an interface but instead of the interface constraints being 'this class has this method' the interface constraints are 'today is Tuesday'. That's an asinine concept, which is what makes this crimes and also hilarious.

Spivak

5 months ago

1 reply

You better not find out about Protocols in Python then. The behavior you describe is exactly how duck typing / "structural subtyping" works. Your class will be an instance of Iterable if you implement the right methods having never known the Iterable class exists.

I don't find using __subclasshook__ to implement structural subtyping that you can't express with Protocols/ABCs alone to be that much of a crime. You can do evil with it but I can perform evil with any language feature.

jonahx

5 months ago

1 reply

> You better not find out about Protocols in Python then. The behavior you describe is exactly how duck typing / "structural subtyping" works. Your class will be an instance of Iterable if you implement the right methods having never known the Iterable class exists.

Conforming to an interface is a widely accepted concept across many popular languages. __subclasshook__ magic is not. So there is a big difference in violating the principle of least surprise.

That said, I'd be curious to hear a legitimate example of using it to implement "structural subtyping that you can't express with Protocols/ABCs alone".

dragonwriter

5 months ago

1 reply

> That said, I'd be curious to hear a legitimate example of using it to implement "structural subtyping that you can't express with Protocols/ABCs alone".

ABCs with __subclasshook__ have been available since Python 2.6, providing a mechanism to implement runtime-testable structural subtyping. Protocols and @runtime_checkable, which provide typechecking-time structural subtyping (Protocols) that can also be available at runtime (with @runtime_checkable) were added in Python 3.8, roughly 11 years later.

There may not be much reason to use __subclasshook__ in new code, but there's a pretty good reason it exists.

jonahx

5 months ago

> There may not be much reason to use __subclasshook__ in new code, but there's a pretty good reason it exists.

That's quite a different claim, and makes a lot of sense. Thanks for the history!

gnulinux

5 months ago

Side effects

taeric

5 months ago

I took the memes as largely for comedic effect, only?

I do think there is a ton of indirection going on in the code that I would not immediately think to look for. As the post stated, could be a good reason for this in some things. But it would be the opposite of aiming for boring code, at that point.

https://x.com/brandon_rhodes/status/1360226108399099909

5 months ago

3 replies

The real crime is the design of Python's pattern matching in the first place:

    match status:
        case 404:
            return "Not found"

    not_found = 404
    match status:
        case not_found:
            return "Not found"

Everywhere else in the language, you can give a constant a name without changing the code's behaviour. But in this case, the two snippets are very different: the first checks for equality (`status == 404`) and the second performs an assignment (`not_found = status`).

5 months ago

1 reply

And there was a much better proposal that got rejected in favor of what we got: https://peps.python.org/pep-0642/

danudey

5 months ago

The very first example there shows a match/case block where almost every single case just runs "pass" and yet every single one has a side effect. It's very difficult to read at first, difficult to understand if you're new to the syntax, and is built entirely around side effects. This might be one of the worst PEPs I've ever seen just based on that example alone.

Fun fact: you can do the same thing with the current match/case, except that you have to put your logic in the body of the case so that it's obvious what's happening.

5 months ago

1 reply

because it's not a `switch` it's a `match` ie pattern matching...

5 months ago

1 reply

Doesn't matter what it is, it shouldn't break fundamental rules of the language.

Ruby's `case`/`in` has the same problem.

https://doc.rust-lang.org/book/ch19-03-pattern-syntax.html

5 months ago

2 replies

> it shouldn't break fundamental rules of the language

it doesn't? you simply don't understand what a match statement is.

    let num = Some(4);

    match num {
        Some(x) if x % 2 == 0 => println!("The number {x} is even"),
        Some(x) => println!("The number {x} is odd"),
        None => (),
    }

notice that x is bound to 4.

https://discuss.python.org/t/gauging-sentiment-on-pattern-ma...

5 months ago

1 reply

> you simply don't understand what a match statement is

It's "a DSL contrived to look like Python, and to be used inside of Python, but with very different semantics":

5 months ago

you linked to a random detractor's rant. i don't see what that has to do with whether a match statement binds a match?

tialaramex

5 months ago

1 reply

Which x? There are two in your code, one for each time you introduce a pattern Some(x) and each x has scope which of course ends when that pattern is done with

Notice that the Python doesn't work this way, we didn't make a new variable but instead changed the existing one.

Also, the intent in the Python was a constant, in Rust we'd give this constant an uppercase name by convention, but regardless it's a constant and so of course matching against a constant does what you expect, it can't re-bind a constant, 404 is a constant and so is `const NOT_FOUND: u16 = 404;`

5 months ago

1 reply

> Which x? There are two in your code, one for each time you introduce a pattern Some(x) and each x has scope which of course ends when that pattern is done with

if each x's scope ends at the end of each case doesn't that mean there's only one x?

> we didn't make a new variable but instead changed the existing one.

so because python doesn't have scopes except for function scopes it shouldn't ever have any new features that intersect with scope?

Spivak

5 months ago

1 reply

Also this is pretty much in line for the rest of Python leaving variables around.

    for x in [1]:
      pass
    print(x) # => 1

The match statement presented is equivalent to an assignment, you do have to know that, but then it's just regular Python.

instig007

5 months ago

1 reply

Being in line with the bad original design decision is another bad design decision, python developers should have the courage to admit these instances to benefit from better decisions in new peps. They didn't do it with pattern matching and now the language has another inferior implementation of a feature that, if implemented correctly, should have had clear block scopes, defined as expressions (as opposed to statements), and disallowed type-diverging branches. Java has designed it right, by the way, despite having a differently behaving switch statement in the language already.

lou1306

5 months ago

1 reply

> Being in line with the bad original design decision is another bad design decision

I disagree. Consistently going with the "bad" choice (in this case, leaking the variable to the outer scope) is better than inconsistently swinging between 2 ways of doing things. Least astonishment!

tialaramex

5 months ago

"We've made this mistake before so for consistency we need to repeat it" is such a bad idea. Ideally you want a way to go back and fix things you got wrong, but, even if you can't do that (which is itself a defect and you should figure out how you can improve) you should improve as you move forward.

C++ has struggled with this, so that paper authors sometimes plead with the committee not to make their proposal needlessly worse in the name of "consistency" with existing bad features. This famously failed for std::span, which thus managed to be not only a real world footgun in a language which already has plenty of footguns but also a PR footgun - because for "consistency" the committee removed the safety from the safety feature and I believe in C++ 26 they will repair this so it's just pointless rather than actively worse...

user453

5 months ago

5 replies

That's destructuring, a feature not a bug. Same as it works in any functional language - and tremendously useful once you get the hang of it

mcdeltat

5 months ago

Destructuring yes but you can still argue it's poorly designed. Particularly unintuitive because matching on a nested name e.g. module.CONSTANT works for matching and doesn't destructure. It's just the use of an unnested name which does destructuring.

What Python needs is what Elixir has. A "pin" operator that forces the variable to be used as its value for matching, rather than destructuring.

5 months ago

At least functional languages tend to have block scope, so the latter snippet introduces a new variable that shadows `not_found` instead of mutating it.

5 months ago

Destructuring is a feature. Making it easy to confuse value capture and value reference was an error. Add single-namespace locals and you have a calamity.

b3orn

5 months ago

No, at least in Erlang a variable is assigned once, you can then match against that variable as it can't be reassigned:

    NotFound = 404,
    case Status of
        NotFound -> "Not Found";
        _ -> "Other Status"
    end.

That snippet will return "Other Status" for Status = 400. The Python equivalent of that snippet is a SyntaxError as the first case is a catch all and the rest is unreachable.

xdennis

5 months ago

It wouldn't be a problem if Python had block level variable scope. Having that destructuring be limited to the 'case' would be fine, but obliterating the variable outside of the match is very unexpected.

Y_Y

5 months ago

1 reply

Barely a misdemeanor, all of the typechecks were deterministic

danudey

5 months ago

Next step is to have the subclass check pack all the code up, send it to ChatGPT, ask it if it thinks subjectively that class A should be a subclass of class B, and then run sentiment analysis on the resulting text to make the determination.

charlieyu1

5 months ago

2 replies

More and more dubious things are being designed in Python these days. A recent PEP proposes to use {/} as the empty set.

umgefahren

5 months ago

1 reply

Idk that doesn’t sound so dubious to me. ∅ might be more approachable for the PhDs than set() ;)

rtrgrd

5 months ago

we all love non ascii code (cough emoji variable names)

augusto-moura

5 months ago

3 replies

Problem is, we already have a syntax for empty lists [], empty tuples (), and {} is taken for an empty dict. So having a syntax for an empty set actually makes sense to me

kurtis_reed

5 months ago

1 reply

{:} should have been the empty dict, now there's no good solution

augusto-moura

5 months ago

1 reply

I agree that {:} would be a better empty expression for dicts, but that ship has already sailed. {/} looks like a good enough alternative

sitkack

5 months ago

4 replies

There is a way to make it work. Python has no problem breaking things across major versions.

baq

5 months ago

2 replies

Python needed a breaking change for Unicode and a breaking change for exceptions and took it ages ago for a better future today - and it's still remembered as a huge PITA by everyone. I think you'll find everyone in the Python community disagreeing with you about a not-backwards-compatible Python 4.

gkbrk

5 months ago

If Python actually incremented the major version every time they broke backwards compatibility, we'd be on something like Python 36 by now.

Almost every version they break existing code. This is why it's common for apps written in Python to depend on specific Python versions instead of just "anything above 3.x".

sitkack

5 months ago

Every minor release of Python is a breaking change. They deprecate stuff all the time, and remember the stdlib and the wider ecosystem has to also move in concert so the breaking changes cascade.

By major version I meant minor version, 3.13 -> 3.14 is a minor version in Python, but a major source of breaking changes, that is what I meant. There will be no Python 4

augusto-moura

5 months ago

Breaking {} to be an empty set would be a HUGE breaking change, a _lot_ of code is already written where it is expected to be an empty dict. I don't think anyone in the Python committee would agree with breaking that

IshKebab

5 months ago

Jesus can you imagine if they announced Python 4? :-D

JohnKemeny

5 months ago

Are you suggesting to bump to Python 4 in order to be able to write `{}` instead of `set()` (or `{/}`) and simultaneously break all existing code using `{}` for dicts?

rand_r

5 months ago

2 replies

You can use “set()”. Introducing more weird special cases into the language is a bad direction for Python.

5 months ago

2 replies

And you can use dict() for an empty dictionary, and list() for an empty list.

https://www.inspiredpython.com/article/watch-out-for-mutable...

5 months ago

3 replies

Yes but they are not equivalent. dict and list are factories; {} and [] are reified when the code is touched and then never reinitialised again. This catches out beginners and LLMs alike:

5 months ago

1 reply

That article is about how defaults for arguments are evaluated eagerly. It doesn't really have to do with dict vs {}.

However, using the literal syntax does seem to be more efficient. So that is an argument for having dedicated syntax for an empty set.

5 months ago

1 reply

I am not replying to the article but to the poster.

5 months ago

I am talking about the article you linked to in your comment.

fainpul

5 months ago

1 reply

Your link doesn't support your argument.

5 months ago

1 reply

I wrote the link and yes it does. Module evaluations reify {}, [], etc. once. That is why people keep making subtle bugs when they do `def foo(a=[]):` unaware that this will in fact not give you a brand new list on every function call.

Factory functions like list/tuple/set are function calls and are executed and avoid this problem. Hence why professional python devs default to `None` and check for that and _then_ initialise the list internally in the function body.

Adding {/} as empty set is great, sure; but that again is just another reified instance and the opposite of set() the function.

rand_r

5 months ago

There is no difference between “def f(x={})” and “def f(x=dict())”, unless you have shadowed the dict builtin. They both have exactly the same subtle bug if you are mutating or returning x later.
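
A minimal sketch to check this yourself; the three function names are made up for illustration. The first two share one dict across calls, the third uses the usual `None` sentinel.

```python
def record_literal(key, seen={}):
    # The {} default is evaluated once, when the def statement runs.
    seen[key] = True
    return sorted(seen)


def record_call(key, seen=dict()):
    # dict() is also evaluated once, at the same moment as the {} above.
    seen[key] = True
    return sorted(seen)


def record_fresh(key, seen=None):
    # The None sentinel gives every call its own new dict.
    if seen is None:
        seen = {}
    seen[key] = True
    return sorted(seen)
```

Calling either of the first two twice with different keys returns both keys on the second call; only `record_fresh` starts empty each time.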

spott

5 months ago

1 reply

They are equivalent. In function signatures (what your article is talking about), using dict() instead of {} will have the same effect. The only difference is that {} is a literal of an empty dict, and dict is a name bound to the builtin dict class. So you can reassign dict, but not {}, and if you use dict() instead of {}, then you have a name lookup before a call, so {} is a little more efficient.

5 months ago

Right, but it instantiates it _once_ on module load! That is the point I am making; nothing else.

mrguyorama

5 months ago

For reasons I don't think I understand, using the functions is "discouraged" because "someone might muck with how those functions work" and the python world, in its perfect wisdom responded "Oh of course" instead of "That's so damn stupid, don't do that because it would be surprising to people who expect built in functions to do built in logic"

tpm

5 months ago

No no no, it's a great direction towards becoming the new Perl.

slightwinder

5 months ago

Making sense, and being good, is not necessarily the same.

Yes, having a solution for this makes sense, but the proposed solutions are just not good. Sometimes one has to admit that not everything can be solved gracefully and just stop hunting the whale.

btown

5 months ago

6 replies

My favorite Python "crime" is that a class that defines __rrshift__, instantiated and used as a right-hand-side, lets you have a pipe operator, regardless of the left-hand-side (as long as it doesn't define __rshift__).

It's reasonably type-safe, and there's no need to "close" your chain - every outputted value as you write the chain can have a primitive type.

    x = Model.objects.all() >> by_pk >> call(dict.values) >> to_list >> get('name') >> _map(str.title) >> log_debug >> to_json

It shines in notebooks and live coding, where you might want to type stream-of-thought in the same order of operations that you want to take place. Need to log where something might be going wrong? Tee it like you're on a command line!

Idiomatic? Absolutely not. Something to push to production? Not unless you like being stabbed with pitchforks. Actually useful for prototyping? 1000%.

divbzero

5 months ago

2 replies

I suppose you could use the same trick with __ror__ and | (as long as the left-hand side doesn’t define __or__).

  x = Model.objects.all() | by_pk | call(dict.values) | to_list | get('name') | _map(str.title) | log_debug | to_json

btown

5 months ago

1 reply

Sadly many things define the __or__ operator, including dicts and sets which are common to find in pipelines. (https://peps.python.org/pep-0584/ was a happy day for everyone but me!)

In practice, rshift gives a lot more flexibility! And you’d rarely chain after a numeric value.

mananaysiempre

5 months ago

It’s still useful in related situations. The following crime often finds its way into my $PYTHONSTARTUP:

  class more:
      def __ror__(self, other):
          import pydoc
          pydoc.pager(str(other))
  more = more()

and here the low precedence of | is useful.
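
To make the precedence point concrete, here is a sketch that swaps the pager for a hypothetical `Capture` object (not part of the snippet above) that simply records what reaches it. In Python's grammar `|` binds more loosely than `&`, while `>>` binds more tightly than `&`.

```python
class Capture:
    """Hypothetical stand-in for the pager: records whatever is piped into it."""

    def __init__(self):
        self.seen = []

    def _record(self, other):
        self.seen.append(other)
        return other

    # Reflected operators, tried when the left operand cannot handle the operation.
    __ror__ = _record
    __rrshift__ = _record


cap = Capture()

6 & 3 | cap    # parsed as (6 & 3) | cap, so cap receives the whole expression
6 & 3 >> cap   # parsed as 6 & (3 >> cap), so cap only receives the 3

print(cap.seen)
```

With `|`, more kinds of expression on the left reach the receiver whole, without extra parentheses.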

ck45

5 months ago

Or overloading division like https://scapy.net/

wredcoll

5 months ago

1 reply

Oh god, it's c++ all over again!

(https://downloads.haskell.org/~ghc/7.6.2/docs/html/libraries...)

5 months ago

2 replies

Is that supposed to be a bad thing?

toolslive

5 months ago

1 reply

IMNSHO: Yes.

It's a sign of the design quality of a programming language when 2 arbitrary features A and B of that language can be combined and the combination will not explode in your face. In python and C++ (and plenty of other languages) you constantly have the risk that 2 features don't combine. Both python and C++ are full of examples where you will learn the hard way: "ah yes, this doesn't work." Or "wow, this is really unexpected".

Joker_vD

5 months ago

1 reply

Well, there is also a question of attitude. Most of the Python programmers don't overload << or >> even though they technically can, while in C++ that's literally the way the standard library does I/O ― and I suspect it leaves an impression on people studying it as one of their first languages that no, it's fine to overload operators however quirkily you want. Overload "custom_string * 1251" to mean "convert string from Windows-1251 to UTF-8"? Sure, why not.

toolslive

5 months ago

I've seen >> being overloaded in several libraries/frameworks. Off the top of my head:

   - Airflow: https://airflow.apache.org/docs/apache-airflow/stable/index.html#dags

   - Diagrams: https://diagrams.mingrammer.com/docs/getting-started/examples

IshKebab

5 months ago

Yes. iostreams overloading << and >> was pretty much universally seen as a bad idea and they eventually abandoned it in C++20/23 with std::format and std::print.

It's usually a good idea for operators to have a specific meaning and not randomly change that meaning depending on the context. If you want to add new operators with new meanings, that's fine. Haskell does that. The downside is people find it really tempting and you end up with a gazillion difficult-to-google custom operators that you have to learn.

EdNutting

5 months ago

1 reply

I spy a functional programmer lurking in this abuse of Python ;)

Looks a lot like function composition with the arguments flipped, which in Haskell is `>>>`. Neat!

But since you’re writing imperative code and binding the result to a variable, you could also compare to `>>=`.

btown

5 months ago

Having spent a lot of time lurking on the frustratingly-slow-moving bikeshedding thread for the Javascript pipe operator [0], there's a great irony that a lot of people want a pipe operator because they don't want to deal with function composition in any way, other than just applying a series of operations to their data!

I think there's a big gap pedagogically here. Once a person understands functional programming, these kinds of composition shorthands make for very straightforward and intuitive code.

But, if you're just understanding basic Haskell/Clojure syntax, or stuck in the rabbit hole of "monad is a monoid" style introductions, a beginner could easily start to think: "This functional stuff is really making me need to think in reverse, to need to know my full pipeline before I type a single character, and even perhaps to need to write Lisp-like (g (f x)) style constructs that are quite the opposite of the way my data is flowing."

I'm quite partial to tutorials like Railway Oriented Programming [1] which start from a practical imperative problem, embrace the idea that data and code should feel like they flow in the same direction, and gradually guide the reader to understanding the power of the functional tools they can bring to bear!

If anything, I hope this hack sparks good conversations :)

[0] https://github.com/tc39/proposal-pipeline-operator/issues/91 - 6 years and 793 comments!

[1] https://fsharpforfunandprofit.com/rop/

kragen

5 months ago

Wow, this is awesome! Your example is especially fantastic.

kragen

5 months ago

Witness the chaos you have unloosed upon the world; not even little Lua is safe:

  > =dd-{"Todo","es","bueno"} /call("gsub", "[aeiou]", "i") /call("rep", 2) /call "lower"                  
  {"tiditidi", "isis", "biinibiini"}
  > ps={{x=5, y=7}, {x=5, y=4}, {x=3, y=4}}
  > =dd-ps/get"x"                                       
  {5, 5, 3}
  > =dd-ps/where {y=4}
  {[2] = {y=4, x=5}, [3] = {y=4, x=3}}                                       
  > =dd-ps/group "x"                                         
  {[5] = {{x=5, y=7}, {x=5, y=4}}, [3] = {{x=3, y=4}}}       
  > =dd-ps/group "x"/get(1)                                  
  {[5] = {x=5, y=7}, [3] = {x=3, y=4}}                       
  > =dd-ps/group "x"/map(get"y")                             
  {[5] = {7, 4}, [3] = {4}}
  > =dd-ps/group "x"/map(get"y")/map(max)                    
  {[5] = 7, [3] = 4}                                         
  > =dd-ps/group "x"/map(get"y")/map(max)/min                
  4
  > =dd-ps/group "x"/map(get"y")/map(max)/sum                
  11
  > =dd-ps/where(function(p) return 9 < p.x+p.y end)
  {[2] = {y=6, x=8}}                                         
  > function bytes(s) return {s:byte(1,#s)} end
  > =dd-{"hey", "you"}/map(bytes)
  {{104, 101, 121}, {121, 111, 117}}
  > =dd-{"hey", "you"}/map(bytes)/flat/freq
  {[111] = 1, [101] = 1, [104] = 1, [121] = 2, [117] = 1}

ck45

5 months ago

Apache Beam does it: https://beam.apache.org/documentation/sdks/python-streaming/

I wanted to wash my eyes the first time I saw it.

M0r13n

5 months ago

3 replies

Personally, I have never liked the PEP 634 pattern matching. I write a lot of code in Python. 99% of the time when I could use pattern matching, I am going to use simple if statements or dictionaries. Most of the time, they are more straightforward and easier to read, especially for developers who are more familiar with traditional control flow.

NeutralForest

5 months ago

1 reply

You should use if statements if that's what you need. The match statement is for structural pattern matching mostly.

Q6T46nT668w6i3m

5 months ago

1 reply

What’s the problem using it as a switch statement if you care about typographic issues? I do this so I’d like to know if I missed something and this is a bad practice.

NeutralForest

5 months ago

It's not an issue, but that's not where most of the power is, and it can also be confusing: if you use variables in the case statement, the way it matches does not behave like a simple switch.

odyssey7

5 months ago

Having used other languages after finishing university studies, Python does not spark joy.

Balinares

5 months ago

Dictionaries with a limited key and value type definition are fine, but dictionaries as a blind storage type are a recipe for crashing in prod with type errors or key errors. Structural pattern matching exists to support type safety.

I'll argue that code is in fact not easy to read if reading it doesn't tell you what type an item is and what a given line of code using it even does at runtime.

JohnKemeny

5 months ago

Discussed 3 years ago:

Crimes with Python's pattern matching

406 points on Aug 2, 2022. 120 comments

https://news.ycombinator.com/item?id=32314368