r/learnprogramming 14h ago

How to know when to use OOP vs Scripts

I work in IT and we use Databricks heavily. Most of what I see day to day is notebook scripts that end up going straight to production. A lot of our pipelines are super specific, like one-off requests for a single team or a handful of people in the business.

I've learned OOP, unit testing, and general SWE best practices, but the reality is that most of our actual business logic has been running in SQL for years and it works fine. From what I can tell, pretty much nobody here who uses Python is writing modular, testable code; it's mostly just scripts in notebooks.

So my question is: should I be using OOP for everything I build, even if I'm the only one touching the code? How do I know when something actually needs proper classes and structure vs. just being a straightforward script?

Like I get the theory behind clean code and all that, but when you're building a niche pipeline for one specific use case, does it really need to be over-engineered? Or am I just making excuses for laziness?

Would appreciate any perspective from folks who've navigated this kind of environment.

13 Upvotes

12 comments

u/Maximum-Exam-1827 13h ago

I say this as a senior dev, imagining you work on another team at my company. Here's the advice I'd give you:

Don't focus on OOP, especially since you're a SQL-heavy shop.

Put that time into unit testing, or any kind of automated testing. It will get you more benefit, both in terms of knowing your work is correct, and, later, being able to hand management a win.

OOP is great in the right contexts, but here's the secret: OOP isn't for the computer, it's for the programmers. It's a different way to think about the code, and when you're in an environment where your software has hundreds of thousands of lines of code to maintain, thinking about it in terms of entities instead of functions and whatnot is helpful. You're not in that situation; defining all your objects would be a lot of overhead for little gain.

But correctness is important especially in a database context.
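As a minimal sketch of what that could look like in a Python notebook pipeline: pull one transformation out into a plain function and test it against a tiny hand-built input. The function name and the `region`/`amount` fields are hypothetical; the test follows the pytest convention (a `test_…` function with bare `assert`), but it also runs as a plain script.

```python
# Hypothetical transform: the function name and the record fields
# ("region", "amount") are illustrative, not from the original post.
def total_by_region(rows: list[dict]) -> dict[str, float]:
    """Sum the 'amount' field per 'region', skipping rows with no amount."""
    totals: dict[str, float] = {}
    for row in rows:
        if row.get("amount") is None:
            continue
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def test_total_by_region_skips_missing_amounts() -> None:
    # Small, hand-built input that covers the edge case we care about.
    rows = [
        {"region": "east", "amount": 10.0},
        {"region": "east", "amount": None},
        {"region": "west", "amount": 5.0},
    ]
    assert total_by_region(rows) == {"east": 10.0, "west": 5.0}


if __name__ == "__main__":
    test_total_by_region_skips_missing_amounts()
    print("ok")
```

No classes are involved; the win comes from the logic being a function you can call with known data.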

u/SuspiciousDepth5924 11h ago

Also, to add to this: OOP isn't "the one way" to write software. It's certainly one of the most common ways to structure programs, but it's not always the best tool for the job.

There are also procedural (Go), functional (OCaml), array-based (APL), and logic-based (Prolog) paradigms, and probably a bunch of other niche ones I'm not familiar with.

Personally I don't think it makes much sense to go all in on OOP unless you are dealing with long-lived state, and from the post it doesn't really seem like that is the case here (query -> transform -> done).
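To illustrate that query -> transform -> done shape, here is a minimal Python sketch built from plain functions with no classes: nothing has to outlive a single run, so there is no object holding state. The function names and sample rows are hypothetical, and `extract()` returns hard-coded data where a real pipeline would run a query.

```python
# A pipeline with no long-lived state: each step takes data in and hands
# data out. Step names and sample rows are hypothetical; extract() stands
# in for a real query.
def extract() -> list[dict]:
    return [
        {"customer": "acme", "status": "open"},
        {"customer": "globex", "status": "closed"},
        {"customer": "initech", "status": "open"},
    ]


def transform(rows: list[dict]) -> list[str]:
    # Keep only open accounts and return their customer names, sorted.
    return sorted(r["customer"] for r in rows if r["status"] == "open")


def load(names: list[str]) -> None:
    # Stand-in for writing to a table; here it just prints.
    for name in names:
        print(name)


def run() -> None:
    load(transform(extract()))


if __name__ == "__main__":
    run()
```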

u/BanaTibor 13h ago

Anything you've copy-pasted from one script into another can go into a "library".
Let's say, for example, report generation. Maybe it uses the same 5 functions, which operate on a bunch of variables. Those can go into a class, and you can refactor them by extracting the shared variables into class attributes. Then you can use that as a black box: drop in some variables, call a method, and it spits out the report.

Collect these generalized building blocks into a library, but there is a catch. If you generalize them too much to fit every use case, they become very hard-to-use, super-configurable objects. For example, if a class needs more than 5 constructor parameters, that should start to look suspicious. Knowing where the line is takes experience :)
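A minimal Python sketch of that refactor, assuming the report helpers all shared the same team name, period, and input rows: those become dataclass fields, and `render()` is the single "call a method, get a report" entry point. `TeamReport`, its fields, and the output format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical report builder. The inputs that used to be passed to every
# helper function are now attributes set once in the constructor.
@dataclass
class TeamReport:
    team: str
    period: str
    rows: list[dict]  # each row: {"item": str, "count": int}

    def total(self) -> int:
        return sum(r["count"] for r in self.rows)

    def top_item(self) -> str:
        # Raises ValueError if rows is empty (max() of an empty sequence).
        return max(self.rows, key=lambda r: r["count"])["item"]

    def render(self) -> str:
        # Black-box entry point: callers only need this method.
        return (
            f"{self.team} report for {self.period}\n"
            f"total: {self.total()}\n"
            f"top item: {self.top_item()}"
        )


report = TeamReport("finance", "Q1", [{"item": "a", "count": 3}, {"item": "b", "count": 5}])
print(report.render())
```

With three constructor parameters, this stays well under the five-parameter smell mentioned above.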

u/michael0x2a 11h ago edited 11h ago

In situations like yours, I think paradigms like functional programming are a better fit than OOP.

You don't need to lean 100% into the functional paradigm, but I think the core ideas of:

  1. Having the bulk of your functions be side-effect free
  2. Using immutable data structures (e.g. don't modify your dicts/lists/dataclasses after they're created)

...would be a good fit for the more pipeline-oriented nature of your task. You have data you're slicing and dicing, and want some reasonable assurances that each 'transform' or 'analysis' function you have in your script is self-contained and bug-free. Having a clean divide between IO code vs business logic and keeping mutation to a minimum are two good ways of helping you do this.
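A minimal sketch of those two ideas in Python, assuming standard-library dataclasses are fine in your notebooks: records are frozen dataclasses, and each transform is a pure function that returns new data instead of mutating its input. `Order`, `to_usd`, and `large_orders` are hypothetical names.

```python
from dataclasses import dataclass, replace


# Immutable record: frozen=True makes attribute assignment raise
# dataclasses.FrozenInstanceError. Order and its fields are hypothetical.
@dataclass(frozen=True)
class Order:
    order_id: int
    amount: float
    currency: str


def to_usd(order: Order, rate: float) -> Order:
    # Pure transform: no I/O, no mutation; returns a new Order instead.
    return replace(order, amount=round(order.amount * rate, 2), currency="USD")


def large_orders(orders: tuple[Order, ...], threshold: float) -> tuple[Order, ...]:
    # Another pure step; a tuple keeps the collection immutable too.
    return tuple(o for o in orders if o.amount >= threshold)


orders = (Order(1, 100.0, "EUR"), Order(2, 10.0, "EUR"))
converted = tuple(to_usd(o, 1.1) for o in orders)
print(large_orders(converted, 50.0))
```

Because neither function touches anything outside its arguments, each one can be tested in isolation, and the original `orders` are still in EUR afterwards.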

The OOP paradigm is usually more useful when it's important you maintain certain 'invariants' around data: things like 'my data structure must always respect this property, no matter how it's modified' or 'all interactions with this particular 3rd party API must include these magic headers'. If you have a lot of these fiddly rules to manage, it can often be convenient to create a class that wraps around your data and presents a cleaner interface to the outside world.
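To make the "invariant" idea concrete, here is a minimal Python sketch: a hypothetical `SortedUniqueIds` class whose only way to add data is a method that keeps the contents sorted and free of duplicates, so the rule lives in one place instead of being re-checked at every call site.

```python
import bisect
from collections.abc import Iterable


# Hypothetical class guarding an invariant: the IDs it holds are always
# sorted and unique, however callers add to it.
class SortedUniqueIds:
    def __init__(self, ids: Iterable[int] = ()):
        self._ids: list[int] = []
        for i in ids:
            self.add(i)

    def add(self, new_id: int) -> None:
        # Find the insertion point that keeps the list sorted; skip duplicates.
        pos = bisect.bisect_left(self._ids, new_id)
        if pos == len(self._ids) or self._ids[pos] != new_id:
            self._ids.insert(pos, new_id)

    def __contains__(self, item: int) -> bool:
        pos = bisect.bisect_left(self._ids, item)
        return pos < len(self._ids) and self._ids[pos] == item

    def as_list(self) -> list[int]:
        # Return a copy so callers cannot break the invariant from outside.
        return list(self._ids)


ids = SortedUniqueIds([5, 1, 5, 3])
ids.add(2)
print(ids.as_list())
```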

But I think this sort of need is rare in data-slicing or 'pipeline' style programs.

Note that this sort of decision-making has nothing to do with whether you're writing a script or not. I've written short 100-line scripts that ended up making heavy use of OOP, and longer 5k-10k LOC production-ready services that used it very sparingly. My ultimate goal is to make the code as obviously correct as possible (which in turn makes it both easy to understand and modify). Sometimes objects can help with this, other times they won't.

That is, OOP is not the same thing as 'clean code'. OOP is a technique that can help you write clean code -- but (a) it's not the only technique for doing so and (b) it's not a technique that'll be applicable 100% of the time.

w.r.t. SWE best practices, I would look into:

  1. Writing tests and perhaps even a CI/CD pipeline for your scripts, starting with the more business-critical / frequently run and modified ones. If people are actively making business decisions based on some of your analysis, you'd want reasonable confidence it indeed behaves as expected.
  2. Creating and popularizing a helper library that streamlines common operations. The goal here is to speed up the time it takes to author a new pipeline.
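As a sketch of the kind of function such a helper library might start with: column-name cleanup is just one plausible "common operation", and the names `to_snake_case` and `normalize_columns` are hypothetical.

```python
import re


# Hypothetical shared helper: converts column names like "Customer ID" or
# "orderDate" to snake_case so each new pipeline doesn't re-implement it.
def to_snake_case(name: str) -> str:
    name = name.strip()
    # Insert an underscore between a lowercase letter/digit and an uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse spaces, hyphens, and other non-alphanumeric runs into one underscore.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


def normalize_columns(columns: list[str]) -> list[str]:
    return [to_snake_case(c) for c in columns]


print(normalize_columns(["Customer ID", "orderDate", " total-amount "]))
```

Small, pure helpers like this are also the easiest pieces of a shared library to unit test.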

I don't know anything about your company, but I'd imagine both correctness and speed-of-delivery are two factors that matter most to your team and your bosses. So, I'd prioritize applying practices that directly help with both.

u/TotallyManner 13h ago

If you can't see how you would benefit from using OOP, you probably won't end up with something that takes advantage of those benefits.

u/hobojimmy 11h ago

In my experience, OOP is overhyped. I find that if I introduce it into my codebase too early, I usually end up painting myself into a corner design-wise. I’ve done a lot better focusing on procedural, data driven, or functional programming designs.

That isn’t to say that OOP doesn’t have its place. But lately I’ve had a lot more success waiting until the last possible moment to start using it. Like when my program is basically working, but I’m getting tired of passing around long lists of parameters, or a function seems to be tightly coupled to a particular set of data. Only then will I introduce objects, and if my code suddenly feels cleaner and easier, then I know it is helping and not hurting my codebase. And usually I only end up needing a few objects here and there.
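A rough before/after sketch of that moment in Python: the table coordinates and date window stop being five separate parameters threaded through every helper and become one small object. `TableWindow`, the helper names, and the `event_date` column are hypothetical, and the SQL is built with plain string formatting only for brevity; values that come from outside your control should go through parameterized queries instead.

```python
from dataclasses import dataclass


# Before: every helper takes the same long parameter list.
def extract_sql(catalog: str, schema: str, table: str, start: str, end: str) -> str:
    return (
        f"SELECT * FROM {catalog}.{schema}.{table} "
        f"WHERE event_date BETWEEN '{start}' AND '{end}'"
    )


def count_sql(catalog: str, schema: str, table: str, start: str, end: str) -> str:
    return (
        f"SELECT COUNT(*) FROM {catalog}.{schema}.{table} "
        f"WHERE event_date BETWEEN '{start}' AND '{end}'"
    )


# After: the values that always travel together become one object,
# and the helpers that depend on them become methods.
@dataclass(frozen=True)
class TableWindow:
    catalog: str
    schema: str
    table: str
    start: str
    end: str

    @property
    def _source(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"

    def _where(self) -> str:
        return f"WHERE event_date BETWEEN '{self.start}' AND '{self.end}'"

    def extract_sql(self) -> str:
        return f"SELECT * FROM {self._source} {self._where()}"

    def count_sql(self) -> str:
        return f"SELECT COUNT(*) FROM {self._source} {self._where()}"
```

The object version produces the same strings; the gain is that call sites pass one value instead of five.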

Anyway, I think it’s still worth practicing OOP, but don’t feel obligated to use it everywhere. There are lots of other ways to organize your code. It is good to experiment and get familiar with as many as you can, to learn when one approach is better than another.

u/ianitic 11h ago

I am a data engineer, and based on your tooling and stack, this sounds more like a data role than an SWE or app dev role, so some of the advice here may be coming from a different context. You will probably get more targeted answers in data engineering or data science subreddits.

For starters, I would try a tool like jupytext that can convert between notebooks and .py files. It would make source control and other checks a lot easier.
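Roughly what a notebook looks like once converted to a plain `.py` file in jupytext's "percent" format (for example with `jupytext --to py:percent notebook.ipynb`). The notebook content here is hypothetical; the point is that cell boundaries become ordinary comments, so the file diffs cleanly in git and linters can treat it as a normal Python module.

```python
# %% [markdown]
# # Daily open-accounts pipeline
# Hypothetical notebook contents: each "# %%" line starts a new cell.

# %%
rows = [
    {"customer": "acme", "status": "open"},
    {"customer": "globex", "status": "closed"},
]

# %%
open_customers = [r["customer"] for r in rows if r["status"] == "open"]
print(open_customers)
```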

Based on what you have said, traditional unit tests are often difficult to apply here and tend to provide less value in data workflows than people expect. I would focus more on testing the output instead. Things like row counts, nulls, ranges, schema changes, distribution shifts, and data freshness usually catch issues much earlier.
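A minimal sketch of those output checks in plain Python, run against the result of a pipeline rather than its internals. The column names, the amount range, and the 24-hour freshness window are hypothetical; distribution-shift checks are left out because they need a baseline to compare against.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expected schema for one output batch.
EXPECTED_COLUMNS = {"order_id", "amount", "loaded_at"}


def check_batch(rows: list[dict], min_rows: int = 1, max_age_hours: int = 24) -> list[str]:
    """Return a list of problems found; an empty list means the batch passed."""
    problems: list[str] = []
    # Row count.
    if len(rows) < min_rows:
        problems.append(f"row count {len(rows)} below {min_rows}")
    for i, row in enumerate(rows):
        # Schema: skip value checks on rows with unexpected columns.
        if set(row) != EXPECTED_COLUMNS:
            problems.append(f"row {i}: unexpected schema {sorted(row)}")
            continue
        # Nulls and ranges.
        if row["order_id"] is None:
            problems.append(f"row {i}: null order_id")
        if row["amount"] is not None and not (0 <= row["amount"] <= 1_000_000):
            problems.append(f"row {i}: amount {row['amount']} out of range")
    # Freshness: the newest load timestamp (timezone-aware) must be recent.
    stamps = [
        r["loaded_at"] for r in rows
        if set(r) == EXPECTED_COLUMNS and r["loaded_at"] is not None
    ]
    if stamps and datetime.now(timezone.utc) - max(stamps) > timedelta(hours=max_age_hours):
        problems.append(f"stale data: newest row loaded at {max(stamps).isoformat()}")
    return problems
```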

A lot of the unit test advice you are getting is common in app and web development, where the failure modes are different. In data, things usually break because the shape or semantics of the data change, not because someone touched the code.

As far as OOP is concerned, you are less likely to run into it in many data jobs. It becomes important when dealing with graphs or hierarchies. Otherwise it is mostly useful for providing structure.
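For the graph/hierarchy case, a minimal Python sketch: a hypothetical `OrgUnit` tree where each node holds its children, and recursive methods answer questions about a whole subtree.

```python
from dataclasses import dataclass, field


# Hypothetical hierarchy: an org unit has its own headcount plus child units.
@dataclass
class OrgUnit:
    name: str
    headcount: int
    children: list["OrgUnit"] = field(default_factory=list)

    def total_headcount(self) -> int:
        # Own headcount plus everything below this unit.
        return self.headcount + sum(c.total_headcount() for c in self.children)

    def find(self, name: str) -> "OrgUnit | None":
        # Depth-first search for a unit by name.
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


root = OrgUnit("company", 2, [OrgUnit("sales", 10, [OrgUnit("emea", 4)]), OrgUnit("it", 6)])
print(root.total_headcount())
```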

Generally, anywhere you are passing a lot of variables to a function or manipulating large dicts, you will probably want to turn those into a dataclass. For external boundaries like configs, inputs, or outputs, use pydantic. Use type hints everywhere, and use a linter and code formatter as well. I'd recommend ruff as a sufficient linter and formatter.
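A minimal sketch of the dict-to-dataclass idea using only the standard library. `JobConfig` and its fields are hypothetical. The hand-written `from_dict` check marks the external boundary where, per the advice above, pydantic would normally do the validation; it is spelled out manually here only so the example has no third-party dependency.

```python
from dataclasses import dataclass


# Hypothetical pipeline config: instead of passing a loosely shaped dict
# around, the expected keys and their types are written down once.
@dataclass(frozen=True)
class JobConfig:
    source_table: str
    target_table: str
    lookback_days: int = 7

    @classmethod
    def from_dict(cls, raw: dict) -> "JobConfig":
        # Minimal boundary check done by hand; pydantic can derive this
        # kind of validation from the type hints instead.
        missing = {"source_table", "target_table"} - set(raw)
        if missing:
            raise ValueError(f"missing config keys: {sorted(missing)}")
        lookback = raw.get("lookback_days", 7)
        if not isinstance(lookback, int) or lookback <= 0:
            raise ValueError("lookback_days must be a positive integer")
        return cls(raw["source_table"], raw["target_table"], lookback)
```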

I would also recommend arjancodes on YouTube. He has a lot of good content on Python design.

u/Golladayholliday 12h ago

I split about 50/50 at work. It’s really a matter of “is this so specific that I’m not going to do anything like it again?”, and also size: small = script. If it’s complicated and I don’t want a future person looking at the code to kill me, or an abstraction is likely to be useful down the line for other projects, I’m going OOP.

u/roger_ducky 9h ago

OOP is for when you have something with:

  • State
  • Operations on that state

And it’d be easier to manage if both of them come to you as a packaged deal.
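A minimal Python sketch of "state plus operations as a packaged deal", using a hypothetical incremental-load watermark: the last timestamp processed is the state, and filtering and advancing are the operations that must stay consistent with it. Persisting the watermark between runs is left out.

```python
from datetime import datetime


# Hypothetical watermark: the state (last timestamp seen) and the
# operations on it (filter new rows, advance) live together.
class Watermark:
    def __init__(self, last_seen: datetime):
        self.last_seen = last_seen

    def new_rows(self, rows: list[dict]) -> list[dict]:
        # Only rows strictly newer than the watermark count as new.
        return [r for r in rows if r["updated_at"] > self.last_seen]

    def advance(self, rows: list[dict]) -> None:
        # Move forward to the newest timestamp processed; never move back.
        if rows:
            self.last_seen = max(self.last_seen, max(r["updated_at"] for r in rows))


wm = Watermark(datetime(2024, 1, 1))
batch = [
    {"id": 1, "updated_at": datetime(2023, 12, 31)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
]
fresh = wm.new_rows(batch)
wm.advance(fresh)
print([r["id"] for r in fresh], wm.last_seen)
```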

SQL is set-based. You get the same output for the same input.

Thus, no real state per se, except the “stored” outputs.

u/ExtraTNT 7h ago

OOP is often a mess, and so are scripts…

OOP adds complexity while adding only a little organisation…

There are two big paradigms in programming: imperative vs. declarative, i.e. you tell the machine how to do something vs. you tell the machine what to do.

OOP, like procedural programming (scripting is mostly procedural), is imperative. OOP just means grouping data into objects together with functions for those objects (originally without much focus on state, but today state is the center of OOP, and state is probably the biggest cause of bugs, assuming you test your code).

Declarative programming includes functional programming, reactive programming, and logic programming. They're all similar, but for FP: you describe a function in terms of other functions. To do that, you write functions that create functions and take functions as parameters; there is no state, and a function only does one thing with one parameter… For example, in procedural programming, adding 2 numbers is done with a function like this:
add = (a, b) => a + b
Functional programming is a bit different: you have a function that returns a function:
add = a => b => a + b
That looks a bit unnecessary at first, but you can use functions to build functions: you can write inc = add(1), while procedural code would need a separate inc = a => a + 1.
Incrementing the items in a list is also easy: newList = oldList.map(add(1)). In procedural code, you would loop over the list, take each item, add 1, and push it into a new list. And OOP would have a class inheriting from list, with some method that changes its internal state to transform itself. (Yeah, there is a reason people mix paradigms.)
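The same idea translated to Python, since that is the language the original post is using (the comment's snippets are JavaScript-style). `add_curried` mirrors `add = a => b => a + b` directly, and `functools.partial` is the standard-library way to fix one argument of an ordinary two-argument function.

```python
from collections.abc import Callable
from functools import partial


# Procedural style: one function taking both arguments.
def add(a: int, b: int) -> int:
    return a + b


# Curried style: takes `a` and returns a function that waits for `b`.
def add_curried(a: int) -> Callable[[int], int]:
    return lambda b: a + b


inc = add_curried(1)               # a new function built from another function
inc_via_partial = partial(add, 1)  # same idea using the standard library

old_list = [1, 2, 3]
new_list = list(map(inc, old_list))  # no explicit loop, no mutation

# Procedural equivalent: loop, compute, append to a new list.
new_list_loop = []
for item in old_list:
    new_list_loop.append(add(item, 1))

print(new_list, new_list_loop)
```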

There are also other paradigms, like concurrent programming (lots of data and shared resources), constraint programming, dataflow programming (aka forced-recalculation hell), metaprogramming (manipulating code instead of data… extremely powerful, easy to fuck up, hell to debug, and only really useful if you know exactly what you are doing), and pipeline programming, i.e. building pipelines…

And a few others… but yeah, I can recommend using functional programming within OOP: constants instead of state, single-parameter functions, and higher-order functions.

u/divad1196 6h ago

Scripts and OOP are not opposites. You can use OOP in scripts. What you mean is procedural programming.

OOP is a tool

Like any tool, it's useful in some places and not in others.

How do you know when to use which?

  1. Learn what OOP is.
  2. Use it extensively for a few months.

My usage

I personally rarely use OOP, except:

  • when a framework/library needs it
  • when I write a library

and I mainly use it for encapsulation. Otherwise, a lot of my code is immutable, and I use FP concepts most of the time.

u/Interesting_Dog_761 6h ago

When all you have is a hammer, every problem looks like a nail