
Questions tagged [deep-learning]

For questions related to deep learning, which refers to a subset of machine learning methods based on artificial neural networks (ANNs) with multiple hidden layers. The adjective deep thus refers to the number of layers of the ANNs. The expression deep learning was apparently introduced (although not in the context of machine learning or ANNs) in 1986 by Rina Dechter in the paper "Learning while searching in constraint-satisfaction-problems".

0 votes
0 answers
24 views

I’m reevaluating a deep-research workflow I built earlier and would love some advice. My previous design used a static tree workflow (fixed width/depth, node = search → extract → summarize → generate ...
Gosh Li
0 votes
1 answer
43 views

I want to implement in Python some algorithms from a paper that allow a pre-trained neural network to be modified (adding or removing neurons or layers) while preserving (theoretically) the outputs of ...
Rubén Sales Castellar
1 vote
1 answer
59 views

I'm currently making an AI to play snake using DQN and have run into a performance plateau. Here is the information about the architecture of the model. Network's design: I use CNN + MLP for both ...
Hào Võ
3 votes
2 answers
100 views

Transformers don’t use formal logic, yet models like GPT can handle multi-step reasoning questions. What mechanisms inside the network allow this kind of emergent logic without explicit symbolic ...
Anushka_Grace
0 votes
0 answers
18 views

I am using the MixStyle methodology for domain adaptation and it involves using a custom layer which is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...
Vedant Dalimkar
-1 votes
1 answer
61 views

How did we discover the architecture of state-of-the-art large language models?
Alex
0 votes
1 answer
87 views

The largest models humans have trained have roughly $1$-$2$ trillion parameters (per @alberto's comment, rather than 100 billion parameters), while training models with $10$s of trillions of parameters is on the horizon. However these ...
Justaperson
0 votes
0 answers
41 views

I have a question regarding labeling images for defect detection using semantic segmentation. If the defect is a negative-space type (something that should be there but is not), how should the ...
Patrick Joel Tirta
-1 votes
1 answer
98 views

In machine learning, we multiply $n\times k$ and $k\times m$ matrices. I found sources where $k$ is on the order of $10000$ to $20000$. What does $k$ represent, and is there any potential advantage of ...
Justaperson
0 votes
1 answer
87 views

I have been studying neural networks in Bishop's "Deep learning - Foundations and concepts" and came across this equation: $$ y_k(\mathbf{x}, \mathbf{w}) = f\left(\sum_{j=0}^Mw_{kj}^{(2)} h ...
niccolo_zanieri
1 vote
1 answer
90 views

I've seen several claims that deep learning MLE points in "flatter loss regions" improve generalization to holdout data. Most notably, I've seen such claims in SWA, but also in some ...
profPlum
0 votes
0 answers
46 views

The title is probably very broad, but I have several very specific questions. I'm a relative novice to deep learning and neural networks, although I have a good background in mathematics and CS in general. ...
Orion's Belt
1 vote
1 answer
103 views

I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment. I've tested my Deep Q-Network on a simple test environment, where ...
Rohan Patel
1 vote
0 answers
64 views

Let $A$ be an integer matrix of size $n\times t$ and $B$ be an integer matrix of size $t\times m$. Let the maximum entry in absolute value in $A,B$ be of $b$ bits. If we can multiply $A,B$ in say $\leq100(n+m)tb(\...
Justaperson
0 votes
0 answers
28 views

In many social science applications, we often face a lack of data and non-linear relationships between variables. I am wondering whether anyone has come across any papers or discussions about Bayesian ...
dragonforce
